On Fri, Oct 15, 2021 at 3:54 PM Aitor Pazos <mail(a)aitorpazos.es> wrote:
Hi all,
Let me introduce the symptoms that triggered this investigation.
First, versions:
- OS: Ubuntu 20.04
- SSSD: 2.2.3-3ubuntu0.7
- Platform: x86_64
Some weeks after starting using a new region/provider we noticed some
intermittent issue impacting SSH logins (ie: clients got `Connection
closed by UNKNOWN port 65535` on `ssh ...`). When looking into it, I was
able to reproduce it somewhat reliably. It happens after some time of
inactivity for that user, but if I constantly issue commands (ie: `watch
sssctl user-checks <user>` or `watch ssh <host> whoami` from a client) the
issue wont trigger. If I stopped trying out those commands and tried again
~30 minutes later, it would fail.
With these symptoms I had these possibilities in mind: some caching issue,
some authentication expiring, connections pools issues...
I enabled detailed logging and compared logs from a successful request and
a failed one and I found what seems to be the source of the error from sssd
logs perspective, but still can't make much sense of it.
Log extracted from sssd_<domain>.log:
```
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [ipa_hbac_rule_info_next]
(0x0400): Sending request for next search base: [cn=hbac,dc=ipa,dc=corp,<dc
<domain>>][2][(&(objectclass=ipaHBACRule)(ipaenabledflag=TRUE)(accessRuleType=allow)(|(hostCategory=all)(...)))]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_print_server] (0x2000):
Searching <ip10.1.10.102:389
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x0400): calling ldap_search_ext with
[(&(objectclass=ipaHBACRule)(ipaenabledflag=TRUE)(accessRuleType=allow)(|(hostCategory=all)(...)][cn=hbac,dc=ipa,dc=corp,<dc
<domain>>].
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [objectclass]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [cn]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [ipauniqueid]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [ipaenabledflag]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [accessRuleType]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [memberUser]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [userCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [memberService]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [serviceCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [sourceHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [sourceHostCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [externalHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [memberHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x1000): Requesting attrs: [hostCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step]
(0x2000): ldap_search_ext called, msgid = 18
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_op_add] (0x2000): New
operation 18 timeout 60
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_process_result] (0x2000):
Trace: sh[0x55d349f96d00], connected[1], ops[0x55d349ff16e0],
ldap[0x55d34a10cba0]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_process_result] (0x2000):
Trace: end of ldap_result list
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [sdap_op_timeout] (0x1000):
Issuing timeout for 18
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [sdap_op_destructor] (0x1000):
Abandoning operation 18
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [generic_ext_search_handler]
(0x0040): sdap_get_generic_ext_recv failed [110]: Connection timed out
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [ipa_hbac_rule_info_done]
(0x0080): Could not retrieve HBAC rules
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [ipa_pam_access_handler_done]
(0x0020): Unable to fetch HBAC rules [110]: Connection timed out
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [dp_req_done] (0x0400): DP
Request [PAM Account #21]: Request handler finished [0]: Success
```
I tried using `ldap_enumeration_search_timeout = 180` which updated the
timeout in (`[sdap_op_add] (0x2000): New operation 18 timeout 60 `) but
same result (`enumerate` and `subdomain_enumerate` config options are left
to their defaults, `false` and `none` respectively).
I looked at the IPA servers LDAP logs and saw now related requests
regarding `ipaHBACRules` so I assume the request didn't even left the
server.
Hi,
Did you try to sniff network traffic (via tcpdump, for example) either on
client or[/and] on server?
I tried tweaking some other caching options like
`cached_auth_timeout` and
`pam_id_timeout` without any impact. Also tried clearing up the caches with
`sss_cache -E` or `systemctl stop sssd && rm /var/lib/sss/db/* &&
systemctl
start sssd` and nothing changed.
This is the error returned by `sssctl user-checks` when it fails:
```
$ sudo sssctl user-checks <user>
user: <user>
action: acct
service: system-auth
SSSD nss user lookup result:
- user name: <user>
- user id: <uid>
- group id: <gid>
- gecos: <name>
- home directory: /home/<user>
- shell: /bin/bash
InfoPipe operation failed. Check that SSSD is running and the InfoPipe
responder is enabled. Make sure 'ifp' is listed in the 'services' option
in
sssd.conf.InfoPipe User lookup with [<user>] failed.
testing pam_acct_mgmt
pam_acct_mgmt: System error
PAM Environment:
- no env -
```
Any pointer here would be very useful.
Thank you!
Aitor
_______________________________________________
sssd-users mailing list -- sssd-users(a)lists.fedorahosted.org
To unsubscribe send an email to sssd-users-leave(a)lists.fedorahosted.org
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahoste...
Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure