On Mon, Jul 22, 2024 at 10:57:45PM +0300, Grigory Trenin wrote:
Hi Sumit,
Yes, I'm running this version. "rpm -q --changelog" also shows that the fix is there:
$ rpm -q --changelog sssd | head -2
- Mon Jul 10 2023 Alexey Tikhonov atikhono@redhat.com - 2.8.2-3
- Resolves: rhbz#2219351 - [sssd] SSSD enters failed state after heavy load in the system [rhel-8.8.0.z]
Yes, this bug looks similar... but it might be a different issue. In my logs I don't see any handshake_timeouts.
I can see that the PAM and NSS responders tried to connect to the backend 3 times (reconnection_retries defaults to 3) and then gave up:
(2024-07-21 9:32:13): [pam] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_company.com [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_company.com: Connection refused
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:13): [pam] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:16): [pam] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_company.com [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_company.com: Connection refused
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:16): [pam] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:27): [pam] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_company.com [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_company.com: Connection refused
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:27): [pam] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
- ... skipping repetitive backtrace ...
(2024-07-21 9:32:27): [pam] [sbus_reconnect] (0x0020): Unable to reconnect: maximum retries exceeded.
(2024-07-21 9:32:27): [pam] [sss_dp_on_reconnect] (0x0010): Could not reconnect to company.com provider.
"lsof +E -aUc sssd" also shows that neither the PAM nor the NSS responder is connected to the other end of the /var/lib/sss/pipes/private/sbus-dp_company.com Unix socket. If I kill the sssd_nss process manually, it reconnects to the socket just fine.
Am I right in guessing that responders try to connect only "reconnection_retries" times and, if that fails, do not try again until the responder is restarted?
Hi,
yes, have you already tried to increase this value in your environment?
Please note that with SSSD-2.10 the way the communication is handled between the components changed and this option is obsolete. Hopefully the new way of communication will also make the issue you see go away.
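For reference, a minimal sketch of raising the limit on a pre-2.10 installation, assuming the option is set per responder in /etc/sssd/sssd.conf. The value 10 is only an illustration, not a recommendation, and the lines would be merged into the existing [nss] and [pam] sections if those are already present:

```
# /etc/sssd/sssd.conf (fragment; the value 10 is an assumed example)
[nss]
reconnection_retries = 10

[pam]
reconnection_retries = 10
```

SSSD has to be restarted to pick up the change.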
bye, Sumit
Kind regards, Grigory Trenin