On Mon, Jul 22, 2024 at 08:20:02PM +0300, Grigory Trenin wrote:
Hello,
Sometimes SSSD does not recover after being killed by its own watchdog. This is a DEV environment, somewhat poorly monitored, and OOM kills are common (but SSSD itself is not killed by the OOM killer). RHEL 8.8, sssd-2.8.2 with Active Directory.
1. Under memory pressure, the system becomes unresponsive, and SSSD's own watchdog terminates sssd_be and restarts it.
2. Since the system is still operating very slowly, it takes quite a while to start sssd_be. From the output of the “ps” command I can see that the sssd_be start time is 09:32:13, but sssd_be’s initial message in its logs, “Starting with debug level = 0x0070”, is dated 9:34:02. So it took almost 2 minutes to start sssd_be.
3. Meanwhile, the nss/pam responders tried to connect to the backend 3 times and gave up with the message “Unable to reconnect: maximum retries exceeded”.
4. The OOM killer finally kills some process (not an SSSD one) and system performance returns to normal.
So we end up with SSSD up and running, but not functioning, because the nss/pam responders will never try connecting to the backend again. And it was caused by SSSD's own watchdog. It looks like if the watchdog hadn't killed sssd_be, it would have recovered once the OOM killer had killed the memory-hogging process.
I still cannot believe that SSSD cannot recover on its own from such a simple situation. To my mind it should try reconnecting to the backend every 60s or something like that.
Is it expected behaviour or am I missing something?
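To make the difference between the two reconnect strategies concrete, here is a minimal Python sketch. It is not SSSD code and does not reflect how the responders are actually implemented. All names (`BackendUnavailable`, `reconnect_bounded`, `reconnect_periodic`, `make_slow_backend`) are hypothetical, and the 60-second interval is simply the value suggested above. The stub backend stands in for an sssd_be that needs several connection attempts before it is ready.

```python
import time
from typing import Callable


class BackendUnavailable(Exception):
    """Raised by the hypothetical connect() stub while the backend is not ready."""


def reconnect_bounded(connect: Callable[[], None], max_retries: int = 3) -> bool:
    """Try up to max_retries times, then give up permanently."""
    for _ in range(max_retries):
        try:
            connect()
            return True
        except BackendUnavailable:
            continue
    return False  # analogous to "maximum retries exceeded"


def reconnect_periodic(
    connect: Callable[[], None],
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry every `interval` seconds until connect() succeeds.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connect()
            return attempts
        except BackendUnavailable:
            sleep(interval)


def make_slow_backend(ready_after: int) -> Callable[[], None]:
    """Build a connect() stub that fails for its first `ready_after` calls."""
    calls = {"n": 0}

    def connect() -> None:
        calls["n"] += 1
        if calls["n"] <= ready_after:
            raise BackendUnavailable()

    return connect


if __name__ == "__main__":
    # A backend that needs more than 3 attempts defeats the bounded strategy.
    print(reconnect_bounded(make_slow_backend(5), max_retries=3))
    # The periodic strategy eventually connects (sleep is stubbed out here).
    print(reconnect_periodic(make_slow_backend(5), sleep=lambda s: None))
```

In this sketch the bounded variant fails for good whenever the backend takes longer to come up than the retry budget allows, while the periodic variant recovers as soon as the backend is ready, at the cost of retrying indefinitely.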
Hi,
this sounds like https://github.com/SSSD/sssd/issues/6803. This should be fixed for RHEL-8.8 with package version sssd-2.8.2-3.el8_8 from erratum https://access.redhat.com/errata/RHBA-2023:4525. Are you using this version or an older one?
bye, Sumit
Kind regards, Grigory Trenin