cc'ing the sssd users list.
rob
lejeczek via FreeIPA-users wrote:
Hi guys.
One of the masters has recently started finding SSSD dead, and the logs say the killer is the WATCHDOG, but I'm not sure about that. From sssd.log:
...
********************** BACKTRACE DUMP ENDS HERE
(2022-07-21 7:11:01): [sssd] [svc_child_info] (0x0020): Child [991] ('pac':'pac') was terminated by own WATCHDOG
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0020): Child [984] ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was terminated by own WATCHDOG
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040): Child [9744] ('nss':'nss') exited with code [3]
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   * (2022-07-21 7:11:14): [sssd] [sbus_dispatch_reconnect] (0x0400): Connection lost. Terminating active requests.
   * (2022-07-21 7:11:14): [sssd] [sbus_dispatch_reconnect] (0x4000): Remote client terminated the connection. Releasing data...
   * (2022-07-21 7:11:14): [sssd] [sbus_connection_free] (0x4000): Connection 0x5576314d9180 will be freed during next loop!
   * (2022-07-21 7:11:14): [sssd] [mt_svc_restart] (0x0400): Scheduling service abba.xx.priv.yy for restart 1
   * (2022-07-21 7:11:14): [sssd] [get_provider_config] (0x0100): Formed command '/usr/libexec/sssd/sssd_be --domain abba.xx.priv.yy --uid 0 --gid 0 --logger=files' for provider '%BE_abba.xx.priv.yy'
   * (2022-07-21 7:11:14): [sssd] [start_service] (0x0100): Queueing service abba.xx.priv.yy for startup
   * (2022-07-21 7:11:14): [sssd] [mt_svc_exit_handler] (0x1000): SIGCHLD handler of service nss called
   * (2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040): Child [9744] ('nss':'nss') exited with code [3]
********************** BACKTRACE DUMP ENDS HERE
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040): Child [9758] ('pac':'pac') exited with code [3]
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:16): [sssd] [svc_child_info] (0x0040): Child [9876] ('nss':'nss') exited with code [3]
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:16): [sssd] [svc_child_info] (0x0040): Child [9877] ('pac':'pac') exited with code [3]
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:20): [sssd] [svc_child_info] (0x0040): Child [9903] ('nss':'nss') exited with code [3]
   * ... skipping repetitive backtrace ...
(2022-07-21 7:11:20): [sssd] [monitor_restart_service] (0x0010): Process [nss], definitely stopped!
(2022-07-21 7:11:20): [sssd] [monitor_quit] (0x3f7c0): Returned with: 1
(2022-07-21 7:11:20): [sssd] [monitor_quit] (0x3f7c0): Terminating [pac][9904]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [pac] terminated with a signal
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [abba.xx.priv.yy][9875]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [abba.xx.priv.yy] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [sudo][990]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [sudo] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [ssh][989]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [ssh] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [ifp][988]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [ifp] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [pam][987]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [pam] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating [implicit_files][983]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [implicit_files] exited gracefully
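When an excerpt like this has to be trimmed before posting, a per-service tally of child terminations can be a more compact summary. Below is a minimal Python sketch, assuming the one-entry-per-line sssd.log format shown above; the names `child_events` and `summarize` are hypothetical, not part of SSSD. It counts which services were killed by their own WATCHDOG and which exited with a code, and skips the '*'-prefixed copies inside BACKTRACE DUMP sections so nothing is counted twice.

```python
import re
import sys
from collections import Counter

# Top-level svc_child_info entries in the format of the excerpt above, e.g.
# (2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040): Child [9744] ('nss':'nss') exited with code [3]
CHILD_RE = re.compile(
    r"\((?P<ts>\d{4}-\d{2}-\d{2}\s+\d{1,2}:\d{2}:\d{2})[^)]*\):\s*\[sssd\]\s*"
    r"\[svc_child_info\].*?Child \[(?P<pid>\d+)\] \('(?P<name>[^']+)':'[^']*'\) "
    r"(?P<how>was terminated by own WATCHDOG|exited with code \[(?P<code>\d+)\])"
)


def child_events(lines):
    """Yield (timestamp, pid, service, outcome) for each child termination.

    Lines are stripped and matched from their start, so the '*'-prefixed
    copies inside BACKTRACE DUMP sections do not match and are skipped.
    """
    for line in lines:
        m = CHILD_RE.match(line.strip())
        if m is None:
            continue
        outcome = "WATCHDOG" if m.group("code") is None else f"exit {m.group('code')}"
        yield m.group("ts"), int(m.group("pid")), m.group("name"), outcome


def summarize(lines):
    """Count terminations per (service, outcome) pair."""
    return Counter((name, outcome) for _ts, _pid, name, outcome in child_events(lines))


if __name__ == "__main__":
    # Pass one or more log files, e.g. /var/log/sssd/sssd.log; no arguments does nothing.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for (name, outcome), count in sorted(summarize(fh).items()):
                print(f"{name:30} {outcome:10} {count}")
```

On a log like the one above, this would separate the two WATCHDOG kills (pac and the abba.xx.priv.yy backend) from the later nss and pac exits with code [3], which look like fallout from the restarts rather than independent failures.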
This "death" happens randomly, at least as far as I can tell. It can happen right after a reboot or after several hours of uptime. There is more in the log files under /var/log/sssd, but before cluttering emails with more log snippets I was hoping an expert could share some thoughts.
many thanks, L. _______________________________________________ FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahoste...
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Hi,
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0020): Child [984] ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was terminated by own WATCHDOG

This means the corresponding process (`sssd_be --domain abba.xx.priv.yy` in this case) was blocked on something for too long: longer than 3*timeout (see `man sssd.conf`).
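To make the 3*timeout rule concrete, here is a conceptual Python model. It is not SSSD's actual implementation, which is written in C. In this model the process counts as alive whenever its event loop gets to run, and the watchdog fires once a single stall lasts longer than three timeouts. The 10-second default for `timeout` is an assumption here, so check `man sssd.conf` for your version; `watchdog_fire_time` is a hypothetical helper.

```python
from typing import Iterable, Optional


def watchdog_fire_time(heartbeats: Iterable[float], timeout: float = 10.0,
                       max_missed: int = 3,
                       end: Optional[float] = None) -> Optional[float]:
    """Return the time at which the watchdog would kill the process, or None.

    heartbeats: sorted times (seconds) at which the process's event loop got
    to run, i.e. moments when it was NOT blocked.
    end: optional end of the observation window; a stall after the last
    heartbeat counts as well.
    Conceptual model only: one stall longer than max_missed * timeout is fatal.
    """
    limit = max_missed * timeout
    times = list(heartbeats)
    if end is not None:
        times.append(end)
    for prev, cur in zip(times, times[1:]):
        if cur - prev > limit:  # strictly longer than the limit
            return prev + limit  # the moment the limit was crossed
    return None
```

For example, a backend whose event loop runs at t = 0, 10 and 20 s and then sits in one blocking call until t = 65 s would be killed at t = 50 s: `watchdog_fire_time([0, 10, 20, 65])` returns `50.0`. A stall of exactly 30 s is tolerated, matching the "longer than 3*timeout" wording.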
You need to figure out what that operation is. To do so, set `debug_level = 9` in the [domain/$domain] section of sssd.conf, restart SSSD, and let it happen again. Then take the timestamp of the '... was terminated by own WATCHDOG' message in sssd.log and find the last operation logged before that timestamp in sssd_$domain.log.
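The lookup described in the previous paragraph can be partly automated. The Python sketch below assumes the default log locations (/var/log/sssd/sssd.log and /var/log/sssd/sssd_$domain.log) and the timestamp format shown in this thread; the function names and the DOMAIN [LOGDIR] command line are hypothetical, not part of SSSD. For every WATCHDOG kill of one domain's backend in sssd.log, it prints the last few timestamped lines the backend logged at or before that second, which is where the blocking operation should show up once `debug_level = 9` is set.

```python
import re
import sys
from collections import deque
from datetime import datetime
from typing import Iterable, List, Optional

# Leading SSSD timestamp such as "(2022-07-21 7:11:14)"; any microseconds
# suffix is ignored so both logs are compared at one-second resolution.
TS_RE = re.compile(r"\((\d{4})-(\d{2})-(\d{2})\s+(\d{1,2}):(\d{2}):(\d{2})[^)]*\)")


def parse_ts(line: str) -> Optional[datetime]:
    """Timestamp at the start of a line; None for untimestamped lines and
    for '*'-prefixed backtrace copies."""
    m = TS_RE.match(line.lstrip())
    return datetime(*map(int, m.groups())) if m else None


def watchdog_times(monitor_lines: Iterable[str], domain: str) -> List[datetime]:
    """When sssd.log reports the backend of `domain` killed by its own watchdog."""
    tag = f"'%BE_{domain}')"
    kills = []
    for line in monitor_lines:
        if "terminated by own WATCHDOG" in line and tag in line:
            ts = parse_ts(line)
            if ts is not None:
                kills.append(ts)
    return kills


def last_lines_before(domain_lines: Iterable[str], when: datetime,
                      context: int = 5) -> List[str]:
    """The last `context` timestamped lines of sssd_$domain.log at or before `when`.

    Assumes the log is chronological, so reading stops at the first later line.
    """
    tail = deque(maxlen=context)
    for line in domain_lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if ts > when:
            break
        tail.append(line.rstrip("\n"))
    return list(tail)


if __name__ == "__main__":
    # Hypothetical CLI for this sketch: DOMAIN [LOGDIR]; no arguments does nothing.
    if len(sys.argv) > 1:
        domain = sys.argv[1]
        logdir = sys.argv[2] if len(sys.argv) > 2 else "/var/log/sssd"
        with open(f"{logdir}/sssd.log", errors="replace") as fh:
            kills = watchdog_times(fh, domain)
        with open(f"{logdir}/sssd_{domain}.log", errors="replace") as fh:
            domain_lines = fh.readlines()
        for when in kills:
            print(f"== {domain} backend killed by watchdog at {when} ==")
            print("\n".join(last_lines_before(domain_lines, when)))
```

Comparing at one-second resolution is a deliberate simplification: sssd.log may not carry microseconds even when the domain log does, so the last line or two of the window may need a manual look.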
On Thu, Jul 21, 2022 at 2:27 PM Rob Crittenden rcritten@redhat.com wrote:
sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives:
https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o...
Do not reply to spam on the list, report it:
sssd-users@lists.fedorahosted.org