On Mon, Aug 20, 2012 at 08:33:47AM +0200, Sigbjorn Lie wrote:
When I arrived at the office this morning, our Nagios server was displaying a lot of [...]
The "sssd_pam" process was consuming 100% CPU, and I was unable to log on to
the box as anyone other than root.
2310 root 20 0 219m 44m 2176 R 99.6 0.3 2883:27 sssd_pam
In the /var/log/sssd/sssd_pam.log file, the following error message was repeated:
[sssd[pam]] [accept_fd_handler] (0x0020): Accept failed [Too many open files]
As this is our Nagios server, the maximum number of concurrently open files has
been raised from the default of 1024 to 4096 for all users.
This is RHEL 6.3 with sssd-1.8.0-32.
What can I do to prevent this from happening in the future?
In SSSD 1.8, the file descriptor limit was raised to either 8k or the
hard limit from limits.conf, whichever is lower.
There is also a new option, fd_limit, that can be used to set the limit
explicitly and, if SSSD has the CAP_SYS_RESOURCE capability, even override
the hard limit from limits.conf.
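To make the rule concrete, here is a minimal Python 3 sketch of the limit selection described above. It is not SSSD's code: the function names are hypothetical, and it assumes the default cap is 8192 ("8k") and that only an explicit fd_limit may exceed the hard limit, and only with CAP_SYS_RESOURCE. It uses the standard `resource` module to apply the result to the current process.

```python
import resource

# Default responder limit mentioned for SSSD 1.8 ("8k"); an assumption
# for this sketch, not a value read from SSSD itself.
DEFAULT_FD_LIMIT = 8192


def effective_fd_limit(hard_limit, fd_limit=None, has_cap_sys_resource=False):
    """Return the descriptor limit the rule above would pick.

    hard_limit: the RLIMIT_NOFILE hard limit (e.g. from limits.conf).
    fd_limit: an explicit fd_limit setting, or None for the default.
    """
    wanted = DEFAULT_FD_LIMIT if fd_limit is None else fd_limit
    if hard_limit == resource.RLIM_INFINITY or wanted <= hard_limit:
        return wanted
    # Only an explicit fd_limit may exceed the hard limit, and only
    # when the process holds CAP_SYS_RESOURCE.
    if fd_limit is not None and has_cap_sys_resource:
        return wanted
    return hard_limit


def apply_fd_limit(fd_limit=None):
    """Set this process's RLIMIT_NOFILE per the rule; return the soft limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = effective_fd_limit(hard, fd_limit)
    if fd_limit is not None and target < fd_limit:
        try:
            # Raising the hard limit fails unless we have CAP_SYS_RESOURCE.
            resource.setrlimit(resource.RLIMIT_NOFILE, (fd_limit, fd_limit))
            return fd_limit
        except (ValueError, OSError):
            pass  # no capability: stay within the existing hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

For example, with a hard limit of 4096 the default rule gives effective_fd_limit(4096) == 4096 rather than 8192, so raising limits.conf alone caps what the responder can use unless fd_limit and the capability come into play.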
I'd like to ask for some more information to tell whether the server was
simply busy or whether we were really leaking file descriptors:
Do you know how many files were open at the time? Were there many
concurrent logins happening to that server? Did you have a chance to run
lsof to check what file descriptors were open?
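If lsof is not at hand, roughly the same information can be read from /proc. The following is a minimal Python 3 sketch, assuming Linux and enough privilege (typically root) to read another process's fd directory; the helper names are hypothetical. It counts a process's open descriptors by kind and reads its "Max open files" limit, which would show how close sssd_pam is to its limit. A large and steadily growing socket count would suggest accepted client connections are not being closed, while a stable count near the limit would point more toward load.

```python
import os
import sys
from collections import Counter


def nofile_limits(pid):
    """Return (soft, hard) 'Max open files' for pid; None means unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft, hard = line.split()[3:5]
                return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


def fd_summary(pid):
    """Count pid's open descriptors by kind: file, socket, pipe, anon_inode, ..."""
    kinds = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except FileNotFoundError:
            continue  # descriptor was closed while we were scanning
        # Paths are files/devices; other targets look like "socket:[12345]".
        kinds["file" if target.startswith("/") else target.split(":", 1)[0]] += 1
    return kinds


if __name__ == "__main__":
    # Inspect the PID given on the command line, or this script itself.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    counts = fd_summary(pid)
    soft, hard = nofile_limits(pid)
    print(f"pid {pid}: {sum(counts.values())} open fds, "
          f"limit soft={soft} hard={hard}")
    for kind, n in counts.most_common():
        print(f"  {kind}: {n}")
```

Run as root, e.g. `python3 fd_summary.py $(pidof sssd_pam)` (the script name is arbitrary), ideally a few times a minute apart to see whether the count keeps climbing.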
I recall there was an SELinux-related problem in 6.3, but I can't
find the BZ number right now.