On Tue, Jun 13, 2017 at 06:21:28PM +0000, Joakim Tjernlund wrote:
On Tue, 2017-06-13 at 18:01 +0200, Jakub Hrozek wrote:
> On Tue, Jun 13, 2017 at 12:12:05PM +0000, Joakim Tjernlund wrote:
> > > It is now :) was in the wrong section before
> >
> > timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> > What did I really do here?
>
> There is a ticket to document this better already but tl;dr there is a
> watchdog that kills the process, because the process is presumed stuck,
> unless an internal event that resets the watchdog is received within
> three ticks of the 'timeout' value.
>
> What happens when sssd writes so many entries to the cache is that the
> write operations block the event loop, which prevents the delivery of
> the watchdog reset, which in turn results in the process being killed.
hmm, on a tmpfs 3*10 secs should be more than enough for that, I think.
Also, the process (the domain process) was never dead but eating CPU instead.
Well, I was not precise earlier: it doesn't have to be writes. For
example, the loop you showed checks whether all members of a group are
already cached by searching for each member in turn. That is not a
write, but it can also block the process.
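
To make the watchdog arithmetic concrete, here is a rough sketch in
Python. It is a simplified model of the behaviour described above, not
sssd's actual code: the function name, the list of callback durations and
the rule that the reset is delivered as soon as a callback returns are
all assumptions made for illustration. The only figures taken from this
thread are the three ticks and the 'timeout' values of 10 and 30.

```python
def watchdog_fires_at(timeout, callback_durations, max_ticks=3):
    """Simplified model: return the simulated second at which the
    watchdog would kill the process, or None if it never fires.

    timeout            -- the 'timeout' value in seconds
    callback_durations -- how long each event-loop callback blocks,
                          in seconds (hypothetical input)
    max_ticks          -- ticks without a reset before the kill
    """
    now = 0          # simulated clock, in seconds
    last_reset = 0   # last moment the event loop reset the watchdog

    for duration in callback_durations:
        # While a callback runs, the event loop cannot deliver the
        # reset event, so the deadline stays fixed.
        deadline = last_reset + max_ticks * timeout
        if now + duration > deadline:
            return deadline
        # The callback returned in time; the loop is free again and
        # (in this model) the reset is delivered immediately.
        now += duration
        last_reset = now

    return None


# Three callbacks; the last one blocks for 31 seconds, e.g. a long
# loop that searches the cache for every member of a large group.
durations = [5, 12, 31]
print(watchdog_fires_at(10, durations))
print(watchdog_fires_at(30, durations))
```

In this model it makes no difference whether the blocking callback
writes or only searches; what matters is how long it keeps the event
loop from running. Raising 'timeout' from 10 to 30 widens the window
from 30 to 90 seconds, so the same 31 second callback gets through.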