On (23/08/17 09:20), Lachlan Musicman wrote:
On 22 August 2017 at 20:40, Lukas Slebodnik <lslebodn(a)redhat.com> wrote:
> >It is a bug in processing group hierarchy in sssd.
> >
> >It would be good if you could provide a minimal reproducer,
> >because I expect you cannot dump the whole directory server for us :-) :-) :-)
> >
> Another possible solution would be to enable debugging for ldb functions,
> so that we see the actions as well and not only the result.
>
> curl -o /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople.org/sss_ldb_debug/sss_ldb_debug.so
> echo "LD_PRELOAD=/usr/local/lib64/sss_ldb_debug.so" >> /etc/sysconfig/sssd
>
> * clear sssd cache and old sssd log files: rm -f /var/lib/sssd/db/* /var/log/sssd/*
> * increase debug_level in domain section
> * restart sssd
> * reproduce problem
>
> And provide sanitized log files. Feel free to send them privately
> if you do not want to send them to the mailing list.
>
The problem with this - in the short term - is that we know clearing the
sssd cache will fix the problem, so reproduction will need to wait until
it happens to the next person we add.
I see. I thought you could reproduce it more easily,
but it seems to depend on the order in which data is requested from sssd.
When I look at this problem, the biggest issue for us is the combination
of three things: although all servers are set up identically (via Ansible
and Katello), this is the only server that it happens on; it doesn't happen
every time we add a user; and while we know clearing the cache works, and
we could automate that as part of the add new user process, clearing the
cache also kills working sessions on a busy server, which makes it
impractical and inconvenient for the other users.
But I will add this debug.so for you and do these things, and the next
time it happens I'll slice up the log files for you?
Maybe it would be enough for troubleshooting to just set LD_PRELOAD
and restart sssd.
So if you have another machine where you can reproduce the problem and
the sssd cache has not been purged, you can try it there.
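
A minimal sketch of that lighter variant, assuming sss_ldb_debug.so is
already in place at the path from the earlier mail. The function name
enable_ldb_preload and the ENV_FILE / SO_PATH override variables are made
up for illustration; the only thing it adds over the plain echo is a check
so that the LD_PRELOAD line is not appended twice:

```shell
#!/bin/sh
# Sketch only: enable the ldb debug preload for sssd, leaving the cache alone.
# ENV_FILE and SO_PATH default to the paths quoted earlier in the thread;
# both can be overridden, e.g. to try this on a scratch file first.
ENV_FILE="${ENV_FILE:-/etc/sysconfig/sssd}"
SO_PATH="${SO_PATH:-/usr/local/lib64/sss_ldb_debug.so}"

# Hypothetical helper: append the LD_PRELOAD line only if it is not
# already present, so running it twice does not leave duplicate entries.
enable_ldb_preload() {
    line="LD_PRELOAD=$SO_PATH"
    if [ -f "$ENV_FILE" ] && grep -qxF "$line" "$ENV_FILE"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$ENV_FILE"
}
```

As root this would be used as something like
"enable_ldb_preload && systemctl restart sssd" (the systemctl call assumes
a systemd host), and then you wait for the problem to show up again.
Nothing under /var/lib/sssd/db is touched.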
LS