I think this means the frontend (responder) either checks too soon or
the back end wrote incomplete data.
The responder is the sssd_nss process. When the getgrouplist()
arrives, the cache validity is checked. If the cache is empty or too
old, the sssd_nss process queries the sssd_be process to update the
cache. When the sssd_be process is done, it sends a dbus signal (over a
private unix socket, not the system bus) that the cache is up-to-date.
Thank you for clearing that up. It corresponds quite well to the model we had
already built ourselves from reading the various design documents on the wiki
and your blog post "Anatomy of SSSD user lookup".
We are not developers per se (and certainly not in C), and we are having a hard
time matching this model to the asynchronous nature of the code using tevent,
SBus and ldb. Trial and error is our best tool at the moment =)
I wonder if adding another sysdb_initgroups call into
sdap_get_initgr_recv() would verify when/if the groups were written?
Do you mean literally adding a call to sysdb_initgroups within this function?
Is that possible with the res parameter of the sdap_get_initgr_recv function?
int sdap_get_initgr_recv(struct tevent_req *req)
I tried to write a simple program that just calls getgrouplist() in
concurrent threads to simulate your behaviour, but couldn't reproduce
We tried that with processes too, but were not able to reproduce the problem either.
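For reference, a process-based test of this kind could look like the sketch
below. It is an illustration under our own assumptions, not the exact program
we ran: the helper names count_groups() and lookups_agree() are hypothetical,
and the initial buffer size of 32 is arbitrary. Each child calls
getgrouplist() and reports its group count through a pipe, and the parent
checks that every child saw the same count.

```c
#define _DEFAULT_SOURCE          /* getgrouplist() is a BSD/glibc extension */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the number of groups getgrouplist() reports for `user`, or -1. */
static int count_groups(const char *user)
{
    assert(user != NULL);

    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;                       /* unknown user */

    int ngroups = 32;                    /* arbitrary first guess */
    gid_t *groups = malloc((size_t)ngroups * sizeof(*groups));
    if (groups == NULL)
        return -1;

    if (getgrouplist(user, pw->pw_gid, groups, &ngroups) == -1) {
        /* Buffer too small: ngroups now holds the required size. */
        gid_t *bigger = realloc(groups, (size_t)ngroups * sizeof(*groups));
        if (bigger == NULL) {
            free(groups);
            return -1;
        }
        groups = bigger;
        if (getgrouplist(user, pw->pw_gid, groups, &ngroups) == -1) {
            free(groups);
            return -1;
        }
    }
    free(groups);
    return ngroups;
}

/* Fork `nproc` children that each call count_groups(). Return 1 if every
 * child reported the same non-negative count, 0 otherwise. */
static int lookups_agree(const char *user, int nproc)
{
    int fds[2];
    if (pipe(fds) == -1)
        return 0;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* fewer results than nproc: fails below */
        if (pid == 0) {
            close(fds[0]);
            int n = count_groups(user);
            /* An int is far smaller than PIPE_BUF, so the write is atomic. */
            _exit(write(fds[1], &n, sizeof(n)) == sizeof(n) ? 0 : 1);
        }
    }
    close(fds[1]);

    int first = -1, seen = 0, agree = 1, n;
    while (read(fds[0], &n, sizeof(n)) == sizeof(n)) {
        if (seen == 0)
            first = n;
        else if (n != first)
            agree = 0;
        seen++;
    }
    close(fds[0]);
    while (wait(NULL) > 0)
        ;                                /* reap all children */

    return agree && seen == nproc && first >= 0;
}
```

Calling lookups_agree() with an affected user name right after the SSSD cache
has expired would be the interesting case; a return value of 0 would mean that
at least one process saw a different group list.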
Our next step will be to mimic the flow of slurmd on job execution to build a
test case: the slurmd process forks (or possibly double-forks?) a slurmstepd
process that will call the slurm _initgroups function within the
_drop_privileges function.
That's our best bet at the moment.
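A first skeleton for such a test case might look like the sketch below. It
rests on several assumptions of ours: that a double fork is really what
happens, that getgrouplist() is an acceptable stand-in for initgroups() (in
glibc both go through the same NSS initgroups lookup, and getgrouplist() does
not need root), and the helper names lookup_group_count() and
groups_seen_by_grandchild() are hypothetical. It does not reproduce any actual
slurm code.

```c
#define _DEFAULT_SOURCE          /* getgrouplist() is a BSD/glibc extension */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the number of groups getgrouplist() reports for `user`, or -1. */
static int lookup_group_count(const char *user)
{
    assert(user != NULL);

    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;

    int ngroups = 32;                    /* arbitrary first guess */
    gid_t *groups = malloc((size_t)ngroups * sizeof(*groups));
    if (groups == NULL)
        return -1;

    int rc = getgrouplist(user, pw->pw_gid, groups, &ngroups);
    if (rc == -1) {
        /* Buffer too small: retry once with the size reported in ngroups. */
        gid_t *bigger = realloc(groups, (size_t)ngroups * sizeof(*groups));
        if (bigger != NULL) {
            groups = bigger;
            rc = getgrouplist(user, pw->pw_gid, groups, &ngroups);
        }
    }
    free(groups);
    return rc == -1 ? -1 : ngroups;
}

/* Double-fork, then look up the groups in the grandchild, roughly the way a
 * daemon would spawn a detached helper. Returns the group count seen by the
 * grandchild, or -1 on any failure. */
static int groups_seen_by_grandchild(const char *user)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        pid_t grandchild = fork();
        if (grandchild == 0) {
            int n = lookup_group_count(user);
            _exit(write(fds[1], &n, sizeof(n)) == sizeof(n) ? 0 : 1);
        }
        /* The intermediate process exits at once; the grandchild is reparented. */
        _exit(grandchild < 0 ? 1 : 0);
    }

    close(fds[1]);
    int n = -1;
    /* read() returns 0 if every write end closed without a result. */
    if (read(fds[0], &n, sizeof(n)) != sizeof(n))
        n = -1;
    close(fds[0]);
    waitpid(child, NULL, 0);             /* reap the intermediate process */
    return n;
}
```

Running groups_seen_by_grandchild() in a loop while the cache entry expires,
and comparing the result with the expected group count, would be one way to
check whether this process layout matters.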