On (18/04/15 03:27), Jean-Baptiste Denis wrote:
On 04/16/2015 12:31 PM, Jean-Baptiste Denis wrote:
>> No, it shouldn't be. The whole backend request should run and only then the
>> backend should signal to frontend to re-check the cache. That's why I was
>> suspecting the cleanup task, it's asynchronous.
I think I've got a test case that doesn't involve slurm. It is quite reproducible
on my machine. Since it looks like a race, you may need to tweak the parameters
of the python script.
The basic idea is to run a bunch of processes, each of which waits a short amount
of time before calling the initgroups libc function for a specific user.
You have to log in as root and not use sudo, to prevent the sssd cache from being
populated before the test is started. You also *need* to clean up the sssd state
before running the test.
## log in as root
## check the number of secondary groups for a user, using id for example
# id jbdenis
Here I've got 5 secondary groups (sis is my primary group)
## !! VERY IMPORTANT !! cleanup sssd state
# /etc/init.d/sssd stop && rm -f /var/lib/sss/mc/* /var/lib/sss/db/* &&
## run this program
# python initgroups.py jbdenis 110 5 24 200
wrong number of secondary groups in process 17145 : 0 instead of 5 (sleep 55ms)
wrong number of secondary groups in process 17149 : 0 instead of 5 (sleep 55ms)
# first parameter is a login
# second parameter is your primary gid (could be anything)
# third parameter is your number of secondary groups
# fourth parameter is the number of processes you want to run concurrently
# the last parameter is the maximum delay in milliseconds before calling
# initgroups (the delay is randomized up to this maximum)
I've got good results with 24 processes and a randomized delay of up to 200ms
after startup. Those parameters depend somewhat on the machine you're running
the script on, I guess. You may have to run this test multiple times before
triggering the bug.
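
For anyone reading this without the attachment, here is a minimal sketch of what
a script matching the description above could look like. It is not the original
initgroups.py: the function names (secondary_groups, format_failure, worker,
main) are made up for illustration, and it assumes Linux, Python 3 and a root
shell, since os.initgroups() needs privileges. Each child sleeps for a delay
drawn by the parent, calls initgroups, and compares the number of secondary
groups it ends up with against the expected count.

```python
import os
import random
import sys
import time


def secondary_groups(gids, primary_gid):
    """Return the distinct group ids other than the primary one."""
    return sorted(set(gids) - {primary_gid})


def format_failure(pid, found, expected, delay_ms):
    """Build the message printed when a process sees the wrong group count."""
    return ("wrong number of secondary groups in process %d : "
            "%d instead of %d (sleep %dms)" % (pid, found, expected, delay_ms))


def worker(login, primary_gid, expected, delay_ms):
    """Run in a child: sleep, call initgroups(), check the resulting groups."""
    time.sleep(delay_ms / 1000.0)
    os.initgroups(login, primary_gid)  # goes through NSS, hence sssd; needs root
    found = len(secondary_groups(os.getgroups(), primary_gid))
    if found != expected:
        print(format_failure(os.getpid(), found, expected, delay_ms))
        sys.stdout.flush()  # os._exit() below does not flush buffers
        return 1
    return 0


def main(argv):
    login = argv[1]
    primary_gid, expected, nprocs, max_delay_ms = (int(a) for a in argv[2:6])
    pids = []
    for _ in range(nprocs):
        # Draw the delay in the parent so each child gets its own value.
        delay_ms = random.randint(0, max_delay_ms)
        pid = os.fork()
        if pid == 0:
            os._exit(worker(login, primary_gid, expected, delay_ms))
        pids.append(pid)
    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
            failures += 1
    return 1 if failures else 0


# Only run when called with the five parameters described above.
if __name__ == "__main__" and len(sys.argv) == 6:
    sys.exit(main(sys.argv))
```

It would be called the same way as above, e.g. "python initgroups.py jbdenis 110
5 24 200", after cleaning up the sssd state. The delay is picked before the fork
so that the children do not depend on how the random generator behaves across
fork().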
I'm unable to reproduce the bug when I use 0 delay, and I think that's why we could
reproduce it with our initial test case.
I really hope that you can reproduce the bug on your side.
Thank you for your help,
I tried to reproduce the bug with your script, but I was not successful.
Domain section from sssd.conf
id_provider = ldap
auth_provider = ldap
debug_level = 0xFFF0
ldap_uri = ldap://172.17.0.1
ldap_search_base = dc=example,dc=com
ldap_schema = rfc2307bis
ldap_group_object_class = groupOfNames
timeout = 600
ldap_pwd_policy = shadow
I tried different values for the number of processes and the maximum delay in milliseconds.
My laptop has 4 cores and "Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz"
There has to be something different in my configuration.
Could you provide more information on how to reproduce it?