Hi all,
I came back to ticket #3222 "sssd still showing ipa user after removed from last group" [1]. And I have new knowledge. But I still do not see the light at the end of the tunnel.
[1] https://fedorahosted.org/sssd/ticket/3222
I attached patch which enables some basic debug on using of memcache. And two reproducers (with and without memcache) which are based on reproducer written in ticket.
If we use memcache, the issue occurs only sometimes.
The difference between both cases is mixed state of switch after sss_nss_mc_getgrnam() call in _nss_sss_getgrnam_r() function.
Note: code says (for default case): /* if using the mmaped cache failed, * fall back to socket based comms */
Could anyone help, please?
The report is:
#--- WRONG
[root@mirach sssd]# date && getent group testgroup Wed Nov 9 16:01:05 CET 2016
[A] record not found (time[1478703665]) [B] record not found (time[1478703665])
testgroup:*:1703800674:
Number of members added 1
[root@mirach sssd]# sss_cache -UG && date && getent group testgroup Wed Nov 9 16:01:07 CET 2016
[A] record not found (time[1478703667]) [B] default (time[1478703667])
testgroup:*:1703800674:testuser
Number of members removed 1
[root@mirach sssd]# sss_cache -UG && date && getent group testgroup Wed Nov 9 16:01:09 CET 2016
mc record expires at [1478703967] | now [1478703669] [A] MC used (time[1478703669])
testgroup:*:1703800674:testuser
[root@mirach sssd]# grep '>>>' *.log sssd_nss.log:(Wed Nov 9 16:01:06 2016) [sssd[nss]] [sss_mmap_set_rec_header] (0x0010): >>> MC STORE expiration [1478703966] | now [1478703666] | delta [300] sssd_nss.log:(Wed Nov 9 16:01:06 2016) [sssd[nss]] [sss_mmap_cache_gr_store] (0x0010): >>> MC STORE [testgroup] [300] members [0] sssd_nss.log:(Wed Nov 9 16:01:07 2016) [sssd[nss]] [sss_mmap_set_rec_header] (0x0010): >>> MC STORE expiration [1478703967] | now [1478703667] | delta [300] sssd_nss.log:(Wed Nov 9 16:01:07 2016) [sssd[nss]] [sss_mmap_cache_gr_store] (0x0010): >>> MC STORE [testgroup] [300] members [1]
#--- RIGHT
[root@mirach sssd]# date && getent group testgroup Wed Nov 9 15:56:54 CET 2016
[A] record not found (time[1478703414]) [B] record not found (time[1478703414])
testgroup:*:1703800674:
Number of members added 1
[root@mirach sssd]# sss_cache -UG && date && getent group testgroup Wed Nov 9 15:56:56 CET 2016
[A] default (time[1478703416]) [B] default (time[1478703416])
testgroup:*:1703800674:testuser
Number of members removed 1
[root@mirach sssd]# sss_cache -UG && date && getent group testgroup Wed Nov 9 15:56:58 CET 2016
[A] record not found (time[1478703418]) [B] record not found (time[1478703418])
testgroup:*:1703800674:
[root@mirach sssd]# grep '>>>' *.log sssd_nss.log:(Wed Nov 9 15:56:54 2016) [sssd[nss]] [sss_mmap_set_rec_header] (0x0010): >>> MC STORE expiration [1478703714] | now [1478703414] | delta [300] sssd_nss.log:(Wed Nov 9 15:56:54 2016) [sssd[nss]] [sss_mmap_cache_gr_store] (0x0010): >>> MC STORE [testgroup] [300] members [0] sssd_nss.log:(Wed Nov 9 15:56:56 2016) [sssd[nss]] [sss_mmap_set_rec_header] (0x0010): >>> MC STORE expiration [1478703716] | now [1478703416] | delta [300] sssd_nss.log:(Wed Nov 9 15:56:56 2016) [sssd[nss]] [sss_mmap_cache_gr_store] (0x0010): >>> MC STORE [testgroup] [300] members [1] sssd_nss.log:(Wed Nov 9 15:56:58 2016) [sssd[nss]] [sss_mmap_set_rec_header] (0x0010): >>> MC STORE expiration [1478703718] | now [1478703418] | delta [300] sssd_nss.log:(Wed Nov 9 15:56:58 2016) [sssd[nss]] [sss_mmap_cache_gr_store] (0x0010): >>> MC STORE [testgroup] [300] members [0]
Regards
On Wed, Nov 09, 2016 at 04:44:12PM +0100, Petr Cech wrote:
Hi all,
I came back to ticket #3222 "sssd still showing ipa user after removed from last group" [1]. And I have new knowledge. But I still do not see the light at the end of the tunnel.
[1] https://fedorahosted.org/sssd/ticket/3222
I attached patch which enables some basic debug on using of memcache. And two reproducers (with and without memcache) which are based on reproducer written in ticket.
If we use memcache, the issue occurs only sometimes.
The difference between both cases is mixed state of switch after sss_nss_mc_getgrnam() call in _nss_sss_getgrnam_r() function.
Note: code says (for default case): /* if using the mmaped cache failed,
- fall back to socket based comms */
Could anyone help, please?
Hi Petr,
this really seems like an issue with memory cache. I'm not an expert there, unfortunately, so I wonder what Michal and Lukas think..
On (15/11/16 09:39), Jakub Hrozek wrote:
On Wed, Nov 09, 2016 at 04:44:12PM +0100, Petr Cech wrote:
Hi all,
I came back to ticket #3222 "sssd still showing ipa user after removed from last group" [1]. And I have new knowledge. But I still do not see the light at the end of the tunnel.
[1] https://fedorahosted.org/sssd/ticket/3222
I attached patch which enables some basic debug on using of memcache. And two reproducers (with and without memcache) which are based on reproducer written in ticket.
If we use memcache, the issue occurs only sometimes.
The difference between both cases is mixed state of switch after sss_nss_mc_getgrnam() call in _nss_sss_getgrnam_r() function.
Note: code says (for default case): /* if using the mmaped cache failed,
- fall back to socket based comms */
Could anyone help, please?
Hi Petr,
this really seems like an issue with memory cache. I'm not an expert there, unfortunately, so I wonder what Michal and Lukas think..
I cannot see any problem with memory cache here. The same data are stored to memory cache as are returned to client. If there is a bug then the bug is before storing data to memory cache. They might be caused by some race condition caused by different timeout for various entries in memory cache.
LS
On (15/11/16 10:03), Lukas Slebodnik wrote:
On (15/11/16 09:39), Jakub Hrozek wrote:
On Wed, Nov 09, 2016 at 04:44:12PM +0100, Petr Cech wrote:
Hi all,
I came back to ticket #3222 "sssd still showing ipa user after removed from last group" [1]. And I have new knowledge. But I still do not see the light at the end of the tunnel.
[1] https://fedorahosted.org/sssd/ticket/3222
I attached patch which enables some basic debug on using of memcache. And two reproducers (with and without memcache) which are based on reproducer written in ticket.
If we use memcache, the issue occurs only sometimes.
The difference between both cases is mixed state of switch after sss_nss_mc_getgrnam() call in _nss_sss_getgrnam_r() function.
Note: code says (for default case): /* if using the mmaped cache failed,
- fall back to socket based comms */
Could anyone help, please?
Hi Petr,
this really seems like an issue with memory cache. I'm not an expert there, unfortunately, so I wonder what Michal and Lukas think..
I cannot see any problem with memory cache here. The same data are stored to memory cache as are returned to client. If there is a bug then the bug is before storing data to memory cache. They might be caused by some race condition caused by different timeout for various entries in memory cache.
BTW, It looks like I can reproduce similar intermitent failures with downstream test (user-membership-list) as well.
The only problem is that I cannot reproduce it on el6.8 which has sssd-1.13.3-22. And memory cache was changed last time in sssd-1.13.2 (https://fedorahosted.org/sssd/ticket/2726)
The bug has to be on responder side.
LS
sssd-devel@lists.fedorahosted.org