Mark,
On frame 9:
It goes until p *mod->mod_bvalues[20]
(gdb) p *mod->mod_bvalues[21]
Cannot access memory at address 0x0
On frame 7:
It goes until p *replacevals[20]
(gdb) p *replacevals[21]
Cannot access memory at address 0x0
On frame 6:
(gdb) frame 6
#6 0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued
(e=0x7fff8401f500, type=0x7fff84012780 "memberOf", vals=0x0,
csn=0x7fff967fb340, urp=8, mod_op=2, replacevals=0x7fff840127c0)
at ldap/servers/slapd/entrywsi.c:777
777 valueset_purge(a, &a->a_present_values, csn);
(gdb) print *a
$278 = {a_type = 0x7fff84022b30 "memberOf", a_present_values = {num = 21,
max = 32, sorted = 0x7fff84023ad0, va = 0x7fff84022b50}, a_flags = 4,
a_plugin = 0x6c7e80, a_deleted_values = {num = 0, max = 0,
sorted = 0x0, va = 0x0}, a_listtofree = 0x0, a_next = 0x7fff84023c00,
a_deletioncsn = 0x7fff840247c0, a_mr_eq_plugin = 0x0, a_mr_ord_plugin =
0x0, a_mr_sub_plugin = 0x0}
(gdb) print *a->a_present_values
Structure has no component named operator*.
(gdb) print *a->a_present_values.va[0]
Thanks,
Alberto Viana
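For reference, the assertion at valueset.c:471 can be checked against the values printed in this thread (num = 21, sorted non-NULL, and vs->sorted[0] unreadable). A minimal sketch in Python, assuming VALUESET_ARRAY_SORT_THRESHOLD is 10 (check the #define in ldap/servers/slapd/valueset.c for the real value):

```python
# Sketch: evaluating the PR_ASSERT condition from slapi_valueset_done
# (ldap/servers/slapd/valueset.c:471) against the values gdb printed.
# VALUESET_ARRAY_SORT_THRESHOLD = 10 is an assumption; verify it in
# valueset.c.
SORT_THRESHOLD = 10  # assumption

num = 21                      # vs->num from "print *vs"
sorted_ptr_is_null = False    # vs->sorted = 0x7fff8c023ad0 (non-NULL)
sorted0 = 0xFFFFFFFFFFFFFFFF  # vs->sorted[0] was unreadable garbage

# The assertion passes only if one of the three clauses holds.
assertion_holds = (
    sorted_ptr_is_null
    or num < SORT_THRESHOLD
    or (num >= SORT_THRESHOLD and sorted0 < num)
)
print(assertion_holds)  # False: sorted[0] >= num, so PR_ASSERT fires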
On Wed, Apr 22, 2020 at 4:57 PM Mark Reynolds <mreynolds(a)redhat.com> wrote:
Go to frame 9 and start printing the mod:
(gdb) p *mod
(gdb) print i
(gdb) p *mod->mod_bvalues[0]
(gdb) p *mod->mod_bvalues[1]
... Keep doing that until it's NULL
Then go to frame 7
(gdb) p *replacevals
(gdb) p *replacevals[0]
(gdb) p *replacevals[1]
--- Keep doing this until it's NULL
Then go to frame 6
(gdb) print *a
(gdb) print *a->a_present_values
(gdb) print *a->a_present_values.va[0]
(gdb) print *a->a_present_values.va[1]
--- Keep doing this until it's NULL
Thanks,
Mark
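The repeated prints above can also be scripted with gdb's built-in while loop; a sketch (standard gdb commands, variable names taken from the frames above):

```gdb
# In frame 9, walk mod->mod_bvalues until the NULL terminator:
(gdb) set $i = 0
(gdb) while (mod->mod_bvalues[$i] != 0)
 >print *mod->mod_bvalues[$i]
 >set $i = $i + 1
 >end
```

The same loop works for replacevals in frame 7 and a->a_present_values.va in frame 6.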
On 4/22/20 3:43 PM, Alberto Viana wrote:
Mark,
Yes, I'm in frame 3, and no, I don't know what the modification is, sorry. I
think that's what I'm trying to find out: why one of the servers always
crashes if I enable replication between the two 389 instances.
Maybe I should reconfigure my replication, enable the debug log, and see where it stops?
What else can I do?
Thanks
On Wed, Apr 22, 2020 at 4:34 PM Mark Reynolds <mreynolds(a)redhat.com>
wrote:
>
> On 4/22/20 3:27 PM, Alberto Viana wrote:
>
> Mark,
>
> Here's:
> (gdb) where
> #0 0x00007ffff455399f in raise () at /lib64/libc.so.6
> #1 0x00007ffff453dcf5 in abort () at /lib64/libc.so.6
> #2 0x00007ffff5430cd0 in PR_Assert () at /lib64/libnspr4.so
> #3 0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at
> ldap/servers/slapd/valueset.c:471
> #4 0x00007ffff7b72257 in valueset_array_purge (a=0x7fff8c022aa0,
> vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:804
> #5 0x00007ffff7b723c5 in valueset_purge (a=0x7fff8c022aa0,
> vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:834
> #6 0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued
> (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0,
> csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
> at ldap/servers/slapd/entrywsi.c:777
> #7 0x00007ffff7ada20d in entry_delete_present_values_wsi
> (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0,
> csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0)
> at ldap/servers/slapd/entrywsi.c:623
> #8 0x00007ffff7adaa7a in entry_replace_present_values_wsi
> (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x7fff8c0127c0,
> csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:869
> #9 0x00007ffff7adabf1 in entry_apply_mod_wsi (e=0x7fff8c01f500,
> mod=0x7fff8c0127a0, csn=0x7fff977fd340, urp=8) at
> ldap/servers/slapd/entrywsi.c:903
> #10 0x00007ffff7adae52 in entry_apply_mods_wsi (e=0x7fff8c01f500,
> smods=0x7fff977fd3c0, csn=0x7fff8c012160, urp=8) at
> ldap/servers/slapd/entrywsi.c:973
> #11 0x00007fffead19364 in modify_apply_check_expand
> (pb=0x7fff8c000b20, operation=0x814160, mods=0x7fff8c012750,
> e=0x7fff8c01bc90, ec=0x7fff8c01f480, postentry=0x7fff977fd4b0,
> ldap_result_code=0x7fff977fd434, ldap_result_message=0x7fff977fd4d8)
> at ldap/servers/slapd/back-ldbm/ldbm_modify.c:247
> #12 0x00007fffead1a430 in ldbm_back_modify (pb=0x7fff8c000b20) at
> ldap/servers/slapd/back-ldbm/ldbm_modify.c:665
> #13 0x00007ffff7b0cd60 in op_shared_modify (pb=0x7fff8c000b20,
> pw_change=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
> #14 0x00007ffff7b0b266 in do_modify (pb=0x7fff8c000b20) at
> ldap/servers/slapd/modify.c:380
> #15 0x000000000041592c in connection_dispatch_operation (conn=0x150e220,
> op=0x814160, pb=0x7fff8c000b20) at ldap/servers/slapd/connection.c:638
> #16 0x0000000000417a0e in connection_threadmain () at
> ldap/servers/slapd/connection.c:1767
> #17 0x00007ffff544a568 in _pt_root () at /lib64/libnspr4.so
> #18 0x00007ffff4de52de in start_thread () at /lib64/libpthread.so.0
> #19 0x00007ffff46184b3 in clone () at /lib64/libc.so.6
> (gdb) print *vs->sorted[0]
> Cannot access memory at address 0xffffffffffffffff
>
> Are you in the slapi_valueset_done frame?
>
> Do you know what the modify operation is doing? It's something with
> memberOf, but if you knew the exact operation, and what the entry looks
> like prior to making that update, it would be very useful to us.
>
> Thanks,
> Mark
>
>
> Thanks,
>
> Alberto Viana
>
> On Wed, Apr 22, 2020 at 4:22 PM Mark Reynolds <mreynolds(a)redhat.com>
> wrote:
>
>>
>> On 4/22/20 3:15 PM, Alberto Viana wrote:
>>
>> William,
>>
>> Here's:
>>
>> (gdb) frame 3
>> #3 0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at
>> ldap/servers/slapd/valueset.c:471
>> 471 PR_ASSERT((vs->sorted == NULL) || (vs->num <
>> VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >=
>> VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>> (gdb) print *vs
>> $1 = {num = 21, max = 32, sorted = 0x7fff8c023ad0, va = 0x7fff8c022b50}
>>
>> Can you also do a "print *vs->sorted[0]"?
>>
>> And a "where" so we can see the full stack trace that leads up to this
>> assertion?
>>
>> Thanks,
>>
>> Mark
>>
>>
>>
>> Thanks,
>>
>> Alberto Viana
>>
>> On Sun, Apr 19, 2020 at 8:52 PM William Brown <wbrown(a)suse.de> wrote:
>>
>>>
>>>
>>> > On 18 Apr 2020, at 02:55, Alberto Viana <albertocrj(a)gmail.com> wrote:
>>> >
>>> > Hi Guys,
>>> >
>>> > I build my own packages (from source), here's the info:
>>> > 389-ds-base-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>> > 389-ds-base-debuginfo-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>> > python3-lib389-1.4.2.8-20200414gitfae920fc8.el8.noarch.rpm
>>> >
>>> > I'm running in centos8.
>>> >
>>> > Here's what I could debug:
>>> >
>>> > https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>> >
>>> > https://gist.github.com/albertocrj/94fc3521024c7a508f1726923936e476
>>>
>>> So that assert seems to be:
>>>
>>> PR_ASSERT((vs->sorted == NULL) || (vs->num <
>>> VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >=
>>> VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>>>
>>> But it's not clear which condition here is being violated.
>>>
>>> It looks like you're catching this in GDB though, so can you go to:
>>>
>>> https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>>
>>> (gdb) frame 3
>>> (gdb) print *vs
>>>
>>> That would help to work out what condition is incorrectly being
>>> asserted here.
>>>
>>> Thanks!
>>>
>>>
>>> >
>>> >
>>> > Do you guys need something else?
>>> >
>>> > Thanks
>>> >
>>> > Alberto Viana
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Mar 31, 2020 at 8:03 PM William Brown <wbrown(a)suse.de>
wrote:
>>> >
>>> >
>>> > > On 1 Apr 2020, at 05:18, Mark Reynolds <mreynolds(a)redhat.com> wrote:
>>> > >
>>> > >
>>> > > On 3/31/20 1:36 PM, Alberto Viana wrote:
>>> > >> Hey Guys,
>>> > >>
>>> > >> 389-Directory/1.4.2.8
>>> > >>
>>> > >> 389 (master) <=> 389 (master)
>>> > >>
>>> > >> In a master-to-master replication, I started to see this error:
>>> > >> [31/Mar/2020:17:30:52.610637150 +0000] - WARN - NSMMReplicationPlugin -
>>> > >> replica_check_for_data_reload - Disorderly shutdown for replica
>>> > >> dc=rnp,dc=local. Check if DB RUV needs to be updated
>>> >
>>> > Also, it might be good to remind us what distro and packages you got
>>> > 389-ds from?
>>> >
>>> > > Looks like the server is crashing, which is why you see these
>>> > > disorderly shutdown messages. Please get a core file and take some
>>> > > stack traces from it:
>>> > >
>>> > > http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
>>> > >
>>> > > Can you please provide the complete logs? Also, you might want to
>>> > > try re-initializing the replication agreement instead of disabling and
>>> > > re-enabling replication (it's less painful and it "might" solve the
>>> > > issue).
>>> > >
>>> > > Mark
>>> > >
>>> > >>
>>> > >> Even after restarting the service the problem persists. I have to
>>> > >> disable and re-enable replication (and the replication agreement) on both
>>> > >> sides; it works for some time, and then the problem comes back.
>>> > >>
>>> > >> Any tips?
>>> > >>
>>> > >> Thanks
>>> > >>
>>> > >> Alberto Viana
>>> > >>
>>> > >>
>>> > >> _______________________________________________
>>> > >> 389-users mailing list -- 389-users(a)lists.fedoraproject.org
>>> > >> To unsubscribe send an email to 389-users-leave(a)lists.fedoraproject.org
>>> > >> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>>> > >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>> > >> List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproje...
>>> > > --
>>> > >
>>> > > 389 Directory Server Development Team
>>> > >
>>> >
>>> > —
>>> > Sincerely,
>>> >
>>> > William Brown
>>> >
>>> > Senior Software Engineer, 389 Directory Server
>>> > SUSE Labs
>>> >
>>>
>>>
>>>
>>
>>