Francesco Fiore wrote:
>
>
> Rich Megginson wrote:
>> Francesco Fiore wrote:
>>
>>> Hi,
>>> I have two directory servers in a multimaster configuration. I have to
>>> reinitialize all databases on the second server (B) using the data from
>>> the first (A).
>>> After the synchronization, server B crashes with a segmentation fault.
>>> There is no relevant message in the error log.
>>> If I restart directory server B, I get the same error.
>>> The directory server version is 1.1.3 on Red Hat 5.
>>>
>>>
>> rpm -qi fedora-ds-base
>>
>> 32-bit or 64-bit?
>>
>> We have fixed quite a few replication bugs since 1.1.3, including a
>> couple of crashes. I recommend upgrading to the latest.
>>
> # rpm -qi 389-ds-base
> Name        : 389-ds-base                  Relocations: (not relocatable)
> Version     : 1.2.4                        Vendor: Fedora Project
> Release     : 1.el5                        Build Date: Tue 03 Nov 2009 04:47:39 PM CET
> Install Date: Fri 05 Feb 2010 11:49:11 AM CET   Build Host: x86-6.fedora.phx.redhat.com
> Group       : System Environment/Daemons   Source RPM: 389-ds-base-1.2.4-1.el5.src.rpm
> Size        : 5339258                      License: GPLv2 with exceptions
> Signature   : DSA/SHA1, Fri 06 Nov 2009 05:17:38 PM CET, Key ID 119cc036217521f6
> Packager    : Fedora Project
> URL         : http://port389.org/
> Summary : 389 Directory Server (base)
> Description :
>
> x86-64
>
> I updated to the latest stable version but I still get the same error.
> I traced the running process and found that the segmentation fault is
> probably caused by the futex system call. The tail of the strace output
> is attached below.
>
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65, revents=POLLIN}])
> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
> read(42, "\0", 200) = 1
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64, revents=POLLIN}])
> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unavailable ...>
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65, revents=POLLIN}])
> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
> read(42, "\0", 200) = 1
> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
> (Transport endpoint is not connected)
> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
> {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64, revents=POLLIN}])
> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unavailable ...>
I debugged the running process and gdb printed this stacktrace after
the segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x63b2b940 (LWP 31976)]
0x000000364fa79140 in strcmp () from /lib64/libc.so.6
(gdb) bt
#0 0x000000364fa79140 in strcmp () from /lib64/libc.so.6
#1 0x00002b188041e4fc in ?? () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#2 0x00002b188041d8d9 in add_hash () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#3 0x00002b188041df27 in ?? () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#4 0x00002b188042c273 in id2entry () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#5 0x00002b18804594c0 in uniqueid2entry () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#6 0x00002b188042b961 in ?? () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#7 0x00002b18804445fc in ldbm_back_delete () from
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#8 0x00002b187c4990d4 in ?? () from /usr/lib64/dirsrv/libslapd.so.0
#9 0x00002b187c499413 in do_delete () from /usr/lib64/dirsrv/libslapd.so.0
#10 0x0000000000412e79 in sasl_map_config_add ()
#11 0x0000003590827fad in ?? () from /usr/lib64/libnspr4.so
#12 0x00000036506064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000364fad3c2d in clone () from /lib64/libc.so.6
I hope this information is useful.
The stacktrace is really useful. Thanks! If possible, could you
install the debuginfo package and take the stacktrace again?
yum install 389-ds-base-debuginfo
--noriko
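
For anyone repeating this step, here is a minimal sketch of how the stacktrace could be captured non-interactively once the debuginfo package is installed. It only builds a gdb command line using standard gdb options (-batch, -p, -ex). The helper name gdb_backtrace_command and the PID 12345 are placeholders, not anything from the 389 tooling.

```python
import shlex


def gdb_backtrace_command(pid):
    """Return an argv list that attaches gdb to `pid` and dumps all thread stacks."""
    return [
        "gdb",
        "-batch",                           # run the -ex commands, then exit
        "-p", str(pid),                     # attach to the running ns-slapd process
        "-ex", "continue",                  # resume until a signal (e.g. SIGSEGV) stops it
        "-ex", "thread apply all bt full",  # backtrace of every thread, with locals
    ]


# 12345 is a hypothetical PID; substitute the real ns-slapd PID on server B.
cmd = gdb_backtrace_command(12345)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Run the printed command as root on server B and redirect its output to a file. With debuginfo installed, the "??" frames in libback-ldbm.so should resolve to function names and line numbers.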
>
>>> I attach the tails of the error log and the /var/log/messages log.
>>>
>>> [03/Feb/2010:19:20:53 +0100] - import Addressbook2: Workers finished;
>>> cleaning up...
>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers finished;
>>> cleaning up...
>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Workers cleaned up.
>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Indexing complete.
>>> Post-processing...
>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers cleaned up.
>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Indexing complete.
>>> Post-processing...
>>> [03/Feb/2010:19:21:50 +0100] - import Addressbook2: Flushing caches...
>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Flushing caches...
>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook2: Closing files...
>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Closing files...
>>> [03/Feb/2010:19:32:27 +0100] - import Addressbook2: Import complete.
>>> Processed 3820687 entries in 4957 seconds. (770.77 entries/sec)
>>> [03/Feb/2010:19:32:28 +0100] NSMMReplicationPlugin -
>>> multimaster_be_state_change: replica o=addressbook2 is coming online;
>>> enabling replication
>>> [03/Feb/2010:19:32:29 +0100] - import Addressbook1: Import complete.
>>> Processed 3820339 entries in 4960 seconds. (770.23 entries/sec)
>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin -
>>> multimaster_be_state_change: replica o=addressbook1 is coming online;
>>> enabling replication
>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin - replica_reload_ruv:
>>> Warning: new data for replica o=addressbook1 does not match the data in
>>> the changelog.
>>> Recreating the changelog file. This could affect replication with
>>> replica's consumers in which case the consumers should be reinitialized.
>>>
>>> Feb 3 19:32:35 mmt-l-al19 kernel: ns-slapd[5575]: segfault at
>>> 0000000000000000 rip 000000364fa79140 rsp 0000000056bd3b18 error 4
>>>
>>> Do you have any idea?
>>>
>>> Thanks
>>>
>>>
>>>
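
The kernel line quoted above can be decoded as a cross-check against the gdb output. The sketch below assumes the older x86-64 message format shown in this thread ("segfault at ... rip ... rsp ... error N") and the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). The function name decode_segfault is made up for illustration.

```python
import re

# Kernel log line from /var/log/messages, joined onto one line.
LINE = ("Feb 3 19:32:35 mmt-l-al19 kernel: ns-slapd[5575]: segfault at "
        "0000000000000000 rip 000000364fa79140 rsp 0000000056bd3b18 error 4")

# Pattern for the message format seen in this thread (an assumption; newer
# kernels print "ip"/"sp" instead of "rip"/"rsp").
SEGFAULT_RE = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields and error-code bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"))
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "instruction_pointer": int(m.group("rip"), 16),
        "page_present": bool(err & 1),  # bit 0: fault on a present page
        "write_access": bool(err & 2),  # bit 1: fault was a write
        "user_mode": bool(err & 4),     # bit 2: fault happened in user mode
    }


info = decode_segfault(LINE)
print(info)
```

Read this way, the line describes a user-mode read at address 0, and the instruction pointer 0x364fa79140 is the same address gdb reports for frame #0 in strcmp(). That is consistent with strcmp() being handed a NULL pointer, although only a stacktrace with debuginfo can confirm which caller passed it.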
>>
>> --
>> 389 users mailing list
>> 389-users(a)lists.fedoraproject.org
>>
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>
>
> --
> Francesco Fiore
> System Integrator
> Babel S.r.l. -http://www.babel.it
> P.zza S.Benedetto da Norcia, 33 - 00040 Pomezia (Roma)
>
>
> CONFIDENTIAL: This message and its attachments are confidential and
> intended only for the addressed recipients. If you have received this
> message in error, please reply to the sender immediately and delete
> all of its contents.
>
Thanks
--
Francesco Fiore
System Integrator
Babel S.r.l. -http://www.babel.it
P.zza S.Benedetto da Norcia, 33 - 00040 Pomezia (Roma)