On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
> On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
>> On 07/12/2013 05:55 PM, Rich Megginson wrote:
>>> On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
>>>> On 07/09/2013 03:34 PM, Rich Megginson wrote:
>>>>> On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
>>>>>> Hi!
>>>>>>
>>>>>> We are having problems with some of our 389-DS instances. They
>>>>>> crash after receiving an update from the provider.
>>>>>
>>>>> After looking at the stack trace, I think this is
>>>>> https://fedorahosted.org/389/ticket/47391
>> Yes, it looks like it might be it. When CONSUMER_ONE crashed for the
>> first time, the last thing replicated was a password change.
>> Do you perhaps know where I could get a 389DS version for CentOS 6
>> that has the patch? The ticket says it was pushed to 1.2.11, but it
>> would seem that our 1.2.11.15-14 is still unpatched and the
>> repositories do not have any newer versions.
>
> Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official
CentOS 6 updates repository.
389-ds-base-debuginfo is from
http://debuginfo.centos.org/6/
The rest are from EPEL.
Looking at the stack trace you sent earlier - there is only 1 thread?
You ran
gdb -ex 'set confirm off' -ex 'set pagination off' \
    -ex 'thread apply all bt full' -ex 'quit' \
    /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
? If so, I have no idea what's going on - I've never seen the server deadlock
itself with only 1 thread . . .
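
To double-check how many threads a trace actually contains, something like
the following rough sketch might help. It assumes the trace was written by a
"thread apply all bt" style gdb command, where each thread's section starts
with a header line such as "Thread 12 (Thread 0x... (LWP 1234)):". The
function name count_threads is only illustrative.

```shell
# Count the per-thread header lines in a gdb "thread apply all bt" dump.
# Each thread's backtrace begins with a line starting "Thread <N> (".
count_threads() {
    # grep -c prints the number of matching lines (0 if none match)
    grep -c '^Thread [0-9][0-9]* (' "$1"
}

# Usage: pass the trace file produced by the gdb command, e.g.
#   count_threads stacktrace.1373877200.txt
```

A healthy ns-slapd would normally show many threads, so a count of 1 would
suggest the trace was not taken from the running server process.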
>
>>>>>
>>>>>> The crash happened twice after about a week of running without
>>>>>> problems. The crashes happened on two consumer servers but not
>>>>>> at the same time.
>>>>>> The servers are running CentOS 6.x with the following 389DS
>>>>>> packages installed:
>>>>>> 389-ds-console-doc-1.2.6-1.el6.noarch
>>>>>> 389-console-1.1.7-1.el6.noarch
>>>>>> 389-adminutil-1.1.15-1.el6.x86_64
>>>>>> 389-dsgw-1.1.10-1.el6.x86_64
>>>>>> 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64
>>>>>> 389-admin-1.1.29-1.el6.x86_64
>>>>>> 389-ds-console-1.2.6-1.el6.noarch
>>>>>> 389-admin-console-doc-1.1.8-1.el6.noarch
>>>>>> 389-ds-1.2.2-1.el6.noarch
>>>>>> 389-ds-base-1.2.11.15-14.el6_4.x86_64
>>>>>> 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
>>>>>> 389-admin-console-1.1.8-1.el6.noarch
>>>>>>
>>>>>> We are in the process of replacing the CentOS 5.x based
>>>>>> consumer+provider setup with a CentOS 6.x based one. For the time
>>>>>> being, the CentOS 6 machines are acting as consumers for the old
>>>>>> server. They run for a while and then the replicated instances
>>>>>> crash, though not at the same time.
>>>>>> One of the servers did not want to start after the crash,
>>>>>
>>>>> Can you provide the error messages from the errors log?
>>>> I have attached error logs from the provider
>>>> (2013-06-27-provider_error) and the consumer
>>>> (2013-06-27-server_two_error) in question.
>>>>>
>>>>>> so I have run db2index on its database. It's been running for
>>>>>> four days and it has still not finished.
>>>>>
>>>>> Try exporting using db2ldif, then importing using ldif2db.
>>>> The export process hangs. After an hour strace still shows:
>>>> futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL
>>>> The error log for this is attached as
>>>> 2013-07-10-server_two-ldif_import_hangs.
>>>
>>> Are you using db2ldif or db2ldif.pl? If you are using db2ldif, is
>>> the server running? If not, please try first shutting down the
>>> server and use db2ldif.
>>>
>>> If db2ldif still hangs, then please follow the instructions at
>>> http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of
>>> the hung process.
>> I was using db2ldif with the server shut down. I tried it again and
>> it hung. The LDIF file was created but its size was zero. The
>> produced stack trace is attached as
>> server_two-db2ldif_hang-stacktrace.1373877200.txt.
>>
>>>
>>>>
>>>>>
>>>>>> All I get from db2index now are these outputs:
>>>>>> [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095
>>>>>> entries (pass 1104) -- average rate 53686277.5/sec, recent rate
>>>>>> 0.0/sec, hit ratio 0%
>>>>>
>>>>> How many entries do you have in your database?
>>>> The number hovers around 65400. It varies by perhaps 2 user del/add
>>>> operations a month and 20 attribute changes per week, if that.
>>>>>
>>>>>>
>>>>>> The other instance did start up, but the replication process did
>>>>>> not work anymore. I disabled the replication to this host and
>>>>>> set it up again. I chose "Initialize consumer now" and the
>>>>>> consumer crashed every time.
>>>>>
>>>>> Can you provide a stack trace of the core when the server crashes?
>>>>> This may be different than the stack trace below.
>>>> The last provided stack trace was produced at the last server
>>>> crash. I will provide another stack trace when CONSUMER_ONE
>>>> crashes again. Currently it refuses to crash at initialization
>>>> time and keeps running.
>>>>>
>>>>>> I have enabled full error logging and could find nothing.
>>>>>> I have read a few threads (not all, I admit) on this list and
>>>>>> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
>>>>>> and tried to troubleshoot.
>>>>>>
>>>>>> The crash produced the attached core dump, and I could use your
>>>>>> help with understanding it, as well as any help with the crash.
>>>>>> If more info is needed I will gladly provide it.
>>>>>>
>>>>>> Regards, Mitja
>>>>>
>>>>
>>>
>>
>