On Thu, Feb 23, 2012 at 12:13:18 -0800, Iain Morgan wrote:
> On Wed, Feb 15, 2012 at 18:19:10 -0600, Rich Megginson wrote:
>> On 02/15/2012 03:51 PM, Iain Morgan wrote:
>>> On Wed, Feb 15, 2012 at 15:04:52 -0600, Rich Megginson wrote:
>>>> On 02/15/2012 01:56 PM, Iain Morgan wrote:
>>>>> On Tue, Feb 14, 2012 at 19:54:39 -0600, Rich Megginson wrote:
>>>>>> On 02/14/2012 06:37 PM, Iain Morgan wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On a fairly frequent basis, one of my 389 DS servers hangs after
>>>>>>> certain CMP operations. Once this happens, the server cannot be shut
>>>>>>> down gracefully. This has been going on for several weeks, and I have
>>>>>>> not yet found a solution.
>>>>>>>
>>>>>>> My setup consists of two systems running RHEL 6.2 with 389 DS
>>>>>>> 1.2.9.16. Multimaster replication is enabled between the two servers,
>>>>>>> but the client systems (currently just two test systems)
>>>>>>> preferentially use the same server, ServerA. The second server,
>>>>>>> ServerB, is the one which is experiencing the problem.
>>>>>>>
>>>>>>> We are using class-of-service entries to set the values for the
>>>>>>> shadowMax, shadowMin, and shadowWarning attributes. We are also
>>>>>>> conditionally setting a pwdPolicySubentry attribute for some entries
>>>>>>> in the same manner.
>>>>>>>
>>>>>>> If I execute an ldapcompare command, such as the following:
>>>>>>>
>>>>>>> # ldapcompare uid=imorgan,ou=People,dc=example,dc=com \
>>>>>>> pwdpolicysubentry:"cn=Special Policy,ou=Policies,dc=example,dc=com"
>>>>>>>
>>>>>>> the command will occasionally hang. Most of the time, the command
>>>>>>> succeeds and indicates that the attribute is not defined for that
>>>>>>> entry. However, once or twice a day it will simply hang.
>>>>>>>
>>>>>>> The access log shows that the CMP request was received, but no result
>>>>>>> is logged. After this occurs, the server will not shut down
>>>>>>> gracefully. The init script fails to shut down the server and I end
>>>>>>> up having to send a SIGKILL to ns-slapd.
>>>>>> When you get the hang, can you attach to the process with gdb?
>>>>>> ps -ef|grep ns-slapd
>>>>>> gdb /usr/sbin/ns-slapd pid-of-ns-slapd
>>>>>>> The error log does not report any issues.
>>>>>>>
>>>>>>> CMP operations against other attributes, such as loginShell, do not
>>>>>>> seem to exhibit this problem. Also, the problem occurs only on
>>>>>>> ServerB, not on ServerA. Once the CMP operation has hung, comparisons
>>>>>>> against other attributes, even shadowMax, continue to work.
>>>>>>>
>>>>>>> As noted above, most of the time the CMP operation returns normally.
>>>>>>> However, if I reinitialize ServerB from ServerA, the problem occurs
>>>>>>> with the first CMP operation against ServerB.
>>>>>>>
>>>>>>> Both servers have the same set of RPMs, and the dse.ldif files on the
>>>>>>> two systems have no significant differences.
>>>>>>>
>>>>>>> Has anyone seen a similar issue? Any suggestions on how to debug or
>>>>>>> fix this?
>>>>>>>
>>>>>>> A somewhat simplified and redacted version of the class-of-service
>>>>>>> configuration is listed below.
>>>>>>>
>>>>>>> Thanks
>>>>> A gzip'd copy of the 'thread apply all bt full' output is attached.
>>>>>
>>>> Thanks. Can you do this again after installing the
>>>> 389-ds-base-debuginfo package?
>>>> debuginfo-install 389-ds-base
>>> Ah, sorry about that. Here's the updated output.
>>>
>>>> Are you using Views?
>>>> http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Admin...
>>> No.
>>>
>> Thanks! This looks like a symptom of
>> https://fedorahosted.org/389/ticket/247, fixed in 1.2.10.
> Hello Rich,
>
> Thanks, I upgraded both of the servers to 1.2.10.1. Unfortunately, it
> did not resolve the issue. I also noticed that if I run the same
> ldapcompare command after the first try fails, the server crashes. I
> can't say whether that is a change in the behaviour, but it is a new
> observation.
>
> I've attached gdb output for the case where the first ldapcompare is
> hanging. And, I've also attached the gdb analysis of the core dump.
>
> --
> Iain Morgan
I've tested 1.2.10.3 and can confirm that it addresses the segfault.
However, the hang (presumably a deadlock) has not gone away. I don't
seem to be able to update bug #305 now that it is closed, so I am
attaching the gdb backtrace of ns-slapd 1.2.10.3 during the server hang.