Hello,
On a fairly frequent basis, one of my 389 DS servers hangs after certain CMP operations. Once this happens, the server cannot be shut down gracefully. This has been going on for several weeks, and I have not yet found a solution.
My setup consists of two systems running RHEL 6.2 with 389 DS 1.2.9.16. Multimaster replication is enabled between the two servers, but the client systems (currently just two test systems) preferentially use the same server, ServerA. The second server, ServerB, is the one which is experiencing the problem.
We are using class-of-service entries to set the values for the shadowMax, shadowMin, and shadowWarning attributes. We are also conditionally setting a pwdPolicySubentry attribute for some entries in the same manner.
If I execute an ldapcompare command, such as the following:
# ldapcompare uid=imorgan,ou=People,dc=example,dc=com \
    pwdpolicysubentry:"cn=Special Policy,ou=Policies,dc=example,dc=com"
the command will occasionally hang. Most of the time, the command succeeds and indicates that the attribute is not defined for that entry. However, once or twice a day it will simply hang.
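For anyone trying to reproduce an intermittent hang like this, it may help to run the compare under a deadline so that a hung attempt is reported instead of blocking the terminal. The following is only a sketch: the function name run_with_deadline is made up for illustration, and it assumes the coreutils timeout utility is installed on the client.

```shell
#!/bin/sh
# Sketch only: run a command under a deadline and report whether it hung.
# Assumes the coreutils `timeout` utility is available.
# run_with_deadline is a hypothetical helper name, not part of any package.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # timeout(1) exits with 124 when it had to stop the command
        echo "HUNG after ${secs}s: $*"
    else
        echo "finished with exit status $rc: $*"
    fi
    return "$rc"
}
```

It could be called, for example, as run_with_deadline 30 ldapcompare uid=imorgan,ou=People,dc=example,dc=com 'pwdpolicysubentry:cn=Special Policy,ou=Policies,dc=example,dc=com' from a loop or cron job, so that the time of the first hang is recorded. Any exit status other than 124 comes from the wrapped command itself.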
The access log shows that the CMP request was received, but no result is logged. After this occurs, the server will not shut down gracefully. The init script fails to shut down the server and I end up having to send a SIGKILL to ns-slapd.
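The "request logged, but no result" symptom can be checked mechanically. The sketch below lists CMP operations that have no matching RESULT line. It assumes the usual access log layout in which each line carries conn=N and op=N tokens followed by the operation keyword (CMP or RESULT); the function name pending_cmp_ops is hypothetical, and the log path is whatever your instance uses.

```shell
#!/bin/sh
# Sketch only: print CMP requests in an access log that never got a RESULT.
# Assumes lines of the form "... conn=N op=N CMP ..." and
# "... conn=N op=N RESULT ..."; pending_cmp_ops is a hypothetical name.
# Usage: pending_cmp_ops /path/to/access
pending_cmp_ops() {
    awk '
        {
            # Pick up the first conn= and op= tokens on the line.
            conn = ""; op = ""
            for (i = 1; i <= NF; i++) {
                if (conn == "" && $i ~ /^conn=/) conn = $i
                else if (op == "" && $i ~ /^op=/) op = $i
            }
            if (conn == "" || op == "") next
            key = conn " " op
        }
        / CMP /    { pending[key] = $0 }      # request seen
        / RESULT / { delete pending[key] }    # request answered
        END { for (k in pending) print pending[k] }
    ' "$1"
}
```

Anything the function prints is a compare that the server accepted but never answered, which is what a hung worker thread would look like from the log's point of view.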
The error log does not report any issues.
CMP operations against other attributes, such as loginShell, do not seem to exhibit this problem. Also, the problem does not occur on ServerA; only on ServerB. Once the CMP operation has hung, comparisons against other attributes, even shadowMax, continue to work.
As noted above, most of the time the CMP operation returns normally. However, if I reinitialize ServerB from ServerA, the problem occurs with the first CMP operation against ServerB.
Both servers have the same set of RPMs, and the dse.ldif files on the two systems do not differ significantly.
Has anyone seen a similar issue? Any suggestions on how to debug or fix this?
A somewhat simplified and redacted version of the class-of-service configuration is listed below.
Thanks
On 02/14/2012 06:37 PM, Iain Morgan wrote:
> The access log shows that the CMP request was received, but no result is logged. After this occurs, the server will not shut down gracefully. The init script fails to shut down the server and I end up having to send a SIGKILL to ns-slapd.
When you get the hang, can you attach to the process with gdb?
ps -ef|grep ns-slapd
gdb /usr/sbin/ns-slapd pid-of-ns-slapd
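Since the server has to be killed after each hang, it can be convenient to grab the stacks non-interactively. The sketch below wraps gdb's batch mode (-p, -batch and -ex are standard gdb options) and runs the 'thread apply all bt full' command. The function name capture_backtrace and the GDB override variable are assumptions made for this example, not part of 389 DS or gdb.

```shell
#!/bin/sh
# Sketch only: dump full backtraces of all threads of a running process.
# capture_backtrace and the GDB variable are hypothetical names; GDB lets
# you point at a different gdb binary (or a stub when testing).
# Usage: capture_backtrace PID OUTFILE
capture_backtrace() {
    pid=$1
    out=$2
    # -batch makes gdb exit after running the -ex command, which also
    # detaches from the process.
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt full' > "$out" 2>&1
}
```

For example: capture_backtrace pid-of-ns-slapd /tmp/ns-slapd-stacks.txt, run as root on the affected server while the compare is still hanging. The output is far more useful once the matching debuginfo packages are installed.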
On Tue, Feb 14, 2012 at 19:54:39 -0600, Rich Megginson wrote:
> When you get the hang, can you attach to the process with gdb?
> ps -ef|grep ns-slapd
> gdb /usr/sbin/ns-slapd pid-of-ns-slapd
A gzip'd copy of the 'thread apply all bt full' output is attached.
On 02/15/2012 01:56 PM, Iain Morgan wrote:
> A gzip'd copy of the 'thread apply all bt full' output is attached.
Thanks. Can you do this again after installing the 389-ds-base-debuginfo package?
debuginfo-install 389-ds-base
Are you using Views? http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administ...
On Wed, Feb 15, 2012 at 15:04:52 -0600, Rich Megginson wrote:
> Thanks. Can you do this again after installing the 389-ds-base-debuginfo package?
> debuginfo-install 389-ds-base
Ah, sorry about that. Here's the updated output.
> Are you using Views? http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administ...
No.
On 02/15/2012 03:51 PM, Iain Morgan wrote:
>> Thanks. Can you do this again after installing the 389-ds-base-debuginfo package?
>> debuginfo-install 389-ds-base
> Ah, sorry about that. Here's the updated output.
>> Are you using Views? http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administ...
> No.
Thanks! This looks like a symptom of https://fedorahosted.org/389/ticket/247 fixed in 1.2.10
On Wed, Feb 15, 2012 at 18:19:10 -0600, Rich Megginson wrote:
> Thanks! This looks like a symptom of https://fedorahosted.org/389/ticket/247 fixed in 1.2.10
Hello Rich,
Thanks, I upgraded both of the servers to 1.2.10.1. Unfortunately, it did not resolve the issue. I also noticed that if I run the same ldapcompare command after the first try fails, the server crashes. I can't say whether that is a change in the behaviour, but it is a new observation.
I've attached gdb output for the case where the first ldapcompare is hanging. And, I've also attached the gdb analysis of the core dump.
On 02/23/2012 01:13 PM, Iain Morgan wrote:
> Thanks, I upgraded both of the servers to 1.2.10.1. Unfortunately, it did not resolve the issue. I also noticed that if I run the same ldapcompare command after the first try fails,
fails? hangs?
> the server crashes. I can't say whether that is a change in the behaviour, but it is a new observation.
So with 1.2.10, in addition to the hang, you also get a crash?
Please file a ticket at https://fedorahosted.org/389 along with your configuration and steps to reproduce.
> I've attached gdb output for the case where the first ldapcompare is hanging. And, I've also attached the gdb analysis of the core dump.
On Thu, Feb 23, 2012 at 12:13:18 -0800, Iain Morgan wrote:
> Thanks, I upgraded both of the servers to 1.2.10.1. Unfortunately, it did not resolve the issue. I also noticed that if I run the same ldapcompare command after the first try fails, the server crashes. I can't say whether that is a change in the behaviour, but it is a new observation.
> I've attached gdb output for the case where the first ldapcompare is hanging. And, I've also attached the gdb analysis of the core dump.
> -- Iain Morgan
I've tested 1.2.10.3 and can confirm that it addresses the segfault. However, the hang (presumably a deadlock) has not gone away. I don't seem to be able to update bug #305 now that it is closed, so I am attaching the gdb backtrace of ns-slapd 1.2.10.3 during the server hang.
On 03/12/2012 05:40 PM, Iain Morgan wrote:
> I've tested 1.2.10.3 and can confirm that it addresses the segfault. However, the hang (presumably a deadlock) has not gone away. I don't seem to be able to update bug #305 now that it is closed, so I am attaching the gdb backtrace of ns-slapd 1.2.10.3 during the server hang.
Thanks - reopened and attached your stack trace
389-users@lists.fedoraproject.org