Hi Thierry and Kees, 

Thank you for your replies. I provided the output of "pstack $(pidof ns-slapd)" to the Red Hat Support Center, and they were able to find the root cause of one particular hang: it turns out to be a bug. I have included the details below. I will continue to troubleshoot with them. Hangs happen a lot in our environment, so I will use your comments to collect more data and confirm whether all of the hangs share this cause or whether something else is going on.

Many thanks! 

Kathy. 


Red Hat SME found an issue related to our hangs. Here are the details.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The IPA schema compat plugin for RHDS deadlocks on delete post op. One thread is trying to acquire a write lock while in delete post op, while several other threads are trying to acquire read locks in bind ops. It is not clear which thread is actually holding the lock(s). No FATAL error messages are reported in the logs for the lock/unlock API. Because of this issue, RHDS gets into a "hang"-like state where it is not responsive to operations, which is obviously a problem. There is one thread that is doing post op plugin ops for an add op, though it is not in the schema compat plugin. And there are a few threads that are just stuck in backend search ops inside bdb code, likely because the locks they need are held by thread(s) stuck in schema compat plugin operations.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
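For anyone who wants to check a saved stack dump for the same pattern (one writer and several readers waiting on a rwlock), here is a minimal sketch. It assumes the dump has the usual pstack/gdb layout, with "Thread N (...)" header lines followed by "#N" frame lines, and that the glibc symbol names contain "pthread_rwlock_wrlock" or "pthread_rwlock_rdlock". The function name rwlock_waiters is made up for this example. It only shows which threads are waiting, not which thread holds the lock.

```shell
#!/bin/sh
# rwlock_waiters (hypothetical helper): read a pstack/gdb dump on stdin and
# report, per thread, whether its stack is inside a rwlock write or read call.
rwlock_waiters() {
    awk '
        # A "Thread N (...)" header starts a new stack; remember its number.
        /^Thread [0-9]+/ { tid = $2; done = 0; next }
        # Report the first rwlock frame found in each stack, then skip the rest.
        !done && /pthread_rwlock_wrlock/ { print "thread " tid ": write"; wr++; done = 1; next }
        !done && /pthread_rwlock_rdlock/ { print "thread " tid ": read";  rd++; done = 1; next }
        END { printf "write=%d read=%d\n", wr, rd }
    '
}

# Example (not run here): pstack "$(pidof ns-slapd)" | rwlock_waiters
```

A stack that contains one of these functions in any frame is still inside that call, so matching on every frame rather than only the innermost one is enough to identify a waiter.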

A bug was opened for further analysis by our developers.

 - [2124214 – schema compat plugin deadlock on delete post op](https://bugzilla.redhat.com/show_bug.cgi?id=2124214)




On Tue, Sep 6, 2022 at 4:16 AM Kees Bakker via FreeIPA-users <freeipa-users@lists.fedorahosted.org> wrote:
On 30-08-2022 23:20, Kathy Zhu via FreeIPA-users wrote:
Hi Team, 

We used the following to count the rwlocks for the /usr/sbin/ns-slapd process on CentOS 7.9 in order to catch deadlocks: 

PID=`pidof ns-slapd`

gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd $PID |& grep '^#0.*lock' | grep pthread_rwlock | sort -u 


That helped us detect ns-slapd hangs caused by deadlocks. 


After migrating to Red Hat 8.6, we have had a lot of hangs (dirsrv is running but not responding) and could not find out why. We used the same method as above, but we were not able to catch anything. Is there a different way to count the rwlocks on Red Hat 8.6? 
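One thing that may be worth checking, offered as an assumption and not something confirmed in this thread: the pipeline above keeps only innermost frames (#0) that mention a lock. If the innermost frame on RHEL 8 is a lower-level wait function and the pthread_rwlock_* call sits one or two frames further up, that filter would miss it. A minimal sketch that counts rwlock function names in any frame follows; the function name rwlock_frames is made up for this example.

```shell
#!/bin/sh
# rwlock_frames (hypothetical helper): read a gdb/pstack dump on stdin and
# count pthread_rwlock_* function names appearing in ANY frame, not just #0.
rwlock_frames() {
    grep -E '^#[0-9]+ ' |                                        # frame lines only
        grep -oE '[A-Za-z_0-9]*pthread_rwlock_[A-Za-z_0-9]*' |   # function name
        sort | uniq -c |                                         # count per name
        awk '{ print $1, $2 }' |                                 # drop uniq padding
        sort -rn                                                 # most frequent first
}

# Example (not run here):
#   gdb -p "$(pidof ns-slapd)" -batch -ex 'thread apply all bt' 2>&1 | rwlock_frames
```

If this prints counts where the original #0-only filter printed nothing, the difference is in where the rwlock frame sits in the stack and not in whether threads are waiting on rwlocks.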


We realize that hangs can have multiple causes; however, we would like to rule out the possibility of a deadlock. 


The OS and packages: 


Red Hat Enterprise Linux release 8.6 (Ootpa)

ipa-server.x86_64 4.9.8-7.module+el8.6.0+14337+19b76db2 @rhel-8-for-x86_64-appstream-rpms

slapi-nis-0.56.6-4.module+el8.6.0+12936+736896b2.x86_64

389-ds-base-libs-1.4.3.28-6.module+el8.6.0+14129+983ceada.x86_64

389-ds-base-1.4.3.28-6.module+el8.6.0+14129+983ceada.x86_64



Hey Kathy,

May I suggest also looking at the 389-users mailing list
   389-users@lists.fedoraproject.org
  https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
For example threads with subject:
  * Lock table is out of available lock entries
  * Enabling retro changelog maxage with 3 million entries make dirsrv not respond anymore
  * DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
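Two of these subjects quote lock-related messages, so a quick check is to count how often such messages appear in the directory server errors log. This is only a sketch: the function name lock_messages is made up, the instance name in the usage comment is a placeholder, and it assumes the messages are logged with the same wording as in the subjects above.

```shell
#!/bin/sh
# lock_messages (hypothetical helper): count lock-related messages in the
# log file given as $1.
lock_messages() {
    for msg in 'Lock table is out of available lock entries' \
               'DB_LOCK_DEADLOCK'; do
        # grep -c prints the number of matching lines (0 if there are none);
        # -F treats the message as a fixed string, not a regular expression.
        printf '%s: %s\n' "$msg" "$(grep -cF -- "$msg" "$1")"
    done
}

# Example (INSTANCE is a placeholder for your instance name):
#   lock_messages /var/log/dirsrv/slapd-INSTANCE/errors
```

A count of zero for both does not rule out a deadlock; it only shows that these particular messages were not logged.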

I have reported hangs (deadlocks) myself. In our case, when Retro Changelog trimming
is enabled, we get a hang within a few minutes. As far as I know there is no solution yet.
That is why we have disabled this trimming. (And our Retro Changelog has accumulated to more
than Gb of data.)
--
Kees
