389 DS memory growth
by Nazarenko, Alexander
Hello colleagues,
On March 22nd we updated the 389-ds-base.x86_64 and 389-ds-base-libs.x86_64 packages on our eight RHEL 7.9 production servers from version 1.3.10.2-17.el7_9 to version 1.3.11.1-1.el7_9. We also updated the kernel from 3.10.0-1160.80.1.el7.x86_64 to 3.10.0-1160.88.1.el7.x86_64 during the same update.
Approximately 12 days later, on April 3rd, all the hosts started exhibiting memory growth whereby the "slapd" process was using over 90% of the 32 GB of available system memory. This was NOT happening for the couple of years prior to applying any of the available package updates on the systems.
Two of the eight hosts act as primaries (formerly referred to as masters), while six act as read-only replicas. Three of the read-only replicas are used by our authorization system, and the other three are used by customer-based applications.
Currently we use system controls to restrict the memory usage.
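For context, by "system controls" I mean cgroup limits applied through a systemd drop-in for the dirsrv instance, along these lines (the instance name and limit value below are illustrative, not our exact settings):

```ini
# /etc/systemd/system/dirsrv@INSTANCE.service.d/memory.conf
# Illustrative cap; RHEL 7's systemd (cgroup v1) uses MemoryLimit=.
[Service]
MemoryLimit=24G
```

After adding the drop-in we run `systemctl daemon-reload` and restart the instance, so the kernel caps slapd rather than letting it consume the whole box.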
My question is: have other users experienced this as well, and what is the recommended way to stabilize the DS servers in this type of situation?
Thanks,
- Alex
Locking Problem when deleting in parallel
by Harald Strack
Hi,
since we updated to the latest CentOS 7 version:
# rpm -qa | grep 389
389-ds-base-libs-1.3.11.1-3.el7_9.x86_64
389-ds-base-devel-1.3.11.1-3.el7_9.x86_64
389-ds-base-snmp-1.3.11.1-3.el7_9.x86_64
389-adminutil-devel-1.1.22-2.el7.x86_64
389-adminutil-1.1.22-2.el7.x86_64
389-ds-base-1.3.11.1-3.el7_9.x86_64
389-admin-1.1.46-4.el7.x86_64
# uname -r
3.10.0-1160.11.1.el7.x86_64
We experience strange locking (?) behaviour: we have a synchronisation
job that tried to delete about 1300 accounts, always 10 in parallel,
using some simple forking perl / shell scripts. pstree looks like this:
auto_sync.pl(21776)───bash(21777)───perl(21798)─┬─perl(21951)───sh(30340)───ldap_remove_use(30341)───ldapremove(30992)───ldapdelete(7489)
                                                ├─perl(22691)───sh(1015)───ldap_remove_use(1016)───ldapremove(1687)───ldapdelete(7474)
                                                ├─perl(23474)───sh(4344)───ldap_remove_use(4345)───ldapremove(5037)───ldapdelete(7453)
                                                ├─perl(24243)───sh(2113)───ldap_remove_use(2114)───ldapremove(2775)───ldapdelete(7528)
                                                ├─perl(24979)───sh(29293)───ldap_remove_use(29294)───ldapremove(29943)───ldapdelete(7514)
                                                ├─perl(25718)───sh(3190)───ldap_remove_use(3191)───ldapremove(3912)───ldapdelete(7539)
                                                ├─perl(26456)───sh(32437)───ldap_remove_use(32438)───ldapremove(624)───ldapdelete(7468)
                                                ├─perl(27193)───sh(5442)───ldap_remove_use(5443)───ldapremove(6154)───ldapdelete(7553)
                                                ├─perl(27943)───sh(7937)───ldap_remove_use(7938)───ldapremove(8598)───ldapmodify(8683)
                                                └─perl(28681)───sh(6549)───ldap_remove_use(6550)───ldapremove(7546)───ldapmodify(7637)
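For reference, the same ten-way fan-out our perl scripts produce can be sketched with plain xargs (the host, bind DN, and file name here are placeholders, not our real ones):

```shell
# Placeholder sketch of the fan-out: one DN per line in dns.txt,
# 10 ldapdelete processes at a time. Host and bind options are illustrative:
#
#   xargs -a dns.txt -P 10 -n 1 \
#       ldapdelete -H ldap://ldap.example.com -D "cn=Directory Manager" -W
#
# The same parallelism, demonstrated with a harmless stand-in command:
printf 'uid=u1,ou=People\nuid=u2,ou=People\nuid=u3,ou=People\n' \
    | xargs -P 10 -n 1 echo would-delete
```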
So we run 10 ldapmodify / ldapdelete calls nearly at the same time, and
the server does not do anything. After 100s to 400s it returns an error:
[16/Nov/2023:17:41:52.285305117 +0100] conn=512226 op=86 MOD dn="cn=group,ou=...."
[16/Nov/2023:17:43:32.599278565 +0100] conn=512226 op=86 RESULT err=16 tag=103 nentries=0 wtime=0.000075793 optime=100.313978009 etime=100.314051783 csn=655646bb000517e90000
[16/Nov/2023:17:11:43.331110511 +0100] conn=509941 op=2 DEL dn="uid=testuser,ou=People,dc=..."
[16/Nov/2023:17:18:24.325913462 +0100] conn=509941 op=2 RESULT err=1 tag=107 nentries=0 wtime=0.000228257 optime=400.994827179 etime=400.995050827 csn=65564073000017e90000
[16/Nov/2023:17:18:24.326834055 +0100] conn=509941 op=3 UNBIND
causing the client to fail with:
ldap_delete: Operations error (1)
Some modifies did work, but they were very slow as well. Only killing and
restarting ns-slapd helped. It's not strictly reproducible; it happens
after a while...
Since we have other problems with this version as well (see "389 DS
memory growth"; the versions before worked perfectly for years!), I am
thinking about migrating the whole cluster to a Debian-based system with
a more recent 389 version. We also run some Debian 11 based 389 instances
and some IPAs in podman on Rocky, and have no problems at all.
Or are there any other hints we could try to work around this
strange behaviour on 1.3.11.1-3?
br
Harald