On 13-05-2021 08:32, Thierry Bordaz wrote:
On 5/12/21 8:41 PM, Kees Bakker wrote:
On 12-05-2021 19:44, Thierry Bordaz wrote:
On 5/12/21 4:55 PM, Kees Bakker wrote:
Hi Thierry,
Just to be clear, changelogmaxage was changed to -1 by me after the upgrade and I've confirmed it is now set to -1.
The reason for me to change the value was because of the deadlock. Apparently, it did not make much of a difference. It still gets into a deadlock with the value -1.
Did you set nsslapd-changelogmaxage=-1 in the retroCL config entry ?
I've used this input file for ldapmodify
[root@linge ~]# cat change-nsslapd-changelogmaxage.txt dn: cn=Retro Changelog Plugin,cn=plugins,cn=config changetype: modify replace: nsslapd-changelogmaxage nsslapd-changelogmaxage: -1
After that
[root@linge ~]# ldapsearch -H ldapi://%2fvar%2frun%2fslapd-GHS-NL.socket -LLL -o ldif-wrap=no -b 'cn=Retro Changelog Plugin,cn=plugins,cn=config' SASL/EXTERNAL authentication started SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth SASL SSF: 0 dn: cn=Retro Changelog Plugin,cn=plugins,cn=config cn: Retro Changelog Plugin nsslapd-attribute: nsuniqueid:targetUniqueId nsslapd-changelogmaxage: -1 nsslapd-include-suffix: cn=dns,dc=ghs,dc=nl nsslapd-plugin-depends-on-named: Class of Service nsslapd-plugin-depends-on-type: database nsslapd-pluginDescription: Retrocl Plugin nsslapd-pluginEnabled: on nsslapd-pluginId: retrocl nsslapd-pluginInitfunc: retrocl_plugin_init nsslapd-pluginPath: libretrocl-plugin nsslapd-pluginType: object nsslapd-pluginVendor: 389 Project nsslapd-pluginVersion: 1.3.10.2 nsslapd-pluginbetxn: on nsslapd-pluginprecedence: 25 objectClass: top objectClass: nsSlapdPlugin objectClass: extensibleObject
You did the right setting. I do not understand why thread 2 is trimming records with that setting.
BTW. I've sent the whole stacktrace directly to you to avoid the 4000+ lines to this mailing list.
Here is the stack trace of one of the threads. The one that hangs in trim_changelog
Something weird is that the same hang occurs https://bugzilla.redhat.com/show_bug.cgi?id=1751295. If I am correct it is between thread 14th and 2nd.
As you are running with a fixed version (> slapi-nis-0.56.5), that means the fix is incomplete. It would require to debug a dump core to know why the fix fails.
[root@linge ~]# rpm -qa slapi* slapi-nis-0.56.5-3.el7_9.x86_64
So, that is not > 0.56.5, but it is the newest available for CentOS7. No?
reading the 'fixed version' in the bug says it if fixed starting 0.56.5
I do not know if it match centOS7 versions
At the moment I killed it and restarted. No deadlock yet, but the system is extreemly slow. And there is just ns-slapd using CPU cycles.
Would you give some precision what is slow ? (response time, replication time,..)
If it is eating CPU, I would recommend 'top -H -p <pid> -b'. If it is always the same thread(s) eating the CPU, a pstack will tell you what it/they is/are doing. If this is various different threads likely it is due to processed requests.
I don't know what numbers to give to show the slowness. The load of the system is almost 4, while the other master (identical system) has a load of almost zero. (And the latter is used as the main LDAP server)
Right now thread 2 and thread 43 are the most active ones. Here are their stacktraces.
Thread 43 (Thread 0x7f4baa1d0700 (LWP 3583)): #0 0x00007f4bcc200a33 in select () at /lib64/libc.so.6 #1 0x00007f4bcf44ab09 in DS_Sleep (ticks=ticks@entry=250) at ldap/servers/slapd/util.c:1086 mSecs = <optimized out> tm = {tv_sec = 0, tv_usec = 26398} #2 0x00007f4bbe587f77 in trickle_threadmain (param=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:4551 interval = 250 rval = <optimized out> priv = 0x55f87ff28b00 li = <optimized out> debug_checkpointing = 0 #3 0x00007f4bcd1bdc5b in _pt_root () at /lib64/libnspr4.so #4 0x00007f4bccb5dea5 in start_thread () at /lib64/libpthread.so.0 #5 0x00007f4bcc2099fd in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f4b93d39700 (LWP 15086)): #0 0x00007f4bce211ef5 in tc_malloc () at /lib64/libtcmalloc.so.4 #1 0x00007f4bcf3c15a3 in slapi_ch_malloc (size=35) at ldap/servers/slapd/ch_malloc.c:95 newmem = <optimized out> #2 0x00007f4bc57809d8 in __os_umalloc () at /lib64/libdb-5.3.so #3 0x00007f4bc5742136 in __db_retcopy () at /lib64/libdb-5.3.so #4 0x00007f4bc571a2b3 in __dbc_iget () at /lib64/libdb-5.3.so #5 0x00007f4bc5728cb2 in __dbc_get_pp () at /lib64/libdb-5.3.so #6 0x00007f4bbe5c1953 in entryrdn_lookup_dn (be=be@entry=0x55f8801ef040, rdn=<optimized out>, id=id@entry=2092284, dn=dn@entry=0x7f4b93d34188, psrdn=psrdn@entry=0x7f4b93d34190, txn=txn@entry=0x7f4b93d342d0) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:1188 rc = <optimized out> ai = 0x55f880232540 db = 0x55f881edaa00 cursor = 0x55f881efe240 db_txn = 0x0 key = {data = 0x55f9026082a0, size = 9, ulen = 9, dlen = 0, doff = 0, app_data = 0x0, flags = 2056} data = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, app_data = 0x0, flags = 16} keybuf = 0x55f9026082a0 "P2092284" srdn = 0x55f8b9b11500 orignrdn = 0x0 nrdn = 0x55f93b909f80 "changenumber=2092283" nrdn_len = 20 workid = 2092284 elem = <optimized out> maybesuffix = 0 db_retry = <optimized out> #7 0x00007f4bbe5929e7 in id2entry (be=<optimized out>, id=id@entry=2092284, txn=txn@entry=0x7f4b93d342d0, err=err@entry=0x7f4b93d342c4) at ldap/servers/slapd/back-ldbm/id2entry.c:346 sdn = 0x0 normdn = 0x0 srdn = 0x0 bdn = 0x0 rdn = 0x55f93efd4f40 "changenumber=2092283" rc = <optimized out> inst = 0x55f88004a680 db = 0x55f881ed8600 db_txn = <optimized out> key = {data = 0x7f4b93d34200, size = 4, ulen = 0, dlen = 0, doff = 0, app_data = 0x0, flags = 0} data = {data = 0x55f92c680000, size = 985, ulen = 0, dlen = 0, doff = 0, app_data = 0x0, flags = 16} e = 0x0 ee = <optimized out> temp_id = "\000\037\354", <incomplete sequence \374> #8 0x00007f4bbe5ceb72 in ldbm_back_next_search_entry_ext (pb=0x55f95513cd80, use_extension=0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1627 be = 0x55f8801ef040 inst = 0x55f88004a680 li = 0x55f87ffbf7c0 scope = 2 managedsait = 0 attr = 0x0 filter = 0x55f8a42e8300 sr = 0x55f9557687c0 id = 2092284 e = 0x55f88b819960 nentries = 399583 expire_time = {tv_sec = 0, tv_nsec = 0} tlimit = -1 llimit = -1 slimit = -1 isroot = 1 urls = 0x0 err = 0 basesdn = 0x55f8e973bbc0 target_uniqueid = 0x0 rc = 0 estimate = 0 txn = {back_txn_txn = 0x0} pr_idx = -1 conn = 0x0 op = 0x55f8a2ce14a0 reverse_list = 0 #9 0x00007f4bcf405ff6 in iterate (send_result=1, be=0x55f8801ef040, pr_statp=0x7f4b93d344f4, pagesize=<optimized out>, pnentries=0x7f4b93d34588, pb=0x55f95513cd80) at ldap/servers/slapd/opshared.c:1292 ger_template_entry = 0x0 operation = 0x55f8a2ce14a0 done = 0 e = 0x55f8a82c0180 pr_stat = 0 pr_idx = -1 rc = <optimized out> rval = 1 attrsonly = 0 attrs = 0x0 be = 0x55f8801ef040 rc = <optimized out> #10 0x00007f4bcf405ff6 in send_results_ext (pb=pb@entry=0x55f95513cd80, nentries=nentries@entry=0x7f4b93d34588, pagesize=-1, pr_stat=pr_stat@entry=0x7f4b93d344f4, send_result=1) at ldap/servers/slapd/opshared.c:1645 be = 0x55f8801ef040 rc = <optimized out> #11 0x00007f4bcf408159 in op_shared_search (pb=pb@entry=0x55f95513cd80, send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:847 be_suffix = <optimized out> err = 0 next_be = 0x0 base = 0x7f4bc394e4e2 "cn=changelog" normbase = <optimized out> fstr = 0x55f8904cc540 "(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))" scope = 2 be = 0x55f8801ef040 be_single = 0x0 be_list = {0x55f8801ef040, 0x0, 0x0, 0x0, 0x55f8801da740, 0x55f800000001, 0x55f900000000, 0x0 <repeats 994 times>} referral_list = {0x0 <repeats 611 times>, 0x7f4bcdd788e3 <ber_sockbuf_ctrl+387>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f4bcdd788e3 <ber_sockbuf_ctrl+387>, 0x0 <repeats 186 times>, 0x61e7bfe702042a36, 0x40658d47b4a41ce1, 0xad61e3a2024df476, 0xde7dadc70e0b5fd8, 0xf73159a52a48ea2c, 0x4f70cb70764a2fab, 0x674523017f57e936, 0x98badcfeefcdab89, 0xc3d2e1f010325476, 0x7f4b93d37f50, 0x683f974357510ca2, 0x559e6a9b5a8a133, 0xce1fa8c242f162a9, 0xa6643132814dc94e, 0xadde7596feaa7041, 0xca0e6a6d6158d8b, 0xeb6b73d137578f64, 0x5a6da1000aeb278e, 0x24faa792f298903b, 0xf1b60146abc162e4, 0xf55006a5ea03353b, 0x7f4b93d37f50, 0x6e2c2f26e077f9de, 0x71a715d3942421dd, 0x8000000058200514, 0x0, 0x0, 0x0, 0x0, 0x2a000000000, 0x0, 0xa12a74c04e49e800, 0x0, 0x7f4b93d37f50, 0x2, 0x7f4b93d38020, 0x7f4b93d380e0, 0x2, 0x55f9217d5e00, 0x7f4bcee55d9d <k5_sha1_hash+125>, 0xfea223f37bbd812, 0xf8ec1ab4f83d9a2b, 0x2a0e21f8c65, 0xe077f9de00000000, 0x942421dd6e2c2f26, 0x5820051471a715d3, 0x80000000, 0x0, 0x0, 0x0, 0x0, 0x2a0, 0xfffffffffffffffe, 0xa12a74c04e49e800, 0x7f4bcf07fd60, 0x55f9038e6620, 0x55f8a467c400, 0x55f8cb6cc1e0, 0x0, 0x7f4bcee713e2 <krb5int_hmac_keyblock+418>, 0x40, 0x7f4b93d38170, 0x7f4b93d380e0, 0x5, 0x14970ea702, 0x55f9038e6620, 0x2, 0x40970ea702, 0x55f9217d5e00, 0x1d99616e00000002, 0x14970ea702, 0x55f9038e6620, 0x14, 0xa12a74c04e49e800, 0x20, 0x4, 0x7f4b93d38170, 0xa12a74c04e49e800, 0x4, 0x7f4b93d38170, 0x7f4bcf07ff00 <krb5int_enc_aes256>, 0x7f4bcee6a8f6 <krb5int_dk_decrypt+406>, 0x20, 0xc00000003, 0x55f90dde5e30, 0x7f4b93d380e0, 0x55f9038e6ca0, 0x0, 0x55f8cb6cd3e0, 0x55f8cb6cc1e0, 0x1400000000, 0x55f9038e6ca0, 0x5516000000, 0xa12a74c04e49e800, 0x20, 0x7f4bcf080b00, 0x1e, 0x7f4b93d38290, 0x41ed982026117e4f, 0xb3c887fe91a43017, 0x720d385d3bc2cd56, 0x50a61fb007a7a9ca, 0xa5d3e14dc70e09bd, 0xb6ead04df0228653, 0x674523017e49db9f, 0x98badcfeefcdab89, 0xc3d2e1f010325476, 0x7f4b93d38260, 0x7234e4c832eb2e6b, 0xbab56daaa29a3c86, 0xfda5dc82799faefa, 0x7b24e8909274e0fa, 0xe1bbebd5d6ad6668, 0x6e8fa927ce38a9b, 0x816585922b509df, 0xd2e805e35f84d38d, 0xdf7f8b184e7dfaca, 0x59585d6f1926cba8, 0x59d1d26715819ef8, 0x7f4b93d38260, 0x80a891fe5f5cfd30, 0x705ca35758499ae9, 0x80000000c5f084c6, 0x0, 0x0, 0x0, 0x0, 0x2a000000000, 0x7f4b93d38320, 0xa12a74c04e49e800, 0x55f977b74fc0, 0x7f4b93d38260, 0x2, 0x7f4b93d38330, 0x7f4b93d38400, 0x2, 0x55f94e35f680, 0xa12a74c04e49e800, 0xd7c43895b01e211d, 0x7f4bcee65b20 <k5_iov_cursor_put+176>, 0x2a0f3cb812d, 0xa12a74c04e49e800, 0x58499ae980a891fe, 0x7f4b93d38340, 0x55f89b92fad0, 0x7f4b93d38350, 0x7f4b93d38330, 0x7f4b93d382e0, 0x55f901527130, 0x7f4bcee547cd <krb5int_aes_encrypt+813>, 0x20, 0xa7ff7ad44e49e800, 0x0, 0x7f4b93d382e0, 0x7f4b93d38470, 0x4, 0x10, 0x7f4b00000000, 0x4, 0x0, 0x4, 0x0, 0x21cafd39017982c4, 0x6e273f72669afee0, 0x530d2dd27b61963b, 0xff29c39b99a71313, 0x9ecaf11f4ef579a1, 0x8cb433b09c5edc92, 0x21cafd39017982c4, 0x6e273f72669afee0, 0x0, 0xa12a74c04e49e800, 0x7f4b93d38470...} attrlistbuf = "\300\207ӓK\177\000\000\301\207ӓK\177\000\000\000\000\000\000K\177\000\000\034_\032\315K\177\000\000;\232F\317K\177\000\000\020\211ӓK\177\000\000\000\000\000\000\000\000\000\000P\232F\317K\177\000\000Q\232F\317K\177\000\000\356a\032\315K\177\000\000\020\210ӓK\177", '\000' <repeats 18 times>, "\357\206ӓK\177\000\000\360\206ӓK\177\000\000\006\000\000\000K\177\000\000@\210ӓK\177\000\000A\210ӓK\177\000\000\000\000\000\000K\177\000\000\005\214D\317\000\000\000\000\000\207ӓK\177\000\000P\204x\006\371U\000 _\232F\317K\177", '\000' <repeats 19 times>, "\350IN\300t*\241\000"... attrliststr = <optimized out> attrs = 0x0 rc = 0 internal_op = <optimized out> basesdn = 0x55f8d6a316b0 sdn = 0x55f8e973bbc0 operation = 0x55f8a2ce14a0 referral = 0x0 proxydn = 0x0 proxystr = 0x0 proxy_err = <optimized out> errtext = 0x0 nentries = 0 pnentries = 399583 flag_search_base_found = 1 flag_no_such_object = 0 flag_referral = 0 flag_psearch = <optimized out> err_code = 0 ctrlp = 0x0 ctl_value = 0x0 iscritical = 0 be_name = <optimized out> index = -1 sent_result = 0 pr_stat = 0 pb_conn = 0x0 pagesize = -1 estimate = 0 curr_search_count = <optimized out> pr_be = <optimized out> pr_search_result = <optimized out> pr_idx = -1 orig_sdn = 0x0 free_sdn = 1 vattr_lock_acquired = 1 #12 0x00007f4bcf419697 in search_internal_callback_pb (pb=pb@entry=0x55f95513cd80, callback_data=callback_data@entry=0x7f4b93d38b60, prc=prc@entry=0x7f4bcf418770 <internal_plugin_result_callback>, psec=psec@entry=0x7f4bcf4187c0 <internal_plugin_search_entry_callback>, prec=prec@entry=0x7f4bcf418780 <internal_plugin_search_referral_callback>) at ldap/servers/slapd/plugin_internal_op.c:727 controls = 0x0 op = 0x55f8a2ce14a0 filter = 0x55f8aad34480 fstr = 0x55f8a3cd7180 "(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))" callback_handler_data = {p_res_callback = 0x7f4bcf418770 <internal_plugin_result_callback>, p_srch_entry_callback = 0x7f4bcf4187c0 <internal_plugin_search_entry_callback>, p_ref_entry_callback = 0x7f4bcf418780 <internal_plugin_search_referral_callback>, callback_data = 0x7f4b93d38b60} scope = 2 ifstr = 0x55f8904cc540 "(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))" opresult = -817789072 rc = 0 tmp_attrs = 0x0 #13 0x00007f4bcf4198e8 in search_internal_pb (pb=pb@entry=0x55f95513cd80) at ldap/servers/slapd/plugin_internal_op.c:592 psid = {rc = -1, num_entries = 399583, num_referrals = 0, entry_list_head = 0x55f8e74c92d0, referral_list_head = 0x0} iterator = <optimized out> tmp = 0x7f4b93d38b60 ref_iterator = <optimized out> ref_tmp = 0x7f4bcf6af240 <global_server_id_plg> i = <optimized out> opresult = 0 pb_search_entries = 0x0 pb_search_referrals = 0x0 #14 0x00007f4bcf419b33 in slapi_search_internal_pb (pb=pb@entry=0x55f95513cd80) at ldap/servers/slapd/plugin_internal_op.c:502 #15 0x00007f4bc394bd13 in sync_cookie_get_change_number (uniqueid=<optimized out>, lastnr=<optimized out>) at ldap/servers/plugins/sync/sync_util.c:429 srch_pb = 0x55f95513cd80 cl_entry = <optimized out> rv = 0 filter = 0x55f8904cc540 "(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))" entries = 0x7f4b93d38c6c newnr = 18446744073709551615 uniqueid = <optimized out> attr = 0x55f96ad77e60 val = 0x55f964c56860 #16 0x00007f4bc394bd13 in sync_cookie_update (sc=0x55f8c1753040, ec=ec@entry=0x55f9565852c0) at ldap/servers/plugins/sync/sync_util.c:514 uniqueid = <optimized out> attr = 0x55f96ad77e60 val = 0x55f964c56860 #17 0x00007f4bc394d66d in sync_send_results (arg=<optimized out>) at ldap/servers/plugins/sync/sync_persist.c:643 noattrs = 0x0 ec = 0x55f9565852c0 attrsonly = 0 attrs = 0x0 ectrls = 0x55f906788450 chg_type = 2 req = 0x55f9704795c0 qnode = 0x55f97ea3d320 qnodenext = <optimized out> conn_acq_flag = <optimized out> conn = 0x55f882282ba8 op = 0x55f880953680 rc = <optimized out> connid = 5052 opid = 0 #18 0x00007f4bcd1bdc5b in _pt_root () at /lib64/libnspr4.so #19 0x00007f4bccb5dea5 in start_thread () at /lib64/libpthread.so.0 #20 0x00007f4bcc2099fd in clone () at /lib64/libc.so.
regards thierry [...]