Crash with SEGV after compacting
by Niklas Schmatloch
Hi
My organisation is running a replicated 389-dirsrv setup. Lately it has been crashing every time after compacting the database.
The crash is reproducible on our instances by lowering the compactdb-interval to trigger compaction:
dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config set --compactdb-interval 300
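(To verify the setting, or to put it back after reproducing the crash, the same backend config subcommand can be used; 2592000 seconds should correspond to the default 30-day interval:)

# show the current backend/ldbm configuration, including the compaction interval
dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config get

# restore a 30-day interval once the crash has been reproduced
dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config set --compactdb-interval 2592000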
This is the log:
[03/Aug/2022:16:06:38.552781605 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: userRoot
[03/Aug/2022:16:06:38.752592692 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 8 pages freed
[03/Aug/2022:16:06:44.172233009 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 888 pages freed
[03/Aug/2022:16:06:44.179315345 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: changelog
[03/Aug/2022:16:13:18.020881527 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact changelog - 458 pages freed
dirsrv@auth-alpha.service: Main process exited, code=killed, status=11/SEGV
dirsrv@auth-alpha.service: Failed with result 'signal'.
dirsrv@auth-alpha.service: Consumed 2d 6h 22min 1.122s CPU time.
The first steps complete very quickly, but the step before the 458 pages of the
retro-changelog are freed takes several minutes. During this time dirsrv writes
more than 10 GB and reads more than 7 GB (according to iotop).
Within seconds after that line is printed, dirsrv crashes.
What I also noticed is that even though it reports freeing a lot of pages, the
retro-changelog does not seem to change in size: the file
`/var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db` is 7.2 GB
both before and after the compacting.
Debian 11.4
389-ds-base/stable,now 1.4.4.11-2 amd64
Does anyone have an idea how to debug or fix this?
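(A backtrace from the crash would probably be the next step; a minimal sketch of capturing one on Debian, assuming systemd-coredump is in use and the dbgsym packages are available from the debug archive:)

# install debug symbols for ns-slapd (requires the Debian debug archive in APT)
apt install 389-ds-base-dbgsym

# after the next crash, find the core dump captured by systemd-coredump
coredumpctl list ns-slapd

# open the most recent ns-slapd core in gdb and record a full backtrace
coredumpctl gdb ns-slapd
# then, inside gdb:
(gdb) thread apply all bt full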
Thanks
Replication agreements creation order
by Alberto Crescente
Hello, I would like to create a three-node multi-master cluster, but when I
define the replication agreements between the nodes I get different results
depending on the order in which I create them.
The node hostnames are test-389-ds-1, test-389-ds-2 and test-389-ds-3.
If I define the agreements in the following order, some replication sessions
fail after the last agreement is created:
test-389-ds-1 -> test-389-ds-2
test-389-ds-2 -> test-389-ds-1
test-389-ds-1 -> test-389-ds-3
test-389-ds-3 -> test-389-ds-1
test-389-ds-2 -> test-389-ds-3
If I define the agreements in the following order instead, everything works (a sketch of the corresponding dsconf commands follows the list):
test-389-ds-1 -> test-389-ds-2
test-389-ds-2 -> test-389-ds-1
test-389-ds-1 -> test-389-ds-3
test-389-ds-2 -> test-389-ds-3
test-389-ds-3 -> test-389-ds-1
test-389-ds-3 -> test-389-ds-2
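(For reference, each agreement above is created roughly like this; shown for the first agreement of the working order, run against test-389-ds-1. The suffix and the replication manager DN/password are placeholders for the real values:)

dsconf -D "cn=Directory Manager" ldaps://test-389-ds-1:636 -w 'PASSWORD_HERE' repl-agmt create \
  --suffix "dc=test,dc=com" \
  --host test-389-ds-2 --port 636 --conn-protocol LDAPS \
  --bind-dn "cn=replication manager,cn=config" --bind-passwd 'REPL_PASSWORD_HERE' \
  --bind-method SIMPLE \
  --init \
  agreement-test-389-ds-1-to-test-389-ds-2

(--init triggers a total initialization of the consumer right after the agreement is created.)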
Error log test-389-ds-1
[10/Mar/2023:18:26:01.268922410 +0100] - ERR -
agmt="cn=agreement-test-389-ds-1-to-test-389-ds-2" (test-389-ds-2:636) -
clcache_load_buffer - Can't locate CSN 640b564b000100030000 in the
changelog (DB rc=-12797). If replication stops, the consumer may need to
be reinitialized.
[10/Mar/2023:18:26:01.272854351 +0100] - ERR - NSMMReplicationPlugin -
changelog program - repl_plugin_name_cl -
agmt="cn=agreement-test-389-ds-1-to-test-389-ds-2" (test-389-ds-2:636):
CSN 640b564b000100030000 not found, we aren't as up to date, or we purged
[10/Mar/2023:18:26:01.273957365 +0100] - ERR - NSMMReplicationPlugin -
send_updates - agmt="cn=agreement-test-389-ds-1-to-test-389-ds-2"
(test-389-ds-2:636): Data required to update replica has been purged
from the changelog. If the error persists the replica must be reinitialized.
Error log test-389-ds-2
[10/Mar/2023:17:14:47.939971206 +0100] - INFO - NSMMReplicationPlugin -
repl5_tot_run - Finished total update of replica
"agmt="cn=agreement-test-389-ds-2-to-test-389-ds-3"
(test-389-ds-3:636)". Sent 15 entries.
[10/Mar/2023:18:13:11.579175020 +0100] - INFO - task_export_thread -
Beginning export of 'userroot'
[10/Mar/2023:18:13:11.581387379 +0100] - INFO - bdb_db2ldif - export
userroot: Processed 17 entries (100%).
[10/Mar/2023:18:13:11.582475585 +0100] - INFO - task_export_thread -
Export finished.
Error log test-389-ds-3
[10/Mar/2023:18:27:29.275950935 +0100] - ERR -
agmt="cn=agreement-test-389-ds-3-to-test-389-ds-1" (test-389-ds-1:636) -
clcache_load_buffer - Can't locate CSN 640b564d000000030000 in the
changelog (DB rc=-12797). If replication stops, the consumer may need to
be reinitialized.
[10/Mar/2023:18:27:29.277963218 +0100] - ERR - NSMMReplicationPlugin -
changelog program - repl_plugin_name_cl -
agmt="cn=agreement-test-389-ds-3-to-test-389-ds-1" (test-389-ds-1:636):
CSN 640b564d000000030000 not found, we aren't as up to date, or we purged
[10/Mar/2023:18:27:29.283542396 +0100] - ERR - NSMMReplicationPlugin -
send_updates - agmt="cn=agreement-test-389-ds-3-to-test-389-ds-1"
(test-389-ds-1:636): Data required to update replica has been purged
from the changelog. If the error persists the replica must be reinitialized.
# ds-replcheck offline -b dc=test,dc=com -m tmp/test-389-ds-1.ldif -r
tmp/test-389-ds-2.ldif --rid 1
...
Missing Entries
=====================================================
Entries missing on Replica:
- cn=repl keep alive 3,dc=test,dc=com (Created on Supplier at: Fri
Mar 10 16:09:47 2023)
Entry Inconsistencies
=====================================================
cn=repl keep alive 1,dc=test,dc=com
----------------------------------------
- Attribute 'keepalivetimestamp' is different:
Supplier:
- Value: 20230310171215Z
- State Info:
keepalivetimestamp;adcsn-640b64ef000000010000;vucsn-640b64ef000000010000:
20230310171215Z
- Date: Fri Mar 10 18:12:15 2023
Replica:
- Value: 20230310161215Z
- State Info:
keepalivetimestamp;adcsn-640b56df000000010000;vucsn-640b56df000000010000:
20230310161215Z
- Date: Fri Mar 10 17:12:15 2023
Is there a method to know the correct order in which to define the agreements?
Regards,
Alberto.
Problem with 389-ds authentication
by Mr Mysteron
Hi.
I'm running two 389-ds instances on CentOS 9 servers, one master and one
read-only slave.
Global pwpolicy is PBKDF2_SHA256 and local pwpolicy is SSHA512.
The mail servers query the read-only slave for LDAP data.
All servers are using TLS for encryption.
I'm running two mail servers: one for incoming mail with Dovecot as an
IMAP frontend, and one running Postfix for SMTP with Dovecot as its SASL
authentication backend.
The Dovecot IMAP server has been doing LDAP authentication flawlessly,
but I recently switched the Postfix SMTP server over to Dovecot SASL
authentication.
This is when things started to take an interesting turn.
The incoming Dovecot IMAP server is set to do an authentication bind:
auth_bind = yes
while the SMTP server with Postfix + Dovecot SASL authentication does not
use auth_bind.
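(For context, in dovecot-ldap.conf.ext terms the difference between the two setups is roughly the following; the userdn template and filter are placeholders, not our exact configuration:)

# IMAP server: Dovecot re-binds to LDAP as the user, so 389-ds verifies the password itself
auth_bind = yes
auth_bind_userdn = uid=%u,ou=people,dc=example,dc=com

# SMTP server: no auth_bind, so Dovecot fetches userPassword via the bind user
# and has to recognise the hash scheme on its own
pass_filter = (&(objectClass=inetOrgPerson)(uid=%u))
pass_attrs = uid=user,userPassword=password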
The authentication process started failing on the SMTP server with the
following error message for every user that tried to authenticate:
dovecot[721505]: auth: Error: ldap(USERNAME): Unknown scheme PBKDF2-SHA512
Changing a user's password allows the SMTP server to authenticate that user
against LDAP again, but as soon as the IMAP server authenticates the same user
with auth_bind, LDAP authentication stops working on the SMTP server and the
above error message reappears for that user.
I found out that when I also use auth_bind for Dovecot on the SMTP server,
everything works.
What I hope someone can explain to me is what happens on the read-only 389-ds
instance when the IMAP server authenticates a user with auth_bind enabled,
such that the SMTP server can no longer authenticate that user when auth_bind
is not enabled.
Both servers bind before the user lookup with
dn = cn=binduser,ou=bindaccount,dc=example,dc=com
so that part is working as intended.
Thank you.
BR,
/MrM