On 4 Jan 2021, at 08:06, Glenn Morris <rgm(a)stanford.edu> wrote:
Hi,
I'm using version 1.4.3 on CentOS 8.3.
I'm trying to set up replication with a single master and a single consumer,
following the steps from
https://access.redhat.com/documentation/en-us/red_hat_directory_server/11...
It seems to work, in that the database is populated on the consumer, and
when I change a database entry on the master, the change appears on the
consumer.
However, the replication status commands seem (?) to indicate that something
isn't working quite right. E.g. when I run:
dsconf -w "$passwd" -D "$rootdn" $instance repl-agmt status \
--suffix $suffix $agreement
I get:
Replica Enabled: on
Update In Progress: FALSE
Last Update Start: 20210103213704Z
Last Update End: 20210103213704Z
Number Of Changes Sent: 1:1/0
Number Of Changes Skipped: None
Last Update Status: Error (0) Replica acquired successfully: Incremental update succeeded
Last Init Start: 19700101000000Z
Last Init End: 19700101000000Z
Last Init Status: unavailable
Reap Active: 0
Replication Status: Not in Synchronization: supplier (5ff237d3000000010000) consumer (Unavailable) State (green) Reason (error (0) replica acquired successfully: incremental update succeeded)
Replication Lag Time: Unavailable
The last two entries seem to indicate some problem?
In the logs on the consumer, I see the following entries that I think
might be (?) related to replication:
conn=29 fd=64 slot=64 SSL connection from MASTER.IP to MY.IP
conn=29 op=-1 fd=64 closed - unknown error
If I increase the logging level, I get:
DEBUG - connection_read_operation - connection 77 waited 1 times for read to be ready
DEBUG - connection_read_operation - PR_Recv for connection 77 returns -12109 (unknown error)
DEBUG - disconnect_server_nomutex_ext - Setting conn 77 fd=64 to be disconnected: reason -12109
Also, when I restart my consumer for the very first time after setting
up the replication agreement, ns-slapd reliably hangs using 100% CPU.
Strace shows endless:
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
poll([{fd=22, events=POLLIN}], 1, 0) = 0 (Timeout)
where fd/22 = a pipe.
If I kill -9 it, it starts working.
I'm not sure if this has any relation.
Thanks for this. Indeed, if I replace "--port=636 --conn-protocol=LDAPS"
(from "Steps to be Performed on the Supplier" in the Red Hat docs)
with "--port=389 --conn-protocol=StartTLS" when running "repl-agmt create",
then the status command reports "Replication Status: In Synchronization"
(after the first change is synced). It leaves me wondering a bit how
secure it is though...
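For reference, the change described above might look roughly like the sketch below. Only the --port and --conn-protocol values come from this thread; the remaining options follow the general shape of the Red Hat example, and the host name, suffix, bind DN, and agreement name are placeholders that will differ on a real setup. The script only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust all of these to your own topology.
consumer_host="consumer.example.com"
suffix="dc=example,dc=com"
agreement="example-agreement"
repl_bind_dn="cn=replication manager,cn=config"

# Build the "repl-agmt create" arguments for a given port and protocol.
# $1 = port, $2 = connection protocol (LDAPS or StartTLS)
agmt_create_args() {
    printf '%s %s %s\n' \
        "repl-agmt create --suffix=\"$suffix\" --host=$consumer_host" \
        "--port=$1 --conn-protocol=$2 --bind-dn=\"$repl_bind_dn\"" \
        "--bind-passwd=\"\$repl_passwd\" --bind-method=SIMPLE --init $agreement"
}

# Print the full command for review instead of running it.
# $passwd, $rootdn, $instance and $repl_passwd are left unexpanded on purpose.
printf '%s\n' "dsconf -w \"\$passwd\" -D \"\$rootdn\" \$instance $(agmt_create_args 389 StartTLS)"
```

Calling agmt_create_args 636 LDAPS instead prints the variant from the docs, which makes it easy to diff the two agreements.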
For replication purposes, StartTLS over 389 is "effectively" equivalent in security to
LDAPS, although LDAPS is preferred.
That said, if StartTLS works but LDAPS does not, that points to something else being fishy.
StartTLS and LDAPS both use the same CA verification routines and connection/TLS
machinery, so perhaps there is a problem with network connectivity or some redirection of
LDAPS. A good start would be some basic checks that ldapwhoami/ldapsearch work over
ldaps:// to all the servers in your topology, and then running the same
ldapwhoami/ldapsearch from each node in the topology to the others, to ensure nothing in
between is causing issues.
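As a rough sketch of that kind of check, something like the following could be run on each node in turn. The host names are placeholders, port 636 is assumed, and $rootdn and $passwd are assumed to be set as in the dsconf command quoted above; check_ldaps is just a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Placeholders (assumptions): list every server in the topology here.
hosts="supplier.example.com consumer.example.com"

# Try a simple bind over LDAPS against each host given as an argument.
# Expects $rootdn and $passwd to be set in the environment.
# Returns non-zero if any host fails.
check_ldaps() {
    failed=0
    for host in "$@"; do
        # Remove the redirection to see the underlying TLS/connection error.
        if ldapwhoami -x -H "ldaps://$host:636" -D "$rootdn" -w "$passwd" >/dev/null 2>&1; then
            echo "OK   ldaps://$host:636"
        else
            echo "FAIL ldaps://$host:636"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, repeated on every node so each direction is covered:
#   check_ldaps $hosts
```

A FAIL from one node but not from another would suggest that something between those particular hosts is interfering, rather than the TLS configuration itself.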
--
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia