Replication status commands seem to fail

Sunday, 3 January 2021

Hi,

I'm using version 1.4.3 on CentOS 8.3.
I'm trying to set up replication with a single master and a single consumer,
following the steps from

https://access.redhat.com/documentation/en-us/red_hat_directory_server/11...

It seems to work, in that the database is populated on the consumer, and
when I change a database entry on the master, the change appears on the
consumer.

However, the replication status commands seem to indicate that something
isn't working quite right. E.g., when I run:

dsconf -w "$passwd" -D "$rootdn" "$instance" repl-agmt status \
   --suffix "$suffix" "$agreement"

I get:

    Replica Enabled: on
    Update In Progress: FALSE
    Last Update Start: 20210103213704Z
    Last Update End: 20210103213704Z
    Number Of Changes Sent: 1:1/0
    Number Of Changes Skipped: None
    Last Update Status: Error (0) Replica acquired successfully: Incremental
       update succeeded
    Last Init Start: 19700101000000Z
    Last Init End: 19700101000000Z
    Last Init Status: unavailable
    Reap Active: 0
    Replication Status: Not in Synchronization: supplier
    (5ff237d3000000010000) consumer (Unavailable) State (green) Reason
    (error (0) replica acquired successfully: incremental update succeeded)
    Replication Lag Time: Unavailable

Do the last two entries (Replication Status and Replication Lag Time)
indicate a problem?
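
For reference, the supplier value in the Replication Status line
(5ff237d3000000010000) looks like a change sequence number (CSN). Below is a
minimal sketch for splitting it into fields. It assumes the usual 20-hex-digit
layout (8 digits of epoch timestamp, 4 of sequence number, 4 of replica ID,
4 of sub-sequence number); decode_csn is a hypothetical helper name, not part
of dsconf.

```shell
# Split a 20-hex-digit CSN into its fields.
# Assumed layout: 8 digits timestamp, 4 sequence number,
# 4 replica ID, 4 sub-sequence number.
# The function body runs in a subshell so its variables stay local.
decode_csn() (
    csn=$1
    case $csn in
        ''|*[!0-9a-fA-F]*)
            echo "not a hex CSN: $csn" >&2
            exit 1
            ;;
    esac
    if [ "${#csn}" -ne 20 ]; then
        echo "expected 20 hex digits, got ${#csn}" >&2
        exit 1
    fi
    # cut extracts each field by character position; printf converts hex to decimal.
    ts=$(printf '%d' "0x$(printf '%s' "$csn" | cut -c1-8)")
    seq=$(printf '%d' "0x$(printf '%s' "$csn" | cut -c9-12)")
    rid=$(printf '%d' "0x$(printf '%s' "$csn" | cut -c13-16)")
    subseq=$(printf '%d' "0x$(printf '%s' "$csn" | cut -c17-20)")
    echo "epoch=$ts seq=$seq rid=$rid subseq=$subseq"
)

decode_csn 5ff237d3000000010000
# prints: epoch=1609709523 seq=0 rid=1 subseq=0
```

With GNU date, `date -u -d @1609709523` converts the epoch value to
3 January 2021, 21:32:03 UTC, which is about five minutes before the
Last Update Start shown above, and the replica ID field is 1.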

In the logs on the consumer, I see the following entries, which I think
might be related to replication:
  conn=29 fd=64 slot=64 SSL connection from MASTER.IP to MY.IP
  conn=29 op=-1 fd=64 closed - unknown error

If I increase the logging level, I get:
    DEBUG - connection_read_operation - connection 77 waited 1 times for
      read to be ready
    DEBUG - connection_read_operation - PR_Recv for connection 77
      returns -12109 (unknown error)
    DEBUG - disconnect_server_nomutex_ext - Setting conn 77 fd=64 to
      be disconnected: reason -12109

Also, when I restart my consumer for the very first time after setting
up the replication agreement, ns-slapd reliably hangs, using 100% CPU.
strace shows an endless loop of:

  select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
  poll([{fd=22, events=POLLIN}], 1, 0)    = 0 (Timeout)

where fd 22 is a pipe.
If I kill -9 the process, it starts working.
I'm not sure whether this is related to the replication issue.
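
One way to check what a descriptor such as fd 22 refers to is to read its
symlink under /proc. This is a minimal sketch, assuming a Linux /proc
filesystem; fd_target is a hypothetical helper name.

```shell
# Print what file descriptor $2 of process $1 points at.
# On Linux, a pipe shows up as "pipe:[<inode>]".
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Example (assumes a single ns-slapd process; pidof may return several PIDs):
#   fd_target "$(pidof ns-slapd)" 22
```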

TIA for any insight into all this.
