389 community,
We are currently testing an upgrade of our LDAP infrastructure centos 7.9 389-ds-1.3.11.1-5.el7_9 to Rhel 9.4 389-ds 2.4.5-9.el9_4. We are wanting to use the replication method to upgrade alongside production and not fully shutdown our center for the migration which would be impactful. The error I am reporting is for initialization of replicas by a rhel 9.x provider. Appears to be something related to rhel 9.x and 389-ds 2.x. Note: I have not seen any other issues besides replica initialization.
Our site has run 389-DS for many years w/o issue and never have seen this issue before. I have done numerous tests to try and isolate this failure. I tried a smaller database and it still fails. different providers and consumers have the same error for rhel 9.x Provider.
Rhel 9.4 389-DS 2.4.5-9 Provider fails to fully initialize Rhel 9.4 389-DS 2.4.5-9 Consumer database.
A Rhel 8.10 provider 389-ds 2.4.5-1 can load our whole data base w/o error on the same consumer Rhel 9.4 389-DS 2.4.5-9 on port 636 LDAPS. I tried rhel 9.4 389-ds 2.4.5-1 (same version that worked on Rhel 8.10) and it fails to the same consumer. I tried changing ports and protocol on rhel 9.4 Provider - Port 636 LDAPS fails, 389 STARTTLS fails, 389 LDAP works on rhel9.4.
The issue is rhel 9.x provider initializing database on rhel 9.4 consumer (fails Sending 33552 entries with TLS). I have tried restricting Provider and consumer to TLS 1.2 then TLS 1.3 and both fail. I tried tuning nsds5ReplicaFlowControlWindow and Debug logs show it gets past the authentication piece and starts to load the consumer and then gets ERR – factory destructor - ERROR bulk import abandoned on consumer. The Provider shows Error "Failed to send extended operation: LDAP error 51 (Server is busy)".
I did try tuning nsds5ReplicaFlowControlWindow and nsds5ReplicaFlowControlPause with various values with little success. I also worry is I start slowing down the Provider to replica traffic it will impact performance. The error logs below are with nsslapd-errorlog-level=24576 and default values for the mentioned tunables.
LDAP Debug Provider Error Log:
[13/Nov/2024:15:54:37.386325753 -0600] - DEBUG - NSMMReplicationPlugin - acquire_replica - agmt="cn=ldap-repb-test-ra" (ldap-repb-test:636): Replica was successfully acquired.
[13/Nov/2024:15:54:37.388243309 -0600] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=ldap-repb-test-ra" (ldap-repb-test:636)".
[13/Nov/2024:15:54:37.390191833 -0600] - DEBUG - NSMMReplicationPlugin - replica_subentry_check - replication keep alive entry <cn=repl keep alive 8,dc=ncsa,dc=illinois,dc=edu> already exists
[13/Nov/2024:15:54:37.513840430 -0600] - ERR - NSMMReplicationPlugin - perform_operation - agmt="cn=ldap-repb-test-ra" (ldap-repb-test:636): Failed to send extended operation: LDAP error 51 (Server is busy)
[13/Nov/2024:15:54:37.515818173 -0600] - DEBUG - NSMMReplicationPlugin - check_flow_control_tot_init - agmt="cn=ldap-repb-test-ra" (ldap-repb-test:636) - Invalid message ids [ msgid sent: -1, rcv: 5]
[13/Nov/2024:15:54:37.519877720 -0600] - ERR - NSMMReplicationPlugin - repl5_tot_log_operation_failure - agmt="cn=ldap-repb-test-ra" (ldap-repb-test:636): Received error 51 (Server is busy): for total update operation
LDAP Consumer (Replica) Error Log:
[13/Nov/2024:15:54:37.345059927 -0600] - DEBUG - NSMMReplicationPlugin - replica_disable_replication - replica dc=ncsa,dc=illinois,dc=edu is acquired
[13/Nov/2024:15:54:37.378553066 -0600] - INFO - bdb_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[13/Nov/2024:15:54:37.383226816 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_start returned 0
[13/Nov/2024:15:54:37.383985431 -0600] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - encoded response: 0
[13/Nov/2024:15:54:37.384635904 -0600] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - Finished encoding payload
[13/Nov/2024:15:54:37.385240342 -0600] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=162194 op=3 repl="dc=ncsa,dc=illinois,dc=edu": StartNSDS90ReplicationRequest: response=0 rc=0
[13/Nov/2024:15:54:37.385907940 -0600] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_relinquish_exclusive_access - conn=162194 op=3 Relinquishing consumer connection extension
[13/Nov/2024:15:54:37.390220175 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.505440971 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry cn=Directory Administrators,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.507799711 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry ou=Groups,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.509710361 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry ou=People,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.511664842 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry cn=repl keep alive 5,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.513327884 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry ou=resource_delta,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.514630160 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry ou=resource_vforge,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.516008120 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry ou=resource_holli,dc=ncsa,dc=illinois,dc=edu
..
..
..
[13/Nov/2024:15:54:37.530146158 -0600] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry cn=PD Managers,ou=Groups,dc=ncsa,dc=illinois,dc=edu
[13/Nov/2024:15:54:37.542448330 -0600] - ERR - factory_destructor - ERROR bulk import abandoned
[13/Nov/2024:15:54:37.584123281 -0600] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23
[13/Nov/2024:15:54:37.585520872 -0600] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[13/Nov/2024:15:54:43.494296253 -0600] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[13/Nov/2024:15:54:43.496403618 -0600] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[13/Nov/2024:15:54:43.503149502 -0600] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[13/Nov/2024:15:54:43.506359910 -0600] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=ncsa,dc=illinois,dc=edu connid=162194
[13/Nov/2024:15:54:43.507738378 -0600] - ERR - process_bulk_import_op - NULL target sdn
[13/Nov/2024:15:54:43.509027437 -0600] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=162194 op=-1 repl="dc=ncsa,dc=illinois,dc=edu": Released replica held by locking_purl=conn=162194 id=3
-Tim