On Tue, 2018-08-14 at 21:46 +0000, Devon Peters wrote:
Hi folks,
We've got multi-master replication set up between two masters. The
replication recently broke after being stable for a few years, and in
troubleshooting it appears that the issue is that both masters have
the same nsDS5ReplicaId defined (both are set to 2). Both masters
have nearly the same output from running:
$ ldapsearch -x -b 'cn=replica,cn="dc=someorg,dc=com",cn=mapping tree,cn=config' -D "cn=Directory Manager" -W
# replica, dc\3Dsomeorg\2Cdc\3Dcom, mapping tree, config
dn: cn=replica,cn=dc\3Dsomeorg\2Cdc\3Dcom,cn=mapping tree,cn=config
objectClass: nsDS5Replica
objectClass: top
nsDS5ReplicaRoot: dc=someorg,dc=com
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 2
nsds5ReplicaPurgeDelay: 604800
nsDS5ReplicaBindDN: cn=replication manager,cn=config
cn: replica
nsState:: AgAAAAAAAADES3NbAAAAAAAAAAAAAAAAAgAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: 52c33a02-368211e1-a5228b52-eb63f05c
nsds5ReplicaChangeCount: 17190
nsds5replicareapactive: 0
The only difference in this output between the two servers is the
nsds5ReplicaChangeCount.
I believe I can use ldapmodify to change the replica ID on one of the
nodes, but am unsure whether or not this is the proper way to fix the
issue - or if there is anything additional that needs to be taken into
account when making this change.
We inherited this LDAP system a while ago, and are not very familiar
with how replication works in general, so we're reluctant to try this
for fear of causing more damage by doing the wrong thing.
First, take backups.
After that, check whether any entries or attributes are missing on
either side of the replication. Pick the master whose data is "correct"
and leave its replica ID untouched.
Once that is done, fence the nodes, change the replica ID of the
"less correct" master, then do a full re-init of it from the unchanged
master.
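As a rough sketch of the last two steps, the LDIF could be generated with a couple of small shell functions. The assumptions here are mine, not facts about your setup: the replica entry DN is the one from your ldapsearch output, the new ID 3 is just some value not already in use, the agreement name "to-master2" and the host names are made up, and the server accepts an online modify of nsDS5ReplicaId (if it refuses the change, that part of the sketch does not apply). Setting nsds5BeginReplicaRefresh to "start" on the agreement entry is what triggers a total update of the consumer.

```shell
#!/bin/sh
# DN of the replica entry, copied from the ldapsearch output in this thread.
REPLICA_DN='cn=replica,cn=dc\3Dsomeorg\2Cdc\3Dcom,cn=mapping tree,cn=config'

# replica_id_ldif NEWID: print LDIF that replaces nsDS5ReplicaId.
# Assumption: the server allows this attribute to be modified online.
replica_id_ldif() {
    cat <<EOF
dn: $REPLICA_DN
changetype: modify
replace: nsDS5ReplicaId
nsDS5ReplicaId: $1
EOF
}

# reinit_ldif AGREEMENT: print LDIF that starts a total update over the
# named replication agreement (the agreement name is hypothetical).
reinit_ldif() {
    cat <<EOF
dn: cn=$1,$REPLICA_DN
changetype: modify
replace: nsds5BeginReplicaRefresh
nsds5BeginReplicaRefresh: start
EOF
}

# Usage (host names are placeholders):
#   on the "less correct" master:
#     replica_id_ldif 3 | ldapmodify -x -H ldap://master2.example.com \
#         -D "cn=Directory Manager" -W
#   on the unchanged master:
#     reinit_ldif to-master2 | ldapmodify -x -H ldap://master1.example.com \
#         -D "cn=Directory Manager" -W
```

Printing the LDIF first, rather than piping it straight in, lets you read exactly what will be sent before you touch cn=config.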
That should resolve the issue.
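To see whether the two sides really hold the same entries (both before you pick the "correct" master and after the re-init), a crude check is to diff the DN lists. This is only a sketch: the host names and the PWFILE password file are placeholders, it assumes the OpenLDAP client tools (for -o ldif-wrap=no and -y), and it only finds missing entries, not missing attributes.

```shell
#!/bin/sh
# dn_list: read LDIF on stdin, print only the DN lines, sorted.
dn_list() {
    grep -i '^dn:' | sort
}

# dump_dns HOST: list every DN under the suffix on one master.
# "1.1" asks the server to return no attributes, just the DNs.
# PWFILE is a hypothetical file holding the Directory Manager password.
dump_dns() {
    ldapsearch -LLL -x -o ldif-wrap=no -H "ldap://$1" \
        -D "cn=Directory Manager" -y "$PWFILE" \
        -b "dc=someorg,dc=com" '(objectClass=*)' 1.1 | dn_list
}

# Usage (host names are placeholders):
#   dump_dns master1.example.com > m1.dns
#   dump_dns master2.example.com > m2.dns
#   diff m1.dns m2.dns
```

An empty diff means both masters have the same set of entries. Any line that shows up on one side only is an entry to look at more closely.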
Again, backups. Lots of backups. db2ldif is your friend :)
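For the export itself, something along these lines is what I have in mind. The instance name "example", the backend name "userRoot" and the output directory are assumptions to adjust for your install. As far as I recall, -r (include replication state) needs the server stopped with the offline db2ldif, so check the man page for your version first.

```shell
#!/bin/sh
# All three defaults below are assumptions; override them in the environment.
INSTANCE=${INSTANCE:-example}
BACKEND=${BACKEND:-userRoot}
OUTDIR=${OUTDIR:-/var/lib/dirsrv/slapd-$INSTANCE/ldif}

# ldif_name TAG: build the output file name for one export.
ldif_name() {
    printf '%s/%s-%s.ldif\n' "$OUTDIR" "$BACKEND" "$1"
}

# export_backend: dump one backend, with replication data, to a
# timestamped LDIF file so repeated runs never overwrite each other.
export_backend() {
    db2ldif -Z "$INSTANCE" -n "$BACKEND" -r \
        -a "$(ldif_name "$(date +%Y%m%d-%H%M%S)")"
}

# Usage: run export_backend on each master before changing anything.
```

Take one export per master and keep them apart, so you can always get back to the state each server was in before the change.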
Hope that helps,
--
Sincerely,
William