I think I have a handle on this now.
I am now aware of two issues:
1. old replication agreement to oldbox1 on newbox6
2. corrupt RUVs, giving the impression of Ghost Replicas.
For #1, I can normally delete these without trouble using an LDAP command, but running the following crashes the dirsrv service:
ldapdelete -D "cn=Directory Manager" -w $pwd -p 389 -h localhost -x "cn=newbox6.ad.dice.fm-to-oldbox1.ad.dice.fm,cn=replica,cn=dc\3Dad\2Cdc\3Dcompanyx\2Cdc\3Dfm,cn=mapping tree,cn=config"
Apr 21 11:35:48 newbox6.ad.dice.fm systemd[1]: Starting 389 Directory Server AD-DICE-FM....
Apr 21 11:35:11 newbox6.ad.dice.fm systemd[1]: dirsrv@AD-DICE-FM.service: Failed with result 'signal'.
Apr 21 11:35:11 newbox6.ad.dice.fm systemd[1]: dirsrv@AD-DICE-FM.service: Main process exited, code=killed, status=6/ABRT
Apr 21 11:35:11 newbox6.ad.dice.fm ns-slapd[2934110]: ns-slapd: ldap/servers/plugins/sync/sync_persist.c:234: sync_update_persist_op: Assertion `prim_op' failed.
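A side note on the DN in the ldapdelete command, in case it helps anyone reading along: the \3D and \2C sequences are the hex escapes for "=" and "," (RFC 4514), so the cn= component under cn=replica is just the replicated suffix. A minimal sketch to decode it, assuming only those two escapes occur (the variable names are mine, purely for illustration):

```shell
# Escaped suffix component copied from the agreement DN above.
escaped='cn=dc\3Dad\2Cdc\3Dcompanyx\2Cdc\3Dfm'

# Replace the two hex escapes with the characters they stand for.
# Assumption: no other escape sequences appear in this component.
decoded=$(printf '%s' "$escaped" | sed 's/\\3D/=/g; s/\\2C/,/g')

echo "$decoded"
```

This only makes the DN easier to read; it does not change what ldapdelete is sent.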
For #2, not being able to remove the replication agreement blocks the removal of the corrupt RUVs.
As you can see, I have tried to kick off some RUV removals, but they are failing because not all replicas are online (the "offline" ones no longer exist; see #1).
$ ipa-replica-manage list-clean-ruv -p $pass
ipa: ERROR: Cannot open log file '/var/log/ipa/cli.log': [Errno 13] Permission denied: '/var/log/ipa/cli.log'
CLEANALLRUV tasks
RID 12: Not all replicas online, retrying in 40 seconds...
RID restarted-2658134: Not all replicas online, retrying in 320 seconds...
RID restarted-2658136: Not all replicas online, retrying in 320 seconds...
No abort CLEANALLRUV tasks running
So, more questions really: why is the ldapdelete crashing the service, and how do I fix it?
Thanks,
Nick