We have three IPA servers, one of which throws an ERROR during ipa-healthcheck
for the "ReplicationCheck" test. ipa-healthcheck shows no errors when run from
the other two replicas. Looking back through the logs, this started about ten days
ago, so it is not the transient issue the output suggests:
[root@ipa1.id.example.com]# ipa-healthcheck --failures-only
[
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "2b971ca3-678e-4c26-86a0-5b352027e7e8",
    "when": "20211201180013Z",
    "duration": "0.687812",
    "kw": {
      "key": "DSREPLLE0003",
      "items": [
        "Replication",
        "Agreement"
      ],
      "msg": "The replication agreement (catoipa2.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
    }
  },
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "99436870-bc98-4ce8-84b1-c0b0806945c8",
    "when": "20211201180013Z",
    "duration": "0.687829",
    "kw": {
      "key": "DSREPLLE0003",
      "items": [
        "Replication",
        "Agreement"
      ],
      "msg": "The replication agreement (catoipa3.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
    }
  }
]
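For completeness, the same status string can also be read straight out of the agreement entries under cn=config (a sketch; assumes Directory Manager credentials and the default LDAP port, and prompts for the password):

```shell
# List all replication agreements on this server with their suffix and
# last-update status; the o=ipaca agreements should show the same
# "error (18) can't acquire replica" message as the healthcheck.
ldapsearch -o ldif-wrap=no -D "cn=Directory Manager" -W \
    -b "cn=config" \
    "(objectClass=nsds5ReplicationAgreement)" \
    nsDS5ReplicaRoot nsds5replicaLastUpdateStatus nsds5replicaUpdateInProgress
```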
389-ds error logs show a slew of these:
[30/Nov/2021:23:41:35.277399980 -0800] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=caToipa3.id.example.com" (ipa2:389): Missing data encountered. If the error persists the replica must be reinitialized.
[30/Nov/2021:23:41:38.288003253 -0800] - ERR - agmt="cn=caToipa3.id.example.com" (ipa3:389) - clcache_load_buffer - Can't locate CSN 6197e149000000060000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[30/Nov/2021:23:41:38.289713999 -0800] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=caToipa3.id.example.com" (ipa3:389): Missing data encountered. If the error persists the replica must be reinitialized.
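To see how far behind the consumers actually are, the CSN the log complains about can be compared against each server's replica update vector for the o=ipaca suffix (same Directory Manager credential assumption as above):

```shell
# Dump the RUV tombstone entry for the o=ipaca suffix; run this on each
# server and compare. If 6197e149000000060000 is older than the CSNs
# still present in the supplier's changelog, incremental update can
# never catch up and the consumer genuinely needs a re-init.
ldapsearch -o ldif-wrap=no -D "cn=Directory Manager" -W -b "o=ipaca" \
    "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))" \
    nsds50ruv
```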
That would seem to suggest that running "ipa-replica-manage re-initialize --from
$SERVER_TO_PULL_FROM" may resolve the issue, but before we try that, is there
anything else we should look at?
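For reference, the re-init we are contemplating would look roughly like the following. My understanding is that agreements under o=ipaca are managed by ipa-csreplica-manage rather than ipa-replica-manage, and the hostname below is just a placeholder for whichever server turns out to be a good source:

```shell
# Re-initialize the CA suffix (o=ipaca) on the broken replica, pulling
# a fresh copy from a known-good server. Run on the affected replica.
# "ipa2.id.example.com" is an assumed known-good source, not confirmed.
ipa-csreplica-manage re-initialize --from ipa2.id.example.com
```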
Thanks,
Scott