We have three IPA servers, one of which throws an ERROR during ipa-healthcheck's "ReplicationCheck" test. ipa-healthcheck reports no errors when run from the other two replicas. Looking back through the logs, this started about ten days ago, so it is not the transient issue the output suggests:
[root@ipa1.id.example.com]# ipa-healthcheck --failures-only
[
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "2b971ca3-678e-4c26-86a0-5b352027e7e8",
    "when": "20211201180013Z",
    "duration": "0.687812",
    "kw": {
      "key": "DSREPLLE0003",
      "items": [
        "Replication",
        "Agreement"
      ],
      "msg": "The replication agreement (catoipa2.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
    }
  },
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "99436870-bc98-4ce8-84b1-c0b0806945c8",
    "when": "20211201180013Z",
    "duration": "0.687829",
    "kw": {
      "key": "DSREPLLE0003",
      "items": [
        "Replication",
        "Agreement"
      ],
      "msg": "The replication agreement (catoipa3.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
    }
  }
]
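As an aside, since the healthcheck output is JSON, the failing agreements can be pulled out with a few lines of scripting, which is handy for monitoring. A minimal sketch (the sample data below is abbreviated from the output above, and the regex on the message wording is an assumption, not part of any ipa-healthcheck API):

```python
import json
import re

# Abbreviated sample of `ipa-healthcheck --failures-only` output, as shown above
report = json.loads(r"""
[
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "kw": {
      "key": "DSREPLLE0003",
      "msg": "The replication agreement (catoipa2.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
    }
  }
]
""")

for entry in report:
    if entry["result"] != "ERROR":
        continue
    msg = entry["kw"]["msg"]
    # The agreement name appears in parentheses in the message text
    agmt = re.search(r"agreement \((.+?)\)", msg)
    print(entry["kw"]["key"], agmt.group(1) if agmt else "?")
    # → DSREPLLE0003 catoipa2.id.example.com
```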
389-ds error logs show a slew of these:
[30/Nov/2021:23:41:35.277399980 -0800] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=caToipa3.id.example.com" (ipa2:389): Missing data encountered. If the error persists the replica must be reinitialized.
[30/Nov/2021:23:41:38.288003253 -0800] - ERR - agmt="cn=caToipa3.id.example.com" (ipa3:389) - clcache_load_buffer - Can't locate CSN 6197e149000000060000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[30/Nov/2021:23:41:38.289713999 -0800] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=caToipa3.id.example.com" (ipa3:389): Missing data encountered. If the error persists the replica must be reinitialized.
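As a side note, the CSN the changelog can no longer locate tells us when the missing change originated: a 389-ds CSN is 20 hex digits laid out as an 8-digit Unix timestamp, a 4-digit sequence number, a 4-digit replica ID, and a 4-digit sub-sequence number. A quick sketch decoding the one from the log (decode_csn is a hypothetical helper, not a 389-ds tool):

```python
from datetime import datetime, timezone

def decode_csn(csn):
    """Split a 389-ds CSN into (timestamp, seqnum, replica_id, subseq)."""
    ts = int(csn[0:8], 16)        # seconds since the Unix epoch
    seq = int(csn[8:12], 16)      # sequence number within that second
    rid = int(csn[12:16], 16)     # replica ID that originated the change
    subseq = int(csn[16:20], 16)  # sub-sequence number
    return ts, seq, rid, subseq

ts, seq, rid, subseq = decode_csn("6197e149000000060000")
print(datetime.fromtimestamp(ts, tz=timezone.utc), "rid", rid)
# → 2021-11-19 17:39:21+00:00 rid 6
```

That mid-November date lines up with when the errors first appeared, consistent with the change having since been trimmed from the changelog, which would explain why incremental updates cannot find it.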
That would seem to suggest that running "ipa-replica-manage re-initialize --from $SERVER_TO_PULL_FROM" may resolve the issue, but before we try that, is there anything else we should look at?
Thanks,
Scott