Dungan, Scott A. via FreeIPA-users wrote:
Update: after 20-30 minutes, the ca replication errors stopped,
leaving only the new IPACertRevocation error and the 404 code:
[root@ipa1.id.example.com ~]# ipa-healthcheck --failures-only
...
ra.get_certificate(): Request failed with status 404: Non-2xx response from CA REST API: 404. Certificate ID 0x13 not found (404)
[
{
"source": "ipahealthcheck.ipa.certs",
"check": "IPACertRevocation",
"result": "ERROR",
"uuid": "c447d225-c712-4a77-b174-81f6ba008128",
"when": "20211202025711Z",
"duration": "3.825151",
"kw": {
"key": "20211130180109",
"serial": 19,
"error": "Certificate operation cannot be completed: Request failed with status 404: Non-2xx response from CA REST API: 404. Certificate ID 0x13 not found (404)",
"msg": "Request for certificate serial number {serial} in request {key} failed: {error}"
}
}
]
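For readers following along, the healthcheck result above can be made easier to read by filling the "msg" template with the other "kw" fields, and by showing that serial 19 and "0x13" are the same number in decimal and hex. This is only a minimal sketch: the `render` helper is hypothetical, and the JSON is a trimmed copy of the output quoted above rather than a live run.

```python
import json

# Trimmed copy of the IPACertRevocation result quoted above (not a live run).
REPORT = """
[
  {
    "source": "ipahealthcheck.ipa.certs",
    "check": "IPACertRevocation",
    "result": "ERROR",
    "kw": {
      "key": "20211130180109",
      "serial": 19,
      "error": "Certificate ID 0x13 not found (404)",
      "msg": "Request for certificate serial number {serial} in request {key} failed: {error}"
    }
  }
]
"""


def render(result):
    """Hypothetical helper: fill the msg template with the other kw fields."""
    kw = dict(result["kw"])
    template = kw.pop("msg", "")
    try:
        return template.format(**kw)
    except (KeyError, IndexError, ValueError):
        # Some messages carry no placeholders or contain stray braces;
        # fall back to the raw text in that case.
        return template


results = json.loads(REPORT)
for r in results:
    print(r["check"], r["result"], render(r))
    serial = r["kw"].get("serial")
    if isinstance(serial, int):
        # The CA REST API reports serials in hex, healthcheck in decimal.
        print("serial", serial, "is", hex(serial))
```

In practice the JSON would come from the ipa-healthcheck output itself instead of the inline string.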
So there was some data loss.
Re-issuing a cert generally involves revoking the existing one, but since
that isn't there, this could be interesting.
What does the tracking for that cert look like?
getcert list -i 20211130180109
rob
-----Original Message-----
From: Dungan, Scott A. via FreeIPA-users <freeipa-users(a)lists.fedorahosted.org>
Sent: Wednesday, December 1, 2021 5:18 PM
To: FreeIPA users list <freeipa-users(a)lists.fedorahosted.org>
Cc: Dungan, Scott A. <sdungan(a)caltech.edu>
Subject: [Freeipa-users] Re: ipa-healthcheck: ReplicationCheck ERROR
Hi, Rob.
I think I got ahead of myself and may have made things worse:
Because servers ipa2 and ipa3 are in agreement (ldif matches) and had no healthcheck
errors, and ipa1 did, I assumed that ipa1 needed to be re-initialized. The small number of
changes that would be lost on ipa1 was/is considered negligible. To re-initialize ipa1, I
ran the following:
[root@ipa1.id.example.com ~]# ipa-csreplica-manage re-initialize --from ipa3.id.example.com
Directory Manager password:
Update in progress, 7 seconds elapsed
Update succeeded
Looks good, but when I do a healthcheck again, now I get additional errors as well as the
original:
[root@ipa1.id.example.com ~]# ipa-healthcheck --failures-only
...
ra.get_certificate(): Request failed with status 404: Non-2xx response from CA REST API: 404. Certificate ID 0x13 not found (404)
[
{
"source": "ipahealthcheck.ds.replication",
"check": "ReplicationCheck",
"result": "ERROR",
"uuid": "023f04e2-e8d8-459e-b2cb-bb516e30db07",
"when": "20211202005925Z",
"duration": "0.301820",
"kw": {
"key": "DSREPLLE0003",
"items": [
"Replication",
"Agreement"
],
"msg": "The replication agreement (catoipa2.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
}
},
{
"source": "ipahealthcheck.ds.replication",
"check": "ReplicationCheck",
"result": "ERROR",
"uuid": "6b7499ce-2138-415d-88ac-bf24f6f3a6db",
"when": "20211202005925Z",
"duration": "0.301851",
"kw": {
"key": "DSREPLLE0003",
"items": [
"Replication",
"Agreement"
],
"msg": "The replication agreement (catoipa3.id.example.com) under \"o=ipaca\" is not in synchronization.\nStatus message: error (18) can't acquire replica (incremental update transient warning. backing off, will retry update later.)"
}
},
{
"source": "ipahealthcheck.ipa.certs",
"check": "IPACertRevocation",
"result": "ERROR",
"uuid": "b02a8f85-985f-4b80-a099-fb3d415b2005",
"when": "20211202005935Z",
"duration": "2.663208",
"kw": {
"key": "20211130180109",
"serial": 19,
"error": "Certificate operation cannot be completed: Request failed with status 404: Non-2xx response from CA REST API: 404. Certificate ID 0x13 not found (404)",
"msg": "Request for certificate serial number {serial} in request {key} failed: {error}"
}
}
]
-----Original Message-----
From: Rob Crittenden <rcritten(a)redhat.com>
Sent: Wednesday, December 1, 2021 11:50 AM
To: FreeIPA users list <freeipa-users(a)lists.fedorahosted.org>
Cc: Dungan, Scott A. <sdungan(a)caltech.edu>
Subject: Re: [Freeipa-users] ipa-healthcheck: ReplicationCheck ERROR
Dungan, Scott A. via FreeIPA-users wrote:
> We have 3 ipa servers, one of which is throwing an ERROR condition
> during ipa-healthcheck for the "ReplicationCheck" test.
> Ipa-healthcheck shows no errors when run from the other two replicas.
> Looking back at the logs, it appears this started about ten days ago,
> so it is not a transient issue as the output suggests:
>
> [root@ipa1.id.example.com]# ipa-healthcheck --failures-only
>
> [
> {
> "source": "ipahealthcheck.ds.replication",
> "check": "ReplicationCheck",
> "result": "ERROR",
> "uuid": "2b971ca3-678e-4c26-86a0-5b352027e7e8",
> "when": "20211201180013Z",
> "duration": "0.687812",
> "kw": {
> "key": "DSREPLLE0003",
> "items": [
> "Replication",
> "Agreement"
> ],
> "msg": "The replication agreement (catoipa2.id.example.com)
> under \"o=ipaca\" is not in synchronization.\nStatus message: error
> (18) can't acquire replica (incremental update transient warning.
> backing off, will retry update later.)"
> }
> },
> {
> "source": "ipahealthcheck.ds.replication",
> "check": "ReplicationCheck",
> "result": "ERROR",
> "uuid": "99436870-bc98-4ce8-84b1-c0b0806945c8",
> "when": "20211201180013Z",
> "duration": "0.687829",
> "kw": {
> "key": "DSREPLLE0003",
> "items": [
> "Replication",
> "Agreement"
> ],
> "msg": "The replication agreement (catoipa3.id.example.com)
> under \"o=ipaca\" is not in synchronization.\nStatus message: error
> (18) can't acquire replica (incremental update transient warning.
> backing off, will retry update later.)"
> }
> }
> ]
>
> 389-ds error logs show a slew of these:
>
> [30/Nov/2021:23:41:35.277399980 -0800] - ERR - NSMMReplicationPlugin -
> send_updates - agmt="cn=caToipa3.id.example.com" (ipa2:389): Missing
> data encountered. If the error persists the replica must be reinitialized.
>
> [30/Nov/2021:23:41:38.288003253 -0800] - ERR -
> agmt="cn=caToipa3.id.example.com" (ipa3:389) - clcache_load_buffer -
> Can't locate CSN 6197e149000000060000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
>
> [30/Nov/2021:23:41:38.289713999 -0800] - ERR - NSMMReplicationPlugin -
> send_updates - agmt="cn=caToipa3.id.example.com" (ipa3:389): Missing
> data encountered. If the error persists the replica must be reinitialized.
>
> That would seem to suggest running a "ipa-replica-manage re-initialize
> --from $SERVER_TO_PULL_FROM" may resolve the issue, but before we try
> that, is there anything else we should look at?
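As a side note on the quoted log, the CSN that could not be located in the changelog can be picked apart to see roughly when and where the missing change originated. The sketch below assumes the commonly described 389-ds CSN layout of 8 hex digits of Unix timestamp, then 4 of sequence number, 4 of replica ID and 4 of sub-sequence number; that layout and the `decode_csn` helper are assumptions, not something stated in this thread.

```python
from datetime import datetime, timezone


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its assumed fields (hypothetical helper)."""
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits")
    timestamp = int(csn[0:8], 16)
    return {
        # Assumed to be seconds since the Unix epoch.
        "time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


# CSN taken from the clcache_load_buffer error quoted above.
info = decode_csn("6197e149000000060000")
print(info["time"].isoformat(), "replica id", info["replica_id"])
```

If the assumed layout holds, the timestamp decodes to 2021-11-19 (UTC) and the replica ID to 6, which would fit the report that the errors began about ten days before this message.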
You use ipa-csreplica-manage to manage the CA replication agreements.
But yes, it looks like you need to re-initialize some of them.
I'd suggest dumping the LDIF of the two to be re-initialized and checking whether they
contain any entries not recorded in the one you will re-initialize from, to see if there
is any potential data loss.
rob
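The comparison suggested above could be sketched roughly as follows: collect the DNs from each LDIF dump and list those present only in the server about to be re-initialized. This is a simplified illustration, not a FreeIPA tool. The `ldif_dns` and `only_in_first` helpers and the sample entries are made up, DNs are compared case-insensitively as plain strings, and only entry presence is checked, not attribute differences.

```python
import base64


def ldif_dns(text):
    """Return the set of entry DNs found in LDIF text, lower-cased."""
    # Unfold continuation lines: a line starting with a single space
    # continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    dns = set()
    for line in lines:
        lowered = line.lower()
        if lowered.startswith("dn::"):
            # "dn::" marks a base64-encoded DN.
            dns.add(base64.b64decode(line[4:].strip()).decode("utf-8").lower())
        elif lowered.startswith("dn:"):
            dns.add(line[3:].strip().lower())
    return dns


def only_in_first(first, second):
    """DNs present in the first dump but missing from the second."""
    return sorted(ldif_dns(first) - ldif_dns(second))


# Made-up sample dumps; real ones would be read from the exported files.
LDIF_TO_REINIT = (
    "dn: cn=entry1,o=ipaca\ncn: entry1\n\n"
    "dn: cn=entry2,\n o=ipaca\ncn: entry2\n"
)
LDIF_SOURCE = "dn: cn=entry1,o=ipaca\ncn: entry1\n"

for dn in only_in_first(LDIF_TO_REINIT, LDIF_SOURCE):
    print("would be lost on re-init:", dn)
```

Any DN printed here exists only on the replica to be overwritten, so it is a candidate for data loss once that replica is re-initialized from the other server.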
_______________________________________________
FreeIPA-users mailing list -- freeipa-users(a)lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-leave(a)lists.fedorahosted.org
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedoraho...
Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure