Fraser, Rob and list,
Just to let you guys know this is now resolved.

Although I didn't see any conflicts in the existing ranges, the renewal process was using an already existing serial. So, I went ahead and followed the steps in the section "Fixing range conflicts" from Fraser's blog, making the standby ranges the active ranges and assigning new standby ranges. Certificates were renewed after resubmission, but some of them with the wrong subject, to the point I had to roll back the changes (from backup)

I then followed the same process to fix the ranges, but instead of resubmitting the renewal requests the default way, I forced certmonger to use new serial numbers for the certificates by following the instructions in this post:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org/message/AIPJAKGCR4JO7JNKN4QWTJLEJHEEJASW/
This was done for the CA master only.

After all certificates were renewed for the CA master, resubmitting the certificate renewals for the replicas (without modifying the certmonger behavior) did the trick:
getcert resubmit -d /etc/pki/pki-tomcat/alias -n 'subsystemCert cert-pki-ca'
getcert resubmit -d /etc/pki/pki-tomcat/alias -n 'ocspSigningCert cert-pki-ca'
getcert resubmit -d /etc/pki/pki-tomcat/alias -n 'auditSigningCert cert-pki-ca'
getcert resubmit -f /var/lib/ipa/ra-agent.pem

Thanks so much Fraser and Rob for your help!

All the best,
Guillermo



On Fri, Jul 26, 2019 at 7:01 AM Guillermo Fuentes <gfuentes2k@gmail.com> wrote:
Awesome!
Thanks so much!
Guillermo

On Fri, Jul 26, 2019 at 4:18 AM Fraser Tweedale <ftweedal@redhat.com> wrote:
On Tue, Jul 23, 2019 at 09:34:45PM -0400, Guillermo Fuentes wrote:
> Thanks so much Fraser for your reply.
> Looking forward to your blog post!
> All the best,
> Guillermo
>
Here you go:

https://frasertweedale.github.io/blog-redhat/posts/2019-07-26-dogtag-replica-ranges.html

Cheers,
Fraser

> On Tue, Jul 23, 2019 at 8:22 PM Fraser Tweedale <ftweedal@redhat.com> wrote:
> >
> > On Tue, Jul 23, 2019 at 12:50:53AM -0400, Guillermo Fuentes via FreeIPA-users wrote:
> > > Hi list,
> > >
> > > I'm having an issue where the CA subsystem certificates are failing to renew.
> > >
> > > *** Environment:
> > > 4 FreeIPA replica servers with CA.
> > >
> > > Currently CentOS 7 up-to-date.
> > > Initially setup as FreeIPA 3 on CentOS 6 upgraded to CentOS 7 and kept
> > > up-to-date with the latest stable release available:
> > > ipa-server-4.6.4-10.el7.centos.3.x86_64
> > > 389-ds-base-1.3.8.4-23.el7_6.x86_64
> > > pki-ca-10.5.9-13.el7_6.noarch
> > > pki-server-10.5.9-13.el7_6.noarch
> > >
> > > This past June third-party wildcard certificate was installed for both
> > > the HTTPS and LDAP servers using the ipa-server-certinstall command.
> > >
> > > Trying to be proactive as the following CA subsystem certificates will
> > > expire on August 12, 2019:
> > > subject: CN=CA Audit,O=EXAMPLE.COM
> > > subject: CN=OCSP Subsystem,O=EXAMPLE.COM
> > > subject: CN=CA Subsystem,O=EXAMPLE.COM
> > > subject: CN=IPA RA,O=EXAMPLE.COM
> > >
> > >
> > > *** Issue:
> > > Getting the following error when listing the certificate info for each
> > > of them in the CA Renewal and CRL Master:
> > > status: MONITORING
> > > ca-error: Server at
> > > "https://ipa1.example.com:8443/ca/agent/ca/profileProcess" replied: 1:
> > > Server Internal Error
> > > stuck: no
> > > CA: dogtag-ipa-ca-renew-agent
> > > issuer: CN=Certificate Authority,O=EXAMPLE.COM
> > > expires: 2019-08-12 ...
> > > pre-save command: /usr/libexec/ipa/certmonger/stop_pkicad
> > > post-save command: /usr/libexec/ipa/certmonger/renew_ca_cert
> > > "auditSigningCert cert-pki-ca"
> > > track: yes
> > > auto-renew: yes
> > >
> > > Getting the following error in the other CA replicas for the same
> > > certificates (I'm assuming fixing it in the Master should take care of
> > > the replicas):
> > > ca-error: Invalid cookie: u''
> > >
> > >
> > > Looking at the logs in the Master, the renewal seems to fail with an
> > > LDAP error code 68 (Entry Already Exists) when adding serial 268304406
> > > (when resubmitting it). Doesn't that serial suppose to exist as it's
> > > being renewed?
> > >
> > > These are some relevant logs after resubmitting:
> > > - certmonger journal shows:
> > > ...
> > > Jul 22 20:30:19 ipa1.example.com dogtag-ipa-renew-agent-submit[10116]:
> > > GET https://ipa1.example.com:8443/ca/agent/ca/profileProcess?requestId=9980043&xml=true&op=approve&name=CN%3DCA+Audit%2CO%3DEXAMPLE.COM&notBefore=2019-07-22+20%3A31%3A12&notAfter=2021-07-11+20%3A31%3A12&keyUsageCritical=true&keyUsageDigitalSignature=true&keyUsageNonRepudiation=true&keyUsageKeyEncipherment=false&keyUsageDataEncipherment=false&keyUsageKeyAgreement=false&keyUsageKeyCertSign=false&keyUsageCrlSign=false&keyUsageEncipherOnly=false&keyUsageDecipherOnly=false&signingAlg=SHA256withRSA
> > > Jul 22 20:30:19 ipa1.example.com
> > > dogtag-ipa-ca-renew-agent-submit[10097]: dogtag-ipa-renew-agent
> > > returned 2
> > > Jul 22 20:30:19 ipa1.example.com certmonger[1288]: 2019-07-22 20:30:19
> > > [1288] Server at
> > > "https://ipa1.example.com:8443/ca/agent/ca/profileProcess" replied: 1:
> > > Server Internal Error
> > >
> > > - /var/log/pki/pki-tomcat/ca/debug
> > > ...
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: Repository: checkRange
> > > mLastSerialNo=268304406
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: Repository:
> > > getNextSerialNumber: returning retSerial 268304406
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: CAService:
> > > issueX509Cert: setting issuerDN using exact CA signing cert subjectDN
> > > encoding
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: About to ca.sign cert.
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: sign cert get algorithm
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: sign cert encoding cert
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: sign cert encoding algorithm
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: CA cert signing: signing cert
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: Getting algorithm
> > > context for SHA256withRSA RSASignatureWithSHA256Digest
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: Signing Certificate
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: storeX509Cert 268304406
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: In storeX509Cert
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: In
> > > LdapBoundConnFactory::getConn()
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: masterConn is connected: true
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: getConn: conn is connected true
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: getConn: mNumConns now 5
> > > [22/Jul/2019:20:30:19][http-bio-8443-exec-16]: returnConn: mNumConns now 6
> > > LDAP operation failure - cn=268304406,ou=certificateRepository, ou=ca,
> > > o=ipaca: error result
> > >         at com.netscape.cmscore.dbs.DBSSession.add(DBSSession.java:125)
> > >         at com.netscape.cmscore.dbs.CertificateRepository.addCertificateRecord(CertificateRepository.java:737)
> > >         at com.netscape.ca.CAService.storeX509Cert(CAService.java:893)
> > >         at com.netscape.ca.CAService.storeX509Cert(CAService.java:579)
> > >         at com.netscape.ca.CAService.issueX509Cert(CAService.java:564)
> > >         at com.netscape.cms.profile.common.CAEnrollProfile.execute(CAEnrollProfile.java:203)
> > >         at com.netscape.cms.servlet.cert.RequestProcessor.approveRequest(RequestProcessor.java:371)
> > >         at com.netscape.cms.servlet.cert.RequestProcessor.processRequest(RequestProcessor.java:173)
> > > ...
> > >
> > > - /var/log/dirsrv/slapd-EXAMPLE-COM/access
> > > ...
> > > [22/Jul/2019:20:30:18.703610909 +0000] conn=15 op=44 SRCH
> > > base="cn=Certificate Manager Agents,ou=groups,o=ipaca" scope=0
> > > filter="(uniqueMember=uid=ipara,ou=people,o=ipaca)" attrs="cn"
> > > [22/Jul/2019:20:30:18.703958648 +0000] conn=15 op=44 RESULT err=0
> > > tag=101 nentries=1 etime=0.0002262912
> > > [22/Jul/2019:20:30:18.706322783 +0000] conn=14 op=2439 SRCH
> > > base="cn=9980043,ou=ca,ou=requests,o=ipaca" scope=0
> > > filter="(objectClass=*)" attrs=ALL
> > > [22/Jul/2019:20:30:18.706916435 +0000] conn=14 op=2439 RESULT err=0
> > > tag=101 nentries=1 etime=0.0020648732
> > > [22/Jul/2019:20:30:18.758736256 +0000] conn=14 op=2440 SRCH
> > > base="cn=9980043,ou=ca,ou=requests,o=ipaca" scope=0
> > > filter="(objectClass=*)" attrs=ALL
> > > [22/Jul/2019:20:30:18.759537249 +0000] conn=14 op=2440 RESULT err=0
> > > tag=101 nentries=1 etime=0.0051791092
> > > [22/Jul/2019:20:30:18.984013755 +0000] conn=15 op=45 SRCH
> > > base="ou=People,o=ipaca" scope=2
> > > filter="(description=2;268304403;CN=Certificate
> > > Authority,O=EXAMPLE.COM;CN=IPA RA,O=EXAMPLE.COM)" attrs=ALL
> > > [22/Jul/2019:20:30:18.984630800 +0000] conn=15 op=45 RESULT err=0
> > > tag=101 nentries=1 etime=0.1719630345
> > > [22/Jul/2019:20:30:18.988077577 +0000] conn=15 op=46 SRCH
> > > base="ou=Groups,o=ipaca" scope=2
> > > filter="(&(objectClass=groupofuniquenames)(cn=*))" attrs=ALL
> > > [22/Jul/2019:20:30:18.990404518 +0000] conn=15 op=46 RESULT err=0
> > > tag=101 nentries=14 etime=0.0005432147
> > > [22/Jul/2019:20:30:18.994230483 +0000] conn=15 op=47 SRCH
> > > base="uid=ipara,ou=People,o=ipaca" scope=0 filter="(objectClass=*)"
> > > attrs=ALL
> > > [22/Jul/2019:20:30:18.994648102 +0000] conn=15 op=47 RESULT err=0
> > > tag=101 nentries=1 etime=0.0004001393
> > > [22/Jul/2019:20:30:18.996091715 +0000] conn=15 op=48 SRCH
> > > base="cn=Certificate Manager Agents,ou=groups,o=ipaca" scope=0
> > > filter="(uniqueMember=uid=ipara,ou=people,o=ipaca)" attrs="cn"
> > > [22/Jul/2019:20:30:18.996470330 +0000] conn=15 op=48 RESULT err=0
> > > tag=101 nentries=1 etime=0.0001510753
> > > [22/Jul/2019:20:30:19.023676890 +0000] conn=14 op=2441 ADD
> > > dn="cn=268304406,ou=certificateRepository,ou=ca,o=ipaca"
> > > [22/Jul/2019:20:30:19.024455023 +0000] conn=14 op=2441 RESULT err=68
> > > tag=105 nentries=0 etime=0.1736683576
> > > [22/Jul/2019:20:30:19.037339256 +0000] conn=28692 op=5 UNBIND
> > > [22/Jul/2019:20:30:19.037387249 +0000] conn=28692 op=5 fd=162 closed - U1
> > > ...
> > >
> > >
> > > Can someone shed some light on what the problem could be and what to
> > > try to fix it?
> > >
> > > Any help is really appreciated.
> > >
> > > Guillermo
> > >
> > Hi Guillermo,
> >
> > It looks like it could be CA replica range conflicts resulting in
> > LDAP problems.  i.e. two replicas trying to assign request IDs from
> > the same range; an entry (probably created by a different replica)
> > already exists, and that is why the add fails.
> >
> > We have been seeing more and more issues lately that seem to be
> > related to replica ranges.  I don't yet have a theory about why, but
> > it could be related to replication problems (even transient ones).
> > I will write a blog post in the next day or so explaining how
> > replica range assignments work.  That should give you enough info to
> > continue analysis of the problem.  Please prod me if I haven't
> > followed up before end of week.
> >
> > Thanks,
> > Fraser