> Would you mind having a look through the DS error and access logs on
> the affected system, to see if there are any clues about why the VLV
> index became inconsistent?
It seems there are no records of VLV-related errors in the DS logs.
The only messages in the error log that contain the term 'vlv' are either
backup-related (dblayer_copyfile, dblayer_copy_directory) or about
building the VLV index (ldbm_back_ldbm2index), which I initiated myself.
I didn't find any clues in the access logs either; the records look like
regular LDAP queries to me:
conn=130 op=11 SRCH base="ou=ca,ou=requests,o=ipaca" scope=1
filter="(requestState=*)" attrs=ALL
conn=130 op=11 SORT requestId
conn=130 op=11 VLV 5:0:0819990000 2:2 (0)
conn=130 op=11 RESULT err=0 tag=101 nentries=2 etime=0.0001728512
My only guess is that the VLV index was damaged some time ago when BDB ran
out of file descriptors and panicked (caused by the default value of
nsslapd-maxdescriptors=1024 in cn=config being too low for our setup):
ERR - libdb - BDB2520 /var/lib/dirsrv/slapd-LOCAL-DOMAIN/db/log.0000000242: log file
unreadable: Too many open files
ERR - libdb - BDB0061 PANIC: Too many open files
ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
ERR - idl_new_fetch - idl_new.c (1); server stopping as database recovery needed.
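
For what it's worth, descriptor exhaustion like this could probably be caught
before the panic. Below is a rough sketch, assuming a Linux host where the
server's /proc entries are readable. The function names are my own, and the
limit it reports is whatever the kernel shows for the process, which may not
match nsslapd-maxdescriptors exactly:

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process (hypothetical helper).

    Needs permission to read /proc/<pid>/fd, so typically run as root
    or as the user that owns the process.
    """
    with open(f"/proc/{pid}/limits") as fh:
        soft = parse_soft_nofile(fh.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft
```

Calling fd_usage() with the PID of ns-slapd and alerting when the first value
gets close to the second would have warned us well before BDB gave up.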
I restored the domain database from another replica but didn't do anything
about the CA database, which was probably a mistake.
It is a little worrying that the logs show no sign of the inconsistent VLV
index, because that means we cannot detect and fix the issue before it
becomes a problem, as happened in our case.
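
The closest thing to monitoring I can think of is watching for the BDB panic
itself, since that is the only trace the logs seem to contain. This does not
detect a broken VLV index directly; it only flags the event I suspect caused
it. A minimal sketch, where the patterns are taken from the messages quoted
above and the function name is made up for illustration:

```python
import re

# Patterns based on the error log messages seen during the incident.
TROUBLE_PATTERNS = [
    re.compile(r"BDB\d+ PANIC"),
    re.compile(r"Too many open files"),
    re.compile(r"database recovery needed"),
]


def find_bdb_trouble(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in TROUBLE_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits
```

It can be fed an open file object for the instance's error log (the path
depends on the installation) and hooked up to whatever alerting is in place.
After any hit, rebuilding the VLV indexes on every backend, including the CA
one, would seem like a sensible precaution.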
That's a great post, thank you!
Regards,
Boris