I ran verify-db.pl on both servers and got the same output [1].  Both servers were running while I did this, so I'm unsure whether that can cause false negatives; the message printed only mentions false positives.

I've had nsslapd-errorlog-level set to 8192 since I saw the warnings in my first email, and those warnings haven't shown up since.  I also have not done a reinit since then, as my scripts that now look for inconsistencies in the directory show no problems.  Our directory is rather static, with an occasional addition of a new user.  The users missing from ldap02 that prompted the reinit were not ones I had moved, unlike the accounts mentioned in my original logs.  Those users were created by an automated system and are always left in ou=People.
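
The kind of consistency check mentioned above can be sketched roughly as follows. This is not the actual scripts, only a minimal illustration that assumes each server's entries have already been dumped to LDIF text by some other means. The function names (read_dns, normalize_dn, missing_entries) are hypothetical, and the DN normalization is deliberately crude.

```python
import base64


def normalize_dn(dn):
    """Crude normalization (an assumption, not real DN parsing):
    lowercase the DN and strip spaces around the commas."""
    return ",".join(part.strip() for part in dn.split(",")).lower()


def read_dns(ldif_text):
    """Return the set of normalized DNs found in a block of LDIF text."""
    # LDIF folds long lines: a continuation line starts with a single space.
    unfolded = ldif_text.replace("\r\n", "\n").replace("\n ", "")
    dns = set()
    for line in unfolded.split("\n"):
        if line.startswith("dn:: "):
            # Base64-encoded DN (used when the DN contains non-ASCII characters).
            dn = base64.b64decode(line[5:]).decode("utf-8")
        elif line.startswith("dn: "):
            dn = line[4:]
        else:
            continue
        dns.add(normalize_dn(dn))
    return dns


def missing_entries(supplier_ldif, consumer_ldif):
    """DNs present in the supplier dump but absent from the consumer dump."""
    return sorted(read_dns(supplier_ldif) - read_dns(consumer_ldif))
```

Calling missing_entries() with the dump from ldap01 as the first argument and the dump from ldap02 as the second would list entries that never reached ldap02; swapping the arguments checks the other direction.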

If newer versions available to EL6 appear to have solved this for others then I'll plan on upgrading these systems in hopes of removing this issue with MMR.

Thanks,
- Trey

[1]:
# /usr/lib64/dirsrv/slapd-ldap01/verify-db.pl
*****************************************************************
verify-db: This tool should only be run if recovery start fails
and the server is down.  If you run this tool while the server is
running, you may get false reports of corrupted files or other
false errors.
*****************************************************************
Verify log files in /var/lib/dirsrv/slapd-ldap01/db ... Good
Verify db files ... Good

# /usr/lib64/dirsrv/slapd-ldap02/verify-db.pl
*****************************************************************
verify-db: This tool should only be run if recovery start fails
and the server is down.  If you run this tool while the server is
running, you may get false reports of corrupted files or other
false errors.
*****************************************************************
Verify log files in /var/lib/dirsrv/slapd-ldap02/db ... Good
Verify db files ... Good

On Mon, Aug 10, 2015 at 1:56 PM, Mark Reynolds <mareynol@redhat.com> wrote:


On 08/10/2015 02:51 PM, German Parente wrote:
hi Trey,

I'm not sure which bug it is. Perhaps someone else here can give details?
It could date back to when the entryrdn index was created, but that was a very old version.

For instance:

https://bugzilla.redhat.com/show_bug.cgi?id=729369

Honestly, I cannot say when the entryrdn index got corrupted.
You can try running verify-db.pl to see if it reports any problems. If it does say there are issues, you could try exporting (db2ldif -r) and importing (ldif2db) on the master to reindex the entire database, and then try reiniting the other replica.

Mark

But what I can say is that our customers on recent versions are no longer hitting this issue.

Thanks and regards,

German




----- Original Message -----
From: "Trey Dockendorf" <treydock@gmail.com>
To: "General discussion list for the 389 Directory server project." <389-users@lists.fedoraproject.org>
Sent: Monday, August 10, 2015 6:55:41 PM
Subject: Re: [389-users] Replication reinit skipping entries



German,

Thanks for the response. Do you recall which version fixed this issue, or
have a reference to a bug ticket? The latest EL6 RPM changelog doesn't show
anything obviously related to this issue. I'm on 1.2.11.15-32.el6_5 and it
appears the latest available is 1.2.11.15-60.el6. The 1.2.2 package is from
EPEL; I'm not sure why it was installed, but it appears to only install a
LICENSE file.

Thanks
- Trey

Hi again Trey,

Sorry, I hadn't seen your logs. But the errors are identical to what I am
describing.

Version 389-ds-1.2.2-1.el6.noarch is rather old and I would advise, as a
first action, updating to the current version of 389-ds-base.

Regards,

German.

----- Original Message -----
From: "German Parente" <gparente@redhat.com>
To: "General discussion list for the 389 Directory server project." <389-users@lists.fedoraproject.org>
Sent: Friday, August 7, 2015 8:22:27 PM
Subject: Re: [389-users] Replication reinit skipping entries

Hi Trey,

I have seen this issue twice in customer cases. There was a bug some time
ago that caused an entry not to be sent from the supplier side during
on-line re-init (because of corruption in entryrdn), and then, on the
consumer side, all the children of this entry were skipped.
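
The cascade described above can be modelled with a toy example. This is only a sketch under stated assumptions, not the server's import code: it assumes entries arrive parent-first, that an entry is accepted only if its parent was already accepted, and it uses a naive parent_dn() helper that ignores escaped commas. Both function names are hypothetical.

```python
def parent_dn(dn):
    """Return the parent DN, or None for a single-RDN DN.
    Naive: splits on the first comma and ignores escaped commas."""
    _, sep, tail = dn.partition(",")
    return tail if sep else None


def simulate_import(dns, suffix):
    """Walk the DNs in the order received and skip any entry whose
    parent has not been imported. Returns (imported_set, skipped_list)."""
    suffix = suffix.lower()
    imported, skipped = set(), []
    for dn in dns:
        key = dn.lower()
        if key == suffix or parent_dn(key) in imported:
            imported.add(key)
        else:
            # The parent never arrived (or was itself skipped), so this
            # entry is dropped too; that is how one missing entry cascades.
            skipped.append(dn)
    return imported, skipped
```

If the stream contains the suffix and the children of some ou but not the ou entry itself, every child of that ou ends up in the skipped list, along with anything beneath those children.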

This is fixed in recent versions of 389-ds-base. All our customers having
this issue have worked around it by:

- updating to the current version so that the issue will not happen any more.
- fixing the db by: export -r + off-line re-import on all the replicas.

Are the errors you mention of this sort?

[28/May/2015:10:38:12 -0300] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[28/May/2015:10:38:16 -0300] - import xxxx: WARNING: Skipping entry "uid=13364081204,dc=somedc" which has no parent, ending at line 0 of file "(bulk import)"
[28/May/2015:10:38:16 -0300] - import xxxx: WARNING: bad entry: ID 7127
[28/May/2015:10:38:16 -0300] - import xxxx: WARNING: Skipping entry "uid=05722535249,dc=somedc" which has no parent, ending at line 0 of file "(bulk import)"
[28/May/2015:10:38:17 -0300] - import xxxx: WARNING: bad entry: ID 7242

Regards,

German.



----- Original Message -----
From: "Trey Dockendorf" <treydock@gmail.com>
To: "General discussion list for the 389 Directory server project." <389-users@lists.fedoraproject.org>
Sent: Friday, August 7, 2015 7:51:05 PM
Subject: [389-users] Replication reinit skipping entries

I recently discovered my two 389DS servers in master-master replication
had some inconsistencies. Initially the only difference was that 3 users
added to ldap01 did not exist on ldap02. I re-initialized ldap02 from
ldap01 and am now seeing that 3 defined groups are being skipped [1].

I read in another thread that someone else saw this when they moved an
LDAP record from one location to another in the directory. I believe that
may be what happened here, as I know the SLURM user and group both used to
exist in a different OU. I moved them to the "Service" OUs some months ago.
What's odd is that this move did not cause the user records to be skipped,
just the group records. The issue in the thread I saw about something
similar appears to have been fixed in the 1.2.10 series. Is this some
different bug?

As a workaround and test of a fix, I deleted the 'backupuser' LDAP group
from ldap01 and added it back via an LDIF. I then reinitialized ldap02
from ldap01 and that group now exists on ldap02, but I still get a
warning [2]. The nsuniqueid in the warning is not the nsuniqueid of the
newly created backupuser entry. Is there anything to be concerned about
with this warning?

These are the 389-ds packages installed on both ldap01 and ldap02:

389-admin-1.1.35-1.el6.x86_64
389-admin-console-1.1.8-1.el6.noarch
389-admin-console-doc-1.1.8-1.el6.noarch
389-adminutil-1.1.19-1.el6.x86_64
389-adminutil-devel-1.1.19-1.el6.x86_64
389-console-1.1.7-1.el6.noarch
389-ds-1.2.2-1.el6.noarch
389-ds-base-1.2.11.15-32.el6_5.x86_64
389-ds-base-devel-1.2.11.15-32.el6_5.x86_64
389-ds-base-libs-1.2.11.15-32.el6_5.x86_64
389-ds-console-1.2.6-1.el6.noarch
389-ds-console-doc-1.2.6-1.el6.noarch
389-dsgw-1.1.11-1.el6.x86_64

Let me know what other information may be useful and if this is something
I need to submit as a bug report.

Thanks,
- Trey

[1]:

[07/Aug/2015:12:35:20 -0500] NSMMReplicationPlugin - conn=353332 op=3 Relinquishing consumer connection extension
[07/Aug/2015:12:35:20 -0500] - import userRoot: WARNING: Skipping entry "cn=slurm,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:35:20 -0500] - import userRoot: WARNING: Skipping entry "cn=rsv,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:35:20 -0500] - import userRoot: WARNING: bad entry: ID 20
[07/Aug/2015:12:35:20 -0500] - import userRoot: WARNING: bad entry: ID 22
[07/Aug/2015:12:35:21 -0500] - import userRoot: WARNING: Skipping entry "cn=backupuser,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:35:21 -0500] - import userRoot: WARNING: bad entry: ID 4102
[07/Aug/2015:12:35:24 -0500] NSMMReplicationPlugin - conn=353332 op=4242 Acquired consumer connection extension
[07/Aug/2015:12:35:24 -0500] - import userRoot: Workers finished; cleaning up...
[07/Aug/2015:12:35:24 -0500] - import userRoot: Workers cleaned up.
[07/Aug/2015:12:35:24 -0500] - import userRoot: Indexing complete. Post-processing...
[07/Aug/2015:12:35:24 -0500] - import userRoot: Generating numSubordinates complete.
[07/Aug/2015:12:35:24 -0500] - import userRoot: Flushing caches...
[07/Aug/2015:12:35:24 -0500] - import userRoot: Closing files...
[07/Aug/2015:12:35:24 -0500] - import userRoot: Import complete. Processed 4238 entries (3 were skipped) in 4 seconds. (1059.50 entries/sec)

[2]:
[07/Aug/2015:12:38:48 -0500] NSMMReplicationPlugin - conn=353340 op=3 Relinquishing consumer connection extension
[07/Aug/2015:12:38:49 -0500] - import userRoot: WARNING: Skipping entry "cn=slurm,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:38:49 -0500] - import userRoot: WARNING: Skipping entry "cn=rsv,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:38:49 -0500] - import userRoot: WARNING: bad entry: ID 20
[07/Aug/2015:12:38:49 -0500] - import userRoot: WARNING: bad entry: ID 22
[07/Aug/2015:12:38:50 -0500] - import userRoot: WARNING: Skipping entry "nsuniqueid=15ed1e81-b6a411e3-9084dfca-5696e563,cn=backupuser,ou=Service Groups,dc=brazos,dc=tamu,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[07/Aug/2015:12:38:50 -0500] - import userRoot: WARNING: bad entry: ID 4102
[07/Aug/2015:12:38:52 -0500] NSMMReplicationPlugin - conn=353340 op=4243 Acquired consumer connection extension
[07/Aug/2015:12:38:52 -0500] - import userRoot: Workers finished; cleaning up...
[07/Aug/2015:12:38:52 -0500] - import userRoot: Workers cleaned up.
[07/Aug/2015:12:38:52 -0500] - import userRoot: Indexing complete. Post-processing...
[07/Aug/2015:12:38:52 -0500] - import userRoot: Generating numSubordinates complete.
[07/Aug/2015:12:38:52 -0500] - import userRoot: Flushing caches...
[07/Aug/2015:12:38:52 -0500] - import userRoot: Closing files...
[07/Aug/2015:12:38:53 -0500] - import userRoot: Import complete. Processed 4239 entries (3 were skipped) in 5 seconds. (847.80 entries/sec)
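
For anyone wanting to watch for these warnings after each reinit, the log excerpts above can be summarized with a short script. This is a minimal sketch that assumes the message wording shown in the excerpts ("Skipping entry ... which has no parent" and "Processed N entries (M were skipped)"); the function name summarize_import and the two pattern names are hypothetical.

```python
import re

# Patterns based only on the wording seen in the excerpts above.
SKIP_RE = re.compile(r'WARNING: Skipping entry\s+"([^"]+)"\s+which has\s+no\s+parent')
SUMMARY_RE = re.compile(r'Processed\s+(\d+) entries \((\d+) were skipped\)')


def summarize_import(log_text):
    """Return (skipped_dns, totals), where totals is (processed, skipped)
    taken from the 'Import complete' line, or None if no such line exists."""
    # Collapse all whitespace runs, including newlines, so that log lines
    # re-wrapped by a mail client still match.
    flat = re.sub(r"\s+", " ", log_text)
    skipped_dns = SKIP_RE.findall(flat)
    match = SUMMARY_RE.search(flat)
    totals = (int(match.group(1)), int(match.group(2))) if match else None
    return skipped_dns, totals
```

A non-empty list of skipped DNs, or a non-zero skipped count in the totals, would be the signal that a reinit dropped entries.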

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users