Hello Noriko,
----- Mail original -----
De: "Noriko Hosoi" <nhosoi(a)redhat.com>
À: "General discussion list for the 389 Directory server project."
<389-users(a)lists.fedoraproject.org>
Envoyé: Mercredi 5 Novembre 2014 21:26:47
Objet: Re: [389-users] 389-Directory/1.3.1.6 cannot setup replica
Hello Andrey,
On 11/05/2014 11:53 AM, Ivanov Andrey (M.) wrote:
<blockquote>
Hi,
i'm having the same problem. I'm in the process of migration from our 389DS
v1.2.10.25/CentOS5 to 389DS on CentOS 7. Everything is working fine on standalone servers
but the replication (especially online initialization). It stucks _each time_ during the
online initialization with SSL/TLS (and sometimes without SSL/TLS). And exactly with the
same error messages as you describe.
The network problems in my case are excluded - i used both virtual machines on the same
ESXi in the same network and/or physical servers, the results are the same.
I've tried compiling all the branches latest available (tags 1.3.2.24, 1.3.1.22,
1.3.3.5). In all the cases the result was the same. The server pushing the updates just
gets stuck at some random number of entries sent to consumer (we have ~30000 entries, it
gets stuck at random somewhere from 1200 to 25000 entries, the entries stuck have nothing
particular in size, it's completely random).
When you tried the versions "(tags 1.3.2.24, 1.3.1.22, 1.3.3.5)", it does not
report any extra errors in the error log? E.g.,
<blockquote>
Incoming BER Element was @@@ bytes, max allowable is ### bytes. Change the
nsslapd-maxbersize attribute in cn=config to increase.
</blockquote>
</blockquote>
No - as i said, it was clearly not a problem of nsslapd-maxbersize or
nsslapd-maxsasliosize or nsslapd-sasl-max-buffer-size, i'v checked it. Anyway, as i
said, the on-line initialization has more chances to go to the end (sometimes it goes to
the end with the same set of data) if i disable SSL, increase the number of CPUs in VM and
increase the nsslapd-db-checkpoint-interval, each time it's stuck in a different
place, so it's not entry-related...
<blockquote>
<blockquote>
</blockquote>
Next time it happens, could it be possible to get the stacktraces from the hung server?
<blockquote>
http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-hangs
</blockquote>
</blockquote>
Ok, i'll do that tomorrow (for 1.3.2.24 since i'm testing mainly this one). It
happens each time during a full on-line initialization, so it won't be difficult
difficult to reproduce :) It does not really hang, only the online initialization hangs in
fact (with the logs similar to the original mail)...
<blockquote>
<blockquote>
</blockquote>
<blockquote>
1.2.10.24 compiled with compat-db-4.7 on CentOS 7 has the least of these problems (and the
initial replication is 10 times faster - it takes 8 seconds instead of 80 for 1.3.x!).
I've been using 1.2.10.24 on CentOS5 compiled with mozilla ldap labraries and
1.2.10.23 obn CentOS 7 compiled with opendlap librarires. The first one had no problems at
all to push the initial replication, the second one had intermittent problems, but much
less than v1.3.x
I've noticed that this problem is getting worse (or simply appears) if :
* the replica is be of type 3 (multi-master), with replication agreements in both
directions
* our schema has several additional attribuites, it may be also important
</blockquote>
We improved the schema replication at some time, which should be in 1.3.2.24 and 1.3.3.5.
https://fedorahosted.org/389/ticket/47721
</blockquote>
The schema ldif is small and it is deployed before the initial on-line init, it is not
replicated, so i don't think it's a problem.
<blockquote>
<blockquote>
* if the virtual machine has only one CPU. Adding a second CPU increases the number of
transferred entries before the initialization gets stuck. So it may me some
thread/transaction contention or deadlock.
* if the replication agreement uses SSL(port 636) or TLS(port389). Using port 389 with
LDAP protocol instead of TLS/SSL increases the number of transferred entries before the
initialization gets stuck. Sometimes the initialization even ends successfully in this
case.
* decreasing nsslapd-db-checkpoint-interval (say, to 5 seconds) also gets the problem
worse
</blockquote>
This indicates the issue is in the BDB?
</blockquote>
Don't know. My hypotheses are :
* using plugin transactions compared to 1.2.10.x
* bdb version? but even with compat-db-47 and 1.2.10 the problem still happens on CentOS7,
though much less frequently. It never happens with 1.2.10 with rpm bdb on CentOS5.
* change from mozilla ldap libraries to openldap libraries?
seems to be some sort of thread or transaction contention that is reduced when i add
CPUs/increase checkpoint interval. It really looks like the master server just does not
send entries any more at some moment... SSL/TLS slows the things down so less entries are
sent before everything gets stuck...
I'll get back with more information (stacktraces) tomorrrow.
Thanks!
<blockquote>
Thanks,
--noriko
<blockquote>
When the on-line intialization is finished (if it finishes), there are no problems. I
think it is related to the volume of data transferred, so small incremental updates do not
generate any problem....
If necessary, i will make any debugs/tests - it is a critical element of our
infrastructure, so i'd like this problem to be resolved...
Regards,
Andrey IVANOV
----- Mail original -----
<blockquote>
De: "陳含 林" <laneovcc(a)gmail.com>
À: 389-users(a)lists.fedoraproject.org
Envoyé: Mercredi 5 Novembre 2014 18:01:37
Objet: [389-users] 389-Directory/1.3.1.6 cannot setup replica
hello all,
I have setup a IDM/freeipa master using CentOS7 , and import about 5000 hosts.
then i try to setup a IDM/freeipa replication server by using ipa-replica-install.
It seems the total update on replication server hangs after about 1000+ entries imported.
I try to trigger a total update by setting nsds5beginreplicarefresh, but the result was
the same.
Any one help me ? Thanks!
idm1 is the master, idm2 is the replication server.
master server logs:
[06/Nov/2014:00:21:48 +0800] - 389-Directory/ 1.3.1.6 B2014.219.1825 starting up
[06/Nov/2014:00:21:48 +0800] schema-compat-plugin - warning: no entries set up under
cn=computers, cn=compat,dc=idc
[06/Nov/2014:00:21:51 +0800] - Skipping CoS Definition cn=Password
Policy,cn=accounts,dc=idc--no CoS Templates found, which should be added before the CoS
Definition.
[06/Nov/2014:00:21:51 +0800] - Skipping CoS Definition cn=Password
Policy,cn=accounts,dc=idc--no CoS Templates found, which should be added before the CoS
Definition.
[06/Nov/2014:00:21:51 +0800] - slapd started. Listening on All Interfaces port 389 for
LDAP requests
[06/Nov/2014:00:21:51 +0800] - Listening on All Interfaces port 636 for LDAPS requests
[06/Nov/2014:00:21:51 +0800] - Listening on /var/run/slapd-IDC.socket for LDAPI requests
[06/Nov/2014:00:21:51 +0800] - Entry "uid=admin,ou=people,o=ipaca" -- attribute
"krbExtraData" not allowed
[06/Nov/2014:00:40:26 +0800] NSMMReplicationPlugin -
agmt="cn=meToidm2.ra.cn.idc" (idm2:389): The remote replica has a different
database generation ID than the local database. You may have to reinitialize the remote
replica, or the local replica.
[06/Nov/2014:00:40:26 +0800] NSMMReplicationPlugin - Beginning total update of replica
"agmt="cn=meToidm2.ra.cn.idc" (idm2:389)".
replication server logs:
[06/Nov/2014:00:40:18 +0800] - 389-Directory/ 1.3.1.6 B2014.219.1825 starting up
[06/Nov/2014:00:40:18 +0800] ipalockout_get_global_config - [file ipa_lockout.c, line
185]: Failed to get default realm (-1765328160)
[06/Nov/2014:00:40:18 +0800] ipaenrollment_start - [file ipa_enrollment.c, line 393]:
Failed to get default realm?!
[06/Nov/2014:00:40:18 +0800] - slapd started. Listening on All Interfaces port 389 for
LDAP requests
[06/Nov/2014:00:40:18 +0800] - Listening on All Interfaces port 636 for LDAPS requests
[06/Nov/2014:00:40:18 +0800] - Listening on /var/run/slapd-IDC.socket for LDAPI requests
[06/Nov/2014:00:40:19 +0800] NSMMReplicationPlugin -
agmt="cn=meToidm1.ra.cn.idc" (idm1:389): The remote replica has a different
database generation ID than the local database. You may have to reinitialize the remote
replica, or the local replica.
[06/Nov/2014:00:40:20 +0800] NSMMReplicationPlugin - multimaster_be_state_change: replica
dc=idc is going offline; disabling replication
[06/Nov/2014:00:40:20 +0800] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the database
[06/Nov/2014:00:40:40 +0800] - import userRoot: Processed 1581 entries -- average rate
79.0/sec, recent rate 79.0/sec, hit ratio 0%
[06/Nov/2014:00:41:00 +0800] - import userRoot: Processed 1581 entries -- average rate
39.5/sec, recent rate 39.5/sec, hit ratio 0%
[06/Nov/2014:00:41:20 +0800] - import userRoot: Processed 1581 entries -- average rate
26.4/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:41:41 +0800] - import userRoot: Processed 1581 entries -- average rate
19.8/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:42:01 +0800] - import userRoot: Processed 1581 entries -- average rate
15.8/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:42:21 +0800] - import userRoot: Processed 1581 entries -- average rate
13.2/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:42:41 +0800] - import userRoot: Processed 1581 entries -- average rate
11.3/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:43:01 +0800] - import userRoot: Processed 1581 entries -- average rate
9.9/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:43:21 +0800] - import userRoot: Processed 1581 entries -- average rate
8.8/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:43:41 +0800] - import userRoot: Processed 1581 entries -- average rate
7.9/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:44:01 +0800] - import userRoot: Processed 1581 entries -- average rate
7.2/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:44:21 +0800] - import userRoot: Processed 1581 entries -- average rate
6.6/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:44:41 +0800] - import userRoot: Processed 1581 entries -- average rate
6.1/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:45:01 +0800] - import userRoot: Processed 1581 entries -- average rate
5.6/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:45:21 +0800] - import userRoot: Processed 1581 entries -- average rate
5.3/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:45:21 +0800] - import userRoot: Decided to end this pass because the
progress rate has dropped below the 50% threshold.
[06/Nov/2014:00:45:21 +0800] - import userRoot: Ending pass number 1 ...
[06/Nov/2014:00:45:22 +0800] - import userRoot: Foreman is done; waiting for workers to
finish...
[06/Nov/2014:00:45:22 +0800] - import userRoot: Workers finished; cleaning up...
[06/Nov/2014:00:45:22 +0800] - import userRoot: Workers cleaned up.
[06/Nov/2014:00:45:22 +0800] - import userRoot: Sweeping files for merging later...
[06/Nov/2014:00:45:22 +0800] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the database
[06/Nov/2014:00:45:22 +0800] - import userRoot: Sweep done.
[06/Nov/2014:00:45:22 +0800] - import userRoot: Beginning pass number 2
[06/Nov/2014:00:45:42 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 214748285.8/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:46:02 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 107374142.9/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:46:22 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 71582761.9/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:46:42 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 53687071.4/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:47:02 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 42949657.1/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:47:22 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 35791381.0/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:47:43 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 30678326.5/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:48:03 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 26843535.7/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:48:23 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 23860920.6/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:48:43 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 21474828.6/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:49:03 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 19522571.4/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:49:23 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 17895690.5/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:49:43 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 16519098.9/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:50:03 +0800] - import userRoot: Processed 1581 entries (pass 2) -- average
rate 15339163.3/sec, recent rate 0.0/sec, hit ratio 0%
[06/Nov/2014:00:50:23 +0800] - import userRoot: Processed 1722 entries (pass 2) -- average
rate 0.5/sec, recent rate 43.0/sec, hit ratio 0%
[06/Nov/2014:00:50:26 +0800] - import userRoot: Workers finished; cleaning up...
[06/Nov/2014:00:50:26 +0800] - import userRoot: Workers cleaned up.
[06/Nov/2014:00:50:26 +0800] - import userRoot: Sweeping files for merging later...
[06/Nov/2014:00:50:26 +0800] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the database
[06/Nov/2014:00:50:26 +0800] - import userRoot: Sweep done.
[06/Nov/2014:00:50:26 +0800] - import userRoot: Beginning 2-way merge of up to 44 files...
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "aci".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"automountkey".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"carLicense".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "cn".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"displayname".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"entryusn".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "fqdn".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"gidnumber".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"givenName".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"ipakrbprincipalalias".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"ipasudorunas".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"ipasudorunasgroup".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"ipauniqueid".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"krbPrincipalName".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"macAddress".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "mail".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"mailAlternateAddress".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"mailHost".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"managedby".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "manager".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for "member".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"memberallowcmd".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"memberdenycmd".
[06/Nov/2014:00:50:26 +0800] - import userRoot: No files to merge for
"memberHost".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"memberOf".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"memberservice".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"memberUser".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"nscpEntryDN".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"nsds5ReplConflict".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"nsuniqueid".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"ntUniqueId".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"ntUserDomainId".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"objectclass".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "ou".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "owner".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"secretary".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "seeAlso".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "sn".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"sourcehost".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"telephoneNumber".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "title".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for "uid".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"uidnumber".
[06/Nov/2014:00:50:27 +0800] - import userRoot: No files to merge for
"uniquemember".
[06/Nov/2014:00:50:27 +0800] - import userRoot: Merging completed in 0 seconds.
[06/Nov/2014:00:50:27 +0800] - import userRoot: Indexing complete. Post-processing...
[06/Nov/2014:00:50:27 +0800] - import userRoot: Generating numSubordinates complete.
[06/Nov/2014:00:50:27 +0800] - import userRoot: Flushing caches...
[06/Nov/2014:00:50:27 +0800] - import userRoot: Closing files...
[06/Nov/2014:00:50:27 +0800] - import userRoot: Import complete. Processed 1722 entries in
607 seconds. (2.84 entries/sec)
[06/Nov/2014:00:50:27 +0800] NSMMReplicationPlugin - multimaster_be_state_change: replica
dc=idc is coming online; enabling replication
[06/Nov/2014:00:50:27 +0800] - Skipping CoS Definition cn=Password
Policy,cn=accounts,dc=idc--no CoS Templates found, which should be added before the CoS
Definition.
mapping tree
# extended LDIF
#
# LDAPv3
# base <cn=mapping tree,cn=config> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#
# mapping tree, config
dn: cn=mapping tree,cn=config
objectClass: top
objectClass: extensibleObject
cn: mapping tree
# dc\3Didc, mapping tree, config
dn: cn=dc\3Didc,cn=mapping tree,cn=config
objectClass: top
objectClass: extensibleObject
objectClass: nsMappingTree
cn: dc=idc
cn: "dc=idc"
nsslapd-state: backend
nsslapd-backend: userRoot
nsslapd-referral: ldap://idm1.ra.cn.idc:389/dc%3Didc
# replica, dc\3Didc, mapping tree, config
dn: cn=replica,cn=dc\3Didc,cn=mapping tree,cn=config
cn: replica
nsDS5Flags: 1
objectClass: top
objectClass: nsds5replica
objectClass: extensibleobject
nsDS5ReplicaType: 3
nsDS5ReplicaRoot: dc=idc
nsds5ReplicaLegacyConsumer: off
nsDS5ReplicaId: 3
nsDS5ReplicaBindDN: cn=replication manager,cn=config
nsState:: AwAAAAAAAADzUlpUAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: 6cf48589-650a11e4-90ecc46a-d11f6771
nsds5ReplicaChangeCount: 0
nsds5replicareapactive: 0
# meToidm1.ra.cn.idc, replica, dc\3Didc, mapping tree, config
dn: cn=meToidm1.ra.cn.idc,cn=replica,cn=dc\3Didc,cn=mapping tree,cn=config
cn: meToidm1.ra.cn.idc
objectClass: nsds5replicationagreement
objectClass: top
nsDS5ReplicaTransportInfo: TLS
description: me to idm1.ra.cn.idc
nsDS5ReplicaRoot: dc=idc
nsDS5ReplicaHost: idm1.ra.cn.idc
nsds5replicaTimeout: 600
nsDS5ReplicaBindDN: cn=replication manager,cn=config
nsDS5ReplicaPort: 389
nsDS5ReplicaBindMethod: simple
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberof idnssoaserial
entryusn krblastsuccessfulauth krblastfailedauth krbloginfailedcount
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName in
ternalModifyTimestamp
nsDS5ReplicaCredentials: {DES}
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn krblasts
uccessfulauth krblastfailedauth krbloginfailedcount
nsds5replicareapactive: 0
nsds5replicaLastUpdateStart: 0
nsds5replicaLastUpdateEnd: 0
nsds5replicaChangesSentSinceStartup:
nsds5replicaLastUpdateStatus: 402 Replication error acquiring replica: unknown
error - Replica has different database generation ID, remote replica may nee
d to be initialized
nsds5replicaUpdateInProgress: FALSE
nsds5replicaLastInitStart: 0
nsds5replicaLastInitEnd: 0
# search result
search: 2
result: 0 Success
# numResponses: 5
# numEntries: 4
--
389 users mailing list
389-users(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
</blockquote>
--
389 users mailing list 389-users(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
</blockquote>
--
389 users mailing list
389-users(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
</blockquote>