Do you have a changelog configured on B? Is B configured as a multiple master? Is the replica ID for B different than A?
Yes to all.
I hope it's an error on my part: we are planning a big reorganization of our Authentication and Authorization Infrastructure, and FDS seems to be a great fit for our needs.
I think it is a misconfiguration, and maybe it would work if I reinstalled FDS, but I need to understand what's happening.
Thank you.
I don't fully understand what FDS does during replication.
My node A has replica ID 1 and node B has 2. In the changelog of A I see records like 4725e604000000010000 or 4725e80f000000010000, and in B records like 472224f2000000020000, so I conclude that the 5th digit from the right is the replica ID. Am I wrong?
When I get the log message "Can't locate CSN 47222163000000020000" on A, is A looking in its own changelog or in B's? Because, if what I said before is true, A is looking for ID 1 and B for ID 2... Right?
By the way, I'm using bin/slapd/server/dbscan -f to look in the changelog; when FDS gives the error "Can't locate CSN", I can't find that CSN in the changelog of either A or B.
Thank you.
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
Dael Maselli wrote:
I don't fully understand what FDS does during replication.
My node A has replica ID 1 and node B has 2. In the changelog of A I see records like 4725e604000000010000 or 4725e80f000000010000, and in B records like 472224f2000000020000, so I conclude that the 5th digit from the right is the replica ID. Am I wrong?
Right.
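Dael's reading of the CSN can be checked with a small decoder. This is a minimal Python sketch, assuming the 20-hex-digit CSN string is laid out as an 8-digit timestamp (seconds since the Unix epoch), a 4-digit sequence number, a 4-digit replica ID and a 4-digit sub-sequence number; that layout is my understanding of the FDS format, not something stated in this thread. Under that assumption the replica ID occupies digits 5 to 8 from the right, so for replica IDs below 16 the 5th digit from the right alone shows it.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a CSN string (assumed layout: 8 + 4 + 4 + 4 hex digits)."""
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # sequence number within that second
    replica_id: int  # ID of the master where the change originated
    subseqnum: int   # sub-sequence number

    @property
    def time(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_csn(csn: str) -> CSN:
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return CSN(
        timestamp=int(csn[0:8], 16),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    for s in ("4725e604000000010000", "472224f2000000020000", "47222163000000020000"):
        c = parse_csn(s)
        print(s, "replica", c.replica_id, c.time.isoformat())
```

Decoded this way, 47222163000000020000 carries replica ID 2, i.e. a change that originated on node B.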
When I get the log message "Can't locate CSN 47222163000000020000" on A, is A looking in its own changelog or in B's? Because, if what I said before is true, A is looking for ID 1 and B for ID 2... Right?
Yes.
By the way, I'm using bin/slapd/server/dbscan -f to look in the changelog; when FDS gives the error "Can't locate CSN", I can't find that CSN in the changelog of either A or B.
I don't think dbscan can look at changelogs.
Can you describe the exact steps you took? e.g.
- configured and created changelogs on A and B
- created replication manager user on A and B
- configured A to be a multi master replica
- configured B to be a multi master replica
- created replication agreement from A to B
- created replication agreement from B to A
- did replica init from A to B
Note that you should not do a replica init from B to A if you already did one from A to B.
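Richard's checklist can be expressed as a small sanity check over a planned topology. This is a hypothetical Python sketch: the Node type, check_plan and the rules it encodes are illustrations of the points raised in this thread (unique replica IDs, a changelog and the multi-master role on every node, agreements in both directions, and no total init back towards a node that was itself an init source), not an FDS tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    replica_id: int
    changelog: bool = True
    multi_master: bool = True


def check_plan(nodes, agreements, inits):
    """Return a list of problems; an empty list means the plan passes.

    agreements: set of (supplier, consumer) name pairs
    inits: list of (source, target) total initializations, in order
    """
    problems = []
    ids = [n.replica_id for n in nodes]
    if len(ids) != len(set(ids)):
        problems.append("replica IDs are not unique")
    for n in nodes:
        if not n.changelog:
            problems.append(f"{n.name}: no changelog configured")
        if not n.multi_master:
            problems.append(f"{n.name}: not configured as a multi-master replica")
    for src, dst in sorted(agreements):
        if (dst, src) not in agreements:
            problems.append(f"agreement {src} -> {dst} has no reverse agreement")
    for src, dst in inits:
        # Initializing A -> B and later B -> A is what Richard warns against.
        if (dst, src) in inits:
            problems.append(f"init {src} -> {dst} conflicts with init {dst} -> {src}")
    return problems


if __name__ == "__main__":
    a, b = Node("A", 1), Node("B", 2)
    print(check_plan([a, b], {("A", "B"), ("B", "A")}, [("A", "B")]))
```

The two-node plan described in this thread (IDs 1 and 2, agreements both ways, a single init from A to B) passes with an empty list.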
I'm working with the Java Management Console.
I created the replication manager users as:

dn: cn=A.infn.it,cn=config
cn: A.infn.it
description: CN=A.infn.it,L=Lecce,OU=Host,O=INFN,C=IT
objectClass: top
objectClass: nshost

dn: cn=B.infn.it,cn=config
cn: B.infn.it
description: CN=B.infn.it,L=Lecce,OU=Host,O=INFN,C=IT
objectClass: top
objectClass: nshost

In my shared/config/certmap.conf I have:

certmap default default
default:CmapLdapAttr description

I tried SSL auth and it works, as I can see in the logs:

[29/Oct/2007:14:53:40 +0100] conn=2 SSL 256-bit AES; client CN=A.infn.it,L=Lecce,OU=Host,O=INFN,C=IT; issuer CN=INFN CA,O=INFN,C=IT
[29/Oct/2007:14:53:40 +0100] conn=2 SSL client bound as cn=A.infn.it,cn=config
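As I understand the certmap line above, the server takes the client certificate's subject DN and looks for an entry whose description attribute holds that DN; the matched entry becomes the bind identity, which is what the "SSL client bound as cn=A.infn.it,cn=config" line shows. The Python sketch below models only that lookup over an in-memory dictionary. map_certificate and the simplistic DN normalization (lower-casing, trimming spaces around RDNs) are assumptions for illustration, not how the server matches DNs internally.

```python
def normalize_dn(dn: str) -> str:
    """Very rough DN normalization: lower-case and strip spaces around RDNs."""
    return ",".join(rdn.strip().lower() for rdn in dn.split(","))


def map_certificate(subject_dn, entries):
    """Return the DN of the single entry whose 'description' equals subject_dn.

    entries maps an entry DN to a dict of attribute name -> list of values.
    Returns None when no entry, or more than one entry, matches.
    """
    wanted = normalize_dn(subject_dn)
    matches = [
        dn
        for dn, attrs in entries.items()
        if any(normalize_dn(v) == wanted for v in attrs.get("description", []))
    ]
    return matches[0] if len(matches) == 1 else None


entries = {
    "cn=A.infn.it,cn=config": {"description": ["CN=A.infn.it,L=Lecce,OU=Host,O=INFN,C=IT"]},
    "cn=B.infn.it,cn=config": {"description": ["CN=B.infn.it,L=Lecce,OU=Host,O=INFN,C=IT"]},
}

if __name__ == "__main__":
    print(map_certificate("CN=A.infn.it,L=Lecce,OU=Host,O=INFN,C=IT", entries))
```

Returning None on multiple matches reflects the idea that a certificate must map to exactly one identity; two entries with the same description would make the mapping ambiguous.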
I created the changelogs with the Management Console by enabling the checkbox in the Replication node of the Configuration tab and selecting the default location.
Then, under the database in the Replication node, I enabled the replica and selected Multiple Master, with replica ID 1 for A and 2 for B; as the supplier DN I entered cn=A.infn.it,cn=config on B and cn=B.infn.it,cn=config on A.
Then I right-clicked the database name under Replication, chose "New Replication Agreement", selected node B (on A) with port 636, and checked "Using Encrypted SSL connection" and "SSL Client Authentication". Here I had a problem! A pop-up told me it couldn't connect to the other FDS server, but I thought it was a bug, because I checked with tcpdump and saw no packets sent (I can see them with simple auth). So I clicked Continue and everything seemed to work well, including the initialization from A to B. I created the agreement from B to A in the same way, but did not run an initialization from B to A.
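The console steps above correspond, roughly, to a replica entry per node plus one agreement entry per peer. The Python sketch below prints LDIF for them. The attribute names (nsDS5ReplicaId, nsDS5ReplicaType, nsDS5Flags, nsDS5ReplicaBindDN, nsDS5ReplicaHost, nsDS5ReplicaPort, nsDS5ReplicaTransportInfo, nsDS5ReplicaBindMethod) are, to my knowledge, those used by Fedora Directory Server, but the helper functions are hypothetical and the output should be checked against the Administrator's Guide for your version before use. The changelog entry is omitted because it depends on a local directory path.

```python
def replica_ldif(suffix: str, replica_id: int, supplier_bind_dn: str) -> str:
    """LDIF for a multi-master replica entry.

    Assumed meanings: nsDS5ReplicaType 3 = updatable (master),
    nsDS5Flags 1 = keep a changelog.
    """
    return "\n".join([
        f'dn: cn=replica,cn="{suffix}",cn=mapping tree,cn=config',
        "objectClass: top",
        "objectClass: nsDS5Replica",
        "objectClass: extensibleObject",
        "cn: replica",
        f"nsDS5ReplicaRoot: {suffix}",
        f"nsDS5ReplicaId: {replica_id}",
        "nsDS5ReplicaType: 3",
        "nsDS5Flags: 1",
        f"nsDS5ReplicaBindDN: {supplier_bind_dn}",
        "",
    ])


def agreement_ldif(suffix: str, peer_host: str, port: int = 636) -> str:
    """LDIF for an agreement to peer_host over SSL with certificate (client) auth."""
    return "\n".join([
        f'dn: cn={peer_host},cn=replica,cn="{suffix}",cn=mapping tree,cn=config',
        "objectClass: top",
        "objectClass: nsDS5ReplicationAgreement",
        f"cn: {peer_host}",
        f"nsDS5ReplicaRoot: {suffix}",
        f"nsDS5ReplicaHost: {peer_host}",
        f"nsDS5ReplicaPort: {port}",
        "nsDS5ReplicaTransportInfo: SSL",
        "nsDS5ReplicaBindMethod: SSLCLIENTAUTH",
        "",
    ])


if __name__ == "__main__":
    # Node A: replica ID 1, accepts updates from B's certificate identity.
    print(replica_ldif("dc=infn,dc=it", 1, "cn=B.infn.it,cn=config"))
    print(agreement_ldif("dc=infn,dc=it", "B.infn.it"))
```

With SSLCLIENTAUTH no bind DN or password is stored in the agreement, which would be consistent with the empty "binddn = , passwd =" the replication log prints for this setup.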
I followed the manual at http://www.redhat.com/docs/manuals/dir-server/ag/replicat.htm#66943
I hope I was clear; sorry for my macaronic English ;-)
Thank you so much.
Dael Maselli wrote:
Then I right-clicked the database name under Replication, chose "New Replication Agreement", selected node B (on A) with port 636, and checked "Using Encrypted SSL connection" and "SSL Client Authentication". Here I had a problem! A pop-up told me it couldn't connect to the other FDS server, but I thought it was a bug, because I checked with tcpdump and saw no packets sent (I can see them with simple auth). So I clicked Continue and everything seemed to work well, including the initialization from A to B. I created the agreement from B to A in the same way, but did not run an initialization from B to A.
You don't need to initialize from B to A if you already did the initialize from A to B.
When you did the tcpdump, did you look at traffic on port 389 too, or just 636?
Richard Megginson, on 31/10/2007 17.43, wrote:
Dael Maselli wrote:
[...]
"SSL Client Authentication". Here I had a problem! A pop-up told me it couldn't connect to the other FDS server, but I thought it was a bug, because I checked with tcpdump and saw no packets sent (I can see them with simple auth). So I clicked Continue and everything seemed to work well, including the initialization from A to B. I created the agreement from B to A in the same way, but did not run an initialization from B to A.
You don't need to initialize from B to A if you already did the initialize from A to B.
Yes, I never did it. I only did A->B.
When you did the tcpdump, did you look at traffic on port 389 too, or just 636?
I looked at 389 when I used simple auth over an unencrypted connection, and I saw packets. With SSL auth I specify port 636 as the agreement's destination, so I didn't look at 389. On 636 there were no packets.
I also tried SSL on port 389, hoping it would use TLS, but it didn't work.
By the way, in the production environment I need a 4-way MMR. The manual says to set it up in a circular manner: A has agreements to B and D, B to A and C, and so on. I don't like this approach because of the risk of split-brain and because it can't tolerate more than one server failure, so I first tried connecting every server to every other. Is that wrong? Could it be the cause of the CSN disaster?
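Dael's fault-tolerance argument can be checked on a toy model. The Python sketch below (all names hypothetical) builds the ring described in the manual and a full mesh for four masters, then finds the largest number of simultaneous server failures after which every surviving master can still reach every other one through agreements. It models topology only; it says nothing about how FDS itself behaves when servers fail.

```python
from itertools import combinations, permutations


def full_mesh(nodes):
    """Every master has an agreement to every other master."""
    return set(permutations(nodes, 2))


def ring(nodes):
    """Each master has agreements to its two neighbours (A->B,D; B->A,C; ...)."""
    n = len(nodes)
    return {(nodes[i], nodes[(i + d) % n]) for i in range(n) for d in (1, -1)}


def connected(agreements, alive):
    """True if updates from any live master can reach every other live master."""
    for start in alive:
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for src, dst in agreements:
                if src == cur and dst in alive and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        if seen != set(alive):
            return False
    return True


def failures_tolerated(agreements, nodes):
    """Largest k such that ANY k failed masters leave the rest connected.

    Capped at len(nodes) - 2, since a single survivor is trivially connected.
    """
    for k in range(1, len(nodes) - 1):
        for failed in combinations(nodes, k):
            alive = [n for n in nodes if n not in failed]
            if not connected(agreements, alive):
                return k - 1
    return len(nodes) - 2


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    for name, topo in (("ring", ring(nodes)), ("full mesh", full_mesh(nodes))):
        print(f"{name}: {len(topo)} agreements, "
              f"tolerates {failures_tolerated(topo, nodes)} failure(s)")
```

For four masters the ring needs 8 agreements and is split by losing two opposite servers (B and D leave A and C unconnected), while the full mesh needs 12 agreements and stays connected after any two failures. The cost of the full mesh is the number of agreements, which grows as n(n-1).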
Note that after this 4-way test I deleted all the agreements, replicas, and changelogs; maybe some "dirty" configuration is left over?
Thanks.
Dael Maselli wrote:
Richard Megginson, on 31/10/2007 17.43, wrote:
Dael Maselli wrote:
[...]
"SSL Client Authentication". Here I had a problem! A pop-up told me it couldn't connect to the other FDS server, but I thought it was a bug, because I checked with tcpdump and saw no packets sent (I can see them with simple auth). So I clicked Continue and everything seemed to work well, including the initialization from A to B. I created the agreement from B to A in the same way, but did not run an initialization from B to A.
You don't need to initialize from B to A if you already did the initialize from A to B.
Yes, I never did it. I only did A->B.
When you did the tcpdump, did you look at traffic on port 389 too, or just 636?
I looked at 389 when I used simple auth over an unencrypted connection, and I saw packets. With SSL auth I specify port 636 as the agreement's destination, so I didn't look at 389. On 636 there were no packets.
I also tried SSL on port 389, hoping it would use TLS, but it didn't work.
I suggest raising the error log level to include replication logging, then attempting to initialize B from A. You may have to enable replication logging on both A and B; see http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
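One way to raise the error log level is an LDIF modify against cn=config. The sketch below only builds that LDIF in Python. The value 8192 is, as far as I know, the replication-debugging bit of nsslapd-errorlog-level in Fedora Directory Server (the FAQ linked above is the place to confirm it for your version), and the helper name is hypothetical. The attribute is a bitmask, so OR-ing 8192 into the current value keeps any other debug bits that are already enabled.

```python
REPLICATION_DEBUG = 8192  # assumed replication bit of nsslapd-errorlog-level


def errorlog_level_ldif(current_level: int = 0, enable_replication: bool = True) -> str:
    """Build an LDIF modify that sets nsslapd-errorlog-level on cn=config."""
    if enable_replication:
        level = current_level | REPLICATION_DEBUG
    else:
        level = current_level & ~REPLICATION_DEBUG
    return (
        "dn: cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-errorlog-level\n"
        f"nsslapd-errorlog-level: {level}\n"
    )


if __name__ == "__main__":
    print(errorlog_level_ldif())
```

The output would be applied on both A and B with ldapmodify while bound as cn=Directory Manager, and reverted once the test is over, since replication debugging is very verbose.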
By the way, in the production environment I need a 4-way MMR. The manual says to set it up in a circular manner: A has agreements to B and D, B to A and C, and so on. I don't like this approach because of the risk of split-brain and because it can't tolerate more than one server failure, so I first tried connecting every server to every other. Is that wrong?
No.
Could it be the cause of the CSN disaster?
I don't think so.
Note that after this 4-way test I deleted all the agreements, replicas, and changelogs; maybe some "dirty" configuration is left over?
Ah, yes, that could be. Can you start over again from scratch?
Thanks.
As I said, this is a test for a big central LDAP server, and before starting from scratch I would like to understand what went wrong.
I enabled replication logging and this is the result. Note that ds-m1 is node A and ds-m4 is node B; the others, ds-m2 and ds-m3, were part of the 4-way test. How can I delete them from the configuration?
--- Node A ---
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - : Update window will close at Tue Nov 6 00:01:00 2007
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> wait_for_changes
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> start
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: start -> ready_to_acquire_replica
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Trying secure slapi_ldap_init
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): binddn = , passwd =
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Replica was successfully acquired.
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: ready_to_acquire_replica -> sending_updates
[05/Nov/2007:11:53:32 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Consumer RUV:
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:32 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Supplier RUV:
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 4725e80f000000010000 4725e80f
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session start: anchorcsn=47220f21000000010000
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - Can't locate CSN 47220f21000000010000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - changelog program - agmt="cn=ds-m4.infn.it" (ds-m4:636): CSN 47220f21000000010000 found, position set for replay
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No more updates to send (cl5GetNextOperationToReplay)
[05/Nov/2007:11:53:32 +0100] - repl5_inc_waitfor_async_results: 0 0
[05/Nov/2007:11:53:32 +0100] - repl5_inc_result_threadmain starting
[05/Nov/2007:11:53:33 +0100] - repl5_inc_result_threadmain exiting
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session end: state=0 load=0 sent=0 skipped=0
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Successfully released consumer
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Beginning linger on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: sending_updates -> wait_for_changes
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> start
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Cancelling linger on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: start -> ready_to_acquire_replica
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Trying secure slapi_ldap_init
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): binddn = , passwd =
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Replica was successfully acquired.
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: ready_to_acquire_replica -> sending_updates
[05/Nov/2007:11:53:33 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Consumer RUV:
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:33 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Supplier RUV:
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 4725e80f000000010000 4725e80f
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session start: anchorcsn=47220f21000000010000
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - Can't locate CSN 47220f21000000010000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - changelog program - agmt="cn=ds-m4.infn.it" (ds-m4:636): CSN 47220f21000000010000 found, position set for replay
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No more updates to send (cl5GetNextOperationToReplay)
[05/Nov/2007:11:53:33 +0100] - repl5_inc_waitfor_async_results: 0 0
[05/Nov/2007:11:53:33 +0100] - repl5_inc_result_threadmain starting
[05/Nov/2007:11:53:34 +0100] - repl5_inc_result_threadmain exiting
[05/Nov/2007:11:53:34 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session end: state=0 load=0 sent=0 skipped=0
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Successfully released consumer
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Beginning linger on the connection
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: sending_updates -> wait_for_changes

[05/Nov/2007:11:53:32 +0100] conn=0 op=106 SRCH base="cn=replication,cn=config" scope=2 filter="(objectClass=*)" attrs=ALL
[05/Nov/2007:11:53:32 +0100] conn=0 op=106 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:53:32 +0100] conn=0 op=107 MOD dn="cn=ds-m4.infn.it, cn=replica, cn=\22dc=infn,dc=it\22, cn=mapping tree, cn=config"
[05/Nov/2007:11:53:32 +0100] conn=0 op=107 RESULT err=0 tag=103 nentries=0 etime=0
[05/Nov/2007:11:53:32 +0100] conn=0 op=108 MOD dn="cn=ds-m4.infn.it, cn=replica, cn=\22dc=infn,dc=it\22, cn=mapping tree, cn=config"
[05/Nov/2007:11:53:32 +0100] conn=0 op=108 RESULT err=0 tag=103 nentries=0 etime=0

[05/Nov/2007:11:54:35 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Linger timeout has expired on the connection
[05/Nov/2007:11:54:35 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer
--- Node B ---
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": Begin incremental protocol
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": Acquired replica
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": StartNSDS50ReplicationRequest: response=0 rc=0
[05/Nov/2007:11:55:15 +0100] NSMMReplicationPlugin - conn=1967 op=4 repl="dc=infn,dc=it": Released replica
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": Begin incremental protocol
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": Acquired replica
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": StartNSDS50ReplicationRequest: response=0 rc=0
[05/Nov/2007:11:55:17 +0100] NSMMReplicationPlugin - conn=1968 op=4 repl="dc=infn,dc=it": Released replica

[05/Nov/2007:11:55:14 +0100] conn=1967 fd=65 slot=65 SSL connection from 193.206.153.171 to 193.206.144.35
[05/Nov/2007:11:55:14 +0100] conn=1967 SSL 256-bit AES; client CN=ds-m1.infn.it,L=Lecce,OU=Host,O=INFN,C=IT; issuer CN=INFN CA,O=INFN,C=IT
[05/Nov/2007:11:55:14 +0100] conn=1967 SSL client bound as cn=ds-m1.infn.it,cn=config
[05/Nov/2007:11:55:14 +0100] conn=1967 op=0 BIND dn="" method=sasl version=3 mech=EXTERNAL
[05/Nov/2007:11:55:14 +0100] conn=1967 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=ds-m1.infn.it,cn=config"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:14 +0100] conn=1967 op=2 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=2 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:14 +0100] conn=1967 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1967 op=4 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[05/Nov/2007:11:55:15 +0100] conn=1967 op=4 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1967 op=5 UNBIND
[05/Nov/2007:11:55:15 +0100] conn=1967 op=5 fd=65 closed - U1
[05/Nov/2007:11:55:15 +0100] conn=1968 fd=66 slot=66 SSL connection from 193.206.153.171 to 193.206.144.35
[05/Nov/2007:11:55:15 +0100] conn=1968 SSL 256-bit AES; client CN=ds-m1.infn.it,L=Lecce,OU=Host,O=INFN,C=IT; issuer CN=INFN CA,O=INFN,C=IT
[05/Nov/2007:11:55:15 +0100] conn=1968 SSL client bound as cn=ds-m1.infn.it,cn=config
[05/Nov/2007:11:55:15 +0100] conn=1968 op=0 BIND dn="" method=sasl version=3 mech=EXTERNAL
[05/Nov/2007:11:55:15 +0100] conn=1968 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=ds-m1.infn.it,cn=config"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1968 op=2 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=2 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:16 +0100] conn=1968 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[05/Nov/2007:11:55:16 +0100] conn=1968 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:17 +0100] conn=1968 op=4 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[05/Nov/2007:11:55:17 +0100] conn=1968 op=4 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:56:17 +0100] conn=1968 op=5 UNBIND
[05/Nov/2007:11:56:17 +0100] conn=1968 op=5 fd=66 closed - U1
Thank you.
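The RUV lines in the Node A log can be compared mechanically. The Python sketch below is a simplified model of what the log suggests, not the server's actual algorithm: for each replica ID it checks whether the supplier's max CSN is newer than the consumer's and, if so, reports the consumer's max CSN as the point where the supplier must resume in its own changelog. Applied to the log, only replica 1 qualifies, and its anchor is 47220f21000000010000, the CSN that node A reports it cannot locate; replicas 2 and 3 from the old 4-way test are still listed in both RUVs. The regular expression and function names are assumptions fitted to the log format shown here.

```python
import re

# Matches e.g. "{replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000"
RUV_ELEMENT = re.compile(r"\{replica (\d+) (\S+)\}\s+([0-9a-f]{20})\s+([0-9a-f]{20})")


def parse_ruv(lines):
    """Map replica ID -> (purl, min CSN, max CSN) for every RUV element found."""
    ruv = {}
    for line in lines:
        m = RUV_ELEMENT.search(line)
        if m:
            rid, purl, min_csn, max_csn = m.groups()
            ruv[int(rid)] = (purl, min_csn, max_csn)
    return ruv


def replay_anchors(consumer, supplier):
    """Replica ID -> CSN the supplier must find in its changelog to resume.

    CSN strings are fixed-width lower-case hex, so string comparison orders
    them the same way as comparing their numeric fields in sequence.
    None means the consumer has nothing from that replica yet.
    """
    anchors = {}
    for rid, (_, _, sup_max) in supplier.items():
        cons_max = consumer[rid][2] if rid in consumer else None
        if cons_max is None or cons_max < sup_max:
            anchors[rid] = cons_max
    return anchors


consumer_ruv = parse_ruv([
    "{replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000",
    "{replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000 00000000",
    "{replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000",
    "{replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000",
])
supplier_ruv = parse_ruv([
    "{replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 4725e80f000000010000 4725e80f",
    "{replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000",
    "{replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000",
    "{replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000",
])

if __name__ == "__main__":
    print(replay_anchors(consumer_ruv, supplier_ruv))
```

In other words, B claims to hold every change from replica 1 up to 47220f21000000010000, so A has to resume from that CSN in its own changelog. If A's changelog was deleted and recreated after that change, as happened after the 4-way test, that CSN would no longer be there, which is consistent with the "Can't locate CSN" message.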
Dael Maselli wrote:
As I said, this is a test for a big central LDAP server, and before starting from scratch I would like to understand what went wrong.
There are odd problems from time to time when deleting/recreating replication agreements, replica config, and changelog config. That's why it's better to start from scratch.
I enabled replication logging and this is the result. Note that ds-m1 is node A and ds-m4 is node B; the others, ds-m2 and ds-m3, were part of the 4-way test. How can I delete them from the configuration?
--- Node A --- [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - : Update window will close at Tue Nov 6 00:01:00 2007 [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> wait_for_changes [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> start [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: start -> ready_to_acquire_replica [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Trying secure slapi_ldap_init [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): binddn = , passwd = [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection [05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Replica was successfully acquired. 
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: ready_to_acquire_replica -> sending_updates
[05/Nov/2007:11:53:32 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Consumer RUV:
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:32 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Supplier RUV:
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 4725e80f000000010000 4725e80f
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session start: anchorcsn=47220f21000000010000
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - Can't locate CSN 47220f21000000010000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - changelog program - agmt="cn=ds-m4.infn.it" (ds-m4:636): CSN 47220f21000000010000 found, position set for replay
[05/Nov/2007:11:53:32 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:32 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No more updates to send (cl5GetNextOperationToReplay)
[05/Nov/2007:11:53:32 +0100] - repl5_inc_waitfor_async_results: 0 0
[05/Nov/2007:11:53:32 +0100] - repl5_inc_result_threadmain starting
[05/Nov/2007:11:53:33 +0100] - repl5_inc_result_threadmain exiting
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session end: state=0 load=0 sent=0 skipped=0
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Successfully released consumer
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Beginning linger on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: sending_updates -> wait_for_changes
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: wait_for_changes -> start
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Cancelling linger on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: start -> ready_to_acquire_replica
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Trying secure slapi_ldap_init
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): binddn = , passwd =
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No linger to cancel on the connection
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Replica was successfully acquired.
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: ready_to_acquire_replica -> sending_updates
[05/Nov/2007:11:53:33 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Consumer RUV:
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 47220f21000000010000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:33 +0100] - _cl5PositionCursorForReplay (agmt="cn=ds-m4.infn.it" (ds-m4:636)): Supplier RUV:
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replicageneration} 471e1779000000010000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 1 ldap://ds-m1.infn.it:389} 471e185e000000010000 4725e80f000000010000 4725e80f
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 2 ldap://ds-m2.infn.it:389} 471e1834000000020000 47220a40000000020000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 3 ldap://ds-m3.infn.it:389} 4721e230000000030000 4721e5c6000000030000 00000000
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): {replica 4 ldap://ds-m4.infn.it:389} 471f8bb5000000040000 4721e4e7000000040000 00000000
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session start: anchorcsn=47220f21000000010000
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - Can't locate CSN 47220f21000000010000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - changelog program - agmt="cn=ds-m4.infn.it" (ds-m4:636): CSN 47220f21000000010000 found, position set for replay
[05/Nov/2007:11:53:33 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - clcache_load_buffer: rc=-30990
[05/Nov/2007:11:53:33 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): No more updates to send (cl5GetNextOperationToReplay)
[05/Nov/2007:11:53:33 +0100] - repl5_inc_waitfor_async_results: 0 0
[05/Nov/2007:11:53:33 +0100] - repl5_inc_result_threadmain starting
[05/Nov/2007:11:53:34 +0100] - repl5_inc_result_threadmain exiting
[05/Nov/2007:11:53:34 +0100] agmt="cn=ds-m4.infn.it" (ds-m4:636) - session end: state=0 load=0 sent=0 skipped=0
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Successfully released consumer
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Beginning linger on the connection
[05/Nov/2007:11:53:34 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): State: sending_updates -> wait_for_changes
[05/Nov/2007:11:53:32 +0100] conn=0 op=106 SRCH base="cn=replication,cn=config" scope=2 filter="(objectClass=*)" attrs=ALL
[05/Nov/2007:11:53:32 +0100] conn=0 op=106 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:53:32 +0100] conn=0 op=107 MOD dn="cn=ds-m4.infn.it, cn=replica, cn=\22dc=infn,dc=it\22, cn=mapping tree, cn=config"
[05/Nov/2007:11:53:32 +0100] conn=0 op=107 RESULT err=0 tag=103 nentries=0 etime=0
[05/Nov/2007:11:53:32 +0100] conn=0 op=108 MOD dn="cn=ds-m4.infn.it, cn=replica, cn=\22dc=infn,dc=it\22, cn=mapping tree, cn=config"
[05/Nov/2007:11:53:32 +0100] conn=0 op=108 RESULT err=0 tag=103 nentries=0 etime=0
[05/Nov/2007:11:54:35 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Linger timeout has expired on the connection
[05/Nov/2007:11:54:35 +0100] NSMMReplicationPlugin - agmt="cn=ds-m4.infn.it" (ds-m4:636): Disconnected from the consumer
--- Node B ---
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": Begin incremental protocol
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": Acquired replica
[05/Nov/2007:11:55:14 +0100] NSMMReplicationPlugin - conn=1967 op=3 repl="dc=infn,dc=it": StartNSDS50ReplicationRequest: response=0 rc=0
[05/Nov/2007:11:55:15 +0100] NSMMReplicationPlugin - conn=1967 op=4 repl="dc=infn,dc=it": Released replica
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": Begin incremental protocol
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": Acquired replica
[05/Nov/2007:11:55:16 +0100] NSMMReplicationPlugin - conn=1968 op=3 repl="dc=infn,dc=it": StartNSDS50ReplicationRequest: response=0 rc=0
[05/Nov/2007:11:55:17 +0100] NSMMReplicationPlugin - conn=1968 op=4 repl="dc=infn,dc=it": Released replica
[05/Nov/2007:11:55:14 +0100] conn=1967 fd=65 slot=65 SSL connection from 193.206.153.171 to 193.206.144.35
[05/Nov/2007:11:55:14 +0100] conn=1967 SSL 256-bit AES; client CN=ds-m1.infn.it,L=Lecce,OU=Host,O=INFN,C=IT; issuer CN=INFN CA,O=INFN,C=IT
[05/Nov/2007:11:55:14 +0100] conn=1967 SSL client bound as cn=ds-m1.infn.it,cn=config
[05/Nov/2007:11:55:14 +0100] conn=1967 op=0 BIND dn="" method=sasl version=3 mech=EXTERNAL
[05/Nov/2007:11:55:14 +0100] conn=1967 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=ds-m1.infn.it,cn=config"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:14 +0100] conn=1967 op=2 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=2 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:14 +0100] conn=1967 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[05/Nov/2007:11:55:14 +0100] conn=1967 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1967 op=4 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[05/Nov/2007:11:55:15 +0100] conn=1967 op=4 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1967 op=5 UNBIND
[05/Nov/2007:11:55:15 +0100] conn=1967 op=5 fd=65 closed - U1
[05/Nov/2007:11:55:15 +0100] conn=1968 fd=66 slot=66 SSL connection from 193.206.153.171 to 193.206.144.35
[05/Nov/2007:11:55:15 +0100] conn=1968 SSL 256-bit AES; client CN=ds-m1.infn.it,L=Lecce,OU=Host,O=INFN,C=IT; issuer CN=INFN CA,O=INFN,C=IT
[05/Nov/2007:11:55:15 +0100] conn=1968 SSL client bound as cn=ds-m1.infn.it,cn=config
[05/Nov/2007:11:55:15 +0100] conn=1968 op=0 BIND dn="" method=sasl version=3 mech=EXTERNAL
[05/Nov/2007:11:55:15 +0100] conn=1968 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=ds-m1.infn.it,cn=config"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:15 +0100] conn=1968 op=2 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
[05/Nov/2007:11:55:15 +0100] conn=1968 op=2 RESULT err=0 tag=101 nentries=1 etime=0
[05/Nov/2007:11:55:16 +0100] conn=1968 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[05/Nov/2007:11:55:16 +0100] conn=1968 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:55:17 +0100] conn=1968 op=4 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[05/Nov/2007:11:55:17 +0100] conn=1968 op=4 RESULT err=0 tag=120 nentries=0 etime=0
[05/Nov/2007:11:56:17 +0100] conn=1968 op=5 UNBIND
[05/Nov/2007:11:56:17 +0100] conn=1968 op=5 fd=66 closed - U1
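The CSN values in these logs can be decoded to see what the supplier is doing. Below is a minimal Python sketch, assuming the usual 20-hex-digit CSN layout of 8 digits of timestamp, 4 of sequence number, 4 of replica ID and 4 of sub-sequence number (consistent with the replica IDs shown in the RUV lines above). It takes the max CSNs from the Consumer RUV and Supplier RUV logged for the ds-m4 agreement and reports, for each replica ID on which the supplier holds newer changes, the consumer's max CSN. This is a simplified model of where incremental replay would resume, not the server's actual code; the names `CSN` and `anchors` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CSN:
    """A Change Sequence Number, assuming the 8/4/4/4 hex-digit layout.

    Field order matters: order=True compares timestamp first, then seqnum,
    rid and subseqnum, which matches comparing the fixed-width hex strings.
    """
    timestamp: int  # Unix time of the change
    seqnum: int     # orders changes made within the same second
    rid: int        # replica ID of the master that made the change
    subseqnum: int

    @classmethod
    def parse(cls, text: str) -> "CSN":
        if len(text) != 20:
            raise ValueError(f"expected 20 hex digits, got {text!r}")
        return cls(int(text[0:8], 16), int(text[8:12], 16),
                   int(text[12:16], 16), int(text[16:20], 16))

    def __str__(self) -> str:
        return (f"{self.timestamp:08x}{self.seqnum:04x}"
                f"{self.rid:04x}{self.subseqnum:04x}")


def anchors(consumer_max: dict[int, str], supplier_max: dict[int, str]) -> dict[int, str]:
    """Map each replica ID on which the consumer is behind the supplier to
    the consumer's max CSN, i.e. the point replay would resume from.

    Simplified: replica IDs the consumer has never seen are skipped here.
    """
    behind = {}
    for rid, sup in supplier_max.items():
        con = consumer_max.get(rid)
        if con is not None and CSN.parse(sup) > CSN.parse(con):
            behind[rid] = con
    return behind


# Max CSNs copied from the Consumer RUV and Supplier RUV lines of the log.
consumer = {1: "47220f21000000010000", 2: "47220a40000000020000",
            3: "4721e5c6000000030000", 4: "4721e4e7000000040000"}
supplier = {1: "4725e80f000000010000", 2: "47220a40000000020000",
            3: "4721e5c6000000030000", 4: "4721e4e7000000040000"}

print(anchors(consumer, supplier))  # {1: '47220f21000000010000'}
```

In this log only replica 1 differs, and the value returned matches the `anchorcsn=47220f21000000010000` reported at session start; node A then reports that it cannot locate that CSN in its own changelog.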
Thank you.
Richard Megginson wrote:
Dael Maselli wrote:
Richard Megginson, on 31/10/2007 17.43, wrote:
Dael Maselli wrote:
[...]
"SSL Client Authentication". Here I had a problem: a pop-up told me it couldn't connect to the other FDS server. I thought it was a bug, because I checked with tcpdump and saw no packets sent (with simple auth I can see them). So I clicked to continue, and everything seemed to work well, including the initialization from A to B. I did not initialize when I created the agreement from B to A in the same way.
You don't need to initialize from B to A if you already did the initialize from A to B.
Right, I never did that; I only did A->B.
When you did the tcpdump, did you look at traffic on port 389 too, or just 636?
I looked at port 389 when I used simple auth over an unencrypted connection, and I saw packets. With SSL client auth I set port 636 as the agreement's destination, so I didn't look at 389. On 636 I saw no packets.
I also tried SSL on port 389, hoping it would use TLS, but it didn't work.
I suggest raising the error log level to include replication logging, then attempting to initialize B from A. You may have to enable replication logging on both A and B; see http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
By the way, in production I need 4-way MMR. The manual says to set it up with agreements from A to B and D, from B to A and C, and so on, in a circular arrangement. I don't like this because of the split-brain risk and because it cannot tolerate more than one server failure, so I first tried connecting every server to every other one. Is that wrong?
No.
Could it be the cause of the CSN disaster?
I don't think so.
Note that after this 4-way test I deleted all agreements, replicas and changelogs. Maybe there is some leftover "dirty" configuration?
Ah, yes, that could be. Can you start over again from scratch?
Thanks.
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
Well, I restarted from scratch, and now everything works.
I now have 4-way read-write replication with agreements from every master to every other master.
Thank you for assistance.
Regards.
Dear Richard,
The problem came back, this time on one node.
We have 4-way replication with the nodes ds-m1, ds-m2, ds-m3 and ds-m4.
Yesterday read-write replication was working fine on all nodes; this morning one node (ds-m3) crashed and restarted with this log:
[14/Nov/2007:09:16:37 +0100] - Fedora-Directory/1.0.4 B2006.312.1621 starting up
[14/Nov/2007:09:16:37 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[14/Nov/2007:09:16:38 +0100] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=infn,dc=it was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
Then I tried to reinitialize ds-m3 from ds-m1, and the ds-m3 log showed:
[14/Nov/2007:15:21:36 +0100] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=infn,dc=it is going offline; disabling replication
[14/Nov/2007:15:21:36 +0100] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[14/Nov/2007:15:21:38 +0100] - import userRoot: Workers finished; cleaning up...
[14/Nov/2007:15:21:39 +0100] - import userRoot: Workers cleaned up.
[14/Nov/2007:15:21:39 +0100] - import userRoot: Indexing complete. Post-processing...
[14/Nov/2007:15:21:39 +0100] - import userRoot: Flushing caches...
[14/Nov/2007:15:21:39 +0100] - import userRoot: Closing files...
[14/Nov/2007:15:21:39 +0100] - import userRoot: Import complete. Processed 10 entries in 2 seconds. (5.00 entries/sec)
[14/Nov/2007:15:21:39 +0100] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=infn,dc=it is coming online; enabling replication
[14/Nov/2007:15:21:39 +0100] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=infn,dc=it does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
So I tried making changes to the directory from ds-m1, ds-m2 or ds-m4, and they propagate to all 4 nodes (including ds-m3). BUT when I make changes on ds-m3 they do not propagate, and the ds-m3 log again shows:
[14/Nov/2007:15:42:22 +0100] agmt="cn=m3-m2" (ds-m2:636) - Can't locate CSN 4739d5a5000000030000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[14/Nov/2007:15:42:22 +0100] agmt="cn=m3-m4" (ds-m4:636) - Can't locate CSN 4739d5a5000000030000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
[14/Nov/2007:15:42:22 +0100] agmt="cn=m3-m1" (ds-m1:636) - Can't locate CSN 4739d5a5000000030000 in the changelog (DB rc=-30990). The consumer may need to be reinitialized.
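Decoding the missing CSN shows which master made the change and when. A minimal Python sketch, assuming the usual 20-hex-digit CSN layout (8 digits of Unix time, 4 of sequence number, 4 of replica ID, 4 of sub-sequence number) and the +0100 offset used in these logs. The replica ID comes out as 3 (ds-m3 itself) and the timestamp as 13 Nov 2007 17:49:41 +0100, which is earlier than the 09:16:38 restart on 14 Nov at which ds-m3 logged "Recreating the changelog file". One plausible reading, offered as an assumption, is that the other masters still record this CSN as the last ds-m3 change they received, while the recreated changelog on ds-m3 no longer contains it. The helper name `decode_csn` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# The logs in this thread are written with a +0100 offset.
CET = timezone(timedelta(hours=1))


def decode_csn(csn: str) -> tuple[datetime, int]:
    """Return (timestamp, replica ID) for a 20-hex-digit CSN.

    Assumes the layout: 8 hex digits of Unix time, 4 of sequence number,
    4 of replica ID, 4 of sub-sequence number.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    when = datetime.fromtimestamp(int(csn[0:8], 16), tz=CET)
    rid = int(csn[12:16], 16)
    return when, rid


missing = "4739d5a5000000030000"  # from the "Can't locate CSN" lines
when, rid = decode_csn(missing)
# Restart at which ds-m3 logged "Recreating the changelog file".
recreated = datetime(2007, 11, 14, 9, 16, 38, tzinfo=CET)

print(when, rid)         # 2007-11-13 17:49:41+01:00 3
print(when < recreated)  # True
```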
So, please help me! What can I do now? We can't reinstall from scratch every time a server goes down.
Thank you. Best regards.
Dael Maselli.
Dael Maselli wrote:
Dear Richard,
The problem came back, this time on one node.
We have 4-way replication with the nodes ds-m1, ds-m2, ds-m3 and ds-m4.
Yesterday read-write replication was working fine on all nodes; this morning one node (ds-m3) crashed
What was the cause of the crash?
and restarted with this log:
[14/Nov/2007:09:16:37 +0100] - Fedora-Directory/1.0.4 B2006.312.1621 starting up
[14/Nov/2007:09:16:37 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[14/Nov/2007:09:16:38 +0100] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=infn,dc=it was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
If you see this again, try this: shut down m3, then start it with the replication log level:
cd /opt/fedora-ds/slapd-m3
./stop-slapd
./start-slapd -d 8192
Then shut it down as soon as you see the replica_check_for_data_reload: error message. Then paste the error log to pastebin.com or rafb.net/paste and paste the link here. Be sure to obscure any sensitive information first.
So, please help me! What can I do now? We can't reinstall from scratch every time a server goes down.
To get up and running again, try this, assuming you have no pending changes in the m3 database that you care about:
1. Shut down m3.
2. Remove all of the files in the changelog directory (e.g. /opt/fedora-ds/slapd-instance/cldb).
3. Restart m3.
4. Do a replica reinit of m3 from one of the other masters.
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
Dael Maselli wrote:
Dear Richard,
The problem came back, this time on one node.
I've created a bug to track this issue - https://bugzilla.redhat.com/show_bug.cgi?id=388021
A less drastic solution than recreating everything from scratch would be to export the database from a good master, reimport it into that same master, and then reinitialize all other masters from it. You will probably want to make sure there are no pending updates on any of your masters first, or you will lose them.
I'm working on a better fix for this, but in the meantime, you should not have to reinstall everything.
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
Well, I think I found a workaround: instead of deleting the changelog, I changed its maximum number of records. After sending 2 or 3 updates from a "good" server, the errors disappeared and all updates now work well.
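The message does not say exactly which setting was changed or to what value. Assuming "max records" refers to the changelog trimming limit, the change can be expressed as an LDIF modify record and applied with an LDAP client such as ldapmodify while bound as Directory Manager. The Python sketch below only builds that record; the entry DN cn=changelog5,cn=config and the attribute nsslapd-changelogmaxentries are assumptions based on the usual Fedora DS changelog configuration, the function name is hypothetical, and the value 1000 is an arbitrary example.

```python
def changelog_maxentries_ldif(max_entries: int) -> str:
    """Return an LDIF change record that sets the changelog entry limit.

    Assumes the changelog configuration entry is cn=changelog5,cn=config
    and that the limit is stored in nsslapd-changelogmaxentries.
    """
    if max_entries < 0:
        raise ValueError("max_entries must not be negative")
    lines = [
        "dn: cn=changelog5,cn=config",
        "changetype: modify",
        "replace: nsslapd-changelogmaxentries",
        f"nsslapd-changelogmaxentries: {max_entries}",
        "-",
    ]
    return "\n".join(lines) + "\n"


# 1000 is an illustrative value only; the thread does not give one.
print(changelog_maxentries_ldif(1000), end="")
```

Saving the output to a file and feeding it to ldapmodify keeps the change reviewable before it is applied.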
Is it possible that some information in the DS configuration is not cleared when the changelog is recreated, but is flushed correctly once the entries expire?
Now all 4 nodes work, but I hope there will be a bug fix soon.
Thank you very much.
Dael.
Dael Maselli wrote:
Well, I think I found a workaround: instead of deleting the changelog, I changed its maximum number of records. After sending 2 or 3 updates from a "good" server, the errors disappeared and all updates now work well.
Is it possible that some information in the DS configuration is not cleared when the changelog is recreated, but is flushed correctly once the entries expire?
Yes, something like that. See the bug. https://bugzilla.redhat.com/show_bug.cgi?id=388021
And please update the bug with your workaround instructions.
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
389-users@lists.fedoraproject.org