Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.
Background:
We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All had been running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.
We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.
A couple of days ago, I did one of our two hubs. The first time I brought up the daemon after doing the initial import of our ldap data, everything seemed fine. However, we started seeing errors the first time we restarted:
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate
[11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:44:04 -0400] - All database threads now stopped
[11/Jul/2012:10:44:04 -0400] - slapd stopped.
[11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests
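(For reference, the database-side RUV that these ruv_compare_ruv messages compare against can be read straight off the replica configuration entry with something like the following; the host and bind DN are placeholders, the suffix is one of ours:)

ldapsearch -xLLL -H ldap://localhost:389 -D "cn=directory manager" -W \
    -b cn=config \
    '(&(objectClass=nsDS5Replica)(nsDS5ReplicaRoot=ou=people,dc=gted,dc=gatech,dc=edu))' \
    nsds50ruv nsDS5ReplicaId nsDS5ReplicaRoot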
The _second_ restart is even worse: we get more error messages (see below) and then the daemon dies after it says it's listening on its ports:
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate
[11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
[11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:45:36 -0400] - All database threads now stopped
[11/Jul/2012:10:45:36 -0400] - slapd stopped.
[11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests
At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
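The rebuild I keep doing looks roughly like this (instance name, backend name and LDIF path are placeholders for our real ones, assuming a default filesystem layout; the suffix could just as well be re-initialized over the wire from a master instead of from an LDIF):

stop-dirsrv hub1
# wipe the replication changelog and the database files
rm -rf /var/lib/dirsrv/slapd-hub1/changelogdb/*
rm -rf /var/lib/dirsrv/slapd-hub1/db/*
# offline import of a fresh export taken from a master
/usr/lib64/dirsrv/slapd-hub1/ldif2db -n userRoot -i /var/tmp/people.ldif
start-dirsrv hub1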
I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.
Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
On 07/11/2012 11:12 AM, Robert Viduya wrote:
Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.
Background:
We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All had been running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.
We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.
A couple of days ago, I did one of our two hubs. The first time I brought up the daemon after doing the initial import of our ldap data, everything seemed fine. However, we started seeing errors the first time we restarted:
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate [11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:44:04 -0400] - All database threads now stopped [11/Jul/2012:10:44:04 -0400] - slapd stopped. [11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests
The problem is that hubs have changelogs but dedicated consumers do not.
Were either of the replicas with ID 51 or 52 removed/deleted at some point in the past?
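(A quick way to double-check is to ask each master directly which replica IDs it is actually configured with; host names and bind DN below are placeholders:)

for h in master1.example.edu master2.example.edu master3.example.edu; do
    echo "== $h =="
    ldapsearch -xLLL -H ldap://$h:389 -D "cn=directory manager" -W \
        -b cn=config '(objectClass=nsDS5Replica)' nsDS5ReplicaId nsDS5ReplicaRoot
done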
The _second_ restart is even worse: we get more error messages (see below) and then the daemon dies
Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault messages in /var/log/messages? When you restart the directory server after it dies, do you see "Disorderly Shutdown" messages in the directory server errors log?
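For example, something like the following should answer both (the errors log path assumes a default layout; the instance name is a placeholder):

# kernel-level segfault reports for the directory server process
grep 'ns-slapd.*segfault' /var/log/messages
# disorderly-shutdown detection logged on the next startup
grep -i 'disorderly shutdown' /var/log/dirsrv/slapd-hub1/errors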
after it says it's listening on its ports:
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate [11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:45:36 -0400] - All database threads now stopped [11/Jul/2012:10:45:36 -0400] - slapd stopped. [11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests
At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
So every time you shut down the server and attempt to restart it, it doesn't start until you re-import?
I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.
Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
These do not seem like issues related to replicating from 1.2.8 to 1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and attempting to replicate your data between them?
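(For a bare-bones two-master test, the supplier side of each box can be set up with something like the LDIF below, plus a mirrored agreement on the second master. Suffix, hosts, instance name, bind DN and password are all placeholders, and it assumes the backend/suffix already exists and that a replication manager entry such as cn=replication manager,cn=config has been created on both boxes:)

ldapmodify -x -H ldap://m1.example.edu:389 -D "cn=directory manager" -W <<'EOF'
# changelog for the supplier
dn: cn=changelog5,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-m1/changelogdb

# make this server a read-write replica (replica ID must be unique per master)
dn: cn=replica,cn="dc=test,dc=edu",cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsDS5Replica
cn: replica
nsDS5ReplicaRoot: dc=test,dc=edu
nsDS5ReplicaId: 1
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaBindDN: cn=replication manager,cn=config

# agreement pushing changes to the other master
dn: cn=to-m2,cn=replica,cn="dc=test,dc=edu",cn=mapping tree,cn=config
changetype: add
objectClass: top
objectClass: nsds5ReplicationAgreement
cn: to-m2
nsDS5ReplicaRoot: dc=test,dc=edu
nsDS5ReplicaHost: m2.example.edu
nsDS5ReplicaPort: 389
nsDS5ReplicaBindDN: cn=replication manager,cn=config
nsDS5ReplicaCredentials: secret
nsDS5ReplicaBindMethod: SIMPLE
nsDS5ReplicaTransportInfo: LDAP
EOF

The second master gets the mirror image (its own replica ID and an agreement pointing back), then one side is initialized from the other by adding nsds5BeginReplicaRefresh: start to the agreement entry before loading it with writes.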
On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
On 07/11/2012 11:12 AM, Robert Viduya wrote:
Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.
Background:
We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All had been running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.
We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.
A couple of days ago, I did one of our two hubs. The first time I brought up the daemon after doing the initial import of our ldap data, everything seemed fine. However, we started seeing errors the first time we restarted:
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate [11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:44:04 -0400] - All database threads now stopped [11/Jul/2012:10:44:04 -0400] - slapd stopped. [11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests
The problem is that hubs have changelogs but dedicated consumers do not.
Were either of the replicas with ID 51 or 52 removed/deleted at some point in the past?
No, 51 and 52 belong to an active, functional master.
The _second_ restart is even worse: we get more error messages (see below) and then the daemon dies
Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault messages in /var/log/messages? When you restart the directory server after it dies, do you see "Disorderly Shutdown" messages in the directory server errors log?
Found these in the kernel log file:
Jul 11 10:46:26 bellar kernel: ns-slapd[4041]: segfault at 0000000000000011 rip 00002b5fe0801857 rsp 0000000076e65970 error 4
Jul 11 10:47:23 bellar kernel: ns-slapd[4714]: segfault at 0000000000000011 rip 00002b980c6ce857 rsp 00000000681f5970 error 4
And yes, we get "Disorderly Shutdown" messages in the errors log.
after it says it's listening on its ports:
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate [11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:45:36 -0400] - All database threads now stopped [11/Jul/2012:10:45:36 -0400] - slapd stopped. [11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests
At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
So every time you shut down the server and attempt to restart it, it doesn't start until you re-import?
No, the first restart works, but we get changelog errors in the log file. Subsequent restarts don't work at all without rebuilding everything.
I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.
Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
These do not seem like issues related to replicating from 1.2.8 to 1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and attempting to replicate your data between them?
Not yet, I may try this next, but it will take some time to set up.
On 07/12/2012 08:50 AM, Robert Viduya wrote:
On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
On 07/11/2012 11:12 AM, Robert Viduya wrote:
Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not work? We're having changelog issues.
Background:
We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves. All had been running 1.2.8.3 since last summer with no issues. This summer, we decided to bring them all up to the latest stable release, 1.2.10.4. We can't afford a lot of downtime for the service as a whole, but with the redundancy level we have, we can take down a machine or two at a time without user impact.
We started with one slave, did a clean install of 1.2.10.4 on it, set up replication agreements from our 1.2.8.3 hubs to it and watched it for a week or so. Everything looked fine, so we started rolling through the rest of the slave servers, got them all running 1.2.10.4 and so far haven't seen any problems.
A couple of days ago, I did one of our two hubs. The first time I brought up the daemon after doing the initial import of our ldap data, everything seemed fine. However, we started seeing errors the first time we restarted:
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to terminate [11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:44:04 -0400] - All database threads now stopped [11/Jul/2012:10:44:04 -0400] - slapd stopped. [11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb602b000300330000 4ffdca7e000000330000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb6ea2000000340000 4ffdca70000000340000] [11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:45:08 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS requests
The problem is that hubs have changelogs but dedicated consumers do not.
Were either of the replicas with ID 51 or 52 removed/deleted at some point in the past?
No, 51 and 52 belong to an active, functional master.
So is it possible that the hub was
The _second_ restart is even worse: we get more error messages (see below) and then the daemon dies
Dies? Exits? Crashes? Core files? Do you see any ns-slapd segfault messages in /var/log/messages? When you restart the directory server after it dies, do you see "Disorderly Shutdown" messages in the directory server errors log?
Found these in the kernel log file:
Jul 11 10:46:26 bellar kernel: ns-slapd[4041]: segfault at 0000000000000011 rip 00002b5fe0801857 rsp 0000000076e65970 error 4
Jul 11 10:47:23 bellar kernel: ns-slapd[4714]: segfault at 0000000000000011 rip 00002b980c6ce857 rsp 00000000681f5970 error 4
And yes, we get "Disorderly Shutdown" messages in the errors log.
ok - please follow the directions at http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a stack trace
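Roughly, that boils down to something like this on a sysconfig-based install (the core file location is left as a placeholder; see the wiki page for where cores land on your setup):

# let ns-slapd write core files, then restart the instance
echo 'ulimit -c unlimited' >> /etc/sysconfig/dirsrv
restart-dirsrv hub1
# after the next crash, install matching debuginfo and pull a full stack trace
debuginfo-install -y 389-ds-base
gdb -ex 'set confirm off' -ex 'thread apply all bt full' -ex quit \
    /usr/sbin/ns-slapd /path/to/the/core/file > stacktrace.txt 2>&1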
Also, 1.2.10.12 is available in the testing repos. Please give this a try. There were a couple of fixes made since 1.2.10.4 that may be applicable:
Ticket #336 - [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
Ticket #347 - IPA dirsvr seg-fault during system longevity test
Ticket #348 - crash in ldap_initialize with multiple threads
Ticket #361 - Bad DNs in ACIs can segfault ns-slapd
Ticket #359 - Database RUV could mismatch the one in changelog under the stress
Ticket #382 - DS Shuts down intermittently
Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
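Pulling it in should just be something like one of these, depending on where your 389-ds-base packages come from:

yum --enablerepo=updates-testing update 389-ds-base   # Fedora
yum --enablerepo=epel-testing update 389-ds-base      # EL + EPEL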
after it says it's listening on its ports:
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads [11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to terminate [11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal subsystems and plugins [11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop [11/Jul/2012:10:45:36 -0400] - All database threads now stopped [11/Jul/2012:10:45:36 -0400] - slapd stopped. [11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 68 ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 71 ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 4ffb605d000000330000 4ffb62a2000100330000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 69 ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 72 ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] which is present in RUV [database RUV] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 4ffb7098000100340000 4ffb78bc000000340000] [11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized. [11/Jul/2012:10:46:11 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS requests
At this point, the only way I've found to get it back is to clean out the changelog and db directories and re-import the ldap data from scratch. Essentially we can't restart without having to re-import. I've done this a couple of times already and it's entirely reproducible.
So every time you shut down the server and attempt to restart it, it doesn't start until you re-import?
No, the first restart works, but we get changelog errors in the log file. Subsequent restarts don't work at all without rebuilding everything.
I've checked and ensured that there's no obsolete masters that need to be CLEANRUVed. I've also noticed that the errors _seem_ to be only affecting our second and third suffix. We have three suffixes defined, but I haven't seen any error messages for the first one.
Has anyone seen anything like this? We're not sure if this is a general 1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 1.2.10.4. If it's the former, we cannot proceed with getting the rest of the servers up to 1.2.10.4. If it's the latter, then we need to expedite getting everything up to 1.2.10.4.
These do not seem like issues related to replicating from 1.2.8 to 1.2.10. Have you tried a simple test of setting up 2 1.2.10 masters and attempting to replicate your data between them?
Not yet, I may try this next, but it will take some time to set up.
On Jul 12, 2012, at 11:36 AM, Rich Megginson wrote:
On 07/12/2012 08:50 AM, Robert Viduya wrote:
On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
On 07/11/2012 11:12 AM, Robert Viduya wrote:
So is it possible that the hub was
This question seems incomplete?
ok - please follow the directions at http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a stack trace
Also, 1.2.10.12 is available in the testing repos. Please give this a try. There were a couple of fixes made since 1.2.10.4 that may be applicable:
Ticket #336 - [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
Ticket #347 - IPA dirsvr seg-fault during system longevity test
Ticket #348 - crash in ldap_initialize with multiple threads
Ticket #361 - Bad DNs in ACIs can segfault ns-slapd
Ticket #359 - Database RUV could mismatch the one in changelog under the stress
Ticket #382 - DS Shuts down intermittently
Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
But on the problematic hub server, I see:
# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
...
I truncated the output for brevity, but there's over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
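(For scale, a quick count of how many MODs each entry has picked up in the hub's access log, busiest entries first:)

grep ' MOD dn=' access | sed 's/.* MOD dn=//' | sort | uniq -c | sort -rn | head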
I just upgraded it to 1.2.10.12 as suggested and just to be safe, I'm doing a clean import. We'll see how it goes.
On 07/12/2012 02:47 PM, Robert Viduya wrote:
On Jul 12, 2012, at 11:36 AM, Rich Megginson wrote:
On 07/12/2012 08:50 AM, Robert Viduya wrote:
On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
On 07/11/2012 11:12 AM, Robert Viduya wrote:
So is it possible that the hub was
This question seems incomplete?
Sorry, I didn't mean to send that.
ok - please follow the directions at http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a stack trace
Also, 1.2.10.12 is available in the testing repos. Please give this a try. There were a couple of fixes made since 1.2.10.4 that may be applicable:
Ticket #336 - [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
Ticket #347 - IPA dirsvr seg-fault during system longevity test
Ticket #348 - crash in ldap_initialize with multiple threads
Ticket #361 - Bad DNs in ACIs can segfault ns-slapd
Ticket #359 - Database RUV could mismatch the one in changelog under the stress
Ticket #382 - DS Shuts down intermittently
Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
But on the problematic hub server, I see:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" ...
I truncated the output for brevity, but there's over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?
Do all of these operations have the same CSN? The csn will be logged with the RESULT line for the operation. Also, what is the err=? for the MOD operations? err=0? Some other code?
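One way to gather that is to pull the conn=/op= pairs for the MODs against one entry and then grab the matching RESULT lines, which carry the err= and any csn= (note that conn/op pairs do get reused across connections over time):

awk '/gtdirguid=78b8cc871a3cda9f352580e797b270bc/ && $5 == "MOD" {print $3, $4}' access | sort -u |
while read conn op; do
    grep -F "$conn $op RESULT" access
done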
I just upgraded it to 1.2.10.12 as suggested and just to be safe, I'm doing a clean import. We'll see how it goes.
I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
But on the problematic hub server, I see:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" ...
I truncated the output for brevity, but there's over 250 MODs to that one object. It's as if the server isn't able to do the replication bookkeeping and is accepting changes over and over again. Eventually the disk fills up.
Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?
No, there's nothing in the error logs on the supplier side.
Do all of these operations have the same CSN? The csn will be logged with the RESULT line for the operation. Also, what is the err=? for the MOD operations? err=0? Some other code?
Here's some sample output, again limited for brevity. Most of the RESULT lines don't have a CSN, just the first few. All the err= codes are 0. I've grepped out just the DN sample from my previous mail, again for brevity. There are a lot more DNs being reported:
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
The upgrade to 1.2.10.12 seems to have fixed the issue; I'm not seeing these repeated entries anymore, nor am I seeing changelog error messages when I restart the server. I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that are keeping it from being pushed to stable? 1.2.10.4 definitely isn't working for us.
On 07/13/2012 08:02 AM, Robert Viduya wrote:
I've enabled the core dump stuff, but now I can't seem to get it to crash. But I'm still getting the changelog messages in the error logs whenever I restart. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
But on the problematic hub server, I see:
# egrep 78b8cc871a3cda9f352580e797b270bc access [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" [12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu" ...
I truncated the output for brevity, but there are over 250 MODs to that one object. It's as if the server can't keep up with the replication bookkeeping and keeps accepting the same changes over and over again. Eventually the disk fills up.
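In case it's useful to anyone else hitting this, a rough way to quantify the replays is to tally MOD operations per DN straight from the access log. This is just a sketch, not the exact commands we ran; the log path is a placeholder for wherever your instance keeps its access log:

#!/usr/bin/env python
# Sketch: count MOD operations per DN in a 389-ds access log to see which
# entries are being re-modified the most. Adjust the path for your instance.
import collections
import re

counts = collections.Counter()
with open("/var/log/dirsrv/slapd-EXAMPLE/access") as log:
    for line in log:
        match = re.search(r' MOD dn="([^"]+)"', line)
        if match:
            counts[match.group(1)] += 1

for dn, n in counts.most_common(10):
    print("%6d %s" % (n, dn))

On the problematic hub this kind of count climbs into the hundreds for a handful of DNs, while the other servers show only the handful of legitimate changes.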
Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?
No, there's nothing in the error logs on the supplier side.
Do all of these operations have the same CSN? The CSN will be logged on the RESULT line for the operation. Also, what is the err= value for the MOD operations? err=0? Some other code?
Here's some sample output, again limited for brevity. Most of the RESULT lines don't have a CSN, just the first few, and all of the err= codes are 0. I've grepped out just the DN from my previous mail, again for brevity; there are a lot more DNs being reported:
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
The upgrade to 1.2.10.12 seems to have fixed the issue, however: I'm not seeing these repeated entries anymore, nor am I seeing changelog error messages when I restart the server. I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that are keeping it from being pushed to stable?
The only thing 1.2.10.12 needs is testers to give it positive karma ("Works For Me") in https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-6265/389-ds-base-1.... or whatever your platform is.
If you don't have a FAS account or don't want to do this, do I have your permission to provide your name and email to the update as a user for which the update is working?
1.2.10.4 definitely isn't working for us.
Eh, not quite. It's working for us on only one of over 20 ldap servers and that one server is just a hub (i.e., it's not getting customer traffic). Also, that one server has been running for less than a day.
I'll roll it out to more of our servers over the next few days and see how it holds up.
Sounds good. Thanks!