Yesterday afternoon, one of my consumers randomly crashed/rebooted. Upon rebooting, its replication agreement with its master failed with the following error:
Unable to acquire replica: Excessive clock skew between the supplier and the consumer. Replication is aborting
I did a little bit of Google searching and found some list traffic from a few years ago. From that I gathered that this replica was hosed and that I would need to re-initialize it. No problem. But re-initializing didn't help: same error. Starting from scratch with a completely fresh replica produced the same result. That's when I noticed the following errors in the logs on the master:
csngen_new_csn - Warning: too much time skew (-115319 secs). Current seqnum=1
I downloaded the readNsState.py script attached to the following ticket (https://bugzilla.redhat.com/show_bug.cgi?id=233642). Running it on the master produced the following output:
For replica cn=replica,cn=o\3Dpotsdam.edu,cn=mapping tree,cn=config
len of nsstate is 40
CSN generator state:
  Replica ID    : 6560
  Sampled Time  : 1328928777
  Time in hex   : 0x4f35d809
  Time as str   : Fri Feb 10 21:52:57 2012
  Local Offset  : 0
  Remote Offset : 261
  Seq. num      : 1
  System time   : Thu Feb 9 14:00:01 2012
  Diff in sec.  : -114776
This leads me to believe that the clock skew problem is on the master.
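The figures in that output can be cross-checked by hand. What follows is only a minimal sketch, not part of readNsState.py: it assumes the master's local time zone is UTC-5 (US Eastern standard time, which is inferred from the numbers and not stated anywhere in the thread), and the names LOCAL_TZ, SAMPLED_TIME, and SYSTEM_TIME are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the master's local zone is UTC-5 (inferred, not stated in the thread).
LOCAL_TZ = timezone(timedelta(hours=-5))

# "Sampled Time" as reported for the CSN generator state (seconds since the epoch).
SAMPLED_TIME = 1328928777

# "System time" as reported: Thu Feb 9 14:00:01 2012, read as local time.
SYSTEM_TIME = datetime(2012, 2, 9, 14, 0, 1, tzinfo=LOCAL_TZ)

# The same sampled time rendered as hex and as a local-time string.
sampled_hex = hex(SAMPLED_TIME)
sampled_str = datetime.fromtimestamp(SAMPLED_TIME, LOCAL_TZ).strftime(
    "%a %b %d %H:%M:%S %Y"
)

# Difference: system time minus sampled time, in whole seconds.
diff = int(SYSTEM_TIME.timestamp()) - SAMPLED_TIME

print("Time in hex  :", sampled_hex)
print("Time as str  :", sampled_str)
print("Diff in sec. :", diff)
print("Diff in hours: %.1f" % (diff / 3600))
```

Under that time zone assumption the three values agree with the reported output. The negative difference means the generator's sampled time is ahead of the system clock, here by roughly 32 hours.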
I am not really sure how the clock skew happened. All of these systems synchronize their clocks via a centralized time server and all the times on their clocks are correct. There are 3 or 4 other replicas that are still receiving incremental updates fine, but any attempt to add a new replica results in a failed replication agreement due to excessive clock skew.
I am writing to get a better understanding of the situation and to see if anything can be done to resolve it. At the moment it seems I am stuck in an unfortunate situation that will require re-initializing my master from a backup.
Thanks for any help that can be provided.
In reply to Greg Kuchyt's message of 02/09/2012 12:23 PM (above), Rich Megginson wrote:
What is your 389-ds-base version and platform?
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
In reply to Rich Megginson's message of 02/09/2012 02:38 PM, Greg Kuchyt wrote:
Rich,
The master and a few replicas are running on Scientific Linux 6.1 x86_64. Here, we're using the stock packages along with the modified 389-ds-base packages in your fedorapeople.org repo. So that puts it at 1.2.9.9-1 for 389-ds-base, I believe.
Two replicas (including the one that rebooted/failed) are on Fedora 12 x86_64 and their 389-ds-base is 1.2.5-1.
Let me know what other info you need. Thanks.
In reply to Greg Kuchyt's message of 02/09/2012 12:52 PM, Rich Megginson wrote:
There was a known problem with clock skew calculation and handling in 1.2.5 - please try upgrading everything to 1.2.9.9. I realize Fedora 12 is no longer supported.
In reply to Rich Megginson's message of 02/09/2012 03:03 PM, Greg Kuchyt wrote:
Rich,
The F12 systems were in production and were slated for replacement by SL 6.1 systems. I just took the F12 systems out of the mix rather than upgrade them, so everything is now SL 6.1 and 389-ds-base 1.2.9.9.
When attempting to add a new replica I still see the following in the error logs on the master.
"Unable to acquire replica: Excessive clock skew between the supplier and the consumer. Replication is aborting."
As well, I am seeing a lot of these messages in the logs on the master.
"csngen_new_csn - Warning: too much time skew (-123525 secs). Current seqnum=1"
In reply to Greg Kuchyt's message of 02/10/2012 09:07 AM, Rich Megginson wrote:
Are your clocks on the servers all in sync?