A few months ago, I had a machine die suddenly when the power cord was tripped over. (Oops!) After that, I had some replication issues that I solved with the help of this list. Before long, they came back, and back, and back. Basically, I get a bunch of messages like this in the error logs:
[22/Dec/2006:09:26:08 -0600] agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=pab)"" (zeppo:389) - Can't locate CSN 458acc0e000000020000 in the changelog (DBrc=-30990). The consumer may need to be reinitialized.
[22/Dec/2006:09:31:09 -0600] agmt="cn="Replication to chico.nebrwesleyan.edu (o=pab)"" (chico:389) - Can't locate CSN 458acc0e000000020000 in the changelog (DBrc=-30990). The consumer may need to be reinitialized.
I get similar messages on every host in the 4-way MMR group. Each machine only complains about one CSN, but they're different CSNs on each machine.
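To see at a glance which CSN each agreement is complaining about, the error log can be summarized with a short script. This is only a sketch: the regular expression is written against the two sample lines quoted above, and the function name missing_csns is made up for illustration. Other server versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern derived from the sample log lines in this thread (an assumption:
# the exact wording may vary between server versions).
MISSING_CSN = re.compile(
    r'agmt="cn="(?P<agmt>[^"]+)"" \((?P<host>[^:]+):(?P<port>\d+)\) - '
    r"Can't locate CSN (?P<csn>[0-9a-fA-F]{20}) in the changelog"
)


def missing_csns(log_text):
    """Map each replication agreement name to the set of CSNs it reports missing."""
    found = defaultdict(set)
    for match in MISSING_CSN.finditer(log_text):
        found[match.group("agmt")].add(match.group("csn"))
    return dict(found)


# The two log entries quoted in the original post.
SAMPLE = (
    '[22/Dec/2006:09:26:08 -0600] agmt="cn="Replication to '
    'zeppo.nebrwesleyan.edu (o=pab)"" (zeppo:389) - Can\'t locate CSN '
    "458acc0e000000020000 in the changelog (DBrc=-30990). "
    "The consumer may need to be reinitialized.\n"
    '[22/Dec/2006:09:31:09 -0600] agmt="cn="Replication to '
    'chico.nebrwesleyan.edu (o=pab)"" (chico:389) - Can\'t locate CSN '
    "458acc0e000000020000 in the changelog (DBrc=-30990). "
    "The consumer may need to be reinitialized.\n"
)

if __name__ == "__main__":
    for agmt, csns in sorted(missing_csns(SAMPLE).items()):
        print(agmt, sorted(csns))
```

Running the same function over the error log from each of the four masters would show whether every supplier really is stuck on a single CSN, and which one.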
This morning, I took down all of the replication agreements, and reinitialized every host from one, which I temporarily treated as the authoritative master. Within minutes, these messages were appearing again.
Does anyone have any ideas how to solve this once and for all? I've rebuilt my replication agreements countless times, and nothing seems to get them in sync. Any and all ideas are welcome. Thanks.
Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
----------------------------
Never send mail to thobrux@nebrwesleyan.edu
Does it definitely replicate a few changes correctly before the problem starts? It reminds me of a problem that used to occur with an earlier 6.21 release, but in that case the first change would not be replicated (changelog empty with no anchor at the head of the list), and the second would produce the error you're seeing.
I don't think it would help diagnose the problem beyond confirming that the change identified by that CSN really is missing, but if you're interested you can inspect the changelog by running the dbscan tool on the <instance root>/changelogdb/<replica name>.db4 file. You should have as many .db4 files as you have replicas. You can also make the server dump it using the CL2LDIF task (see the template-cl-dump.pl script; it requires perldap).
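When comparing the logged CSN against changelog output, it can help to split the CSN into its parts. The sketch below assumes the usual 20-hex-digit layout: 8 digits of timestamp (seconds since the epoch), 4 of sequence number, 4 of replica ID, and 4 of subsequence number. The CSN type and parse_csn helper are illustrative names, not part of any server tool.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime   # when the change was made (UTC)
    seqnum: int           # sequence number within that second
    replica_id: int       # ID of the replica where the change originated
    subseqnum: int        # subsequence number


def parse_csn(text):
    """Split a 20-hex-digit CSN string into its four fields.

    Assumed layout: 8 hex digits timestamp, 4 seqnum, 4 replica ID, 4 subseqnum.
    """
    if not re.fullmatch(r"[0-9a-fA-F]{20}", text):
        raise ValueError(f"not a 20-hex-digit CSN: {text!r}")
    seconds = int(text[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


if __name__ == "__main__":
    # The CSN from the error messages quoted in this thread.
    print(parse_csn("458acc0e000000020000"))
```

Under that assumed layout, the CSN from the log decodes to a change made on replica ID 2 on 21 December 2006 (UTC), the day before the errors were logged, which indicates which master's changes to look for in the changelog.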
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users
Ulf Weltman wrote:
Does it definitely replicate a few changes correctly before the problem starts? It reminds me of a problem that used to occur with an earlier 6.21 release, but in that case the first change would not be replicated (changelog empty with no anchor at the head of the list), and the second would produce the error you're seeing.
I don't think it would help diagnose the problem beyond confirming that the change identified by that CSN really is missing, but if you're interested you can inspect the changelog by running the dbscan tool on the <instance root>/changelogdb/<replica name>.db4 file. You should have as many .db4 files as you have replicas. You can also make the server dump it using the CL2LDIF task (see the template-cl-dump.pl script; it requires perldap).
I agree it sounds like some bad changelog juju. It should be possible to nuke the changelog databases on all the replicas (doesn't a re-init from a supplier do this?).
389-users@lists.fedoraproject.org