On Fri, 2018-05-18 at 18:39 +0000, Fong, Trevor wrote:
Hi Everyone,
Huzzah! I've finally licked the slow (and erratic) replication
between our 1.2 -> 1.3 clusters!
The problem was that when I was setting up the 1.3 cluster, I'd done
it with a view to replacing the 1.2 cluster. On that assumption, I'd
set the cluster up in isolation. Everything worked as it was supposed
to, but it didn't occur to me to give the masters replica IDs
different from those in the 1.2 cluster. When I hooked the 1.3
cluster up to the 1.2 cluster, replication into the 1.3 cluster was
slow and would sometimes just break.
Rebuilding the 1.3 cluster with unique replica IDs for all master
nodes across both clusters resolved the problem.
Great work finding this. I think we do say in the docs that you need
unique rids on all masters, but we don't enforce it at a programming
level.
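If you want to audit an existing topology for duplicates, something
like this run against each node should show the rid in use (a sketch,
assuming Directory Manager credentials; masters are the entries with
nsDS5ReplicaType: 3):

    ldapsearch -x -H ldap://host.example.com:389 \
        -D "cn=Directory Manager" -W -b cn=config \
        "(objectClass=nsDS5Replica)" nsDS5ReplicaId nsDS5ReplicaRoot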
TBH I really want rids to be allocated by the server tools - there
are some things in the works for this, but they are not ready yet. A
better idea would be for rids to just be GUIDs, but I don't think
Ludwig or I want to rewrite all of replication for that :)
>
> Thanks to everyone for their helpful comments.
> Trev
>
>
> On 2018-02-20, 4:13 PM, "Mark Reynolds" <mreynolds(a)redhat.com> wrote:
>
> On 02/20/2018 06:53 PM, William Brown wrote:
> > On Tue, 2018-02-20 at 23:36 +0000, Fong, Trevor wrote:
> >> Hi William,
> >>
> >> Thanks a lot for your reply.
> >>
> >> That's correct - replication schedule is not enabled.
> >> No - there are definitely changes to replicate - I know, I made
> >> the change myself (I changed the "description" attribute on an
> >> account), but it takes up to 15 mins for the change to appear in
> >> the 1.3 master.
> >> That master replicates to another master and a bunch of other
> >> hubs. Those hubs replicate amongst themselves and a bunch of
> >> consumers.
> > So, to check that my understanding is correct:
> >
> > 1.2 <-> 1.3 --> [ group of hubs/consumers ]
> >
> > Yes?
> >
> >> The update can take up to 15 mins to make it from the 1.2 master
> >> into the 1.3 master; but once it hits the 1.3 master, it is
> >> replicated around the 1.3 cluster within 1 sec.
> >>
> >> Only memberOf is excluded, via fractional replication.
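> >> For reference, that exclusion on the agreement entry looks
> >> something like this (a sketch - the attribute is real, but check
> >> your own agreement entry for the exact value):
> >>
> >>     nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf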
> >>
> >> Can anyone give me any guidance as to the settings of the
> >> "backoff" and other parameters? Any doc links that may be useful?
> > Mark? You wrote this, I can't remember what it's called ....
> Before we adjust the backoff min and max values, we need to
> determine why 1.2.11 is having a hard time updating 1.3.6. 1.3.6 is
> just receiving updates, so it's 1.2.11 that "seems" to be
> misbehaving. So... is there anything in the errors log on 1.2.11?
> It wouldn't hurt to check 1.3.6, but I think 1.2.11 is where we will
> find our answer.
>
> If there is nothing in the log, then turn on replication logging and
> do your test update. Once the update hits 1.3.6, turn replication
> logging off. Then we can look at the logs and see what happens with
> your test update.
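>
> A sketch of toggling that logging with ldapmodify (replication
> debugging is error log level 8192; setting the level back to 0
> afterwards restores the default):
>
>     ldapmodify -x -D "cn=Directory Manager" -W <<EOF
>     dn: cn=config
>     changetype: modify
>     replace: nsslapd-errorlog-level
>     nsslapd-errorlog-level: 8192
>     EOF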
>
> But as requested, here is the backoff min & max info:
>
> http://www.port389.org/docs/389ds/design/replication-retry-settings.html
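>
> If I'm reading that page right, both knobs are global settings on
> cn=config (defaults of 3 and 300 seconds), e.g.:
>
>     dn: cn=config
>     changetype: modify
>     replace: nsds5ReplicaBackoffMin
>     nsds5ReplicaBackoffMin: 3
>     -
>     replace: nsds5ReplicaBackoffMax
>     nsds5ReplicaBackoffMax: 300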
>
> >
> >> Thanks a lot,
> >> Trev
> >>
> >>
> >> On 2018-02-18, 3:32 PM, "William Brown"
> >> <william(a)blackhats.net.au> wrote:
> >>
> >> On Sat, 2018-02-17 at 01:49 +0000, Fong, Trevor wrote:
> >> > Hi Everyone,
> >> >
> >> > I've set up a new 389 DS cluster (389-Directory/1.3.6.1
> >> > B2018.016.1710) and have set up a replication agreement from
> >> > our old cluster (389-Directory/1.2.11.15 B2014.300.2010) to a
> >> > master node in the new cluster. Problem is that updates in the
> >> > old cluster take up to 15 mins to make it into the new cluster.
> >> > We need it to be near instantaneous, like it normally is. Any
> >> > ideas what I can check?
> >>
> >> I am assuming you don't have a replication schedule enabled?
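> >>
> >> (If one were set, it would appear on the agreement entry as
> >> something like the following - value made up for illustration:
> >>
> >>     nsDS5ReplicaUpdateSchedule: 0800-2200 0123456
> >>
> >> so no such attribute means no schedule.)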
> >>
> >> In LDAP, replication is always "eventual", so a delay isn't
> >> harmful.
> >>
> >> But there are many things that can influence this. Ludwig is the
> >> expert, and I expect he'll comment here.
> >>
> >> Only one master may be "replicating" to a server at a time. So if
> >> your 1.3 server is replicating with other servers, then your 1.2
> >> server may have to "wait its turn".
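> >>
> >> (If memory serves, newer 1.3 versions can also bound how long one
> >> supplier holds a consumer before yielding, via a setting on the
> >> replica entry - something like:
> >>
> >>     nsds5ReplicaReleaseTimeout: 60
> >>
> >> though I'd have to check which releases have it.)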
> >>
> >> There is a replication 'backoff' timer that sets how long it
> >> waits between retry attempts, and it scales those attempts too.
> >> I'm not sure if 1.2 has this or not, though.
> >>
> >> Another reason could be that there are no changes to be
> >> replicated - replication only runs when there is something to do.
> >> So your 1.2 server may have no changes, or it could be eliminating
> >> the changes with fractional replication.
> >>
> >> Finally, it's very noisy, but you could consider enabling
> >> replication logging to check what's happening.
> >>
> >> I hope that helps,
> >>
> >>
> >>
> >> >
> >> > Thanks a lot,
> >> > Trev
> >> >
> >> > _________________________________________________
> >> > Trevor Fong
> >> > Senior Programmer Analyst
> >> > Information Technology | Engage. Envision. Enable.
> >> > The University of British Columbia
> >> > trevor.fong(a)ubc.ca | 1-604-827-5247 | it.ubc.ca
> >> >
> >> --
> >> Thanks,
> >>
> >> William Brown