Idea to make replication a bit cleaner
by William Brown
Hi,
I was discussing with some staff here in BNE about replication.
It seems a common case is that admins with 2 or 3 servers in MMR (both DS and IPA) will do this:
* Set up all three masters A, B, C (replica ids 1, 2, 3 respectively)
* Run them for a while in replication
* Remove C from replication
* Delete data, change the system
* Re-add C with the same replica id.
Supposedly this can cause duplicate RUV entries for id 3 on masters A and B. Of course, this means that replication has all
kinds of insane issues at this point ....
On one hand, this is the admin's fault. But on the other, we should handle this. Consider an admin who re-uses an IPA replica
setup file without running CLEANALLRUV.
So, I have an idea for this. Any change to a replication agreement should trigger a CLEANALLRUV before we start the
agreement. This means that on our local master we remove the bad RUV first, and then we can add the RUV of the newly added master
when needed ....
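To make the proposal concrete, here is a minimal sketch in plain Python. It is a toy model only: it does not use lib389 or any server code, it does not reflect how the server actually stores the RUV, and the names `ToyMaster`, `RuvElement`, `clean_ruv` and the example.com URLs are all made up for illustration. It just contrasts "add the agreement as-is" with "clean the id first, then add".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RuvElement:
    """Toy stand-in for one RUV entry (hypothetical, not the server's format)."""
    replica_id: int
    url: str


@dataclass
class ToyMaster:
    """Toy master that only tracks which replica ids it has RUV entries for."""
    name: str
    ruv: List[RuvElement] = field(default_factory=list)

    def add_agreement_naive(self, replica_id: int, url: str) -> None:
        # No cleanup: a stale entry for the same id is left behind.
        self.ruv.append(RuvElement(replica_id, url))

    def clean_ruv(self, replica_id: int) -> None:
        # Stand-in for the local effect of CLEANALLRUV: drop every entry for this id.
        self.ruv = [e for e in self.ruv if e.replica_id != replica_id]

    def add_agreement_clean_first(self, replica_id: int, url: str) -> None:
        # The proposal: clean first, then record the new master.
        self.clean_ruv(replica_id)
        self.ruv.append(RuvElement(replica_id, url))

    def duplicate_ids(self) -> List[int]:
        # Report every replica id that appears more than once.
        seen, dups = set(), set()
        for element in self.ruv:
            if element.replica_id in seen:
                dups.add(element.replica_id)
            seen.add(element.replica_id)
        return sorted(dups)
```

In this toy model, re-adding C with id 3 through `add_agreement_naive` leaves two entries for id 3 on A, while `add_agreement_clean_first` leaves exactly one.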
What do you think? I think that we must handle this better, and it should be a non-issue to admins.
We can't prevent an admin from intentionally adding duplicate IDs to the topology though. Making the IDs not
admin-controlled would prevent this, but I haven't any good ideas about this (yet).
--
Sincerely,
William Brown
Software Engineer
Red Hat, Brisbane
Re: Ticket 48798 - CI and lib389 tests fail
by Simon Pichugin
Hi William,
On Thu, Jun 09, 2016 at 10:12:29AM +1000, William Brown wrote:
> On Wed, 2016-06-08 at 17:52 +0200, Simon Pichugin wrote:
> > Hi William,
> >
> > I am troubleshooting failures in the tickets.
> > Both tickets/ticket48798_test.py and lib389/tests/nss_ssl_test.py
> > fail because of the same problem.
> > As I understand it, this is because of a class design issue (lib389/nss_ssl.py).
> >
> > Can you please take a look? Maybe you've already faced this issue and
> > can help me with the problem, so it gets resolved faster. :)
> >
> > Please, find the log output in the attachment.
> >
> > Thanks,
> > Simon
>
> I haven't seen this issue before. "works for me" right, so it's not a bug? ;)
Did you test it on a clean environment?
>
> Joking aside, looking at that trace, the assert failing is that the CA failed to validate post create.
>
> # Check if ca exists. Should be false.
> assert(topology.standalone.nss_ssl._rsa_ca_exists() is False)
> # Create it. Should work.
> assert(topology.standalone.nss_ssl.create_rsa_ca() is True)
> # Check if ca exists. Should be true
> > assert(topology.standalone.nss_ssl._rsa_ca_exists() is True)
> E assert <bound method NssSsl._rsa_ca_exists of <lib389.nss_ssl.NssSsl object at 0x7f13b4de3ed0>>() is True
> E + where <bound method NssSsl._rsa_ca_exists of <lib389.nss_ssl.NssSsl object at 0x7f13b4de3ed0>> =
> <lib389.nss_ssl.NssSsl object at 0x7f13b4de3ed0>._rsa_ca_exists
> E + where <lib389.nss_ssl.NssSsl object at 0x7f13b4de3ed0> = <lib389.DirSrv instance at 0x7f13b553dbd8>.nss_ssl
> E + where <lib389.DirSrv instance at 0x7f13b553dbd8> = <lib389.tests.nss_ssl_test.TopologyStandalone object at
> 0x7f13b4df8210>.standalone
>
> lib389/tests/nss_ssl_test.py:71: AssertionError
>
>
> I would think the error is occurring in:
>
> assert(topology.standalone.nss_ssl.create_rsa_ca() is True)
>
> This may erroneously be returning True.
>
> It would be worth preventing the instance from being removed, and checking the output of the ssl directory.
>
> Have a look at say (depending on your install prefix ...):
>
> cd [/opt/dirsrv]/etc/dirsrv/slapd-standalone
> certutil -L -d .
>
> You could also dump the result of the check call, or even the command line string it uses and run it by hand. Look at line 147 of nss_ssl.py. Maybe we could add some better logging in / around these parts for the future, in case we hit this error again?
>
> The reason I think the error is in create_rsa_ca is that in _rsa_ca_exists() there is basically no error checking. It's designed to "fail fast" in the case there is no CA or DB. Because it's returning a "False", which triggers the assert, it means the CA check is probably working, and telling the truth.
>
>
> Does that help? If you need anything else, let me know,
So I am in the middle of investigating, but I am already drained today,
so I will share what I've found and go to sleep.
Certutil shows that the CA cert was successfully added.
If I comment out only "#assert(topology.standalone.nss_ssl._rsa_ca_exists() is False)",
then "assert(topology.standalone.nss_ssl._rsa_ca_exists() is True)" passes.
If I additionally comment out "#assert(topology.standalone.nss_ssl._rsa_key_and_cert_exists() is False)",
then "assert(topology.standalone.nss_ssl._rsa_ca_exists() is True)" fails again.
And it is pretty weird. I think all of this happens because the
NssSsl class is not constructed properly (something got mixed up with "bound",
"unbound" and "static" methods AND/OR something is wrong with nss_init being
opened in every function and not closed). But I am still not sure where the
problem is, these are only guesses. :)
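As an illustration of the second guess only (an init that is opened in every function and never closed), here is a minimal sketch of pairing init and shutdown with a context manager. This is not lib389 or NSS code: `FakeCertDb`, `open_db` and the nickname `example-ca` are hypothetical stand-ins, and nothing here confirms that this is the actual cause of the failure.

```python
from contextlib import contextmanager


class FakeCertDb:
    """Hypothetical stand-in for a library with global init/shutdown state."""

    def __init__(self, nicknames):
        self.nicknames = set(nicknames)
        self.open_handles = 0

    def init(self):
        self.open_handles += 1

    def shutdown(self):
        self.open_handles -= 1

    def has_cert(self, nickname):
        # Lookups are only valid while the database is initialised.
        if self.open_handles <= 0:
            raise RuntimeError("database is not initialised")
        return nickname in self.nicknames


@contextmanager
def open_db(db):
    # Guarantees shutdown() runs even if the lookup raises.
    db.init()
    try:
        yield db
    finally:
        db.shutdown()


def ca_exists_leaky(db, nickname):
    # Opens the database and never closes it: state leaks into the next call.
    db.init()
    return db.has_cert(nickname)


def ca_exists_paired(db, nickname):
    # Every init is matched by a shutdown, so calls do not affect each other.
    with open_db(db) as handle:
        return handle.has_cert(nickname)
```

With the leaky variant each call leaves one more handle open, so the result of a later check could depend on which checks ran before it; with the paired variant the handle count is back to zero after every call.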
Additionally, I get the following error:
self = <lib389.config.Config object at 0x7f418cdf50d0>, secport = 636, secargs = {'nsSSL3Ciphers': '+all'}

    def enable_ssl(self, secport=636, secargs=None):
        """Configure SSL support into cn=encryption,cn=config.

        secargs is a dict like {
            'nsSSLPersonalitySSL': 'Server-Cert'
        }
        """
>       if self.deprecation_strict:
E       AttributeError: 'Config' object has no attribute 'deprecation_strict'
But I haven't looked into this yet. If you want, I'll do it tomorrow.
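For reference, the shape of this AttributeError can be reproduced outside lib389 with a few lines. The sketch below is not the real `Config` class: `BrokenConfig` and `TolerantConfig` are hypothetical, and the fallback value `False` is an assumption, since the trace does not say where `deprecation_strict` is meant to be set or what its default should be.

```python
class BrokenConfig:
    """Reads an attribute that nothing ever assigned."""

    def enable_ssl(self, secport=636, secargs=None):
        # Raises AttributeError because deprecation_strict was never set.
        if self.deprecation_strict:
            raise RuntimeError("deprecated call")
        return secport


class TolerantConfig:
    """Same method, but falls back to an assumed default."""

    def enable_ssl(self, secport=636, secargs=None):
        # getattr with a default avoids the AttributeError.
        # False is an assumed default, not taken from lib389.
        if getattr(self, "deprecation_strict", False):
            raise RuntimeError("deprecated call")
        return secport


def raises_attribute_error(config):
    # Helper for the comparison: True if enable_ssl() hits the missing attribute.
    try:
        config.enable_ssl()
    except AttributeError:
        return True
    return False
```

Whether the right fix is a fallback like this or setting the attribute wherever the object is constructed depends on how `deprecation_strict` is supposed to reach `Config`, which the trace alone does not show.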
Thanks,
Simon
>
>
> --
> Sincerely,
>
> William Brown
> Software Engineer
> Red Hat, Brisbane
>