Could this be part of the problem now?

[root@ipa-12 master]# ipa-replica-manage list ipa-12.company.internal
ipa-11.company.internal: replica
ipa-13.company.internal: replica
[root@ipa-12 master]# ipa-replica-manage list ipa-13.company.internal
ipa-11.company.internal: replica
ipa-12.company.internal: replica
[root@ipa-12 master]# ipa-replica-manage list ipa-11.company.internal
Cannot find ipa-11.company.internal in public server list

I can't use ipa-replica-manage to remove the broken links to ipa-11, says I must use `ipa topologysegment-*`. As far as I can tell, the segments don't exist to remove for ipa11 since it's only half set up ... 

On Mon, May 7, 2018 at 6:40 PM, Jonathan Vaughn <jonathan@creatuity.com> wrote:
Still trying to figure this out. It looks like slapd is dying, I thought it was still running for some reason.

slapd is dying to segfault. strace of it happening doesn't seem to reveal much:

18:32:41.543717 (+     0.000801) openat(AT_FDCWD, "/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 32
18:32:41.544907 (+     0.001195) getpid() = 16014
18:32:41.545269 (+     0.000329) fstat64(32, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
18:32:41.545799 (+     0.000536) write(32, "16014\n", 6) = 6
18:32:41.546603 (+     0.000818) close(32) = 0
18:32:41.547061 (+     0.000448) chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 0644) = 0
18:32:41.547741 (+     0.000676) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=548030641}) = 0
18:32:41.548324 (+     0.000587) write(2, "[07/May/2018:18:32:41.548030641 "..., 134) = 134
18:32:41.551096 (+     0.002840) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=551287555}) = 0
18:32:41.551568 (+     0.000406) write(4, "[07/May/2018:18:32:41.551287555 "..., 134) = 134
18:32:41.552360 (+     0.000811) fsync(4) = 0
18:32:41.558499 (+     0.006170) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=558678099}) = 0
18:32:41.558901 (+     0.000350) write(2, "[07/May/2018:18:32:41.558678099 "..., 136) = 136
18:32:41.561537 (+     0.002680) clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961, tv_nsec=561718659}) = 0
18:32:41.562357 (+     0.000793) write(4, "[07/May/2018:18:32:41.561718659 "..., 136) = 136
18:32:41.563293 (+     0.001148) fsync(4) = 0
18:32:41.566928 (+     0.003452) getpid() = 16014
18:32:41.567712 (+     0.000752) socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 32
18:32:41.568628 (+     0.000912) getsockopt(32, SOL_SOCKET, SO_SNDBUF, [163840], [4]) = 0
18:32:41.568972 (+     0.000319) setsockopt(32, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
18:32:41.569548 (+     0.000589) setsockopt(32, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
18:32:41.570064 (+     0.000513) sendmsg(32, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=slapd started: Re"..., iov_len=69}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 69
18:32:41.570845 (+     0.000789) close(32) = 0
18:32:41.576358 (+     0.005575) futex(0xb30ed2e8, FUTEX_WAIT, 16016, NULL) = ?
18:33:01.730774 (+    20.154428) +++ killed by SIGSEGV +++

It doesn't die immediately after hitting that final futex call - it's just over 20s later. There's still plenty of free memory at the time.

# free -m
              total        used        free      shared  buff/cache   available
Mem:            974         222         156           0         594         679
Swap:          2535          24        2511

The last entries in the error log is that it is listening on socket / port. No error logged for the segfault.

I did see that during previous setup steps (before it dies) there is apparently an error setting up replication:

[07/May/2018:18:17:46.245041807 -0500] - ERR - NSMMReplicationPlugin - acquire_replica - agmt="cn=meToipa-12.company.internal" (ipa-12:5): Unable to acquire replica: permission denied. The bind dn "" does not have permission to supply replication updates to the replica. Will retry later.

Possibly, the failure to configure replication is leading to the segfault? Any ideas why it would not have permission / what to check to ensure it has permission ?


On Thu, May 3, 2018 at 4:02 PM, Jonathan Vaughn <jonathan@creatuity.com> wrote:
It's still running.

Here's the error log from slapd during that run:

[01/May/2018:19:22:24.567650453 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=42 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied
[01/May/2018:19:22:24.613143629 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=43 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied
[01/May/2018:19:22:31.445153171 -0500] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToipa-11.company.internal" (ipa-11:389)".
[01/May/2018:19:22:31.672485209 -0500] - NOTICE - NSMMReplicationPlugin - replica_subentry_check - Need to create replication keep alive entry <cn=repl keep alive 7,dc=company,dc=internal>
[01/May/2018:19:22:31.674922201 -0500] - INFO - NSMMReplicationPlugin - replica_subentry_create - add dn: cn=repl keep alive 7,dc=company,dc=internal
objectclass: top
objectclass: ldapsubentry
objectclass: extensibleObject
cn: repl keep alive 7
[01/May/2018:19:22:36.125109819 -0500] - ERR - NSMMReplicationPlugin - multimaster_extop_StartNSDS50ReplicationRequest - conn=44 op=5 replica="dc=company,dc=internal": Unable to acquire replica: error: permission denied
[01/May/2018:19:22:36.598178749 -0500] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meToipa-11.company.internal" (ipa-11:389)". Sent 503 entries.
[01/May/2018:19:27:40.482836683 -0500] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa-11.company.internal" (ipa-11:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()

ipa-11 is the Pi I'm trying to set up replication on. Looks like there was a problem setting up replication due to permissions? But the other replica started up just fine ... 

On Thu, May 3, 2018 at 3:13 PM, Rob Crittenden <rcritten@redhat.com> wrote:
Jonathan Vaughn wrote:
Here's the output from ipa-replica-install :

# ipa-replica-install
WARNING: conflicting time&date synchronization service 'chronyd' will
be disabled in favor of ntpd

Password for admin@COMPANY.INTERNAL:
Run connection check to master
Connection check OK
Configuring NTP daemon (ntpd)
   [1/4]: stopping ntpd
   [2/4]: writing configuration
   [3/4]: configuring ntpd to start on boot
   [4/4]: starting ntpd
Done configuring NTP daemon (ntpd).
Configuring directory server (dirsrv). Estimated time: 30 seconds
   [1/41]: creating directory server instance
   [2/41]: enabling ldapi
   [3/41]: configure autobind for root
   [4/41]: stopping directory server
   [5/41]: updating configuration in dse.ldif
   [6/41]: starting directory server
   [7/41]: adding default schema
   [8/41]: enabling memberof plugin
   [9/41]: enabling winsync plugin
   [10/41]: configuring replication version plugin
   [11/41]: enabling IPA enrollment plugin
   [12/41]: configuring uniqueness plugin
   [13/41]: configuring uuid plugin
   [14/41]: configuring modrdn plugin
   [15/41]: configuring DNS plugin
   [16/41]: enabling entryUSN plugin
   [17/41]: configuring lockout plugin
   [18/41]: configuring topology plugin
   [19/41]: creating indices
   [20/41]: enabling referential integrity plugin
   [21/41]: configuring certmap.conf
   [22/41]: configure new location for managed entries
   [23/41]: configure dirsrv ccache
   [24/41]: enabling SASL mapping fallback
   [25/41]: restarting directory server
   [26/41]: creating DS keytab
   [27/41]: ignore time skew for initial replication
   [28/41]: setting up initial replication
Starting replication, please wait until this has completed.
Update in progress, 11 seconds elapsed
Update succeeded

   [29/41]: prevent time skew after initial replication
   [30/41]: adding sasl mappings to the directory
   [31/41]: updating schema
   [32/41]: setting Auto Member configuration
   [33/41]: enabling S4U2Proxy delegation
   [error] NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

ipapython.admintool: ERROR    cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
ipapython.admintool: ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

And here's the /var/log/ipareplica-install.log from just before it fails to the failure:

2018-05-02T00:22:44Z DEBUG   [32/41]: setting Auto Member configuration
2018-05-02T00:22:44Z DEBUG Starting external process
2018-05-02T00:22:44Z DEBUG args=/usr/bin/ldapmodify -v -f /tmp/tmpt7bxf4x1 -H ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket -Y EXTERNAL
2018-05-02T00:22:45Z DEBUG Process finished, return code=0
2018-05-02T00:22:45Z DEBUG stdout=add nsslapd-pluginConfigArea:
         cn=automember,cn=etc,dc=company,dc=internal
modifying entry "cn=Auto Membership Plugin,cn=plugins,cn=config"
modify complete


2018-05-02T00:22:45Z DEBUG stderr=ldap_initialize( ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket/??base )
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0

2018-05-02T00:22:45Z DEBUG   duration: 0 seconds
2018-05-02T00:22:45Z DEBUG   [33/41]: enabling S4U2Proxy delegation
2018-05-02T00:22:45Z DEBUG Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 979, in error_handler
     yield
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry
     self.conn.modify_s(str(entry.dn), modlist)
   File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 600, in modify_s
     return self.modify_ext_s(dn,modlist,None,None)
   File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 573, in modify_ext_s
     resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all=1,timeout=self.timeout)
   File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 714, in result3
     resp_ctrl_classes=resp_ctrl_classes
   File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 721, in result4
     ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
   File "/usr/lib/python3.6/site-packages/ldap/ldapobject.py", line 294, in _ldap_call
     result = func(*args,**kwargs)
ldap.SERVER_DOWN: {'desc': "Can't contact LDAP server"}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation
     run_step(full_msg, method)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step
     method()
   File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 977, in __setup_s4u2proxy
     __add_principal('ipa-http-delegation', 'HTTP', self)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 973, in __add_principal
     api.Backend.ldap2.update_entry(entry)
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry
     self.conn.modify_s(str(entry.dn), modlist)
   File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
     self.gen.throw(type, value, traceback)
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1028, in error_handler
     error=info)
ipalib.errors.NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':

2018-05-02T00:22:45Z DEBUG   [error] NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
2018-05-02T00:22:45Z DEBUG Destroyed connection context.ldap2_2991558640
2018-05-02T00:22:45Z DEBUG Backing up system configuration file '/etc/ipa/default.conf'
2018-05-02T00:22:45Z DEBUG Saving Index File to '/var/lib/ipa/sysrestore/sysrestore.index'
2018-05-02T00:22:45Z DEBUG   File "/usr/lib/python3.6/site-packages/ipapython/admintool.py", line 174, in execute
     return_value = self.run()
   File "/usr/lib/python3.6/site-packages/ipapython/install/cli.py", line 319, in run
     cfgr.run()
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 364, in run
     self.execute()
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 388, in execute
     for _nothing in self._executor():
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner
     exc_handler(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception
     self._handle_exception(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
     six.reraise(*exc_info)
   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
     raise value
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner
     step()
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda>
     step = lambda: next(self.__gen)
   File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
     six.reraise(*exc_info)
   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
     raise value
   File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
     value = gen.send(prev_value)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 654, in _configure
     next(executor)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 430, in __runner
     exc_handler(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 459, in _handle_execute_exception
     self._handle_exception(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 517, in _handle_exception
     self.__parent._handle_exception(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
     six.reraise(*exc_info)
   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
     raise value
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 514, in _handle_exception
     super(ComponentBase, self)._handle_exception(exc_info)
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 449, in _handle_exception
     six.reraise(*exc_info)
   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
     raise value
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 420, in __runner
     step()
   File "/usr/lib/python3.6/site-packages/ipapython/install/core.py", line 417, in <lambda>
     step = lambda: next(self.__gen)
   File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
     six.reraise(*exc_info)
   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
     raise value
   File "/usr/lib/python3.6/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
     value = gen.send(prev_value)
   File "/usr/lib/python3.6/site-packages/ipapython/install/common.py", line 66, in _install
     for unused in self._installer(self.parent):
   File "/usr/lib/python3.6/site-packages/ipaserver/install/server/__init__.py", line 622, in main
     replica_install(self)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 388, in decorated
     func(installer)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 1407, in install
     pkcs12_info=dirsrv_pkcs12_info)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/server/replicainstall.py", line 110, in install_replica_ds
     setup_pkinit=not options.no_pkinit,
   File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 419, in create_replica
     self.start_creation(runtime=30)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 506, in start_creation
     run_step(full_msg, method)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 496, in run_step
     method()
   File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 977, in __setup_s4u2proxy
     __add_principal('ipa-http-delegation', 'HTTP', self)
   File "/usr/lib/python3.6/site-packages/ipaserver/install/dsinstance.py", line 973, in __add_principal
     api.Backend.ldap2.update_entry(entry)
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1572, in update_entry
     self.conn.modify_s(str(entry.dn), modlist)
   File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
     self.gen.throw(type, value, traceback)
   File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line 1028, in error_handler
     error=info)

2018-05-02T00:22:45Z DEBUG The ipa-replica-install command failed, exception: NetworkError: cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
2018-05-02T00:22:45Z ERROR cannot connect to 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket':
2018-05-02T00:22:45Z ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

Maybe it just is trying to soon and it hasn't fully started yet? Because now I see that the previous step ALSO shows the URL encoded socket path, and it worked fine - but it looks like the previous step used ldapmodify, but the failing step was accessing LDAP from Python? Perhaps it works fine one way and not the other ?

I won't rule it out but I doubt it. Is ns-slapd still running at this point? This could happen, for example, if ns-slapd crashed after enabling the automember plugin. The DS error log might have something to say as well.

rob


On Wed, May 2, 2018 at 3:29 PM, Rob Crittenden <rcritten@redhat.com <mailto:rcritten@redhat.com>> wrote:

    Jonathan Vaughn via FreeIPA-users wrote:

        Yes, I know, not recommended etc, low performance. I'm not going
        to run the CA on it. I just want to have a backup LDAP/Kerberos
        server.

        Right now I'm just trying to test things out. I've got a master
        and a replica (so you could say two masters I suppose) running
        in Virtualbox VMs, and I'm trying to set up a 3rd replica on a
        Pi. All are Fedors 27. I had to downgrade httpd due to
        https://pagure.io/freeipa/issue/7493
        <https://pagure.io/freeipa/issue/7493> to even set up the first
        VM replica, but this issue is separate.

        Currently, the problem is it can't connect to it's own LDAP
        instance due to some kind of error ... ipa-replica-install
        worked fine on the x86_64 VM but on the armv71 Pi 3B when it
        tries to connect to LDAPI instead of
        using 'ldapi:///var/run//slapd-COMPANY-INTERNAL.socket' it
        uses 'ldapi://%2Fvar%2Frun%2Fslapd-COMPANY-INTERNAL.socket'.

        So it seems there is yet another ARM (or non-x86_64) bug ...
        similar to the problem with httpd and passing the KRB5CCNAME
        properly https://pagure.io/freeipa/issue/7337
        <https://pagure.io/freeipa/issue/7337>

        Any ideas on where to look to patch in a fix to this so it uses
        the correct filename? The socket file is there ... and (at the
        time it tries) LDAP is running.


    What makes you think the ldapi URI is the problem?

    Can you share the logs?

    rob