Hello Jakub,

here the first lines of ldap_child.log

(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x0400): ldap_child started.
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000): context initialized
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): total buffer size: 81
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): realm_str size: 16
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): got realm_str: IPA.EXAMPLE.COM
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): princ_str size: 41
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): got princ_str: host/ipa-lx-test-debian9.ípa.example.com
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): keytab_name size: 0
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): lifetime: 86400
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x0200): Will run as [0][0].
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [privileged_krb5_setup] (0x2000): Kerberos context initialized
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000): Kerberos context initialized
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [become_user] (0x0200): Trying to become user [0][0].
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [become_user] (0x0200): Already user [0].
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000): Running as [0][0].
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000): getting TGT sync
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x2000): got realm_name: [IPA.EXAMPLE.COM]
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Principal name is: [host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM]
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Using keytab [MEMORY:/etc/krb5.keytab]
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Will canonicalize principals
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb] (0x4000): [2104] 1503497231.122092: Getting initial credentials for host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb] (0x4000): [2104] 1503497231.122313: Looked up etypes in keytab: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac
(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb] (0x4000): [2104] 1503497231.122451: Sending request (218 bytes) to IPA.EXAMPLE.COM
(Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Received signal [Terminated] [15], shutting down
(Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Unlink file [/var/lib/sss/db/ccache_IPA.EXAMPLE.COM_TmKHkD]
(Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2105]]]] [main] (0x0400): ldap_child started.
(Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2105]]]] [main] (0x2000): context initialized

We connect to IPA.EXAMPLE.COM, this is not helpfull. You can see, there is a delay of 5 seconds. Later in this file, you can see, we try to connect to second server ipa-lx-test-02.ípa.example.com.

(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357486: Getting initial credentials for host/ipa-lx-test-debian9.ípa.example.com@IPA.EXAMPLE.COM
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357715: Looked up etypes in keytab: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357842: Sending request (218 bytes) to IPA.EXAMPLE.COM
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.360574: Resolving hostname ipa-lx-test-02.ípa.example.com.
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.362141: Sending initial UDP request to dgram <ip_of_ipa-lx-test-02>:88
(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.367725: Received answer (346 bytes) from dgram <ip_of_ipa-lx-test-02>:88
(Wed Aug 23 16:10:19 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497419.374601: Response was from master KDC

I think this was my test with ‘kinit admin@IPA.EXAMPLE.COM’. You can find whole log file attached. From my point of view, failover to second KDC is not fast enough and we should set some timeouts in sssd.conf or krb5.conf ?

krb5.conf from ipa-lx-test-debian9:

#File modified by ipa-client-install

includedir /var/lib/sss/pubconf/krb5.include.d/

[libdefaults]
  default_realm = IPA.EXAMPLE.COM
  dns_lookup_realm = true
  dns_lookup_kdc = true
  rdns = false
  ticket_lifetime = 24h
  forwardable = yes
  default_ccache_name = KEYRING:persistent:%{uid}

[realms]
  IPA.EXAMPLE.COM = {
    pkinit_anchors = FILE:/etc/ipa/ca.crt
  }

[domain_realm]
  .ipa.example.com = IPA.EXAMPLE.COM
  ipa.example.com = IPA.EXAMPLE.COM

Regards,

Michael

Am 23.08.2017 um 22:20 schrieb Jakub Hrozek via FreeIPA-users:

On Wed, Aug 23, 2017 at 05:13:13PM +0200, Michael Gusek via FreeIPA-users wrote:
Hi,

we are testing a FreeIPA trust to an Active Directory. Trust itself
works, we are happy. Now we tested a failure on FreeIPA site. We have
two instances, both with same roles. If we poweroff first installed
server, and clean sssd caches with restart of sssd on client side , sssd
service can’t establish a connection to second instance.

ipa-lx-test-01.ipa.example.com is the first installed FreeIPA with
ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
ipa-lx-test-02.ipa.example.com is the second installed FreeIPA with
ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
ipa-lx-test-debian9.ipa.example.com is a latest Debian 9.1 with sssd
1.15.0-3

For deeper inspection full log is attached. In logs we found something
like this:
OK, so as you say communication with the KDC failed:
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [be_resolve_server_process] (0x0200): Found address for server ipa-lx-test-02.ipa.example.com: [x.x.x.x] TTL 1200                                       
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get TGT...                                                                              
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [create_tgt_req_send_buffer] (0x0400): buffer size: 81                                                                                                  
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [2104]                                                                             
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [child_handler_setup] (0x2000): Signal handler set up for pid [2104]                                                                                    
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt child                                                                               
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_process_result] (0x2000): Trace: sh[0x558252c35750], connected[1], ops[(nil)], ldap[0x558252c130c0]                                               
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list                                                                                          
(Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [write_pipe_handler] (0x0400): All data has been sent!                                                                                                  
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [get_tgt_timeout_handler] (0x4000): timeout for sending SIGTERM to tgt child [2104] reached.                                                            
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [get_tgt_timeout_handler] (0x0400): Setting 2 seconds timeout for sending SIGKILL to tgt child                                                          
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [read_pipe_handler] (0x0400): EOF received, client finished                                                                                             
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] (0x1000): Waiting for child [2104].                                                                                                 
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] (0x0020): child [2104] failed with status [7].                                                                                      
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_callback] (0x0020): LDAP child was terminated due to timeout                                                                                     
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [sdap_kinit_done] (0x0080): Communication with KDC timed out, trying the next one                                                                       
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: ../src/providers/ldap/sdap_async_connection.c: sdap_kinit_done: 1207  
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa-lx-test-02.ipa.example.com' as 'not working'                                             
(Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] (0x0400): Marking port 389 of duplicate server 'ipa-lx-test-02.ipa.example.com' as 'not working' 

Could you check in the ldap_child.log which KDC did SSSD try to talk to
and what takes so long?
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org

--

________________________________________________


Michael Gusek | System Administrator | Webtrekk GmbH |
t +49 30 755 415 302 | f +49 30 755 415 100 | w www.webtrekk.com
Amtsgericht/Local Court Berlin, HRB 93435 B | Geschäftsführer/CEO Christian Sauer und Wolf Lichtenstein