Clearing the sssd cache makes the AD login work for a short while, but it's probably not necessary, nor is it "production" ready. Looking at /var/log/sssd/sssd_domain.ad.com, I do see offline messages:

(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_mark_offline] (0x2000): Going offline!
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_mark_offline] (0x2000): Enable check_if_online_ptask.
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_ptask_enable] (0x0400): Task [Check if online (periodic)]: enabling task
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 65 seconds from now [1502119252]
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [ipa_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [ipa_subdomains_refresh_connect_done] (0x0080): No IPA server is available, cannot get the subdomain list while offline
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_ptask_schedule] (0x0400): Task [Subdomains Refresh]: scheduling task 14400 seconds from now [1502133587]
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [sdap_id_release_conn_data] (0x4000): releasing unused connection
(Mon Aug  7 15:19:47 2017) [sssd[be[domain.ad.com]]] [be_ptask_online_cb] (0x0400): Back end is online
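
To pull just these transitions out of a long log, something like the following could help. It is only a sketch: the function names are made up for illustration, and the patterns are simply the message texts that appear in the excerpt above, so other SSSD versions may word them differently.

```shell
# List the offline/online transitions recorded in an SSSD domain log.
# $1: path to the domain log file (pass whatever the real path is).
sssd_transitions() {
    # Message texts taken from the excerpt above; matching is case-sensitive.
    grep -E 'Going offline|Back end is online' "$1"
}

# Count how many times the back end was marked offline.
# $1: path to the domain log file.
sssd_offline_count() {
    grep -c '\[be_mark_offline\].*Going offline!' "$1"
}
```

Comparing the timestamps printed by sssd_transitions with the times of failed logins should show whether the failures line up with the offline periods.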

I uploaded the full log file /var/log/sssd/sssd_domain.ad.com here: https://1drv.ms/f/s!AlZwwyQE2ZZ5p2ZmHLzmeKN7mBJ3

Both of my IPA servers look healthy. The AD trust agent/controller server roles are installed on both.

ipa trustdomain-find ad.com does return all of my AD domains on both IPA servers.

Thanks,
Alex








On Sun, Aug 6, 2017 at 11:07 AM, Jakub Hrozek <jhrozek@redhat.com> wrote:

On 4 Aug 2017, at 23:08, Alexandre Pitre via FreeIPA-users <freeipa-users@lists.fedorahosted.org> wrote:

Turns out, I'm still getting the same problem. It works right away after I force clean the sssd cache: systemctl stop sssd ; rm -f /var/lib/sss/db/* /var/log/sssd/* ; systemctl start sssd

After some time, when I try to log back in to the same system, the login prompt appears much more quickly when I type aduser@ad.com.
Instead of a simple "Password:" prompt, I get "aduser@ad.com@centos.domain.ad.com's password".

If I log in as root, stop sssd, clean the cache and start sssd again, it starts working again.


Are you sure cleaning the cache is needed? Because I think your issue is different. The fact that you get a faster login prompt and the “Server not found…” message both point to the sssd going offline.

You could run ‘sssctl domain-status’ to show if the domain is online or offline (requires the ‘ifp’ service to be enabled until RHEL-7.4/upstream 1.15.x) or look into the logs for messages like “Going offline”.

/var/log/messages is filled with:

centos sssd_be: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server krbtgt/AD.COM@IPA.AD.COM not found in Kerberos database)

This is the trust principal. Are you sure all your replicas are either trust agents or you ran “ipa-adtrust-install” on them?



Any thoughts?

Thanks,
Alex


On Tue, Aug 1, 2017 at 2:58 AM, Jakub Hrozek <jhrozek@redhat.com> wrote:
On Mon, Jul 31, 2017 at 05:47:11PM -0400, Alexandre Pitre wrote:
> Bullseye, Jakub, that did the trick. I should have posted for help on the
> mailing list sooner. Thank you so much, you are saving my ass.
>
> It makes sense to increase the krb5_auth_timeout as my AD domain
> controllers are spread worldwide. Currently they exist in 3 regions: North
> America, Europe and Asia.
>
> The weird thing is that when a Linux host tries to authenticate against
> my AD, it seems to just randomly select an AD DC from the _kerberos SRV
> records. Normally, on the Windows side, if "Sites and Services" is set up
> correctly, with subnets defined and bound to sites, a Windows client
> shouldn't try to authenticate against an AD DC that isn't local to its
> site. This mechanism doesn't seem to apply to my Linux hosts. Is it
> because it's only available for Windows hosts? Is there another way to
> force Linux clients to authenticate against an AD DC local to their site?

We haven't implemented the site selection for the clients yet, only for
servers, see:
    https://bugzilla.redhat.com/show_bug.cgi?id=1416528
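
To see which DCs a client could end up talking to, the SRV answers can be listed by hand. The helper below is only an illustration (srv_targets is a made-up name): it reads answers in the "priority weight port target" form printed by `dig +short -t SRV` and prints host:port, lowest priority value first. It does not reproduce how sssd itself picks a server.

```shell
# Print "host:port" for each SRV answer read from stdin, lowest priority first.
# Expected input, one answer per line: "priority weight port target."
srv_targets() {
    sort -n -k1,1 | awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}
```

For example: dig +short -t SRV _kerberos._udp.ad.com | srv_targets (the exact record name is an assumption; use whichever _kerberos record your clients actually resolve).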

>
> For now, I set the krb5_auth_timeout to 120 seconds. I had to completely
> stop sssd and start it again. A colleague mentioned that sssd apparently
> has a known issue with restart.

I'm not aware of any such issue.
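
For anyone reading the archive later, the timeout change discussed above would look roughly like the fragment this helper prints. Treat it as a sketch: krb5_timeout_snippet is a made-up name, and the section name domain.ad.com is assumed from the log file name, so use whatever your own [domain/...] section in /etc/sssd/sssd.conf is called.

```shell
# Print an sssd.conf fragment that sets krb5_auth_timeout for one domain.
# $1: SSSD domain name (the part after "domain/" in the section header)
# $2: timeout in seconds
krb5_timeout_snippet() {
    printf '[domain/%s]\nkrb5_auth_timeout = %s\n' "$1" "$2"
}

# Example: krb5_timeout_snippet domain.ad.com 120
```

The krb5_auth_timeout line belongs in the existing domain section rather than in a second copy of it, and sssd has to be restarted to pick the change up.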

>
> Also, I'm curious about port requirements. Going from Linux hosts to AD, I
> only allow port 88 TCP/UDP. I believe that's all I need.

Yes, from the clients, that should be enough. The servers need more
ports open:
    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/installing-ipa.html#prereq-ports







--
Alexandre Pitre
alexandre.pitre@gmail.com