Justin,

If it's https://krbdev.mit.edu/rt/Ticket/Display.html?id=9037, then it's even nastier to prove positively than just dialing up the SSSD debug level. The minimum debug level that produces verbose 'adcli update' output is 7, and running at that level for even a few days swamps /var/log (or whichever filesystem houses /var/log/sssd/*).

You can fine-tune this in sssd.conf with debug_level = 0x0100, which gives just the desired 'adcli update' verbosity and not much else. You can also tune the default logrotate.d settings to rotate the logs more frequently.
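For fleets where sssd.conf is edited by tooling, a minimal Python sketch of applying that setting to the domain section might look like this. It assumes Python 3 and a file that configparser can parse. The helper name set_domain_options is hypothetical, and debug_level = 0x0100 is simply the value suggested above. The logrotate side is not covered here.

```python
import configparser
import io


def set_domain_options(conf_text: str, domain: str, options: dict) -> str:
    """Return a copy of sssd.conf text with `options` set in [domain/<domain>].

    Caveat: configparser drops comments and re-spaces lines, so diff the
    result before writing it back to /etc/sssd/sssd.conf.
    """
    parser = configparser.ConfigParser(
        interpolation=None,  # values like /home/%d/%u contain a literal '%'
        delimiters=("=",),   # sssd.conf only uses '=' between key and value
    )
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)

    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"sssd.conf has no [{section}] section")
    for key, value in options.items():
        parser.set(section, key, value)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For example, `set_domain_options(text, "wago.local", {"debug_level": "0x0100"})` adds `debug_level = 0x0100` under `[domain/wago.local]`. SSSD then needs a restart (for example `systemctl restart sssd`) to pick up the new level.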

However, this bug is quite infrequent, and the 'adcli update' verbosity alone is not enough to determine exactly what's going on.

Ultimately, we had to disable the 30-day 'adcli update' in sssd.conf and write our own crontab file that fires every 3-4 days. In that cron job we wrapped the manual 'adcli update' with tcpdump to get a raw packet capture, which is how we finally got a full capture showing this race condition. We also run 'adcli update' with KRB5_TRACE enabled so that we get the full verbose krb5 output.
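The attached script is not reproduced here. A minimal Python sketch of the same idea follows. It assumes several things:

* Automatic renewal is already disabled in sssd.conf, for example with ad_maximum_machine_account_password_age = 0, which sssd-ad(5) documents as disabling the renewal attempt.
* The script runs as root.
* adcli, tcpdump and MIT krb5 are installed.

The function names, output directory, file names, capture filter and sleep intervals are all hypothetical choices and are not taken from the attached script. The `--computer-password-lifetime=0` flag reflects our reading of adcli(8): it forces a password change even when the current password is younger than adcli's default lifetime.

```python
import os
import signal
import subprocess
import time
from datetime import datetime
from pathlib import Path

# Assumed capture filter: DNS (DC discovery), Kerberos, LDAP/CLDAP and kpasswd.
CAPTURE_FILTER = "port 53 or port 88 or port 389 or port 464"


def build_capture_plan(domain: str, outdir: str, stamp: str):
    """Return (tcpdump argv, adcli argv, environment) for one traced run."""
    out = Path(outdir)
    pcap = out / f"adcli-update-{stamp}.pcap"
    trace = out / f"adcli-update-{stamp}.krb5trace"
    tcpdump_argv = ["tcpdump", "-i", "any", "-s", "0", "-U",
                    "-w", str(pcap), CAPTURE_FILTER]
    # Force a password change even though the password is younger than
    # adcli's default lifetime (otherwise a 3-4 day cron run is a no-op).
    adcli_argv = ["adcli", "update", f"--domain={domain}", "--verbose",
                  "--computer-password-lifetime=0"]
    # KRB5_TRACE makes MIT krb5 write its trace log to this file.
    env = dict(os.environ, KRB5_TRACE=str(trace))
    return tcpdump_argv, adcli_argv, env


def run_capture(domain: str, outdir: str = "/var/tmp/adcli-capture") -> int:
    """Run one 'adcli update' under tcpdump and KRB5_TRACE (needs root)."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    tcpdump_argv, adcli_argv, env = build_capture_plan(domain, outdir, stamp)
    capture = subprocess.Popen(tcpdump_argv)
    try:
        time.sleep(2)  # give tcpdump time to start capturing
        result = subprocess.run(adcli_argv, env=env)
    finally:
        time.sleep(2)  # let trailing packets reach the capture file
        capture.send_signal(signal.SIGINT)  # ask tcpdump to stop cleanly
        capture.wait(timeout=30)
    return result.returncode
```

The cron job would then call a small script that runs `run_capture("wago.local")`. When a run fails, its .pcap and .krb5trace files share the same timestamp, which makes it easy to pull the matching pair.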

Attached is the simple shell script, wrapping 'adcli update', that this cron job calls.

We had to push this cron job out to thousands of servers and update their machine account passwords every 3-4 days to obtain 2-3 packet captures of failing clients. The race condition is that infrequent: it occurs in 0.3-0.4% of all 'adcli update' invocations.
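A back-of-the-envelope calculation shows why a single host is no use for catching this and why the rollout had to be so wide. It treats each run as an independent trial with failure probability p. That is an assumption, since a race condition need not be independent across runs. The value N = 1000 is only an illustrative stand-in for "thousands".

```latex
\begin{aligned}
\mathbb{E}[\text{runs until first failure}] &= \frac{1}{p} \approx 250 \text{ to } 333, && p \in [0.003,\ 0.004] \\
\mathbb{E}[\text{wait on one host}] &= \frac{\Delta}{p} \approx 750 \text{ to } 1333 \text{ days}, && \Delta = 3 \text{ to } 4 \text{ days between runs} \\
\mathbb{E}[\text{failures per round}] &= Np \approx 3 \text{ to } 4, && N = 1000 \text{ hosts (illustrative)}
\end{aligned}
```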

Almost all of these ideas came from this sssd mailing list (such as disabling automatic password renewal and running 'adcli update' from cron).

I'm not convinced that Sebastian's situation is this bug, so he might be able to get away with debug_level = 0x0100 to see what his problem is.

Spike





On Wed, Jan 19, 2022 at 9:15 AM Justin Stephenson <jstephen@redhat.com> wrote:
Hi,

It sounds like a problem occurs when SSSD executes 'adcli update' to
renew the machine account password. If that succeeds, the password of
the computer object on the AD DC is updated and the new keys are
written to the keytab. If a failure occurs, however, it may leave
these two out of sync.
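The out-of-sync state can be checked mechanically. The minimal sketch below compares the KVNO that krb5_child complains about with the highest KVNO actually present in the keytab. It assumes plain `klist -k` output (no `-t` or `-e` columns) and journal text passed in as strings. The function names and regexes are hypothetical and only match the message format in the log Sebastian posted. For his data it reports that the KDC used kvno 11 while the keytab tops out at 10, meaning AD holds a newer password than the keytab, which fits the explanation above.

```python
import re
from typing import Dict, Optional, Tuple

# krb5_child message as it appears in the journal, e.g.
# "Cannot find key for LC015564$@WAGO.LOCAL kvno 11 in keytab"
MISSING_KEY = re.compile(r"Cannot find key for (\S+) kvno (\d+) in keytab")
# A data row of plain `klist -k` output, e.g. "  10 host/LC015564@WAGO.LOCAL"
KEYTAB_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s*$")


def keytab_max_kvno(klist_output: str) -> Dict[str, int]:
    """Highest KVNO present in the keytab for each principal."""
    best: Dict[str, int] = {}
    for line in klist_output.splitlines():
        m = KEYTAB_ROW.match(line)
        if m:
            kvno, principal = int(m.group(1)), m.group(2)
            best[principal] = max(kvno, best.get(principal, 0))
    return best


def find_kvno_mismatches(klist_output: str,
                         log_text: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map principal -> (KVNO the KDC used, highest KVNO in keytab or None)."""
    keytab = keytab_max_kvno(klist_output)
    return {
        principal: (int(kvno), keytab.get(principal))
        for principal, kvno in MISSING_KEY.findall(log_text)
    }
```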

You may need to set a high enough 'debug_level' in the
[domain/$domain] section of sssd.conf, then check the adcli output
written to the domain logs when the issue happens.

-Justin

On Wed, Jan 19, 2022 at 5:40 AM Sebastian Grebe
<sebastian.grebe@wago.com> wrote:
>
> Hello,
>
> We are getting reports from users who suddenly can’t authenticate to their Linux computers anymore. These computers are joined to our MS domain using adcli and sssd. Checking the log reveals that the Kerberos keys stored in /etc/krb5.keytab do not have the expected KVNO. At the moment we can’t tell what’s causing the issue. It happens only sporadically. I’m under the impression that only computers without a permanent network connection (laptops) are affected.
>
> The log shows:
>
> Jan 11 09:30:52 lc015564 systemd[1]: Starting System Security Services Daemon...
> Jan 11 09:30:52 lc015564 sssd[1376]: Starting up
> Jan 11 09:30:52 lc015564 sssd_be[1609]: Starting up
> Jan 11 09:30:52 lc015564 sssd_ifp[1633]: Starting up
> Jan 11 09:30:52 lc015564 systemd[1]: Started System Security Services Daemon.
> Jan 11 09:30:55 lc015564 sssd_be[1609]: Backend is offline
> Jan 11 09:49:32 lc015564 sssd_be[1609]: Backend is online
> Jan 11 09:49:41 lc015564 krb5_child[6111]: Cannot find key for LC015564$@WAGO.LOCAL kvno 11 in keytab
> Jan 11 09:49:41 lc015564 krb5_child[6111]: Cannot find key for LC015564$@WAGO.LOCAL kvno 11 in keytab
> Jan 11 09:49:49 lc015564 adcli[6102]: GSSAPI client step 1
> Jan 11 09:49:49 lc015564 adcli[6102]: GSSAPI client step 1
> Jan 11 09:49:50 lc015564 adcli[6102]: GSSAPI client step 1
> Jan 11 10:00:57 lc015564 krb5_child[6838]: Cannot find key for LC015564$@WAGO.LOCAL kvno 11 in keytab
> Jan 11 10:00:57 lc015564 krb5_child[6838]: Cannot find key for LC015564$@WAGO.LOCAL kvno 11 in keytab
>
> And klist -k shows:
>
> Keytab name: FILE:/etc/krb5.keytab
> KVNO Principal
> ---- --------------------------------------------------------------------------
>   10 LC015564$@WAGO.LOCAL
>   10 LC015564$@WAGO.LOCAL
>   10 LC015564$@WAGO.LOCAL
>   10 host/LC015564@WAGO.LOCAL
>   10 host/LC015564@WAGO.LOCAL
>   10 host/LC015564@WAGO.LOCAL
>   10 host/lc015564.wago.local@WAGO.LOCAL
>   10 host/lc015564.wago.local@WAGO.LOCAL
>   10 host/lc015564.wago.local@WAGO.LOCAL
>   10 RestrictedKrbHost/LC015564@WAGO.LOCAL
>   10 RestrictedKrbHost/LC015564@WAGO.LOCAL
>   10 RestrictedKrbHost/LC015564@WAGO.LOCAL
>   10 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>   10 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>   10 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>    9 LC015564$@WAGO.LOCAL
>    9 LC015564$@WAGO.LOCAL
>    9 LC015564$@WAGO.LOCAL
>    9 host/LC015564@WAGO.LOCAL
>    9 host/LC015564@WAGO.LOCAL
>    9 host/LC015564@WAGO.LOCAL
>    9 host/lc015564.wago.local@WAGO.LOCAL
>    9 host/lc015564.wago.local@WAGO.LOCAL
>    9 host/lc015564.wago.local@WAGO.LOCAL
>    9 RestrictedKrbHost/LC015564@WAGO.LOCAL
>    9 RestrictedKrbHost/LC015564@WAGO.LOCAL
>    9 RestrictedKrbHost/LC015564@WAGO.LOCAL
>    9 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>    9 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>    9 RestrictedKrbHost/lc015564.wago.local@WAGO.LOCAL
>
> This is our sssd.conf (it's from a different computer):
>
> [sssd]
> domains = wago.local
> config_file_version = 2
> services = ifp
>
> [domain/wago.local]
> default_shell = /bin/bash
> fallback_homedir = /home/%d/%u
> cache_credentials = true
> krb5_store_password_if_offline = true
> krb5_realm = WAGO.LOCAL
> krb5_ccname_template = /tmp/krb5cc_%U
> realmd_tags = manages-system joined-with-adcli
> id_provider = ad
> access_provider = ad
> ad_domain = wago.local
> ad_enabled_domains = wago.local
> ad_hostname = lc017547.wago.local
> use_fully_qualified_names = false
> ldap_id_mapping = true
> ldap_user_gecos = displayName
> ldap_use_tokengroups = false
> ldap_search_base = dc=wago,dc=local?subtree?
> ldap_user_search_base = ou=User,ou=Minden,ou=Germany,dc=wago,dc=local?subtree??ou=User,ou=Administration,dc=wago,dc=local?onelevel?(&(objectClass=user)(cn=a2*))?ou=Service,dc=wago,dc=local?subtree?
> ldap_group_search_base = cn=Users,dc=wago,dc=local?onelevel?(&(objectClass=group)(cn=Domain Users))?ou=Groups,ou=Minden,ou=Germany,dc=wago,dc=local?onelevel?(&(objectClass=group)(cn=&01-PC-Support))
> ldap_netgroup_search_base = cn=Users,dc=wago,dc=local?onelevel?
> ignore_group_members = true
> enumerate = false
> dyndns_update = true
> dyndns_refresh_interval = 7200
> dyndns_update_ptr = true
> dyndns_server = 10.1.100.2
> case_sensitive = Preserving
>
> [nss]
> filter_users = root
> filter_groups = root
>
> [pam]
> offline_credentials_expiration = 0
> offline_failed_login_attempts = 3
> offline_failed_login_delay = 5
>
> And the krb5.conf:
>
> [libdefaults]
> ticket_lifetime = 240:00:00
> renew_lifetime = 240:00:00
> clock_skew = 300
> renewable = true
> default_ccache_name = FILE:/tmp/krb5cc_%{uid}
> default_realm = WAGO.LOCAL
> kdc_timesync = 1
> ccache_type = 4
> forwardable = true
> proxiable = true
> udp_preference_limit = 1
> noaddresses = true
> fcc-mit-ticketflags = true
> [realms]
> WAGO.LOCAL = {
>   admin_server = 10.1.101.200
>   admin_server = 10.1.100.1
>   admin_server = 10.1.100.253
>   admin_server = 10.1.100.2
> }
> [domain_realm]
> .wago.local = WAGO.LOCAL
> wago.local  = WAGO.LOCAL
> [login]
> krb4_convert = true
> krb4_get_tickets = false
>
> To fix the issue, we remove the affected computers from the domain, delete /etc/krb5.keytab and rejoin them.
> _______________________________________________
> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
> To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure