Re: "Child not responding" on loaded servers
by Patrick Coleman
On 1 May 2016 at 17:04, Jakub Hrozek <jhrozek(a)redhat.com> wrote:
>> On 30 Apr 2016, at 10:28, Patrick Coleman <patrick.coleman(a)meraki.com> wrote:
>> On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" <lslebodn(a)redhat.com> wrote:
>> >
>> > Do you mean IO related load or CPU related load?
>>
>> Lots of both, but we're typically IO bound more of the time.
>>
>> > If there is issue with CPU then you can mount sssd cache to tmpfs
>> > to avoid such issues. (there are plans to improve it in 1.14)
>>
>> Cool, I'll give that a go.
>
> Alternatively, increase the 'timeout' option in sssd's sections.
I appreciate the advice, thank you. I've put /var/lib/sss on a tmpfs
filesystem on a couple of loaded machines and seen what I believe to
be improvements. It's a little too early to say, but I'll report back
once I have a wider deployment.
I did want to feed back a little of our research into this issue. If
we strace the sssd_be subprocess on a loaded machine, we see it
sitting in msync() and fdatasync() for periods of up to 7.3 seconds in
one test. This is perhaps expected, given the machine is under heavy
IO load, but sssd makes a *lot* of these calls.
In a 7m 49.985s test (this is as long as the sssd_be process lasts
before it is killed by the parent for not replying to ping) on a
machine with moderate disk load and no new interactive logins, sssd
made 232 *sync calls. The median syscall takes only 67ms, but the
maximum is more than seven seconds - in the eight minute test sssd
spent 1m 00.044s in *sync system calls.
My (naive) analysis here is that the backend process is spending 13%
of its time unavailable to service account queries, because it's doing
cache maintenance. This seems to rather defeat the point of having a
cache... are my assumptions correct here? I'm happy to send the strace
log (and any other data) to interested parties off-list, just let me
know.
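For anyone who wants to check the arithmetic, here is a minimal sketch
in Python using only the figures quoted above. The helper name
to_seconds is made up for illustration; nothing here reads the actual
strace log.

```python
import re

def to_seconds(text):
    """Convert a duration such as '7m 49.985s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))

# Figures quoted in the message above.
test_length = to_seconds("7m 49.985s")   # lifetime of the sssd_be process
time_in_sync = to_seconds("1m 00.044s")  # total time spent in *sync calls
sync_calls = 232

fraction = time_in_sync / test_length
mean_ms = time_in_sync / sync_calls * 1000

print(f"{fraction:.1%} of the test was spent in *sync calls")
print(f"mean call length: {mean_ms:.0f} ms")
```

The mean call length comes out well above the 67ms median, which is
consistent with a small number of very slow calls accounting for most
of the lost time.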
In an attempt to improve this behaviour, in addition to a tmpfs for
/var/lib/sss I've also just added the following to the nss and pam
stanzas in the config:
memcache_timeout = 1800
entry_cache_timeout = 1800
...the idea being they will respond from their own cache without
contacting the backend, which may be busy per the above. Is this
reasonable?
Cheers,
Patrick
7 years, 11 months
sssd, win server 2012, samba4 share, sid
by Stefan Fuhrmann
Hello all,
I have a Windows Server 2012 machine with AD, and CentOS 7.2 with samba4
as a client.
On the CentOS client I want to set up a CIFS share with Active Directory
authentication. I configured everything, and "id" and "getent" are working.
I read that I have to configure permissions on the samba share with Windows
Explorer. I can do that, but after closing the Security tab and reopening it
in Windows Explorer, only Windows SIDs are shown in the Security tab. Please
have a look at the attached screenshot.
sssd.conf:
[sssd]
services = nss, pam
config_file_version = 2
domains = samba
debug_level = 9
[nss]
filter_users = root
filter_groups = root
[pam]
[domain/samba]
ad_hostname = centi.samba.dance
ad_server = dc.samba.dance
ad_domain = samba
default_shell = /bin/bash
override_homedir = /home/%u
ldap_schema = ad
id_provider = ad
access_provider = ad
# on large directories, you may want to disable enumeration for performance reasons
enumerate = true
cache_credentials = true
auth_provider = krb5
chpass_provider = krb5
ldap_sasl_mech = GSSAPI
ldap_sasl_authid = centi$@SAMBA.DANCE
krb5_realm = SAMBA.DANCE
krb5_server = dc.samba.dance
krb5_kpasswd = dc.samba.dance
krb5_keytab = /etc/krb5.keytab
ldap_krb5_init_creds = true
ldap_referrals = false
ldap_uri = ldap://dc.samba.dance
ldap_search_base = dc=samba,dc=dance
dyndns_update=false
ldap_id_mapping=true
I searched the web and books, and played around with ID mapping.
I also asked on the samba mailing list, but no one there could help.
Nothing has helped.
How do I get the Windows usernames shown in the Security tab?
Can someone help?
TIA
Stefan
7 years, 11 months
re_expression
by Frank Ritchie
Hi,
I am trying to construct a re_expression to match
FOO+username
where FOO+ is a string literal.
Does anyone have an example?
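One way to try out a candidate pattern before putting it in sssd.conf
is a quick check in Python, whose re module accepts the same
(?P<name>...) named-group syntax that the sssd.conf man page uses in
its re_expression examples. This is only a sketch: the pattern below is
an assumption, and whether SSSD accepts it without a (?P<domain>...)
group has not been tested here.

```python
import re

# Hypothetical pattern: the literal prefix "FOO+" followed by the user name.
# The "+" is escaped so that it is not read as a quantifier on the second "O".
PATTERN = re.compile(r"^FOO\+(?P<name>.+)$")

def extract_name(login):
    """Return the user name part of 'FOO+username', or None if no match."""
    match = PATTERN.match(login)
    return match.group("name") if match else None

print(extract_name("FOO+username"))
print(extract_name("FOOOusername"))
```

Without the backslash, FOO+ would mean "FO followed by one or more O",
so a string such as FOOOusername would match and FOO+username would
not.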
7 years, 11 months
SSSD AD multiple domain problem and solution / workaround
by Johan Postema
Hi,
Using SSSD 1.14.2 on RHEL6, users from a domain other than the joined
one are only resolved when the domain is specified. As an example:
Joined domain "northamerica", the user uniq_user_A@northamerica can be
resolved using: getent passwd uniq_user_A
But uniq_user_B in domain "europe" can ONLY be resolved using: getent
passwd europe\\uniq_user_B
I would expect that getent passwd uniq_user_B would also work
(see my configuration file attached below).
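The same check can be scripted. Below is a minimal sketch in Python:
pwd.getpwnam goes through NSS in the same way getent passwd does, so it
should see whatever SSSD returns. The helper name resolves is made up
for illustration, and the account names are the example names from
above.

```python
import pwd

def resolves(name):
    """Return True if the name resolves through NSS, like `getent passwd NAME`."""
    try:
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True

# Example account names from the description above; the raw string holds a
# single backslash, which is what the shell passes on for europe\\uniq_user_B.
for candidate in ("uniq_user_B", r"europe\uniq_user_B"):
    print(f"{candidate}: {'resolved' if resolves(candidate) else 'not found'}")
```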
Diving into the sssd log files, it seems that when specifying just
"uniq_user_B", the DCs for the europe domain are contacted, which SSSD
can't access since that domain is not in the kerberos keytab. When
specifying europe\\uniq_user_B, SSSD seems to contact the DCs for the
northamerica domain instead; that is the domain it's joined to and the
one that is in the kerberos keytab.
To solve this issue I also added the europe DCs to the keytab by
changing the domain/realm in smb.conf and krb5.conf to europe and
re-running the net ads join command. Once they are added, and thus also
listed by klist -k, I can resolve users in both domains without
specifying their domain.
Like: getent passwd uniq_user_B
I wonder if this is the normal behaviour, because if the server is
joined to the northamerica domain and getent passwd europe\\uniq_user_B
works, I would expect getent passwd uniq_user_B to work as well,
without having to add extra domains to the keytab.
The sssd.conf I used:
[sssd]
services = nss, pam
config_file_version = 2
debug_level = 7
domains = northamerica.example.net,europe.example.net
default_shell = /bin/bash
[nss]
debug_level = 7
default_shell = /bin/bash
filter_users = root
filter_groups = root
reconnection_retries = 3
entry_cache_timeout = 300
entry_cache_nowait_percentage = 75
override_shell = /bin/bash
[pam]
debug_level = 7
[domain/northamerica.example.net]
id_provider = ad
subdomains_provider = none
ad_domain = northamerica.example.net
krb5_realm = NORTHAMERICA.EXAMPLE.NET
use_fully_qualified_names = False
debug_level = 7
auth_provider = ad
chpass_provider = ad
access_provider = ad
cache_credentials = true
ldap_idmap_range_size = 2000000
ldap_idmap_default_domain_sid = S-1-5-21-1757981266-299502267-1801674531
ldap_idmap_default_domain = northamerica.example.net
[domain/europe.example.net]
id_provider = ad
subdomains_provider = none
ad_domain = europe.example.net
krb5_realm = EUROPE.EXAMPLE.NET
use_fully_qualified_names = False
debug_level = 7
auth_provider = ad
chpass_provider = ad
access_provider = ad
cache_credentials = true
ldap_idmap_range_size = 2000000
ldap_idmap_default_domain_sid = S-1-5-21-507921405-813497703-1202660629
ldap_idmap_default_domain = europe.example.net
The krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = NORTHAMERICA.EXAMPLE.NET
dns_lookup_realm = true
dns_lookup_kdc = true
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
Johan Postema.
7 years, 11 months
Re: "Child not responding" on loaded servers
by Patrick Coleman
On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" <lslebodn(a)redhat.com> wrote:
>
> On (29/04/16 17:56), Patrick Coleman wrote:
> >Hi,
> >
> >We've got a number of machines using sssd to connect to LDAP for auth.
> >In the past we've had problems with sssd crashing regularly[1], but
> >after posting here we built some custom packages to disable netlink
> >notifications from the kernel, and it's generally improved.
> >
> >We're still seeing auth failures across random machines - perhaps 1-2%
> >when we run a process which connects to all hosts. The machines are
> >generally heavily loaded when this happens, and sssd.log looks like:
> >
> Do you mean IO related load or CPU related load?
Lots of both, but we're typically IO bound more of the time.
> If there is issue with CPU then you can mount sssd cache to tmpfs
> to avoid such issues. (there are plans to improve it in 1.14)
Cool, I'll give that a go.
Cheers
Patrick
7 years, 11 months