On Wed, Feb 17, 2016 at 11:45:36AM +0100, Jakub Hrozek wrote:
Hi,
I would like to get some opinions on where I'm heading with the
performance enhancements for 1.14. Please note this is /not/ a complete
design page. The goal is to just identify some blockers first before I
spend more time working on this feature, even though I already discussed
the page with some developers (thanks!).
If we agree this is the way to go, I will polish the design page as I
work on the feature.
I've started the design page here:
https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImpro...
For your convenience, I've included the text below as well:
= Feature Name =
SSSD Performance enhancements for the 1.14 release
Related ticket(s):
* https://fedorahosted.org/sssd/ticket/2602
* https://fedorahosted.org/sssd/ticket/2062
=== Problem statement ===
At the moment SSSD doesn't perform well in large environments. Most of
the use-cases we've had reported revolved around logins of users who are
members of large groups or of a large number of groups. Another reported
use-case was the time it takes to resolve a large group.
While workarounds are available for some of the issues (such as using
`ignore_group_members` for resolution of large groups), our goal is to be
able to perform well without these workarounds.
=== Use cases ===
* A user who is a member of a large number of AD groups logs in to a Linux server that is
a member of the AD domain.
* A user who is a member of a large number of AD or IPA groups logs in to a Linux server
that is a member of an IPA domain with a trust relationship to an AD domain.
* An administrator of a Linux server runs "ls -l" in a directory where files are
owned by a large group. An example would be a group called "students" in a
university setup.
=== Overview of the solution ===
During performance analysis with systemtap, we found out that the biggest
delay happens when SSSD writes an entry to the cache. We can't skip cache
writes completely, even if no attributes changed, because we also store the
expiration timestamps in the cache. Also, even if only a single attribute (like
the timestamp) changes, ldb needs to unpack the whole entry, change
the record, pack it back and then write the whole blob.
In order to mitigate the costly cache writes, we should avoid writing the
whole cache entry on every cache update.
To achieve this, we will split the monolithic ldb file representing the
sysdb cache into two ldb files. One would contain the entry itself and would
be fully synchronous. The other (new one) would only contain the timestamps
and would be opened with the `LDB_FLG_NOSYNC` flag to avoid synchronous cache writes.
It would be nice to see some data here to illustrate the potential
improvement. E.g. calling 'id ad_user' after 'sss_cache -E' would be an
expensive operation if ad_user is a member of many groups. If
nothing has changed on the server side, there should be a considerable
difference between the two versions.
I hope this is not too much effort, but I would suggest creating an
instrumented build where you check in sysdb_set_entry_attr() whether only
timestamp attributes would be written and, in that case, skip the
ldb_modify and just return EOK. The results should be better than with an
additional database, so they should show the most we can gain here.
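A minimal sketch of the check such an instrumented build might use is below. Everything in it is an assumption for illustration: the helper name `only_ts_attrs()` is hypothetical, the attribute list is a guess at which attributes count as timestamps, and the names are compared with an exact match for simplicity. It does not touch ldb or sysdb at all; it only shows the decision that would guard the ldb_modify call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of timestamp attributes; the real list in sysdb may differ. */
static const char *const ts_attrs[] = {
    "lastUpdate",
    "dataExpireTimestamp",
    NULL
};

/* True if the attribute name is one of the assumed timestamp attributes. */
static bool is_ts_attr(const char *name)
{
    for (size_t i = 0; ts_attrs[i] != NULL; i++) {
        if (strcmp(name, ts_attrs[i]) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical helper: returns true only when every attribute that is
 * about to be written is a timestamp attribute. An empty write returns
 * false so that the caller never skips a write by accident.
 */
bool only_ts_attrs(const char *const *names, size_t num)
{
    if (num == 0) {
        return false;
    }
    for (size_t i = 0; i < num; i++) {
        if (!is_ts_attr(names[i])) {
            return false;
        }
    }
    return true;
}
```

In the instrumented build, a true result would mean returning EOK without calling ldb_modify; any other result would take the normal write path.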
This would have two advantages:
1. If we detect that the entry hasn't changed on the LDAP server at all, we could
avoid writing into the main ldb cache, which would still be costly.
1. The writes to the new async ldb cache would be much faster, because the entry is
smaller and because the writes wouldn't call `fsync()` due to using the async flag,
but would rather rely on the underlying filesystem to sync the data to the disk.
On SSSD shutdown, we would write a canary to the cache, denoting graceful
shutdown. On SSSD startup, if the canary wasn't found, we would just ditch
the timestamp cache, which would result in refresh and write of the entry
on the next lookup.
Do you expect corruption on the attribute level, i.e. the ldb can be
opened successfully, an entry can be read successfully, but a timestamp
attribute has a corrupted value such as 0xffffffff which would prevent
proper updates of the related object? I think only in this case would an
unconditional removal of the LDB_FLG_NOSYNC cache be needed. If
this cannot happen, a full search over the ldb file, e.g. a search for a
non-existing attribute, would be sufficient to make sure the database is
not corrupted. And although this might lead to a slightly longer startup
time, it might save more time later by avoiding unneeded updates to the
main cache.
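The two startup policies discussed above can be put side by side in a small sketch. All names in it are hypothetical and nothing here calls ldb: `canary_found` stands for "the graceful-shutdown canary was present", and `scan_ok` stands for "a full search over the timestamp cache completed without errors". It assumes that attribute-level corruption cannot happen, which is exactly the open question.

```c
#include <stdbool.h>

/* What to do with the timestamp cache at startup. */
enum ts_cache_action {
    TS_CACHE_KEEP,
    TS_CACHE_DISCARD
};

/* Hypothetical policy switch between the two variants in this thread. */
enum ts_cache_policy {
    TS_POLICY_CANARY_ONLY,            /* missing canary: always discard */
    TS_POLICY_SCAN_ON_MISSING_CANARY  /* missing canary: keep if scan passes */
};

enum ts_cache_action ts_cache_startup_action(enum ts_cache_policy policy,
                                             bool canary_found,
                                             bool scan_ok)
{
    /* Graceful shutdown: the timestamp cache is trusted as-is. */
    if (canary_found) {
        return TS_CACHE_KEEP;
    }

    /* Unclean shutdown: keep the cache only if the policy allows a scan
     * and the scan found the file readable from start to end. */
    if (policy == TS_POLICY_SCAN_ON_MISSING_CANARY && scan_ok) {
        return TS_CACHE_KEEP;
    }

    /* Otherwise drop it; entries get refreshed on the next lookup. */
    return TS_CACHE_DISCARD;
}
```

The only difference between the two policies is the unclean-shutdown case, where the scan-based variant trades a longer startup for keeping timestamps that are still usable.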
Finally, a comment which is only slightly related. I wonder if, when
working on the needed sysdb changes, you can factor in that it might be
useful to allow writing other types of attributes to different database
files as well. I'm thinking here of writing e.g. the hashed password
(and maybe other credentials in the future) to a different file which is
only accessible by specific SSSD components. But please skip this idea if
you have a different plan for integrating the second cache where this
won't fit in.
bye,
Sumit