On Fri, Nov 25, 2016 at 11:05:49PM -0000, zfnoctis@gmail.com wrote:
Sorry for the delay, the holidays managed to take up more of my spare time than anticipated.
This was done on Fedora Workstation 25. Overall, sssd performance appears to be noticeably better than on Ubuntu 16.04/16.10 and Fedora 24. Please let me know if I made any mistakes running these tests, or if I need to run additional tests.
####################################################################################################
This was my quick attempt at the USE method in order to rule out other causes of performance issues.
####################################################################################################
task: id <domain user>
vmstat, pidstat, and top/htop are showing high amounts of CPU time being spent in user-space. Looks to be sssd_be.
e.g. from pidstat:
02:34:39 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
02:34:40 PM     0     12518  100.00    0.00    0.00  100.00     0  sssd_be
iostat is showing minimal I/O occurring, disks are mostly idle. Not seeing any errors or queuing.
nicstat shows network interfaces never go above 60% utilization. No network errors reported. No queuing.
free is not showing any serious memory pressure, no swapping is occurring.
Unsure how to measure interconnects between subsystems. May use perf/systemtap.
####################################################################################################
These runs were of id <domainuser>, and each was of a different user so as to hopefully avoid caching. Each time there were warnings of missing unwind data. Is there a set of packages I need to install to resolve this?
####################################################################################################
[root@fc25-vm systemtap]# stap id_perf.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libtevent.so.0.9.30'
Total run time of id was: 91098 ms
Number of zero-level cache transactions: 15
Time spent in level-0 sysdb transactions: 324 ms
Time spent writing to LDB: 93 ms
Number of LDAP searches: 83
Time spent waiting for LDAP: 1480105770919 ms
OK, the LDAP wait time is clearly wrong. Nonetheless, we also see that even though the time spent writing to the cache is quite low (324 ms), the total run was really long (90+ seconds).
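One possible explanation, offered as a guess rather than a diagnosis: the reported LDAP wait value is about the size of a Unix epoch timestamp in milliseconds, which is what a counter would hold if an end time were accumulated without a matching start time being subtracted. A minimal Python sketch to check the magnitude (the function name is made up for illustration):

```python
from datetime import datetime, timezone

def as_epoch_ms(value_ms):
    """Interpret a millisecond count as a Unix epoch timestamp (UTC)."""
    return datetime.fromtimestamp(value_ms / 1000, tz=timezone.utc)

# "Time spent waiting for LDAP" as printed by id_perf.stp in the run above.
reported_wait_ms = 1480105770919

ts = as_epoch_ms(reported_wait_ms)
print(ts.isoformat())
```

The value decodes to 25 November 2016 (UTC), the same date as the original mail, so it may well be a raw timestamp rather than a duration. Whether that is really what happens inside the script would need to be confirmed by reading the probes.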
We also see that the majority of time is spent resolving groups.
[...]
####################################################################################################
Caching appears to be taking effect here. Guessing I should clear the user cache, is there a best practice for that, rather than just clearing out the sss/db/ directory?
sss_cache -UG
####################################################################################################
[root@fc25-vm systemtap]# stap nested_group_perf.stp
^CTime spent in group sssd_be searches: 2033
Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%)
Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)
Breakdown of sdap_nested_group req (total: 1112 ms)
    sdap_nested_group_process req: 1111
    sdap_nested_group_process_split req: 11
    sdap_nested_group_check_cache: 11
    sdap_nested_group_sysdb_search_users: 6
    sdap_nested_group_sysdb_search_groups: 3
    ldap request breakdown of total 2144
    sdap_nested_group_deref req: 54
    sdap_deref_search_send req 54
    processing deref results: 0
    sdap_nested_group_lookup_user req: 979
    sdap_nested_group_lookup_group req: 66
    Time spent refreshing unknown members: 1045
Breakdown of results processing (total 699)
    Time spent populating nested members: 3
    Time spent searching ldb while populating nested members: 3
    Time spent saving nested members: 330
    Time spent writing to the ldb: 226 ms
####################################################################################################
This user is from a completely different department, so it likely didn't hit the cache much.
####################################################################################################
Yes, for 'best' (slowest in this respect) results, either expire or remove the cache.
[root@fc25-vm systemtap]# stap nested_group_perf.stp
^CTime spent in group sssd_be searches: 12502
Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%)
Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)
Breakdown of sdap_nested_group req (total: 4642 ms)
    sdap_nested_group_process req: 4971
    sdap_nested_group_process_split req: 53
    sdap_nested_group_check_cache: 51
    sdap_nested_group_sysdb_search_users: 17
    sdap_nested_group_sysdb_search_groups: 33
    ldap request breakdown of total 55789
    sdap_nested_group_deref req: 2908
    sdap_deref_search_send req 50266
Here the results are also a bit suspicious. The total time the script took is 12 seconds, but the sdap_deref_search_send request is recorded as taking 50+ seconds. Do you know which of these is closer to the real running time?
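To make this kind of inconsistency easy to spot in further runs, here is a small Python sketch that flags any sub-timer larger than the total it is supposed to be part of. It assumes all values are in milliseconds, which the script output does not state on every line, and the helper name is hypothetical:

```python
def suspicious_timers(total_ms, parts_ms):
    """Return the sub-timers whose value exceeds the enclosing total."""
    return {name: ms for name, ms in parts_ms.items() if ms > total_ms}

# Values copied from the second nested_group_perf.stp run quoted above.
search_total_ms = 12502  # "Time spent in group sssd_be searches"
parts = {
    "sdap_nested_group_process req": 4971,
    "sdap_nested_group_deref req": 2908,
    "sdap_deref_search_send req": 50266,
}
flagged = suspicious_timers(search_total_ms, parts)
print(flagged)

# The same check against the nested-group total of 4642 ms.
flagged_nested = suspicious_timers(4642, {"sdap_nested_group_process req": 4971})
print(flagged_nested)
```

With the numbers from this run, sdap_deref_search_send exceeds the overall search time, and sdap_nested_group_process (4971) is also larger than the sdap_nested_group total (4642) it is listed under.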
It also looks like we need to make the probes a bit more detailed. Some of the requests, like sdap_deref_search_send/_recv, include both the LDAP search and the results processing in the probed time; we should probably break those two down further.
Could you please run this nested_group_perf.stp script a couple more times with cache expired between the 'id' runs so that we can average the results better?
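For the averaging itself, something along these lines might save some manual work. It is only a sketch: it assumes each run's output has been saved as text, it only picks up lines of the form "label: number", and the function names are invented for this example. The sample data is the top of the two runs already posted.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches "label: 123"; labels containing "(" are skipped on purpose so that
# the "Breakdown of ... (total: N ms)" headings are not mistaken for timers.
LINE_RE = re.compile(r"^\s*(?P<label>[^:(]+):\s*(?P<value>\d+)")

def parse_run(text):
    """Extract {label: value} from one saved run of the stap script."""
    timings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("^C"):  # terminal echo of the interrupt
            line = line[2:]
        m = LINE_RE.match(line)
        if m:
            timings[m.group("label").strip()] = int(m.group("value"))
    return timings

def average_runs(runs):
    """Average every label that appears in the given runs."""
    collected = defaultdict(list)
    for run in runs:
        for label, value in parse_run(run).items():
            collected[label].append(value)
    return {label: mean(values) for label, values in collected.items()}

run_a = """^CTime spent in group sssd_be searches: 2033
Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%)
Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)"""

run_b = """^CTime spent in group sssd_be searches: 12502
Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%)
Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)"""

averages = average_runs([run_a, run_b])
for label, value in averages.items():
    print(f"{label}: {value:.1f} ms")
```

Lines without a colon before the number (for example "sdap_deref_search_send req 54") are not matched by this pattern and would need a looser expression.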
    processing deref results: 2
    sdap_nested_group_lookup_user req: 2138
    sdap_nested_group_lookup_group req: 623
    Time spent refreshing unknown members: 2762
Breakdown of results processing (total 1099)
    Time spent populating nested members: 29
    Time spent searching ldb while populating nested members: 23
    Time spent saving nested members: 635
    Time spent writing to the ldb: 303 ms