Hi,
I'm wondering if there are any plans to improve sssd performance on large active directory domains (100k+ users, 40k+ groups), or if there are settings I am not aware of that can greatly improve performance, specifically for workstation use cases.
Currently if I do not set "ignore_group_members = True" in sssd.conf, logins can take upwards of 6 minutes and "sssd_be" will max the CPU for up to 20 minutes after logon, which makes it a non-starter. The reason I want to allow group members to be seen is that I want certain domain groups to be able to perform elevated actions using polkit. If I ignore group members, polkit reports that the group is empty and so no one can elevate in the graphical environment.
Ultimately this means that Linux workstations are at a severe disadvantage since they cannot be bound to the domain and have the normal set of access features users and IT expect from macOS or Windows.
Distributions used: Ubuntu 16.04 (sssd 1.13.4-1ubuntu1.1), Ubuntu 16.10 (sssd 1.13.4-3) and Fedora 24 (sssd-1.13.4-3.fc24). All exhibit the same problems.
I've also tried "ldap_group_nesting_level = 1" without seeing any noticeable improvement with respect to performance. Putting the database on /tmp isn't viable as these are workstations that will reboot semi-frequently, and I don't believe this is an I/O bound performance issue anyways.
Thanks for your time.
On (19/11/16 17:57), zfnoctis@gmail.com wrote:
Hi,
I'm wondering if there are any plans to improve sssd performance on large active directory domains (100k+ users, 40k+ groups), or if there are settings I am not aware of that can greatly improve performance, specifically for workstation use cases.
Currently if I do not set "ignore_group_members = True" in sssd.conf, logins can take upwards of 6 minutes and "sssd_be" will max the CPU for up to 20 minutes after logon, which makes it a non-starter. The reason I want to allow group members to be seen is that I want certain domain groups to be able to perform elevated actions using polkit. If I ignore group members, polkit reports that the group is empty and so no one can elevate in the graphical environment.
Ultimately this means that Linux workstations are at a severe disadvantage since they cannot be bound to the domain and have the normal set of access features users and IT expect from macOS or Windows.
Distributions used: Ubuntu 16.04 (sssd 1.13.4-1ubuntu1.1), Ubuntu 16.10 (sssd 1.13.4-3) and Fedora 24 (sssd-1.13.4-3.fc24). All exhibit the same problems.
All supported version of fedora have sssd-1.14.x since 2016-08-19. I would recommed to test with 1.14.x. There were some performance related improvements.
I've also tried "ldap_group_nesting_level = 1" without seeing any noticeable improvement with respect to performance. Putting the database on /tmp isn't viable as these are workstations that will reboot semi-frequently, and I don't believe this is an I/O bound performance issue anyways.
ldap_group_nesting_level = 1 would not help for id_provider AD with default settings. Tokengroups are used by default there.
(100k+ users, 40k+ groups) is quite a lot.
What is a usual count of groups which user have? I assume 100+
LS
It's hard to give an accurate number, many groups have groups nested in them so it very well could be 60+ on average. A quick test with a fully up-to-date Fedora 24 shows no difference in performance.
I am a bit curious how macOS' directory services seem to handle this effortlessly but sssd just falls on its face.
On Sat, Nov 19, 2016 at 06:33:25PM -0000, zfnoctis@gmail.com wrote:
It's hard to give an accurate number, many groups have groups nested in them so it very well could be 60+ on average. A quick test with a fully up-to-date Fedora 24 shows no difference in performance.
Did you remove your cache before the test by chance? The performance improvements in 1.14.x only improve updates to already populated cache, not populating an empty cache.
If the performance is still bad with a cache that was already populated, could you run a systemtap script to check what is going on?
Sure, I'll just do a clean install and go from there. Where could I find the systemtap script(s) you'd like me to run?
On (20/11/16 18:54), zfnoctis@gmail.com wrote:
Sure, I'll just do a clean install and go from there. Where could I find the systemtap script(s) you'd like me to run?
They are available on sssd-1.14+ and on Fedora tehs are in the directory /usr/share/sssd/systemtap/
/usr/share/sssd/systemtap/id_perf.stp /usr/share/sssd/systemtap/nested_group_perf.stp
LS
What is the appropriate usage of these scripts? Should I run "stap nested_group_perf.stp" and in another terminal run id $username or su - $username ?.
Also I have setup a new Fedora 24 VM with sssd 1.14.2. I am not sure what is going on, but there appears to be some issues with group lookups and now it is affecting sudo. I have not encountered these issues on Ubuntu 16.04 with sssd 1.13 so far.
1. Adding an AD group, say, MY.DOMAIN\linuxadmins, to /etc/sudoers, "%MY.DOMAIN\linuxadmins ALL=(ALL) ALL" does not allow group members to elevate privileges. I have verified the users are in the group via id, and I have verified the group contains the users via "getent group MY.DOMAIN\linuxadmins". However simply adding the users directly to /etc/sudoers allows elevation.
2. Users are not seen as members of groups after attempting to elevate with sudo. As in id and getent group no longer show the user as belonging to that group. I also noticed the following error from systemctl status sssd (potentially same as fedorahosted.org/sssd/ticket/2982):
"Nov 21 10:25:00 fc24-vm.my.domain sssd[nss][729]: More groups have the same GID [965793034]in directory server. SSSD will not work correctly."
This appears to be resolved by modifying ldap_idmap settings. For example:
ldap_idmap_range_min=100000 ldap_idmap_range_max=2000100000 ldap_idmap_range_size=2000000000
I am now very concerned that 1.14 means I cannot use sssd for non-graphical environments if sudo cannot find group members.
I just noticed that everytime I attempt to use sudo as a domain user, I see in ssd_my.domain.log:
[ssd[be[my.domain]]] [watchdog_handler] (0x0010): Watchdog timer overflow, killing process!
This message appears precisely when sudo is done thinking, and prompts for a password.
I forgot to include sssd_nss.log
[sssd[nss]] [nss_cmd_getgrgid_search] (0x0010): getgrid call returned more than one result !?! [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Fatal]
On (21/11/16 18:09), zfnoctis@gmail.com wrote:
What is the appropriate usage of these scripts? Should I run "stap nested_group_perf.stp" and in another terminal run id $username or su - $username ?.
Yes, but you will also neet to install systemtap-devel on fedora. You might also use id_perf.stp just for id stap /usr/share/sssd/systemtap/id_perf.stp
It is not required to restart the script between successive id runs, the variables are cleared when systemtap detects id had started or finished.
The situation is little bit different for nested_group_perf.stp Please note that since the script is supposed to be used in scenarios such as tracing "id" performance, which typically involve multiple group requests. Therefore, the variables are not zeroed out and you need to interrupt the script manually with Ctrl+C.
Also I have setup a new Fedora 24 VM with sssd 1.14.2. I am not sure what is going on, but there appears to be some issues with group lookups and now it is affecting sudo. I have not encountered these issues on Ubuntu 16.04 with sssd 1.13 so far.
Adding an AD group, say, MY.DOMAIN\linuxadmins, to /etc/sudoers, "%MY.DOMAIN\linuxadmins ALL=(ALL) ALL" does not allow group members to elevate privileges. I have verified the users are in the group via id, and I have verified the group contains the users via "getent group MY.DOMAIN\linuxadmins". However simply adding the users directly to /etc/sudoers allows elevation.
Users are not seen as members of groups after attempting to elevate with sudo. As in id and getent group no longer show the user as belonging to that group.
I also noticed the following error from systemctl status sssd (potentially same as fedorahosted.org/sssd/ticket/2982):
"Nov 21 10:25:00 fc24-vm.my.domain sssd[nss][729]: More groups have the same GID [965793034]in directory server. SSSD will not work correctly."
This appears to be resolved by modifying ldap_idmap settings. For example:
ldap_idmap_range_min=100000 ldap_idmap_range_max=2000100000 ldap_idmap_range_size=2000000000
Have you changed id mapping related options before? If yes, did you remove sssd cache? rm -f /var/lib/sss/db/*
I do not see a reason why changing idmapping options should fix problem with "More groups have the same"
I am now very concerned that 1.14 means I cannot use sssd for non-graphical environments if sudo cannot find group members.
Missing groups might cause problems also for other programs and not just for sudo. Feel free to file a fedora BZ with sanitized sssd configuration and sssd log files. https://fedorahosted.org/sssd/wiki/Troubleshooting
LS
On Tue, Nov 22, 2016 at 09:23:39AM +0100, Lukas Slebodnik wrote:
On (21/11/16 18:09), zfnoctis@gmail.com wrote:
What is the appropriate usage of these scripts? Should I run "stap nested_group_perf.stp" and in another terminal run id $username or su - $username ?.
Yes, but you will also neet to install systemtap-devel on fedora. You might also use id_perf.stp just for id stap /usr/share/sssd/systemtap/id_perf.stp
Also iirc kernel-debuginfo, but this should be pulled by stap automatically (I think).
It is not required to restart the script between successive id runs, the variables are cleared when systemtap detects id had started or finished.
The situation is little bit different for nested_group_perf.stp Please note that since the script is supposed to be used in scenarios such as tracing "id" performance, which typically involve multiple group requests. Therefore, the variables are not zeroed out and you need to interrupt the script manually with Ctrl+C.
Also I have setup a new Fedora 24 VM with sssd 1.14.2. I am not sure what is going on, but there appears to be some issues with group lookups and now it is affecting sudo. I have not encountered these issues on Ubuntu 16.04 with sssd 1.13 so far.
Adding an AD group, say, MY.DOMAIN\linuxadmins, to /etc/sudoers, "%MY.DOMAIN\linuxadmins ALL=(ALL) ALL" does not allow group members to elevate privileges. I have verified the users are in the group via id, and I have verified the group contains the users via "getent group MY.DOMAIN\linuxadmins". However simply adding the users directly to /etc/sudoers allows elevation.
Users are not seen as members of groups after attempting to elevate with sudo. As in id and getent group no longer show the user as belonging to that group.
I also noticed the following error from systemctl status sssd (potentially same as fedorahosted.org/sssd/ticket/2982):
"Nov 21 10:25:00 fc24-vm.my.domain sssd[nss][729]: More groups have the same GID [965793034]in directory server. SSSD will not work correctly."
This appears to be resolved by modifying ldap_idmap settings. For example:
ldap_idmap_range_min=100000 ldap_idmap_range_max=2000100000 ldap_idmap_range_size=2000000000
Have you changed id mapping related options before? If yes, did you remove sssd cache? rm -f /var/lib/sss/db/*
I do not see a reason why changing idmapping options should fix problem with "More groups have the same"
I am now very concerned that 1.14 means I cannot use sssd for non-graphical environments if sudo cannot find group members.
Missing groups might cause problems also for other programs and not just for sudo. Feel free to file a fedora BZ with sanitized sssd configuration and sssd log files. https://fedorahosted.org/sssd/wiki/Troubleshooting
I think if this issue happens after the cache is cleared, it would be nice to peek at the ldb cache with ldbsearch to see which objects have the duplicate gid..
Regarding my previous issues, dyndns_update = True, combined with changing idmap settings resolved both issues for sudo and groups losing members from the cache.
I will use stap (and likely perf etc.) to go further into the performance issues tomorrow. Thanks again for your assistance.
On Wed, Nov 23, 2016 at 12:52:28AM -0000, zfnoctis@gmail.com wrote:
Regarding my previous issues, dyndns_update = True, combined with changing idmap settings resolved both issues for sudo and groups losing members from the cache.
I will use stap (and likely perf etc.) to go further into the performance issues tomorrow. Thanks again for your assistance.
Thank you, it would be nice to see some performance-related numbers from a real-world large deployment. We did some measurements and tests in the 1.14 development cycle in our lab environment, but obviously we need to do more.
Knowing from a real environment where to focus on would help quite a bit.
Sorry for the delay, the holidays managed to take up more of my spare time than anticipated.
This was done on Fedora workstation 25. Overall sssd performance appears to be noticeably better than Ubuntu 16.04/16.10 and Fedora 24. Please let me know if I made any mistakes running these tests, or if I need to run additional tests.
#################################################################################################### This was my quick attempt at the USE method in order to rule out other causes of performance issues. #################################################################################################### task: id <domain user>
vmstat, pidstat, and top/htop are showing high amounts of CPU time being spent in user-space. Looks to be sssd_be.
e.g. from pidstat 02:34:39 PM UID PID %usr %system %guest %CPU CPU Command 02:34:40 PM 0 12518 100.00 0.00 0.00 100.00 0 sssd_be
iostat is showing minimal I/O occurring, disks are mostly idle. Not seeing any errors or queuing.
nicstat shows network interfaces never go above 60% utilization. No network errors reported. No queuing.
free is not showing any serious memory pressure, no swapping is occurring.
Unsure how to measure interconnects between subsystems. May use perf/systemtap.
####################################################################################################
These runs were of id <domainuser> and each was of a different user so as to hopefully avoid caching. Each time there were warnings of missing unwind data, is there a set of packages I need to install to resolve this?
####################################################################################################
[root@fc25-vm systemtap]# stap id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libtevent.so.0.9.30' Total run time of id was: 91098 ms Number of zero-level cache transactions: 15 Time spent in level-0 sysdb transactions: 324 ms Time spent writing to LDB: 93 ms Number of LDAP searches: 83 Time spent waiting for LDAP: 1480105770919 ms LDAP searches breakdown: Number of user requests: 0 Time spent in user requests: 0
Number of group requests: 14 Time spent in group requests: 1933
Number of initgroups requests: 0 Time spent in initgroups requests: 0
Unaccounted time: -1480105680145 ms sysdb transaction breakdown: 1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:36 min: 36 max: 36 sum: 36
8 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:21 min: 19 max: 28 sum: 171
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:29 min: 21 max: 52 sum: 117
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 |@@@ 3 50 |@ 1 100 | 0 150 | 0
####################################################################################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/libexec/sssd/sssd_be' Total run time of id was: 102614 ms Number of zero-level cache transactions: 34 Time spent in level-0 sysdb transactions: 7789 ms Time spent writing to LDB: 519 ms Number of LDAP searches: 742 Time spent waiting for LDAP: 51312 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 144
Number of group requests: 30 Time spent in group requests: 50523
Number of initgroups requests: 1 Time spent in initgroups requests: 465
Unaccounted time: 43513 ms sysdb transaction breakdown: 2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 45
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:53 min: 53 max: 53 sum: 53
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 90
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:2635 min: 67 max: 5203 sum: 5270
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8677] avg:25 min: 25 max: 26 sum: 51
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libteve avg:1539 min: 1539 max: 1539 sum: 1539
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:60 min: 60 max: 60 sum: 60
14 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:29 min: 20 max: 47 sum: 416
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:116 min: 116 max: 116 sum: 116
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:83 min: 83 max: 83 sum: 83
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:6 min: 6 max: 6 sum: 6
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:30 min: 26 max: 34 sum: 60
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 |@ 1 100 | 0 150 | 0 ~ 450 | 0 500 | 0
500 |@ 1
####################################################################################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 -d /usr/libexec/sssd/sssd_be id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libc-2.24.so' Total run time of id was: 162710 ms Number of zero-level cache transactions: 32 Time spent in level-0 sysdb transactions: 944 ms Time spent writing to LDB: 301 ms Number of LDAP searches: 141 Time spent waiting for LDAP: 77244 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 136
Number of group requests: 29 Time spent in group requests: 3561
Number of initgroups requests: 1 Time spent in initgroups requests: 143
Unaccounted time: 84522 ms sysdb transaction breakdown: 9 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:28 min: 22 max: 37 sum: 254
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:5 min: 5 max: 5 sum: 5
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f7675d86401 [libc-2.24.so+0x20401] avg:21 min: 21 max: 22 sum: 64
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:42 min: 42 max: 42 sum: 42
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:86 min: 86 max: 86 sum: 86
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:31 min: 26 max: 36 sum: 62
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:25 min: 24 max: 26 sum: 50
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:27 min: 21 max: 36 sum: 111
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:114 min: 114 max: 114 sum: 114
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:28 min: 26 max: 33 sum: 86
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 | 0 100 |@ 1 150 | 0 200 | 0
#################################################################################################### Caching appears to be taking effect here. Guessing I should clear the user cache, is there a best practice for that, rather than just clearing out the sss/db/ directory? #################################################################################################### [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 2033 Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%) Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)
Breakdown of sdap_nested_group req (total: 1112 ms) sdap_nested_group_process req: 1111 sdap_nested_group_process_split req: 11 sdap_nested_group_check_cache: 11 sdap_nested_group_sysdb_search_users: 6 sdap_nested_group_sysdb_search_groups: 3 ldap request breakdown of total 2144 sdap_nested_group_deref req: 54 sdap_deref_search_send req 54 processing deref results: 0 sdap_nested_group_lookup_user req: 979 sdap_nested_group_lookup_group req: 66 Time spent refreshing unknown members: 1045
Breakdown of results processing (total 699) Time spent populating nested members: 3 Time spent searching ldb while populating nested members: 3 Time spent saving nested members: 330 Time spent writing to the ldb: 226 ms
#################################################################################################### This user is from a completely different department, so it likely didn't hit the cache much. #################################################################################################### [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 12502 Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%) Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)
Breakdown of sdap_nested_group req (total: 4642 ms) sdap_nested_group_process req: 4971 sdap_nested_group_process_split req: 53 sdap_nested_group_check_cache: 51 sdap_nested_group_sysdb_search_users: 17 sdap_nested_group_sysdb_search_groups: 33 ldap request breakdown of total 55789 sdap_nested_group_deref req: 2908 sdap_deref_search_send req 50266 processing deref results: 2 sdap_nested_group_lookup_user req: 2138 sdap_nested_group_lookup_group req: 623 Time spent refreshing unknown members: 2762
Breakdown of results processing (total 1099) Time spent populating nested members: 29 Time spent searching ldb while populating nested members: 23 Time spent saving nested members: 635 Time spent writing to the ldb: 303 ms
I'm on Centos 7.2 and thought I'd try this to see how it works. Instead of nice output I got and error. Have I done something wrong?
[root@linux systemtap]# stap id_perf.stp ERROR: empty aggregate near identifier 'print' at id_perf.stp:68:5 Total run time of id was: 3 ms Number of zero-level cache transactions: 0 Time spent in level-0 sysdb transactions: 0 ms Time spent writing to LDB: 0 ms Number of LDAP searches: 0 Time spent waiting for LDAP: 0 ms LDAP searches breakdown: Number of user requests: 0 Time spent in user requests: 0
Number of group requests: 0 Time spent in group requests: 0
Number of initgroups requests: 0 Time spent in initgroups requests: 0
Unaccounted time: 3 ms sysdb transaction breakdown: The most expensive transaction breakdown, per transaction: WARNING: Number of errors: 1, skipped probes: 0 WARNING: /usr/bin/staprun exited with status: 1 Pass 5: run failed. [man error::pass5]
------ The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper
On 26 November 2016 at 10:05, zfnoctis@gmail.com wrote:
Sorry for the delay, the holidays managed to take up more of my spare time than anticipated.
This was done on Fedora workstation 25. Overall sssd performance appears to be noticeably better than Ubuntu 16.04/16.10 and Fedora 24. Please let me know if I made any mistakes running these tests, or if I need to run additional tests.
############################################################ ######################################## This was my quick attempt at the USE method in order to rule out other causes of performance issues. ############################################################ ######################################## task: id <domain user>
vmstat, pidstat, and top/htop are showing high amounts of CPU time being spent in user-space. Looks to be sssd_be.
e.g. from pidstat 02:34:39 PM UID PID %usr %system %guest %CPU CPU Command 02:34:40 PM 0 12518 100.00 0.00 0.00 100.00 0 sssd_be
iostat is showing minimal I/O occurring, disks are mostly idle. Not seeing any errors or queuing.
nicstat shows network interfaces never go above 60% utilization. No network errors reported. No queuing.
free is not showing any serious memory pressure, no swapping is occurring.
Unsure how to measure interconnects between subsystems. May use perf/systemtap.
############################################################ ########################################
These runs were of id <domainuser> and each was of a different user so as to hopefully avoid caching. Each time there were warnings of missing unwind data, is there a set of packages I need to install to resolve this?
############################################################ ########################################
[root@fc25-vm systemtap]# stap id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libtevent.so.0.9.30' Total run time of id was: 91098 ms Number of zero-level cache transactions: 15 Time spent in level-0 sysdb transactions: 324 ms Time spent writing to LDB: 93 ms Number of LDAP searches: 83 Time spent waiting for LDAP: 1480105770919 ms LDAP searches breakdown: Number of user requests: 0 Time spent in user requests: 0
Number of group requests: 14 Time spent in group requests: 1933 Number of initgroups requests: 0 Time spent in initgroups requests: 0
Unaccounted time: -1480105680145 ms sysdb transaction breakdown: 1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:36 min: 36 max: 36 sum: 36
8 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:21 min: 19 max: 28 sum: 171
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:29 min: 21 max: 52 sum: 117
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 |@@@ 3 50 |@ 1 100 | 0 150 | 0
############################################################ ########################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/libexec/sssd/sssd_be' Total run time of id was: 102614 ms Number of zero-level cache transactions: 34 Time spent in level-0 sysdb transactions: 7789 ms Time spent writing to LDB: 519 ms Number of LDAP searches: 742 Time spent waiting for LDAP: 51312 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 144
Number of group requests: 30 Time spent in group requests: 50523 Number of initgroups requests: 1 Time spent in initgroups requests: 465
Unaccounted time: 43513 ms sysdb transaction breakdown: 2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 45
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:53 min: 53 max: 53 sum: 53
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 90
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:2635 min: 67 max: 5203 sum: 5270
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8677] avg:25 min: 25 max: 26 sum: 51
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libteve avg:1539 min: 1539 max: 1539 sum: 1539
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:60 min: 60 max: 60 sum: 60
14 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:29 min: 20 max: 47 sum: 416
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:116 min: 116 max: 116 sum: 116
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:83 min: 83 max: 83 sum: 83
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:6 min: 6 max: 6 sum: 6
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:30 min: 26 max: 34 sum: 60
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 |@ 1 100 | 0 150 | 0 ~ 450 | 0 500 | 0
500 |@ 1
############################################################ ########################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 -d /usr/libexec/sssd/sssd_be id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/ libc-2.24.so' Total run time of id was: 162710 ms Number of zero-level cache transactions: 32 Time spent in level-0 sysdb transactions: 944 ms Time spent writing to LDB: 301 ms Number of LDAP searches: 141 Time spent waiting for LDAP: 77244 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 136
Number of group requests: 29 Time spent in group requests: 3561 Number of initgroups requests: 1 Time spent in initgroups requests: 143
Unaccounted time: 84522 ms sysdb transaction breakdown: 9 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:28 min: 22 max: 37 sum: 254
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:5 min: 5 max: 5 sum: 5
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f7675d86401 [libc-2.24.so+0x20401] avg:21 min: 21 max: 22 sum: 64
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:42 min: 42 max: 42 sum: 42
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:86 min: 86 max: 86 sum: 86
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:31 min: 26 max: 36 sum: 62
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:25 min: 24 max: 26 sum: 50
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:27 min: 21 max: 36 sum: 111
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:114 min: 114 max: 114 sum: 114
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:28 min: 26 max: 33 sum: 86
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 | 0 100 |@ 1 150 | 0 200 | 0
############################################################ ######################################## Caching appears to be taking effect here. Guessing I should clear the user cache, is there a best practice for that, rather than just clearing out the sss/db/ directory? ############################################################ ######################################## [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 2033 Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%) Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)
Breakdown of sdap_nested_group req (total: 1112 ms) sdap_nested_group_process req: 1111 sdap_nested_group_process_split req: 11 sdap_nested_group_check_cache: 11 sdap_nested_group_sysdb_search_users: 6 sdap_nested_group_sysdb_search_groups: 3 ldap request breakdown of total 2144 sdap_nested_group_deref req: 54 sdap_deref_search_send req 54 processing deref results: 0 sdap_nested_group_lookup_user req: 979 sdap_nested_group_lookup_group req: 66 Time spent refreshing unknown members: 1045
Breakdown of results processing (total 699) Time spent populating nested members: 3 Time spent searching ldb while populating nested members: 3 Time spent saving nested members: 330 Time spent writing to the ldb: 226 ms
############################################################ ######################################## This user is from a completely different department, so it likely didn't hit the cache much. ############################################################ ######################################## [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 12502 Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%) Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)
Breakdown of sdap_nested_group req (total: 4642 ms) sdap_nested_group_process req: 4971 sdap_nested_group_process_split req: 53 sdap_nested_group_check_cache: 51 sdap_nested_group_sysdb_search_users: 17 sdap_nested_group_sysdb_search_groups: 33 ldap request breakdown of total 55789 sdap_nested_group_deref req: 2908 sdap_deref_search_send req 50266 processing deref results: 2 sdap_nested_group_lookup_user req: 2138 sdap_nested_group_lookup_group req: 623 Time spent refreshing unknown members: 2762
Breakdown of results processing (total 1099) Time spent populating nested members: 29 Time spent searching ldb while populating nested members: 23 Time spent saving nested members: 635 Time spent writing to the ldb: 303 ms _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
On Mon, Nov 28, 2016 at 02:45:59PM +1100, Lachlan Musicman wrote:
I'm on Centos 7.2 and thought I'd try this to see how it works. Instead of nice output I got and error. Have I done something wrong?
[root@linux systemtap]# stap id_perf.stp ERROR: empty aggregate near identifier 'print' at id_perf.stp:68:5
Ah, sorry, this is a silly bug in the script. In your case, id returned data from the cache (as you can see the time id ran was only 3ms) so there were no writable transactions done and the script ended up dereferencing an empty aggregate that normally saves the transaction times.
I've opened a PR on github to fix the script, but runnning sss_cache before you run the script next time should give 'better' results.
Total run time of id was: 3 ms Number of zero-level cache transactions: 0 Time spent in level-0 sysdb transactions: 0 ms Time spent writing to LDB: 0 ms Number of LDAP searches: 0 Time spent waiting for LDAP: 0 ms LDAP searches breakdown: Number of user requests: 0 Time spent in user requests: 0
Number of group requests: 0 Time spent in group requests: 0 Number of initgroups requests: 0 Time spent in initgroups requests: 0
Unaccounted time: 3 ms sysdb transaction breakdown: The most expensive transaction breakdown, per transaction: WARNING: Number of errors: 1, skipped probes: 0 WARNING: /usr/bin/staprun exited with status: 1 Pass 5: run failed. [man error::pass5]
The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper
On 26 November 2016 at 10:05, zfnoctis@gmail.com wrote:
Sorry for the delay, the holidays managed to take up more of my spare time than anticipated.
This was done on Fedora workstation 25. Overall sssd performance appears to be noticeably better than Ubuntu 16.04/16.10 and Fedora 24. Please let me know if I made any mistakes running these tests, or if I need to run additional tests.
############################################################ ######################################## This was my quick attempt at the USE method in order to rule out other causes of performance issues. ############################################################ ######################################## task: id <domain user>
vmstat, pidstat, and top/htop are showing high amounts of CPU time being spent in user-space. Looks to be sssd_be.
e.g. from pidstat 02:34:39 PM UID PID %usr %system %guest %CPU CPU Command 02:34:40 PM 0 12518 100.00 0.00 0.00 100.00 0 sssd_be
iostat is showing minimal I/O occurring, disks are mostly idle. Not seeing any errors or queuing.
nicstat shows network interfaces never go above 60% utilization. No network errors reported. No queuing.
free is not showing any serious memory pressure, no swapping is occurring.
Unsure how to measure interconnects between subsystems. May use perf/systemtap.
############################################################ ########################################
These runs were of id <domainuser> and each was of a different user so as to hopefully avoid caching. Each time there were warnings of missing unwind data, is there a set of packages I need to install to resolve this?
############################################################ ########################################
[root@fc25-vm systemtap]# stap id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libtevent.so.0.9.30' Total run time of id was: 91098 ms Number of zero-level cache transactions: 15 Time spent in level-0 sysdb transactions: 324 ms Time spent writing to LDB: 93 ms Number of LDAP searches: 83 Time spent waiting for LDAP: 1480105770919 ms LDAP searches breakdown: Number of user requests: 0 Time spent in user requests: 0
Number of group requests: 14 Time spent in group requests: 1933 Number of initgroups requests: 0 Time spent in initgroups requests: 0
Unaccounted time: -1480105680145 ms sysdb transaction breakdown: 1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:36 min: 36 max: 36 sum: 36
8 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:21 min: 19 max: 28 sum: 171
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] 0x7f78b9781474 [libtevent.so.0.9.30+0x5474] avg:29 min: 21 max: 52 sum: 117
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 |@@@ 3 50 |@ 1 100 | 0 150 | 0
############################################################ ########################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/libexec/sssd/sssd_be' Total run time of id was: 102614 ms Number of zero-level cache transactions: 34 Time spent in level-0 sysdb transactions: 7789 ms Time spent writing to LDB: 519 ms Number of LDAP searches: 742 Time spent waiting for LDAP: 51312 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 144
Number of group requests: 30 Time spent in group requests: 50523 Number of initgroups requests: 1 Time spent in initgroups requests: 465
Unaccounted time: 43513 ms sysdb transaction breakdown: 2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 45
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:53 min: 53 max: 53 sum: 53
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:22 min: 22 max: 23 sum: 90
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55a92ac7c677 [sssd_be+0x8 avg:2635 min: 67 max: 5203 sum: 5270
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8677] avg:25 min: 25 max: 26 sum: 51
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libteve avg:1539 min: 1539 max: 1539 sum: 1539
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:60 min: 60 max: 60 sum: 60
14 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] 0x55fcef0ca677 [sssd_be+0x8 avg:29 min: 20 max: 47 sum: 416
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:116 min: 116 max: 116 sum: 116
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:83 min: 83 max: 83 sum: 83
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:6 min: 6 max: 6 sum: 6
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:30 min: 26 max: 34 sum: 60
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 |@ 1 100 | 0 150 | 0 ~ 450 | 0 500 | 0
500 |@ 1
############################################################ ########################################
[root@fc25-vm systemtap]# stap -d /usr/lib64/libtevent.so.0.9.30 -d /usr/libexec/sssd/sssd_be id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/ libc-2.24.so' Total run time of id was: 162710 ms Number of zero-level cache transactions: 32 Time spent in level-0 sysdb transactions: 944 ms Time spent writing to LDB: 301 ms Number of LDAP searches: 141 Time spent waiting for LDAP: 77244 ms LDAP searches breakdown: Number of user requests: 1 Time spent in user requests: 136
Number of group requests: 29 Time spent in group requests: 3561 Number of initgroups requests: 1 Time spent in initgroups requests: 143
Unaccounted time: 84522 ms sysdb transaction breakdown: 9 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:28 min: 22 max: 37 sum: 254
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_get_initgr_user+0x216 [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so avg:5 min: 5 max: 5 sum: 5
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f7675d86401 [libc-2.24.so+0x20401] avg:21 min: 21 max: 22 sum: 64
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:42 min: 42 max: 42 sum: 42
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_ad_save_group_membership_with_idmapping+0x532 [libsss_ldap_common.so] sdap_ad_tokengroups_initgr_mapping_done+0x13b [libsss_ldap_common.so] sdap_get_ad_tokengroups_done+0x22f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [li avg:86 min: 86 max: 86 sum: 86
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:31 min: 26 max: 36 sum: 62
2 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:25 min: 24 max: 26 sum: 50
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7fc1 avg:35 min: 35 max: 35 sum: 35
4 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x21e [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30] server_loop+0x13 [libsss_util.so] main+0x707 [sssd_be] 0x7f76 avg:27 min: 21 max: 36 sum: 111
1 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_save_users+0x28a [libsss_ldap_common.so] sdap_get_users_done+0xd2 [libsss_ldap_common.so] sdap_search_user_process+0x25f [libsss_ldap_common.so] generic_ext_search_handler.isra.4+0xe0 [libsss_ldap_common.so] sdap_get_generic_op_finished+0x5ba [libsss_ldap_common.so] sdap_process_result+0x71d [libsss_ldap_common.so] tevent_common_loop_timer_delay+0xd0 [libtevent.so.0.9.30] epoll_event_loop_once+0x69 [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent avg:114 min: 114 max: 114 sum: 114
3 hits of transaction sysdb_transaction_commit+0xa0 [libsss_util.so] sdap_nested_done+0x2d5 [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] sdap_nested_group_process_done+0x1fe [libsss_ldap_common.so] tevent_common_loop_immediate+0xd4 [libtevent.so.0.9.30] epoll_event_loop_once+0x5d [libtevent.so.0.9.30] std_event_loop_once+0x37 [libtevent.so.0.9.30] _tevent_loop_once+0x9d [libtevent.so.0.9.30] tevent_common_loop_wait+0x1b [libtevent.so.0.9.30] std_event_loop_wait+0x37 [libtevent.so.0.9.30]
avg:28 min: 26 max: 33 sum: 86
The most expensive transaction breakdown, per transaction: value |-------------------------------------------------- count 0 | 0 50 | 0 100 |@ 1 150 | 0 200 | 0
############################################################ ######################################## Caching appears to be taking effect here. Guessing I should clear the user cache, is there a best practice for that, rather than just clearing out the sss/db/ directory? ############################################################ ######################################## [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 2033 Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%) Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)
Breakdown of sdap_nested_group req (total: 1112 ms) sdap_nested_group_process req: 1111 sdap_nested_group_process_split req: 11 sdap_nested_group_check_cache: 11 sdap_nested_group_sysdb_search_users: 6 sdap_nested_group_sysdb_search_groups: 3 ldap request breakdown of total 2144 sdap_nested_group_deref req: 54 sdap_deref_search_send req 54 processing deref results: 0 sdap_nested_group_lookup_user req: 979 sdap_nested_group_lookup_group req: 66 Time spent refreshing unknown members: 1045
Breakdown of results processing (total 699) Time spent populating nested members: 3 Time spent searching ldb while populating nested members: 3 Time spent saving nested members: 330 Time spent writing to the ldb: 226 ms
############################################################ ######################################## This user is from a completely different department, so it likely didn't hit the cache much. ############################################################ ######################################## [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 12502 Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%) Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)
Breakdown of sdap_nested_group req (total: 4642 ms) sdap_nested_group_process req: 4971 sdap_nested_group_process_split req: 53 sdap_nested_group_check_cache: 51 sdap_nested_group_sysdb_search_users: 17 sdap_nested_group_sysdb_search_groups: 33 ldap request breakdown of total 55789 sdap_nested_group_deref req: 2908 sdap_deref_search_send req 50266 processing deref results: 2 sdap_nested_group_lookup_user req: 2138 sdap_nested_group_lookup_group req: 623 Time spent refreshing unknown members: 2762
Breakdown of results processing (total 1099) Time spent populating nested members: 29 Time spent searching ldb while populating nested members: 23 Time spent saving nested members: 635 Time spent writing to the ldb: 303 ms _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
On 28 November 2016 at 18:46, Jakub Hrozek jhrozek@redhat.com wrote:
On Mon, Nov 28, 2016 at 02:45:59PM +1100, Lachlan Musicman wrote:
I'm on Centos 7.2 and thought I'd try this to see how it works. Instead
of
nice output I got and error. Have I done something wrong?
[root@linux systemtap]# stap id_perf.stp ERROR: empty aggregate near identifier 'print' at id_perf.stp:68:5
Ah, sorry, this is a silly bug in the script. In your case, id returned data from the cache (as you can see the time id ran was only 3ms) so there were no writable transactions done and the script ended up dereferencing an empty aggregate that normally saves the transaction times.
I've opened a PR on github to fix the script, but runnning sss_cache before you run the script next time should give 'better' results.
Yes, that worked - thanks.
cheers L.
------ The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper
On Fri, Nov 25, 2016 at 11:05:49PM -0000, zfnoctis@gmail.com wrote:
Sorry for the delay, the holidays managed to take up more of my spare time than anticipated.
This was done on Fedora workstation 25. Overall sssd performance appears to be noticeably better than Ubuntu 16.04/16.10 and Fedora 24. Please let me know if I made any mistakes running these tests, or if I need to run additional tests.
#################################################################################################### This was my quick attempt at the USE method in order to rule out other causes of performance issues. #################################################################################################### task: id <domain user>
vmstat, pidstat, and top/htop are showing high amounts of CPU time being spent in user-space. Looks to be sssd_be.
e.g. from pidstat 02:34:39 PM UID PID %usr %system %guest %CPU CPU Command 02:34:40 PM 0 12518 100.00 0.00 0.00 100.00 0 sssd_be
iostat is showing minimal I/O occurring, disks are mostly idle. Not seeing any errors or queuing.
nicstat shows network interfaces never go above 60% utilization. No network errors reported. No queuing.
free is not showing any serious memory pressure, no swapping is occurring.
Unsure how to measure interconnects between subsystems. May use perf/systemtap.
####################################################################################################
These runs were of id <domainuser> and each was of a different user so as to hopefully avoid caching. Each time there were warnings of missing unwind data, is there a set of packages I need to install to resolve this?
####################################################################################################
[root@fc25-vm systemtap]# stap id_perf.stp WARNING: Missing unwind data for a module, rerun with 'stap -d /usr/lib64/libtevent.so.0.9.30' Total run time of id was: 91098 ms Number of zero-level cache transactions: 15 Time spent in level-0 sysdb transactions: 324 ms Time spent writing to LDB: 93 ms Number of LDAP searches: 83 Time spent waiting for LDAP: 1480105770919 ms
OK, this number is clearly wrong. Nonetheless, we also see that even with the number of time spent in writing to the cache quite low (324 ms), the total run was really long (90+ seconds)
We also see that the majority of time is spent resolving groups.
[...]
#################################################################################################### Caching appears to be taking effect here. Guessing I should clear the user cache, is there a best practice for that, rather than just clearing out the sss/db/ directory?
sss_cache -UG
#################################################################################################### [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 2033 Time spent in sdap_nested_group_send/recv: 1112 ms (ratio: 54.69%) Time spent in zero-level sysdb transactions: 699 ms (ratio: 34.38%)
Breakdown of sdap_nested_group req (total: 1112 ms) sdap_nested_group_process req: 1111 sdap_nested_group_process_split req: 11 sdap_nested_group_check_cache: 11 sdap_nested_group_sysdb_search_users: 6 sdap_nested_group_sysdb_search_groups: 3 ldap request breakdown of total 2144 sdap_nested_group_deref req: 54 sdap_deref_search_send req 54 processing deref results: 0 sdap_nested_group_lookup_user req: 979 sdap_nested_group_lookup_group req: 66 Time spent refreshing unknown members: 1045
Breakdown of results processing (total 699) Time spent populating nested members: 3 Time spent searching ldb while populating nested members: 3 Time spent saving nested members: 330 Time spent writing to the ldb: 226 ms
#################################################################################################### This user is from a completely different department, so it likely didn't hit the cache much. ####################################################################################################
Yes, for 'best' (slowest in this respect) results, either expire or remove the cache.
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 12502 Time spent in sdap_nested_group_send/recv: 4642 ms (ratio: 37.13%) Time spent in zero-level sysdb transactions: 1099 ms (ratio: 8.79%)
Breakdown of sdap_nested_group req (total: 4642 ms) sdap_nested_group_process req: 4971 sdap_nested_group_process_split req: 53 sdap_nested_group_check_cache: 51 sdap_nested_group_sysdb_search_users: 17 sdap_nested_group_sysdb_search_groups: 33 ldap request breakdown of total 55789 sdap_nested_group_deref req: 2908 sdap_deref_search_send req 50266
Here the results are also a bit suspicious. The total time the script took is 12 seconds, but the sdap_deref_search_send request is recorded for 50+ seconds..do you know which of these is closer to the real running time?
It also looks like we need to make the probes a bit more detailed, looking at some of the requests, like sdap_deref_search_send/_recv, we include both the LDAP search and the results processing in the probed time, we should probably break those two down further.
Could you please run this nested_group_perf.stp script a couple more times with cache expired between the 'id' runs so that we can average the results better?
processing deref results: 2 sdap_nested_group_lookup_user req: 2138 sdap_nested_group_lookup_group req: 623 Time spent refreshing unknown members: 2762
Breakdown of results processing (total 1099) Time spent populating nested members: 29 Time spent searching ldb while populating nested members: 23 Time spent saving nested members: 635 Time spent writing to the ldb: 303 ms _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
Here are some more runs of nested_group_perf.stp.
Running fc25 workstation, kernel 4.8.8-300.fc25.x86_64, gnome desktop.
############################################################################################ Noticed a small issue with the system tap script ############################################################################################ nested_group_perf.stp
When missing a package it suggests using yum rather than dnf.
e.g.
“use: yum install kernel-devel-4.8.8-300.fc25.x86_64”
############################################################################################ Modifications to /etc/sssd/sssd.conf: dyndns_update = True ############################################################################################ real 1m23.424s user 0m0.006s sys 0m0.013s
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 18543 Time spent in sdap_nested_group_send/recv: 12793 ms (ratio: 68.99%) Time spent in zero-level sysdb transactions: 4894 ms (ratio: 26.39%)
Breakdown of sdap_nested_group req (total: 12793 ms) sdap_nested_group_process req: 10973 sdap_nested_group_process_split req: 502 sdap_nested_group_check_cache: 459 sdap_nested_group_sysdb_search_users: 226 sdap_nested_group_sysdb_search_groups: 213 ldap request breakdown of total 22629 sdap_nested_group_deref req: 7462 sdap_deref_search_send req 12399 processing deref results: 534 sdap_nested_group_lookup_user req: 4611 sdap_nested_group_lookup_group req: 791 Time spent refreshing unknown members: 4828
Breakdown of results processing (total 4894) Time spent populating nested members: 1581 Time spent searching ldb while populating nested members: 1372 Time spent saving nested members: 2829 Time spent writing to the ldb: 470 ms
############################################################################################ Same user, ran sss_cache -UG, but perhaps some organizational information is still cached?
Not sure what else accounts for the time difference. ############################################################################################ real 0m49.898s user 0m0.004s sys 0m0.007s
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 16000 Time spent in sdap_nested_group_send/recv: 7629 ms (ratio: 47.68%) Time spent in zero-level sysdb transactions: 1770 ms (ratio: 11.06%)
Breakdown of sdap_nested_group req (total: 7629 ms) sdap_nested_group_process req: 5971 sdap_nested_group_process_split req: 302 sdap_nested_group_check_cache: 281 sdap_nested_group_sysdb_search_users: 150 sdap_nested_group_sysdb_search_groups: 120 ldap request breakdown of total 21757 sdap_nested_group_deref req: 3708 sdap_deref_search_send req 14555 processing deref results: 36 sdap_nested_group_lookup_user req: 3243 sdap_nested_group_lookup_group req: 583 Time spent refreshing unknown members: 3376
Breakdown of results processing (total 1770) Time spent populating nested members: 302 Time spent searching ldb while populating nested members: 268 Time spent saving nested members: 1236 Time spent writing to the ldb: 212 ms
############################################################################################ Different user than before. ############################################################################################ real 0m41.878s user 0m0.004s sys 0m0.003s
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 16000 Time spent in sdap_nested_group_send/recv: 7629 ms (ratio: 47.68%) Time spent in zero-level sysdb transactions: 1770 ms (ratio: 11.06%)
Breakdown of sdap_nested_group req (total: 7629 ms) sdap_nested_group_process req: 5971 sdap_nested_group_process_split req: 302 sdap_nested_group_check_cache: 281 sdap_nested_group_sysdb_search_users: 150 sdap_nested_group_sysdb_search_groups: 120 ldap request breakdown of total 21757 sdap_nested_group_deref req: 3708 sdap_deref_search_send req 14555 processing deref results: 36 sdap_nested_group_lookup_user req: 3243 sdap_nested_group_lookup_group req: 583 Time spent refreshing unknown members: 3376
Breakdown of results processing (total 1770) Time spent populating nested members: 302 Time spent searching ldb while populating nested members: 268 Time spent saving nested members: 1236 Time spent writing to the ldb: 212 ms
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 6862 Time spent in sdap_nested_group_send/recv: 5495 ms (ratio: 80.07%) Time spent in zero-level sysdb transactions: 1133 ms (ratio: 16.51%)
Breakdown of sdap_nested_group req (total: 5495 ms) sdap_nested_group_process req: 4024 sdap_nested_group_process_split req: 182 sdap_nested_group_check_cache: 158 sdap_nested_group_sysdb_search_users: 49 sdap_nested_group_sysdb_search_groups: 90 ldap request breakdown of total 18417 sdap_nested_group_deref req: 2900 sdap_deref_search_send req 14437 processing deref results: 46 sdap_nested_group_lookup_user req: 1687 sdap_nested_group_lookup_group req: 482 Time spent refreshing unknown members: 1811
Breakdown of results processing (total 1133) Time spent populating nested members: 88 Time spent searching ldb while populating nested members: 63 Time spent saving nested members: 910 Time spent writing to the ldb: 77 ms
############################################################################################ ^C[root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 5021 Time spent in sdap_nested_group_send/recv: 10726 ms (ratio: 213.62%) Time spent in zero-level sysdb transactions: 368 ms (ratio: 7.32%)
Breakdown of sdap_nested_group req (total: 10726 ms) sdap_nested_group_process req: 1480525834936 sdap_nested_group_process_split req: 124 sdap_nested_group_check_cache: 51 sdap_nested_group_sysdb_search_users: 31 sdap_nested_group_sysdb_search_groups: 19 ldap request breakdown of total 22264 sdap_nested_group_deref req: 1480525833504 sdap_deref_search_send req 16770 processing deref results: 1480525825718 sdap_nested_group_lookup_user req: 2637 sdap_nested_group_lookup_group req: 110 Time spent refreshing unknown members: 2747
Breakdown of results processing (total 368) Time spent populating nested members: 4729 Time spent searching ldb while populating nested members: 3947 Time spent saving nested members: 221 Time spent writing to the ldb: 95 ms
############################################################################################
real 2m4.412s user 0m0.004s sys 0m0.005s
[root@fc25-vm systemtap]# stap nested_group_perf.stp ^C ^CTime spent in group sssd_be searches: 8083 Time spent in sdap_nested_group_send/recv: 36901 ms (ratio: 456.52%) Time spent in zero-level sysdb transactions: 2329 ms (ratio: 28.81%)
Breakdown of sdap_nested_group req (total: 36901 ms) sdap_nested_group_process req: 35943 sdap_nested_group_process_split req: 177 sdap_nested_group_check_cache: 146 sdap_nested_group_sysdb_search_users: 50 sdap_nested_group_sysdb_search_groups: 87 ldap request breakdown of total 29401 sdap_nested_group_deref req: 34914 sdap_deref_search_send req 26218 processing deref results: 25004 sdap_nested_group_lookup_user req: 1385 sdap_nested_group_lookup_group req: 340 Time spent refreshing unknown members: 1458
Breakdown of results processing (total 2329) Time spent populating nested members: 826 Time spent searching ldb while populating nested members: 590 Time spent saving nested members: 1786 Time spent writing to the ldb: 256 ms
############################################################################################ su - <domain user> ############################################################################################ [root@fc25-vm systemtap]# stap nested_group_perf.stp ^CTime spent in group sssd_be searches: 545 Time spent in sdap_nested_group_send/recv: 1480527569532 ms (ratio: 271656434776.51%) Time spent in zero-level sysdb transactions: 112 ms (ratio: 20.55%)
Breakdown of sdap_nested_group req (total: 1480527569532 ms) sdap_nested_group_process req: 1480527569440 sdap_nested_group_process_split req: 11 sdap_nested_group_check_cache: 11 sdap_nested_group_sysdb_search_users: 5 sdap_nested_group_sysdb_search_groups: 6 ldap request breakdown of total 451 sdap_nested_group_deref req: 1480527569424 sdap_deref_search_send req 192 processing deref results: 1480527569190 sdap_nested_group_lookup_user req: 114 sdap_nested_group_lookup_group req: 15 Time spent refreshing unknown members: 130
Breakdown of results processing (total 112) Time spent populating nested members: 649 Time spent searching ldb while populating nested members: 438 Time spent saving nested members: 58 Time spent writing to the ldb: 38 ms
I've created some flamegraphs of sssd_be, hovering your mouse over the elements should show more information.
This is of sssd_be while running "id <domain user>" with the caches completely flushed: http://svgshare.com/i/MJ.svg
This is of sssd_be while running "sudo -l" as a domain user with sudo privileges: http://svgur.com/i/MX.svg
Clearly there are some symbols missing, but I wasn't sure how to resolve this. If someone could point me to the package(s) needed to fix this, I'd be happy to rerun them.
On Mon, Dec 05, 2016 at 03:54:46AM -0000, zfnoctis@gmail.com wrote:
I've created some flamegraphs of sssd_be, hovering your mouse over the elements should show more information.
This is of sssd_be while running "id <domain user>" with the caches completely flushed: http://svgshare.com/i/MJ.svg
This is of sssd_be while running "sudo -l" as a domain user with sudo privileges: http://svgur.com/i/MX.svg
Clearly there are some symbols missing, but I wasn't sure how to resolve this. If someone could point me to the package(s) needed to fix this, I'd be happy to rerun them.
First, sorry for not following up sooner, but there was quite a bit of different new cases and bugs coming in internally last week, so the sssd-users list responses are a bit delayed. But some quick thoughts on the systemtap output and the flame graphs: 1) I think it would be nice (and this is a work for us) to decorate the paths in sssd_be a bit more with systemtap probes because there the probes just indicate that sssd started parsing entries which takes a long time, but we need to break down that part better into smaller parts to see which exact part of code is slow
2) Thank you very much for the flame graphs. I'm also a fan of Brendan Gregg's work :-) Could you tell us how exactly did you generate those? And to resolve the missing symbols, normally debuginfo-install sssd should be enough. If you already did that, maybe 'dnf --enablerepo=*debuginfo upgrade' would sync the debuginfo packages in sync with the non-debuginfo ones.
btw especially the flame graphs that captures the sudo output IMO correlatest with the theory that parsing the entries we receive from LDAP should be optimized, because the strcasecmp() function appears so often in the flame graph and I suspect this is coming from us finding the attributes and the values we are interested in.
I will try to find a bit more time this week to follow up (or ask some other developer :-) and decorate the hot pats with more systemtap probes.
Thank you for working on us on investigating the performance.
On (05/12/16 10:26), Jakub Hrozek wrote:
On Mon, Dec 05, 2016 at 03:54:46AM -0000, zfnoctis@gmail.com wrote:
I've created some flamegraphs of sssd_be, hovering your mouse over the elements should show more information.
This is of sssd_be while running "id <domain user>" with the caches completely flushed: http://svgshare.com/i/MJ.svg
This is of sssd_be while running "sudo -l" as a domain user with sudo privileges: http://svgur.com/i/MX.svg
Clearly there are some symbols missing, but I wasn't sure how to resolve this. If someone could point me to the package(s) needed to fix this, I'd be happy to rerun them.
First, sorry for not following up sooner, but there was quite a bit of different new cases and bugs coming in internally last week, so the sssd-users list responses are a bit delayed. But some quick thoughts on the systemtap output and the flame graphs:
I think it would be nice (and this is a work for us) to decorate the paths in sssd_be a bit more with systemtap probes because there the probes just indicate that sssd started parsing entries which takes a long time, but we need to break down that part better into smaller parts to see which exact part of code is slow
Thank you very much for the flame graphs. I'm also a fan of Brendan Gregg's work :-) Could you tell us how exactly did you generate those? And to resolve the missing symbols, normally debuginfo-install sssd should be enough. If you already did that, maybe 'dnf --enablerepo=*debuginfo upgrade' would sync the debuginfo packages in sync with the non-debuginfo ones.
"dnf/dnf debuginfo-install" does not update packages by default. It can be changed in /etc/dnf/plugins/debuginfo-install.conf
LS
No luck so far on fixing those missing symbols.
“dnf debuginfo-install sssd” does not seem to fix the broken stacks.
“dnf --enablerepo=*debuginfo upgrade” Is not possible currently, as performing a dnf upgrade on Fedora 25 causes it to boot to a black screen, and I don't have the time or the energy to troubleshoot issues with X/Wayland/gdm at the moment.
I must say it is a bit strange how unstable Fedora is, Arch and Gentoo -unstable are remarkably more stable than Fedora in my experience.
Regarding the flame graph creation I just did the following:
git clone https://github.com/brendangregg/FlameGraph cd FlameGraph perf record -F 99 -p $(pidof sssd_be) -g -- sleep 90 perf script | ./stackcollapse-perf.pl > out1.perf-folded ./flamegraph.pl out1.perf-folded > perf-sssd1.svg
I'll assume this is no longer considered an active issue. Hopefully the next few releases of sssd will improve upon this problem.
On Thu, Dec 22, 2016 at 07:06:49PM -0000, zfnoctis@gmail.com wrote:
I'll assume this is no longer considered an active issue. Hopefully the next few releases of sssd will improve upon this problem.
It is, so far we're planning to look into these areas in the next version: https://fedorahosted.org/sssd/wiki/DesignDocs/OneFifteenPerformanceImproveme...
It's quite high-level, but hopefully at least shows what's cooking.
On Sat, Nov 19, 2016 at 05:57:23PM -0000, zfnoctis@gmail.com wrote:
Hi,
I'm wondering if there are any plans to improve sssd performance on large active directory domains (100k+ users, 40k+ groups), or if there are settings I am not aware of that can greatly improve performance, specifically for workstation use cases.
Currently if I do not set "ignore_group_members = True" in sssd.conf, logins can take upwards of 6 minutes and "sssd_be" will max the CPU for up to 20 minutes after logon, which makes it a non-starter. The reason I want to allow group members to be seen is that I want certain domain groups to be able to perform elevated actions using polkit. If I ignore group members, polkit reports that the group is empty and so no one can elevate in the graphical environment.
I would say here polkit could be improved in addition to sssd. If polkit is calling getgr* to find if a user is a member of a certain group it's neither going to be precise nor will that work on large environments.
Did you ask on a polkit list why it's evaluating membership in a group with getgr*?
Ultimately this means that Linux workstations are at a severe disadvantage since they cannot be bound to the domain and have the normal set of access features users and IT expect from macOS or Windows.
Distributions used: Ubuntu 16.04 (sssd 1.13.4-1ubuntu1.1), Ubuntu 16.10 (sssd 1.13.4-3) and Fedora 24 (sssd-1.13.4-3.fc24). All exhibit the same problems.
I've also tried "ldap_group_nesting_level = 1" without seeing any noticeable improvement with respect to performance. Putting the database on /tmp isn't viable as these are workstations that will reboot semi-frequently, and I don't believe this is an I/O bound performance issue anyways.
Thanks for your time. _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
Thanks for the response, I'm not familiar with polkit's code, do you have any suggestions where I could confirm it is using getgr()? I'd be happy to speak with polkit developers if that is the crux of the issue.
I have reached out to the polkit-devel mailing list, I'll see what they have to say.
For reference, here is the message I have sent to the polkit-devel list. No responses so far.
https://lists.freedesktop.org/archives/polkit-devel/2016-November/000513.htm...
I want to get some links to the relevant bugs into this old thread for the benefit of anyone finding this thread in the archives...
Currently if I do not set "ignore_group_members = True" in sssd.conf, logins can take upwards of 6 minutes and "sssd_be" will max the CPU for up to 20 minutes after logon, which makes it a non-starter. The reason I want to allow group members to be seen is that I want certain domain groups to be able to perform elevated actions using polkit. If I ignore group members, polkit reports that the group is empty and so no one can elevate in the graphical environment.
I would say here polkit could be improved in addition to sssd. If polkit is calling getgr* to find if a user is a member of a certain group it's neither going to be precise nor will that work on large environments.
Did you ask on a polkit list why it's evaluating membership in a group with getgr*?
I think Polkit does this so that it can provide the authentication agent with a list of users for the user to choose from.
There's this on sssd-devel: https://lists.freedesktop.org/archives/polkit-devel/2016-November/000514.htm... and this in Red Hat's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1214026
And there's this Polkit issue: https://gitlab.freedesktop.org/polkit/polkit/-/issues/24
sssd-users@lists.fedorahosted.org