Adding the list since Sumit appears to be busy. The info is anonymized so it should be ok. Hopefully, the gz file makes it through.
=G=
________________________________
From: Galen Johnson
Sent: Thursday, September 21, 2017 5:36 PM
To: Sumit Bose
Cc: Philip Holman
Subject: sssd email login performance
Hi Sumit,
I'm finally getting a chance to follow up on the email thread (of the same title) from the sssd list. We've seen some delays (multi-second) for auth requests when users use their email address versus their id. I've attached a tar file with several log files. Phil may need to explain the summary file if you have any questions about it. We are running CentOS 7.4 now, but I'm fairly certain that it's the same binaries as RHEL 7.4. These logs were taken while on 7.3. I noticed that sssd bumped to 1.15 with 7.4.
Some outstanding questions we have are:
1. The cache appears to not be used for the email attribute. Why is this not used?
2. We're also curious why the ldap requests add 2 seconds when performing the same query from the command-line returns almost immediately.
3. Is it possible to have SSSD ignore the domain and just immediately look up the address? We see "is_email_from_domain" in the domain log (reflected in the nss log). We checked the man pages and nothing really jumped out as a config option.
It should be noted that we also moved the sssd db cache to tmpfs (per a blog from Jakub).
Thanks for any insight
=G=
Phil's analysis follows:
To wrap up, I took one more look at one of the very slow email logins to pull out a trace of what it was doing. The attached files are the log snippets with line breaks marking off the incoming requests to make it clearer what each module was servicing when. The summary.txt shows the summarized entry for the connection and also gives an abridged combined view of the logs marking where the 7 seconds appear to have gone. So this seemed enough info to share if we have the opportunity for a consult with someone.
The short version is that roughly 1 second went to the bind that tests the user, but the other 6 appear to have likely been the result of interacting with local caches rather than the DCs. So that makes the cache files and related configuration look suspicious. It also makes more sense that our earlier checks (against logs or live tests) of the Exnet interactions have failed to show any latency issues on those steps.
Possibly the fiddling we've already done with the cache files and cache config resolved this, but it is probably still worth passing this along to someone knowledgeable who might be able to explain what about the setup likely made everything go sideways. Otherwise, we might be facing some kind of build-up pattern where it will always look rosy after a restart and gradually degrade over time as state builds up.
It might also be a good idea to bounce and clear out sssd/pam state on the weekly restarts just to protect against any possible build-up (unless we want to intentionally avoid that for now to see if it does degrade over time).
Did this make it to the list? I really wish I could see my own posts.
=G=
________________________________
From: Galen Johnson
Sent: Thursday, September 28, 2017 3:28 PM
To: End-user discussions about the System Security Services Daemon
Subject: Fw: sssd email login performance
On Mon, Oct 02, 2017 at 06:21:05PM +0000, Galen Johnson wrote:
> Did this make it to the list? I really wish I could see my own posts.
Hi,
I'm sorry for the delay (in responding). So far I've had a short look at the logs and the lookup scheme is currently as expected. There are currently several reasons which cause the observed delay.
One is that currently the lookups by email address are not added to the memory cache in a way that a second lookup by email address can use it. As a result the request always has to be processed by SSSD's nss responder. (Currently I'm working on improving this so that the memory cache can be used here as well).
Another is that in a setup with multiple domains, e.g. an AD forest, it is not clear where the email address is coming from. What makes it worse is that the domain part of the email address can match a domain in a forest but the user with that email address might come from a completely different domain. That is why SSSD first assumes that the input is a fully-qualified name and then falls back to assuming an email address. And here the backend has to search in each domain for the email address. When looking up the entry in the on-disk cache this can be done in a single search.
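As a rough illustration of the lookup order described above, here is a minimal Python sketch. It is not SSSD code: the function name, the domain names, and the two lookup tables are all hypothetical, and plain short-name lookups are not modeled.

```python
# Hypothetical model of the lookup order described above; none of these
# names or values come from SSSD itself.
from typing import Optional

KNOWN_DOMAINS = {"ad.example.com"}  # assumed configured domain(s)
USERS_BY_NAME = {("jdoe", "ad.example.com"): "jdoe"}
USERS_BY_MAIL = {
    "john.doe@mail.example.org": "jdoe",
    "j.doe@ad.example.com": "jdoe",  # mail domain happens to match a known domain
}


def resolve(login: str) -> Optional[str]:
    """Return the user for a login string, or None if nothing matches."""
    name, sep, domain = login.partition("@")
    # Step 1: treat the input as a fully-qualified name (user@domain).
    if sep and domain.lower() in KNOWN_DOMAINS:
        user = USERS_BY_NAME.get((name, domain.lower()))
        if user is not None:
            return user
    # Step 2: fall back to treating the whole string as an email address.
    return USERS_BY_MAIL.get(login.lower())
```

The `j.doe@ad.example.com` entry shows why the fallback is needed even when the domain part matches: the fully-qualified lookup finds no user named `j.doe`, so only the email lookup succeeds.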
I'll have a closer look at the logs tomorrow to see if there is something which can be tuned for your setup.
bye, Sumit
sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
Thanks, Sumit.
In the interim, is there a way to override the lookup behavior to force sssd to assume email address over domain (this is a single domain environment)? I think that would take some of the delay away.
=G=
________________________________________
From: Sumit Bose <sbose@redhat.com>
Sent: Tuesday, October 3, 2017 5:18 AM
To: sssd-users@lists.fedorahosted.org
Subject: [SSSD-users] Re: sssd email login performance
On Tue, Oct 03, 2017 at 01:35:29PM +0000, Galen Johnson wrote:
> Thanks, Sumit.
> In the interim, is there a way to override the lookup behavior to force sssd to assume email address over domain (this is a single domain environment)? I think that would take some of the delay away.
No, there is no such option and after a closer look at the logs I wonder if the delay you see is related to the email address at all.
The nss logs show that getpwnam() requests are handled within a second. SSSD detects that the domain part of the email address is not a known domain and tries to ask the backend to refresh the list of domains, which is not supported in your case (org.freedesktop.sssd.Error.DataProvider.NotSupported), and SSSD switches to email-based lookup. It would be possible to save the not-supported result so that further requests can be skipped, but this won't save much time. The main issue here is the missing memory cache support I mentioned earlier.
During authentication the group memberships of the user are refreshed to make sure the group memberships are up-to-date when the user logs in and that group-based access control schemes have valid data as well. Most of the time is spent here and, as you already mentioned in summary.txt, during ldb transactions. One reason for this is slow storage; another might be a missing index. I'd like to skip the first one for a start and look at the second one because recently a missing index issue was fixed in SSSD. To check this I'd like to ask you to add
LDB_WARN_UNINDEXED=1
LDB_WARN_REINDEX=1
to /etc/sysconfig/sssd and run SSSD with debug_level=10. The two variables will add new log messages which include "ldb FULL SEARCH" and "Reindexing" respectively. It would be nice if you can run the login test again and send me the new log files or at least the lines mentioned above with some context.
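One possible way to pull those lines out with some context is sketched below in Python. Only the two marker strings come from the message above; the function names, the two-line context window, and the `/var/log/sssd/*.log` location mentioned in the usage note are assumptions.

```python
import re
from typing import Iterable, List, Tuple

# Marker strings quoted in the message above.
MARKERS = re.compile(r"ldb FULL SEARCH|Reindexing")


def find_marker_lines(lines: Iterable[str], context: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, text) for marker lines plus surrounding context."""
    buf = [line.rstrip("\n") for line in lines]
    keep = set()
    for i, text in enumerate(buf):
        if MARKERS.search(text):
            lo = max(0, i - context)
            hi = min(len(buf), i + context + 1)
            keep.update(range(lo, hi))
    # Line numbers are 1-based, as in most editors and grep -n.
    return [(i + 1, buf[i]) for i in sorted(keep)]


def scan_logs(paths: Iterable[str]) -> None:
    """Print matching lines (with context) for each log file in paths."""
    for path in paths:
        with open(path, errors="replace") as handle:
            for number, text in find_marker_lines(handle):
                print(f"{path}:{number}: {text}")
```

For example, `scan_logs(glob.glob("/var/log/sssd/*.log"))` (after `import glob`) would scan the log directory, assuming that is where the logs are written on the system in question.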
About "1 sec for user bind to test credentials (necessary?)" from summary.txt: yes, this is necessary during authentication. It is also a bit time consuming because a TLS tunnel has to be created; otherwise the password would be sent in clear text over the network. If you have to authenticate multiple times in a short time interval you can use the 'cached_auth_timeout' option; see man sssd.conf for details.
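For illustration only, the Python sketch below embeds a hypothetical sssd.conf fragment and reads the option back per domain. The domain name `example.com` and the 300-second value are made-up examples, and the `cache_credentials = True` line is an assumption (cached authentication presumably needs a cached credential to check against); man sssd.conf remains the authority on both.

```python
import configparser

# Assumed fragment: the domain name and the 300-second value are examples only.
SSSD_CONF_FRAGMENT = """
[domain/example.com]
cache_credentials = True
cached_auth_timeout = 300
"""


def cached_auth_timeouts(conf_text: str) -> dict:
    """Map each [domain/...] section to its cached_auth_timeout in seconds.

    A missing option is reported as 0 in this sketch.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        section: parser.getint(section, "cached_auth_timeout", fallback=0)
        for section in parser.sections()
        if section.startswith("domain/")
    }
```

Calling `cached_auth_timeouts(SSSD_CONF_FRAGMENT)` gives a quick way to check which domains have the option set before and after editing the real configuration file.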
bye, Sumit
Thanks, again, Sumit. We recently switched to using tmpfs for the caching database (per https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-i...) but I don't recall if it was in place when this test was done. Regardless, we'll try to get another run together and get it back to you. I'll also look into the cached_auth_timeout option.
Would there be any benefit in putting /var/lib/sss/mc on tmpfs as well? Also, what are the *_corrupted files? Is there a way to see the content (like you can with ldbsearch on the other cached data)?
thanks
=G=
________________________________________
From: Sumit Bose <sbose@redhat.com>
Sent: Wednesday, October 4, 2017 5:58 AM
To: sssd-users@lists.fedorahosted.org
Subject: [SSSD-users] Re: sssd email login performance
On (04/10/17 12:18), Galen Johnson wrote:
Thanks, again, Sumit. We recently switched to using tmpfs for the caching database (per https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-i...) but I don't recall if it was in place when this test was done. Regardless, we'll try to get another run together and get it back to you. I'll also look into the cached_auth_timeout option.
Would there be any benefit in putting /var/lib/sss/mc on tmpfs as well? Also, what are the *_corrupted files? Is there a way to see the content (like you can with ldbsearch on the other cached data)?
Did you rename some users/groups on the LDAP server? Or do you have colliding UIDs/GIDs on the LDAP server?
The answer to the previous question might explain such files.
LS
It's possible as we've had that happen in the past (and complained loudly to the team that keeps doing it). Is there any way to read those files to see which users/groups are contained in them so we can verify? ldbsearch doesn't seem to read them or I'm giving it the wrong args. If this is already in the troubleshooting/debug docs, I've missed it.
thanks
=G=
________________________________________ From: Lukas Slebodnik lslebodn@redhat.com Sent: Wednesday, October 4, 2017 8:41 AM To: End-user discussions about the System Security Services Daemon Subject: [SSSD-users] Re: sssd email login performance
On (04/10/17 12:46), Galen Johnson wrote:
It's possible as we've had that happen in the past (and complained loudly to the team that keeps doing it). Is there any way to read those files to see which users/groups are contained in them so we can verify? ldbsearch doesn't seem to read them or I'm giving it the wrong args. If this is already in the troubleshooting/debug docs, I've missed it.
The memory cache is in a binary format.
If you renamed a group then it is a known bug: https://fedorahosted.org/sssd/ticket/3282
If it is a colliding UID/GID then the preferred way is to fix it on the server side.
LS
We have millions of entries in the OU and our clients don't see all the entries since we do filter them on our side (and we don't manage the server side). It would be nice to be able to find out which users/groups are affected on our side so we can take that to the admins of the servers. How would you review the data files in memory cache to see the content? All I get back is "data" when I run 'file *_corrupted' which isn't exactly useful. I'm assuming it's used in sssd somehow. Does sssctl have any functionality to help here? Trying to learn how to fish (so you guys don't have to keep feeding me :-)).
thanks
=G=
________________________________________ From: Lukas Slebodnik lslebodn@redhat.com Sent: Wednesday, October 4, 2017 9:08 AM To: End-user discussions about the System Security Services Daemon Subject: [SSSD-users] Re: sssd email login performance
On (04/10/17 13:23), Galen Johnson wrote:
We have millions of entries in the OU and our clients don't see all the entries since we do filter them on our side (and we don't manage the server side). It would be nice to be able to find out which users/groups are affected on our side so we can take that to the admins of the servers. How would you review the data files in memory cache to see the content? All I get back is "data" when I run 'file *_corrupted' which isn't exactly useful. I'm assuming it's used in sssd somehow. Does sssctl have any functionality to help here? Trying to learn how to fish (so you guys don't have to keep feeding me :-)).
You might check the sysdb cache for colliding UIDs/GIDs. But IIRC such a situation should also be reported in the sssd domain or nss log files with debug level <=4.
sh# ldbsearch -H /var/lib/sss/db/cache_$domain.ldb '(objectClass=user)' name uidNumber
sh# ldbsearch -H /var/lib/sss/db/cache_$domain.ldb '(objectClass=group)' name gidNumber
And if you also have many entries in the sssd cache, then you can do some additional processing in the shell: ... | sort | uniq -c | grep idNumber
LS
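For anyone following along, the extra shell processing suggested above could look roughly like the sketch below. It reads the LDIF that ldbsearch prints and lists every ID value that shows up on more than one record, together with the names that share it. The function name colliding_ids is made up for illustration, and the sketch assumes plain one-line "name:" values (no base64-encoded or folded LDIF lines).

```shell
#!/bin/sh
# colliding_ids ATTR: read LDIF on stdin and print each value of ATTR
# (for example uidNumber or gidNumber) that occurs on more than one
# record, followed by the comma-separated names that share it.
# "colliding_ids" is a hypothetical helper name, not part of SSSD.
colliding_ids() {
    awk -v attr="$1" '
        # Store the record collected so far, then reset for the next one.
        function save_record() {
            if (id != "") {
                names[id] = (names[id] == "" ? name : names[id] "," name)
                count[id]++
            }
            name = ""; id = ""
        }
        /^dn: /   { save_record() }            # a new record starts here
        /^name: / { name = substr($0, 7) }     # text after "name: "
        index($0, attr ": ") == 1 { id = substr($0, length(attr) + 3) }
        END {
            save_record()
            for (i in count) if (count[i] > 1) print i, names[i]
        }
    '
}
```

It would be fed from the same commands as above, for example: ldbsearch -H /var/lib/sss/db/cache_$domain.ldb '(objectClass=user)' name uidNumber | colliding_ids uidNumber. Empty output means no collisions were found in the cache.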
Interestingly, that returns nothing unusual. Are those files just placeholders in case they are needed? (I've taken this thread way off topic so this will be my last question on this segue).
thanks
=G=
________________________________________ From: Lukas Slebodnik lslebodn@redhat.com Sent: Wednesday, October 4, 2017 9:30 AM To: End-user discussions about the System Security Services Daemon Subject: [SSSD-users] Re: sssd email login performance
On (04/10/17 13:58), Galen Johnson wrote:
Interestingly, that returns nothing unusual.
Then it means it was due to renamed user/group.
Are those files just placeholders in case they are needed? (I've taken this thread way off topic so this will be my last question on this segue).
Files in /var/lib/sss/db/ are mostly the sssd cache and are used by the sssd processes. I hope that answers your question :-)
LS
Sumit,
I'll send the files off-list. We see one "ldb FULL SEARCH" referenced in the domain log but no "Reindexing" in any log. We also went ahead and added 'cached_auth_timeout=30' to our config (pam_id_timeout was set to 30 earlier, and according to the man page I didn't want to exceed that timeout). Otherwise, it sounds like the lookup is behaving as expected.
thanks
=G=
________________________________________ From: Sumit Bose sbose@redhat.com Sent: Wednesday, October 4, 2017 5:58 AM To: sssd-users@lists.fedorahosted.org Subject: [SSSD-users] Re: sssd email login performance
On Tue, Oct 03, 2017 at 01:35:29PM +0000, Galen Johnson wrote:
Thanks, Sumit.
In the interim, is there a way to override the lookup behavior to force sssd to assume email address over domain (this is a single domain environment)? I think that would take some of the delay away.
No, there is no such option and after a closer look at the logs I wonder if the delay you see is related to the email address at all.
The nss logs show that getpwnam() requests are handled within a second. SSSD detects that the domain part of the email address is not a known domain and tries to ask the backend to refresh the list of domains, which is not supported in your case (org.freedesktop.sssd.Error.DataProvider.NotSupported), and SSSD switches to email based lookup. It would be possible to save the not-supported result so that further requests can be skipped, but this won't save much time. The main issue here is the missing memory cache support I mentioned earlier.
During authentication the group memberships of the user are refreshed to make sure the group memberships are up-to-date when the user logs in and that group based access control schemes have valid data as well. Most of the time is spent here and, as you already mentioned in summary.txt, during ldb transactions. One reason for this is slow storage, another might be a missing index. I'd like to skip the first one for a start and look at the second one because recently a missing index issue was fixed in SSSD. To check this I'd like to ask you to add
LDB_WARN_UNINDEXED=1 LDB_WARN_REINDEX=1
to /etc/sysconfig/sssd and run SSSD with debug_level=10. The two variables will add new log messages which include "ldb FULL SEARCH" and "Reindexing" respectively. It would be nice if you can run the login test again and send me the new log files or at least the lines mentioned above with some context.
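Once the new logs exist, pulling out the two messages with some context could be done with something like the sketch below. It assumes the logs are in /var/log/sssd/ (the usual default, adjust if yours differ); the function name ldb_warnings is made up for illustration.

```shell
#!/bin/sh
# ldb_warnings FILE...: print every line containing "ldb FULL SEARCH" or
# "Reindexing", with line numbers and two lines of context on each side.
# "ldb_warnings" is a hypothetical helper name, not an SSSD tool.
ldb_warnings() {
    grep -n -C 2 -e 'ldb FULL SEARCH' -e 'Reindexing' "$@"
}

# Example (assumed default log location):
#   ldb_warnings /var/log/sssd/*.log
```

grep exits non-zero when neither message is present, which makes it easy to tell an empty result from a failed run.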
About "1 sec for user bind to test credentials (necessary?)" from summary.txt. Yes, this is necessary during authentication. And it is a bit time consuming as well because a TLS tunnel has to be created, otherwise the password is sent in clear text over the network. In case you have to authenticate multiple times in a short time interval you can use the 'cached_auth_timeout' option, see man sssd.conf for details.
bye, Sumit
=G=
From: Sumit Bose sbose@redhat.com Sent: Tuesday, October 3, 2017 5:18 AM To: sssd-users@lists.fedorahosted.org Subject: [SSSD-users] Re: sssd email login performance
On Mon, Oct 02, 2017 at 06:21:05PM +0000, Galen Johnson wrote:
Did this make it to the list? I really wish I could see my own posts.
=G=
From: Galen Johnson Sent: Thursday, September 28, 2017 3:28 PM To: End-user discussions about the System Security Services Daemon Subject: Fw: sssd email login performance
Adding the list since Sumit appears to be busy. The info is anonymized so it should be ok. Hopefully, the gz file makes it through.
Hi,
I'm sorry for the delay (in responding). So far I had a short look at the logs and the lookup scheme is currently as expected. There are currently several reasons which cause the observed delay.
One is that currently the lookups by email address are not added to the memory cache in a way that a second lookup by email address can use it. As a result the request always has to be processed by SSSD's nss responder. (Currently I'm working on improving this so that the memory cache can be used here as well).
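One rough way to see this effect from the client side is to time repeated lookups by login name and by email address and compare them. The sketch below is only an illustration: elapsed_ms is a made-up helper, it relies on GNU date for nanosecond timestamps, and the user names in the comments are placeholders.

```shell
#!/bin/sh
# elapsed_ms CMD [ARGS...]: run a command, discard its output, and print
# the wall-clock time it took in milliseconds.
# Hypothetical helper; assumes GNU date (%N), as on CentOS/RHEL.
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example with placeholder names; run each line twice and compare:
#   elapsed_ms getent passwd jdoe
#   elapsed_ms getent passwd jdoe@example.com
```

If the second lookup by login name is much faster than the second lookup by email address, that would be consistent with the email lookup going through the nss responder each time.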
Another is that in a setup with multiple domains, e.g. an AD forest, it is not clear where the email address is coming from. What makes it worse is that the domain part of the email address can match a domain in a forest but the user with that email address might come from a completely different domain. That is why SSSD first assumes that the input is a fully-qualified name and then falls back to assuming an email address. And here the backend has to search in each domain for the email address. When looking up the entry in the on-disk cache this can be done in a single search.
I'll have a closer look at the logs tomorrow to see if there is something which can be tuned for your setup.
bye, Sumit