Hi there,
we are seeing some timeouts when sssd tries to bind to the LDAP server:
(Mon Aug 27 10:35:54 2012) [sssd[be[LDAP]]] [be_resolve_server_done] (4): Found address for server xxx1.domain.de: [10.11.12.13] TTL 21600
(Mon Aug 27 10:35:54 2012) [sssd[be[LDAP]]] [fo_set_port_status] (4): Marking port 636 of server 'xxx1.domain.de' as 'working'
(Mon Aug 27 10:35:54 2012) [sssd[be[LDAP]]] [set_server_common_status] (4): Marking server 'xxx1.domain.de' as 'working'
(Mon Aug 27 10:35:54 2012) [sssd[be[LDAP]]] [simple_bind_send] (4): Executing simple bind as: uid=abcdefg,ou=People,o=ldap,o=root
(Mon Aug 27 10:35:59 2012) [sssd[be[LDAP]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Mon Aug 27 10:35:59 2012) [sssd[be[LDAP]]] [be_pam_handler_callback] (4): Backend returned: (1, 9, <NULL>) [Provider is Offline (Authentication service cannot retrieve authentication info)]
In this case sssd does not seem to ask the second LDAP server at all. We set the "ldap_search_timeout" option to 120 seconds (the default is 6 seconds), but this does not change the behaviour.
We are running sssd 1.5.1 (as distributed with CentOS 5.X).
Looking at all the different LDAP timeout options, I guess that "ldap_network_timeout" is not the right one (because the network connect does work). Could "ldap_opt_timeout" provide a solution? I do not really understand what it does.
Cheers, Olaf
On Mon, Aug 27, 2012 at 11:39:24AM +0200, Olaf Gellert wrote:
Hi Olaf,
as you discovered, the LDAP bind timeout is currently hardcoded to 5 seconds. I think we wanted to make this setting configurable when we were working on making the bind asynchronous, but we cancelled that effort.
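The 5-second figure can be checked against the log in the original report: the simple bind is sent at 10:35:54 and the backend goes offline at 10:35:59. Below is a small bash sketch of that arithmetic. The `to_seconds` helper is a hypothetical name used only for this illustration, and it ignores the date because both log entries fall on the same day.

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS timestamp (as found in the sssd log) to seconds
# since midnight. Hypothetical helper, for illustration only.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values like "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

bind_sent=$(to_seconds "10:35:54")      # simple_bind_send
went_offline=$(to_seconds "10:35:59")   # be_run_offline_cb
echo "bind to offline: $(( went_offline - bind_sent )) seconds"
# prints "bind to offline: 5 seconds"
```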
Maybe it would be beneficial to either reuse ldap_opt_timeout for the bind timeout value or introduce a new timeout. I filed https://fedorahosted.org/sssd/ticket/1501 to track this.
I am far more concerned about the provider going offline without asking the secondary LDAP server. I'll try to reproduce the issue locally.
Hi Jakub,
thanks for your answer.
Jakub Hrozek wrote:
Maybe it would be beneficial to either reuse ldap_opt_timeout for the bind timeout value or introduce a new timeout. I filed https://fedorahosted.org/sssd/ticket/1501 to track this.
thanks.
I am far more concerned about the provider going offline without asking the secondary LDAP server. I'll try to reproduce the issue locally.
If I can help you with anything, just say what you need.
Cheers, Olaf
On Thu, Aug 30, 2012 at 08:33:51AM +0200, Olaf Gellert wrote:
Hi Olaf,
I think I may have found your problem. In the extremely rare case when the initial connection to the LDAP server would succeed but then the bind request would time out, the SSSD would not retry the next server.
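The expected behaviour can be pictured as a simple failover loop: a bind that fails or times out on one server should move on to the next server, and the provider should only go offline once every server has failed. The bash sketch below models only that ordering and is not SSSD's implementation. `try_servers`, `BIND_CMD` and `fake_bind` are hypothetical names, and `xxx2.domain.de` is a made-up name for the second server.

```shell
#!/usr/bin/env bash
# Sketch only: models the failover order, not SSSD's actual code.
# BIND_CMD names any command that takes a server URI and exits 0
# when a bind to that server succeeds.
try_servers() {
    local uri
    for uri in "$@"; do
        if "$BIND_CMD" "$uri"; then
            echo "online via $uri"
            return 0
        fi
        # A failed or timed-out bind moves on to the next server ...
        echo "bind failed on $uri, trying next server" >&2
    done
    # ... and only when every server has failed does the provider go offline.
    echo "offline: no server accepted the bind" >&2
    return 1
}

# Demo with a fake bind that only succeeds on the (hypothetical) second server.
fake_bind() { [ "$1" = "ldaps://xxx2.domain.de" ]; }
BIND_CMD=fake_bind
try_servers ldaps://xxx1.domain.de ldaps://xxx2.domain.de
# prints "online via ldaps://xxx2.domain.de" on stdout
```

In the behaviour described in the report, the loop effectively stopped after the first server's bind timeout and went straight to the offline branch.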
If you tell me the exact version you are running (the whole output of rpm -q sssd), I can prepare a scratch build for you to test if my patch fixes your issue.
However, I'm curious about how you could end up in a situation like this. Can you run the following test for me?
ldapsearch -x -H ldap://xxx1.domain.de \
    -D "uid=abcdefg,ou=People,o=ldap,o=root" \
    -w "thepassword" \
    -b uid=abcdefg,ou=People,o=ldap,o=root -s base
I used the sanitized values from your original report; please substitute your real ones, and also add -Z or similar depending on your actual configuration. The above command should exercise a code path in the libldap API similar to the one the SSSD uses. Does it succeed in your environment? Are there any interesting messages in the server log?
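Because the bind timeout is fixed at 5 seconds, it may help to time the test as well as check whether it succeeds. This is a minimal bash sketch that assumes GNU `date` (as shipped with CentOS). `time_check` and `LIMIT` are hypothetical names. The elapsed time covers the whole ldapsearch run (connect, bind and search), so it is only an upper bound on the bind itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, report its exit status and the
# elapsed wall-clock time (whole seconds, via GNU date +%s), and warn
# when the time reaches LIMIT seconds.
LIMIT=5   # the hardcoded SSSD 1.5 bind timeout discussed in this thread

time_check() {
    local start rc elapsed
    start=$(date +%s)
    "$@"
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    echo "exit status: $rc, elapsed: ${elapsed}s"
    if [ "$elapsed" -ge "$LIMIT" ]; then
        echo "WARNING: took at least ${LIMIT}s, enough to trip the SSSD bind timeout" >&2
    fi
    return "$rc"
}

# Usage, with the same sanitized values as the ldapsearch test (substitute real ones):
#   time_check ldapsearch -x -H ldap://xxx1.domain.de \
#       -D "uid=abcdefg,ou=People,o=ldap,o=root" -w "thepassword" \
#       -b "uid=abcdefg,ou=People,o=ldap,o=root" -s base
```

Prefixing the plain ldapsearch command with the shell's `time` keyword gives the same information at finer resolution. The helper only adds the explicit comparison against the 5-second limit.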
sssd-users@lists.fedorahosted.org