Hello,
My first post here. I have an issue with occasional failures of the LDAP servers used by SSSD. What happens is that when a new server is stood up to replace a failed one, users can't log in until SSSD is restarted. Some users can, and it is hard to tell which can and which can't. I understand this is a caching setting, or has something to do with caching, but I don't fully understand why SSSD can't just keep running. Prior to this we used nslcd and never had these issues. Is this a known issue, or am I missing something in the setup?
Thank you ~Janelle
On Tue, Jun 23, 2015 at 06:42:02AM -0700, Janelle wrote:
Do the new servers have a different address? One possible reason is that SSSD would keep the old connection (or remain offline) until you cycle it.
Would signaling sssd to switch to offline and online instead of restarting it work equally?
pkill -USR1 sssd   # Go offline
pkill -USR2 sssd   # Go back online
On Tue, 23 Jun 2015, Janelle wrote:
How are you telling SSSD about the available LDAP servers? Are you using SRV records?
jh
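(For reference, SRV-based discovery would look roughly like the snippet below. The zone, host names, and weights are made up for illustration:)

```
; hypothetical DNS zone: two _ldap SRV records, equal priority, weighted 50/50
_ldap._tcp.example.com. 3600 IN SRV 0 50 389 ldap1.example.com.
_ldap._tcp.example.com. 3600 IN SRV 0 50 389 ldap2.example.com.
```

On the client side, the special `_srv_` keyword in `ldap_uri` (see sssd-ldap(5)) tells SSSD to discover servers from these records instead of a fixed hostname list.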
On 6/23/15 6:48 AM, John Hodrien wrote:
How are you telling SSSD about the available LDAP servers? Are you using SRV records?
jh
Servers are behind a load-balancer. Address never changes.
~J
On Tue, 23 Jun 2015, Janelle wrote:
Servers are behind a load-balancer. Address never changes.
But one problem with that is that SSSD will see multiple servers as one server, and so will mark the server as failed if the load balancer presents it with a broken back end server.
Works much better in my experience when you tell SSSD about all the servers.
jh
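(Concretely, listing all the servers would look something like this in sssd.conf, with hypothetical hostnames; note that SSSD treats such a list as a failover order, trying each server in turn rather than spreading load:)

```
[domain/default]
id_provider = ldap
ldap_uri = ldap://ldap1.example.com, ldap://ldap2.example.com, ldap://ldap3.example.com
```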
On 6/23/15 7:33 AM, John Hodrien wrote:
But one problem with that is that SSSD will see multiple servers as one server, and so will mark the server as failed if the load balancer presents it with a broken back end server.
Works much better in my experience when you tell SSSD about all the servers.
jh
Sadly that is not possible. If SSSD did load balancing when given multiple servers, then yes, but it does not. When you are running 30,000 servers with 3000 users, you have to load balance or SSSD simply dies and an ssh login takes 5 minutes to complete. The only way to make SSSD happy and not kill the single server it would point to is to have multiple servers behind a VIP. Am I completely off base to think this is the way to go? Can SSSD be taught to actually load balance?
~J
Just to be clear, are you load balancing LDAP servers, or are you making LDAP/LDAPS requests to Active Directory servers?
With AD, you should not be load balancing domain controllers, due to the sticky nature of client-to-DC affinity. With 2008, GPOs were introduced to improve DC fail-over and fall-back for clients. It would be a good addition for SSSD to use these GPOs in the future:
http://www.windowsnetworking.com/kbase/WindowsTips/WindowsServer2008/AdminTi...
Location: Administrative Templates\System\Net Logon\DC Locator DNS Records\ Entry Name: Force Rediscovery Interval.
If it is only LDAP, you may want to provide more details regarding your LB setup, whether there is stickiness, etc. in your config.
On 6/23/15 8:38 AM, Frank Pikelner wrote:
Just to be clear, are you load balancing LDAP servers, or are you making LDAP/LDAPS requests to Active Directory servers?
Sorry for the confusion -- yes, LDAP servers. I guess I assume these days that when people say LDAP, that is what they mean; however, I see your point, since the line has become so blurred.
So here is the scenario -- 3 LDAP servers behind a VIP, balanced round-robin (just a simple Citrix NetScaler). The situation is that all 3 servers are replaced or updated, and then we have issues. If just one server is updated, it seems to recover OK.
Is there information that SSSD gets from LDAP lookups to determine what database it is looking at? I mean if a user changes her password in LDAP - how does SSSD know to use the new one or the cached value?
~J
Perhaps you can try configuring the same VIP/FQDN as your primary and backup URI with ldap_uri, ldap_backup_uri in SSSD config.
The man page for sssd-ldap (towards the bottom) explains how SSSD performs a fail-over and what timeouts exist. http://linux.die.net/man/5/sssd-ldap
Another idea may be to use one LDAP outside of the Netscaler (directly accessible) as a backup ldap_backup_uri. The server would only be used when going through the Netscaler does not work. The backup option would also eventually have SSSD retry the Netscaler as a primary connection method.
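(A sketch of that layout, with made-up names -- the VIP as the primary URI, and one replica reachable directly as the backup:)

```
# primary: go through the NetScaler VIP
ldap_uri = ldaps://ldap-vip.example.com
# backup: one replica, bypassing the load balancer
ldap_backup_uri = ldaps://ldap3.example.com
```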
On Tue, Jun 23, 2015 at 11:38:17AM -0400, Frank Pikelner wrote:
Just to be clear, are you load balancing LDAP servers, or are you making LDAP/LDAPS requests to Active Directory servers?
With AD, you should not be load balancing domain controllers, due to the sticky nature of client-to-DC affinity. With 2008, GPOs were introduced to improve DC fail-over and fall-back for clients. It would be a good addition for SSSD to use these GPOs in the future:
FWIW, the stickiness is exactly how SSSD is behaving:

~~~~~~~~
When a client computer finds a preferred domain controller, it sticks to this domain controller unless that domain controller stops responding or the client computer is restarted
~~~~~~~~
We've had one user who was unhappy about this default behaviour and they solved the problem with SRV queries as well -- they set a low TTL on SRV queries, which forced SSSD to re-discover servers on each login past the TTL interval. Then SSSD would select a server on the same priority level based on the weight field.
Please note that a) this works only with reasonably recent SSSD versions, as we weren't honoring the TTL correctly earlier, and b) this only works for logins, because for identity lookups (which are mostly just LDAP searches) we reuse an LDAP connection for as long as we can.
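(That setup might look like this in the zone, with hypothetical names; the short 60-second TTL forces re-discovery, after which SSSD re-picks among equal-priority records by weight:)

```
_ldap._tcp.example.com. 60 IN SRV 0 40 389 ldap1.example.com.
_ldap._tcp.example.com. 60 IN SRV 0 40 389 ldap2.example.com.
_ldap._tcp.example.com. 60 IN SRV 0 20 389 ldap3.example.com.
```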
Jakub Hrozek wrote:
We've had one user who was unhappy about this default behaviour and they solved the problem with SRV queries as well -- they set a low TTL on SRV queries, which forced SSSD to re-discover servers on each login past the TTL interval. Then SSSD would select a server on the same priority level based on the weight field.
I repeat my security concerns: without effective DNSSEC validation there is then no cryptographically signed binding between a name and a server cert.
Why not just use DNS round-robin with A RRs? Then the TLS hostname check works as expected (provided you have proper subjectAltName values in the server certs). The load-balancing seems to work reasonably well with sssd 1.9.x in an installation with 8000+ systems and many OpenLDAP replicas. I could see in the monitoring that replicas that had been down for a while got new connections after a restart.
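(A DNS round-robin setup like that is just multiple A records for one name; the name and addresses below are illustrative:)

```
; one name, three A records; resolvers rotate the order
ldap.example.com. 300 IN A 192.0.2.11
ldap.example.com. 300 IN A 192.0.2.12
ldap.example.com. 300 IN A 192.0.2.13
```

Each replica's certificate then needs ldap.example.com in its subjectAltName for the TLS hostname check to pass.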
Ciao, Michael.
On Tue, Jun 23, 2015 at 07:52:46AM -0700, Janelle wrote:
Sadly that is not possible. If SSSD did load balancing when given multiple servers, then yes, but it does not. When you are running 30,000 servers with 3000 users, you have to load balance or SSSD simply dies and an ssh login takes 5 minutes to complete.
What is the configuration you were running here? I'm interested in seeing how we can make SSSD not die :-)
The only way to make SSSD happy and not kill the single server it would point to is to have multiple servers behind a VIP.
Hmm, did you consider SRV records as John pointed out elsewhere? Then you could load-balance using weight fields of SRV records..
Am I completely off base to think this is the way to go? Can SSSD be taught to actually load balance?
I'm not exactly sure how you would like SSSD to behave. Would this ticket help - https://fedorahosted.org/sssd/ticket/2499 ?
Hmm, did you consider SRV records as John pointed out elsewhere? Then you could load-balance using weight fields of SRV records..
OT question -- I am not sure whether SRV records can be used for load-balancing. If we use the same priority and weight for multiple _ldap servers, will SSSD pick a random one or the first one?
Thanks, Ondrej
On Wed, Jun 24, 2015 at 08:35:10AM +0000, Ondrej Valousek wrote:
OT question -- I am not sure whether SRV records can be used for load-balancing. If we use the same priority and weight for multiple _ldap servers, will SSSD pick a random one or the first one?
Random one (as RFC 2782 mandates us to do).
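(For the curious, the RFC 2782 selection rule -- ascending priority first, then weighted-random selection within each priority level -- can be sketched like this. This is an illustration of the algorithm, not SSSD's actual code:)

```python
import random

def srv_order(records):
    """Order SRV records per RFC 2782: ascending priority first,
    then weighted-random selection within each priority level.

    records: list of (priority, weight, target) tuples.
    Returns the targets in the order a client should try them.
    """
    ordered = []
    for prio in sorted({r[0] for r in records}):
        group = [r for r in records if r[0] == prio]
        while group:
            total = sum(r[1] for r in group)
            point = random.randint(0, total)  # pick a spot on the weight line
            running = 0
            for i, rec in enumerate(group):
                running += rec[1]
                if running >= point:
                    ordered.append(rec[2])  # target hostname
                    del group[i]
                    break
    return ordered
```

Heavier-weight records within a priority level are picked earlier more often; a lower-numbered priority level is always exhausted before the next one is tried.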
On 6/24/15 12:38 AM, Jakub Hrozek wrote:
What I found was that when the VIP servers are updated, even though most of the systems continue to run, a large population seems to say the LDAP server has lost connection. And then SSSD stops trying unless you restart it:
ldap_id_use_start_tls = false
cache_credentials = true
ldap_tls_cacertdir = /etc/openldap/cacerts

[sssd[be[default]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
[sssd[be[default]]] [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: Connect error
[sssd[be[default]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed.
(ignore cert error - it is set to ALLOW)
A simple "service sssd restart" solves it, but you can see the server is still up -- a telnet connection to port 389 or 636 works fine. It seems to me like SSSD just gives up and stops trying?
As a side note - nslcd works flawlessly: the server might disconnect for a second, then it comes back and nslcd restores the connection. It does not seem to give up as SSSD does :-(
~J
On Wed, Jun 24, 2015 at 10:18:26AM -0700, Janelle wrote:
What I found was that when the VIP servers are updated, even though most of the systems continue to run, a large population seems to say the LDAP server
Have you tried if cycling the offline/online status with USR1 and USR2 helps?
has lost connection. And then SSSD stops trying unless you restart it:
ldap_id_use_start_tls = false
cache_credentials = true
ldap_tls_cacertdir = /etc/openldap/cacerts

[sssd[be[default]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
[sssd[be[default]]] [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: Connect error
[sssd[be[default]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed.
(ignore cert error - it is set to ALLOW)
A simple "service sssd restart" solves it, but you can see the server is still up -- a telnet connection to port 389 or 636 works fine. It seems to me like SSSD just gives up and stops trying?
At that point sssd goes offline, right?
Could you try experimenting with a short offline_timeout? (see man sssd.conf for more details on that option)
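(For example -- the value below is hypothetical; see the offline_timeout description in sssd.conf(5) for how the retry interval actually behaves:)

```
[domain/default]
# retry going back online more aggressively than the default of 60
offline_timeout = 15
```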
As a side note - nslcd works flawlessly: the server might disconnect for a second, then it comes back and nslcd restores the connection. It does not seem to give up as SSSD does :-(
I think it's because nslcd is not as stateful as sssd, so it would try to connect every time. But I'm not totally sure without seeing the issue myself..
On 6/24/15 10:52 AM, Jakub Hrozek wrote:
Could you try experimenting with a short offline_timeout? (see man sssd.conf for more details on that option)
In what version was offline_timeout added? I would expect that, with the default of 60, it would recover, but it does not seem to. Maybe there is a version issue here?
~J
On Wed, Jun 24, 2015 at 08:57:40PM -0700, Janelle wrote:
In what version was offline_timeout added? I would expect that, with the default of 60, it would recover, but it does not seem to. Maybe there is a version issue here?
1.11.7 upstream.