Re: [SSSD-users] caching question? (switching servers)

Wednesday, 24 June 2015

On 6/24/15 12:38 AM, Jakub Hrozek wrote:
...
 On Tue, Jun 23, 2015 at 07:52:46AM -0700, Janelle wrote:
> On 6/23/15 7:33 AM, John Hodrien wrote:
>> On Tue, 23 Jun 2015, Janelle wrote:
>>
>>> Servers are behind a load-balancer. Address never changes.
>> But one problem with that is that SSSD will see multiple servers as one
>> server, and so will mark the server as failed if the load balancer presents
>> it with a broken back-end server.
>>
>> Works much better in my experience when you tell SSSD about all the
>> servers.
>>
>> jh
> Sadly that is not possible.  If SSSD did load balancing when given multiple
> servers, then yes, but it does not. When you are running 30,000 servers with
> 3000 users, you have to load balance or SSSD simply dies and an ssh login
> takes 5 minutes to complete.
 What is the configuration you were running here? I'm interested in
 seeing how we can make SSSD not die :-)

> The only way to make SSSD happy and not kill
> the single server it would point to is to have multiple servers behind a
> VIP.
 Hmm, did you consider SRV records as John pointed out elsewhere? Then
 you could load-balance using the weight fields of the SRV records.
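To make the SRV weight idea concrete, here is a rough sketch of the ordering rule RFC 2782 describes for SRV records. Lower priority values are tried first. Within one priority, zero-weight records are placed at the front, a random number between 0 and the total weight (inclusive) is drawn, and the first record whose running weight sum reaches it is chosen. That step repeats until the group is exhausted. This is not SSSD's own code, and the hostnames, ports and weights are made-up examples. It only shows how weights spread clients across several directly listed servers instead of hiding them behind one VIP.

```python
import random
from collections import Counter
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv_records(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782 style)."""
    ordered = []
    by_priority = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        # Zero-weight records go first so they keep only a small chance of being picked.
        remaining = sorted(group, key=lambda r: r.weight != 0)
        while remaining:
            total = sum(r.weight for r in remaining)
            pick = rng.randint(0, total)  # uniform, inclusive on both ends
            running = 0
            for i, rec in enumerate(remaining):
                running += rec.weight
                if running >= pick:
                    ordered.append(remaining.pop(i))
                    break
    return ordered


if __name__ == "__main__":
    # Hypothetical records: three weighted servers plus a backup at a worse priority.
    records = [
        SrvRecord(10, 60, 389, "ldap1.example.com"),
        SrvRecord(10, 30, 389, "ldap2.example.com"),
        SrvRecord(10, 10, 389, "ldap3.example.com"),
        SrvRecord(20, 0, 389, "ldap-backup.example.com"),
    ]
    rng = random.Random(42)
    # Count which server each simulated client would contact first.
    first_choices = Counter(
        order_srv_records(records, rng)[0].target for _ in range(10000)
    )
    print(first_choices)
```

Each client still ends up with a full ordered list, so if its first pick is down it falls through to the others rather than marking the whole service as unavailable.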

> Am I completely off base to think this is the way to go? Can SSSD be
> taught to actually load balance?
 I'm not exactly sure how you would like SSSD to behave. Would this
 ticket help - https://fedorahosted.org/sssd/ticket/2499 ?

What I found was that when the servers behind the VIP are updated, most
of the systems keep running, but a large number of them report that the
connection to the LDAP server has been lost. SSSD then stops trying until
it is restarted:

[sssd[be[default]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
[sssd[be[default]]] [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: Connect error
[sssd[be[default]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed.

(Ignore the certificate error; certificate checking is set to ALLOW.)

A simple "service sssd restart" fixes it, yet the server is clearly still
up beforehand: a telnet connection to either port 389 or 636 works fine.
It seems to me that SSSD just gives up and stops trying?

As a side note, nslcd works flawlessly: the server might disconnect for a
second, then it comes back and nslcd restores the connection. It does not
seem to give up the way SSSD does :-(

~J
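The difference discussed in this thread (a server marked as failed versus a client that simply reconnects) can be illustrated with a generic failover list in which a failed server is skipped only for a fixed retry interval, not forever. The class name, the 30-second default and the hostnames are hypothetical, and this is not how SSSD or nslcd are actually implemented. It is just a sketch of the technique, showing why a single VIP entry leaves nothing to fall back to until the retry window expires.

```python
import time
from typing import Callable, Dict, List, Optional


class FailoverList:
    """Hypothetical failover list: failed servers are skipped only for retry_after seconds."""

    def __init__(self, servers: List[str], retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.servers = list(servers)
        self.retry_after = retry_after
        self.clock = clock
        self.failed_at: Dict[str, float] = {}

    def mark_failed(self, server: str) -> None:
        # Remember when the server failed so it can be retried later.
        self.failed_at[server] = self.clock()

    def mark_working(self, server: str) -> None:
        self.failed_at.pop(server, None)

    def next_server(self) -> Optional[str]:
        now = self.clock()
        for server in self.servers:
            failed = self.failed_at.get(server)
            # Eligible if it never failed or its retry window has passed.
            if failed is None or now - failed >= self.retry_after:
                return server
        return None  # every server is inside its retry window


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    vip_only = FailoverList(["ldap-vip.example.com"], retry_after=30,
                            clock=lambda: fake_now[0])
    vip_only.mark_failed("ldap-vip.example.com")
    print(vip_only.next_server())  # prints None: the only server is still in its retry window
    fake_now[0] = 31
    print(vip_only.next_server())  # prints ldap-vip.example.com: it is tried again
```

With several servers listed, a single failure just moves clients to the next entry; with only the VIP listed, every client sees "no available servers" until the window passes.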
