Hi,
we're using SSSD in combination with Active Directory and have received complaints from users about a corner case in our setup.
Our AD servers are only reachable from within our corporate network; connection attempts from the outside are dropped by firewalls. This leads to the following scenario:
- user takes machine (e.g. laptop) outside the corporate network
- user tries to authenticate (or in some cases also tries to "ls", which causes a uid/gid lookup)
- sssd will try to reach the configured servers for up to 30s
- sssd goes (back) into offline mode, uses cached credentials and authenticates the user
This will, however, NOT happen if sssd is told by the IP stack that a connection to the target IP is not possible (e.g. via "ip route add blackhole 192.0.2.23/32", or because one of the routers along the way generates an ICMP unreachable). In such cases sssd goes into offline mode immediately and uses cached credentials.
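To illustrate the difference (192.0.2.23 standing in for one of our DCs; these are just the three cases as I understand them, not something taken from our actual setup):

    # packets silently dropped, as our corporate firewall does from outside:
    # connect() hangs until sssd's own timeouts expire (the ~30s case)
    iptables -A OUTPUT -d 192.0.2.23 -j DROP

    # connection actively rejected, or locally unroutable:
    # connect() fails right away and sssd goes offline immediately
    iptables -A OUTPUT -d 192.0.2.23 -j REJECT --reject-with icmp-host-unreachable
    ip route add blackhole 192.0.2.23/32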
I'm aware that this is overall sensible behaviour, but what I would like to fine-tune is how sssd stays in offline mode. Currently it seems like it leaves offline mode whenever it tries to reconnect (hardcoded 30s?). That leads to a flip-flop scenario where it is 30s offline and then 30s "online/connecting", and users have a fairly high chance of hitting a window during which their authentication seemingly stalls.
So my question is: is there a better way to deal with this in the sssd context? If not, we'll probably have to implement separate connection checking and inject and remove blackhole routes accordingly, along the lines of the sketch below. Not the nicest of workarounds in my book.
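For completeness, the kind of workaround I have in mind would be roughly this (untested sketch; the server IPs and the probe host are placeholders, and it would have to be hooked into cron or a network up/down event):

    #!/bin/sh
    # placeholder list of our AD/LDAP server IPs
    SERVERS="192.0.2.23 192.0.2.24"

    # probe something that only answers from inside the corporate network
    if ping -c 1 -W 2 dc1.corp.example.com >/dev/null 2>&1; then
        # inside: remove any leftover blackhole routes
        for ip in $SERVERS; do
            ip route del blackhole "$ip"/32 2>/dev/null
        done
    else
        # outside: blackhole the servers so sssd fails fast and stays offline
        for ip in $SERVERS; do
            ip route replace blackhole "$ip"/32
        done
    fi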
Thanks, cheers
Thomas
PS: We're using sssd on many distributions, but our main distro at the moment is Ubuntu 12.04 with sssd 1.8.6, and we'll be rolling out 14.04 in addition, which has sssd 1.11.3.
On Wed, Apr 02, 2014 at 12:02:41PM +0300, "Thomas B. Rücker" wrote:
> - sssd will try to reach the configured servers for up to 30s

This is not so clear to me: are you saying that it takes up to 30 seconds for SSSD to realize it's offline and switch to offline mode?
I'm using a very similar setup on my laptop, where I authenticate against LDAP and Kerberos servers inside Red Hat's internal network. I sometimes see a couple of seconds of lag, but not the 30s you describe.
> This will however NOT happen if sssd gets told by the IP stack that a connection to the target IP is not possible (e.g. "ip route add blackhole 192.0.2.23/32" or one of the routers along the way generates an ICMP unreachable). [...]
So I suspect it's the dropping of packets, instead of rejecting them, that makes the difference, right?
> Currently it seems like it will leave offline mode when it tries to reconnect (hardcoded 30s?). That leads to a flip flop scenario where it seems to be 30s offline and 30s "online/connecting" [...]
Newer versions have the 'offline_timeout' option available. For the later versions, I would suggest fine-tuning the timeouts so that the offline detection is faster.
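Something along these lines in the domain section, just as a sketch (which options are available depends on the version you run, and the values are only examples, not recommendations):

    [domain/EXAMPLE.COM]
    # how long to stay offline before probing the servers again
    offline_timeout = 60
    # fail LDAP network operations faster when packets are silently dropped
    ldap_network_timeout = 6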
Can you enable debugging and see where the biggest lag is? Maybe we could see what exactly takes the longest and lower the appropriate timeout.
I remember in 1.9 we fixed a bug where we would attempt to resolve kpasswd in addition to kdc on authentication. I can't find the commit right now, but it would be nice if you could check some newer version and see if the situation is somewhat better recently.
On 04/02/2014 07:41 AM, Jakub Hrozek wrote:
> This is not so clear to me: are you saying that it takes up to 30 seconds for SSSD to realize it's offline and switch to offline mode?
What he's saying is that the firewall behavior from outside the network is DROP. In our case, the address isn't even resolvable from outside, since we use private DNS entries. So we have a short-circuit.
In his situation, the address is resolvable, so SSSD sends a request to connect to LDAP. It then hangs with no response.
Now, this *should* be hitting the 6 second ldap_network_timeout default value. I'm not sure why it's not, unless there's a timeout failure happening during connect() instead of during poll/select, which I don't think we can actually avoid.
<snip>
> Can you enable debugging and see where the biggest lag is? Maybe we could see what exactly takes the longest and lower the appropriate timeout.
This would be very helpful. Please set 'debug_level = 7' in the [domain/DOMAINNAME] section of sssd.conf, restart SSSD and then gather the logs. Look at the timestamps to see what is happening for about thirty lines before and after the lag. Ideally, sanitize that log and send it to us.
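In other words, something like this (DOMAINNAME is whatever your domain section is actually called):

    [domain/DOMAINNAME]
    debug_level = 7

After the restart, the back end log typically ends up in /var/log/sssd/sssd_DOMAINNAME.log.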
On 04/02/2014 05:02 AM, "Thomas B. Rücker" wrote:
In addition to the other comments, I want to say that I experienced similar behavior periodically with my laptop until I moved to 1.9.x. Please try the latest version; the problem might already be addressed.
On 03/04/14 22:43, Dmitri Pal wrote:
> In addition to the other comments, I want to say that I experienced similar behavior periodically with my laptop until I moved to 1.9.x. Please try the latest version; the problem might already be addressed.
To this I'd like to specifically reply that I've paid closer attention to the behaviour of 1.11.3 on 14.04 and it's outright HORRIBLE compared to 1.8.6 on 12.04. It locks hard for a full 30s on many more things: cd into /etc or /home? 30s penalty on that shell. cd into a different directory a few minutes later? 30s penalty again. Unlock the screen? sudo? 30s. It _seems_ (I haven't checked) as if the timer only starts running at that exact instant, instead of what I observed on 12.04, where it was constantly retrying, so the wait would be _up to_ 30s.
All goes away by routing the LDAP IPs to a black hole, just like on 12.04/1.8.6.
If you want to reproduce this behaviour, you could try adding the LDAP server IPs to your hosts file while outside your network. This will only show up, though, if no firewall/router along the way replies with "ICMP host unreachable", AFAIU. Or, when inside your network, you can just add iptables DROP rules for all your LDAP server destinations. Whichever way you prefer.
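Concretely, something like this (the name and address are placeholders, of course):

    # outside the network: make the name resolvable so the TCP connection just hangs
    echo "192.0.2.23  dc1.corp.example.com" >> /etc/hosts

    # or inside the network: silently drop everything towards the LDAP servers
    iptables -A OUTPUT -d 192.0.2.23 -j DROP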
While I'm not very active on this list, I'm trying to investigate this problem internally to get a better idea of how to mitigate it. Sadly I have more urgent things going on, or else I'd have come back with debug logs and more well-thought-out theories already.
Cheers
Thomas
On Tue, Apr 15, 2014 at 03:35:01PM +0300, "Thomas B. Rücker" wrote:
The logs would be really welcome.
On 04/16/2014 04:26 AM, Jakub Hrozek wrote:
I had an interesting experience during the Red Hat Summit. The network was significantly overloaded, the VPN was slow and probably bleeding packets on the way like crazy, and any access to an internal web page took a while and happened in multiple steps. When the screen locked, it took about 30 sec to log back in (I have not measured it, but that was the feeling). I am not sure we can do much about it, but a flaky network is probably going to lead to timeouts and a bad user experience.
On Wed, 2014-04-16 at 19:49 -0400, Dmitri Pal wrote:
I think this may be a recent regression. We are never supposed to wait more than a handful of seconds, but I am noticing that with the latest RHEL6 updates my RHEL desktop also sometimes gets stuck for a while on authentication (over VPN). I have not experienced this on F20 (but my domain controller is local).
Simo.
On Wed, Apr 16, 2014 at 10:47:10PM -0400, Simo Sorce wrote:
Simo, if you can reproduce the error locally, would you mind enabling debug logs or trying out the 6.6 preview packages?
I only have headless VMs with RHEL6 and I'm not sure I could reproduce the bug there. But it sounds like something we should fix, so any debug information would be welcome, at least to know where to start with local debugging.
btw when I tried to reproduce the bug Thomas was seeing, I saw some blocking DNS calls in openldap's initialization path, but that was on F-20.
On 04/17/2014 04:13 AM, Jakub Hrozek wrote:
> btw when I tried to reproduce the bug Thomas was seeing, I saw some blocking DNS calls in openldap's initialization path, but that was on F-20.
OpenLDAP isn't supposed to be calling DNS at all. That's the entire reason we now open the port ourselves and then pass the FD to it. If it suddenly started doing DNS lookups, that's probably a regression in the openldap libraries.
On Mon, Apr 21, 2014 at 10:05:58AM -0400, Stephen Gallagher wrote:
OK, I will do a bit more testing later this week, after I finish some patches for Jan P. I think this would result in an openldap bug report.
On Mon, Apr 21, 2014 at 10:05:58AM -0400, Stephen Gallagher wrote:
> OpenLDAP isn't supposed to be calling DNS at all. That's the entire reason we now open the port ourselves and then pass the FD to it. If it suddenly started doing DNS lookups, that's probably a regression in the openldap libraries.
I had a bit of time to dig into the issue today; here is a snippet of the backtrace I'm seeing after starting an IPA client with a faulty DNS entry in /etc/resolv.conf:
#8  0x00007fda39c9e163 in __gethostbyname_r (name=name@entry=0x7fff2548d140 "client.example.com",
    resbuf=resbuf@entry=0x7fff2548d120, buffer=0x1f33590 "\177", buflen=buflen@entry=992,
    result=result@entry=0x7fff2548d118, h_errnop=h_errnop@entry=0x7fff2548d10c) at ../nss/getXXbyYY_r.c:266
#9  0x00007fda3bb1b3de in ldap_pvt_gethostbyname_a (name=name@entry=0x7fff2548d140 "client.example.com",
    resbuf=resbuf@entry=0x7fff2548d120, buf=buf@entry=0x7fff2548d110, result=result@entry=0x7fff2548d118,
    herrno_ptr=herrno_ptr@entry=0x7fff2548d10c) at util-int.c:350
#10 0x00007fda3bb1b5d0 in ldap_pvt_get_fqdn (name=0x7fff2548d140 "client.example.com", name@entry=0x0) at util-int.c:748
#11 0x00007fda3bb19b47 in ldap_int_initialize (gopts=gopts@entry=0x7fda3bd40000 <ldap_int_global_options>, dbglvl=dbglvl@entry=0x0) at init.c:645
#12 0x00007fda3bb1a627 in ldap_set_option (ld=0x0, option=24582, invalue=0x7fff2548d2b0) at options.c:446
#13 0x00007fda30951cf6 in setup_tls_config (basic_opts=0x1f30450) at src/providers/ldap/sdap.c:533
#14 0x00007fda308214b3 in ldap_id_init_internal (bectx=0x1f12b40, ops=0x1f12cb0, pvt_data=0x7fff2548d5e8) at src/providers/ldap/ldap_init.c:146
#15 0x00007fda30821ba0 in sssm_ldap_id_init (bectx=0x1f12b40, ops=0x1f12cb0, pvt_data=0x1f12cb8) at src/providers/ldap/ldap_init.c:199
#16 0x000000000041b227 in load_backend_module (ctx=0x1f12b40, bet_type=BET_ID, bet_info=0x1f12ca8, default_mod_name=0x0) at src/providers/data_provider_be.c:2346
#17 0x000000000041ce4c in be_process_init (mem_ctx=0x1f0ba80, be_domain=0x1f093f0 "localipaldap", ev=0x1f0a630, cdb=0x1f0bb90) at src/providers/data_provider_be.c:2520
#18 0x000000000041fde6 in main (argc=3, argv=0x7fff2548e008) at src/providers/data_provider_be.c:2743
Do you agree this is an openldap bug? I don't like that ldap_set_option triggers a blocking DNS resolution call.
On 05/29/2014 07:40 AM, Jakub Hrozek wrote:
> Do you agree this is an openldap bug? I don't like that ldap_set_option triggers a blocking DNS resolution call.
Yeah, that's certainly unexpected. Please open an OpenLDAP bug.
On Thu, May 29, 2014 at 07:52:45AM -0400, Stephen Gallagher wrote:
> Yeah, that's certainly unexpected. Please open an OpenLDAP bug.
I was waiting a bit for confirmation from the OpenLDAP maintainer, but that never arrived, so I went ahead and filed: https://bugzilla.redhat.com/show_bug.cgi?id=1104627