Hi, I've recently been trying to hunt down some odd performance problems with our
installation of 389 LDAP (currently 1.3.2.19, but we've been following recent Debian unstable).
We've been seeing long delays (tens of seconds at times) handling even the simplest
new bind()s while the server otherwise has idle worker threads (and other non-idle worker
threads servicing existing connections).
Upon grabbing some userland thread stacks during these "hangs", when no new
external connections could be established, I saw what looked like the thread associated
with slapd_daemon() in ldap/servers/slapd/daemon.c stuck in setup_pr_read_pds(), walking
the list of active connections and acquiring each connection lock (c->c_mutex) sequentially
in the process. I stuck some calls to clock_gettime() around the PR_Lock(c->c_mutex) call
at or about ldap/servers/slapd/daemon.c:1690 and warned whenever we waited for more than a
set duration:
[22/Jul/2014:17:37:05 +0000] - setup_pr_read_pds: (fd=192) waited 995.375473 msecs for lock
[22/Jul/2014:17:37:08 +0000] - setup_pr_read_pds: (fd=202) waited 3003.548263 msecs for lock
[22/Jul/2014:17:37:10 +0000] - setup_pr_read_pds: (fd=181) waited 1997.828897 msecs for lock
<up to 20-30 seconds in some extreme cases>
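A minimal sketch of that sort of timing wrapper (not the patch used above: it assumes POSIX
pthread_mutex_lock() in place of NSPR's PR_Lock(), and the timed_lock() name and the
500 ms LOCK_WARN_MSECS threshold are hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: warn when a lock wait exceeds this many milliseconds. */
#define LOCK_WARN_MSECS 500.0

/* Milliseconds elapsed between two CLOCK_MONOTONIC timestamps. */
static double elapsed_msecs(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Hypothetical helper: acquire `mutex`, warn on stderr if the wait took longer
 * than LOCK_WARN_MSECS, and return the wait in milliseconds.
 * `fd` is only used to identify the connection in the message. */
static double timed_lock(pthread_mutex_t *mutex, int fd)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = pthread_mutex_lock(mutex);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double waited = elapsed_msecs(&start, &end);
    if (waited > LOCK_WARN_MSECS) {
        fprintf(stderr, "setup_pr_read_pds: (fd=%d) waited %f msecs for lock\n",
                fd, waited);
    }
    return waited; /* the caller still owns the lock and must unlock it */
}
```

CLOCK_MONOTONIC is used so that wall-clock adjustments can't show up as bogus waits.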
It looks like this loop could block for up to CONN_TURBO_TIMEOUT_INTERVAL (1 second by default)
per thread in turbo mode (up to 50% of the worker pool by default). While it is stuck there, the
daemon thread isn't calling handle_listeners() to pull new connections off the well-known port.
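As a back-of-the-envelope bound, assuming each turbo-mode thread holds its connection's
c_mutex for at most one CONN_TURBO_TIMEOUT_INTERVAL while the daemon thread waits on it,
one pass over the connection list could stall for roughly:

$$
t_{\text{stall}} \;\le\; N_{\text{turbo}} \cdot T_{\text{turbo}}
\;\le\; 0.5\,N_{\text{workers}} \cdot 1\,\text{s}
$$

With a hypothetical pool of 30 worker threads, that is up to 15 threads × 1 s = 15 s per pass,
during which no new connections are accepted.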
Perhaps handle_listeners() should run in its own thread, away from this connection
maintenance? (Or, if it must stay there, use a non-blocking PRP_TryLock() or some such?)
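A rough sketch of the non-blocking variant, using POSIX pthread_mutex_trylock() as a stand-in
for PRP_TryLock(); struct conn and setup_read_fds_nonblocking() are hypothetical
simplifications, not slapd's real types or functions:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a slapd connection, reduced to what this sketch needs. */
struct conn {
    pthread_mutex_t mutex; /* plays the role of c->c_mutex */
    int fd;
    int polled;            /* 1 if this pass added fd to the poll set, 0 if skipped */
};

/* One pass over the connection table that never blocks: a connection whose
 * lock is already held is skipped (to be picked up on a later pass) rather
 * than waited for. Returns how many connections were skipped as busy. */
static size_t setup_read_fds_nonblocking(struct conn *conns, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        int rc = pthread_mutex_trylock(&conns[i].mutex);
        if (rc == EBUSY) {
            /* Another thread (e.g. a worker in turbo mode) owns it; move on. */
            conns[i].polled = 0;
            skipped++;
            continue;
        }
        assert(rc == 0);
        /* Real code would add conns[i].fd to the poll descriptor set here. */
        conns[i].polled = 1;
        pthread_mutex_unlock(&conns[i].mutex);
    }
    return skipped;
}
```

The trade-off is that a contended connection is simply left out of that poll pass. That seems
acceptable if its owner is a turbo-mode worker already reading from it, but whether it is safe
under slapd's locking rules is an assumption that would need checking against daemon.c.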
TIA
Thomas