On Mon, 2017-06-12 at 17:57 +0200, Jakub Hrozek wrote:
On Mon, Jun 12, 2017 at 03:21:43PM +0000, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > On Mon, Jun 12, 2017 at 01:53:27PM +0000, Joakim Tjernlund wrote:
> > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > On Sat, Jun 10, 2017 at 07:56:47AM +0000, Joakim Tjernlund wrote:
> > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +0000, Joakim Tjernlund wrote:
> > > > > > > Both 1.15.2 and git master hang after less than 24 hours on a server.
> > > > > > >
> > > > > > > I can see this repeating in the domain log:
> > > > > > >
> > > > > > > (Fri Jun 9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun 9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A transaction is still active in ldb context [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun 9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun 9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A transaction is still active in ldb context [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun 9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun 9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A transaction is still active in ldb context [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun 9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun 9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A transaction is still active in ldb context [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > >
> > > > > > This is caused by a write to disk that takes too long.
> > > > > >
> > > > >
> > > > > Can I just increase the timeout for now? I will patch the code if needed.
> > > > > On this server we need enumerate = true ATM, cannot just turn it off.
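(For reference, a hedged sketch of raising the timeout: assuming the value in question is the per-domain heartbeat 'timeout' from sssd.conf(5), it would look something like the following; 60 is only an assumed example value:)

  # /etc/sssd/sssd.conf
  [domain/infinera.com]
  # Seconds between heartbeats; the monitor restarts a backend that
  # stops answering them, so a larger value tolerates longer disk writes.
  # 60 is an assumed example, not a tested recommendation.
  timeout = 60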
> > > >
> > > > Oh, sure. The other alternative might be to mount the cache on tmpfs.
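(A minimal sketch of the tmpfs approach, assuming the default cache path /var/lib/sss/db; the 256m size is an assumption, and note that tmpfs is not persistent, so the cache is rebuilt from the server on every reboot:)

  systemctl stop sssd    # or: rc-service sssd stop on OpenRC
  mount -t tmpfs -o size=256m,mode=0700 tmpfs /var/lib/sss/db
  systemctl start sssd

  # To make it permanent, an /etc/fstab entry along these lines:
  # tmpfs  /var/lib/sss/db  tmpfs  size=256m,mode=0700  0  0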
> > >
> > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has returned.
> > > Seems to be an additional problem here.
> > >
> > > I don't think this AD is that big either:
> > > # > getent passwd | wc -l
> > > 3236
> > > # > getent group | wc -l
> > > 885
> > >
> > > Any ideas?
> >
> > Can you get a pstack of the process when it is 'stuck'?
>
> Don't know what pstack is?
Sorry, it's a utility that prints the backtrace of a process, e.g.:
pstack $(pidof sssd_be)
#0 0x00007f5fa5ae9db3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f5fa61ca8ca in epoll_event_loop (tvalp=0x7ffd78977bf0, epoll_ev=0xb44e70) at ../tevent_epoll.c:642
#2 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926
#3 0x00007f5fa61c8f0a in std_event_loop_once (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:114
#4 0x00007f5fa61c50e0 in _tevent_loop_once (ev=ev@entry=0xb44c30, location=location@entry=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:533
#5 0x00007f5fa61c527b in tevent_common_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:637
#6 0x00007f5fa61c8e9a in std_event_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:140
#7 0x00007f5faa173f10 in server_loop (main_ctx=0xb46080) at /sssd/src/util/server.c:719
#8 0x00000000004093ff in main (argc=8, argv=0x7ffd78978028) at /sssd/src/providers/data_provider_be.c:589
I don't know about Gentoo, but on RHEL/Fedora, it's part of the gdb package.
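(Where no separate pstack utility is packaged, gdb itself can take the same one-shot backtrace; a minimal equivalent, assuming gdb is installed:)

  gdb -p $(pidof sssd_be) -batch -ex bt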
I see, it's not in native Gentoo but can be found in external overlays. Not sure this will help though, as sssd is burning CPU when it gets into this state.
Jocke
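(A backtrace can still help with a process that is burning CPU if it is sampled a few times: the hot call path will show up in most samples. A rough sketch, assuming gdb is available and a single sssd_be process:)

  for i in 1 2 3 4 5; do
      gdb -p $(pidof sssd_be) -batch -ex bt
      sleep 2
  done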