Hi Jonathan,
This problem looks new to me and has something specific to your environment.
I think the best approach is to continue to debug on your system if you
have the possibility to do so.
From strace we can see that DS started smoothly (created its pid file
then notified systemd it was running fine). According to the pstack
nunc-stans was running and was able to accept network events even if it
appears it detected no incoming connection.
So the server should be ready to serve for some seconds (more than a
minute), then it crashed with one thread dereferencing likely a wrong
pointer.
Could you attach a debugger when the server is started and wait for the
sigsegv to occur. Then confirm the crashing thread backstack.
If it is confirmed, I am afraid this is a stack corruption and valgrind
could help
(
Here's a strace from before it dies. Most of the elapsed time is
it
waiting on some futex call it looks like near the end, when it finally
"returns" (from lack of strace output for duration of call I assume it
didn't actually return but SIGSEGV in that call) and strace prints ' =
?' on the futex it then immediately reports SIGSEGV after. So maybe
the problem is that futex call, which may mean the problem is not
directly in 389DS / FreeIPA itself?
15:13:31.626587 (+ 0.000630) listen(8, 128) = 0 <0.000068>
15:13:31.626857 (+ 0.000235) listen(9, 128) = 0 <0.000048>
15:13:31.627111 (+ 0.000251) clock_gettime(CLOCK_MONOTONIC,
{tv_sec=1464932, tv_nsec=41120614}) = 0 <0.000085>
15:13:31.627457 (+ 0.000356) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=627560772}) = 0 <0.000043>
15:13:31.631233 (+ 0.003839) clock_gettime(CLOCK_MONOTONIC,
{tv_sec=1464932, tv_nsec=45286796}) = 0 <0.000077>
15:13:31.631720 (+ 0.000427) clock_gettime(CLOCK_MONOTONIC,
{tv_sec=1464932, tv_nsec=45661430}) = 0 <0.000042>
15:13:31.631955 (+ 0.000220) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=632049036}) = 0 <0.000047>
15:13:31.635669 (+ 0.003785) clock_gettime(CLOCK_MONOTONIC,
{tv_sec=1464932, tv_nsec=49725840}) = 0 <0.000146>
15:13:31.636484 (+ 0.000784) write(16, "a", 1) = 1 <0.000118>
15:13:31.636855 (+ 0.000341) sched_yield() = 0 <0.000252>
15:13:31.637322 (+ 0.000470) futex(0x1cb57a0, FUTEX_WAKE_PRIVATE,
1) = 1 <0.000088>
15:13:31.637897 (+ 0.000610) write(16, "a", 1) = 1 <0.000221>
15:13:31.638394 (+ 0.000467) sched_yield() = 0 <0.000047>
15:13:31.638619 (+ 0.000202) futex(0x1cb5710, FUTEX_WAKE_PRIVATE,
1) = 1 <0.000065>
15:13:31.638908 (+ 0.000298) openat(AT_FDCWD,
"/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 33 <0.000831>
15:13:31.640260 (+ 0.001387) getpid() = 32353 <0.000077>
15:13:31.640558 (+ 0.000256) fstat64(33, {st_mode=S_IFREG|0644,
st_size=0, ...}) = 0 <0.000119>
15:13:31.641106 (+ 0.000556) write(33, "32353\n", 6) = 6 <0.000127>
15:13:31.641472 (+ 0.000362) close(33) = 0 <0.000519>
15:13:31.642216 (+ 0.000758)
chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 0644) = 0 <0.000152>
15:13:31.642900 (+ 0.000679) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=643020294}) = 0 <0.000056>
15:13:31.643495 (+ 0.000590) write(2,
"[14/May/2018:15:13:31.643020294 "..., 134) = 134 <0.002697>
15:13:31.646515 (+ 0.003052) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=646694394}) = 0 <0.000075>
15:13:31.646892 (+ 0.000337) write(4,
"[14/May/2018:15:13:31.646694394 "..., 134) = 134 <0.000522>
15:13:31.647841 (+ 0.000973) fsync(4) = 0 <0.005967>
15:13:31.654425 (+ 0.006617) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=654598946}) = 0 <0.000253>
15:13:31.655137 (+ 0.000717) write(2,
"[14/May/2018:15:13:31.654598946 "..., 136) = 136 <0.002427>
15:13:31.658312 (+ 0.003165) clock_gettime(CLOCK_REALTIME,
{tv_sec=1526328811, tv_nsec=658486117}) = 0 <0.000251>
15:13:31.659032 (+ 0.000682) write(4,
"[14/May/2018:15:13:31.658486117 "..., 136) = 136 <0.000346>
15:13:31.659623 (+ 0.000595) fsync(4) = 0 <0.003311>
15:13:31.663230 (+ 0.003642) getpid() = 32353 <0.000045>
15:13:31.663732 (+ 0.000454) socket(AF_UNIX,
SOCK_DGRAM|SOCK_CLOEXEC, 0) = 33 <0.000296>
15:13:31.664760 (+ 0.001048) getsockopt(33, SOL_SOCKET, SO_SNDBUF,
[163840], [4]) = 0 <0.000108>
15:13:31.665141 (+ 0.000386) setsockopt(33, SOL_SOCKET,
SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
<0.000051>
15:13:31.665500 (+ 0.000334) setsockopt(33, SOL_SOCKET, SO_SNDBUF,
[8388608], 4) = 0 <0.000229>
15:13:31.665973 (+ 0.000468) sendmsg(33,
{msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"},
msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=slapd started:
Re"..., iov_len=69}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
MSG_NOSIGNAL) = 69 <0.000600>
15:13:31.667270 (+ 0.001334) close(33) = 0 <0.003140>
15:13:31.677320 (+ 0.010146) futex(0xb31122e8, FUTEX_WAIT, 32357,
NULL) = ?
15:14:27.661809 (+ 55.984448) +++ killed by SIGSEGV (core dumped) +++
On Mon, May 14, 2018 at 10:27 AM, thierry bordaz <tbordaz(a)redhat.com
<mailto:tbordaz@redhat.com>> wrote:
Hi Jonathan,
This is weird as the crashing thread stack looks truncated (did
you copy/paste all of it ?)
Thread 1 (Thread 0x9e13c280 (LWP 17245)):
#0 0xb67bbf2e in strlen () at /lib/libc.so.6
#1 0xb6a06b40 in dosprintf () at /lib/libnspr4.so
#2 0x00000000 in None ()
Did you install 389-ds-base-debuginfo ?
How did you get that backtrace ? from a core dumped, pstack ? Can
you attach a debugger before the crash occurs ?
It looks it crashed soon at startup, could it be related to a
broken dse.ldif. It should exists a dse.ldif.OK, is it possibly to
try to start with it ?
best regards
thierry
On 05/12/2018 01:22 AM, Jonathan Vaughn via FreeIPA-users wrote:
> Not sure if it makes a difference... I was looking into this
> again and realized I had a bunch of messages from gdb telling me
> to install more debuginfo. I've done that now, here it is again
> freshly run through gdb
>
> GNU gdb (GDB) Fedora 8.0.1-36.fc27
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <
http://gnu.org/licenses/gpl.html
<
http://gnu.org/licenses/gpl.html>>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show
> copying"
> and "show warranty" for details.
> This GDB was configured as "armv7hl-redhat-linux-gnueabi".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <
http://www.gnu.org/software/gdb/bugs/
> <
http://www.gnu.org/software/gdb/bugs/>>.
> Find the GDB manual and other documentation resources online at:
> <
http://www.gnu.org/software/gdb/documentation/
> <
http://www.gnu.org/software/gdb/documentation/>>.
> For help, type "help".
> Type "apropos word" to search for commands related to
"word"...
> Reading symbols from /usr/sbin/ns-slapd...Reading symbols from
> /usr/lib/debug/usr/sbin/ns-slapd-1.3.7.10-1.fc27.arm.debug...done.
> done.
> ...
>
> Thread 1 (Thread 0x9e13c280 (LWP 17245)):
> #0 0xb67bbf2e in strlen () at /lib/libc.so.6
> #1 0xb6a06b40 in dosprintf () at /lib/libnspr4.so
> #2 0x00000000 in None ()
>
>
>
> On Tue, May 8, 2018 at 7:52 AM, Rob Crittenden
> <rcritten(a)redhat.com <mailto:rcritten@redhat.com>> wrote:
>
> Jonathan Vaughn via FreeIPA-users wrote:
>
> Still trying to figure this out. It looks like slapd
> is dying, I thought it was still running for some reason.
>
> slapd is dying to segfault. strace of it happening
> doesn't seem to reveal much:
>
>
> A stack trace would very much help trying to track down
> the cause.
>
>
http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>
<
http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-cras...
>
> rob
>
>
> 18:32:41.543717 (+ 0.000801) openat(AT_FDCWD,
> "/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid",
> O_WRONLY|O_CREAT|O_TRUNC, 0666) = 32
> 18:32:41.544907 (+ 0.001195) getpid() = 16014
> 18:32:41.545269 (+ 0.000329) fstat64(32,
> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> 18:32:41.545799 (+ 0.000536) write(32, "16014\n", 6) = 6
> 18:32:41.546603 (+ 0.000818) close(32) = 0
> 18:32:41.547061 (+ 0.000448)
> chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid",
> 0644) = 0
> 18:32:41.547741 (+ 0.000676)
> clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
> tv_nsec=548030641}) = 0
> 18:32:41.548324 (+ 0.000587) write(2,
> "[07/May/2018:18:32:41.548030641 "..., 134) = 134
> 18:32:41.551096 (+ 0.002840)
> clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
> tv_nsec=551287555}) = 0
> 18:32:41.551568 (+ 0.000406) write(4,
> "[07/May/2018:18:32:41.551287555 "..., 134) = 134
> 18:32:41.552360 (+ 0.000811) fsync(4) = 0
> 18:32:41.558499 (+ 0.006170)
> clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
> tv_nsec=558678099}) = 0
> 18:32:41.558901 (+ 0.000350) write(2,
> "[07/May/2018:18:32:41.558678099 "..., 136) = 136
> 18:32:41.561537 (+ 0.002680)
> clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
> tv_nsec=561718659}) = 0
> 18:32:41.562357 (+ 0.000793) write(4,
> "[07/May/2018:18:32:41.561718659 "..., 136) = 136
> 18:32:41.563293 (+ 0.001148) fsync(4) = 0
> 18:32:41.566928 (+ 0.003452) getpid() = 16014
> 18:32:41.567712 (+ 0.000752) socket(AF_UNIX,
> SOCK_DGRAM|SOCK_CLOEXEC, 0) = 32
> 18:32:41.568628 (+ 0.000912) getsockopt(32,
> SOL_SOCKET, SO_SNDBUF, [163840], [4]) = 0
> 18:32:41.568972 (+ 0.000319) setsockopt(32,
> SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM
> (Operation not permitted)
> 18:32:41.569548 (+ 0.000589) setsockopt(32,
> SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
> 18:32:41.570064 (+ 0.000513) sendmsg(32,
> {msg_name={sa_family=AF_UNIX,
> sun_path="/run/systemd/notify"}, msg_namelen=21,
> msg_iov=[{iov_base="READY=1\nSTATUS=slapd started:
> Re"..., iov_len=69}], msg_iovlen=1, msg_controllen=0,
> msg_flags=0}, MSG_NOSIGNAL) = 69
> 18:32:41.570845 (+ 0.000789) close(32) = 0
> 18:32:41.576358 (+ 0.005575) futex(0xb30ed2e8,
> FUTEX_WAIT, 16016, NULL) = ?
> 18:33:01.730774 (+ 20.154428) +++ killed by SIGSEGV +++
>