On Apr 10, 2013, at 10:00 , Andrei Belov <defanator(a)gmail.com> wrote:
Angus,
On Apr 10, 2013, at 9:24 , Angus Salkeld <asalkeld(a)redhat.com> wrote:
> On 10/04/13 08:52 +0400, Andrei Belov wrote:
>> Hello,
>>
>> I'm facing an issue with qb_ipcs_connection_unref() errors while
>> using libqb (0.14.4) with pacemaker (1.1.8) under SunOS
>> with QB_IPC_SOCKET.
>>
>> Pacemaker's daemons abort on shutdown with the following in the logs:
>>
>> Apr 10 04:05:06 [47131] attrd: error: qb_ipcs_connection_unref: ref:0 state:3 (47131-47133-11)
>> Apr 10 04:05:13 [47130] lrmd: error: qb_ipcs_connection_unref: ref:0 state:3 (47130-47133-9)
>> Apr 10 04:05:20 [47129] stonith-ng: error: qb_ipcs_connection_unref: ref:0 state:3 (47129-47133-13)
>> Apr 10 04:05:27 [47128] cib: error: qb_ipcs_connection_unref: ref:0 state:3 (47128-47131-19)
>>
>> What does it mean?
>>
>>
>> Backtraces:
>>
>> Core was generated by `/opt/local/libexec/pacemaker/attrd'.
>> Program terminated with signal 6, Aborted.
>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>> (gdb) bt
>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>> #1 0xfffffd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
>> #2 0xfffffd7fff06a971 in raise () from /lib/64/libc.so.1
>> #3 0xfffffd7fff0400a1 in abort () from /lib/64/libc.so.1
>> #4 0xfffffd7fff0403f5 in _assert () from /lib/64/libc.so.1
>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from /opt/local/lib/libqb.so.0
>> #6 0x00000000004044f9 in main ()
>
> Hi
>
> The connection is reference counted, and it is being unreferenced
> one too many times. This is due to differences in the behaviour of
> sockets on Solaris and Linux. Basically, the code path for Solaris
> shutdown is not well tested by me.
>
> (patches welcome)
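
To illustrate the "unreferenced one too many times" failure described above, here is a minimal sketch in C. It is not libqb's code: `struct demo_conn` and every `demo_*` function are hypothetical names made up for this example. The assumption is that two independent shutdown paths each drop what they believe is the last reference. The sketch returns -1 on the unbalanced call, where a library could instead log the count and assert(), which would match the `ref:0` log lines and the `_assert ()` frame in the backtraces.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a reference-counted IPC connection.
 * This is NOT libqb's struct; the field names are illustrative only. */
struct demo_conn {
    int refcount;   /* number of outstanding owners */
    int destroyed;  /* how many times teardown has run */
};

static void demo_conn_init(struct demo_conn *c)
{
    assert(c != NULL);
    c->refcount = 1;   /* the creator holds the first reference */
    c->destroyed = 0;
}

static void demo_conn_ref(struct demo_conn *c)
{
    c->refcount++;
}

/* Returns 0 on success, -1 on an unbalanced unref.
 * A library might assert() at this point instead of returning an
 * error, which turns one extra unref into abort(). */
static int demo_conn_unref(struct demo_conn *c)
{
    if (c->refcount <= 0) {
        fprintf(stderr, "demo_conn_unref: ref:%d (unbalanced)\n",
                c->refcount);
        return -1;
    }
    if (--c->refcount == 0) {
        c->destroyed++;   /* real code would release resources here */
    }
    return 0;
}

/* Simulates two shutdown paths that both assume they own the last
 * reference. Returns the number of unbalanced unref calls seen. */
static int demo_double_release(void)
{
    struct demo_conn c;
    int errors = 0;

    demo_conn_init(&c);
    if (demo_conn_unref(&c) != 0) {   /* e.g. a socket hang-up handler */
        errors++;
    }
    if (demo_conn_unref(&c) != 0) {   /* e.g. server teardown */
        errors++;
    }
    return errors;
}
```

In this sketch the usual cure is for each path to take its own reference with `demo_conn_ref()` before it may call `demo_conn_unref()`, so that every unref is paired with exactly one ref. Whether that is the right fix for the Solaris shutdown path is not established in this thread.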
Thanks, I'll try to examine this more deeply.
I see a "tests" subdirectory in libqb. What is the right way to run those?
I just tried the "distcheck" target, and the IPC test failed with the following:
../../tests/check_ipc.c:704:E:ipc_event_on_created_us:test_ipc_event_on_created_us:0: (after this point) Test timeout expired
../../tests/check_ipc.c:757:E:ipc_disconnect_after_created_us:test_ipc_disconnect_after_created_us:0: (after this point) Test timeout expired
The full output is here:
http://defan.pp.ru/libqb-sunos-distcheck.txt
I'm still trying to understand what happens.
> -Angus
>
>>
>> Core was generated by `/opt/local/libexec/pacemaker/cib'.
>> Program terminated with signal 6, Aborted.
>> #0 0xfffffd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
>> (gdb) bt
>> #0 0xfffffd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
>> #1 0xfffffd7fff0e4ddd in thr_kill () from /lib/64/libc.so.1
>> #2 0xfffffd7fff07a971 in raise () from /lib/64/libc.so.1
>> #3 0xfffffd7fff0500a1 in abort () from /lib/64/libc.so.1
>> #4 0xfffffd7fff0503f5 in _assert () from /lib/64/libc.so.1
>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from /opt/local/lib/libqb.so.0
>> #6 0x0000000000410438 in cib_shutdown ()
>> #7 0xfffffd7fbfc5533f in crm_signal_dispatch (source=0x49be80, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:203
>> #8 0xfffffd7fc555f9e0 in g_main_context_dispatch () from /opt/local/lib/libglib-2.0.so.0
>> #9 0xfffffd7fc555fd40 in g_main_context_iterate.isra.24 () from /opt/local/lib/libglib-2.0.so.0
>> #10 0xfffffd7fc5560152 in g_main_loop_run () from /opt/local/lib/libglib-2.0.so.0
>> #11 0x0000000000411056 in cib_init ()
>> #12 0x000000000041163e in main ()
>>
>> Core was generated by `/opt/local/libexec/pacemaker/lrmd'.
>> Program terminated with signal 6, Aborted.
>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>> (gdb) bt
>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>> #1 0xfffffd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
>> #2 0xfffffd7fff06a971 in raise () from /lib/64/libc.so.1
>> #3 0xfffffd7fff0400a1 in abort () from /lib/64/libc.so.1
>> #4 0xfffffd7fff0403f5 in _assert () from /lib/64/libc.so.1
>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from /opt/local/lib/libqb.so.0
>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from /opt/local/lib/libqb.so.0
>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from /opt/local/lib/libqb.so.0
>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from /opt/local/lib/libqb.so.0
>> #9 0xfffffd7fbfc55a3f in mainloop_del_ipc_server (server=<optimized out>) at mainloop.c:517
>> #10 0x00000000004041cd in lrmd_shutdown ()
>> #11 0xfffffd7fbfc5533f in crm_signal_dispatch (source=0x48ad40, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:203
>> #12 0xfffffd7fc555f9e0 in g_main_context_dispatch () from /opt/local/lib/libglib-2.0.so.0
>> #13 0xfffffd7fc555fd40 in g_main_context_iterate.isra.24 () from /opt/local/lib/libglib-2.0.so.0
>> #14 0xfffffd7fc5560152 in g_main_loop_run () from /opt/local/lib/libglib-2.0.so.0
>> #15 0x00000000004045e3 in main ()
>>
>> Core was generated by `/opt/local/libexec/pacemaker/stonithd'.
>> Program terminated with signal 6, Aborted.
>> #0 0xfffffd7fff11061a in _lwp_kill () from /lib/64/libc.so.1
>> (gdb) bt
>> #0 0xfffffd7fff11061a in _lwp_kill () from /lib/64/libc.so.1
>> #1 0xfffffd7fff104ddd in thr_kill () from /lib/64/libc.so.1
>> #2 0xfffffd7fff09a971 in raise () from /lib/64/libc.so.1
>> #3 0xfffffd7fff0700a1 in abort () from /lib/64/libc.so.1
>> #4 0xfffffd7fff0703f5 in _assert () from /lib/64/libc.so.1
>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from /opt/local/lib/libqb.so.0
>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from /opt/local/lib/libqb.so.0
>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from /opt/local/lib/libqb.so.0
>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from /opt/local/lib/libqb.so.0
>> #9 0x0000000000405e60 in ?? ()
>> #10 0x0000000000407d28 in main ()
>>
>>
>> PS:
>> while reproducing the above, just caught the same with corosync (2.3.0):
>>
>> Core was generated by `corosync'.
>> Program terminated with signal 6, Aborted.
>> #0 0xfffffd7fff1e061a in _lwp_kill () from /lib/64/libc.so.1
>> (gdb) bt
>> #0 0xfffffd7fff1e061a in _lwp_kill () from /lib/64/libc.so.1
>> #1 0xfffffd7fff1d4ddd in thr_kill () from /lib/64/libc.so.1
>> #2 0xfffffd7fff16a971 in raise () from /lib/64/libc.so.1
>> #3 0xfffffd7fff1400a1 in abort () from /lib/64/libc.so.1
>> #4 0xfffffd7fff1403f5 in _assert () from /lib/64/libc.so.1
>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from /opt/local/lib/libqb.so.0
>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from /opt/local/lib/libqb.so.0
>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from /opt/local/lib/libqb.so.0
>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from /opt/local/lib/libqb.so.0
>> #9 0x00000000004264e2 in cs_ipcs_service_destroy ()
>> #10 0x00000000004270e0 in ?? ()
>> #11 0xfffffd7fc0210ff4 in job_dispatch () from /opt/local/lib/libqb.so.0
>> #12 0xfffffd7fc020fcb4 in qb_loop_run () from /opt/local/lib/libqb.so.0
>> #13 0x000000000042b551 in main ()
>>
>> So it looks like a libqb-related issue to me.
>>
>>
>> Any help would be greatly appreciated!
>>
>>
>> Best regards,
>> Andrei
>> _______________________________________________
>> quarterback-devel mailing list
>> quarterback-devel(a)lists.fedorahosted.org
>> https://lists.fedorahosted.org/mailman/listinfo/quarterback-devel