On Apr 10, 2013, at 14:07 , Andrei Belov <defanator(a)gmail.com> wrote:
On Apr 10, 2013, at 13:53 , Grüninger, Andreas (LGL Extern)
<Andreas.Grueninger(a)lgl.bwl.de> wrote:
> Andrej
>
> corosync and pacemaker must be started as a non-root user.
> Be careful when you run the tests as root and use the same var folder for the tests
> and the daemons.
> You have to check the permissions on your var folder afterwards.
I run the tests under my unprivileged account using the standard "distcheck" target.
In this case libqb uses a SOCKETDIR within the working directory
(e.g., /home/defan/git/libqb/libqb-0.14.4.24-b3ca/_inst/var/run).
I'm able to successfully start both corosync and pacemaker as an unprivileged
user (hacluster:haclient). A two-node test cluster operates normally until
I try to stop pacemaker/corosync.
This happens on SmartOS (Joyent cloud):
# uname -a
SunOS xxxx 5.11 joyent_20120912T055050Z i86pc i386 i86pc Solaris
Angus - could you please take a look at the output from ipc.test? I feel a bit lost
here...
What I've found so far: the trouble is in the bind() call in qb_ipc_dgram_sock_setup():
lib/check_ipc.c|787| ENTERING test_ipc_disconnect_after_created_us()
lib/ipc_setup.c|372| server name: test_ipc_disconnect_after_created_us
lib/loop_poll.c|357| grown poll array to 2 for FD 6
=== getpeerucred() success: uid=1007 gid=10 pid=19909
=== handle_new_connection(): auth_result=0
=== handle_new_connection(): uid=1007 gid=10
lib/ipc_setup.c|481| IPC credentials authenticated (19911-19909-7)
lib/ipc_socket.c|531| connecting to client (19911-19909-7)
=== qb_ipcs_us_connect: to 19911-19909-7
=== qb_ipc_dgram_sock_setuip: bind failed: 125
=== qb_ipcs_us_connect: response channel: qb_ipc_dgram_sock_setup failed
=== qb_ipcs_us_connect: ret 2
=== funcs.connect failed
lib/ipc_setup.c|530| Error in connection setup (19911-19909-7): Not owner (1)
lib/ipcs.c|555| qb_ipcs_disconnect(19911-19909-7) state:0
=== getpeerucred() success: uid=1007 gid=10 pid=19909
=== handle_new_connection(): auth_result=0
=== handle_new_connection(): uid=1007 gid=10
lib/ipc_setup.c|481| IPC credentials authenticated (19911-19909-8)
lib/ipc_socket.c|531| connecting to client (19911-19909-8)
=== qb_ipcs_us_connect: to 19911-19909-8
=== qb_ipc_dgram_sock_setuip: bind failed: 125
=== qb_ipcs_us_connect: request channel: qb_ipc_dgram_sock_setup failed
=== qb_ipcs_us_connect: ret 2
=== funcs.connect failed
lib/ipc_setup.c|530| Error in connection setup (19911-19909-8): Not owner (1)
lib/ipcs.c|555| qb_ipcs_disconnect(19911-19909-8) state:0
=== getpeerucred() success: uid=1007 gid=10 pid=19909
=== handle_new_connection(): auth_result=0
=== handle_new_connection(): uid=1007 gid=10
lib/ipc_setup.c|481| IPC credentials authenticated (19911-19909-9)
lib/ipc_socket.c|531| connecting to client (19911-19909-9)
=== qb_ipcs_us_connect: to 19911-19909-9
=== qb_ipc_dgram_sock_setuip: bind failed: 125
=== qb_ipcs_us_connect: request channel: qb_ipc_dgram_sock_setup failed
=== qb_ipcs_us_connect: ret 2
=== funcs.connect failed
lib/ipc_setup.c|530| Error in connection setup (19911-19909-9): Not owner (1)
lib/ipcs.c|555| qb_ipcs_disconnect(19911-19909-9) state:0
0%: Checks: 1, Failures: 0, Errors: 1
125 is EADDRINUSE.
truss shows at least one successful attempt to bind() + listen():
19007: bind(6, 0xFFFFFD7FFFDFF6B0, 95, SOV_SOCKBSD) = 0
19007: chmod("/home/defan/git/libqb/libqb-0.14.4.24-b3ca/_inst/var/run/test_ipc_disconnect_after_created_us", 0777) = 0
19007: listen(6, 5, SOV_DEFAULT) = 0
then:
19007: so_socket(PF_UNIX, SOCK_STREAM, 0, 0x00000000, SOV_DEFAULT) = 9
19007: fcntl(9, F_GETFD, 0x00000000) = 0
19007: fcntl(9, F_SETFD, 0x00000001) = 0
19007: fcntl(9, F_SETFL, FNONBLOCK) = 0
19007: bind(9, 0xFFFFFD7FFFDFDFC0, 110, SOV_SOCKBSD) Err#125 EADDRINUSE
[..]
19007: bind(9, 0xFFFFFD7FFFDFDFC0, 110, SOV_SOCKBSD) Err#125 EADDRINUSE
19007: write(2, " = = = q b _ i p c _ d".., 43) = 43
19007: write(2, " 1 2 5", 3) = 3
19007: write(2, " \n", 2) = 2
19007: close(9) = 0
19007: write(2, " = = = q b _ i p c s _".., 73) = 73
19007: close(9) Err#9 EBADF
19007: unlink("/home/defan/git/libqb/libqb-0.14.4.24-b3ca/_inst/var/run/qb-test_ipc_disconnect_after_created_us-control-19007-19005-8") = 0
[..]
19007: bind(10, 0xFFFFFD7FFFDFDFC0, 110, SOV_SOCKBSD) Err#125 EADDRINUSE
19007: write(2, " = = = q b _ i p c _ d".., 43) = 43
19007: write(2, " 1 2 5", 3) = 3
19007: write(2, " \n", 2) = 2
19007: close(10) = 0
19007: write(2, " = = = q b _ i p c s _".., 73) = 73
19007: close(10) Err#9 EBADF
19007: unlink("/home/defan/git/libqb/libqb-0.14.4.24-b3ca/_inst/var/run/qb-test_ipc_disconnect_after_created_us-control-19007-19005-9") = 0
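
For reference, one generic way to get EADDRINUSE from bind() on an AF_UNIX socket is to bind to a path that already exists in the filesystem, for example one left behind by an earlier socket that was closed but never unlinked. The sketch below only demonstrates that general behaviour. Whether this is what actually happens inside qb_ipc_dgram_sock_setup() is not established by the output above, and the helper names try_bind_dgram() and demo_stale_path() are hypothetical, not libqb functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_UNIX datagram socket and bind it to `path`.
 * Returns the descriptor on success, or -errno on failure.
 */
static int try_bind_dgram(const char *path)
{
    struct sockaddr_un addr;
    int fd;
    int saved;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -ENAMETOOLONG;
    }

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -errno;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length was checked above */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        saved = errno;
        close(fd);
        return -saved;
    }
    return fd;
}

/*
 * Bind the same path twice, then unlink it and bind a third time.
 * results[i] is 0 if attempt i succeeded, otherwise the errno value.
 */
static void demo_stale_path(const char *path, int results[3])
{
    int fd;

    unlink(path); /* start from a clean state */

    fd = try_bind_dgram(path);
    results[0] = (fd < 0) ? -fd : 0;
    if (fd >= 0) {
        close(fd); /* close() does not remove the socket file */
    }

    fd = try_bind_dgram(path); /* the path still exists at this point */
    results[1] = (fd < 0) ? -fd : 0;
    if (fd >= 0) {
        close(fd);
    }

    unlink(path); /* remove the stale path */
    fd = try_bind_dgram(path);
    results[2] = (fd < 0) ? -fd : 0;
    if (fd >= 0) {
        close(fd);
    }
    unlink(path);
}
```

Calling demo_stale_path() with a scratch path in a writable directory fills in the three outcomes, which can then be compared with the errno values in the truss output.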
>
> I copied the necessary parts from my scripts.
>
> export PCMK_ipc_type=socket
> export PREFIX=/opt/ha
> export CLUSTER_USER=hacluster
> export CLUSTER_GROUP=haclient
>
> mkdir -p $PREFIX/var
> chown $CLUSTER_USER:$CLUSTER_GROUP $PREFIX/var/
>
> mkdir -p $PREFIX/etc/corosync/uidgid.d
> (
> echo "uidgid {"
> echo " uid: `id -u ${CLUSTER_USER}`"
> echo " gid: `id -g ${CLUSTER_USER}`"
> echo "}"
> ) > $PREFIX/etc/corosync/uidgid.d/uid.conf
>
> su ${CLUSTER_USER} -c ${APPPATH}${COROSYNC}
> sleep $sleep0
> su ${CLUSTER_USER} -c ${APPPATH}${PACEMAKERD} &
>
> And FYI, it works with Solaris 11.1 and OpenIndiana 151a7.
>
> The status of our cluster with 4 virtual storage appliances:
> ..................................
> crm(live)# status
> Last updated: Wed Apr 10 11:49:08 2013
> Last change: Fri Apr 5 10:28:46 2013 via cibadmin on zd-sol-s2-v61
> Stack: corosync
> Current DC: zd-sol-s1-v61 (3232251190) - partition with quorum
> Version: 1.1.8-f49aa8c
> 2 Nodes configured, unknown expected votes
> 6 Resources configured.
>
>
> Online: [ zd-sol-s1-v61 zd-sol-s2-v61 ]
>
> ClusterNotify_SNMPTraps (ocf::lgl:ClusterNotify): Started zd-sol-s1-v61
> ClusterMon_SNMPTraps (ocf::pacemaker:ClusterMon): Started zd-sol-s1-v61
> zone_zd-sol-s61 (ocf::lgl:zpool): Started zd-sol-s2-v61
> zone_zd-sol-s62 (ocf::lgl:zpool): Started zd-sol-s2-v61
> zone_zd-sol-s60 (ocf::lgl:zpool): Started zd-sol-s2-v61
> zone_zd-sol-s63 (ocf::lgl:zpool): Started zd-sol-s2-v61
> ....................................
>
> Thanks
>
> Andreas
>
> -----Original Message-----
> From: quarterback-devel-bounces(a)lists.fedorahosted.org
> [mailto:quarterback-devel-bounces@lists.fedorahosted.org] On Behalf Of Andrei Belov
> Sent: Wednesday, 10 April 2013 11:41
> To: lib quarterback
> Subject: Re: [libqb] qb_ipcs_connection_unref() errors while using QB_IPC_SOCKET
>
>
> On Apr 10, 2013, at 10:46 , Andrei Belov <defanator(a)gmail.com> wrote:
>
>>
>> On Apr 10, 2013, at 10:00 , Andrei Belov <defanator(a)gmail.com> wrote:
>>
>>> Angus,
>>>
>>> On Apr 10, 2013, at 9:24 , Angus Salkeld <asalkeld(a)redhat.com> wrote:
>>>
>>>> On 10/04/13 08:52 +0400, Andrei Belov wrote:
>>>>> Hello,
>>>>>
>>>>> I'm facing an issue with qb_ipcs_connection_unref() errors while
>>>>> using libqb (0.14.4) with pacemaker (1.1.8) under SunOS with
>>>>> QB_IPC_SOCKET.
>>>>>
>>>>> Pacemaker's daemons abort on shutdown with the following in the logs:
>>>>>
>>>>> Apr 10 04:05:06 [47131] attrd: error: qb_ipcs_connection_unref: ref:0 state:3 (47131-47133-11)
>>>>> Apr 10 04:05:13 [47130] lrmd: error: qb_ipcs_connection_unref: ref:0 state:3 (47130-47133-9)
>>>>> Apr 10 04:05:20 [47129] stonith-ng: error: qb_ipcs_connection_unref: ref:0 state:3 (47129-47133-13)
>>>>> Apr 10 04:05:27 [47128] cib: error: qb_ipcs_connection_unref: ref:0 state:3 (47128-47131-19)
>>>>>
>>>>> What does it mean?
>>>>>
>>>>>
>>>>> Backtraces:
>>>>>
>>>>> Core was generated by `/opt/local/libexec/pacemaker/attrd'.
>>>>> Program terminated with signal 6, Aborted.
>>>>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> (gdb) bt
>>>>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> #1 0xfffffd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
>>>>> #2 0xfffffd7fff06a971 in raise () from /lib/64/libc.so.1
>>>>> #3 0xfffffd7fff0400a1 in abort () from /lib/64/libc.so.1
>>>>> #4 0xfffffd7fff0403f5 in _assert () from /lib/64/libc.so.1
>>>>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #6 0x00000000004044f9 in main ()
>>>>
>>>> Hi
>>>>
>>>> The connection is reference counted, and it is getting dereferenced
>>>> one too many times. This is due to differences in the behaviour of
>>>> sockets on solaris and linux. Basically the code path for solaris
>>>> shutdown is not well tested by me.
>>>>
>>>> (patches welcome)
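
The failure mode described above (a reference-counted object being released once more than it was acquired) can be illustrated with a small generic C sketch. It is not libqb's implementation: struct conn and the conn_* functions are hypothetical, and where the backtraces show an assertion firing in qb_ipcs_connection_unref(), this sketch returns -1 so that the case can be observed without aborting.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted connection object. */
struct conn {
    int refcount;
    int destroyed; /* set instead of freeing, so callers can inspect it */
};

/* A new object starts with one reference held by its creator. */
static void conn_init(struct conn *c)
{
    c->refcount = 1;
    c->destroyed = 0;
}

/* Take an additional reference on a live object. */
static void conn_ref(struct conn *c)
{
    assert(c->refcount > 0);
    c->refcount++;
}

/*
 * Drop one reference.
 * Returns 0 on success, or -1 if the count was already zero,
 * i.e. the object has been unreferenced one time too many.
 */
static int conn_unref(struct conn *c)
{
    if (c->refcount <= 0) {
        return -1;
    }
    c->refcount--;
    if (c->refcount == 0) {
        c->destroyed = 1; /* last reference gone: tear the object down */
    }
    return 0;
}
```

With one conn_init() and one conn_ref(), two conn_unref() calls succeed and the second one destroys the object; a third call is the "one too many" case and returns -1.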
>>>
>>> Thanks, I'll try to examine this deeper.
>>>
>>> I see a "tests" subdirectory in libqb - what is the right way to run
>>> those?
>>
>> I just tried the "distcheck" target; the IPC test failed with the following:
>>
>> ../../tests/check_ipc.c:704:E:ipc_event_on_created_us:test_ipc_event_on_created_us:0: (after this point) Test timeout expired
>> ../../tests/check_ipc.c:757:E:ipc_disconnect_after_created_us:test_ipc_disconnect_after_created_us:0: (after this point) Test timeout expired
>>
>> Full output is here:
>>
>> http://defan.pp.ru/libqb-sunos-distcheck.txt
>>
>>
>> Still trying to understand what happens.
>
> Here's the full detailed output of ipc-test only:
>
> http://defan.pp.ru/libqb-sunos-ipc-test.txt.gz
>
> It ended up with:
>
> [..]
> lib/check_ipc.c|787| ENTERING test_ipc_disconnect_after_created_us()
> lib/ipc_setup.c|372| server name: test_ipc_disconnect_after_created_us
> lib/loop_poll.c|357| grown poll array to 2 for FD 6
> lib/ipc_setup.c|476| IPC credentials authenticated (29647-29646-7)
> lib/ipc_socket.c|527| connecting to client (29647-29646-7)
> lib/ipc_setup.c|524| Error in connection setup (29647-29646-7): Not owner (1)
> lib/ipcs.c|555| qb_ipcs_disconnect(29647-29646-7) state:0
> lib/ipc_setup.c|476| IPC credentials authenticated (29647-29646-8)
> lib/ipc_socket.c|527| connecting to client (29647-29646-8)
> lib/ipc_setup.c|524| Error in connection setup (29647-29646-8): Not owner (1)
> lib/ipcs.c|555| qb_ipcs_disconnect(29647-29646-8) state:0
> lib/ipc_setup.c|476| IPC credentials authenticated (29647-29646-9)
> lib/ipc_socket.c|527| connecting to client (29647-29646-9)
> lib/ipc_setup.c|524| Error in connection setup (29647-29646-9): Not owner (1)
> lib/ipcs.c|555| qb_ipcs_disconnect(29647-29646-9) state:0
> 77%: Checks: 9, Failures: 0, Errors: 2
> ../../tests/check_ipc.c:840:P:ipc_server_fail_soc:test_ipc_server_fail_soc:0: Passed
> ../../tests/check_ipc.c:304:P:ipc_txrx_us_block:test_ipc_txrx_us_block:0: Passed
> ../../tests/check_ipc.c:304:P:ipc_txrx_us_tmo:test_ipc_txrx_us_tmo:0: Passed
> ../../tests/check_ipc.c:383:P:ipc_fc_us:test_ipc_fc_us:0: Passed
> ../../tests/check_ipc.c:424:P:ipc_exit_us:test_ipc_exit_us:0: Passed
> ../../tests/check_ipc.c:304:P:ipc_dispatch_us:test_ipc_disp_us:0: Passed
> ../../tests/check_ipc.c:654:P:ipc_bulk_events_us:test_ipc_bulk_events_us:0: Passed
> ../../tests/check_ipc.c:704:E:ipc_event_on_created_us:test_ipc_event_on_created_us:0: (after this point) Test timeout expired
> ../../tests/check_ipc.c:757:E:ipc_disconnect_after_created_us:test_ipc_disconnect_after_created_us:0: (after this point) Test timeout expired
> FAIL: ipc.test
> [..]
>
>
>>
>>
>>>> -Angus
>>>>
>>>>>
>>>>> Core was generated by `/opt/local/libexec/pacemaker/cib'.
>>>>> Program terminated with signal 6, Aborted.
>>>>> #0 0xfffffd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> (gdb) bt
>>>>> #0 0xfffffd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> #1 0xfffffd7fff0e4ddd in thr_kill () from /lib/64/libc.so.1
>>>>> #2 0xfffffd7fff07a971 in raise () from /lib/64/libc.so.1
>>>>> #3 0xfffffd7fff0500a1 in abort () from /lib/64/libc.so.1
>>>>> #4 0xfffffd7fff0503f5 in _assert () from /lib/64/libc.so.1
>>>>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #6 0x0000000000410438 in cib_shutdown ()
>>>>> #7 0xfffffd7fbfc5533f in crm_signal_dispatch (source=0x49be80,
>>>>> callback=<optimized out>, userdata=<optimized out>) at
>>>>> mainloop.c:203
>>>>> #8 0xfffffd7fc555f9e0 in g_main_context_dispatch () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #9 0xfffffd7fc555fd40 in g_main_context_iterate.isra.24 () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #10 0xfffffd7fc5560152 in g_main_loop_run () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #11 0x0000000000411056 in cib_init ()
>>>>> #12 0x000000000041163e in main ()
>>>>>
>>>>> Core was generated by `/opt/local/libexec/pacemaker/lrmd'.
>>>>> Program terminated with signal 6, Aborted.
>>>>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> (gdb) bt
>>>>> #0 0xfffffd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> #1 0xfffffd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
>>>>> #2 0xfffffd7fff06a971 in raise () from /lib/64/libc.so.1
>>>>> #3 0xfffffd7fff0400a1 in abort () from /lib/64/libc.so.1
>>>>> #4 0xfffffd7fff0403f5 in _assert () from /lib/64/libc.so.1
>>>>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #9 0xfffffd7fbfc55a3f in mainloop_del_ipc_server
>>>>> (server=<optimized out>) at mainloop.c:517
>>>>> #10 0x00000000004041cd in lrmd_shutdown ()
>>>>> #11 0xfffffd7fbfc5533f in crm_signal_dispatch (source=0x48ad40,
>>>>> callback=<optimized out>, userdata=<optimized out>) at
>>>>> mainloop.c:203
>>>>> #12 0xfffffd7fc555f9e0 in g_main_context_dispatch () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #13 0xfffffd7fc555fd40 in g_main_context_iterate.isra.24 () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #14 0xfffffd7fc5560152 in g_main_loop_run () from
>>>>> /opt/local/lib/libglib-2.0.so.0
>>>>> #15 0x00000000004045e3 in main ()
>>>>>
>>>>> Core was generated by `/opt/local/libexec/pacemaker/stonithd'.
>>>>> Program terminated with signal 6, Aborted.
>>>>> #0 0xfffffd7fff11061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> (gdb) bt
>>>>> #0 0xfffffd7fff11061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> #1 0xfffffd7fff104ddd in thr_kill () from /lib/64/libc.so.1
>>>>> #2 0xfffffd7fff09a971 in raise () from /lib/64/libc.so.1
>>>>> #3 0xfffffd7fff0700a1 in abort () from /lib/64/libc.so.1
>>>>> #4 0xfffffd7fff0703f5 in _assert () from /lib/64/libc.so.1
>>>>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #9 0x0000000000405e60 in ?? ()
>>>>> #10 0x0000000000407d28 in main ()
>>>>>
>>>>>
>>>>> PS:
>>>>> while reproducing the above, I just caught the same with corosync
>>>>> (2.3.0):
>>>>>
>>>>> Core was generated by `corosync'.
>>>>> Program terminated with signal 6, Aborted.
>>>>> #0 0xfffffd7fff1e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> (gdb) bt
>>>>> #0 0xfffffd7fff1e061a in _lwp_kill () from /lib/64/libc.so.1
>>>>> #1 0xfffffd7fff1d4ddd in thr_kill () from /lib/64/libc.so.1
>>>>> #2 0xfffffd7fff16a971 in raise () from /lib/64/libc.so.1
>>>>> #3 0xfffffd7fff1400a1 in abort () from /lib/64/libc.so.1
>>>>> #4 0xfffffd7fff1403f5 in _assert () from /lib/64/libc.so.1
>>>>> #5 0xfffffd7fc021274e in qb_ipcs_connection_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #6 0xfffffd7fc02128a4 in qb_ipcs_disconnect () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #7 0xfffffd7fc0212995 in qb_ipcs_unref () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #8 0xfffffd7fc02129c7 in qb_ipcs_destroy () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #9 0x00000000004264e2 in cs_ipcs_service_destroy ()
>>>>> #10 0x00000000004270e0 in ?? ()
>>>>> #11 0xfffffd7fc0210ff4 in job_dispatch () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #12 0xfffffd7fc020fcb4 in qb_loop_run () from
>>>>> /opt/local/lib/libqb.so.0
>>>>> #13 0x000000000042b551 in main ()
>>>>>
>>>>> so it looks like a libqb-related issue to me.
>>>>>
>>>>>
>>>>> Any help would be greatly appreciated!
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Andrei
>>>>> _______________________________________________
>>>>> quarterback-devel mailing list
>>>>> quarterback-devel(a)lists.fedorahosted.org
>>>>> https://lists.fedorahosted.org/mailman/listinfo/quarterback-devel