URL: https://github.com/SSSD/sssd/pull/5299 Author: pbrezina Title: #5299: dp: fix potential race condition in provider's sbus server Action: opened
PR body: """ We can hit a segfault if provider start is somehow delayed.
- dp_init_send - sbus_server_create_and_connect_send - sbus_server_create (*) - dp_init_done (callback for sbus_server_create_and_connect_send) - sbus_server_create_and_connect_recv - sbus_server_set_on_connection (sets clients data and creates dp_cli)
At (*) sbus server is already created and accepts new connections once we get into tevent loop. So it is possible that the client connects to server before sbus_server_set_on_connection is called and thus the client is not properly initialized. However it should not happen in normal start because providers are started before responders and it can happen only if data provider startup is somehow delay.
You can use this diff to reproduce the crash: ```diff --- a/src/providers/data_provider_be.c +++ b/src/providers/data_provider_be.c @@ -702,6 +702,8 @@ int main(int argc, const char *argv[]) uid_t uid; gid_t gid;
+ sleep(5); + struct poptOption long_options[] = { POPT_AUTOHELP SSSD_MAIN_OPTS ```
Resolves: https://github.com/SSSD/sssd/issues/5298 """
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5299/head:pr5299 git checkout pr5299
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Bugzilla
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """ Those are reproducible: ``` FAIL test_ldap.py::test_refresh_after_cleanup_task FAIL test_memory_cache.py::test_colliding_hashes FAIL test_pam_responder.py::test_require_sc_auth_no_cert ```
``` AssertionError: assert -1 != -1 + where -1 = <built-in method find of str object at 0x7fc24a6621f0>(('pam_authenticate for user [user1]: Authentication ' + 'service cannot retrieve authentication info')) + where <built-in method find of str object at 0x7fc24a6621f0> = 'Unable to connect to system bus!\nInfoPipe User lookup with [user1] failed.\nPassword: Password: pam_authenticate for user [user1]: System error\n\nPAM Environment:\n - no env -\n'.find ``` """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-685938822
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Changes requested
URL: https://github.com/SSSD/sssd/pull/5299 Author: pbrezina Title: #5299: dp: fix potential race condition in provider's sbus server Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5299/head:pr5299 git checkout pr5299
URL: https://github.com/SSSD/sssd/pull/5299 Author: pbrezina Title: #5299: dp: fix potential race condition in provider's sbus server Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5299/head:pr5299 git checkout pr5299
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """ Is the difference in new version - patch of src/providers/data_provider/dp_client.c? """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-687087485
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
pbrezina commented: """ Yes. The problem was that backend did not send its id registration in time. It is not supposed to register to itself but the new connection callback is not set before the backend connect to the server so dp_cli is created for it as well. """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-687092304
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """
Yes. The problem was that backend did not send its id registration in time. It is not supposed to register to itself but the new connection callback is not set before the backend connect to the server so dp_cli is created for it as well.
Wouldn't it be better to skip creation of `dp_cli` (and setting up timer) right in `dp_client_init()`? """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-688308170
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
pbrezina commented: """ Unfortunately, we can't identify the connection at this point. It requires two messages (Hello and RegisterName) to be sent and this happens after this connection callback is called. """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-691939044
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: -Changes requested
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """
We can hit a segfault if provider start is somehow delayed.
* dp_init_send * sbus_server_create_and_connect_send * sbus_server_create (*) * dp_init_done (callback for sbus_server_create_and_connect_send) * sbus_server_create_and_connect_recv * sbus_server_set_on_connection (sets clients data and creates dp_cli)
At (*) sbus server is already created and accepts new connections once we get into tevent loop. So it is possible that the client connects to server before sbus_server_set_on_connection is called and thus the client is not properly initialized. However it should not happen in normal start because providers are started before responders and it can happen only if data provider startup is somehow delay.
You can use this diff to reproduce the crash:
--- a/src/providers/data_provider_be.c +++ b/src/providers/data_provider_be.c @@ -702,6 +702,8 @@ int main(int argc, const char *argv[]) uid_t uid; gid_t gid; + sleep(5); + struct poptOption long_options[] = { POPT_AUTOHELP SSSD_MAIN_OPTS
Does it really help to reproduce the crash?
At this point `sbus_server_create()` wasn't executed yet (nothing was executed yet actually)
Funny thing is, crash indeed happen in my testing... but only couple of times per tens of attempts. """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-692810749
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """
We can hit a segfault if provider start is somehow delayed.
* dp_init_send * sbus_server_create_and_connect_send * sbus_server_create (*) * dp_init_done (callback for sbus_server_create_and_connect_send) * sbus_server_create_and_connect_recv * sbus_server_set_on_connection (sets clients data and creates dp_cli)
At (*) sbus server is already created and accepts new connections once we get into tevent loop. So it is possible that the client connects to server before sbus_server_set_on_connection is called and thus the client is not properly initialized. However it should not happen in normal start because providers are started before responders and it can happen only if data provider startup is somehow delay.
You can use this diff to reproduce the crash:
--- a/src/providers/data_provider_be.c +++ b/src/providers/data_provider_be.c @@ -702,6 +702,8 @@ int main(int argc, const char *argv[]) uid_t uid; gid_t gid; + sleep(5); + struct poptOption long_options[] = { POPT_AUTOHELP SSSD_MAIN_OPTS
Does it really help to reproduce the crash?
At this point `sbus_server_create()` wasn't executed yet (nothing was executed yet actually)
Funny thing is, crash indeed happen in my testing... but only couple of times per tens of attempts. """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-692810749
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
alexey-tikhonov commented: """ Thank you. ACK. """
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-692835508
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Accepted
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Ready to push
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
pbrezina commented: """ Pushed PR: https://github.com/SSSD/sssd/pull/5299
* `master` * 4a84f8e18ea5604ac7e69849dee492718fd96296 - dp: fix potential race condition in provider's sbus server
"""
See the full comment at https://github.com/SSSD/sssd/pull/5299#issuecomment-694191564
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: +Pushed
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: -Accepted
URL: https://github.com/SSSD/sssd/pull/5299 Title: #5299: dp: fix potential race condition in provider's sbus server
Label: -Ready to push
URL: https://github.com/SSSD/sssd/pull/5299 Author: pbrezina Title: #5299: dp: fix potential race condition in provider's sbus server Action: closed
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5299/head:pr5299 git checkout pr5299
sssd-devel@lists.fedorahosted.org