On 10/31/2011 04:29 PM, Jakub Hrozek wrote:
On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:
> On 10/25/2011 01:08 PM, Pavel Zuna wrote:
>> On 10/21/2011 06:44 PM, Stephen Gallagher wrote:
>>> On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:
>>>> Base on the second proposal:
>>>>
>>>>
https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld
>>>>
>>>> There is some old SIGCHLD handling code in
src/providers/child_common.[ch], that
>>>> should probably go away if this gets accepted. There was also a naming
conflict
>>>> with the sss_child_ctx structure. This structure is only used internally
by
>>>> functions defined src/providers/child_common.c. I renamed the original
structure
>>>> to sss_child_ctx_old for now.
>>>
>>>
>>> Nack (but a good start)
>>>
>>> sss_child_destructor() cannot return the result of hash_delete(). Talloc
>>> destructors may only return 0 on success or -1 on failure. However,
>>> because destructor failure results in cancelling the free(), we should
>>> just log an error here and return 0.
>>>
>>> sss_child_register:
>>> Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want
>>> to leave child_ctx untouched until successful completion. Also, please
>>> don't use the variable name "tmp". It's not descriptive.
Call it
>>> "child".
>>>
>>> Instead of allocating *child_ctx at the beginning, allocate "child"
on
>>> the mem_ctx.
>>>
>>> child = talloc_zero(mem_ctx, struct sss_child_ctx);
>>>
>>> Then, only at the end of execution, right before returning EOK, do
>>> *child_ctx = child;
>>>
In sss_child_register():
+ error = hash_enter(sigchld_ctx->children,&key,&value);
+ if (error != HASH_SUCCESS&& error != HASH_ERROR_KEY_NOT_FOUND) {
I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would
it mean?) and even if it did, why ignore it?
>>>
>>> sss_child_invoke_cb():
>>> I'm wary of having the callback invocation free the child_ctx. It needs
>>> to be documented well (so that consumers know they must not free it once
>>> the callback has fired or they'll get a double-free).
This still needs fixing.
Forgot about that, sorry. Replaced the talloc_free with removing child_ctx from
the hash table only.
>>>
>>> sss_child_handler():
>>> You need to check whether waitpid() returned an error and handle that
>>> appropriately. waitpid() will return -1 on error and set errno. If the
>>> errno value is EINTR, you should retry (because waitpid() was
>>> interrupted by a signal at the time). Other errors should generate a
>>> DEBUG message.
I think you misunderstood how the error handling should be done:
+ if (pid == -1) {
+ if (errno == ECHILD) {
+ DEBUG(0, ("waitpid failed with ECHILD\n"));
+ } else if (errno == EINVAL) {
+ DEBUG(0, ("waitpid failed with EINVAL\n"));
+ }
+ return;
+ }
Instead of having a DEBUG() call per retval, it would be better to inform user
on why the child process exited, similarly to how child_sig_handler() does it
now.
I don't agree. The system is designed to inform providers that a child has
exited by invoking their callbacks. It's up to the provider to decide what to do
with the result and inform the user if appropriate. This would also make sense
only in the case of a successful call (pid >= 0).
In case of EINTR, we need to repeat the call to waitpid, because it was
interrupted by a signal. In other cases (ECHILD and EINVAL) the call is going to
always fail, so there's won't be any callbacks to invoke.
>>>
>>
>> Ok, I'm done with all of this. It was pretty straight forward, but...
>>
>>>
>>> Please separate any changes you make to the existing sss_child_ctx into
>>> its own patch. I don't like just renaming it to _old, either. Please
>>> find a more appropriate way to replace this old object (preferably by
>>> converting it to use your new mechanism, also in a new patch).
>>>
>>
>> ... this, I'm trying to find a way to make the old code use the new
mechanism. I
>> think it would be better to remove the old one completely and replace it with
>> the new one everywhere where it's used.
>>
>> Pavel
>
> New patch with the above mentioned issues fixed is attached to this email.
>
> We agreed that we're going to replace all uses of the old system by
> the new. Expect about 3 patches to follow on this one - one for each
> provider using the old system.
>
> Pavel
Thanks for taking the time to review this. New patch attached.
Pavel