On 11/01/2011 02:58 PM, Jakub Hrozek wrote:
On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote:
> On 10/31/2011 04:29 PM, Jakub Hrozek wrote:
>> On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:
>>> On 10/25/2011 01:08 PM, Pavel Zuna wrote:
>>>> On 10/21/2011 06:44 PM, Stephen Gallagher wrote:
>>>>> On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:
>>>>>> Base on the second proposal:
>>>>>>
>>>>>>
https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld
>>>>>>
>>>>>> There is some old SIGCHLD handling code in
src/providers/child_common.[ch], that
>>>>>> should probably go away if this gets accepted. There was also a
naming conflict
>>>>>> with the sss_child_ctx structure. This structure is only used
internally by
>>>>>> functions defined src/providers/child_common.c. I renamed the
original structure
>>>>>> to sss_child_ctx_old for now.
>>>>>
>>>>>
>>>>> Nack (but a good start)
>>>>>
>>>>> sss_child_destructor() cannot return the result of hash_delete().
Talloc
>>>>> destructors may only return 0 on success or -1 on failure. However,
>>>>> because destructor failure results in cancelling the free(), we
should
>>>>> just log an error here and return 0.
>>>>>
>>>>> sss_child_register:
>>>>> Don't initialize sss_child_ctx *tmp to the child_ctx passed in.
We want
>>>>> to leave child_ctx untouched until successful completion. Also,
please
>>>>> don't use the variable name "tmp". It's not
descriptive. Call it
>>>>> "child".
>>>>>
>>>>> Instead of allocating *child_ctx at the beginning, allocate
"child" on
>>>>> the mem_ctx.
>>>>>
>>>>> child = talloc_zero(mem_ctx, struct sss_child_ctx);
>>>>>
>>>>> Then, only at the end of execution, right before returning EOK, do
>>>>> *child_ctx = child;
>>>>>
>>
>> In sss_child_register():
>>
>> + error = hash_enter(sigchld_ctx->children,&key,&value);
>> + if (error != HASH_SUCCESS&& error != HASH_ERROR_KEY_NOT_FOUND) {
>>
>> I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would
>> it mean?) and even if it did, why ignore it?
>>
>>>>>
>>>>> sss_child_invoke_cb():
>>>>> I'm wary of having the callback invocation free the child_ctx. It
needs
>>>>> to be documented well (so that consumers know they must not free it
once
>>>>> the callback has fired or they'll get a double-free).
>>
>> This still needs fixing.
>>
>
> Forgot about that, sorry. Replaced the talloc_free with removing
> child_ctx from the hash table only.
>
>>>>>
>>>>> sss_child_handler():
>>>>> You need to check whether waitpid() returned an error and handle
that
>>>>> appropriately. waitpid() will return -1 on error and set errno. If
the
>>>>> errno value is EINTR, you should retry (because waitpid() was
>>>>> interrupted by a signal at the time). Other errors should generate a
>>>>> DEBUG message.
>>
>> I think you misunderstood how the error handling should be done:
>>
>> + if (pid == -1) {
>> + if (errno == ECHILD) {
>> + DEBUG(0, ("waitpid failed with ECHILD\n"));
>> + } else if (errno == EINVAL) {
>> + DEBUG(0, ("waitpid failed with EINVAL\n"));
>> + }
>> + return;
>> + }
>>
>> Instead of having a DEBUG() call per retval, it would be better to inform user
>> on why the child process exited, similarly to how child_sig_handler() does it
>> now.
>>
>
> I don't agree. The system is designed to inform providers that a
> child has exited by invoking their callbacks. It's up to the
> provider to decide what to do with the result and inform the user if
> appropriate. This would also make sense only in the case of a
> successful call (pid>= 0).
OK, then why the separate branches for ECHILD and EINVAL? Isn't it
easier to just print errno and strerror(errno) ?
Yeah, it's definitely better. Improved in current patch. :)
>
> In case of EINTR, we need to repeat the call to waitpid, because it
> was interrupted by a signal. In other cases (ECHILD and EINVAL) the
> call is going to always fail, so there's won't be any callbacks to
> invoke.
Yup.
>
>>>>>
>>>>
>>>> Ok, I'm done with all of this. It was pretty straight forward,
but...
>>>>
>>>>>
>>>>> Please separate any changes you make to the existing sss_child_ctx
into
>>>>> its own patch. I don't like just renaming it to _old, either.
Please
>>>>> find a more appropriate way to replace this old object (preferably
by
>>>>> converting it to use your new mechanism, also in a new patch).
>>>>>
>>>>
>>>> ... this, I'm trying to find a way to make the old code use the new
mechanism. I
>>>> think it would be better to remove the old one completely and replace it
with
>>>> the new one everywhere where it's used.
>>>>
>>>> Pavel
>>>
>>> New patch with the above mentioned issues fixed is attached to this email.
>>>
>>> We agreed that we're going to replace all uses of the old system by
>>> the new. Expect about 3 patches to follow on this one - one for each
>>> provider using the old system.
>>>
>>> Pavel
>>
>
> Thanks for taking the time to review this. New patch attached.
>
> Pavel
Two more issues (sorry for not catching all this at once):
Please don't just use DEBUG(0) for all the messages. DEBUG(0) is
supposed to be used for fatal errors that cause the provider to quit
completely.
See
https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html
for explanation of the log levels. In general, this patch is going to be
included in 1.7 and onwards -- please use the new debug levels (see
src/util/util.h) defined as macros instead of integers.
Ok.
The second issue is that I'm getting a segfault when logging in
via
Kerberos, the backtrace is pointing to tevent_common_check_signal().
Finally found what the problem was. I suspect there might be a bug in tevent.
The SEGFAULT was generated from tevent_common_check_signal as you pointed out.
It happened when copying a portion of siginfo that my signal handler wasn't using.
Simply adding the SA_SIGINFO flag to tevent_add_signal solved the issue.
Updated patch attached.
Pavel