On 05/16/2011 02:56 PM, Eric Blake wrote:
On 05/16/2011 11:47 AM, Laine Stump wrote:
> The child process can't do any reasonable error reporting about why it
> failed, but it can return a different code to the parent for each
> possible failure. Instead of returning 1 for all forms of failure
> during exec, return 127 if the exec fails, and 126 if
> pthread_sigmask() fails, and log relevant errors in the parent.
> ---
> src/dutil_linux.c | 13 +++++++++----
> 1 files changed, 9 insertions(+), 4 deletions(-)
Traditionally, shells have treated execve() errors according to errno:
ENOENT (or anything else that implies not found) -> 127
EACCES, EINVAL, ENOEXEC (or anything else that implies found but not
executable or permissions in the way of finding) -> 126
any failure not related to an exec attempt -> 1-125
So you may want to return 127/126 for exec() failure, 125 for
pthread_sigmask() failure, and so forth if you want your error messages
to be consistent with the shell, env(1), nice(1), and so forth.
Okay, I changed the numbering to match coreutils (including detecting
the difference between EXIT_ENOENT and EXIT_COULD_NOT_INVOKE, and pushed.
> diff --git a/src/dutil_linux.c b/src/dutil_linux.c
> index 9935431..052221d 100644
> --- a/src/dutil_linux.c
> +++ b/src/dutil_linux.c
> @@ -129,8 +129,8 @@ exec_program(struct netcf *ncf,
> propagate that to children */
> sigemptyset(&newmask);
> if (pthread_sigmask(SIG_SETMASK,&newmask, NULL) != 0) {
> - /* don't report_error, as it will never be seen anyway */
> - _exit(1);
> + /* return a unique code and let the parent log the error */
> + _exit(126);
> }
>
> /* close all open file descriptors */
> @@ -141,8 +141,8 @@ exec_program(struct netcf *ncf,
> execvp(argv[0], (char **) argv);
>
> /* if execvp() returns, it has failed */
> - /* don't report_error, as it will never be seen anyway */
> - _exit(1);
> + /* return a unique code and let the parent log the error */
> + _exit(127);
>
> error:
> /* This is cleanup of parent process only - child
> @@ -185,6 +185,11 @@ int run_program(struct netcf *ncf, const char *const *argv) {
> ERR_THROW(!WIFEXITED(exitstatus), ncf, EEXEC,
> "'%s' terminated improperly: %d",
> argv_str, WEXITSTATUS(exitstatus));
That's a pre-existing bug. You can't read WEXITSTATUS if !WIFEXITED.
I also pushed a small patch to remove the reference to WEXITSTATUS when
WIFEXITED is false. (I knew about this limitation, not sure how I
managed to miss it when I originally wrote the code).
> + ERR_THROW(WEXITSTATUS(exitstatus) == 127, ncf, EEXEC,
> + "Running '%s' exec failed", argv_str);
> + ERR_THROW(WEXITSTATUS(exitstatus) == 126, ncf, EEXEC,
> + "Running '%s' failed to reset child process signal
mask",
> + argv_str);
ACK to the idea in this part, although you may want to use symbolic
names (enum or #define) so that if you adjust the values later (such as
adding 126/127 for the two types of exec failure), then you've reduced
the impact to the rest of the code. For example, coreutils uses the
names EXIT_ENOENT for 127 and EXIT_CANNOT_INVOKE for 126.
Yup. Good idea, so that's what I did!
Thanks!