On Wed, Jul 21, 2021 at 2:51 PM Coiby Xu <coxu(a)redhat.com> wrote:
On Wed, Jul 21, 2021 at 02:04:52AM +0800, Kairui Song wrote:
>When `failure_action` is set to `dump_to_rootfs`, the message:
>"Waiting for rootfs mount, will timeout after 90 seconds"
>is actually wrong. Kdump will simply call `systemctl start sysroot.mount`,
>but the timeout value of sysroot.mount depends on the unit service and
>dracut parameters. By default, dracut sets
>JobRunningTimeoutSec=0 and JobTimeoutSec=0 for the device units,
>which means it will wait forever (see the wait_for_dev function in dracut).
>
>For some devices, this can be fixed by setting rd.timeout=90. But when
>initqueue is enabled during the initramfs build, dracut forces the
>timeout for host devices to `0` (see 99base/module-setup.sh).
>
>Depending on dracut / systemd timeouts makes the behavior unpredictable
>and liable to break as parameters or code change. To keep things easy to
>understand and maintain, call `systemctl` with the `--no-block` option
>and implement a standalone wait loop. Now `dump_to_rootfs` actually
>waits for 90s and then times out.
>
>Signed-off-by: Kairui Song <kasong(a)redhat.com>
>---
> kdump-lib-initramfs.sh | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
>diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
>index 4cd18e4..319f9a0 100755
>--- a/kdump-lib-initramfs.sh
>+++ b/kdump-lib-initramfs.sh
>@@ -230,10 +230,20 @@ dump_to_rootfs()
> dinfo "Clean up dead systemd services"
> systemctl cancel
> dinfo "Waiting for rootfs mount, will timeout after 90 seconds"
>- systemctl start sysroot.mount
>+ systemctl start --no-block sysroot.mount
>
>- ddebug "NEWROOT=$NEWROOT"
>+ _loop=0

Should we make _loop a local variable?

Good suggestion. I wanted to be more POSIX compatible, so I didn't use
`local` here (shellcheck complains about it). Maybe we can write a
coding guideline doc for kdump scripts and clean up the mess of sh/bash
syntax issues in the future.

>+ while [ $_loop -lt 90 ] && ! is_mounted /sysroot; do
>+     sleep 1
>+     _loop=$((_loop + 1))
>+ done
>+
>+ if ! is_mounted /sysroot; then
>+     derror "Failed to mount rootfs"
>+     return
>+ fi
>
>+ ddebug "NEWROOT=$NEWROOT"
> dump_fs $NEWROOT
> }
>
>--
>2.31.1
>
After applying this patch, hp-moonshot-03-c05.lab.eng.rdu2.redhat.com
no longer hangs after printing "Waiting for rootfs mount, will timeout
after 90 seconds".
Except for the small issue, the rest looks good to me.
Acked-by: Coiby Xu <coxu(a)redhat.com>
--
Best regards,
Coiby
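
On the `_loop` scoping question: one POSIX-friendly option, as a rough
sketch only and not part of this patch, is to give a helper function a
( ... ) subshell body, so its variables stay private without `local`.
The name `wait_until` and its arguments are hypothetical; `is_mounted`
and `derror` are the helpers already used in the patch.

```shell
#!/bin/sh
# wait_until SECONDS COMMAND [ARGS...]
# Hypothetical helper, not part of the patch. The function body is a
# ( ... ) subshell instead of { ... }, so _timeout and _loop never leak
# into the caller, and the non-POSIX `local` keyword is not needed.
wait_until() (
    _timeout=$1
    shift
    _loop=0
    # Poll once per second until COMMAND succeeds or the timeout expires.
    while [ "$_loop" -lt "$_timeout" ] && ! "$@"; do
        sleep 1
        _loop=$((_loop + 1))
    done
    # Exit status tells the caller whether COMMAND finally succeeded.
    "$@"
)

# Possible use inside dump_to_rootfs (assumes the kdump helpers exist):
#   wait_until 90 is_mounted /sysroot || { derror "Failed to mount rootfs"; return; }
```

The trade-off is one extra fork per call, and the helper cannot set
variables for its caller, which is fine for a wait loop like this one.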
--
Best Regards,
Kairui Song