On 07/29/14 at 09:41am, Vivek Goyal wrote:
On Tue, Jul 29, 2014 at 08:57:55PM +0800, WANG Chao wrote:
> Currently, upon failure the kdump script might not be called at all, so it
> cannot execute the default action. This results in a hang.
>
> This is because we disable the emergency shell and rely on kdump.sh being
> invoked through the dracut-pre-pivot hook. But we might never reach the
> dracut-pre-pivot hook, because certain systemd targets cannot be reached
> due to failures in their dependencies. In those cases the error handling
> code does not run and the system hangs.
I think it is important to show the systemd dependency graph here.
Just a couple of lines.
xyz--->foo.target---->bar.service--->dracut-pre-pivot
And show which target is not reached, and hence mention that the
dracut-pre-pivot hook does not run.
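For illustration only, one way to see which unit on the way to dracut-pre-pivot was never reached is to query each unit from the 2nd kernel. This is a minimal sketch: the function name report_unreached_units and the unit list in KDUMP_BOOT_PATH are hypothetical and are not the real dependency chain. The actual ordering graph can be printed with `systemctl list-dependencies --after dracut-pre-pivot.service`.

```shell
# Hypothetical helper: report which units on an assumed boot path never
# became active. The unit list is an illustrative assumption only; targets
# and mount units are used because they stay "active" once reached.
KDUMP_BOOT_PATH="sysroot.mount initrd-root-fs.target initrd.target"

report_unreached_units()
{
    unreached=""
    for unit in $KDUMP_BOOT_PATH; do
        # "systemctl is-active --quiet" exits 0 only if the unit is active.
        if ! systemctl is-active --quiet "$unit"; then
            echo "Kdump: $unit did not reach active state"
            unreached="$unreached $unit"
        fi
    done
    # Return non-zero if any unit on the path was not reached.
    [ -z "$unreached" ]
}
```

Printing the first unreached unit next to the hang would give exactly the "which target does not reach" information asked for above.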
[..]
> +dump_to_rootfs()
> +{
> +
> + echo "Kdump: trying to bring up rootfs device"
> + systemctl start dracut-initqueue
> + echo "Kdump: waiting for rootfs mount, will timeout after 90 seconds"
> + systemctl start sysroot.mount
Will this ever try to enter the emergency shell again (in case of failure)?
sysroot.mount wouldn't; it would only time out after 90 seconds.
But dracut-initqueue would enter the kdump error handler again. I'm not
sure which is the best way to deal with the dump_to_rootfs case.
a) Calling dracut-initqueue in the kdump error handler would cause a loop
   of emergency -> dracut-initqueue -> emergency -> dracut-initqueue ...
   if something's wrong within dracut-initqueue.
b) Not calling dracut-initqueue in the kdump error handler would leave the
   root LVM down if the kdump error handler is triggered early, before
   dracut-initqueue has run. In that case the root LVM isn't there, so
   sysroot.mount would eventually time out.
Between a) and b), I'd prefer b), because in my experience most of the
user-space errors that happen during the 2nd kernel boot are related to
device (disk, network) setup in dracut-initqueue, or to something wrong
within kdump.sh.
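A minimal sketch of what b) could look like: a variant of dump_to_rootfs from the patch with the dracut-initqueue start removed. The helpers save_vmcore_to_sysroot and do_default_action, and the wrapper kdump_error_handler, are hypothetical stand-ins, not functions from the patch.

```shell
# Sketch of option b): the error handler never restarts dracut-initqueue,
# so a failure inside dracut-initqueue cannot re-enter the handler.
# save_vmcore_to_sysroot and do_default_action are hypothetical helpers.

dump_to_rootfs()
{
    echo "Kdump: waiting for rootfs mount"
    # "systemctl start" waits for the job and fails if the mount fails,
    # e.g. when the root device never appears and the job times out.
    if ! systemctl start sysroot.mount; then
        echo "Kdump: rootfs not available, skipping dump_to_rootfs"
        return 1
    fi
    save_vmcore_to_sysroot
}

kdump_error_handler()
{
    # Fall back to the default action if the rootfs dump is impossible.
    dump_to_rootfs || do_default_action
}
```

The cost of this choice is the one described below: if the handler runs before dracut-initqueue has brought up the root device, sysroot.mount simply times out and the default action runs instead.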
With b), the trade-off is that if the kdump error handler runs before
dracut-initqueue, we can't dump_to_rootfs because the root device isn't
ready. But I would say that if such a critical error happens that early,
we can't really guarantee a reliable dump to rootfs anyway.
And if an error happens in dracut-initqueue but isn't related to the root
disk, we can still dump_to_rootfs with b).
Thanks
WANG Chao