On Wed, May 07, 2014 at 09:49:00AM -0400, Vivek Goyal wrote:
On Wed, May 07, 2014 at 05:51:55PM +0800, Dave Young wrote:
> Hi, Chao and Vivek
>
> As far as I understand, your new error handler will jump into kdump.sh in
> case of any error in systemd. It will fix most of the problems. But the new
> error handler service has a limitation: dump_to_rootfs will always fail if
> the error happens before entering kdump.sh.
Once an error happens in the system, there are no guarantees what will
work and what will not.
So why will dump_to_rootfs not work? Because we have stopped the rest
of the services?
>
> From the thread link you sent about the systemd discussion, they proposed
> the solution below:
>
> http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/18278
>
> It will create a local-fs-all.target for us to wait on. I'd like to know
> what the conclusion was: why can't we do it this way?
What's the advantage of this proposal? I think it is just making things
more twisted and complicated. That is, first we are forced to pass the
"nofail" option and then create extra targets which wait for dependencies
to finish.
>
> Possibly there are failure points in systemd other than fsck/mount which
> we cannot catch. Is this the reason we ignored the proposal?
I think that proposal is complicated; that's why I suggested a simpler
way.
So can you please explain why that proposal is better and why
we should do things that way?
Thinking more about it.
Currently the error handler is the emergency shell. I am assuming this
handler is invoked only for serious issues, i.e., the emergency shell is
invoked only if systemd cannot make further progress with the boot.
If that's the case, then calling kdump handler from emergency shell should
work just fine.
But if the emergency shell can be invoked even for trivial issues where
systemd could still make further progress, then waiting for all local
mounts to finish before invoking the kdump handler makes sense too.
In a nutshell, it boils down to when we want to intervene: after
waiting for all filesystems to mount, or at the first instance of an error.
Waiting for all filesystems to mount has the advantage that dump_to_rootfs
is more likely to work (if the error happened while mounting some other
filesystem but not the root filesystem). In that case, first waiting for
all local mounts to finish and then invoking the kdump error handler makes
sense.
On the flip side, continuing after a failure can lead to unpredictable
results. How many people continue to boot after they have fallen into the
emergency shell? I suspect that nobody tests that path. So it might happen
that the emergency shell was called but we did nothing (due to the kdump
no-emergency-shell file) and continued, and after that things might just
hang and the system never even gets a chance to call reboot.
Chao, you mentioned that dump_to_rootfs is not working for you. Do you
have details on why it was not working?
Thanks
Vivek