On 05/07/14 at 10:08am, Vivek Goyal wrote:
On Wed, May 07, 2014 at 09:49:00AM -0400, Vivek Goyal wrote:
> On Wed, May 07, 2014 at 05:51:55PM +0800, Dave Young wrote:
> > Hi, Chao and Vivek
> >
> > As far as I understand, your new error handler will jump into kdump.sh in
> > case of any error in systemd. It will fix most of the problems. But the new
> > error handler service has a limitation: dump_to_rootfs will always fail if
> > the error happens before entering kdump.sh.
>
> Once an error happens in the system, there are no guarantees what will
> work and what will not.
>
> So why will dump_to_rootfs not work? Because we have stopped the rest
> of the services?
>
> >
> > From the thread link you sent about systemd discussion, they proposed
> > below solution:
> >
> > http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/18278
> >
> > It will create a local-fs-all.target for us to wait on. I'd like to know
> > what the conclusion was: why can't we do it this way?
>
> What's the advantage of this proposal? I think it is just making things
> more twisted and complicated. That is first we are forced to pass
> "nofail" option and then create extra targets which wait for
> dependencies to finish.
> >
> > Possibly there are other failure points besides fsck/mount in systemd
> > which we cannot catch. Is this the reason we ignore the proposal?
>
> I think that proposal is complicated; that's why I suggested a simpler
> way.
>
> So can you please explain why that proposal is better and why
> we should do things that way?
>
> Thinking more about it.
>
> Currently error handler is emergency shell. I am assuming this handler is
> invoked only for serious issues and one invokes emergency shell only if
> systemd can not make further progress with the boot.

Right. Only a serious issue that would stop systemd will trigger
emergency.

During the boot, an issue may be serious or fatal from the perspective
of systemd. But is there any chance that the issue could be trivial for
kdump? I don't have any handy data to prove it; that's only my concern.

> If that's the case, then calling kdump handler from emergency shell should
> work just fine.
>
> But if emergency shell can be invoked even for trivial issues while
> systemd could make further progress, then waiting for all local mounts
> to finish and invoking kdump handler will make sense too.
>
> In a nutshell, it boils down to when we want to intervene: after
> waiting for all filesystems to mount, or at the first instance of an error.
>
> Waiting for all filesystems to mount has the advantage that there is an
> increased probability that dump_to_rootfs will work (if the error happened
> in mounting other filesystems but not the root filesystem). In that case,
> first waiting for all local mounts to finish and then invoking the kdump
> error handler will make sense.
>
> On the flip side, continuing after a failure can lead to unpredictable
> results. How many people continue to boot after they have fallen into the
> emergency shell? I suspect that nobody tests that path. So it might happen
> that the emergency shell was called but we did not do anything (due to the
> kdump no-emergency-shell file), and we continue, and after that things
> might just hang and the system never gets a chance to even call reboot.
>
> Chao, you had mentioned that dump_to_rootfs is not working for you. Do you
> have details on why it was not working for you?

Because when emergency.service is triggered, all the other services are
stopped, and even the mounted filesystems are unmounted.

But I've figured out a way to bring /sysroot back to life before
dump_to_rootfs, by manually running:

    systemctl start initqueue
    systemctl start sysroot.mount

I'm not sure if this is what I'm supposed to do. But it seems to be the
best chance to make the root fs available and mounted.
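
The two manual steps above could be wrapped in a small shell helper,
roughly as sketched below. This is only an illustration: restore_sysroot
and the SYSTEMCTL override are hypothetical names, and the unit names are
copied exactly as typed above, so they may not match the real unit names
in a given initramfs.

```shell
#!/bin/sh
# Sketch only. SYSTEMCTL is a hypothetical override so the helper can be
# exercised without a running systemd; it defaults to the real systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Hypothetical helper: try to bring /sysroot back before dump_to_rootfs.
# Unit names are taken verbatim from the mail and are an assumption.
# Returns 0 if both units started, 1 at the first unit that fails.
restore_sysroot() {
    for unit in initqueue sysroot.mount; do
        if ! "$SYSTEMCTL" start "$unit"; then
            echo "restore_sysroot: failed to start $unit" >&2
            return 1
        fi
    done
    return 0
}
```

The idea would be to call restore_sysroot right before dump_to_rootfs and
to skip the dump (or fall through to the next failure action) if it
returns non-zero, instead of writing to an unmounted /sysroot.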
Thanks
WANG Chao