On Wed, Apr 09, 2014 at 03:03:49PM +0800, WANG Chao wrote:
[..]
> Sorry I did not get this. So how does rootflags=nofail work? We will wait
> for root to show up before we go with pre-pivot hooks?
Sorry I didn't make myself clear.
rootflags=nofail has the following effect on sysroot.mount:
- initrd-root-fs.target "Wants" sysroot.mount
W/o nofail:
- initrd-root-fs.target "Requires" sysroot.mount
- sysroot.mount is started "Before" initrd-root-fs.target
In both cases, dracut-pre-pivot.service is getting started "After"
sysroot.mount.
Ok so rootflags=nofail will change "Requires=sysroot.mount" to
"Wants=sysroot.mount" in initrd-root-fs.target.
I think that's perfect. IIUC, changing it to Wants= will mean that we
will wait for sysroot.mount to activate and, even if activation fails,
initrd-root-fs.target will still be activated. That's the behavior we
want in kdump.
So why should we get rid of rootflags=nofail?
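
A minimal sketch of the difference discussed above, written as literal unit
directives purely for illustration. This is an assumption about presentation:
in practice the dependency may be set up through generated .requires/ and
.wants/ symlink directories rather than lines in the target file itself.

```ini
# Illustration only: two alternative forms of initrd-root-fs.target.

# Without nofail: hard dependency. If sysroot.mount fails,
# initrd-root-fs.target fails as well.
[Unit]
Requires=sysroot.mount

# With rootflags=nofail: soft dependency. sysroot.mount is still
# started, but its failure does not fail initrd-root-fs.target.
[Unit]
Wants=sysroot.mount
```

In both forms, ordering (After=/Before=) is separate from the requirement
itself: it only decides which unit is started first.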
> > > We also need to specify in the changelog the flip side of the patch. That
> > > is now in case of failure, we probably will not get control and I think
> > > systemd can put us in rescue mode.
> >
> > No, we disable dropping to shell. So we hang in case of such failure.
>
> If we always hang, what was the point of disabling dropping to shell?
At the very beginning, we disabled dropping to shell at arbitrary point
of the boot process, because we want to get into kdump.sh and do
whatever error handling within kdump.sh itself.
Yes and we thought by not dropping into shell we will continue with
processing and ultimately reach kdump. But I think that's a very bad
way to handle errors. Once an error has occurred we should have a direct
way to jump into kdump error handler.
For example, if the user specifies "default reboot" and we don't disable
the shell, we will drop into the shell instead of performing the "reboot" action.
And given the fact that the shell is disabled, we introduced "nofail" to solve
the hang issue when running into a disk failure.
What is "hang"? Are you defining "hang" as dropping into shell? I am
not able to understand what will happen if we don't pass "nofail".
>
> I see that emergency shell is invoked by dracut directly. I think in
> those cases it will return immediately and dracut script will continue
> even after failure.
A disk failure would cause a mount unit failure. A mount unit failure
would cause local-fs.target to never be reached. local-fs.target never
being reached would cause dracut-pre-pivot.service to never get started.
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you
sure that it does not get started if sysroot.mount fails?
However, if "nofail" is specified for the mount unit, local-fs.target
would only "Wants" the mount unit, rather than "Requires" it as in "fail" mode.
I am not sure what's the logic behind converting Requires= to Wants=
with nofail. So if by default we have "Wants=" dependencies on all
mount files, then local-fs.target will be reached even if some mount failed.
But systemd folks might not like this idea as they might have other
reasons for why they are using Requires= by default.
That said, local-fs.target would still be reached in case of a "nofail"
mount failure, but would never be reached in case of a "fail" mount
failure.
Got it. By default local-fs.target has Requires= dependencies and
nofail converts that into Wants= dependency and that helps in our
case.
[..]
> So question I have is that can we drop another file say
> module-emergency-handling and emergency shell will call that. And kdump
> can drop module-emergency-handling file or create this link to
> kdump-error-handling and we can handle the error.
>
> IOW, once dracut has encountered the failure, is there any point in
> continuing further and then expect to drop into kdump module from
> pre-pivot hook.
>
> I think if we error out early, I guess root might not be available and
> I think that's fine. There are so many places things can go wrong and
> we can't guarantee that root is available as backup target.
>
> Second place of failure is from systemd. I see there are two emergency
> services rescue.service and dracut-emergency.service. They both call
> /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call
> /bin/module-emergency-shell that would automatically make sure that
> system will not hang and kdump will get control after failure?
This proposal makes more sense to me. We can implement our own error
handler and override the default one provided by dracut or systemd. This
would give us more flexibility and scalability. We can do whatever we
like depending on the failure type, user specified failsafe action and
other factors.
The only thing is how we implement it. I think an alternative approach is
to create a new emergency service and put it under /etc/systemd/ or
/run/systemd/. So it can take precedence over the default one under
/usr/lib/systemd/.
In addition, as you said, Lennart is likely going to facilitate such
replacing. We can wait until that happens and then make the decision.
Lennart asked me to send a mail to him. I am not sure what exactly he was
planning to do. Before I send a mail to him I want to be sure that
we understand the problem well and know what we want to do.
Actually dropping an overriding emergency service in /run/systemd/ sounds
reasonable. Can you give it a try and see if it works?
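
A rough sketch of what such an override might look like, assuming the unit
to shadow is dracut-emergency.service and that kdump ships its own handler
script. The file location, the unit name chosen and /bin/kdump-error-handler
are all hypothetical here and not taken from any existing kdump or dracut
code.

```ini
# Hypothetical file: /run/systemd/system/dracut-emergency.service
# A unit of the same name under /run/systemd/system shadows the one
# shipped under /usr/lib/systemd/system.
[Unit]
Description=kdump error handler (hypothetical override)
DefaultDependencies=no

[Service]
Type=oneshot
# Hypothetical script implementing the user-specified failsafe action
# (for example "default reboot") instead of spawning a shell.
ExecStart=/bin/kdump-error-handler
```

Whether rescue.service would need the same treatment is left open here.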
We also need to see if calling kdump directly from error path is working
or not.
I think we probably can't implement "mount_root_run_init" logic in this
path as failure has occurred.
I think we will have to keep our default actions simple and minimal. Boot
path is complicated and now dracut and systemd control it completely. We
will not have much flexibility w.r.t. error handling.
Thanks
Vivek