On 04/09/14 at 11:34am, Vivek Goyal wrote:
On Wed, Apr 09, 2014 at 03:03:49PM +0800, WANG Chao wrote:
> > Sorry I did not get this. So how does rootflags=nofail work? We will wait
> > for root to show up before we go with pre-pivot hooks?
> Sorry I didn't make myself clear.
> rootflags=nofail has the following effect on sysroot.mount:
> - initrd-root-fs.target "Wants" sysroot.mount
> W/o nofail:
> - initrd-root-fs.target "Requires" sysroot.mount
> - sysroot.mount is started "Before" initrd-root-fs.target
> In both cases, dracut-pre-pivot.service is getting started "After"
Ok so rootflags=nofail will change "Requires=sysroot.mount" to
"Wants=sysroot.mount" in initrd-root-fs.target.
I think that's perfect. IIUC, changing it to Wants= will mean that we
will wait for sysroot.mount to activate and, if activation fails,
initrd-root-fs.target will still be activated. That's the behavior we
want in kdump.
So why should we get rid of rootflags=nofail?
No, we don't. I was explaining why we only remove nofail for non-root
file systems and keep rootflags=nofail ...
> > > > We also need to specify in changelog the flip side of the patch. That
> > > > is, now in case of failure, we probably will not get control and I think
> > > > systemd can put us in rescue mode.
> > >
> > > No, we disable dropping to shell. So we hang in case of such failure.
> > If we always hang, what was the point of disabling dropping to shell?
> At the very beginning, we disabled dropping to shell at an arbitrary point
> of the boot process, because we want to get into kdump.sh and do
> whatever error handling is needed within kdump.sh itself.
Yes and we thought by not dropping into shell we will continue with
processing and ultimately reach kdump. But I think that's a very bad
way to handle errors. Once an error has occurred we should have a direct
way to jump into kdump error handler.
> For example, if the user specifies "default reboot" and we don't disable
> the shell, we will drop into the shell instead of taking the "reboot" action.
> And given the fact shell is disabled, we introduced "nofail" to solve
> the hang issue when running into a disk failure.
What is "hang"? Are you defining "hang" as dropping into shell? I am
not able to understand what will happen if we don't pass "nofail".
By "hang", I mean systemd stops running any service: because a certain
target isn't reached, all the services which need to run after this
target get blocked. Hence all services about to run are getting stuck and
> > I see that emergency shell is invoked by dracut directly. I think in
> > those cases it will return immediately and dracut script will continue
> > even after failure.
> A disk failure would cause a mount unit failure. A mount unit failure
> would cause local-fs.target never to be reached. local-fs.target never
> being reached would cause dracut-pre-pivot.service never to get started.
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you
sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service
doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails
or not, as long as it's started, dracut-pre-pivot will run.
> However, if "nofail" is specified for the mount unit, local-fs.target
> would only "Wants" the mount unit, rather than "Requires" in
I am not sure what's the logic behind converting Requires= to Wants=
with nofail. So if by default we have "Wants=" dependencies on all
mount files, then local-fs.target will reach even if some mount failed.
"Wants=" will not order the service starting sequence.
- W/ "nofail", local-fs.target "Wants=" mount unit.
- W/o "nofail", local-fs.target "Requires=" mount unit and mount unit
runs "Before=" local-fs.target.
So w/ "nofail", local-fs.target can be reached no matter what. But when
local-fs.target is reached, it doesn't mean all mount units got started,
i.e. that all /etc/fstab entries are mounted.
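
To make the rule above concrete, here is a minimal shell sketch that maps an
fstab options field to the dependency type local-fs.target would get on the
mount unit. The function name mount_dep_type is hypothetical and only models
the behavior described in this thread; it is not part of systemd or dracut.

```shell
# Hypothetical helper (illustration only): given the options field of an
# fstab entry, print how local-fs.target would depend on the mount unit.
mount_dep_type() {
    case ",$1," in
        *,nofail,*) echo "Wants" ;;     # weak dependency, no Before= ordering
        *)          echo "Requires" ;;  # strong dependency, ordered Before=
    esac
}

mount_dep_type "defaults,nofail"
mount_dep_type "defaults"
```

The first call corresponds to the w/ "nofail" case and the second to the w/o
"nofail" case listed above.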
But systemd folks might not like this idea as they might have other
reasons for why they are using Requires= by default.
Because "fail" mode is the default mode. That said, any failure would
trigger an error handler (emergency.service).
If the user wants "nofail" mode, a weak dependency "Wants" would be the
choice, because the mount failure wouldn't block local-fs.target from
Like we've discussed in the systemd-devel thread, the preferable way to
handle this is to remove "nofail" and create our own error handler.
> That said, local-fs.target would still be reached in case of "nofail"
> mount failure, but would never be reached in case of "fail" mount
Got it. By default local-fs.target has Requires= dependencies and
nofail converts that into Wants= dependency and that helps in our
> > So question I have is that can we drop another file say
> > module-emergency-handling and emergency shell will call that. And kdump
> > can drop module-emergency-handling file or create this link to
> > kdump-error-handling and we can handle the error.
> > IOW, once dracut has encountered the failure, is there any point in
> > continuing further and then expect to drop into kdump module from
> > pre-pivot hook.
> > I think if we error out early, I guess root might not be available and
> > I think that's fine. There are so many places things can go wrong and
> > we can't guarantee that root is available as backup target.
> > Second place of failure is from systemd. I see there are two emergency
> > services rescue.service and dracut-emergency.service. They both call
> > /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call
> > /bin/module-emergency-shell that would automatically make sure that
> > system will not hang and kdump will get control after failure?
> This proposal makes more sense to me. We can implement our own error
> handler and override the default one provided by dracut or systemd. This
> would give us more flexibility and scalability. We can do whatever we
> like depending on the failure type, user specified failsafe action and
> other factors.
> The only thing is how we implement it. I think an alternative approach is
> to create a new emergency service and put it under /etc/systemd/ or
> /run/systemd/. So it can take precedence over the default one under
> In addition, as you said, Lennart is likely going to facilitate such
> replacing. We can wait until that happens and then make the decision.
Lennart asked me to send a mail to him. I am not sure what exactly he
was planning to do. Before I send a mail to him I want to be sure that
we understand the problem well and what we want to do.
Actually dropping an overriding emergency service in /run/systemd/ sounds
reasonable. Can you give it a try and see if it works?
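
As a rough sketch of what such an override could look like, the shell function
below writes a replacement emergency.service into a unit directory. The
function name, the choice of emergency.service as the unit to override, and
the handler path /usr/bin/kdump-error-handler.sh are all assumptions for
illustration; only the directory lookup order (units under /run/systemd/system
win over the ones shipped by the distribution) is standard systemd behavior.

```shell
# Sketch only: write an overriding emergency.service into a unit directory.
#   $1: unit directory (assumed default: /run/systemd/system)
#   $2: error handler to run (hypothetical path, not an existing script)
write_emergency_override() {
    unit_dir="${1:-/run/systemd/system}"
    handler="${2:-/usr/bin/kdump-error-handler.sh}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/emergency.service" <<EOF
[Unit]
Description=kdump emergency handler (sketch)
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=$handler
StandardInput=null
StandardOutput=journal+console
StandardError=journal+console
EOF
}
```

Calling write_emergency_override with no arguments would place the unit under
/run/systemd/system; whether systemd then picks it up in the kdump initramfs
is exactly what would need testing.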
We also need to see if calling kdump directly from error path is working
Sounds good to me. We can make use of the existing error handling code.
I think we probably can't implement "mount_root_run_init" logic in this
path as failure has occurred.
We don't implement it now. As I remember, RHEL 6 had such a "default"
option, but we have moved on.
I think we will have to keep our default actions simple and minimal. Boot
path is complicated and now dracut and systemd control it completely. We
will not have much flexibility w.r.t. error handling.
I think jumping directly to our kdump.sh is a good idea. We can extend the
current kdump.sh to do some reasonable checking: if the error is trivial
and the dump target is ready, then we dump. Otherwise we do the default action.
What do you think?
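
Something along these lines is what I have in mind, as a minimal sketch. The
function name choose_error_action and its three arguments are made up for
illustration; the real kdump.sh would have to work out "trivial" and "target
ready" on its own.

```shell
# Sketch of the decision described above (all names are hypothetical).
#   $1: "yes" if the error is considered trivial
#   $2: "yes" if the dump target is ready
#   $3: the user's configured default action, e.g. "reboot"
choose_error_action() {
    if [ "$1" = "yes" ] && [ "$2" = "yes" ]; then
        echo "dump"
    else
        echo "${3:-reboot}"
    fi
}

choose_error_action yes yes reboot
choose_error_action yes no halt
```

The first call picks "dump" because both checks pass. The second falls back
to the configured default action, here "halt".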