On 04/08/14 at 05:25pm, Vivek Goyal wrote:
> On Wed, Apr 09, 2014 at 12:35:01AM +0800, WANG Chao wrote:
> > On 04/08/14 at 10:01am, Vivek Goyal wrote:
> > > On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
> > > > In the current systemd implementation, a nofail mount does not block
> > > > local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service)
> > > > can't wait for a nofail mount, and kdump.sh could run before the
> > > > nofail mount happens.
> > > >
> > > > For the short term, let's stop passing nofail to mount. As for
> > > > sysroot.mount, since we explicitly wait for it, "nofail" isn't a
> > > > problem.
> > > >
> > > > Signed-off-by: WANG Chao <chaowang(a)redhat.com>
> > >
> > >
> > > Chao,
> > >
> > > I see that we are passing rootflags=nofail. What's the effect of that?
> >
> > Same effect as for the other mounts. But since we explicitly wait for
> > sysroot.mount in dracut-pre-pivot.service, we shouldn't be worried about
> > sysroot.mount. rootflags=nofail works as expected.
> Sorry, I did not get this. So how does rootflags=nofail work? Will we wait
> for root to show up before we go on with the pre-pivot hooks?
Sorry, I didn't make myself clear.
With rootflags=nofail, sysroot.mount gets the following dependency:
- initrd-root-fs.target "Wants" sysroot.mount
Without nofail:
- initrd-root-fs.target "Requires" sysroot.mount
- sysroot.mount is started "Before" initrd-root-fs.target
In both cases, dracut-pre-pivot.service is started "After"
sysroot.mount.
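The dependency edges listed above can be modeled with a short Python sketch. It is a simplified model of the behaviour described in this thread, not the code systemd or dracut actually run; the function name `root_mount_deps` is hypothetical.

```python
def root_mount_deps(rootflags):
    """Return (unit, relation, other_unit) edges for sysroot.mount.

    Simplified model following the list above; rootflags is a
    comma-separated option string such as "ro,nofail".
    """
    options = {opt.strip() for opt in rootflags.split(",") if opt.strip()}
    edges = []
    if "nofail" in options:
        # nofail: the target only "Wants" the mount, so a mount failure
        # does not stop initrd-root-fs.target from being reached.
        edges.append(("initrd-root-fs.target", "Wants", "sysroot.mount"))
    else:
        # Default: the target "Requires" the mount, and the mount is
        # ordered "Before" the target.
        edges.append(("initrd-root-fs.target", "Requires", "sysroot.mount"))
        edges.append(("sysroot.mount", "Before", "initrd-root-fs.target"))
    # In both cases dracut-pre-pivot.service is ordered after the mount.
    edges.append(("dracut-pre-pivot.service", "After", "sysroot.mount"))
    return edges


if __name__ == "__main__":
    for edge in root_mount_deps("ro,nofail"):
        print(*edge)
```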
> >
> > >
> > > We also need to mention in the changelog the flip side of the patch:
> > > in case of failure, we probably will not get control, and I think
> > > systemd can put us in rescue mode.
> >
> > No, we disable dropping to a shell, so we hang in case of such a failure.
> If we always hang, what was the point of disabling dropping to a shell?
From the very beginning, we disabled dropping to a shell at arbitrary
points of the boot process, because we want to get into kdump.sh and do
all error handling within kdump.sh itself.
For example, if the user specifies "default reboot" and we don't disable
the shell, we drop into a shell instead of performing the "reboot" action.
Given that the shell is disabled, we introduced "nofail" to solve
the hang issue when running into a disk failure.
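A minimal Python sketch of honouring the "default" directive from kdump.conf inside kdump's own error handling rather than in a shell. The parsing rules (one directive per line, "#" starts a comment, the last "default" line wins) and the name `failure_action` are assumptions for illustration, not kdump's actual parser.

```python
def failure_action(kdump_conf_text):
    """Return the action named by the "default" directive, or None.

    Assumed format: "<directive> <value>" per line, "#" starts a comment,
    and the last "default" line wins.
    """
    action = None
    for raw_line in kdump_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "default":
            action = parts[1].strip()
    return action


if __name__ == "__main__":
    conf = "path /var/crash\ndefault reboot  # if the dump fails\n"
    print(failure_action(conf))
```

Because kdump.sh reads this value itself, a configured "reboot" can only take effect if nothing earlier in the boot drops into an interactive shell first.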
> I see that the emergency shell is invoked by dracut directly. I think in
> those cases it will return immediately and the dracut script will continue
> even after a failure.
A disk failure causes the mount unit to fail. A failed mount unit means
local-fs.target is never reached, and if local-fs.target is never
reached, dracut-pre-pivot.service never gets started.
However, if "nofail" is specified for the mount unit, local-fs.target
only "Wants" the mount unit, rather than "Requires" it as in "fail"
mode.
So local-fs.target is still reached when a "nofail" mount fails, but
is never reached when a "fail" mount fails.
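The failure chain above can be checked with a small Python model. It treats "Requires" (with ordering) as a blocking dependency and "Wants" as non-blocking; the unit name `data.mount` and the helpers `is_reached` and `build_deps` are hypothetical, and real systemd job handling covers more cases than this.

```python
def is_reached(unit, blocking_deps, failed):
    """Simplified model: a unit is reached only if it did not fail and
    every unit it blocks on was reached too (no cycles assumed)."""
    if unit in failed:
        return False
    return all(is_reached(dep, blocking_deps, failed)
               for dep in blocking_deps.get(unit, []))


def build_deps(nofail):
    """Blocking dependencies for the chain described above.

    "data.mount" stands for a hypothetical non-root dump target. With
    nofail, local-fs.target only Wants it, so it is not blocking.
    """
    deps = {"dracut-pre-pivot.service": ["local-fs.target"]}
    deps["local-fs.target"] = [] if nofail else ["data.mount"]
    return deps


if __name__ == "__main__":
    failed = {"data.mount"}  # the disk failure made the mount unit fail
    print(is_reached("dracut-pre-pivot.service", build_deps(False), failed))
    print(is_reached("dracut-pre-pivot.service", build_deps(True), failed))
```

With `nofail=False` the failed mount keeps dracut-pre-pivot.service from ever starting (the hang); with `nofail=True` the service starts even though the dump target was never mounted.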
> So the question I have is: can we drop in another file, say
> module-emergency-handling, which the emergency shell will call? Then kdump
> can install the module-emergency-handling file (or create it as a link to
> kdump-error-handling), and we can handle the error.
> IOW, once dracut has encountered the failure, is there any point in
> continuing further and then expecting to drop into the kdump module from
> the pre-pivot hook?
> If we error out early, root might not be available, and I think that's
> fine. There are so many places where things can go wrong that we can't
> guarantee root is available as a backup target.
> The second place of failure is systemd. I see there are two emergency
> services, rescue.service and dracut-emergency.service. They both call
> /bin/emergency-shell. So if we fix /bin/emergency-shell to call
> /bin/module-emergency-shell, wouldn't that automatically make sure that
> the system will not hang and kdump will get control after a failure?
This proposal makes more sense to me. We can implement our own error
handler and override the default one provided by dracut or systemd. This
would give us more flexibility and scalability: we can do whatever we
like depending on the failure type, the user-specified failsafe action,
and other factors.
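One way to picture the proposal (an emergency entry point that hands control to a module-supplied handler) is the Python sketch below. `/bin/module-emergency-shell` is the name suggested in the thread; `choose_handler`, `emergency`, the `reason` argument, and the `/bin/sh` fallback are hypothetical stand-ins, not existing dracut or systemd interfaces.

```python
import os
import subprocess

# Hook path suggested in the thread; a module such as kdump would install it.
MODULE_HOOK = "/bin/module-emergency-shell"


def choose_handler(hook_path, default_argv, reason=None):
    """Return the argv to run when an emergency is triggered.

    If a module installed an executable hook, it gets control; otherwise
    the stock emergency action (default_argv) runs unchanged. Passing the
    failure reason as the first argument is an assumption.
    """
    if os.path.isfile(hook_path) and os.access(hook_path, os.X_OK):
        argv = [hook_path]
        if reason is not None:
            argv.append(reason)
        return argv
    return list(default_argv)


def emergency(reason=None, hook_path=MODULE_HOOK, default_argv=("/bin/sh",)):
    """Run the chosen handler and return its exit status."""
    return subprocess.call(choose_handler(hook_path, default_argv, reason))
```

Keeping the selection in `choose_handler` separate from running it means the same hook check could serve both the dracut path and the systemd emergency services.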
The only open question is how to implement it. An alternative approach is
to create a new emergency service and put it under /etc/systemd/ or
/run/systemd/, so that it takes precedence over the default one under
/usr/lib/systemd/.
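The precedence rule relied on here can be sketched in Python as a first-match search over unit directories. Only the three locations named in this discussion are modeled; systemd's real search path has more entries, plus drop-ins and masking, and `resolve_unit` is a hypothetical helper.

```python
import os

# Simplified model of the system unit search path: the first directory
# containing the unit wins, so /etc overrides /run, which overrides the
# distribution copy under /usr/lib.
UNIT_DIRS = (
    "/etc/systemd/system",
    "/run/systemd/system",
    "/usr/lib/systemd/system",
)


def resolve_unit(name, unit_dirs=UNIT_DIRS):
    """Return the path of the unit file that would be loaded, or None."""
    for directory in unit_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(resolve_unit("dracut-emergency.service"))
```

Placing a kdump-specific copy of the emergency unit under /run/systemd/system would therefore shadow the vendor copy without modifying it.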
In addition, as you said, Lennart is likely going to facilitate such a
replacement. We can wait until that happens and then decide.
> Though I am not sure who calls dracut-emergency.service. What's the
> dependency tree here?
> >
> > >
> > > I talked to Lennart and he was open to the idea of rescue being
> > > replaced by something else. I will send him a mail asking him to
> > > implement that. After that, I am hoping that we can replace systemd
> > > rescue with something kdump-specific so that we get control in case
> > > of failure and can then run our policies.
> >
> > Until we have such a facility in systemd, do you think we should remove
> > "nofail"? Or should we just leave it as it is, because removing nofail
> > will make a failure hang?
> I think we have no choice but to remove "nofail"; otherwise we will
> see kdump failures, as the target might not be mounted. I am not sure
> whether this problem is limited to non-root targets or not.
Yes, it's limited to non-root mounts.
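A minimal sketch of the short-term fix discussed in this thread: drop "nofail" only for non-root dump targets. The helper name `kdump_mount_options`, the "defaults" fallback, and the assumption that root is mounted at /sysroot (by sysroot.mount) are illustrative, not taken from the kdump scripts.

```python
def kdump_mount_options(mount_point, options, root_mount_point="/sysroot"):
    """Return the options to pass to mount for a kdump dump target.

    Non-root targets lose "nofail" so that local-fs.target Requires the
    mount and kdump.sh waits for it; the root file system keeps its flags.
    """
    opts = [opt for opt in options.split(",") if opt]
    if mount_point != root_mount_point:
        opts = [opt for opt in opts if opt != "nofail"]
    # Fall back to "defaults" if no option is left.
    return ",".join(opts) or "defaults"


if __name__ == "__main__":
    print(kdump_mount_options("/mnt", "rw,nofail"))
```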
Thanks
WANG Chao
> And then fix the error handling path.
> Thanks
> Vivek