On Fri, Apr 20, 2018 at 1:23 PM, Chris Adams <linux(a)cmadams.net> wrote:
> Once upon a time, Chris Murphy <lists(a)colorremedies.com> said:
> > On Fri, Apr 20, 2018 at 6:15 AM, Florian Weimer <fweimer(a)redhat.com> wrote:
> > > I'm trying to pin down what exposed this bug:
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1569970
> > >
> > > The immediate trigger seems to be that all shutdowns on my system leave the
> > > XFS root file system in an unclean state, so that GRUB cannot read recently
> > > written files under /boot (assuming that /boot is on the same file system).
>
> Just to note: I believe I ran into this with a separate /boot. I was
> hitting what turned out to be a Mesa bug that caused my Intel GPU to
> lock up; I tried updating to a newer kernel to see if that would help,
> but happened to hit the lockup right after "dnf update" finished, so I
> hit the power button.
>
> Poof, grub2 found no config on boot. Oddly, I could still view the
> grub.cfg with the grub CLI, so (eventually) I was able to manually boot,
> write out a new grub.cfg, and recover.
>
> So while systemd could (and should) do better, IMHO the problem is up
> the line with some combination of GRUB not reading the journal and
> grubby not forcing a sync.
>
> Another solution might be to have a dnf plugin that forces a journal
> flush (not a bad idea in general after loading updates), but that would
> be kind of a band-aid over this problem.

It's a good anecdote showing that separate /boot isn't a sure solution
for this problem.

The central problem is bad design, based on wrong assumptions, in
grubby and GRUB (grub-mkconfig). They both assume something else will
commit the changes to /boot that add the kernel, initramfs, and
bootloader configuration. But sync() is only sufficient on
non-journaled file systems. On journaled file systems, sync() only
ensures the metadata is in the log; it's considered acceptable to
depend on log replay to make the file system consistent, but of course
the bootloader can't do that. That's why whatever modifies /boot last
is most responsible for making sure the changes are fully committed,
and the only way to do that on a journaled file system is freeze/thaw.

Hence
https://github.com/rhboot/grubby/issues/25
https://savannah.gnu.org/bugs/index.php?52657
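
For illustration only, here is a minimal sketch of what a freeze/thaw after
the last write to /boot could look like. It is not what grubby or
grub-mkconfig do today. It assumes Linux, root privileges, and a file system
that supports the FIFREEZE/FITHAW ioctls; the commit_boot name and its
ioctl/sync parameters are hypothetical, added so the call order can be
checked without actually freezing anything.

```python
import fcntl
import os

# ioctl request numbers from <linux/fs.h>:
#   FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def commit_boot(mountpoint="/boot", ioctl=fcntl.ioctl, sync=os.sync):
    """Flush dirty data, then freeze and immediately thaw the file system.

    Hypothetical helper. The ioctl and sync callables are injectable
    purely so the sequence can be exercised without root.
    """
    # Push dirty data and the log out first.
    sync()
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Freezing is what gets the logged changes written back to their
        # final on-disk locations, so a reader with no log replay
        # (the bootloader) sees them.
        ioctl(fd, FIFREEZE, 0)
        # Thaw right away so writers are blocked only briefly.
        ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

If /boot is a directory on the root file system, this briefly freezes the
whole root file system, so treat it as a sketch of the idea rather than
something to drop into a package scriptlet as-is.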
--
Chris Murphy