Summary
----------
Almost all Fedora variants (except Cloud) have a GRUB menu entry containing the word "rescue". This kernel+initramfs pair is never updated for the life of a Fedora installation, so it quickly becomes stale as the installation ages. The rescue kernel's modules are eventually deleted, and if the entry is selected at boot time, the typical user experience is a dracut shell.
Basic background
-------------
(Skip this section if you know how it works.) During a new installation, a single kernel version is installed, e.g.
    vmlinuz-5.17.0-0.rc4.96.fc36.x86_64
which is then duplicated as, e.g.
    vmlinuz-0-rescue-3a86878de5d649a983916543ece7bb7e
Each of those (identical) kernels has an initramfs file:
    initramfs-5.17.0-0.rc4.96.fc36.x86_64.img
    initramfs-0-rescue-3a86878de5d649a983916543ece7bb7e.img
The sole difference is that the first is a smaller host-only initramfs, while the second is a larger non-host-only initramfs created with `dracut -N` (`--no-hostonly`). The bigger one just contains a bunch of extra kernel modules and dracut scripts, ostensibly to make it more likely to boot a system whose hardware has changed in ways the host-only initramfs doesn't cover. The rescue initramfs is around 100 MiB, while the everyday host-only initramfs is around 33 MiB. [1]
As the system is updated, additional kernel versions are installed. dnf.conf contains installonly_limit=3, so at most three kernel versions are installed at a time. Once a fourth kernel is installed, the oldest kernel is removed along with its modules in /usr/lib/modules. The rescue kernel+initramfs pair is never updated or upgraded, even during system upgrades.
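To see how stale a rescue entry is, you can compare the rescue kernel's version with what is left in /usr/lib/modules. The file name (vmlinuz-0-rescue-<machine-id>) doesn't carry the version, so this sketch reads it from the image itself. It assumes an x86 bzImage, whose boot header has the "HdrS" magic at 0x202 and a pointer to the version string at 0x20E (`file` prints the same string for such images), and a little-endian host; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which kernel version each rescue entry boots and whether
# /usr/lib/modules still has a matching directory.
# Assumptions: x86 bzImage images ("HdrS" magic at 0x202, version pointer at
# 0x20E) and a little-endian host, since od decodes the pointer in host order.

# Print the version string embedded in an x86 bzImage (first word only).
bzimage_version() {
    local img=$1 ptr
    # Not a bzImage with a setup header: give up.
    [ "$(dd if="$img" bs=1 skip=514 count=4 2>/dev/null | tr -d '\0')" = "HdrS" ] || return 1
    ptr=$(od -An -tu2 -j 526 -N 2 "$img" | tr -d ' ')
    # A zero pointer means no version string was recorded.
    { [ -n "$ptr" ] && [ "$ptr" -ne 0 ]; } || return 1
    # The pointer is relative to 0x200 (512); the string is NUL-terminated.
    dd if="$img" bs=1 skip=$((ptr + 512)) count=256 2>/dev/null \
        | tr '\0' '\n' | head -n 1 | cut -d ' ' -f 1
}

# check_rescue [BOOT_DIR] [MOD_DIR]
check_rescue() {
    local boot_dir=${1:-/boot} mod_dir=${2:-/usr/lib/modules} img ver
    for img in "$boot_dir"/vmlinuz-0-rescue-*; do
        if [ ! -e "$img" ]; then
            echo "no rescue kernel in $boot_dir"
            return 1
        fi
        if ! ver=$(bzimage_version "$img"); then
            echo "$img: version unknown (not an x86 bzImage?)"
            continue
        fi
        # Missing modules means the rescue entry falls back to whatever the initramfs holds.
        if [ -d "$mod_dir/$ver" ]; then
            echo "$img: $ver (modules present)"
        else
            echo "$img: $ver (modules missing)"
        fi
    done
}
```

Running `check_rescue` with no arguments inspects /boot and /usr/lib/modules; on a long-lived installation, "modules missing" is the state in which the degraded behavior described below shows up.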
Observations
------------
This has been discussed by the Workstation Working Group [2], but since this functionality is present in all of Fedora, we're moving the discussion here for greater visibility.
There are two separate complaints, if you will: (a) the kernel+initramfs pair is never updated or upgraded for the life of the installation; and (b) even within one release cycle, the experience of booting the rescue entry changes. Early on, while the /usr/lib/modules directory matching the rescue kernel is still present, you get full runtime behavior and reach a graphical environment. Once those version-matched modules are removed, booting the rescue entry behaves completely differently.
An important note in that ticket from Justin Forbes, the Fedora kernel maintainer: "Remember, the only real purpose of the rescue kernel is to get your system out of something completely unusable. It isn't meant to be a full runtime."
Questions
------------
* Considering the very narrow purpose of the entry, maybe the current behavior is adequate?
* Does the rescue entry reliably get users to a dracut prompt, rather than an indefinite hang? I don't know whether it does.
* Is there any way to improve the situation without increasing the risk that the rescue entry becomes totally non-functional?
  * The chosen kernel version needs to be based on one that is known to boot. Currently we know the kernel+initramfs pair works because it's the same version used to boot the installation media during the initial provisioning. We don't actually know whether an updated replacement "no host-only" initramfs will work until it's tried. Is it possible to automate this? And is it worth the risk, or even worth figuring out how to assess the risk?
  * At Flock 2021, Zbyszek proposed "Building Initrd Images from RPMs" to reduce the complexity of building initramfs images; maybe there's a role for it here? More: https://www.youtube.com/watch?v=GATg_bqmASc
* What happens if we accept some scope creep and go for many improvements that make the extra work worth it?
  * What about the unsigned nature of the initramfs? Should we be creating initramfs images in Fedora infrastructure and signing them?
  * Stuff a graphical rescue environment into the initramfs? (This might be ten leaps too far, but it's intended to encourage thinking with a vivid imagination.)
[1] both values from a recent Fedora 36 Workstation installation
[2] https://pagure.io/fedora-workstation/issue/259
On Tue, 1 Mar 2022 14:37:38 -0700 Chris Murphy lists@colorremedies.com wrote:
There is a way to do this manually.
To create a new rescue kernel and initramfs, go into /boot and delete the current rescue kernel and initramfs. Then, as root, run the command below, substituting the version of the kernel you want to use as the rescue kernel. As written, the command uses the currently running kernel.
    /usr/lib/kernel/install.d/51-dracut-rescue.install add $(uname -r) "" /lib/modules/$(uname -r)/vmlinuz
Maybe it can be added to a script.
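The manual procedure above can be wrapped in a small shell function. This is a sketch that assumes the rescue files are named as shown in the background section (vmlinuz-0-rescue-* and initramfs-0-rescue-*.img) and uses the plugin path quoted in the message; `regen_rescue` is a hypothetical name, and the optional third argument lets you pass `echo` to preview the plugin call instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of the manual rescue regeneration from the message above.
# Assumptions: rescue files are named vmlinuz-0-rescue-* and
# initramfs-0-rescue-*.img in BOOT_DIR, and the plugin path is the one quoted
# in the message. Run as root. regen_rescue is a hypothetical helper name.

RESCUE_PLUGIN=/usr/lib/kernel/install.d/51-dracut-rescue.install

# regen_rescue BOOT_DIR KVER [PLUGIN]
#   PLUGIN defaults to RESCUE_PLUGIN; pass "echo" to print the call instead.
regen_rescue() {
    local boot_dir=$1 kver=$2 plugin=${3:-$RESCUE_PLUGIN} f
    if [ -z "$boot_dir" ] || [ -z "$kver" ]; then
        echo "usage: regen_rescue BOOT_DIR KVER [PLUGIN]" >&2
        return 2
    fi
    # Step 1: delete the current rescue kernel and initramfs.
    for f in "$boot_dir"/vmlinuz-0-rescue-* "$boot_dir"/initramfs-0-rescue-*.img; do
        if [ -e "$f" ]; then
            rm -f -- "$f"
        fi
    done
    # Step 2: same arguments as the one-liner: add, version, empty string, kernel image.
    "$plugin" add "$kver" "" "/lib/modules/$kver/vmlinuz"
}
```

For example, `regen_rescue /boot "$(uname -r)"` run as root repeats the one-liner for the running kernel, while `regen_rescue /boot "$(uname -r)" echo` only deletes the old pair and prints the plugin invocation, so use it with care.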
On Tue, Mar 1, 2022 at 3:38 PM Chris Murphy lists@colorremedies.com wrote:
- Considering the very narrow purpose of the entry, maybe the current behavior is adequate?
- Does the rescue entry reliably get users to a dracut prompt, rather than an indefinite hang? I don't know whether it does.
I am surprised that the rescue kernel would give an indefinite hang or even just a dracut prompt within a release. I understand that people who constantly upgrade may have a very old rescue kernel, which doesn't natively support the things that current installs do and could have issues, but you should be able to reliably boot to a terminal with network support from the rescue entry. That starts to fall apart if you did things like install a system on a very old release and, after a few years of upgrades, still have the same rescue kernel, having in the meantime added new hardware or converted a filesystem to btrfs, which wasn't built in before it became the default for some editions. In my opinion, the purpose of the rescue kernel is to get you console access with networking so that you can fix whatever issue made you boot the rescue entry in the first place, no more, no less.
- Is there any way to improve the situation without increasing the risk that the rescue entry becomes totally non-functional?
  - The chosen kernel version needs to be based on one that is known to boot. Currently we know the kernel+initramfs pair works because it's the same version used to boot the installation media during the initial provisioning. We don't actually know whether an updated replacement "no host-only" initramfs will work until it's tried. Is it possible to automate this? And is it worth the risk, or even worth figuring out how to assess the risk?
This gets a bit tricky. Creating the rescue at install time is safe because either it works, or the system you just installed doesn't work either. When you do have a working rescue, though, replacing it with a new kernel, even one that you have successfully booted, is not guaranteed to be safer. What if networking doesn't work in certain circumstances, or any number of other issues cause problems while the system still "boots"? I tend to hand-create a new rescue when I add new hardware that might require it, but that is about all. One system here has a rescue kernel from 2016. This system has one from about a year ago because I replaced hardware and generated a new one. The original OS image was installed at F27 or so, I think.
Justin
-- Chris Murphy _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On Tue, Mar 1, 2022 at 3:24 PM Justin Forbes jmforbes@linuxtx.org wrote:
I am surprised that the rescue kernel would give an indefinite hang or even just a dracut prompt within a release.
The latter case is trivially reproducible on UEFI: mounting /boot/efi happens *after* switchroot. After switchroot, the vfat module in the initramfs is no longer available, and because the root filesystem lacks a matching /usr/lib/modules directory, vfat can't be loaded from there either. So the mount fails and you end up in a dracut shell.
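The failure above comes down to one question: after switchroot, can vfat still be loaded for the rescue kernel's version? A hypothetical helper like this one answers it by looking in /usr/lib/modules/<version> for either a modules.builtin entry or a vfat module file (possibly compressed). It takes the rescue kernel version as an argument and, when that version's directory is gone, as in the stale-rescue case, it reports failure.

```shell
#!/usr/bin/env bash
# vfat_loadable MOD_DIR KVER
# Sketch: succeed if vfat for kernel KVER is built in or available as a module
# under MOD_DIR/KVER (normally /usr/lib/modules). Hypothetical helper name.
vfat_loadable() {
    local d="$1/$2"
    # No modules directory for this version: nothing can be loaded after switchroot.
    [ -d "$d" ] || return 1
    # modules.builtin lists drivers compiled into the kernel image itself.
    if [ -f "$d/modules.builtin" ] && grep -q '/vfat\.ko$' "$d/modules.builtin"; then
        return 0
    fi
    # Modules may be compressed (e.g. vfat.ko.xz), so match any suffix.
    [ -n "$(find "$d" -name 'vfat.ko*' -print -quit 2>/dev/null)" ]
}
```

For example, `vfat_loadable /usr/lib/modules "$RESCUE_VERSION" || echo "/boot/efi will not mount"`, where RESCUE_VERSION is a placeholder for the rescue kernel's version.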
A possible simple workaround is having the installer add the "nofail" mount option to /boot/efi. That raises the potential problem that if it fails to mount for $REASON, the system silently stops getting bootloader updates. I guess that's better than always getting a dracut prompt?
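Here is a sketch of the nofail workaround applied to an existing /etc/fstab rather than by the installer. It prints a modified copy with `nofail` appended to the /boot/efi options and leaves every other line alone; `add_nofail` is a hypothetical name. awk rewrites the edited line with single spaces between fields, and the trade-off raised above (silently missing bootloader updates if the ESP doesn't mount) still applies.

```shell
#!/usr/bin/env bash
# add_nofail FSTAB [MOUNTPOINT]
# Sketch: print FSTAB with "nofail" added to the options (field 4) of the
# MOUNTPOINT entry, /boot/efi by default. Hypothetical helper; it only prints.
add_nofail() {
    local fstab=$1 mnt=${2:-/boot/efi}
    awk -v mnt="$mnt" '
        # Skip comments and short lines; only touch the matching mount point,
        # and only if nofail is not already among its options.
        $0 !~ /^[ \t]*#/ && NF >= 4 && $2 == mnt && $4 !~ /(^|,)nofail(,|$)/ {
            $4 = $4 ",nofail"
        }
        { print }
    ' "$fstab"
}
```

Review the result first, for example with `add_nofail /etc/fstab | diff /etc/fstab -`, before replacing the real file.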
It would also be more reliable if the rescue boot entry booted to multi-user.target (e.g. with systemd.unit=multi-user.target on the kernel command line), to ensure (a) consistent behavior regardless of whether /usr/lib/modules is present, and (b) that we don't get hung up loading a graphical environment that may need things in /usr/lib/modules that aren't available.
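The kernel command-line option that selects the boot target is `systemd.unit=`. One way to apply it to an existing rescue entry, sketched under the assumption that grubby manages the boot entries (as on Fedora), is `grubby --update-kernel=... --args=...`; the wrapper name is hypothetical, and passing `echo` as the second argument prints the grubby invocation instead of running it.

```shell
#!/usr/bin/env bash
# rescue_multiuser [BOOT_DIR] [GRUBBY]
# Sketch: add systemd.unit=multi-user.target to every rescue kernel's entry
# so it boots to a console target instead of the default (graphical) one.
# Assumes grubby manages the boot entries; pass "echo" as GRUBBY to preview.
rescue_multiuser() {
    local boot_dir=${1:-/boot} grubby=${2:-grubby} k found=0
    for k in "$boot_dir"/vmlinuz-0-rescue-*; do
        [ -e "$k" ] || continue
        found=1
        "$grubby" --update-kernel="$k" --args="systemd.unit=multi-user.target" || return 1
    done
    # Fail if there was no rescue kernel to update.
    [ "$found" -eq 1 ]
}
```

Run as root, `rescue_multiuser` edits only the rescue entry; the regular kernels keep their default target.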
If I'm not mistaken, this issue hasn't been resolved...
Since the rescue kernel depends to some extent on the kernel modules in the root volume, would the right solution be:
- in preuninstall, determine whether the rescue kernel matches the version being removed, and if so, remove it; and then
- determine whether the version being removed matches the running kernel, and if not, build a rescue kernel for the running kernel version?
Protecting the running kernel is optional, so it's possible that the running kernel will be the one removed. In that case I *think* the system will end up building a rescue kernel for the version whose installation triggered the removal of a kernel package. That rescue kernel might not work, but neither would the version whose modules were just removed, so the system probably isn't worse off. (And systems with the normal behavior of protecting the running kernel should be safe from this.)
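The proposed preuninstall logic, including the running-kernel case, can be written as a pure decision function, which makes the branches easy to check. This is an illustration only: the function and action names are invented, it doesn't hook into RPM scriptlets, it takes the rescue kernel's version as an input rather than detecting it, and the "defer" branch encodes the expectation stated above ("I *think*...") rather than verified behavior.

```shell
#!/usr/bin/env bash
# rescue_plan REMOVED RESCUE RUNNING NEW
# Sketch: print the actions the proposal would take when kernel REMOVED is
# uninstalled, given the rescue kernel version RESCUE, the running kernel
# RUNNING, and NEW, the kernel whose installation triggered the removal.
rescue_plan() {
    local removed=$1 rescue=$2 running=$3 new=$4
    # The rescue kernel doesn't depend on the modules being removed: nothing to do.
    if [ "$removed" != "$rescue" ]; then
        echo "keep-rescue"
        return 0
    fi
    echo "remove-rescue"
    if [ "$removed" != "$running" ]; then
        echo "build-rescue $running"
    else
        # Running kernel is being removed (only if it isn't protected). Per the
        # message, the rescue pair is expected to be rebuilt for NEW when NEW's
        # kernel install finds none; this is an assumption, not verified.
        echo "defer-rescue $new"
    fi
}
```

For example, `rescue_plan 5.15.4 5.15.4 5.17.2 5.17.3` prints `remove-rescue` followed by `build-rescue 5.17.2`.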
Then the remaining problem is that an awful lot of Fedora systems have already removed the kernel-modules corresponding to (and supporting) their rescue kernel, and this approach would leave them in their current broken state.
On Tue, Mar 1, 2022 at 2:56 PM Chris Murphy lists@colorremedies.com wrote:
On Tue, 2022-03-01 at 14:37 -0700, Chris Murphy wrote:
Hello, I found this email on Google; thank you for this "basic background" lesson.
There is also a bug report about this [1]. In summary, we can regenerate the rescue entry just by deleting the /boot/*-0-rescue-* files and reinstalling the kernel.
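The delete-and-reinstall approach from the bug report might look like the sketch below. It assumes the kernel to reinstall is the running one, packaged as kernel-core, whose install scriptlets run kernel-install; the function name is hypothetical, and passing `echo` as the second argument previews the dnf command (the rescue files are still deleted).

```shell
#!/usr/bin/env bash
# reinstall_rescue [BOOT_DIR] [DNF]
# Sketch of the workaround from the bug report: delete the rescue files, then
# reinstall the running kernel's kernel-core package so its scriptlets run
# kernel-install again. Run as root; pass "echo" as DNF to preview.
reinstall_rescue() {
    local boot_dir=${1:-/boot} dnf=${2:-dnf} f
    for f in "$boot_dir"/*-0-rescue-*; do
        if [ -e "$f" ]; then
            rm -f -- "$f"
        fi
    done
    # kernel-core-$(uname -r) names the package of the running kernel (name-version-release.arch).
    "$dnf" reinstall -y "kernel-core-$(uname -r)"
}
```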
But isn't the inst.rescue boot option [2] missing?
I also recently put together a fix for this bug [3].
Best regards,
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1768132 https://bugzilla.redhat.com/show_bug.cgi?id=1768132#c35
[2] https://github.com/rhinstaller/anaconda/blob/master/docs/boot-options.rst#in...
[3] https://src.fedoraproject.org/fork/sergiomb/rpms/dracut/blob/rawhide/f/0001-...
On 1/3/23 18:02, Sérgio Basto wrote:
But isn't the inst.rescue boot option [2] missing?
That's a different thing. The rescue kernel exists only to have all kernel modules available. The "inst.rescue" mode is available on netinst images (and maybe others?) and boots an actual rescue mode that lets you mount the installed system for maintenance or repair.
On Tue, 2023-01-03 at 21:21 -0800, Samuel Sieb wrote:
So I thought it shouldn't be called rescue; it would be nice to have a boot entry that really does a rescue, like the one available on netinstall images.
Maybe the entry should be called safemode, like Ubuntu's.
Best regards,