Hey all,
I have a fair number of Raspberry Pi V3 B/B+ and Raspberry Pi V2 B systems running Fedora 31 (and a V4 I would like to get on Fedora - still waiting). I have Fedora 31 aarch64 installed on most of the V3 systems and armv7l/armhfp on the V2 systems.
Running dnf upgrade today pulled in the 5.4.19 kernel, and on the aarch64 images the update seemed to hang for over an hour while running the kernel-core scriptlet. Looking from another terminal with top (I was running dnf upgrade remotely over ssh), it looks like grubby is hung, burning CPU time and eating memory (and I have lots of cache). A couple of machines eventually crashed. I killed grubby on a couple of others, and dnf eventually ran to completion on those latter two.
In all cases (5 aarch64 systems) I was left with totally unbootable systems. The ones with screens either go straight to the U-Boot prompt or reboot over and over again, looking for storage, then looking at eth0, then rebooting, and they never show the kernel selection prompt. The few armv7l/armhfp systems don't seem to have gotten the 5.4.19 upgrade yet. I've still got two aarch64 systems booted and running (haven't rebooted them) and I'm going to try to roll that update back.
Any thoughts on where to go from here? Not sure what to report this under in Bugzilla.
Regards, Mike
On Tue, 2020-02-18 at 15:55 -0500, Michael H. Warfield wrote:
Looks like it's grubby that's doing the damage. It leaves me with something like this (vastly shortened, sorry for the length) in /boot/efi/EFI/fedora/grub.cfg:
On Tue, 2020-02-18 at 16:53 -0500, Michael H. Warfield wrote:
Trimming the following to save us all some pain...
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-
--
That was just a SMALL sample of the line. And it doesn't get fixed by uninstalling.
I'm confirming this. Using another system, I manually repaired that "set default_kernelopts=" line back to what was in the original Fedora image. Back in the original machine (my rpi-test system), the card immediately booted AND it came up on the CORRECT new kernel, 5.4.19.
So the problem is in grubby. This had to have happened in just the last week. The update to 5.4.18 did not result in this carnage. So, whatever happened with grubby happened recently.
Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in the aarch64 image but not in the armv7l image. :-?
I just gotta pry a couple of cards out of a couple of cases now. :-P
Mike
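For anyone else stuck with a card that won't boot, the manual repair described above could be scripted roughly as follows. This is only a sketch: `fix_kernelopts` is a made-up helper name (not part of grubby or grub2), and the known-good value has to come from your own original image. It rewrites only the `set default_kernelopts=` line and leaves a `.bak` copy of the damaged file next to it.

```shell
#!/bin/sh
# Hypothetical helper, not part of grubby or grub2.
# Rewrites the "set default_kernelopts=" line of a grub.cfg with a
# known-good value and keeps the damaged file as <cfg>.bak.
#
# Usage (run against the mounted card from a working system), e.g.:
#   fix_kernelopts /mnt/EFI/fedora/grub.cfg \
#       "root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
fix_kernelopts() {
    local cfg="$1" good="$2"

    # Never edit without a copy of what was there before.
    cp -p -- "$cfg" "$cfg.bak" || return 1

    # The value is passed through the environment so that awk does not
    # interpret any characters in it.
    GOOD="$good" awk '
        /^[ \t]*set default_kernelopts=/ {
            indent = $0
            sub(/set default_kernelopts=.*/, "", indent)   # keep leading whitespace
            print indent "set default_kernelopts=\"" ENVIRON["GOOD"] "\""
            next
        }
        { print }
    ' "$cfg.bak" > "$cfg"
}
```

Every other line of the file passes through unchanged, so it is easy to diff the result against the `.bak` copy before putting the card back.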
Somehow this does not look encouraging.
--
[root@rpi-devel mhw]# ls -l /boot/efi/EFI/fedora/grub.cfg
-rwx------. 1 root root 841768 Feb 18 10:55 /boot/efi/EFI/fedora/grub.cfg
--
An 840K grub.cfg???
Ok... Maybe I can fix these manually. Maybe not.
Regards, Mike
On Tue, 18 Feb 2020, Michael H. Warfield wrote:
It would probably be easier to regenerate the grub.cfg file, e.g.
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
though you might want to write that to a temporary file first to check it.
Also, grubby is optional in Fedora 31, as it isn't needed if BLS is enabled, i.e. if you have GRUB_ENABLE_BLSCFG=true in /etc/default/grub and a grub.cfg file built with that configuration, which should include the lines
insmod blscfg
blscfg
and also configuration files in /boot/loader/entries/ for each kernel.
I have updated my Pi3B to 5.4.19-200.fc31.aarch64 without problems, though I don't think grubby was ever installed on it.
Michael Young
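A rough sketch of both suggestions above, for anyone who wants to script them. The function names and the 64K size limit are assumptions made for illustration; only `grub2-mkconfig -o`, `GRUB_ENABLE_BLSCFG=true`, the `blscfg` lines and `/boot/loader/entries/` come from this thread.

```shell
#!/bin/sh
# Sketch only: function names and the size limit are illustrative
# assumptions, not part of grub2 or grubby.

# Succeeds if the setup looks BLS-enabled: GRUB_ENABLE_BLSCFG=true in the
# defaults file, "insmod blscfg" and "blscfg" lines in grub.cfg, and at
# least one entry file in the entries directory.
looks_like_bls() {
    local defaults="${1:-/etc/default/grub}"
    local cfg="${2:-/boot/efi/EFI/fedora/grub.cfg}"
    local entries="${3:-/boot/loader/entries}"

    grep -Eq '^GRUB_ENABLE_BLSCFG="?true"?[[:space:]]*$' "$defaults" || return 1
    grep -Eq '^[[:space:]]*insmod[[:space:]]+blscfg' "$cfg" || return 1
    grep -Eq '^[[:space:]]*blscfg[[:space:]]*$' "$cfg" || return 1
    ls "$entries"/*.conf >/dev/null 2>&1 || return 1
}

# Regenerate into a temporary file first, and only install the result if
# it is non-empty and no larger than max_bytes (default 65536, a guess).
regen_grub_cfg() {
    local cfg="${1:-/boot/efi/EFI/fedora/grub.cfg}" max_bytes="${2:-65536}"
    local tmp size

    tmp=$(mktemp) || return 1
    grub2-mkconfig -o "$tmp" || { rm -f "$tmp"; return 1; }
    size=$(( $(wc -c < "$tmp") ))

    if [ "$size" -gt 0 ] && [ "$size" -le "$max_bytes" ]; then
        # cat into the existing file so its mode and SELinux label stay put.
        cp -p "$cfg" "$cfg.bak" && cat "$tmp" > "$cfg" && rm -f "$tmp"
    else
        echo "refusing to install $tmp ($size bytes), inspect it by hand" >&2
        return 1
    fi
}
```

`looks_like_bls && echo "BLS in use"` is enough for a quick check; `regen_grub_cfg` needs root and leaves the temporary file behind when it refuses to install it.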
On Tue, 2020-02-18 at 23:14 +0000, YOUNG, MICHAEL A. wrote:
It would probably be easier to regenerate the grub.cfg file eg. grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg though you might want to write that to a temporary file first to check it.
Yeah, I've had to do that on a few x86_64 platforms.
Also grubby is optional in Fedora 31, as it isn't needed if BLS is enabled, ie. if you have GRUB_ENABLE_BLSCFG=true in /etc/default/grub and a grub.cfg file built with that configuration, which should include the lines insmod blscfg blscfg and also configuration files in /boot/loader/entries/ for each kernel.
Thanks! Good to know.
I have updated my Pi3B to 5.4.19-200.fc31.aarch64 without problems, though I don't think grubby was ever installed on it.
I took the stock Fedora Xfce image to build the systems and update them. Sadly, the Fedora Server image installs an LVM image, which sucks along several vectors, and the Workstation image installs Gnome3, which also sucks. Xfce seems to be my sweet spot. So it may be something they have configured there. But armv7l doesn't seem to have this problem and doesn't seem to be using grub2.
Michael Young
Regards, Mike
Bugzilla bug report has been submitted.
On Tue, 2020-02-18 at 17:19 -0500, Michael H. Warfield wrote:
On Tue, 18 Feb 2020, Michael H. Warfield wrote:
So the problem is in grubby. This had to have happened in just the last week. The update to 5.4.18 did not result in this carnage. So, whatever happened with grubby happened recently.
Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in the aarch64 image but not in the armv7l image. :-?
It seems like this bug would affect all EFI systems, not just aarch64. I'll be on the alert before installing 5.4.19 on x86_64 as well.
On Tue, 2020-02-18 at 19:44 -0500, Stuart D. Gathman wrote:
On Tue, 18 Feb 2020, Michael H. Warfield wrote:
So the problem is in grubby. This had to have happened in just the last week. The update to 5.4.18 did not result in this carnage. So, whatever happened with grubby happened recently. Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in the aarch64 image but not in the armv7l image. :-?
It seems like this bug would affect all EFI systems, not just aarch64. I'll be on the alert before installing 5.4.19 on x86_64 as well.
I agree. Seems strange but...
I'm actually wondering if the hang and crash is a buffer overrun. I looked on a newly rebuilt system and saw the corruption, but it was only a few lines long and the system still worked. I've got a pile of development systems that are updated every time a kernel gets updated (once or twice a week), and on those that line had hit over 800K long. They're all largely mirrors of each other, just with different tasks. Each of the "corrupt" files was identical, but all of the updates have been in lock step.
All but one of the six affected systems are back up. The single one that's not did manage to boot the kernel and start, but ran into a panic. But I get a kernel menu on it now.
Just checked my x86_64 system (a Lenovo Yoga 730-15). No sign of the corruption in that file. Very strange but 6 systems impacted at nearly the same time (hours).
Mike
On Tue, 2020-02-18 at 20:22 -0500, Michael H. Warfield wrote:
That's it...
The panic on the odd system out was a missing initramfs, but that's one of the two that hurled chunks and crashed during the update. I got it back on the previous kernel and, once I had fixed grub.cfg, reinstalled the newer kernel to fix that. I then compared the working grub.cfg file to the resulting one from the reinstall.
This:
Hi Michael,
Trying to reproduce this issue without success. How did you write the images and what command was used? After booting, did you confirm the kargs were as expected?
Once booted, you just 'dnf update' and get the corrupted grub.cfg?
Thanks! Paul
----- Original Message -----
That's it...
The panic on the odd system out was a missing initramfs but that's one of the two that hurled chunks and crashed during the update and I got it back on the previous kernel and reinstalled the newer kernel to fix that once I had fixed grub.cfg. I then compared the working grub.cfg file to the resulting one from the reinstall.
This:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
Became this during the install:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
And became this after the reinstall was complete:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
It duplicated the root UUID on the default_kernelopts several times. And rebooting with that bad line still worked. It seems to be the size of that duplicated root UUID that finally borks it. Each time the kernel gets updated, it gets another string attached until it blows chunks. This may have been in there for a long time, I'm just real aggressive about doing kernel updates.
It's now up on the updated kernel. WEIRD. I should probably update my Bugzilla report now.
No idea why it's not showing up on x86_64 systems.
Just checked my x86_64 system (a Lenovo Yoga 730-15). No sign of the corruption in that file. Very strange but 6 systems impacted at nearly the same time (hours).
Mike
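Since a mildly corrupted line still boots, it may be worth checking for the duplication before it grows large enough to matter. A minimal sketch, assuming the symptom is always the root UUID repeated inside the `set default_kernelopts=` line as shown above; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical check, names are illustrative only.
# Prints how many UUID-shaped strings appear on the first
# "set default_kernelopts=" line of a grub.cfg.
kernelopts_uuid_count() {
    local cfg="${1:-/boot/efi/EFI/fedora/grub.cfg}"
    grep -m1 'set default_kernelopts=' "$cfg" \
        | grep -oiE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
        | awk 'END { print NR }'
}

# Succeeds when the line carries at most one UUID. This assumes a single
# root=UUID=... argument and no other UUID-valued kernel options.
kernelopts_is_sane() {
    [ "$(kernelopts_uuid_count "$1")" -le 1 ]
}
```

Running `kernelopts_is_sane /boot/efi/EFI/fedora/grub.cfg || echo "default_kernelopts looks duplicated"` after each kernel update would flag the problem while the file is still small.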
Michael H. Warfield (AI4NB) | (o) +1 706 850-8773 | mhw@WittsEnd.com
//|=mhw=|// | (c) +1 678 463-0932 | http://www.wittsend.com/mhw/
ARIN whois: MHW9-ARIN | An optimist believes we live in the best of all
PGP Key: 0xC0EB9675674627FF | possible worlds. A pessimist is sure of it!
arm mailing list -- arm@lists.fedoraproject.org To unsubscribe send an email to arm-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org
On Wed, 2020-02-19 at 11:25 -0500, Paul Whalen wrote:
Hi Michael,
Trying to reproduce this issue without success. How did you write the images and what command was used? After booting, did you confirm the kargs were as expected?
I used arm-install as recommended.
Once booted, you just 'dnf update' and get the corrupted grub.cfg?
Correct.
One of the systems required a reinstall of kernel-core-5.4.19-200.fc31.aarch64 in order to fix a missing initramfs (the system had crashed).
Saw this in the subsequent reinstall, and the grub.cfg that started out as 5973 bytes was now 841768 bytes with a corrupted "default_kernelopts=" in the file. Fortunately, I had backed up the grub.cfg and was able to restore it before trying to reboot the system.
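The backup-then-restore step can be wrapped around the reinstall so that it is never forgotten. A sketch under stated assumptions: `guard_grub_cfg` is a made-up name, and the "grew more than 4x" threshold is arbitrary (the file here went from 5973 to 841768 bytes, so almost any factor would have caught it).

```shell
#!/bin/sh
# Sketch only: guard_grub_cfg and the 4x growth factor are assumptions.
# Snapshots grub.cfg, runs the given command, and restores the snapshot
# if the file has grown suspiciously. Returns 2 when it had to restore.
#
# Usage, e.g.:
#   guard_grub_cfg /boot/efi/EFI/fedora/grub.cfg \
#       dnf reinstall kernel-core-5.4.19-200.fc31.aarch64
guard_grub_cfg() {
    local cfg="$1"
    shift
    local bak before after

    bak="$cfg.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$cfg" "$bak" || return 1
    before=$(( $(wc -c < "$bak") ))

    # Run the wrapped command; a failure is reported but does not stop
    # the size check, since the file may be damaged either way.
    "$@" || echo "warning: '$*' exited non-zero" >&2

    after=$(( $(wc -c < "$cfg") ))
    if [ "$after" -gt $(( before * 4 )) ]; then
        echo "grub.cfg grew from $before to $after bytes, restoring $bak" >&2
        cat "$bak" > "$cfg"
        return 2
    fi
}
```

The snapshot is kept in either case, so there is always something to compare against afterwards.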
----- Original Message -----
I used arm-install as recommended.
What command?
--
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                       1/1
  Reinstalling     : kernel-core-5.4.19-200.fc31.aarch64   1/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64   1/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64   2/2
  Cleanup          : kernel-core-5.4.19-200.fc31.aarch64   2/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64   2/2
grubby fatal error: unable to find a suitable template
  Verifying        : kernel-core-5.4.19-200.fc31.aarch64   1/2
  Verifying        : kernel-core-5.4.19-200.fc31.aarch64   2/2
Completion plugin: Generating completion cache...
Reinstalled:
  kernel-core-5.4.19-200.fc31.aarch64
Complete!
--
Now have a bug report in Bugzilla and working it. Just attached a stack of files requested to the bug report.
https://bugzilla.redhat.com/show_bug.cgi?id=1804483
Mike
On Wed, 2020-02-19 at 11:56 -0500, Michael H. Warfield wrote:
I used arm-install as recommended.
Specific command I used was...
arm-image-installer --image=Fedora-Xfce-31-1.9.aarch64.raw.xz --target=rpi3 --selinux=off --norootpass --resizefs --addconsole
With some keys options and output.
Once booted, you just 'dnf update' and get the corrupted grub.cfg?
Correct.
One of the systems required a reinstall of kernel-core-5.4.19- 200.fc31.aarch64 in order to fix a missing initramfs (the system had crashed).
Saw this in the subsequent the reinstall and the grub.cfg that started out as 5973 bytes was now 841768 bytes with corrupted "default_kernelopts=" in the file. Fortunately, I had backed up the grub.cfg and was able to restored it before trying to reboot the system.
-- Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction
Preparing : 1/1 Reinstalling : kernel-core-5.4.19- 200.fc31.aarch64 1/2 Running scriptlet: kernel-core-5.4.19- 200.fc31.aarch64 1/2 Running scriptlet: kernel-core-5.4.19- 200.fc31.aarch64 2/2 Cleanup : kernel-core-5.4.19- 200.fc31.aarch64 2/2 Running scriptlet: kernel-core-5.4.19- 200.fc31.aarch64 2/2 grubby fatal error: unable to find a suitable template
Verifying : kernel-core-5.4.19- 200.fc31.aarch64 1/2 Verifying : kernel-core-5.4.19- 200.fc31.aarch64 2/2 Completion plugin: Generating completion cache...
Reinstalled: kernel-core-5.4.19- 200.fc31.aarch64
Complete!
Now have a bug report in Bugzilla and working it. Just attached a stack of files requested to the bug report.
https://bugzilla.redhat.com/show_bug.cgi?id=1804483
Mike
Thanks! Paul
----- Original Message -----
On Tue, 2020-02-18 at 20:22 -0500, Michael H. Warfield wrote:
On Tue, 2020-02-18 at 19:44 -0500, Stuart D. Gathman wrote:
On Tue, 18 Feb 2020, Michael H. Warfield wrote:
So the problem is in grubby. This had to have happened in just the last week. The update to 5.4.18 did not result in this carnage. So, whatever happened with grubby happened recently. Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in the aarch64 image but not in the arm7l image. :-?
It seems like this bug would affect all EFI systems, not just aarch64. I'm be on the alert before installing 5.4.19 on x86_64 as well.
I agree. Seems strange but... I'm actually wondering if the hang and crash is a buffer overrun. I looked on a newly rebuilt system and saw the corruption but it was only a few lines long and the system still worked. I've got a pile of development systems are updated every time a kernel gets updated (once or twice a week) that that line had hit over 800K long. They're all largely mirrors of each other, just with different tasks. Each of the "corrupt" files were identical but all of the updates have been in lock step. All but one of the six affected systems are back up and the single one that's not managed to boot the kernel and started but ran into a panic. But I get a kernel menu on it now.
That's it...
The panic on the odd system out was a missing initramfs but that's one of the two that hurled chunks and crashed during the update and I got it back on the previous kernel and reinstalled the newer kernel to fix that once I had fixed grub.cfg. I then compared the working grub.cfg file to the resulting one from the reinstall.
This:
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec- ebb4687d6b40 ro cma=192MB" -- Became this during the install: -- set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e- b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB" --
And became this after the reinstall was complete:
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e- b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e- b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e- b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2- 494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec- ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB" --
It duplicated the root UUID on the default_kernelopts several times. And rebooting with that bad line still worked. It seems to be the size of that duplicated root UUID that finally borks it. Each time the kernel gets updated, it gets another string attached until it blows chunks. This may have been in there for a long time, I'm just real aggressive about doing kernel updates.
It's now up on the updated kernel. WEIRD. I should probably update my Bugzilla report now.
No idea why it's not showing up on x86_64 systems.
Just checked my x86_64 system (a Lenovo Yoga 730-15). No sign of the corruption in that file. Very strange, but 6 systems were impacted at nearly the same time (within hours).
Mike
Michael H. Warfield (AI4NB) | (o) +1 706 850-8773 | mhw@WittsEnd.com //|=mhw=|// | (c) +1 678 463-0932 | http://www.wittsend.com/mhw/ ARIN whois: MHW9-ARIN | An optimist believes we live in the best of all PGP Key: 0xC0EB9675674627FF | possible worlds. A pessimist is sure of it!
arm mailing list -- arm@lists.fedoraproject.org To unsubscribe send an email to arm-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org
On Wed, 2020-02-19 at 11:25 -0500, Paul Whalen wrote:
> Hi Michael,
> Trying to reproduce this issue without success. How did you write the images and what command was used? After booting, did you confirm the kargs were as expected?
The original image booted and worked. Some of the machines had been updated to 5.4.18 the week before. One newer build was from the original image and updated directly to 5.4.19. None of the 6 machines could be rebooted without fixing the /boot/efi/EFI/fedora/grub.cfg file.
Not sure what you mean by confirming the kargs or what would be expected. So I guess the answer there was no.
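The thread does not say exactly how grub.cfg was fixed by hand. One way such a repair could look is sketched below, under the assumption that the intended value is simply the first root=UUID=<uuid> pair and everything chained after it is junk. The name repair_kernelopts is hypothetical, and a backup of the file before touching it would be prudent.

```python
import re

UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

# A root=UUID=<uuid> argument followed by one or more stray
# "=UUID=<uuid>" or "=<uuid>" pieces, as seen in the corrupted lines.
CORRUPT_ROOT_RE = re.compile(
    r"(root=UUID=" + UUID + r")(?:=(?:UUID=)?" + UUID + r")+"
)


def repair_kernelopts(grub_cfg_text):
    """Collapse a chained root UUID back to a single root=UUID=<uuid>.

    Assumption: the first UUID in the chain is the correct one.
    Only 'set default_kernelopts=' lines are modified.
    """
    fixed = []
    for line in grub_cfg_text.splitlines(keepends=True):
        if line.lstrip().startswith("set default_kernelopts="):
            line = CORRUPT_ROOT_RE.sub(r"\1", line)
        fixed.append(line)
    return "".join(fixed)


good = 'set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"\n'
bad = ('set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40'
       '=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40'
       '=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"\n')

print(repair_kernelopts(bad) == good)
```

A line that is already healthy passes through unchanged, so running it twice does no harm.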
> Once booted, you just 'dnf update' and get the corrupted grub.cfg?
> Thanks! Paul
----- Original Message -----
On Tue, 2020-02-18 at 20:22 -0500, Michael H. Warfield wrote:
On Tue, 2020-02-18 at 19:44 -0500, Stuart D. Gathman wrote:
On Tue, 18 Feb 2020, Michael H. Warfield wrote:
So the problem is in grubby. This had to have happened in just the last week. The update to 5.4.18 did not result in this carnage. So, whatever happened with grubby happened recently. Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in the aarch64 image but not in the armv7l image. :-?
It seems like this bug would affect all EFI systems, not just aarch64. I'll be on the alert before installing 5.4.19 on x86_64 as well.
On Wed, 2020-02-19 at 12:04 -0500, Michael H. Warfield wrote:
On the bugzilla ticket, one error looks to have been from "grubby-deprecated", which was on the system. After removing it and repeating the kernel-core reinstall, the problem seems to have gone away. It looks like grubby-deprecated is required by extlinux-bootloader, which is then also removed. 3 of the systems have been tested and no longer show the error.
We may have it.
But the grubby-deprecated packages are in both the download images for Xfce aarch64 and armhfp for Fedora 31. Doesn't seem to affect armhfp.
Mike
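The check described in this message (is grubby-deprecated or extlinux-bootloader present on an aarch64 box?) can be written down as a small script. This is a sketch only: the function names are invented for illustration, the package list comes from this thread rather than from any packaging policy, and installed_package_names() assumes an RPM-based system with rpm on the PATH.

```python
import platform
import subprocess

# Packages that, per this thread, were linked to the grub.cfg corruption
# when present on aarch64. This list is an assumption drawn from the thread.
EXTLINUX_ONLY = ("grubby-deprecated", "extlinux-bootloader")


def unexpected_packages(arch, installed):
    """Return the extlinux-related packages found on an aarch64 system.

    arch is a machine string such as platform.machine() returns;
    installed is any collection of installed package names.
    """
    if arch != "aarch64":
        return []
    return [name for name in EXTLINUX_ONLY if name in installed]


def installed_package_names():
    """Ask rpm for the names of all installed packages."""
    result = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\\n"],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


# On a real system one might call:
#   unexpected_packages(platform.machine(), installed_package_names())
print(unexpected_packages("aarch64", {"grubby", "grubby-deprecated"}))
```

An empty list on an aarch64 machine would mean neither package is installed; on other architectures the function deliberately reports nothing.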
Thanks. I was concerned the issue was from the arm-image-installer doing something it shouldn't.
The armhfp images use extlinux and require grubby-deprecated. The aarch64 images shouldn't have it installed.
Looking at your bz, I see you mention the aarch64 image has extlinux-bootloader and grubby-deprecated, which I don't see. Did you install anything else on the image that could have pulled it in?
On Wed, 2020-02-19 at 14:12 -0500, Paul Whalen wrote:
> Thanks. I was concerned the issue was from the arm-image-installer doing something it shouldn't.
> The armhfp images use extlinux and require grubby-deprecated. The aarch64 images shouldn't have it installed.
> Looking at your bz, I see you mention the aarch64 image has extlinux-bootloader and grubby-deprecated, which I don't see. Did you install anything else on the image that could have pulled it in?
I don't think so. I'll do a fresh raw image cut and see if there is something in there without doing any updates. Since I have a mix of systems, there may have been a cross contamination: most of the 64 bit systems were built to upgrade and replace the previous 32 bit Fedora versions on earlier systems, built before aarch64 was stable enough (Fedora 30? 29?). And I built one system from scratch and updated it from there. Since these are in sort of a cooperative cluster, there might have been some crossover in which rpm packages were installed. But it hit all of the V3 systems, and I only have a couple of V2 systems left on-line (for testing) at this time that require it, and all of them have been updating successfully for months.
Just checked a fresh raw un-updated image and you're right, it's not there.
Now I gotta figure out how it got on all 6 of my V3 systems and WHEN. It's even on my reference card. So it got introduced into my build process very early on. Maybe from an old backup.
What's bad is that it's a required package for the armhfp systems but is in the aarch64 repositories and yet can cause this problem if it's accidentally installed. Maybe it should be removed from the aarch64 repositories unless there's an actual documented need for it, with the caveats of what could happen?
Mike
On Wed, 2020-02-19 at 15:09 -0500, Michael H. Warfield wrote:
: - <Snip>
Well, now I'm really baffled. Based on the dnf logs on one of the affected systems, grubby-deprecated seems to have been pulled in about two weeks ago. That would account for the timing. But it was on a system that was upgraded to aarch64 and the 5.3.7 kernel months ago and has been in operation and updated frequently since then.
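For working out when a package first arrived, a scan of the dnf logs along these lines might help. This is a rough sketch with loudly stated assumptions: it assumes log lines carry a leading timestamp and an "Installed: <name>-<version>..." entry, which is how /var/log/dnf.rpm.log appeared on the systems I have looked at but may differ between dnf versions. The sample lines, timestamps and version numbers below are entirely made up, and install_events is a hypothetical name.

```python
import re


def install_events(log_lines, package):
    """Return (timestamp, package-version string) for installs of `package`.

    Assumed line shape: '<timestamp> <level> Installed: <name>-<version>...'.
    The '-<digit>' requirement keeps 'grubby' from matching 'grubby-deprecated'.
    """
    pattern = re.compile(r"Installed: (" + re.escape(package) + r"-\d\S*)")
    events = []
    for line in log_lines:
        match = pattern.search(line)
        if match:
            timestamp = line.split(" ", 1)[0]
            events.append((timestamp, match.group(1)))
    return events


# Made-up sample lines for illustration only; not taken from a real log.
sample = [
    "2020-01-01T00:00:00Z SUBDEBUG Installed: grubby-deprecated-0.0-0.fc31.aarch64",
    "2020-01-01T00:00:01Z SUBDEBUG Installed: some-other-package-0.0-0.fc31.aarch64",
]

print(install_events(sample, "grubby-deprecated"))
```

On a real system the lines would come from reading the log file (and any rotated copies) rather than from a hard-coded list.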
I think I've got them cleaned up with the various armhfp systems "with" and the various aarch64 systems "without". I'm going to go back and flush my system config backup systems and reinitialize in case the culprit is lurking in there (which I highly suspect at this time).
: - <Big snip>
Mike
Thanks for confirming.
Perhaps the maintainer can make some tweaks so it doesn't happen again. I think we'll all remember this one if someone hits it and sends it to our list :)
Thanks for chasing it down.
Paul
Mike
Mike
Once booted, you just 'dnf update' and get the corrupted grub.cfg?
Thanks! Paul
----- Original Message -----
On Tue, 2020-02-18 at 20:22 -0500, Michael H. Warfield wrote: > On Tue, 2020-02-18 at 19:44 -0500, Stuart D. Gathman wrote: > > On Tue, 18 Feb 2020, Michael H. Warfield wrote: > > > > > So the problem is in grubby. This had to have happened > > > in > > > just > > > the > > > last week. The update to 5.4.18 did not result in this > > > carnage. So, > > > whatever happened with grubby happened recently. > > > Strangely, the /boot/efi/EFI/fedora/grub.cfg file is in > > > the > > > aarch64 > > > image but not in the arm7l image. :-? > > It seems like this bug would affect all EFI systems, not > > just > > aarch64. > > I'm be on the alert before installing 5.4.19 on x86_64 as > > well. > I agree. Seems strange but... > I'm actually wondering if the hang and crash is a buffer > overrun. I > looked on a newly rebuilt system and saw the corruption but > it > was > only > a few lines long and the system still worked. I've got a > pile > of > development systems are updated every time a kernel gets > updated > (once > or twice a week) that that line had hit over 800K > long. They're > all > largely mirrors of each other, just with different > tasks. Each > of > the > "corrupt" files were identical but all of the updates have > been > in > lock > step. > All but one of the six affected systems are back up and the > single > one > that's not managed to boot the kernel and started but ran > into > a > panic. > But I get a kernel menu on it now.
That's it...
The panic on the odd system out was a missing initramfs, but that's one of the two that hurled chunks and crashed during the update. I got it back on the previous kernel and, once I had fixed grub.cfg, reinstalled the newer kernel to fix that. I then compared the working grub.cfg file to the resulting one from the reinstall.
This:
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
Became this during the install:
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
And became this after the reinstall was complete:
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
It duplicated the root UUID on the default_kernelopts several times. And rebooting with that bad line still worked. It seems to be the size of that duplicated root UUID that finally borks it. Each time the kernel gets updated, it gets another string attached until it blows chunks. This may have been in there for a long time, I'm just real aggressive about doing kernel updates.
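For anyone who wants to clean up an affected file by hand, here is a rough Python sketch of collapsing the repeated UUID back to a single root=UUID= entry. This is only an illustration of the pattern described above, not anything grubby does. The function name collapse_duplicated_root is made up, and it assumes the bogus repeats always look like "=UUID=<uuid>" or "=<uuid>" with the same UUID as the real one.

```python
import re

# Standard 8-4-4-4-12 hex UUID.
UUID_RE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

# root=UUID=<uuid> followed by one or more repeats of "=UUID=<uuid>" or
# "=<uuid>", where <uuid> is the SAME uuid (backreference \2).
# Assumption: this is the only shape the duplication takes.
DUP_ROOT_RE = re.compile(rf"(root=UUID=({UUID_RE}))(?:=(?:UUID=)?\2)+")


def collapse_duplicated_root(line: str) -> str:
    """Hypothetical helper: strip repeated copies of the root UUID from a line."""
    return DUP_ROOT_RE.sub(r"\1", line)


if __name__ == "__main__":
    uuid = "92567a3e-b3d2-494a-b4ec-ebb4687d6b40"
    bad = f'set default_kernelopts="root=UUID={uuid}=UUID={uuid}={uuid} ro cma=192MB"'
    print(collapse_duplicated_root(bad))
```

A line that is already clean passes through unchanged, so it should be safe to run over every line of a copy of grub.cfg. Work on a copy and compare before putting anything back in /boot.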
It's now up on the updated kernel. WEIRD. I should probably update my Bugzilla report now.
No idea why it's not showing up on x86_64 systems.
> Just checked my x86_64 system (a Lenovo Yoga 730-15). No sign of the corruption in that file. Very strange but 6 systems impacted at nearly the same time (hours).
Mike
Michael H. Warfield (AI4NB) | (o) +1 706 850-8773 | mhw@WittsEnd.com //|=mhw=|// | (c) +1 678 463-0932 | http://www.wittsend.com/mhw/ ARIN whois: MHW9-ARIN | An optimist believes we live in the best of all PGP Key: 0xC0EB9675674627FF | possible worlds. A pessimist is sure of it!
arm mailing list -- arm@lists.fedoraproject.org To unsubscribe send an email to arm-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org
I have a fair number of Raspberry Pi V3 B/B+ and Raspberry Pi V2 B systems running Fedora 31 (and a V4 I would like to get on Fedora - still waiting). I have Fedora 31 aarch64 installed on most of the V3 and armv7l/armhfp on the V2 systems. Running dnf updating today, updated to the 5.4.19 kernel and the update of the aarch64 images seemed to hang while running the kernel-core script for over an hour. Looking from another terminal and running top (running dnf upgrade remotely over ssh), looks like grubby is hung burning CPU time and eating memory (and I have lots of cache). Couple machines eventually crashed, and I killed grubby on a couple of others, and the dnf eventually ran to completion on the later 2. In all cases (5 - aarch64 systems) I was left with totally unbootable systems. The ones with screens go straight to the U-Boot prompt or trying to reboot over and over again looking for storage and then looking at eth0 and then rebooting and never show the kernel selection prompt. The few armv7l/armhfp systems haven't seem to have gotten that the 5.4.19 upgrade yet. I've still got two booted and running aarch64 systems running (haven't rebooted) and I'm going to try and roll that update back.
Any thoughts to were to go from here? Not sure what to report this under in Bugzilla.
You should at least be able to select the prior kernel from the menu. Was it the same update on aarch64 and ARMv7?
I'm not aware of any issues but I think all my RPi testing has moved to F-32 or at the very least to 5.5 or later kernels.
What other updates were in the list?
On Tue, 2020-02-18 at 22:46 +0000, Peter Robinson wrote:
You should at least be able to select the prior kernel from the menu. Was it the same update on aarch64 and ARMv7?
No and no. You don't get that far; you're dumped immediately into U-Boot. You don't get a kernel menu.
The problem has been isolated to grubby creating a corrupted grub.cfg file. Fixing that file has brought back up all the damaged systems I've fixed so far. On two systems I was able to kill -9 grubby and immediately replace the grub.cfg file with a clean one, and they recovered. I've just got one dead one left that I have to pry the SD card out of and edit the file on another system.
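If you keep a known-good copy of grub.cfg around, the "replace it with a clean one" step can be checked before rebooting. Below is a minimal Python sketch of that idea. The helper names and the 4096-character threshold are my own assumptions, chosen only because the corrupted line reported in this thread grew to hundreds of kilobytes. Paths are passed in by the caller and nothing here is specific to grubby.

```python
import shutil
from pathlib import Path


def looks_corrupted(cfg: Path, max_line_len: int = 4096) -> bool:
    """Return True if any line in cfg is longer than max_line_len characters.

    The threshold is an arbitrary assumption, not a GRUB limit.
    """
    with cfg.open("r", errors="replace") as fh:
        return any(len(line) > max_line_len for line in fh)


def restore_if_corrupted(cfg: Path, backup: Path, max_line_len: int = 4096) -> bool:
    """Copy backup over cfg when cfg looks corrupted; return True if restored."""
    if looks_corrupted(cfg, max_line_len):
        shutil.copy2(backup, cfg)
        return True
    return False
```

Something like restore_if_corrupted(Path("/boot/efi/EFI/fedora/grub.cfg"), Path("/root/grub.cfg.good")) could be run as root after a kernel update and before rebooting. The backup path is just an example, and the backup itself has to be taken while the file is still clean.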
I'm not aware of any issues but I think all my RPi testing has moved to F-32 or at the very least to 5.5 or later kernels.
I need to move to F-32 for sure and I definitely want the 5.5 kernel for other reasons. My wife has a Lenovo Yoga 730-13 and the 5.4.* kernels all barf with CPU lockup around the SSD card. Strangely, it doesn't with my Lenovo Yoga 730-15. But my 15" has a replaceable SSD and her 13" does not (that's one of the only differences). But that's another story. I've got an RPi-4 waiting for Fedora 32.
What other updates were in the list?
I'll have to dig that out of the logs. It's isolated to one of the grubby scripts at this point.
Mike