On Mon, Jul 18, 2022 at 12:10 AM Patrick Hemmer <fedora(a)stormcloud9.net> wrote:
Ever since upgrading to Fedora 36, my root filesystem has been getting
corrupted every few hours. I maintain block-level backups, and I have to
restore every time this happens. xfs_repair can fix the filesystem, but the
system is typically unusable afterwards, as there are often over 10k files
in lost+found.
I have tried creating a brand new filesystem (mkfs.xfs), but it still gets
corrupted.
I would file a bug, but the caveat is that I also have LVM underneath the
filesystem, so I don't know whether the problem is with XFS or with LVM. I
have other XFS filesystems, also on LVM, and have seen corruption on them as
well, but it's nowhere near as significant or frequent as on the root
filesystem.
Sometimes I can detect the corruption before the kernel does, by taking a
snapshot and running `xfs_repair -n` on the snapshot. And sometimes the
kernel will detect the corruption first, usually with a message like:
Jul 17 15:06:52 whistler kernel: XFS (dm-0): Metadata corruption detected at xfs_buf_ioend+0x14c/0x5d0 [xfs], xfs_inode block 0x46057c8 xfs_inode_buf_verify
Jul 17 15:06:52 whistler kernel: XFS (dm-0): Unmount and run xfs_repair
Jul 17 15:06:52 whistler kernel: XFS (dm-0): First 128 bytes of corrupted metadata buffer:
Jul 17 15:06:52 whistler kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 17 15:06:52 whistler kernel: XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x40/0x50 [xfs]" at daddr 0x46057c8 len 32 error 117
Jul 17 15:06:52 whistler kernel: XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x179/0x2d0 [xfs] (fs/xfs/xfs_trans_buf.c:296). Shutting down filesystem.
Jul 17 15:06:52 whistler kernel: XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
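When comparing several such incidents, it can help to pull the device, disk address, and error code out of the "metadata I/O error" line so that repeated hits on the same blocks stand out. The following is only a minimal sketch: the regular expression is fitted to the single log line shown above, the function name `parse_xfs_io_error` is made up for illustration, and the 512-byte unit for `daddr` is an assumption about how XFS reports disk addresses. On Linux, error 117 is EUCLEAN ("Structure needs cleaning"), which XFS uses to signal on-disk corruption.

```python
import errno
import re

# Pattern fitted to the "metadata I/O error" line quoted above; other
# kernel versions may word the message differently.
IO_ERROR_RE = re.compile(
    r'XFS \((?P<dev>[^)]+)\): metadata I/O error in "(?P<func>[^"]+)" '
    r'at daddr (?P<daddr>0x[0-9a-fA-F]+) len (?P<len>\d+) error (?P<errno>\d+)'
)

# Assumption: daddr and len are counted in 512-byte basic blocks.
SECTOR_BYTES = 512


def parse_xfs_io_error(line):
    """Return the fields of an XFS metadata I/O error line, or None."""
    m = IO_ERROR_RE.search(line)
    if m is None:
        return None
    daddr = int(m.group("daddr"), 16)
    code = int(m.group("errno"))
    return {
        "device": m.group("dev"),
        "function": m.group("func"),
        "daddr": daddr,
        "byte_offset": daddr * SECTOR_BYTES,
        "length_sectors": int(m.group("len")),
        "errno": code,
        # Symbolic name as known to the local platform, if any.
        "errno_name": errno.errorcode.get(code),
    }


line = ('Jul 17 15:06:52 whistler kernel: XFS (dm-0): metadata I/O error in '
        '"xfs_imap_to_bp+0x40/0x50 [xfs]" at daddr 0x46057c8 len 32 error 117')
info = parse_xfs_io_error(line)
print(info)
```

Collecting these records over a few corruption events would show whether the damage always lands in the same region of the device or moves around, which is one hint when trying to separate a storage-layer problem from a filesystem one.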
So how can I proceed on this? Is there any way to determine whether this
is an LVM issue or an XFS issue?
LVM and XFS on Linux have been very reliable, so you need to rule out
hardware problems. If the drive supports S.M.A.R.T., smartmontools can run
the drive's internal self-tests. Some vendors provide test software (often
Windows only). Cables and connectors should also be considered: try
swapping cables and connections. "Contact enhancer" sometimes solves
connection problems (now that cars are full of computers, you can buy
contact enhancer at auto supply stores).
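As a rough illustration of the smartmontools suggestion, the sketch below builds the usual `smartctl` invocations for a health summary, an extended self-test, and the self-test log. The device path `/dev/sda` and the helper names are placeholders, not anything taken from the affected system. The commands normally need root, and self-test support can depend on the drive and the smartmontools version.

```python
import shutil
import subprocess


def smartctl_commands(device):
    """Build smartctl argument lists for a basic drive check."""
    return {
        # Overall health self-assessment reported by the drive.
        "health": ["smartctl", "-H", device],
        # Start the extended (long) self-test; it runs inside the drive.
        "start_long_test": ["smartctl", "-t", "long", device],
        # Show results of previous self-tests once they have finished.
        "selftest_log": ["smartctl", "-l", "selftest", device],
    }


def run_health_check(device):
    """Run the health query and return smartctl's text output."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install smartmontools")
    # smartctl's exit status is a bit mask, so a non-zero value is not
    # treated as a hard failure here.
    result = subprocess.run(
        smartctl_commands(device)["health"],
        capture_output=True,
        text=True,
    )
    return result.stdout


# Print the commands only; nothing is executed against a drive here.
for name, cmd in smartctl_commands("/dev/sda").items():
    print(f"{name}: {' '.join(cmd)}")
```

Since the filesystem sits on LVM, the check would be run against the underlying physical drive(s), not the dm-0 device named in the kernel log.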
It is very useful to have an external drive-to-USB adapter. For NVMe, a
USB-C NVMe case provides a way to test NVMe drives, and a cast-off 128G
NVMe card can be used in the adapter as a fast alternative to USB memory
"keys".
--
George N. White III