On Tue, May 24, 2022 at 10:24:32AM -0400, Vivek Goyal wrote:
On Tue, May 24, 2022 at 10:12:40PM +0800, Tao Liu wrote:
> Thin provisioning is a mechanism that lets you allocate an lvm volume with a
> large virtual size for file systems while it actually occupies only a small
> physical size. The physical size can be autoextended in use once the thin pool
> reaches a threshold specified in /etc/lvm/lvm.conf.
>
> Three things need to be handled when enabling lvm2 thinp for kdump:
>
> 1) Check whether the dump target device or directory is a thinp device.
> 2) Monitor the thin pool and autoextend its size when it reaches the threshold
> during kdump.
Hi Vivek,
Have you tested that the auto-extend logic works fine?
Secondly, can you please also test what happens if the thin pool gets full
and there is no more space for extension? Does the system hang? If yes,
that's not a good situation. We want to reboot automatically after saving
the dump.
If it does hang, we need to add logic to configure xfs error handling so
that it does not retry infinitely.
I have tested the autoextend logic locally, and it works fine. If the thin pool
gets full, it first reaches "device-mapper: thin: 253:2: switching pool
to out-of-data-space (queue IO) mode", then about 60s later it switches to
"device-mapper: thin: 253:2: switching pool to out-of-data-space (error IO)
mode", and then continues. So it hangs for about 60s at most; please see the
dmesg log below.
As Zdenek suggested, we can use
"lvchange --errorwhenfull y|n vgname/thinpoolname"
to skip the waiting, but I think the delay is not harmful for now.
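
For reference, a minimal sketch of how that setting could be scripted. The
pool name vg00/thinpool is only inferred from the log in this mail, and the
run/set_errorwhenfull helpers and the DRY_RUN variable are hypothetical names
used for illustration; none of this is part of the patch set. By default it
only prints the command instead of running it:

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_errorwhenfull POOL MODE: MODE is y (fail IO at once) or n (queue IO).
# POOL is given as vgname/thinpoolname.
set_errorwhenfull() {
    case "$2" in
        y|n) ;;
        *) echo "mode must be y or n" >&2; return 1 ;;
    esac
    run lvchange --errorwhenfull "$2" "$1"
}

# Assumed pool name; adjust to the real volume group and pool.
set_errorwhenfull vg00/thinpool y
```

Running it with DRY_RUN=0 as root would apply the change. As far as I know,
the wait itself is governed by the dm_thin_pool module parameter
no_space_timeout (60 seconds by default), which matches the delay seen in the
log.
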
[ 3.627063] kdump[506]: saving vmcore-dmesg.txt complete
[ 3.635826] kdump[508]: saving vmcore
[ 4.248430] device-mapper: thin: 253:2: reached low water mark for data device: sending event.
[ 3.875066] lvm[440]: Insufficient free space: 3 extents needed, but only 2 available
[ 3.886365] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 3.896824] lvm[440]: WARNING: Thin pool vg00-thinpool-tpool data is now 95.12% full.
[ 4.436433] device-mapper: thin: 253:2: switching pool to out-of-data-space (queue IO) mode
[ 3.980617] lvm[440]: Insufficient free space: 4 extents needed, but only 2 available
[ 3.992068] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 4.066058] lvm[440]: Insufficient free space: 4 extents needed, but only 2 available
[ 4.083457] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 4.092540] lvm[440]: WARNING: Thin pool vg00-thinpool-tpool data is now 100.00% full.
[ 4.271764] kdump.sh[509]: ^MChecking for memory holes : [ 0.0 %] / ^MChecking for memory s
[ 4.295810] kdump.sh[509]: The dumpfile is saved to /kdumproot/mnt/var/crash/127.0.0.1-2022-05-24-14:42:53//vmcore-incomplete.
[ 4.304127] kdump.sh[509]: makedumpfile Completed.
[ 12.564731] lvm[440]: Insufficient free space: 4 extents needed, but only 2 available
[ 12.581630] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 42.569627] lvm[440]: Insufficient free space: 4 extents needed, but only 2 available
[ 42.587529] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 67.085196] device-mapper: thin: 253:2: switching pool to out-of-data-space (error IO) mode
[ 67.126206] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 28672)
[ 67.142081] Buffer I/O error on device dm-3, logical block 26625
[ 67.143062] Buffer I/O error on device dm-3, logical block 26626
[ 67.143062] Buffer I/O error on device dm-3, logical block 26627
[ 67.143062] Buffer I/O error on device dm-3, logical block 26628
[ 67.173719] Buffer I/O error on device dm-3, logical block 26629
[ 67.174703] Buffer I/O error on device dm-3, logical block 26630
[ 67.174703] Buffer I/O error on device dm-3, logical block 26631
[ 67.174703] Buffer I/O error on device dm-3, logical block 26632
[ 67.203393] Buffer I/O error on device dm-3, logical block 26633
[ 67.204375] Buffer I/O error on device dm-3, logical block 26634
[ 67.218086] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 29184)
[ 67.230461] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writ:
[ 67.243262] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 32768)
[ 67.257469] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 33280)
[ 67.270994] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 34816)
[ 67.283991] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 36864)
[ 67.296368] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 37376)
[ 67.310025] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 38912)
[ 67.323535] EXT4-fs warning (device dm-3): ext4_end_bio:345: I/O error 3 writing to inode 32389 starting block 43520)
[ 67.338894] JBD2: Detected IO errors while flushing file data on dm-3-8
[ 66.856873] lvm[440]: Insufficient free space: 4 extents needed, but only 2 available
[ 66.872656] lvm[440]: Failed command for vg00-thinpool-tpool.
[ 66.884612] kdump.sh[512]: sync: error syncing '/kdumproot/mnt/var/crash/127.0.0.1-2022-05-24-14:42:53//vmcore': Input/output error
[ 66.902511] kdump[514]: sync vmcore failed, exitcode:1
[ 66.914007] kdump[516]: saving vmcore failed
> 3) If the thin pool size-autoextend fails, the user space program will not know
> because of buffered IO. So "sync -f vmcore" is used during kdump in the 2nd
> kernel to force the vmcore data to disk.
It would be good if this "sync -f vmcore" fix is sent as a separate patch.
This is needed anyway irrespective of thin pool support.
OK, I will split it into a separate patch.
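
To illustrate the idea, a minimal sketch of the sync check, not the code from
the patch itself. The helper name sync_dump_file is hypothetical. It relies on
"sync -f FILE" from coreutils, which syncs the file system containing FILE and
exits non-zero if that fails, so a write error hidden by buffered IO becomes
visible to the script:

```shell
#!/bin/sh
# sync_dump_file FILE: flush the file system that holds FILE to disk.
# Returns 0 on success and 1 if sync reports an error, for example an
# Input/output error after the thin pool switched to error IO mode.
sync_dump_file() {
    if ! sync -f "$1"; then
        echo "sync $1 failed" >&2
        return 1
    fi
    return 0
}
```

A caller could then treat the dump as saved only when sync_dump_file returns
0, and report "saving vmcore failed" otherwise.
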
Thanks,
Tao Liu
Thanks
Vivek
>
> According to my testing, the memory-consuming step for lvm2 thinp is the thin
> pool size-autoextend phase. For fedora and rhel9, the default crashkernel value
> is enough. But for rhel8, the default crashkernel value 1G-4G:160M is not
> enough, so it needs to be handled specially.
>
> v1 -> v2:
>
> 1) Modified the usage of the lvs cmd when checking whether the target is an
> lvm2 thinp device.
> 2) Removed the sync-flag way of mounting for the lvm2 thinp target
> during kdump; use "sync -f vmcore" to force sync data, and handle
> the error if it fails.
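
One possible way to do such an lvs-based check is sketched below. This is an
assumption for illustration and may differ from what the patch actually does;
the helper names is_thin_segtype and is_lvm2_thinp_device are hypothetical. It
asks lvs for the segment type of the device and treats "thin" as a thin
volume:

```shell
#!/bin/sh
# is_thin_segtype OUTPUT: succeed if the lvs segtype output is "thin".
is_thin_segtype() {
    # lvs pads its output with blanks; strip them before comparing.
    segtype=$(printf '%s' "$1" | tr -d '[:space:]')
    [ "$segtype" = "thin" ]
}

# is_lvm2_thinp_device DEV: succeed if DEV is an lvm2 thin volume.
# Fails when lvs is missing or DEV is not a logical volume.
is_lvm2_thinp_device() {
    out=$(lvs --noheadings -o segtype "$1" 2>/dev/null) || return 1
    is_thin_segtype "$out"
}
```

Keeping the parsing in is_thin_segtype separate from the lvs call makes it
possible to test the comparison without a real volume group.
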
>
> Tao Liu (4):
> Add lvm2 thin provision dump target checker
> Add lvm2-monitor.service for kdump when lvm2 thinp enabled
> lvm.conf should be check modified if lvm2 thinp enabled
> Fix the sync issue for dump_fs
>
> dracut-kdump.sh | 10 ++++++++--
> dracut-lvm2-monitor.service | 15 +++++++++++++++
> dracut-module-setup.sh | 16 ++++++++++++++++
> kdump-lib-initramfs.sh | 20 ++++++++++++++++++++
> kdumpctl | 1 +
> kexec-tools.spec | 2 ++
> 6 files changed, 62 insertions(+), 2 deletions(-)
> create mode 100644 dracut-lvm2-monitor.service
>
> --
> 2.33.1
>