I recently organized the outstanding work Kairui had done previously on reboot estimate support[1].
This series adds a more accurate kdump crashkernel estimation helper.
Compared to Kairui's version, I only changed the parts needed to adapt to the changes made in kexec-tools over the last two years. The core logic written by Kairui is only slightly modified.
The original Patch4 has been removed.
In Patch3 v2, I moved most of the code used only in estimate to kdump-estimate.sh. I exported some variables defined in kdumpctl in order to use them in estimate.
In Patch4 v2, I found that creating files with cat or echo and output redirection is blocked by SELinux, so I switched to the cp command.
The current version is not perfect and has only been tested on x86_64 Fedora. It also has known issues:
When a large amount of memory is reserved and crash_base is above 4G, an extra 256M of low memory is reserved as well, which breaks the reserved_size calculation. We compute it by subtracting the memtotal reported by memdebug from the configured crashkernel value, but the extra 256M can make memtotal larger than crashkernel.
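To make the known issue concrete, here is a small sketch of the subtraction with made-up numbers. The helper name `reserved_usage_kib` and all values are assumptions for illustration only; they are not taken from a real run.

```shell
#!/bin/bash
# Hypothetical helper: reserved = configured crashkernel - MemTotal seen by
# the kdump kernel. Both values are in KiB, like /proc/meminfo.
reserved_usage_kib() {
    local crashkernel_kib=$1 memtotal_kib=$2
    echo $((crashkernel_kib - memtotal_kib))
}

# crashkernel=4096M, no extra low memory: MemTotal is below the reservation.
reserved_usage_kib $((4096 * 1024)) $((4000 * 1024))

# Same setting, but an extra 256M of low memory is reserved, so MemTotal can
# exceed the configured value and the result turns negative.
reserved_usage_kib $((4096 * 1024)) $((4256 * 1024))
```

With these illustrative inputs the first call yields a positive value and the second a negative one, which is the symptom described above.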
I'm now posting this patchset for everyone to help review and provide feedback.
Thanks.
[1] https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/...
Kairui Song & Lichen Liu (5):
  kdumpctl: only acquire the single instance lock when necessary
  kdumpctl: allow passing in extra cmdline using env variable
  kdump-estimate.sh: introduce a separate file
  kdump-estimate.sh: add reboot estimation support
  Update crashkernel-howto.txt
 .editorconfig                  |   2 +-
 crashkernel-howto.txt          | 100 ++++-
 dracut-kdump.sh                |  15 +
 kdump-estimate-cleanup.service |   8 +
 kdump-estimate.service         |  11 +
 kdump-estimate.sh              | 776 +++++++++++++++++++++++++++++++++
 kdump-lib.sh                   |   9 +-
 kdump.shutdown                 |  13 +
 kdumpctl                       | 213 ++-------
 kexec-tools.spec               |  19 +
 10 files changed, 956 insertions(+), 210 deletions(-)
 create mode 100644 kdump-estimate-cleanup.service
 create mode 100644 kdump-estimate.service
 create mode 100755 kdump-estimate.sh
 create mode 100644 kdump.shutdown
From: Kairui Song <kasong at redhat.com>
Only acquire the single instance lock when necessary.
For some commands, like showmem, estimate or help, there is no reason for kdumpctl to be blocked.
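As a rough sketch of the split this patch makes, the sub-commands can be classified as follows. The helper `needs_single_instance_lock` is hypothetical and not part of kdumpctl; the patch itself calls single_instance_lock inside each case branch instead.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 for sub-commands that take the lock after
# this patch, 1 for those that run without it.
needs_single_instance_lock() {
    case "$1" in
    start | stop | status | reload | restart | rebuild | propagate)
        return 0
        ;;
    reset-crashkernel | _reset-crashkernel-after-update | _reset-crashkernel-for-installed_kernel)
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}

for cmd in start showmem estimate; do
    if needs_single_instance_lock "$cmd"; then
        echo "$cmd: takes the lock"
    else
        echo "$cmd: runs without the lock"
    fi
done
```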
Signed-off-by: Kairui Song <kasong(a)redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 kdumpctl | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kdumpctl b/kdumpctl
index 7e561fd..e63adea 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -1764,18 +1764,21 @@ main()
 
 	case "$1" in
 	start)
+		single_instance_lock
 		if ! start; then
 			derror "Starting kdump: [FAILED]"
 			exit 1
 		fi
 		;;
 	stop)
+		single_instance_lock
 		if ! stop; then
 			derror "Stopping kdump: [FAILED]"
 			exit 1
 		fi
 		;;
 	status)
+		single_instance_lock
 		EXIT_CODE=0
 		is_kernel_loaded "$DEFAULT_DUMP_MODE"
 		case "$?" in
@@ -1791,9 +1794,11 @@ main()
 		exit $EXIT_CODE
 		;;
 	reload)
+		single_instance_lock
 		reload
 		;;
 	restart)
+		single_instance_lock
 		if ! stop; then
 			derror "Stopping kdump: [FAILED]"
 			exit 1
@@ -1804,11 +1809,13 @@ main()
 		fi
 		;;
 	rebuild)
+		single_instance_lock
 		rebuild
 		;;
 	condrestart) ;;
 
 	propagate)
+		single_instance_lock
 		propagate_ssh_key
 		;;
 	showmem)
@@ -1821,15 +1828,18 @@ main()
 		get_default_crashkernel "$2"
 		;;
 	reset-crashkernel)
+		single_instance_lock
 		shift
 		reset_crashkernel "$@"
 		;;
 	_reset-crashkernel-after-update)
+		single_instance_lock
 		if [[ $(kdump_get_conf_val auto_reset_crashkernel) != no ]]; then
 			reset_crashkernel_after_update
 		fi
 		;;
 	_reset-crashkernel-for-installed_kernel)
+		single_instance_lock
 		if [[ $(kdump_get_conf_val auto_reset_crashkernel) != no ]]; then
 			reset_crashkernel_for_installed_kernel "$2"
 		fi
@@ -1850,9 +1860,6 @@ if [[ ! -f $KDUMP_CONFIG_FILE ]]; then
 	exit 1
 fi
 
-# Other kdumpctl instances will block in queue, until this one exits
-single_instance_lock
-
 # To avoid fd 9 leaking, we invoke a subshell, close fd 9 and call main.
 # So that fd isn't leaking when main is invoking a subshell.
 (
From: Kairui Song <kasong at redhat.com>
Introduce a KDUMP_COMMANDLINE_EXTRA env variable. It can be used to pass in extra command line params for kexec.
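A minimal sketch of how the variable ends up in the command line. The helper `append_extra_cmdline` is hypothetical; the patch does a plain string concatenation in load_kdump(), whereas this variant uses `${VAR:+...}` so that nothing is appended when the variable is empty.

```shell
#!/bin/bash
# Hypothetical helper, not part of kdumpctl: append the extra parameters
# to an already prepared command line.
append_extra_cmdline() {
    local cmdline=$1
    # ${VAR:+ ...} expands to nothing when the variable is unset or empty,
    # so no trailing space is added in that case.
    echo "${cmdline}${KDUMP_COMMANDLINE_EXTRA:+ $KDUMP_COMMANDLINE_EXTRA}"
}

KDUMP_COMMANDLINE_EXTRA="rd.memdebug=4"
append_extra_cmdline "console=ttyS0 irqpoll"

# Typical one-shot use from a caller would look like:
#   KDUMP_COMMANDLINE_EXTRA="rd.memdebug=4" kdumpctl restart
```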
Signed-off-by: Kairui Song <kasong(a)redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 kdumpctl | 1 +
 1 file changed, 1 insertion(+)
diff --git a/kdumpctl b/kdumpctl
index e63adea..8add4ce 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -698,6 +698,7 @@ load_kdump()
 
 	KEXEC_ARGS=$(prepare_kexec_args "${KEXEC_ARGS}")
 	KDUMP_COMMANDLINE=$(prepare_cmdline "${KDUMP_COMMANDLINE}" "${KDUMP_COMMANDLINE_REMOVE}" "${KDUMP_COMMANDLINE_APPEND}")
+	KDUMP_COMMANDLINE="$KDUMP_COMMANDLINE $KDUMP_COMMANDLINE_EXTRA"
 	# For secureboot enabled machines, use new kexec file based syscall.
 	# Old syscall will always fail as it does not have capability to
 	# to kernel signature verification.
From: Kairui Song <kasong at redhat.com>
Move the estimation code into a separate file, adjusting it slightly so that it works there.
Estimate depends on some variables defined in kdumpctl, so export them.
Signed-off-by: Kairui Song <kasong(a)redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 kdump-estimate.sh | 186 +++++++++++++++++++++++++++++++++++++++++++
 kdumpctl          | 199 ++++------------------------------------------
 kexec-tools.spec  |   2 +
 3 files changed, 205 insertions(+), 182 deletions(-)
 create mode 100755 kdump-estimate.sh
diff --git a/kdump-estimate.sh b/kdump-estimate.sh new file mode 100755 index 0000000..063a6d2 --- /dev/null +++ b/kdump-estimate.sh @@ -0,0 +1,186 @@ +#!/bin/bash + +[[ $dracutbasedir ]] || dracutbasedir=/usr/lib/dracut +. $dracutbasedir/dracut-functions.sh +. /lib/kdump/kdump-lib.sh +. /lib/kdump/kdump-logger.sh + +if [[ -f /etc/sysconfig/kdump ]]; then + . /etc/sysconfig/kdump +fi + +if ! dlog_init; then + echo "failed to initiate the kdump logger." + exit 1 +fi + +check_vmlinux() +{ + # Use readelf to check if it's a valid ELF + readelf -h "$1" &> /dev/null || return 1 +} + +get_vmlinux_size() +{ + local size=0 _msize + + while read -r _msize; do + size=$((size + _msize)) + done <<< "$(readelf -l -W "$1" | awk '/^ LOAD/{print $6}' 2> /dev/stderr)" + + echo $size +} + +try_decompress() +{ + # The obscure use of the "tr" filter is to work around older versions of + # "grep" that report the byte offset of the line instead of the pattern. + + # Try to find the header ($1) and decompress from here + for pos in $(tr "$1\n$2" "\n$2=" < "$4" | grep -abo "^$2"); do + if ! type -P "$3" > /dev/null; then + ddebug "Signiature detected but '$3' is missing, skip this decompressor" + break + fi + + pos=${pos%%:*} + tail "-c+$pos" "$img" | $3 > "$5" 2> /dev/null + if check_vmlinux "$5"; then + ddebug "Kernel is extracted with '$3'" + return 0 + fi + done + + return 1 +} + +# Borrowed from linux/scripts/extract-vmlinux +get_kernel_size() +{ + # Prepare temp files: + local tmp img=$1 + + tmp=$(mktemp /tmp/vmlinux-XXX) + trap 'rm -f "$tmp"' 0 + + # Try to check if it's a vmlinux already + check_vmlinux "$img" && get_vmlinux_size "$img" && return 0 + + # That didn't work, so retry after decompression. 
+ try_decompress '\037\213\010' xy gunzip "$img" "$tmp" || + try_decompress '\3757zXZ\000' abcde unxz "$img" "$tmp" || + try_decompress 'BZh' xy bunzip2 "$img" "$tmp" || + try_decompress '\135\0\0\0' xxx unlzma "$img" "$tmp" || + try_decompress '\211\114\132' xy 'lzop -d' "$img" "$tmp" || + try_decompress '\002!L\030' xxx 'lz4 -d' "$img" "$tmp" || + try_decompress '(\265/\375' xxx unzstd "$img" "$tmp" + + # Finally check for uncompressed images or objects: + [[ $? -eq 0 ]] && get_vmlinux_size "$tmp" && return 0 + + # Fallback to use iomem + local _size=0 _seg + while read -r _seg; do + _size=$((_size + 0x${_seg#*-} - 0x${_seg%-*})) + done <<< "$(grep -E "Kernel (code|rodata|data|bss)" /proc/iomem | cut -d ":" -f 1)" + echo $_size +} + +do_estimate_simple() +{ + local kdump_mods + local -A large_mods + local baseline + local kernel_size mod_size initrd_size baseline_size runtime_size reserved_size estimated_size recommended_size _cryptsetup_overhead + local size_mb=$((1024 * 1024)) + + # TODO: fix TARGET_INITRD reference + kdump_mods="$(lsinitrd "$TARGET_INITRD" -f /usr/lib/dracut/hostonly-kernel-modules.txt | tr '\n' ' ')" + baseline=$(kdump_get_arch_recommend_size) + if [[ ${baseline: -1} == "M" ]]; then + baseline=${baseline%M} + elif [[ ${baseline: -1} == "G" ]]; then + baseline=$((${baseline%G} * 1024)) + elif [[ ${baseline: -1} == "T" ]]; then + baseline=$((${baseline%Y} * 1048576)) + fi + + # The default pre-reserved crashkernel value + baseline_size=$((baseline * size_mb)) + # Current reserved crashkernel size + reserved_size=$(< /sys/kernel/kexec_crash_size) + # A pre-estimated value for userspace usage and kernel + # runtime allocation, 64M should good for most cases + runtime_size=$((64 * size_mb)) + # Kernel image size + kernel_size=$(get_kernel_size "$KDUMP_KERNEL") + # Kdump initramfs size + initrd_size=$(du -b "$TARGET_INITRD" | awk '{print $1}') + # Kernel modules static size after loaded + mod_size=0 + while read -r _name _size _; do + if [[ " 
$kdump_mods " != *" $_name "* ]]; then + continue + fi + mod_size=$((mod_size + _size)) + + # Mark module with static size larger than 2M as large module + if [[ $((_size / size_mb)) -ge 1 ]]; then + large_mods[$_name]=$_size + fi + done <<< "$(< /proc/modules)" + + # Extra memory usage required for LUKS2 decryption + crypt_size=0 + for _dev in $(get_all_kdump_crypt_dev); do + _crypt_info=$(cryptsetup luksDump "/dev/block/$_dev") + [[ $(echo "$_crypt_info" | sed -n "s/^Version:\s*(.*)/\1/p") == "2" ]] || continue + for _mem in $(echo "$_crypt_info" | sed -n "s/\sMemory:\s*(.*)/\1/p" | sort -n -r); do + crypt_size=$((crypt_size + _mem * 1024)) + break + done + done + + if [[ $crypt_size -ne 0 ]]; then + if [[ $(uname -m) == aarch64 ]]; then + _cryptsetup_overhead=50 + else + _cryptsetup_overhead=20 + fi + + crypt_size=$((crypt_size + _cryptsetup_overhead * size_mb)) + echo -e "Encrypted kdump target requires extra memory, assuming using the keyslot with maximum memory requirement\n" + fi + + estimated_size=$((kernel_size + mod_size + initrd_size + runtime_size + crypt_size)) + if [[ $baseline_size -gt $estimated_size ]]; then + recommended_size=$baseline_size + else + recommended_size=$estimated_size + fi + + echo "Reserved crashkernel: $((reserved_size / size_mb))M" + echo "Recommended crashkernel: $((recommended_size / size_mb))M" + echo + echo "Kernel image size: $((kernel_size / size_mb))M" + echo "Kernel modules size: $((mod_size / size_mb))M" + echo "Initramfs size: $((initrd_size / size_mb))M" + echo "Runtime reservation: $((runtime_size / size_mb))M" + [[ $crypt_size -ne 0 ]] && + echo "LUKS required size: $((crypt_size / size_mb))M" + echo -n "Large modules:" + if [[ ${#large_mods[@]} -eq 0 ]]; then + echo " <none>" + else + echo "" + for _mod in "${!large_mods[@]}"; do + echo " $_mod: ${large_mods[$_mod]}" + done + fi + + if [[ $reserved_size -le $recommended_size ]]; then + echo "WARNING: Current crashkernel size is lower than recommended size 
$((recommended_size / size_mb))M." + fi +} + +do_estimate_simple diff --git a/kdumpctl b/kdumpctl index 8add4ce..ef3aa74 100755 --- a/kdumpctl +++ b/kdumpctl @@ -1165,187 +1165,6 @@ rebuild() rebuild_initrd }
-check_vmlinux() -{ - # Use readelf to check if it's a valid ELF - readelf -h "$1" &> /dev/null || return 1 -} - -get_vmlinux_size() -{ - local size=0 _msize - - while read -r _msize; do - size=$((size + _msize)) - done <<< "$(readelf -l -W "$1" | awk '/^ LOAD/{print $6}' 2> /dev/stderr)" - - echo $size -} - -try_decompress() -{ - # The obscure use of the "tr" filter is to work around older versions of - # "grep" that report the byte offset of the line instead of the pattern. - - # Try to find the header ($1) and decompress from here - for pos in $(tr "$1\n$2" "\n$2=" < "$4" | grep -abo "^$2"); do - if ! type -P "$3" > /dev/null; then - ddebug "Signiature detected but '$3' is missing, skip this decompressor" - break - fi - - pos=${pos%%:*} - tail "-c+$pos" "$img" | $3 > "$5" 2> /dev/null - if check_vmlinux "$5"; then - ddebug "Kernel is extracted with '$3'" - return 0 - fi - done - - return 1 -} - -# Borrowed from linux/scripts/extract-vmlinux -get_kernel_size() -{ - # Prepare temp files: - local tmp img=$1 - - tmp="$KDUMP_TMPDIR/vmlinux" - - # Try to check if it's a vmlinux already - check_vmlinux "$img" && get_vmlinux_size "$img" && return 0 - - # That didn't work, so retry after decompression. - try_decompress '\037\213\010' xy gunzip "$img" "$tmp" || - try_decompress '\3757zXZ\000' abcde unxz "$img" "$tmp" || - try_decompress 'BZh' xy bunzip2 "$img" "$tmp" || - try_decompress '\135\0\0\0' xxx unlzma "$img" "$tmp" || - try_decompress '\211\114\132' xy 'lzop -d' "$img" "$tmp" || - try_decompress '\002!L\030' xxx 'lz4 -d' "$img" "$tmp" || - try_decompress '(\265/\375' xxx unzstd "$img" "$tmp" - - # Finally check for uncompressed images or objects: - [[ $? 
-eq 0 ]] && get_vmlinux_size "$tmp" && return 0 - - # Fallback to use iomem - local _size=0 _seg - while read -r _seg; do - _size=$((_size + 0x${_seg#*-} - 0x${_seg%-*})) - done <<< "$(grep -E "Kernel (code|rodata|data|bss)" /proc/iomem | cut -d ":" -f 1)" - echo $_size -} - -do_estimate() -{ - local kdump_mods - local -A large_mods - local baseline - local kernel_size mod_size initrd_size baseline_size runtime_size reserved_size estimated_size recommended_size _cryptsetup_overhead - local size_mb=$((1024 * 1024)) - - setup_initrd - is_system_modified - case "$?" in - 0) - # Nothing to do - ;; - 1) - rebuild_initrd || return - ;; - *) - return - ;; - esac - - kdump_mods="$(lsinitrd "$TARGET_INITRD" -f /usr/lib/dracut/hostonly-kernel-modules.txt | tr '\n' ' ')" - baseline=$(kdump_get_arch_recommend_size) - if [[ ${baseline: -1} == "M" ]]; then - baseline=${baseline%M} - elif [[ ${baseline: -1} == "G" ]]; then - baseline=$((${baseline%G} * 1024)) - elif [[ ${baseline: -1} == "T" ]]; then - baseline=$((${baseline%Y} * 1048576)) - fi - - # The default pre-reserved crashkernel value - baseline_size=$((baseline * size_mb)) - # Current reserved crashkernel size - reserved_size=$(< /sys/kernel/kexec_crash_size) - # A pre-estimated value for userspace usage and kernel - # runtime allocation, 64M should good for most cases - runtime_size=$((64 * size_mb)) - # Kernel image size - kernel_size=$(get_kernel_size "$KDUMP_KERNEL") - # Kdump initramfs size - initrd_size=$(du -b "$TARGET_INITRD" | awk '{print $1}') - # Kernel modules static size after loaded - mod_size=0 - while read -r _name _size _; do - if [[ " $kdump_mods " != *" $_name "* ]]; then - continue - fi - mod_size=$((mod_size + _size)) - - # Mark module with static size larger than 2M as large module - if [[ $((_size / size_mb)) -ge 1 ]]; then - large_mods[$_name]=$_size - fi - done <<< "$(< /proc/modules)" - - # Extra memory usage required for LUKS2 decryption - crypt_size=0 - for _dev in $(get_all_kdump_crypt_dev); 
do - _crypt_info=$(cryptsetup luksDump "/dev/block/$_dev") - [[ $(echo "$_crypt_info" | sed -n "s/^Version:\s*(.*)/\1/p") == "2" ]] || continue - for _mem in $(echo "$_crypt_info" | sed -n "s/\sMemory:\s*(.*)/\1/p" | sort -n -r); do - crypt_size=$((crypt_size + _mem * 1024)) - break - done - done - - if [[ $crypt_size -ne 0 ]]; then - if [[ $(uname -m) == aarch64 ]]; then - _cryptsetup_overhead=50 - else - _cryptsetup_overhead=20 - fi - - crypt_size=$((crypt_size + _cryptsetup_overhead * size_mb)) - echo -e "Encrypted kdump target requires extra memory, assuming using the keyslot with maximum memory requirement\n" - fi - - estimated_size=$((kernel_size + mod_size + initrd_size + runtime_size + crypt_size)) - if [[ $baseline_size -gt $estimated_size ]]; then - recommended_size=$baseline_size - else - recommended_size=$estimated_size - fi - - echo "Reserved crashkernel: $((reserved_size / size_mb))M" - echo "Recommended crashkernel: $((recommended_size / size_mb))M" - echo - echo "Kernel image size: $((kernel_size / size_mb))M" - echo "Kernel modules size: $((mod_size / size_mb))M" - echo "Initramfs size: $((initrd_size / size_mb))M" - echo "Runtime reservation: $((runtime_size / size_mb))M" - [[ $crypt_size -ne 0 ]] && - echo "LUKS required size: $((crypt_size / size_mb))M" - echo -n "Large modules:" - if [[ ${#large_mods[@]} -eq 0 ]]; then - echo " <none>" - else - echo "" - for _mod in "${!large_mods[@]}"; do - echo " $_mod: ${large_mods[$_mod]}" - done - fi - - if [[ $reserved_size -lt $recommended_size ]]; then - echo "WARNING: Current crashkernel size is lower than recommended size $((recommended_size / size_mb))M." - fi -} - get_default_crashkernel() { local _dump_mode=$1 @@ -1823,7 +1642,23 @@ main() show_reserved_mem ;; estimate) - do_estimate + export KDUMP_KERNEL + export TARGET_INITRD + export DEFAULT_INITRD + setup_initrd + is_system_modified + case "$?" 
in + 0) + # Nothing to do + ;; + 1) + rebuild_initrd || return + ;; + *) + return + ;; + esac + exec /lib/kdump/kdump-estimate.sh "$@" ;; get-default-crashkernel) get_default_crashkernel "$2" diff --git a/kexec-tools.spec b/kexec-tools.spec index 5a6648e..9990d92 100644 --- a/kexec-tools.spec +++ b/kexec-tools.spec @@ -39,6 +39,7 @@ Source34: crashkernel-howto.txt Source35: kdump-migrate-action.sh Source36: kdump-restart.sh Source37: 60-fadump.install +Source38: kdump-estimate.sh
####################################### # These are sources for mkdumpramfs @@ -194,6 +195,7 @@ install -m 644 %{SOURCE12} $RPM_BUILD_ROOT%{_mandir}/man8/mkdumprd.8 install -m 644 %{SOURCE25} $RPM_BUILD_ROOT%{_mandir}/man8/kdumpctl.8 install -m 755 %{SOURCE20} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-lib.sh install -m 755 %{SOURCE23} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-lib-initramfs.sh +install -m 755 %{SOURCE38} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-estimate.sh install -m 755 %{SOURCE31} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-logger.sh %ifarch ppc64 ppc64le install -m 755 %{SOURCE32} $RPM_BUILD_ROOT/usr/sbin/mkfadumprd
From: Kairui Song <kasong at redhat.com>
This introduces a new sub-command "kdumpctl estimate --reboot". This sub-command triggers a kdump run and collects memory usage during that run. The collected info can be viewed with "kdumpctl estimate" (without the --reboot).
The estimation supports all regular kdump targets except the raw target. A filesystem is needed to store the collected info. The only way to support a raw dump target would be to include an extra filesystem in the initramfs, which consumes extra memory and makes the result unreliable.
To make it work gracefully and stay compatible with most environments, a temporary file is created as /kdump-estimate. This file holds the estimation status and progress data. A few new services and a systemd trigger are introduced for the implementation:
kdump-estimate.service: This is the boot service. It only starts when the /kdump-estimate file exists. It calls the kdump estimate routine and automates the whole process.
kdump.shutdown: This is the panic trigger. When /kdump-estimate indicates that a panic is pending, this systemd hook triggers the panic after all filesystems are unmounted or remounted read-only, so the panic can be triggered safely without losing data.
kdump-estimate-cleanup.service: This is the failure handler. If anything goes wrong and kdump-estimate.service fails, this cleans everything up.
An estimation run will progress through three stages:
Reboot stage: Temporarily boost crashkernel to a quarter of all available memory, then reboot via kexec so the new value takes effect. A kexec reboot avoids modifying the actual kernel command line. A `--hard-reboot` option is also available in case kexec doesn't work.
Panic stage: After the reboot, a panic is triggered. Extra kernel command line parameters are passed to the kdump kernel to indicate that extra info needs to be collected. The kdump scripts store the info on the dump target temporarily.
Final stage: The system reboots back into its normal state. The kdump estimate service collects the captured info and the estimation is done. The result can then be reviewed with "kdumpctl estimate".
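The three stages can be pictured as a small state machine driven by the stage value saved in /kdump-estimate. The sketch below is an assumption about how such a dispatcher could look; `next_action_for_stage` and the action names are hypothetical, only the stage names "reboot" and "panic" come from the patch.

```shell
#!/bin/bash
# Hypothetical dispatcher: maps the stage recorded before the last reboot
# to what the boot service would do next.
next_action_for_stage() {
    case "$1" in
    reboot)
        # Booted with the boosted crashkernel, so the panic can be armed.
        echo "trigger-panic"
        ;;
    panic)
        # kdump has run, its data is waiting on the dump target.
        echo "collect-result"
        ;;
    *)
        # Missing or unknown stage: treat the run as interrupted.
        echo "cleanup"
        ;;
    esac
}

for stage in reboot panic unknown; do
    echo "$stage -> $(next_action_for_stage "$stage")"
done
```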
Currently, the actual reboot estimation logic is based only on the output of "rd.memdebug". It can later be extended to use "memstrack" to track peak memory usage more accurately.
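As an illustration of what can be derived from memory statistics in /proc/meminfo style, the sketch below tracks the largest value of MemTotal minus MemAvailable across samples. The function name `peak_uncached_usage` and the sample log are hand-written assumptions, not real dracut output, and taking the maximum is just one reasonable reading of "usage"; it is not claimed to match the patch's own analysis.

```shell
#!/bin/bash
# Hypothetical helper: reads meminfo-style lines on stdin and prints the
# largest (MemTotal - MemAvailable) seen, in kB.
peak_uncached_usage() {
    awk '
        /MemTotal:/     { total = $(NF-1) }
        /MemAvailable:/ { used = total - $(NF-1); if (used > peak) peak = used }
        END             { print peak + 0 }
    '
}

# Hand-written sample with two snapshots; values are illustrative.
sample_log='MemTotal:         400000 kB
MemFree:          300000 kB
MemAvailable:     350000 kB
MemTotal:         400000 kB
MemFree:          250000 kB
MemAvailable:     308000 kB'

peak_uncached_usage <<< "$sample_log"
```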
The info looks like this:
```
Reboot estimation:
    Last estimation: Thu Sep 23 03:50:24 AM CST 2021
    Kernel version: 5.13.15-200.fc34.x86_64
    Estimate took: 34s
    Boosted crashkernel: 511M

    Cached memory usage: 70M
    Uncached memory usage: 92M
    Reserved memory: 56M

    Average runtime memory usage: 138M

WARNING: /etc/kdump.conf has changed since last estimation, the result might be outdated.
First kernel based estimation:
    Reserved crashkernel: 511M

    Kernel image size: 34M
    Kernel modules size: 9M
    Initramfs size: 36M
    Runtime reservation: 64M
    Large modules:
        xfs: 1908736

Recommended crashkernel: 181M
```
Signed-off-by: Kairui Song <kasong(a)redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 .editorconfig                  |   2 +-
 dracut-kdump.sh                |  15 +
 kdump-estimate-cleanup.service |   8 +
 kdump-estimate.service         |  11 +
 kdump-estimate.sh              | 626 ++++++++++++++++++++++++++++++++-
 kdump-lib.sh                   |   9 +-
 kdump.shutdown                 |  13 +
 kexec-tools.spec               |  17 +
 8 files changed, 680 insertions(+), 21 deletions(-)
 create mode 100644 kdump-estimate-cleanup.service
 create mode 100644 kdump-estimate.service
 create mode 100644 kdump.shutdown
diff --git a/.editorconfig b/.editorconfig
index b343f27..0f28411 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -18,7 +18,7 @@ binary_next_line = false
 space_redirects = true
 
 # Some scripts will only run with bash
-[{mkfadumprd,mkdumprd,kdumpctl,kdump-lib.sh}]
+[{mkfadumprd,mkdumprd,kdumpctl,kdump-lib.sh,kdump-estimate.sh}]
 shell_variant = bash
 
 # Use dracut code style for *-module-setup.sh
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 1fc2231..2807088 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -188,6 +188,14 @@ dump_fs()
         return 1
     fi
 
+    _estimate_dir=$(getarg kdump_estimate_dir=)
+    if [ -n "$_estimate_dir" ]; then
+        _estimate_dir=$1/$KDUMP_PATH/$_estimate_dir
+        dinfo "saving estimation result to $_estimate_dir/"
+        mkdir -p "$_estimate_dir"
+        ln -sfr "$_dump_fs_path/kexec-dmesg.log" "$_estimate_dir/kexec-dmesg.log"
+    fi
+
     # improper kernel cmdline can cause the failure of echo, we can ignore this kind of failure
     return 0
 }
@@ -437,6 +445,13 @@ dump_ssh()
         derror "saving vmcore failed, exitcode:$_ret"
     fi
 
+    _estimate_dir=$(getarg kdump_estimate_dir=)
+    if [ -n "$_estimate_dir" ]; then
+        _estimate_dir="$KDUMP_PATH/$_estimate_dir"
+        dinfo "saving estimation result to $2:$_estimate_dir/"
+        ssh $_ssh_opt "$2" "mkdir -p '$_estimate_dir' && ln -sfr '$_ssh_dir/kexec-dmesg.log' '$_estimate_dir/kexec-dmesg.log'"
+    fi
+
     return $_ret
 }
 
diff --git a/kdump-estimate-cleanup.service b/kdump-estimate-cleanup.service new file mode 100644 index 0000000..3b5a0bd --- /dev/null +++ b/kdump-estimate-cleanup.service @@ -0,0 +1,8 @@ +[Unit] +Description=Kdump crash memory usage estimation failed +DefaultDependencies=no + +[Service] +Type=oneshot +ExecStart=/usr/lib/kdump/kdump-estimate.sh stage-clean +StandardOutput=journal+console diff --git a/kdump-estimate.service b/kdump-estimate.service new file mode 100644 index 0000000..b681d4b --- /dev/null +++ b/kdump-estimate.service @@ -0,0 +1,11 @@ +[Unit] +Description=Kdump crash memory usage estimation +ConditionPathExists=/kdump-estimate +After=kdump.service network.target network-online.target remote-fs.target basic.target +OnFailure=kdump-estimate-cleanup.service +DefaultDependencies=no + +[Service] +Type=oneshot +ExecStart=/usr/lib/kdump/kdump-estimate.sh stage-check +StandardOutput=journal+console diff --git a/kdump-estimate.sh b/kdump-estimate.sh index 063a6d2..fbd6d16 100755 --- a/kdump-estimate.sh +++ b/kdump-estimate.sh @@ -1,14 +1,68 @@ #!/bin/bash +# Kdump memory usage estimation
[[ $dracutbasedir ]] || dracutbasedir=/usr/lib/dracut . $dracutbasedir/dracut-functions.sh . /lib/kdump/kdump-lib.sh . /lib/kdump/kdump-logger.sh
+KEXEC=/sbin/kexec +KEXEC_ARGS=() + +DEFAULT_SSH_KEY_PATH="/root/.ssh/kdump_id_rsa" +DEFAULT_SAVE_PATH=/var/crash + +ESTIMATE_REBOOT=0 +ESTIMATE_KEXEC_REBOOT=1 + +ESTIMATE_RESULTS_DIR="/var/lib/kdump/kdump-estimate/" +ESTIMATE_TMPDIR="" +ESTIMATE_TMPMNT="" + +# File used to track estimate progress +# File does not exist: not in a estiamtion process. +# File is not empty: in a estimation process and status are dumped in this file +ESTIMATE_STATUS_FILE=/kdump-estimate + +# Values that will be stored in ESTIMATE_STATUS_FILE: +# Which stage the estimation process is in (reboot/panic/result) +ESTIMATE_STAGE= +# The temporary dir on dump target used to collect info +ESTIMATE_DIR= +# Temporary boosted crashkernel value +ESTIMATE_MEMORY= +# The kernel being estimated +ESTIMATE_KERNEL= +# The size of initramfs used for estimate +ESTIMATE_INITRD_SIZE= +# The time when estimating started +ESTIMATE_START_TIMESTAMP= +# The time when panic is triggered +ESTIMATE_PANIC_TIMESTAMP= +# The time when estimating started +ESTIMATE_END_TIMESTAMP= +# Store original crashkernel= value, only used in hard reboot mode +ESTIMATE_ORIG_CRASHKERNEL= +# The kernel boot entry being estimated, only used in hard reboot mode +ESTIMATE_KERNEL_ENTRY= + +ESTIMATE_STATUS_KEYS=( + ESTIMATE_STAGE + ESTIMATE_DIR + ESTIMATE_MEMORY + ESTIMATE_KERNEL + ESTIMATE_START_TIMESTAMP + ESTIMATE_PANIC_TIMESTAMP + ESTIMATE_END_TIMESTAMP + ESTIMATE_ORIG_CRASHKERNEL + ESTIMATE_KERNEL_ENTRY +) + if [[ -f /etc/sysconfig/kdump ]]; then . /etc/sysconfig/kdump fi
+# initiate the kdump logger if ! dlog_init; then echo "failed to initiate the kdump logger." exit 1 @@ -62,7 +116,7 @@ get_kernel_size()
tmp=$(mktemp /tmp/vmlinux-XXX) trap 'rm -f "$tmp"' 0 - + # Try to check if it's a vmlinux already check_vmlinux "$img" && get_vmlinux_size "$img" && return 0
@@ -86,15 +140,458 @@ get_kernel_size() echo $_size }
-do_estimate_simple() +save_estimate_status() +{ + touch "$ESTIMATE_STATUS_FILE" + chmod 0600 "$ESTIMATE_STATUS_FILE" + + { + for _key in "${ESTIMATE_STATUS_KEYS[@]}"; do + echo "$_key=${!_key}" + done + } > "$ESTIMATE_STATUS_FILE" + + sync +} + +load_estimate_status() +{ + local _file=$1 _key _val + _file=${_file:-$ESTIMATE_STATUS_FILE} + + if [[ -s $_file ]]; then + while IFS="=" read -r _key _val; do + if [[ " ${ESTIMATE_STATUS_KEYS[*]} " == *" $_key "* ]]; then + declare -g "$_key"="$_val" + else + derror "Unknown kdump estimate status '$_key'" + fi + done <<< "$(< "$_file")" + else + derror "Failed to read estimate status file $_file" + return 1 + fi +} + +clear_estimate_status() +{ + mv "$ESTIMATE_STATUS_FILE" "$ESTIMATE_STATUS_FILE.old" + + sync +} + +set_boot_crashkernel() +{ + local _kernel_entry=$1 _ck_value=$2 + + ESTIMATE_ORIG_CRASHKERNEL=$(grubby --info "$_kernel_entry" | sed -n -e 's/^args=.*(crashkernel=\S*).*"$/\1/p') + ESTIMATE_KERNEL_ENTRY=$_kernel_entry + + grubby --args crashkernel="$_ck_value" --update-kernel="$_kernel_entry" + + save_estimate_status +} + +# Restore crashkernel value for the default boot kernel +restore_boot_crashkernel() +{ + if ! [[ $ESTIMATE_KERNEL_ENTRY ]]; then + return 0 + fi + + if [[ $ESTIMATE_ORIG_CRASHKERNEL ]]; then + dinfo "Restoring crashkernel= kernel parameter to original value: '$ESTIMATE_ORIG_CRASHKERNEL'." + grubby --args "$ESTIMATE_ORIG_CRASHKERNEL" --update-kernel "$ESTIMATE_KERNEL_ENTRY" + else + dinfo "Removing crashkernel= kernel parameter." + grubby --remove-args "crashkernel" --update-kernel "$ESTIMATE_KERNEL_ENTRY" + fi + + ESTIMATE_ORIG_CRASHKERNEL= + ESTIMATE_KERNEL_ENTRY= + + save_estimate_status +} + +is_in_estimate_process() +{ + [[ -s $ESTIMATE_STATUS_FILE ]] +} + +ensure_service_enabled() +{ + if ! systemctl is-enabled kdump-estimate &> /dev/null; then + derror "kdump-estimate.service have to be enabled in systemd." 
+ + clear_estimate_status + + exit 1 + fi +} + +prepare_tempdir() { + [[ -n $ESTIMATE_TMPDIR ]] && return 0 + + ESTIMATE_TMPDIR="$(mktemp -d -t kdump-estimate.XXXXXX)" + [ -d "$ESTIMATE_TMPDIR" ] || perror_exit "kdump-estimate: mktemp -p -d -t kdump-estimate.XXXXXX failed." + ESTIMATE_TMPMNT="$ESTIMATE_TMPDIR/target" + + trap ' + ret=$?; + is_mounted $ESTIMATE_TMPMNT && umount -f $ESTIMATE_TMPMNT; + [[ -d $ESTIMATE_TMPDIR ]] && rm --one-file-system -rf -- "$ESTIMATE_TMPDIR"; + exit $ret; + ' EXIT + + # clean up after ourselves no matter how we die. + trap 'exit 1;' SIGINT +} + +check_user_confirm() +{ + local _confirm + + dwarn "continue? (y/n):" + read -r _confirm + while [[ $_confirm != 'y' ]]; do + if [[ $_confirm == 'n' ]]; then + exit 0 + else + echo "Please input (y/n):" + read -r _confirm + fi + done + + return 0 +} +start_staged_reboot_estimate() +{ + local _initrd _confirm _memory _ck_memory _cur_ck_memory + local _kdump_kernel _kexec_cmdline _def_kernel + + ensure_service_enabled + + dwarn "WARNING: This will reboot current system and trigger a panic to" + dwarn " estimate real kdump memory usage, this may take a while." + check_user_confirm + + if [[ $(grep -o crashkernel /proc/cmdline | wc -l) -gt 1 ]]; then + if [[ $ESTIMATE_KEXEC_REBOOT -ne 1 ]]; then + derror "Multiple crashkernel value is being used, hard reboot estimation" + derror "is not support with such config yet." + exit 1 + fi + fi + + if is_raw_dump_target; then + dwarn "ERROR: estimate result for raw dump target is not reliable, kdump requires" + dwarn " a fs storage for the estimation result in capture kernel." + exit 1 + fi + + if [[ $(get_luks_crypt_dev "$(kdump_get_maj_min "$(get_root_fs_device)")") ]] || + [[ $(get_all_kdump_crypt_dev) ]]; then + dwarn "WARNING: encrypted device is in use, you will have to input " + dwarn " the password manually after reboot." 
+ check_user_confirm + fi + + # Check and gather memory info + _memory=$(get_system_size_in_bytes) + _ck_memory=$((_memory / 1024 / 1024 / 4)) + + _cur_ck_memory=$(< /sys/kernel/kexec_crash_size) + _cur_ck_memory=$((_cur_ck_memory / 1024 / 1024)) + + dinfo "Available system RAM: $((_memory / 1024 / 1024))MB" + if [[ $_ck_memory -gt 16384 ]]; then + _ck_memory=16384 + fi + + if [[ $_ck_memory -lt 256 ]]; then + derror "System RAM is too small to run an estimation." + exit 1 + fi + + if [[ $_ck_memory -lt $_cur_ck_memory ]]; then + _ck_memory=$_cur_ck_memory + fi + + dinfo "Will use crashkernel=${_ck_memory}M for estimation." + + # Check and gather kernel and boot info + # Following two environment variables are prepared by prepare_kdump_bootinfo + _initrd=$DEFAULT_INITRD + _kdump_kernel=$KDUMP_KERNEL + + dinfo "Kdump kernel is: $_kdump_kernel" + if [[ $ESTIMATE_KEXEC_REBOOT == 1 ]]; then + + _kexec_cmdline=$(sed -e 's/(\s+|^)crashkernel=\S*(\s+|$)//g' /proc/cmdline) + _kexec_cmdline="$_kexec_cmdline crashkernel=${_ck_memory}M" + + if is_secure_boot_enforced; then + dinfo "Secure Boot is enabled. Using kexec file based syscall." + KEXEC_ARGS+=("-s") + fi + else + _def_kernel=$(grubby --default-kernel) + dinfo "Hard reboot mode is enabled, default boot kernel is: $_def_kernel" + + if [[ $_kdump_kernel != "$_def_kernel" ]]; then + dwarn "Default boot kernel is not the kdump kernel, estimation might be unreliable." + check_user_confirm + fi + fi + + ESTIMATE_START_TIMESTAMP=$(date +%s%N) + ESTIMATE_STAGE="reboot" + ESTIMATE_DIR=".$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 12)" + ESTIMATE_MEMORY=$((_ck_memory * 1024)) + + # rebuild the initramfs + kdumpctl rebuild || exit $? + + # ensure kdump service works and fail early if not + kdumpctl restart || exit $? 
+ + # Kdump initramfs size + ESTIMATE_INITRD_SIZE=$(du -b "$TARGET_INITRD" | awk '{print $1}') + + save_estimate_status + + # double check, in case user touched the service + ensure_service_enabled + + if [[ $ESTIMATE_KEXEC_REBOOT == 1 ]]; then + "$KEXEC" --command-line "$_kexec_cmdline" --initrd "$_initrd" --load "$_kdump_kernel" "${KEXEC_ARGS[@]}" + + dinfo "Rebooting with kexec to apply updated crashkernel kernel parameter." + systemctl kexec + else + set_boot_crashkernel "$_def_kernel" "${_ck_memory}M" + + dinfo "Rebooting to apply updated crashkernel kernel parameter." + reboot + fi +} + +retrive_estimate_result_fs() +{ + local _target=$1 _fstype=$2 _opt=$3 _estimate_dir=$4 _dest=$5 + local _mnt _path + + if ! is_mounted "$_target"; then + _mnt="$ESTIMATE_TMPMNT" + mkdir -p "$_mnt" + mount "$_target" "$_mnt" -t "$_fstype" -o defaults || mount_failure "$_target" "" "$_fstype" + else + _mnt="$(get_mntpoint_from_target "$_target")" + fi + + _path=$(get_save_path) + + # Currently just retrive the kexec dmesg for analyze + cat "$_mnt/$_path/$_estimate_dir/kexec-dmesg.log" > "$_dest/kexec-dmesg.log" + + rm -rf "${_mnt:?}/$_path/$_estimate_dir/" +} + +retrive_estimate_result_ssh() +{ + local _target=$1 _estimate_dir=$2 _dest=$3 + local _key _path _ssh_opt + + _key=$(kdump_get_conf_val sshkey) + if ! [[ -f $_key ]]; then + _key="/root/.ssh/kdump_id_rsa" + if ! [[ -f $_key ]]; then + derror "Default SSH key '$_key' doesn't exist, no available key to try, exiting." + exit 1 + fi + fi + + _ssh_opt=("-i" "$_key" "-o" "BatchMode=yes" "-o" "StrictHostKeyChecking=yes") + _path=$(get_save_path) + + # Currently just retrive the kexec dmesg for analyze + if ! 
ssh "${_ssh_opt[@]}" "$_target" cat "$_path/$_estimate_dir/kexec-dmesg.log" > "$_dest/kexec-dmesg.log"; then
+		derror "Failed to retrieve estimate result over ssh on '$_target'"
+		exit 1
+	fi
+
+	# Remove the temporary estimate directory from the remote dump target
+	ssh "${_ssh_opt[@]}" "$_target" "rm -rf '${_path:?}/$_estimate_dir'"
+}
+
+staged_estimate_panic()
+{
+	dinfo "Preparing to trigger a panic."
+	dinfo "Kdump will generate extra data in '$ESTIMATE_DIR' on the dump target."
+
+	KDUMP_COMMANDLINE_EXTRA="rd.memdebug=4 kdump_estimate_dir=$ESTIMATE_DIR" kdumpctl restart || exit $?
+
+	ESTIMATE_STAGE="panic"
+	ESTIMATE_PANIC_TIMESTAMP=$(date +%s%N)
+	ESTIMATE_KERNEL=$(uname -r)
+	save_estimate_status
+
+	# Restore the boot crashkernel value asap
+	restore_boot_crashkernel
+
+	# The real panic will be triggered by
+	# /usr/lib/systemd/system-shutdown/kdump.shutdown
+	dinfo "Triggering a kernel panic."
+	systemctl halt
+}
+
+analyze_result()
+{
+	local _result_dir=$1
+	local _result_dmesg=$_result_dir/kexec-dmesg.log
+	local _cached_usage _uncached_usage _reserve_usage
+
+	local _memfree _memavail _memtotal
+	local _line _i
+
+	while read -r _line; do
+		case $_line in
+		*MemTotal:*)
+			_memtotal=$(echo "$_line" | awk '{print $(NF-1)}')
+			_reserve_usage=$((ESTIMATE_MEMORY - _memtotal))
+			;;
+		*MemFree:*)
+			_memfree=$(echo "$_line" | awk '{print $(NF-1)}')
+			if [[ -n $_memtotal ]]; then
+				_i=$((_memtotal - _memfree))
+				if [[ $_i -lt $_cached_usage ]] || [[ -z $_cached_usage ]]; then
+					_cached_usage="$_i"
+				fi
+			fi
+			;;
+		*MemAvailable:*)
+			_memavail=$(echo "$_line" | awk '{print $(NF-1)}')
+			if [[ -n $_memtotal ]]; then
+				_i=$((_memtotal - _memavail))
+				if [[ $_i -lt $_uncached_usage ]] || [[ -z $_uncached_usage ]]; then
+					_uncached_usage="$_i"
+				fi
+			fi
+			;;
+		esac
+	# -F matches the literal "[debug_mem]" tag instead of a bracket expression
+	done <<< "$(grep -F -A 3 "[debug_mem]" "$_result_dmesg")"
+
+	_datedir=$(date +%Y-%m-%d-%T)
+	_estimate_dir=$ESTIMATE_RESULTS_DIR/$_datedir
+
+	mkdir -p "$_estimate_dir"
+	cp $ESTIMATE_STATUS_FILE "$_estimate_dir/status"
+	cp $KDUMP_CONFIG_FILE "$_estimate_dir/kdump.conf"
+	cp
"$_result_dmesg" "$_estimate_dir/dmesg" + echo "$_cached_usage $_uncached_usage $_reserve_usage" > "$ESTIMATE_TMPDIR/usage" + cp "$ESTIMATE_TMPDIR/usage" "$_estimate_dir/usage" + + rm -rf $ESTIMATE_RESULTS_DIR/latest + ln -sfr "$_estimate_dir" $ESTIMATE_RESULTS_DIR/latest +} + +staged_estimate_collect_result() +{ + local _target _fstype _opt _estimate_dir _temp_dest _ret + + ESTIMATE_STAGE="result" + ESTIMATE_END_TIMESTAMP=$(date +%s%N) + save_estimate_status + + prepare_tempdir + _estimate_dir=$ESTIMATE_DIR + _temp_dest=$ESTIMATE_TMPDIR + + if is_mount_in_dracut_args; then + local _dracut_args + + _dracut_args=$(kdump_get_conf_val dracut_args) + _target=$(get_dracut_args_target "$_dracut_args") + _fstype=$(get_dracut_args_fstype "$_dracut_args") + _opt=$(get_dracut_args_fsopts "$_dracut_args") + + retrive_estimate_result_fs "$_target" "$_fstype" "$_opt" "$_estimate_dir" "$_temp_dest" + + elif is_nfs_dump_target; then + _target=$(kdump_get_conf_val "nfs|nfs4") + + retrive_estimate_result_fs "$_target" nfs defaults "$_estimate_dir" "$_temp_dest" + elif is_ssh_dump_target; then + _target=$(kdump_get_conf_val "ssh") + + retrive_estimate_result_ssh "$_target" "$_estimate_dir" "$_temp_dest" + else + _target=$(get_block_dump_target) + + if is_raw_dump_target; then + derror "Unexpected error, unsupported dump target." + return 1 + fi + + _opt=$(get_mntopt_from_target "$_target") + _fstype=$(get_fs_type_from_target "$_target") + + retrive_estimate_result_fs "$_target" "$_fstype" "$_opt" "$_estimate_dir" "$_temp_dest" + fi + + if ! [[ -s "$_temp_dest/kexec-dmesg.log" ]]; then + derror "Failed to retrieve kexec-dmesg.log file for estimation." + return 1 + fi + + analyze_result "$_temp_dest" + _ret=$? + + cleanup_estiamte_stage + return $_ret +} + +cleanup_estiamte_stage() +{ + restore_boot_crashkernel + clear_estimate_status +} + +progress_staged_estimate() +{ + if ! load_estimate_status; then + derror "The estimate process is interrupted unexpectedly." 
+ cleanup_estiamte_stage + fi + + if [[ $ESTIMATE_STAGE == "reboot" ]]; then + staged_estimate_panic + elif [[ $ESTIMATE_STAGE == "panic" ]]; then + staged_estimate_collect_result + else + derror "Unknown estimate stage: '$ESTIMATE_STAGE'" + cleanup_estiamte_stage + fi + + return $? +} + + +estimate_report() +{ + ### + ### File based estimation report + ### local kdump_mods local -A large_mods local baseline local kernel_size mod_size initrd_size baseline_size runtime_size reserved_size estimated_size recommended_size _cryptsetup_overhead - local size_mb=$((1024 * 1024)) + local size_mb=$((1024 * 1024)) size_kb=1024 + + if ! [[ -f $TARGET_INITRD ]]; then + derror "kdumpctl estimate: kdump initramfs is not built yet." + exit 1 + fi
- # TODO: fix TARGET_INITRD reference kdump_mods="$(lsinitrd "$TARGET_INITRD" -f /usr/lib/dracut/hostonly-kernel-modules.txt | tr '\n' ' ')" baseline=$(kdump_get_arch_recommend_size) if [[ ${baseline: -1} == "M" ]]; then @@ -102,7 +599,7 @@ do_estimate_simple() elif [[ ${baseline: -1} == "G" ]]; then baseline=$((${baseline%G} * 1024)) elif [[ ${baseline: -1} == "T" ]]; then - baseline=$((${baseline%Y} * 1048576)) + baseline=$((${baseline%T} * 1048576)) fi
# The default pre-reserved crashkernel value @@ -149,28 +646,82 @@ do_estimate_simple() fi
 		crypt_size=$((crypt_size + _cryptsetup_overhead * size_mb))
-		echo -e "Encrypted kdump target requires extra memory, assuming using the keyslot with maximum memory requirement\n"
+		echo -e "NOTE: Encrypted kdump target requires extra memory, assuming using the keyslot with maximum memory requirement\n"
+	fi
+
+	###
+	### Reboot based estimation report
+	###
+	local reboot_estimate_dir=$ESTIMATE_RESULTS_DIR/latest
+	local uncached_usage cached_usage reserve_usage reboot_estimate_size=0
+
+	if ! [[ -d $reboot_estimate_dir ]]; then
+		reboot_estimate_dir=""
+	else
+		load_estimate_status "$reboot_estimate_dir/status"
+
+		# Field order matches what analyze_result writes: cached, uncached, reserved (all in KB)
+		read -r cached_usage uncached_usage reserve_usage < "$reboot_estimate_dir/usage"
+		# ESTIMATE_INITRD_SIZE comes from `du -b` (bytes), so convert it to KB before summing
+		reboot_estimate_size=$(((uncached_usage + cached_usage) / 2 + ESTIMATE_INITRD_SIZE / size_kb + reserve_usage))
+		reboot_estimate_size=$((reboot_estimate_size * size_kb))
 	fi
 	estimated_size=$((kernel_size + mod_size + initrd_size + runtime_size + crypt_size))
-	if [[ $baseline_size -gt $estimated_size ]]; then
+	if [[ $reboot_estimate_size -gt $estimated_size ]]; then
+		recommended_size=$reboot_estimate_size
+	else
+		recommended_size=$estimated_size
+	fi
+
+	# There is a usage peak while the initramfs is being unpacked:
+	# two copies of the squashed content exist at the same time.
+	# This can be removed if there is a way to avoid the peak usage.
+	# TODO: fadump doesn't use squash.
+	recommended_size=$(( recommended_size + initrd_size ))
+
+	if [[ $baseline_size -gt $recommended_size ]]; then
 		recommended_size=$baseline_size
+	fi
+
+	echo "Reboot estimation:"
+	if [[ $reboot_estimate_dir ]]; then
+		echo " Last estimation: $(date -d "@${ESTIMATE_START_TIMESTAMP:0:10}")"
+		echo " Kernel version: $ESTIMATE_KERNEL"
+		echo " Estimate took: $(((ESTIMATE_END_TIMESTAMP - ESTIMATE_PANIC_TIMESTAMP) / 1000 / 1000 / 1000))s"
+		echo " Boosted crashkernel: $((ESTIMATE_MEMORY / size_kb))M"
+		echo
+		echo " Cached memory usage: $((cached_usage / size_kb))M"
+		echo " Uncached memory usage: $((uncached_usage / size_kb))M"
+		echo " Reserved memory: $((reserve_usage / size_kb))M"
+		echo " Note: Reserved memory may be negative: when crashkernel is reserved above 4G, an extra 256M of low memory is also reserved for DMA buffers and swiotlb."
+		echo
+		echo " Average runtime memory usage: $(((reboot_estimate_size + size_mb - 1) / size_mb))M"
+		echo
+
+		if [[ -n $(diff $KDUMP_CONFIG_FILE "$reboot_estimate_dir/kdump.conf" -q) ]] ;then
+			echo " WARNING: $KDUMP_CONFIG_FILE has changed since last estimation, the result might be outdated."
+		fi
+		if [[ $(uname -r) != "$ESTIMATE_KERNEL" ]] ;then
+			echo " WARNING: Kernel version has changed since last estimation, the result might be outdated."
+		fi
+
 	else
-		recommended_size=$estimated_size
+		echo " No result available."
+		# Backticks are escaped so the command is printed, not executed
+		echo " Use \`kdumpctl estimate --reboot\` to do a reboot estimation."
+		echo
 	fi
- echo "Reserved crashkernel: $((reserved_size / size_mb))M" - echo "Recommended crashkernel: $((recommended_size / size_mb))M" + echo "First kernel based estimation:" + echo " Reserved crashkernel: $((reserved_size / size_mb))M" echo - echo "Kernel image size: $((kernel_size / size_mb))M" - echo "Kernel modules size: $((mod_size / size_mb))M" - echo "Initramfs size: $((initrd_size / size_mb))M" - echo "Runtime reservation: $((runtime_size / size_mb))M" + echo " Kernel image size: $((kernel_size / size_mb))M" + echo " Kernel modules size: $((mod_size / size_mb))M" + echo " Initramfs size: $((initrd_size / size_mb))M" + echo " Runtime reservation: $((runtime_size / size_mb))M" [[ $crypt_size -ne 0 ]] && - echo "LUKS required size: $((crypt_size / size_mb))M" - echo -n "Large modules:" + echo " LUKS required size: $((crypt_size / size_mb))M" + echo -n " Large modules:" if [[ ${#large_mods[@]} -eq 0 ]]; then - echo " <none>" + echo " <none>" else echo "" for _mod in "${!large_mods[@]}"; do @@ -179,8 +730,47 @@ do_estimate_simple() fi
if [[ $reserved_size -le $recommended_size ]]; then + echo + echo "Recommended crashkernel: $((recommended_size / size_mb))M" + echo + fi + # Leave a 1MB margin + if [[ $(( recommended_size - reserved_size )) -gt $size_mb ]]; then echo "WARNING: Current crashkernel size is lower than recommended size $((recommended_size / size_mb))M." fi }
-do_estimate_simple +case $1 in +estimate) + shift + while [[ $# -ne 0 ]]; do + case $1 in + --reboot) + ESTIMATE_REBOOT=1 + ;; + --hard-reboot) + ESTIMATE_KEXEC_REBOOT=0 + ;; + *) + derror "Unrecognized argument $1" + exit 1 + ;; + esac + shift + done + + if [[ $ESTIMATE_REBOOT -eq 1 ]]; then + start_staged_reboot_estimate + else + estimate_report + fi + ;; + +stage-check) + progress_staged_estimate || exit $? + ;; + +stage-clean) + cleanup_estiamte_stage || exit $? + ;; +esac \ No newline at end of file diff --git a/kdump-lib.sh b/kdump-lib.sh index 4290be3..1bed87e 100755 --- a/kdump-lib.sh +++ b/kdump-lib.sh @@ -787,11 +787,16 @@ prepare_cmdline() }
 PROC_IOMEM=/proc/iomem
+#get system memory size i.e. memblock.memory.total_size in the unit of bytes
+get_system_size_in_bytes()
+{
+	sum=$(sed -n "s/\s*\([0-9a-fA-F]\+\)-\([0-9a-fA-F]\+\) : System RAM$/+ 0x\2 - 0x\1 + 1/p" $PROC_IOMEM)
+	echo $((sum))
+}
 #get system memory size i.e. memblock.memory.total_size in the unit of GB
 get_system_size()
 {
-	sum=$(sed -n "s/\s*\([0-9a-fA-F]\+\)-\([0-9a-fA-F]\+\) : System RAM$/+ 0x\2 - 0x\1 + 1/p" $PROC_IOMEM)
-	echo $(( (sum) / 1024 / 1024 / 1024))
+	echo $(( $(get_system_size_in_bytes) / 1024 / 1024 / 1024))
 }
 # Return the recommended size for the reserved crashkernel memory
diff --git a/kdump.shutdown b/kdump.shutdown
new file mode 100644
index 0000000..19ed798
--- /dev/null
+++ b/kdump.shutdown
@@ -0,0 +1,13 @@
+#!/bin/sh
+# Trigger a panic for estimation if an estimation is in progress
+
+ESTIMATE_STATUS_FILE=/kdump-estimate
+
+[ -s "$ESTIMATE_STATUS_FILE" ] || exit 0
+
+. "$ESTIMATE_STATUS_FILE"
+
+if [ "$ESTIMATE_STAGE" = "panic" ]; then
+	echo 1 > /proc/sys/kernel/sysrq
+	echo c > /proc/sysrq-trigger
+fi
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 9990d92..ae98759 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -40,6 +40,9 @@ Source35: kdump-migrate-action.sh
 Source36: kdump-restart.sh
 Source37: 60-fadump.install
 Source38: kdump-estimate.sh
+Source39: kdump.shutdown
+Source40: kdump-estimate.service
+Source41: kdump-estimate-cleanup.service
####################################### # These are sources for mkdumpramfs @@ -180,6 +183,7 @@ mkdir -p -m755 $RPM_BUILD_ROOT%{_bindir} mkdir -p -m755 $RPM_BUILD_ROOT%{_libdir} mkdir -p -m755 $RPM_BUILD_ROOT%{_prefix}/lib/kdump mkdir -p -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/kdump +mkdir -p -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/kdump/kdump-estimate install -m 755 %{SOURCE1} $RPM_BUILD_ROOT%{_bindir}/kdumpctl
install -m 755 build/sbin/kexec $RPM_BUILD_ROOT/usr/sbin/kexec @@ -216,9 +220,17 @@ install -m 755 -D %{SOURCE37} $RPM_BUILD_ROOT%{_prefix}/lib/kernel/install.d/60- %endif install -m 644 %{SOURCE15} $RPM_BUILD_ROOT%{_mandir}/man5/kdump.conf.5 install -m 644 %{SOURCE16} $RPM_BUILD_ROOT%{_unitdir}/kdump.service +install -m 644 %{SOURCE40} $RPM_BUILD_ROOT%{_unitdir}/kdump-estimate.service +install -m 644 %{SOURCE41} $RPM_BUILD_ROOT%{_unitdir}/kdump-estimate-cleanup.service install -m 755 -D %{SOURCE22} $RPM_BUILD_ROOT%{_prefix}/lib/systemd/system-generators/kdump-dep-generator.sh install -m 755 -D %{SOURCE30} $RPM_BUILD_ROOT%{_prefix}/lib/kernel/install.d/60-kdump.install install -m 755 -D %{SOURCE33} $RPM_BUILD_ROOT%{_prefix}/lib/kernel/install.d/92-crashkernel.install +install -m 755 -D %{SOURCE39} $RPM_BUILD_ROOT%{_prefix}/lib/systemd/system-shutdown/kdump.shutdown + +mkdir -p $RPM_BUILD_ROOT%{_unitdir}/multi-user.target.wants/ +pushd $RPM_BUILD_ROOT%{_unitdir}/multi-user.target.wants/ +ln -sr ../kdump-estimate.service +popd
%ifarch %{ix86} x86_64 ppc64 s390x ppc64le aarch64 install -m 755 makedumpfile-%{mkdf_ver}/makedumpfile $RPM_BUILD_ROOT/usr/sbin/makedumpfile @@ -371,6 +383,7 @@ fi %dir %{_sysconfdir}/kdump/pre.d %dir %{_sysconfdir}/kdump/post.d %dir %{_sharedstatedir}/kdump +%dir %{_sharedstatedir}/kdump/kdump-estimate %{_mandir}/man8/kdumpctl.8.gz %{_mandir}/man8/kexec.8.gz %ifarch %{ix86} x86_64 ppc64 s390x ppc64le aarch64 @@ -380,7 +393,11 @@ fi %{_mandir}/man8/vmcore-dmesg.8.gz %{_mandir}/man5/* %{_unitdir}/kdump.service +%{_unitdir}/kdump-estimate.service +%{_unitdir}/kdump-estimate-cleanup.service +%{_unitdir}/multi-user.target.wants/kdump-estimate.service %{_prefix}/lib/systemd/system-generators/kdump-dep-generator.sh +%{_prefix}/lib/systemd/system-shutdown/kdump.shutdown %{_prefix}/lib/kernel/install.d/60-kdump.install %{_prefix}/lib/kernel/install.d/92-crashkernel.install %doc News
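Side note for reviewers: the get_system_size_in_bytes() helper added to kdump-lib.sh above turns every "System RAM" range of /proc/iomem into an arithmetic term and lets the shell sum them. Below is a minimal standalone sketch of that technique. The function name sum_system_ram and the sample ranges are made up for illustration only; the real helper reads $PROC_IOMEM directly.

```shell
#!/bin/bash
# sum_system_ram is a hypothetical name used only in this sketch.
# It prints the total size in bytes of all "System RAM" ranges in the
# iomem-style file given as $1.
sum_system_ram()
{
	local _iomem=$1 _expr

	# "START-END : System RAM" becomes "+ 0xEND - 0xSTART + 1";
	# tr joins the per-line terms into a single expression.
	_expr=$(sed -n 's/^\s*\([0-9a-fA-F]\+\)-\([0-9a-fA-F]\+\) : System RAM$/+ 0x\2 - 0x\1 + 1/p' "$_iomem" | tr '\n' ' ')

	# The leading 0 keeps the expression valid when nothing matched
	echo $((0 $_expr))
}

# Illustrative sample data, not captured from a real machine
sample=$(mktemp)
cat > "$sample" << 'EOF'
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
00100000-3fffffff : System RAM
  01000000-01ffffff : Kernel code
EOF

sum_system_ram "$sample"    # prints 1073344512
rm -f "$sample"
```

Only top-level "System RAM" lines contribute; the nested "Kernel code" entry and the "Reserved" range are skipped because they do not match the pattern. Note that /proc/iomem shows zeroed addresses to unprivileged users, so the real helper only returns a meaningful value when run as root.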
From: Kairui Song <kasong at redhat.com>
Add docs about the new reboot estimation.
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
---
 crashkernel-howto.txt | 100 ++++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 22 deletions(-)
diff --git a/crashkernel-howto.txt b/crashkernel-howto.txt
index 54e1141..ae5a6a3 100644
--- a/crashkernel-howto.txt
+++ b/crashkernel-howto.txt
@@ -87,34 +87,90 @@ NOTE: On s390x you also need to run zipl for the change to take effect.
 Estimate crashkernel
 ====================
+Estimate manually
+-----------------
+
 The best way to estimate a usable crashkernel value is by testing kdump
-manually. And you can set crashkernel to a large value, then adjust the
-crashkernel value to an acceptable value gradually.
+manually. You can set crashkernel to a large value, then adjust the crashkernel
+value gradually to an acceptable value.
+
+You can use /usr/lib/modules/$(uname -r)/crashkernel.default as a reference and
+adjust the crashkernel value based on that.
+
+Some components consume a significantly large amount of memory for kdump,
+e.g. AMD SME, LUKS2 with argon2. You may need to increase kdump memory to an
+obviously larger value in such a case, and unfortunately, there is no good
+solution yet.
-`kdumpctl` also provides a sub-command for doing rough estimating without
-triggering kdump:
+kdumpctl assisted estimation
+----------------------------
+
+`kdumpctl` also provides a sub-command to help you estimate the kdump
+crashkernel usage:
`kdumpctl estimate`
-The output will be like this:
+Running `kdumpctl estimate` directly generates a report like this:
 ```
-    Encrypted kdump target requires extra memory, assuming using the keyslot with minimum memory requirement
-
-    Reserved crashkernel: 256M
-    Recommended crashkernel: 655M
-
-    Kernel image size: 47M
-    Kernel modules size: 12M
-    Initramfs size: 19M
-    Runtime reservation: 64M
-    LUKS required size: 512M
-    Large modules:
-        xfs: 1892352
-        nouveau: 2318336
-    WARNING: Current crashkernel size is lower than recommended size 655M.
+NOTE: Encrypted kdump target requires extra memory, assuming using the keyslot with maximum memory requirement
+
+Reboot estimation:
+    Last estimation: Wed Sep 22 01:38:48 PM EDT 2021
+    Kernel version: 5.13.8-200.fc34.x86_64
+    Estimate took: 51s
+    Boosted crashkernel: 820M
+
+    Cached memory usage: 99M
+    Uncached memory usage: 125M
+    Reserved memory: 63M
+
+    Average runtime memory usage: 177M
+
+First kernel based estimation:
+    Reserved crashkernel: 820M
+
+    Kernel image size: 49M
+    Kernel modules size: 9M
+    Initramfs size: 20M
+    Runtime reservation: 64M
+    LUKS required size: 512M
+    Large modules:
+        xfs: 1908736
+        nouveau: 2318336
+
+Recommended crashkernel: 708M
+
+WARNING: Current crashkernel size is lower than recommended size 708M.
 ```
-It will generate a summary report about the estimated memory consumption
-of each component of kdump. The value may not be accurate enough, but
-would be a good start for finding a suitable crashkernel value.
+This report is composed of two parts:
+
+- First kernel based estimation:
+
+This estimates the crashkernel value without actually triggering kdump. It is
+based on info that can be collected in the first kernel, e.g. kdump resource
+file size and kernel modules size. The value may not be accurate enough but
+would be a good start for finding a suitable crashkernel value.
+
+- Reboot estimation:
+
+This estimation result is based on a real kdump run, which is more reliable,
+but users have to collect and update the estimate by running this command:
+`kdumpctl estimate --reboot`.
+This will reset your system multiple times to trigger kdump and collect
+the result.
+
+A reboot estimation run is composed of three stages:
+
+Reboot stage: Kdump estimation first has to temporarily boost the crashkernel
+value via a reboot. This reboot is done via kexec to speed up the estimation
+process. You can change this to a hard reboot with the `--hard-reboot` option
+if kexec doesn't work for your system.
+
+Panic stage: Once the machine has rebooted with the updated crashkernel value,
+a panic is triggered. Kdump catches the panic, and during this kdump run extra
+info is captured for estimating kdump memory usage.
+
+Final stage: After the kdump dump process has finished, the machine reboots
+back into its normal state. The kdump service collects the captured info, and
+the estimation is done.
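For reviewers who want to follow the three stages without reading all of kdump-estimate.sh, here is a rough sketch of how the saved ESTIMATE_STAGE value drives the next step after each boot. It is a simplification under stated assumptions: describe_stage and the action labels are hypothetical names for illustration, and a temporary file stands in for the real status file. The real dispatch is done by progress_staged_estimate in the patch.

```shell
#!/bin/bash
# describe_stage is an illustrative helper, not a function from the patch.
# $1 is a status file containing a line such as ESTIMATE_STAGE="reboot".
describe_stage()
{
	local ESTIMATE_STAGE=""

	# An empty or missing status file means no estimation is in progress
	[ -s "$1" ] || { echo "idle"; return 0; }
	. "$1"

	case $ESTIMATE_STAGE in
	reboot) echo "arm-panic" ;;       # boosted crashkernel is active, panic comes next
	panic)  echo "collect-result" ;;  # kdump has run, fetch and analyze kexec-dmesg.log
	*)      echo "cleanup" ;;         # unknown stage, restore crashkernel and clear state
	esac
}

# Temporary stand-in for the status file used by the patch
status_file=$(mktemp)

echo 'ESTIMATE_STAGE="reboot"' > "$status_file"
describe_stage "$status_file"    # prints arm-panic

echo 'ESTIMATE_STAGE="panic"' > "$status_file"
describe_stage "$status_file"    # prints collect-result

rm -f "$status_file"
```

Keeping the stage in a file that survives reboots is what lets a single boot-time service resume the run at the right point, and lets the shutdown hook decide whether it should trigger the panic.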