Hi Philipp,
On Wed, Mar 30, 2022 at 11:34 PM Philipp Rudo <prudo(a)redhat.com> wrote:
Hi Tao,
On Tue, 29 Mar 2022 20:45:33 +0800
Tao Liu <ltao(a)redhat.com> wrote:
> When AMD sme/sev enabled, it will require extra memory reserved for swiotlb.
> Currently the swiotlb kernel parameter is removed by /etc/sysconfig/kdump for
> 2nd kernel, so a 64MB memory for swiotlb will be reserved by default [1],
> and the 2nd kernel will fail of OOM with the default crashkernel value.
>
> swiotlb memory is mandatory for AMD sme/sev. For rhel8, kernel commit
> 0adb0f4 ("[x86] Reserve at most 64M of SWIOTLB memory for crashkernel")
is that commit id correct? I cannot find it in my rhel-8 clone...
Sorry, it's ab30d3f, I copied the wrong commit id...
> implemented the reservation in the kernel side, as part of the
kernel
> crashkernel=auto implementation.
>
> For rhel9 however, crashkernel=auto has been mirrored to userspace
> kexec-tools, so this patch will enable the swiotlb reservation for AMD
> sme/sev in kexec-tools as well.
Not really about this patch but I find it weird that we adjust the
crashkernel=auto value for swiotlb but not for other cases wher we know
that we need a larger crashkernel, e.g. when the dump target is on a
LUKS encrypted device. I understand that in the past, when
crashkernel=auto was handled by the kernel, we simply didn't know about
what the dump target will be when we allocated the crashkernel during
boot. But today, when crashkernel=auto is handled in userspace we can.
I didn't have much context on LUKS, by looking into the code, I see only
"kdumpctl estimate" has the LUKS related crashkernel value increment. It needs
the user to adjust crashkernel value manually by estimate-and-set. Maybe we
can merge the crashkernel adjustment of swiotlb and luks together,
so luks don't need to "kdumpctl estimate" manually anymore?
> [1]:
https://elixir.bootlin.com/linux/v5.14/source/arch/x86/mm/mem_encrypt.c#L200
>
> Signed-off-by: Tao Liu <ltao(a)redhat.com>
> ---
> kdump-lib-initramfs.sh | 10 ++++++++++
> kdump-lib.sh | 11 +++++++++--
> kdumpctl | 7 ++++++-
> 3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
> index 9be0fe9..699babd 100755
> --- a/kdump-lib-initramfs.sh
> +++ b/kdump-lib-initramfs.sh
> @@ -8,6 +8,11 @@ KDUMP_CONFIG_FILE="/etc/kdump.conf"
> FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump"
> FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
>
> +# Reserve 64MB for swiotlb when AMD sme/sev enabled.
> +AMD_SME_SEV_SWIOTLB=64
> +# Copied from include/linux/swiotlb.h
> +IO_TLB_SHIFT=11
> +
> # Read kdump config in well formated style
> kdump_read_conf()
> {
> @@ -130,3 +135,8 @@ is_fs_dump_target()
> {
> [ -n "$(kdump_get_conf_val "ext[234]\|xfs\|btrfs\|minix")"
]
> }
> +
> +is_amd_sme_sev_enabled()
> +{
> + dmesg | grep -sq "AMD Memory Encryption Features active"
Is there a more reliable way, e.g. a sysfs attribute, to determine if
SEV is enabled?
The dmesg has the problem that it is stored in a ring buffer. Which
means if it runs full old entries are overwritten by new ones. So you
cannot fully rely on it.
I agree with your point, dmesg isn't the ideal way to detect sme/sev enablement.
However I didn't find a better way to do it.
Actually we need to verify the following:
1) sme enabled on host
2) sev enabled on host
3) sev enabled on VM guest
Only case 2 can be verified through $(cat /sys/module/kvm_amd/parameters/sev),
$(dmesg | grep -sq "AMD Memory Encryption Features active") is generic for the
3
cases though...
Thanks,
Tao Liu
Thanks
Philipp
> +}
> diff --git a/kdump-lib.sh b/kdump-lib.sh
> index 4ed5035..8ea6bfe 100755
> --- a/kdump-lib.sh
> +++ b/kdump-lib.sh
> @@ -831,7 +831,7 @@ get_recommend_size()
> # $1 dump mode, if not specified, dump_mode will be judged by is_fadump_capable
> kdump_get_arch_recommend_crashkernel()
> {
> - local _arch _ck_cmdline _dump_mode
> + local _arch _ck_cmdline _dump_mode _mem_enc
>
> if [[ -z "$1" ]]; then
> if is_fadump_capable; then
> @@ -845,7 +845,14 @@ kdump_get_arch_recommend_crashkernel()
>
> _arch=$(uname -m)
>
> - if [[ $_arch == "x86_64" ]] || [[ $_arch == "s390x" ]];
then
> + if [[ $_arch == "x86_64" ]]; then
> + if is_amd_sme_sev_enabled; then
> + _mem_enc=$AMD_SME_SEV_SWIOTLB
> + else
> + _mem_enc=0
> + fi
> +
_ck_cmdline="1G-4G:$((192+$_mem_enc))M,4G-64G:$((256+$_mem_enc))M,64G-:$((512+$_mem_enc))M"
> + elif [[ $_arch == "s390x" ]]; then
> _ck_cmdline="1G-4G:192M,4G-64G:256M,64G-:512M"
> elif [[ $_arch == "aarch64" ]]; then
> _ck_cmdline="2G-:448M"
> diff --git a/kdumpctl b/kdumpctl
> index 1869753..50aafda 100755
> --- a/kdumpctl
> +++ b/kdumpctl
> @@ -21,6 +21,7 @@
FADUMP_REGISTER_SYS_NODE="/sys/kernel/fadump_registered"
> #kdump shall be the default dump mode
> DEFAULT_DUMP_MODE="kdump"
> image_time=0
> +size_mb=$((1024 * 1024))
>
> standard_kexec_args="-d -p"
>
> @@ -622,6 +623,11 @@ load_kdump()
> {
> local ret
>
> + # Force 2nd kernel use swiotlb for AMD sme/sve to 64MB at most.
> + if is_amd_sme_sev_enabled; then
> + KDUMP_COMMANDLINE_APPEND+=" swiotlb=$((($AMD_SME_SEV_SWIOTLB *
$size_mb) >> $IO_TLB_SHIFT)) "
> + fi
> +
> KEXEC_ARGS=$(prepare_kexec_args "${KEXEC_ARGS}")
> KDUMP_COMMANDLINE=$(prepare_cmdline "${KDUMP_COMMANDLINE}"
"${KDUMP_COMMANDLINE_REMOVE}" "${KDUMP_COMMANDLINE_APPEND}")
>
> @@ -1189,7 +1195,6 @@ do_estimate()
> local -A large_mods
> local baseline
> local kernel_size mod_size initrd_size baseline_size runtime_size
reserved_size estimated_size recommended_size
> - local size_mb=$((1024 * 1024))
>
> setup_initrd
> if [[ ! -f $TARGET_INITRD ]]; then