It has been reported that the kdump kernel fails to boot and cannot dump the vmcore when crashkernel=192M and SME/SEV is active.
This is because swiotlb is enabled and reserves 64M of memory by default on systems with SME/SEV enabled. The kdump kernel then runs out of memory after 64M is taken away for swiotlb initialization.
So add an extra 64M of memory to the default crashkernel value so that the kdump kernel can function as well as before. To decide when to do that, search journalctl for "Memory Encryption Features active: AMD" to check whether SME or SEV is active. This log line is printed by the kernel function below, and type SME is mutually exclusive with type SEV:
  arch/x86/mm/mem_encrypt.c:print_mem_encrypt_feature_info()
Note:
1) The conditional check relies on the journalctl log because I couldn't find a system interface to check whether SEV is active, even though SME can be checked via /proc/cpuinfo. For consistency, I use the same journalctl check for both SME and SEV.
2) Because the conditional check relies on the journalctl log, it won't work for the crashkernel setting in Anaconda, since the installation kernel doesn't have the SME/SEV setting. Customers need to run 'kdumpctl reset-crashkernel' manually after OS installation to add the extra 64M to crashkernel.
3) We need to watch the log line printed by print_mem_encrypt_feature_info() in the kernel, in case people change it in the future.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 kdump-lib.sh | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kdump-lib.sh b/kdump-lib.sh
index 16238c508f65..d53da6f528ae 100755
--- a/kdump-lib.sh
+++ b/kdump-lib.sh
@@ -84,6 +84,11 @@ is_generic_fence_kdump()
 	[[ $(kdump_get_conf_val fence_kdump_nodes) ]]
 }
 
+is_sme_or_sev_active()
+{
+	journalctl -b | grep "Memory Encryption Features active: AMD"| grep -q "[SME|SEV]"
+}
+
 to_dev_name()
 {
 	local dev="${1//\"/}"
@@ -930,7 +935,13 @@ kdump_get_arch_recommend_crashkernel()
 
 	_arch=$(uname -m)
 
-	if [[ $_arch == "x86_64" ]] || [[ $_arch == "s390x" ]]; then
+	if [[ $_arch == "x86_64" ]] ; then
+		if is_sme_or_sev_active; then
+			_ck_cmdline="1G-4G:256M,4G-64G:320M,64G-:576M"
+		else
+			_ck_cmdline="1G-4G:192M,4G-64G:256M,64G-:512M"
+		fi
+	elif [[ $_arch == "s390x" ]]; then
 		_ck_cmdline="1G-4G:192M,4G-64G:256M,64G-:512M"
 	elif [[ $_arch == "aarch64" ]]; then
 		local _running_kernel
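An aside on the second grep in is_sme_or_sev_active(): "[SME|SEV]" is a bracket expression, so it matches any single character from the set S, M, E, | and V. Every line that passed the first grep contains an "M" (from "Memory"), so the second grep always succeeds and does not actually distinguish SME or SEV. Below is a minimal sketch of a stricter check. It assumes the feature names appear as separate words after "AMD" on that log line, reads the log from stdin so it can be exercised without journalctl, and uses a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: succeed if kernel log text on stdin shows SME or SEV active.
# Assumed log format (feature names as separate words after "AMD"), e.g.
#   "Memory Encryption Features active: AMD SME"
#   "Memory Encryption Features active: AMD SEV SEV-ES"
# mem_encrypt_log_has_sme_or_sev is a hypothetical helper name.
mem_encrypt_log_has_sme_or_sev()
{
	# 1st grep: keep only the memory-encryption feature line.
	# 2nd grep: -E makes SME|SEV an alternation (not a bracket
	# expression); -w requires a whole word, so "SEV-ES" still
	# matches "SEV" but the "M" in "Memory" no longer counts.
	grep "Memory Encryption Features active: AMD" | grep -qwE 'SME|SEV'
}
```

It would be called as: journalctl -b | mem_encrypt_log_has_sme_or_sev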
Hi Baoquan,
On Thu, 17 Aug 2023 22:53:57 +0800 Baoquan He bhe@redhat.com wrote:
Those are quite heavy restrictions. I would like to find a solution that does not rely on checking journalctl. How about, instead of checking whether SME/SEV is enabled, checking how much memory the swiotlb has allocated? IIUC this is provided in /sys/kernel/debug/swiotlb/io_tlb_used. With that, the code should also work for s390 (which also uses swiotlb in its SEV equivalent) and would be more future-proof if the swiotlb is allocated dynamically [1].
[1] https://lwn.net/Articles/940973/
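For the swiotlb idea above: the files under /sys/kernel/debug/swiotlb/ count slots, not bytes. io_tlb_nslabs holds the total number of slots and io_tlb_used the number currently in use, so io_tlb_used reflects momentary usage rather than how much was reserved. Below is a minimal sketch converting a slot count into MiB. It assumes a slot is IO_TLB_SIZE = 2 KiB (IO_TLB_SHIFT = 11) and that debugfs is mounted and readable (normally root only); both helper names are hypothetical. Under these assumptions the default 64M swiotlb corresponds to 32768 slots.

```shell
#!/bin/bash
# Assumption: one swiotlb slot is IO_TLB_SIZE = 1 << IO_TLB_SHIFT = 2048 bytes.
readonly SWIOTLB_SLOT_BYTES=2048

# Hypothetical helper: swiotlb_slots_to_mib <slot-count>
# Prints the size in MiB, rounded up so a partial MiB still counts.
swiotlb_slots_to_mib()
{
	local slots=$1
	echo $(( (slots * SWIOTLB_SLOT_BYTES + 1048575) / 1048576 ))
}

# Hypothetical helper: swiotlb_debugfs_mib <file>
# <file> is e.g. io_tlb_nslabs (total slots) or io_tlb_used (slots in use).
# Fails if debugfs is not mounted or the file is not readable.
swiotlb_debugfs_mib()
{
	local file=/sys/kernel/debug/swiotlb/$1
	[[ -r $file ]] || return 1
	swiotlb_slots_to_mib "$(<"$file")"
}
```

For example, swiotlb_debugfs_mib io_tlb_nslabs would print the size of the whole pool, which is closer to "how much memory the swiotlb has allocated" than io_tlb_used.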
I would prefer if you could follow what Pingfan did for aarch64 recently, i.e. instead of hard-coding the additional memory, collect a "delta" of what needs to be added and _crashkernel_add it at the end.
Thanks Philipp
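Philipp's "delta" suggestion can be sketched as follows: keep one base crashkernel string per arch, accumulate the extra megabytes each condition needs, and add them to every range at the end. ck_add_delta_mb is a hypothetical stand-in written for this illustration; it does not reproduce the signature or behaviour of kdump-lib.sh's _crashkernel_add, and it assumes every size is given in megabytes.

```shell
#!/bin/bash
# Hypothetical stand-in: ck_add_delta_mb <crashkernel-value> <delta-in-MB>
# Adds <delta-in-MB> to the size of every "RANGE:SIZE" entry.
# Assumption: every SIZE carries an "M" suffix (e.g. 192M).
ck_add_delta_mb()
{
	local ck=$1 delta=$2 entry range size out=""
	local -a entries
	IFS=',' read -r -a entries <<< "$ck"
	for entry in "${entries[@]}"; do
		range=${entry%%:*}	# e.g. "1G-4G" or "64G-"
		size=${entry#*:}	# e.g. "192M"
		if [[ $size != *M ]]; then
			echo "unsupported size '$size'" >&2
			return 1
		fi
		size=$(( ${size%M} + delta ))
		# Join entries with commas (no leading comma on the first one).
		out+="${out:+,}${range}:${size}M"
	done
	echo "$out"
}
```

For example, ck_add_delta_mb "1G-4G:192M,4G-64G:256M,64G-:512M" 64 prints 1G-4G:256M,4G-64G:320M,64G-:576M, which are the SME/SEV values the patch hard-codes. The x86_64 branch could then keep a single base string and add 64 to the delta only when is_sme_or_sev_active succeeds.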
On Wed, Aug 30, 2023 at 04:05:54PM +0200, Philipp Rudo wrote:
How do people enable SME/SEV? If they enable it by adding the mem_encrypt=on kernel parameter, could we add the extra 64M by detecting this parameter? Since Anaconda supports adding extra kernel parameters, this would make setting crashkernel in Anaconda work as well.
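A minimal sketch of the command-line check Coiby proposes, for illustration only: it looks for mem_encrypt=on in /proc/cmdline, or in a string passed as the first argument. The helper name is hypothetical, and it assumes that when the parameter is given several times the last value wins. As the replies that follow point out, success here only means SME/SEV was requested, not that it is active.

```shell
#!/bin/bash
# Hypothetical helper: cmdline_requests_mem_encrypt [cmdline]
# Succeeds if the (last) mem_encrypt= value on the kernel command line is
# "on". Reads /proc/cmdline when no argument is given.
# This only shows SME/SEV was *requested*, not that it became active.
cmdline_requests_mem_encrypt()
{
	local cmdline=${1-$(</proc/cmdline)} word value=""
	local -a words
	read -r -a words <<< "$cmdline"
	for word in "${words[@]}"; do
		# Assumption: a later mem_encrypt= overrides an earlier one.
		[[ $word == mem_encrypt=* ]] && value=${word#mem_encrypt=}
	done
	[[ $value == on ]]
}
```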
Hi Coiby,
On Thu, Aug 31, 2023 at 3:40 PM Coiby Xu coxu@redhat.com wrote:
I guess checking mem_encrypt=on is not a good way, because even if you add "mem_encrypt=on" to the kernel cmdline, you cannot verify whether SEV/SME was successfully enabled. If SEV/SME fails, we would reserve 64M more memory for crashkernel, which does no harm, but I think the check itself is not clean...
Thanks, Tao Liu
kexec mailing list -- kexec@lists.fedoraproject.org
On Thu, Aug 31, 2023 at 03:57:32PM +0800, Tao Liu wrote:
Another benefit of checking mem_encrypt=on is that users could avoid rebooting the system one more time. With the current approach, users need to reboot the system so that mem_encrypt=on takes effect, and then reboot again for the crashkernel change to take effect. I don't know how often mem_encrypt=on fails to enable SEV/SME, but I think whoever enables SME/SEV should confirm it has truly been enabled.
On 08/31/23 at 03:39pm, Coiby Xu wrote:
We can't check them by detecting "mem_encrypt=on", because 'mem_encrypt=on' can be specified on any machine: an AMD system without SME/SEV capability, or even an Intel system. Even on an AMD system with SME capability, SME can be disabled in a register.
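Unlike the command line, a runtime check reflects what the kernel actually enabled. Note 1) of the patch says SME (but not SEV) can be checked via /proc/cpuinfo; the sketch below relies on that statement. It assumes the flag appears as the word "sme" on the "flags" line, reads cpuinfo text from stdin, uses a hypothetical helper name, and covers SME only.

```shell
#!/bin/bash
# Hypothetical helper: cpuinfo_has_sme < /proc/cpuinfo
# Succeeds if the first "flags" line contains the whole word "sme".
# grep -w treats "_" as a word character, so a flag such as
# "sme_coherent" does not count as "sme".
cpuinfo_has_sme()
{
	grep -m1 '^flags' | grep -qw 'sme'
}
```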
On Fri, Sep 01, 2023 at 09:02:29AM +0800, Baoquan He wrote:
Thanks for the explanation! I understand that mem_encrypt=on doesn't guarantee SME/SEV is enabled. But if it's the only way for users to enable SME/SEV, I would personally prefer detecting mem_encrypt=on to improve usability by avoiding one more reboot. And I don't think users will file a bug against kexec-tools when an extra 64M is reserved while SME/SEV isn't truly enabled. Instead, they will file a bug against the SME/SEV feature when they find mem_encrypt=on doesn't enable SME/SEV.
On 09/01/23 at 01:46pm, Coiby Xu wrote:
Thanks for the suggestion, Coiby.
I am fine with this. It's just that users trust us to have optimized the default crashkernel value on a best-effort basis. If we accept adding an extra 64M whenever mem_encrypt=on is specified, we might as well always add the extra 64M. The thing is, that may not be a best-effort optimization. Just my personal thinking.
On 08/30/23 at 04:05pm, Philipp Rudo wrote:
Thanks for the careful review, Philipp.
Hmm, the notes I gave are not heavy restrictions; they are notices and explanations of further details.
Note 1) explains why I chose this approach.
3) is a reminder: as we know, kernel log messages can't be changed casually because user space utilities check them. This is something we often meet and need to pay attention to when changing kernel code.
Note 2) could be a shortcoming. I talked to Dave, and it's acceptable: SME/SEV operators need to do some manual operations after OS installation on SME/SEV platforms.
Hmm, people could specify swiotlb= on the kernel cmdline; we filter that out in kdump because the kdump kernel usually doesn't need swiotlb, except with SME/SEV, which implicitly requires it.
Dynamic swiotlb saves memory mainly on embedded systems. On servers, it could save memory at an early stage of boot, but enough memory is still needed eventually. Even if dynamic swiotlb is used, we should prepare sufficient memory for the whole life of the kdump kernel. So specifying a fixed value that is big enough to satisfy swiotlb in all cases is necessary. Imagine what early kdump could encounter.
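The filtering of swiotlb= out of the kdump kernel's command line that Baoquan mentions can be illustrated with the sketch below. remove_cmdline_param is a hypothetical helper, not the code kdumpctl uses; it assumes whitespace-separated, unquoted parameters and drops both a bare "name" and any "name=value".

```shell
#!/bin/bash
# Hypothetical helper: remove_cmdline_param <cmdline> <name>
# Prints <cmdline> without any "name" or "name=value" words.
# Assumption: parameters are whitespace-separated and unquoted.
remove_cmdline_param()
{
	local name=$2 word out=""
	local -a words
	read -r -a words <<< "$1"
	for word in "${words[@]}"; do
		# Skip exact matches only; "swiotlb_x=1" is kept for "swiotlb".
		[[ $word == "$name" || $word == "$name="* ]] && continue
		out+="${out:+ }$word"
	done
	echo "$out"
}
```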
Hi Baoquan,
On Fri, 1 Sep 2023 08:43:26 +0800 Baoquan He bhe@redhat.com wrote:
Fair, "heavy restrictions" was maybe a little too much. Still, I would love to use an interface other than the kernel log, as the way I see it, all the limitations originate from using it. Especially the fact that it currently only works for x86 but not s390 (what about the other arches?) is annoying. Alas, I don't see any common interface existing at the moment...
FYI, I've noticed that journalctl has the --dmesg and --grep options. Using those you can avoid parsing the whole user space log and the two pipes to grep.
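Following this suggestion, here is a sketch of is_sme_or_sev_active() that lets journalctl do the filtering: --dmesg restricts output to kernel messages of the current boot, --grep keeps matching entries, and -o cat prints only the message text. One pipe to grep is kept for the word-level SME/SEV match, so the result does not depend on journalctl's exit status when nothing matches. This is an illustration, not the merged code; --grep is only available when journalctl is built with pattern-matching (PCRE2) support, and the sketch assumes the feature names appear as separate words after "AMD".

```shell
#!/bin/bash
# Sketch of is_sme_or_sev_active() using journalctl's own filtering.
#   --dmesg : only kernel messages, current boot only (implies -b)
#   --grep  : only entries matching the pattern (needs PCRE2 support)
#   -o cat  : print just the message text, without timestamps
# Assumption: SME/SEV appear as separate words after "AMD" on that line.
is_sme_or_sev_active()
{
	journalctl --dmesg -o cat \
		--grep 'Memory Encryption Features active: AMD' 2>/dev/null |
		grep -qwE 'SME|SEV'
}
```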
One more question. In an answer to Coiby you mentioned that SME can be "disabled in register". I believe that is done during boot. So we don't need to consider situations where SME is enabled during boot, i.e. where the message is in the log, but gets disabled later during run time. Is that correct?
True, when swiotlb= is specified on the kernel command line, my suggestion would waste quite a lot of memory. Plus ...
... even if the production system needs more than 64M for the swiotlb, the kdump kernel could live with less. After all, IIUC, having more memory for the swiotlb is more a performance optimization than a strict necessity.
BTW, if I understand Petr's work correctly, the dynamic swiotlb is not only meant to save memory on embedded systems but also to allow spending more memory on confidential VMs when a lot of I/O is done via the bounce buffer. At least that is how I understand this point:
""" 2) CoCo VMs use bounce buffers for all I/O but may need substantially more than 64 MiB. """
Thanks Philipp
On 09/04/23 at 04:22pm, Philipp Rudo wrote:
Hi Baoquan,
On Fri, 1 Sep 2023 08:43:26 +0800 Baoquan He bhe@redhat.com wrote:
On 08/30/23 at 04:05pm, Philipp Rudo wrote:
Hi Baoquan,
On Thu, 17 Aug 2023 22:53:57 +0800 Baoquan He bhe@redhat.com wrote:
It's reported that kdump kernel failed to boot and can't dump vmcore when crashkernel=192M and SME/SEV is active.
This is because swiotlb will be enabled and reserves 64M memory by default on system with SME/SEV enabled. Then kdump kernel will be out of memory after taking 64M away for swiotlb init.
So here add extra 64M memory to default crashkernel value so that kdump kernel can function well as before. When doing that, search journalctl for the "Memory Encryption Features active: AMD" to check if SME or SEV is active. This line of log is printed out in kernel function as below and the type SME is mutual exclusive with type SEV. ***: arch/x86/mm/mem_encrypt.c:print_mem_encrypt_feature_info()
Note:
1) The conditional check relies on the journalctl log because I didn't find a system interface to check whether SEV is active, even though SME can be checked via /proc/cpuinfo. For consistency, I use the same journalctl check for both SME and SEV.
2) Because the check relies on the journalctl log, it won't work when the crashkernel value is set in anaconda, since the installation kernel doesn't have SME/SEV enabled. Customers therefore need to manually run 'kdumpctl reset-crashkernel' after OS installation to add the extra 64M.
3) We need to watch the log line printed by print_mem_encrypt_feature_info() in the kernel in case someone changes it in the future.
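For the manual step in note 2, the crashkernel= change only takes effect after a reboot, so it can help to confirm the new reservation afterwards. A minimal sketch, assuming the reservation is read from /sys/kernel/kexec_crash_size (which reports the reserved size in bytes); crash_reservation_mib is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Manual step from note 2 (as root), followed by a reboot because
# crashkernel= is applied at boot time:
#     kdumpctl reset-crashkernel
# Hypothetical helper to confirm the reservation afterwards. It reads
# /sys/kernel/kexec_crash_size (bytes); the path can be overridden for testing.
crash_reservation_mib()
{
    local path=${1:-/sys/kernel/kexec_crash_size} bytes

    read -r bytes < "$path" || return 1
    echo $((bytes / 1024 / 1024))
}
```

On a 1G-4G SME/SEV machine, the patched default would be expected to report 256 here rather than 192.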
Thanks for the careful checking, Philipp.
Those are quite some heavy restrictions. I would like to find a
Hmm, the notes I gave are not heavy restrictions; they are notices that explain more details.
Note 1) explains why I chose this approach.
Note 3) is a reminder: as we know, kernel log messages can't be changed casually because user-space utilities check them. This is something we often run into and need to pay attention to when changing kernel code.
Note 2) could be a shortcoming. I talked to Dave and it's acceptable: SME/SEV operators need to do some manual steps after OS installation on SME/SEV platforms.
Fair, "heavy restrictions" was maybe a little too much. Still, I would love to use an interface other than the kernel log, as the way I see it all the limitations originate from using it. Especially the fact that it currently only works for x86 but not s390 (what about the other arches?) is annoying. Alas, I don't see any common interface that exists at the moment...
Yes, when I did the investigation, I really wanted to find a system interface to determine whether they are active. After checking on machines with SME or SEV enabled, and talking to people, there isn't an available interface to check SEV.
This patch adds is_sme_or_sev_active(), which is for SME/SEV only. Later support for TDX on x86 or for s390 memory encryption may need another function. For TDX, we may be able to adjust is_sme_or_sev_active(); however, I will use an interface-based check if one exists. The same goes for s390.
FYI, I've noticed that journalctl has the --dmesg and --grep options. Using those you can avoid parsing the whole user space log and the two pipes to grep.
Fine with me. It would look like this:
journalctl -q --dmesg --grep "Memory Encryption Features active: AMD [SME|SEV]"
One more question. In an answer to Coiby you mentioned that SME can be "disabled in register". I believe that is done during boot. So we don't need to consider situations where SME is enabled during boot, i.e. where the message is in the log, but gets disabled later during run time. Is that correct?
Correct. What I meant is that 'mem_encrypt=on' on the kernel cmdline does not guarantee that SME/SEV is active.
solution that does not rely on checking journalctl. How about, instead of checking whether SME/SEV is enabled, checking how much memory the swiotlb has allocated? IIUC this is provided in /sys/kernel/debug/swiotlb/io_tlb_used. With that, the code should also work for s390 (which also uses swiotlb in its SEV equivalent) and would be more future-proof for when the swiotlb might be allocated dynamically [1].
Hmm, people could specify swiotlb= on the kernel cmdline; we filter that out in kdump because the kdump kernel usually doesn't need swiotlb, except with SME/SEV, which implicitly requires it.
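The debugfs approach suggested above could be sketched as a small reader. Only io_tlb_used comes from this thread; the io_tlb_nslabs file, the 2 KiB slot size (1 << IO_TLB_SHIFT with IO_TLB_SHIFT = 11), and the helper names are assumptions to verify against the running kernel. Reading debugfs also requires root and a mounted /sys/kernel/debug.

```shell
#!/usr/bin/env bash
# Assumptions: io_tlb_used holds the number of slots currently in use,
# io_tlb_nslabs holds the total number of slots, and each slot is 2 KiB.
IO_TLB_SLOT_BYTES=$((1 << 11))

# Convert a slot count to whole MiB.
slots_to_mib()
{
    echo $(($1 * IO_TLB_SLOT_BYTES / 1024 / 1024))
}

# Hypothetical helper: summarize swiotlb usage from a debugfs directory.
swiotlb_report()
{
    local dir=${1:-/sys/kernel/debug/swiotlb} used total

    read -r used < "$dir/io_tlb_used" || return 1
    read -r total < "$dir/io_tlb_nslabs" || return 1
    echo "swiotlb: $(slots_to_mib "$used") MiB used of $(slots_to_mib "$total") MiB"
}
```

Since io_tlb_used fluctuates with I/O load, the total is probably the closer match to "how much the swiotlb has allocated"; either way it measures the production kernel, which, as noted in the replies, may be more than the kdump kernel needs.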
True, in that case my suggestion would waste quite some memory. Plus ...
Right.
And, before dynamic swiotlb, SME/SEV would add an extra 64M to swiotlb when swiotlb= was specified on the cmdline. That could add quite a lot of memory for swiotlb even though it's not needed, because we filter swiotlb= out of the kdump kernel cmdline.
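The filtering mentioned above can be illustrated with a stand-alone sketch. This is not how kdump-utils actually implements it; strip_cmdline_param is a hypothetical helper, and quoted parameter values containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every "name" or "name=value" word from a kernel
# command line, e.g. to keep a production swiotlb= setting out of the kdump
# kernel's command line.
strip_cmdline_param()
{
    local cmdline=$1 name=$2 out="" arg
    local -a args

    # read -a splits on whitespace without glob expansion.
    read -r -a args <<< "$cmdline"
    for arg in "${args[@]}"; do
        if [[ $arg == "$name" || $arg == "$name="* ]]; then
            continue
        fi
        out+="${out:+ }$arg"
    done
    printf '%s\n' "$out"
}

strip_cmdline_param "ro quiet swiotlb=65536 crashkernel=256M" swiotlb
# prints ro quiet crashkernel=256M
```

Matching "name=" as a prefix, rather than "name" alone, keeps unrelated parameters that merely start with the same letters.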
The memory savings from dynamic swiotlb matter mainly on embedded systems. On servers, it could save memory during early boot, but eventually enough memory is still needed. Even with dynamic swiotlb, we should reserve sufficient memory for the whole lifetime of the kdump kernel. So specifying a fixed value that is big enough to satisfy swiotlb in all cases is necessary. Imagine what early kdump could run into.
... even if the production system needs more than 64M for the swiotlb, the kdump kernel could live with less. After all, IIUC, having more memory for the swiotlb is a performance optimization rather than strictly necessary.
BTW, if I understand Petr's mail correctly, dynamic swiotlb is not only meant to save memory on embedded systems but also to allow spending more memory on confidential VMs when a lot of I/O goes through the bounce buffer. At least that is how I understand this point:
""" 2) CoCo VMs use bounce buffers for all I/O but may need substantially more than 64 MiB. """
Agreed. It seems the CoCo VMs Petr mentioned need more bounce buffer slots to perform well.
Hi Baoquan,
On Wed, 6 Sep 2023 20:09:48 +0800 Baoquan He bhe@redhat.com wrote:
Personally, I would make is_sme_or_sev_active more generic as is_cvm and handle all the different implementations in a single function. But that is something we can do once we add support for other vendors.
FYI, I've noticed that journalctl has the --dmesg and --grep options. Using those you can avoid parsing the whole user space log and the two pipes to grep.
Fine with me. It would look like this:
journalctl -q --dmesg --grep "Memory Encryption Features active: AMD [SME|SEV]"
Looks good. Two comments, though:
- The square brackets '[]' denote a character class, not a group. This means the expression matches if any one of the contained characters is present at that position, so it would also match "... AMD Soup", "... AMD Extra", etc. What you probably want are round brackets '()', which denote a group, i.e. a substring.
- With round brackets you need to be careful, as they also match partially, e.g. "... AMD SEV-ES". Not sure if that is expected or not. If it's not, I think the easiest fix is to match the full line, i.e. put everything between ^...$. At least for me that is the least error-prone method.
Putting it all together, that should lead to:
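The two points above can be checked against sample log lines. This sketch uses grep -E (POSIX ERE) as a stand-in for journalctl --grep, which uses PCRE2; for these particular patterns both engines behave the same. The helper name pattern_matches is hypothetical.

```shell
#!/usr/bin/env bash
prefix="Memory Encryption Features active:"
class_re="$prefix AMD [SME|SEV]"        # character class: one of S, M, E, |, V
group_re="$prefix AMD (SME|SEV)"        # group: substring "SME" or "SEV"
anchored_re="^$prefix AMD (SME|SEV)\$"  # the whole line must match

# Print "match" or "no match" for one pattern ($1) and one line ($2).
pattern_matches()
{
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo match
    else
        echo "no match"
    fi
}

for line in "$prefix AMD SME" "$prefix AMD Extra" "$prefix AMD SEV-ES"; do
    printf '%-45s class=%-8s group=%-8s anchored=%s\n' "$line" \
        "$(pattern_matches "$class_re" "$line")" \
        "$(pattern_matches "$group_re" "$line")" \
        "$(pattern_matches "$anchored_re" "$line")"
done
```

The character class matches all three lines, the group also accepts "AMD SEV-ES", and only the anchored group restricts the match to exactly "AMD SME" or "AMD SEV".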
journalctl -q --dmesg --grep "^Memory Encryption Features active: (Intel TDX|AMD (SME|SEV))$"
Thanks Philipp
On 09/08/23 at 01:09pm, Philipp Rudo wrote:
Personally, I would make is_sme_or_sev_active more generic as is_cvm and handle all the different implementations in a single function. But that is something we can do once we add support for other vendors.
Yeah, I had the same thought. Let's handle what exists now, because we don't know for sure what those future features will really look like.
FYI, I've noticed that journalctl has the --dmesg and --grep options. Using those you can avoid parsing the whole user space log and the two pipes to grep.
Fine with me. It would look like this:
journalctl -q --dmesg --grep "Memory Encryption Features active: AMD [SME|SEV]"
Looks good. Two comments, though:
- The square brackets '[]' denote a character class, not a group. This means the expression matches if any one of the contained characters is present at that position, so it would also match "... AMD Soup", "... AMD Extra", etc. What you probably want are round brackets '()', which denote a group, i.e. a substring.
That is actually my plan. I didn't know whether SEV-ES and SEV-SNP behave the same as SEV since they are still under development; that's why I added Vitaly and Jie to CC. However, as said earlier, we should handle what has already been done. So let me change it to focus only on SME/SEV for the time being. We can adjust for the others one by one once they are done. Thanks.
- With round brackets you need to be careful, as they also match partially, e.g. "... AMD SEV-ES". Not sure if that is expected or not. If it's not, I think the easiest fix is to match the full line, i.e. put everything between ^...$. At least for me that is the least error-prone method.
Putting it all together, that should lead to:
journalctl -q --dmesg --grep "^Memory Encryption Features active: (Intel TDX|AMD (SME|SEV))$"
TDX hasn't landed yet, so let's put it aside for the time being. The command then becomes:
journalctl -q --dmesg --grep "^Memory Encryption Features active: AMD (SME|SEV)$"
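The agreed pattern could be folded back into the helper from the patch roughly as follows. This is a hedged sketch, not the final patch: it assumes journalctl is built with PCRE2 (required for --grep), and SME_SEV_RE and log_has_sme_or_sev are hypothetical names introduced here so the pattern can be tested without a journal.

```shell
#!/usr/bin/env bash
# The whole message must be exactly "... AMD SME" or "... AMD SEV".
SME_SEV_RE='^Memory Encryption Features active: AMD (SME|SEV)$'

# Hypothetical helper: test log message lines on stdin against the pattern.
# grep -E treats this pattern the same way as journalctl's PCRE2 matching.
log_has_sme_or_sev()
{
    grep -Eq -- "$SME_SEV_RE"
}

is_sme_or_sev_active()
{
    # --dmesg limits the search to kernel messages of the current boot,
    # -q suppresses informational lines, and -o cat prints only the message
    # text. Re-checking the output locally means the result does not depend
    # on journalctl's exit status when nothing matches.
    journalctl -q --dmesg -o cat --grep "$SME_SEV_RE" 2>/dev/null |
        log_has_sme_or_sev
}
```

journalctl still does the filtering, so only matching kernel messages reach the final grep, which keeps the single pipe cheap.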