Re: [PATCH] x86_64: Reserve extra memory for crashkernel if AMD sme/sev enabled

Thursday, 7 April 2022

On Thu, Apr 7, 2022 at 10:44 PM Philipp Rudo <prudo(a)redhat.com&gt; wrote:
...

 Hi Tao,

 On Wed, 6 Apr 2022 18:55:38 +0800
 Tao Liu <ltao(a)redhat.com&gt; wrote:

 > On Tue, Apr 5, 2022 at 5:37 PM Philipp Rudo <prudo(a)redhat.com&gt; wrote:
 > >
 > > Hi Coiby,
 > > Hi Tao,
 > >
 > > On Sat, 2 Apr 2022 16:43:37 +0800
 > > Coiby Xu <coxu(a)redhat.com&gt; wrote:
 > >
 > > > Hi Tao,
 > > >
 > > > On Sat, Apr 02, 2022 at 02:14:38PM +0800, Tao Liu wrote:
 > > > >Hi Philipp,
 > > > >
 > > > >On Wed, Mar 30, 2022 at 11:34 PM Philipp Rudo <prudo(a)redhat.com&gt;
wrote:
 > > > [...]
 > > > >
 > > > >> > implemented the reservation in the kernel side, as part of
the kernel
 > > > >> > crashkernel=auto implementation.
 > > >
 > > >
 > > > For swiotlb, I think Dave and Emma prefer to do it in kernel space.
 > > > Maybe you can try a kernel space solution first like [1]?
 > > >
 > > > [1]
https://lore.kernel.org/lkml/20190910151341.14986-3-kasong@redhat.com/
 > > >
 > > > >> >
 > > > >> > For rhel9 however, crashkernel=auto has been mirrored to
userspace
 > > > >> > kexec-tools, so this patch will enable the swiotlb
reservation for AMD
 > > > >> > sme/sev in kexec-tools as well.
 > > > >>
 > > > >> Not really about this patch but I find it weird that we adjust
the
 > > > >> crashkernel=auto value for swiotlb but not for other cases wher
we know
 > > > >> that we need a larger crashkernel, e.g. when the dump target is
on a
 > > > >> LUKS encrypted device. I understand that in the past, when
 > > > >> crashkernel=auto was handled by the kernel, we simply didn't
know about
 > > > >> what the dump target will be when we allocated the crashkernel
during
 > > > >> boot. But today, when crashkernel=auto is handled in userspace we
can.
 > > > >>
 > > > >
 > > > >I didn't have much context on LUKS, by looking into the code, I
see only
 > > > >"kdumpctl estimate" has the LUKS related crashkernel value
increment. It needs
 > > > >the user to adjust crashkernel value manually by estimate-and-set.
Maybe we
 > > > >can merge the crashkernel adjustment of swiotlb and luks together,
 > > > >so luks don't need to "kdumpctl estimate" manually
anymore?
 > > >
 > > > For LUKS, a kernel-space solution by reusing LUKS master key is
 > > > preferred because a) LUKS requires too much memory b) we can't expect
 > > > the user to wait at the screen to input the passphrase to open the disk
 > > > when kernel crashes. Let's see how far my work [2] can go.
 > >
 >
 > Yeah, after I played around with LUKS for a while, I agree with your point.
 > kernel side implementation is better for luks.
 >
 > Just for curiosity, as for the case b), can't we use a password file instead
 > of inputting passphrase manually every time? Like:
 >
 >  $ cat /etc/fstab
 >  /dev/mapper/mydata /mnt ext4 defaults 0 0
 >  $ cat /etc/crypttab
 >  mydata /dev/vdb /root/lukspass
 >  $ cryptsetup luksAddKey /dev/vdb /root/lukspass
 >
 > If  /root/lukspass is accessible, luks device can be automatically
 > mounted, thus no
 > user interaction needed.

 Sounds like an interesting idea. My only concern is that we introduce a
 potential security risk, e.g. somebody might extract the key file from
 the initrd and use it to access a disk without permission. But I don't
 have much experience in the area. So I think it's definitely something
 worth to further investigate.

Yes, that indeed is a security risk... I think the /root/lukspass is a
file which
can be managed just like /etc/passwd, /etc/shadow or /root/.ssh/id_rsa, which
only the root user/process has access. The initramfs is also only accessible to
root user for the file permission is 600 on my machine...

...
 > > you are right that it is better to solve the problem with
LUKS in the
 > > kernel. I only used it as an example where today we know that we need
 > > more memory but don't automatically increase the crashkernel. So why do
 > > it for swiotlb but for others? What's the reason for this inconsistency?
 >
 > By looking into the "kdumpctl estimate" code, the memory estimating for
luks is
 > calculated dynamically, depending on how many crypt devices there are, and
 > their "Memory:" field status.
 >
 > However for amd sme/sev swiotlb, the reserved memory for 2nd kernel is 64M,
 > see rhel-8 kernel commit ab30d3f ("[x86] Reserve at most 64M of SWIOTLB
 > memory for crashkernel").
 >
 > I implement it in the kexec-tools side, because the rhel-8 patch is
 > part of craskernel=auto
 > implementation. As far as I know crashkernel=auto has been mirrored to
 > kexec-tools side, so it is reasonable for me to make it in kexec-tools
 > side as well.

 I fully understand. But remember that the crashkernel needs to be
 reserved extremely early during boot. Even before the initramfs
 is mounted. So when crashkernel=auto was implemented in the kernel we
 simply didn't had information about many things, like what the root
 device is and if it is encrypted. swiotlb was one of the few exceptions.
 By moving crashkernel=auto to the kexec-tools we now have the
 information and could use it. If we decide to do so (see Coibys mail).
 Thanks for the explanation!

Thanks,
Tao Liu

...
 > >
 > > And a little bit more general. What is the relation between 'kdumpctl
 > > reset-crashkernel' and 'kdumctl estimate'? The way I see it the
 > > workflow we suggest to users currently is
 > >
 > > 1. $ kdumpctl reser-crashkernel
 > > 2. -> test if it works, if not
 > > 3. $ kdumpctl estimate
 > > 4. -> set crashkernel manually to suggested value
 > > 5. -> test if it works, if not
 > > 6. -> set crashkernel manually to larger value
 > > 7. -> test if it works, if not go back to 6
 > >
 >
 > Thanks for the summarization. I happened to lose
 > the context, which confused me a lot when reading the code.

 np

 > > I don't understand why we need the separate 'kdumpctl estimate'
step in
 > > between. In my opinion the following workflow would be much simpler
 > > from a user perspective
 > >
 > > 1. $ kdumpctl reset-crashkernel
 > >      (calls kdumpctl estimate internally using current default values
 > >       as minimum values)
 > > 2. -> test if it woks, if not
 > > 3. -> set crashkernel manually larger value
 > > 4. -> test if it works, if not go back to 3
 > >
 >
 > Agreed!

 Thanks
 Philipp

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

Re: [PATCH] x86_64: Reserve extra memory for crashkernel if AMD sme/sev enabled