On 04/27/19 at 08:57pm, piliu wrote:
Hi Dave,
On 04/26/2019 10:45 AM, Dave Young wrote:
> Hi Pingfan,
> On 04/26/19 at 10:34am, Pingfan Liu wrote:
>> On powerpc, /sys/devices/system/cpu/cpuX nodes are present for all "possible"
>> CPUs which is what the problem was to start with. As these nodes always exist
>> irrespective of whether a CPU is hot-added/removed, there is no udev
>> ADD/REMOVE event when a CPU is hot-added or hot-removed on powerpc64.
>> Pingfan tried fixing that but it didn't please the maintainer as it breaks some
>> old userspace tools.
>>
>> The alternative solution for powerpc was to use CPU online/offline events instead. As
>> crash_notes are already built for all /sys/devices/system/cpu/cpuX nodes and these nodes
>> are present for all "possible" CPUs (online/offline/could-be-hot-removed/could-be-hot-added)
>> on powerpc64, a reload is not necessary for CPU hot-remove. But a reload is still needed
>> for CPU hot-add because KDump kernel fails to get the 'boot_cpuid' and eventually fails
>> to boot [see early_init_dt_scan_cpus() in arch/powerpc/kernel/prom.c file] if system crashes
>> on hot-added CPU.
>
This is the problem, and the user-visible errors.
Maybe I can rearrange the paragraphs so the commit log is easier to
review.
Not really, the user-visible error is that the kdump kernel hangs after
"I'm in purgatory".
Then why does it happen? The reason is that the crash happens on the
hot-added CPU, but the kexec udev rule that restarts the kdump service
when a core is added is not triggered.
Why does the kdump service need a restart? The paragraph explains that ...
about the dt code, and the boot_cpuid stuff ...
Currently the kdump udev rule uses add/remove uevents, so why is it not
triggered then? Explain that as well.
Then how, and what is proposed to fix it.
Finally, online/offline was chosen (offline is not needed), and why.
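
To make the "possible" vs "online" point concrete, here is a minimal shell
sketch (an illustration only, not part of the patch). It assumes the
standard sysfs cpulist files /sys/devices/system/cpu/possible and
/sys/devices/system/cpu/online are readable; count_cpulist is a
hypothetical helper name. On a machine where cpuX nodes exist for every
possible CPU, the two counts can differ even though no add/remove uevent
was ever sent.

```shell
#!/bin/sh
# Count the CPUs named by a kernel cpulist string such as "0-3,8-11".
# count_cpulist is a hypothetical helper, introduced only for this sketch.
count_cpulist() {
    list=$1
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        case "$range" in
            *-*)
                # A range "first-last" contributes last - first + 1 CPUs.
                first=${range%-*}
                last=${range#*-}
                total=$((total + last - first + 1))
                ;;
            ?*)
                # A single CPU number contributes one CPU.
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Compare the two lists when sysfs exposes them; stay silent otherwise.
cpu_dir=/sys/devices/system/cpu
if [ -r "$cpu_dir/possible" ] && [ -r "$cpu_dir/online" ]; then
    echo "possible: $(count_cpulist "$(cat "$cpu_dir/possible")")"
    echo "online:   $(count_cpulist "$(cat "$cpu_dir/online")")"
fi
```

To see which uevents actually arrive, running
"udevadm monitor --kernel --subsystem-match=cpu" while toggling a CPU
through its sysfs "online" file should show the actions udev receives.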
>
> > The above paragraphs can explain why we do not need "remove/offline",
> > but it does not say the big picture of what this patch is solving.
> >
> > So the patch log still need to add more things. Basically first thing is
> > "What", What is the problem, what happened, especially the user visible
> > errors. and then how we are going to solve it, why we will workaround
> > it instead of fix it in other way.
> >
> >>
> >> Workaround it by using a ppc dedicated udev rules, which uses cpu online/offline message.
> >> Further more, as for offline message, it is even useless on powerpc, and can be dropped.
> >>
> >> Signed-off-by: Pingfan Liu <piliu(a)redhat.com>
> >> ---
> >> 98-kexec-ppc.rules | 15 +++++++++++++++
> >> kexec-tools.spec | 10 ++++++++--
> >> 2 files changed, 23 insertions(+), 2 deletions(-)
> >> create mode 100644 98-kexec-ppc.rules
> >>
> >> diff --git a/98-kexec-ppc.rules b/98-kexec-ppc.rules
> >> new file mode 100644
> >> index 0000000..9d783a0
> >> --- /dev/null
> >> +++ b/98-kexec-ppc.rules
> >> @@ -0,0 +1,15 @@
> >> +SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload"
> >> +SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload"
> >> +SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload"
> >> +
> >> +GOTO="kdump_reload_end"
> >> +
> >> +LABEL="kdump_reload"
> >> +
> >> +# If kdump is not loaded, calling "kdumpctl reload" will end up
> >> +# doing nothing, but it and systemd-run will always generate
> >> +# extra logs for each call, so trigger the "kdumpctl reload"
> >> +# only if kdump service is active to avoid unnecessary logs
> >> +RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet /usr/bin/kdumpctl reload'"
> >> +
> >> +LABEL="kdump_reload_end"
> >> diff --git a/kexec-tools.spec b/kexec-tools.spec
> >> index a1e6686..2bc983c 100644
> >> --- a/kexec-tools.spec
> >> +++ b/kexec-tools.spec
> >> @@ -18,7 +18,8 @@ Source8: kdump.conf
> >> Source9: http://downloads.sourceforge.net/project/makedumpfile/makedumpfile/1.6.5/...
> >> Source10: kexec-kdump-howto.txt
> >> Source12: mkdumprd.8
> >> -Source14: 98-kexec.rules
> >> +Source13: 98-kexec.rules
> >> +Source14: 98-kexec-ppc.rules
> >> Source15: kdump.conf.5
> >> Source16: kdump.service
> >> Source18: kdump.sysconfig.s390x
> >> @@ -169,10 +170,15 @@ install -m 644 %{SOURCE25} $RPM_BUILD_ROOT%{_mandir}/man8/kdumpctl.8
> >> install -m 755 %{SOURCE20} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-lib.sh
> >> install -m 755 %{SOURCE23} $RPM_BUILD_ROOT%{_prefix}/lib/kdump/kdump-lib-initramfs.sh
> >> %ifnarch s390x
> >> +install -m 755 %{SOURCE28} $RPM_BUILD_ROOT%{_udevrulesdir}/../kdump-udev-throttler
> >> +%endif
> >> +%ifnarch s390x ppc64 ppc64le
> >> # For s390x the ELF header is created in the kdump kernel and therefore kexec
> >> # udev rules are not required
> >> install -m 644 %{SOURCE14} $RPM_BUILD_ROOT%{_udevrulesdir}/98-kexec.rules
> >> -install -m 755 %{SOURCE28} $RPM_BUILD_ROOT%{_udevrulesdir}/../kdump-udev-throttler
> >> +%endif
> >> +%ifarch ppc64 ppc64le
> >> +install -m 644 %{SOURCE14} $RPM_BUILD_ROOT%{_udevrulesdir}/98-kexec-ppc.rules
> >> %endif
> >> install -m 644 %{SOURCE15} $RPM_BUILD_ROOT%{_mandir}/man5/kdump.conf.5
> >> install -m 644 %{SOURCE16} $RPM_BUILD_ROOT%{_unitdir}/kdump.service
> >> --
> >> 2.19.1
> >>
> >
> > Thanks
> > Dave
> >
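
As a reading aid for the three match lines in 98-kexec-ppc.rules above,
here is a small shell sketch of the same decision: which (subsystem,
action) pairs jump to kdump_reload and which fall through to
kdump_reload_end. The function name kdump_reload_needed is hypothetical
and exists only for this illustration; udev itself does the matching in
the real rules file.

```shell
#!/bin/sh
# Hypothetical helper mirroring the match lines of 98-kexec-ppc.rules:
# returns 0 when the event would reach LABEL="kdump_reload",
# and 1 when it would fall through to LABEL="kdump_reload_end".
kdump_reload_needed() {
    subsystem=$1
    action=$2
    case "$subsystem:$action" in
        cpu:online|memory:online|memory:offline)
            return 0
            ;;
        *)
            # Notably cpu:offline is absent: per the commit log, no
            # reload is needed for CPU hot-remove on powerpc64.
            return 1
            ;;
    esac
}

# Example: print the decision for a few events.
for event in cpu:online cpu:offline memory:online memory:offline; do
    if kdump_reload_needed "${event%%:*}" "${event##*:}"; then
        echo "$event -> reload"
    else
        echo "$event -> skip"
    fi
done
```

The asymmetry is intentional in the patch as posted: memory changes
trigger a reload in both directions, while CPU changes only do so on
"online".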