On 05/01/2014 02:58 PM, Vivek Goyal wrote:
On Thu, May 01, 2014 at 01:11:30PM -0400, Prarit Bhargava wrote:
[..]
> diff --git a/98-kexec.rules b/98-kexec.rules
> index 8c742dd..e32ee13 100644
> --- a/98-kexec.rules
> +++ b/98-kexec.rules
> @@ -1,4 +1,4 @@
> -SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
> -SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"
> -SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump.service"
> -SUBSYSTEM=="memory", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump.service"
> +SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump.service"
> +SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump.service"
So when is the "add" event generated? After the cpu has been added and that
add operation is complete?
^^^ You used "add" too many times here, and you need to define what "add"
means in your sentence.
Can you ask your question using these udev definitions?
'add' - device has been added to system, but not in service in kernel
'remove' - device has been removed from system
'online' - device has been brought into service in kernel
'offline' - device has been removed from service in kernel
Also, this might answer your question ... This is a udevadm dump of udev events
when hot adding a CPU (physically adding the CPU).
ACPI slot attention on
(ie, press the button and let system know that socket is present)
KERNEL[343.237922] add /devices/LNXSYSTM:00/device:00/LNXCPU:02 (acpi)
KERNEL[343.237979] add /devices/system/cpu/cpu2 (cpu)
UDEV [343.241459] add /devices/system/cpu/cpu2 (cpu)
KERNEL[343.242204] add /module/acpi_cpufreq (module)
UDEV [343.242492] add /module/acpi_cpufreq (module)
KERNEL[343.255279] remove /module/acpi_cpufreq (module)
UDEV [343.255610] remove /module/acpi_cpufreq (module)
KERNEL[343.256586] add /module/pcc_cpufreq (module)
UDEV [343.256801] add /module/pcc_cpufreq (module)
KERNEL[343.260090] remove /module/pcc_cpufreq (module)
UDEV [343.260330] remove /module/pcc_cpufreq (module)
UDEV [343.260622] add /devices/LNXSYSTM:00/device:00/LNXCPU:02 (acpi)
echo 1 > /sys/devices/system/cpu/cpu2/online
(ie bring the cpu into service)
KERNEL[638.683159] add /devices/virtual/msr/msr2 (msr)
KERNEL[638.703675] add /devices/virtual/cpuid/cpu2 (cpuid)
UDEV [638.703691] add /devices/virtual/msr/msr2 (msr)
UDEV [638.703700] add /devices/virtual/cpuid/cpu2 (cpuid)
KERNEL[638.796047] add /devices/virtual/thermal/cooling_device2 (thermal)
KERNEL[638.796071] add /devices/system/machinecheck/machinecheck2 (machinecheck)
KERNEL[638.796082] online /devices/system/cpu/cpu2 (cpu)
UDEV [638.800036] add /devices/virtual/thermal/cooling_device2 (thermal)
UDEV [638.800059] add /devices/system/machinecheck/machinecheck2 (machinecheck)
UDEV [639.535403] online /devices/system/cpu/cpu2 (cpu)
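
To make the ordering easier to see, the trace above can be reduced to just the
cpu-subsystem events. This is only a rough sketch: it assumes the trace was
saved to a file in the format shown above, and the function name cpu_events is
made up for illustration.

```shell
# Read a saved udev trace on stdin and keep only events for the "cpu" subsystem.
# Prints one line per event: source (KERNEL or UDEV), action, devpath.
cpu_events() {
    # Normalize "UDEV [..." to "UDEV[..." so both sources split into the same fields.
    sed -E 's/^(KERNEL|UDEV) *\[/\1[/' |
        awk '$NF == "(cpu)" { sub(/\[.*/, "", $1); print $1, $2, $3 }'
}
```

Run over the two traces above, this should leave the "add" events for cpu2
from the slot attention step, followed by the "online" events from the echo
step. Capturing with "udevadm monitor --kernel --udev --subsystem-match=cpu"
should give a similar reduction at capture time.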
IOW, I don't want a race between kexec-tools looking at
/sys/.../cpu<N>/crash_notes and the cpu add operation. The cpu add operation
should not still be in progress when kexec starts poking in /sys. Otherwise
we could potentially miss this cpu in /proc/vmcore.
crash_notes is created when the cpu is added, not when it is onlined. I cannot
predict when a panic will occur, and it has always worked like this, so I don't
agree with your concern wrt the crash_notes file.
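
One way to check this on a test box is to list, for each cpu directory in
sysfs, whether crash_notes exists alongside the cpu's online state. A minimal
sketch follows; the function name crash_notes_status and its optional
directory argument are hypothetical, added only so it can be pointed at a
fake tree.

```shell
# Report, per cpu directory, whether crash_notes exists and the online state.
# $1: cpu sysfs directory (defaults to the real one); overridable for testing.
crash_notes_status() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        notes=missing
        [ -e "$dir/crash_notes" ] && notes=present
        # Some cpus (for example cpu0) may have no "online" file; report "n/a".
        state=n/a
        [ -r "$dir/online" ] && state=$(cat "$dir/online")
        echo "${dir##*/} crash_notes=$notes online=$state"
    done
}
```

If crash_notes really is created at add time, a cpu that has been added but
not yet onlined should show up as "crash_notes=present online=0".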
P.