On Thu, May 01, 2014 at 01:11:30PM -0400, Prarit Bhargava wrote:
When toggling a CPU offline and online via /sys, e.g.
echo 0 > /sys/devices/system/cpu/cpuX/online
echo 1 > /sys/devices/system/cpu/cpuX/online
the kdump service eventually stops because it is rate limited by systemd.
[root@dhg3 hoemann]# systemctl status kdump
kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled)
Active: failed (Result: start-limit) since Tue 2014-04-22 17:02:56 MDT; 59min ago
Process: 8803 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
Process: 8911 ExecStart=/usr/bin/kdumpctl start (code=killed, signal=TERM)
Main PID: 8911 (code=killed, signal=TERM)
Apr 22 17:02:56 dhg3 systemd[1]: Starting Crash recovery kernel arming...
Apr 22 17:02:56 dhg3 systemd[1]: Stopping Crash recovery kernel arming...
Apr 22 17:02:56 dhg3 systemd[1]: Starting Crash recovery kernel arming...
Apr 22 17:02:56 dhg3 systemd[1]: kdump.service start request repeated too quickly, refusing to start.
Apr 22 17:02:56 dhg3 systemd[1]: Failed to start Crash recovery kernel arming.
Apr 22 17:02:56 dhg3 systemd[1]: Unit kdump.service entered failed state.
The ratelimiting can be disabled by adding StartLimitInterval=0 to the kdump
service file.
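For local testing, the same setting could also be applied without editing the packaged unit. The fragment below is only a sketch: it assumes a systemd version that supports drop-in directories and accepts StartLimitInterval= in the [Service] section (as the patch below does), and the drop-in file name is hypothetical.

```ini
# /etc/systemd/system/kdump.service.d/10-no-start-limit.conf
# (hypothetical file name; any *.conf in this directory is read)
[Service]
# An interval of 0 disables start rate limiting for this unit,
# so repeated try-restart requests from udev are not refused.
StartLimitInterval=0
```

After creating the file, `systemctl daemon-reload` makes systemd pick it up, and `systemctl reset-failed kdump.service` clears an existing start-limit failure. Newer systemd releases document this setting as StartLimitIntervalSec= in [Unit], so the spelling may need adjusting there.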
While debugging this issue, additional problems were noted with the udev
rules. The first is the handling of the add & remove states for CPUs. CPUs
are added and removed when they are brought into service. kdump, however,
does not need to restart during the add and remove but needs to restart
only when the CPU's memory is allocated or freed during an online or offline.
Prarit, I think you got the above description reversed. The kdump service
needs to restart when CPUs are added/removed, not when CPUs are
onlined/offlined.
Similarly, when memory modules are onlined and offlined the memory is not
made available for use by the kernel. It does not make sense to restart
kdump until the modules are added and removed from the kernel so the memory
is actually in use or removed from use.
The same applies here. For memory, the kdump service needs to restart when
memory is onlined/offlined, not when memory is added/removed.
Signed-off-by: Prarit Bhargava <prarit(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
---
98-kexec.rules | 8 ++++----
kdump.service | 1 +
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/98-kexec.rules b/98-kexec.rules
index 8c742dd..e32ee13 100644
--- a/98-kexec.rules
+++ b/98-kexec.rules
@@ -1,4 +1,4 @@
-SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
-SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"
-SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump.service"
-SUBSYSTEM=="memory", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump.service"
+SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump.service"
+SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump.service"
+SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
+SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"
diff --git a/kdump.service b/kdump.service
index 55b7ca2..24c1386 100644
--- a/kdump.service
+++ b/kdump.service
@@ -7,6 +7,7 @@ Type=oneshot
ExecStart=/usr/bin/kdumpctl start
ExecStop=/usr/bin/kdumpctl stop
RemainAfterExit=yes
+StartLimitInterval=0
Can you please put this change in a separate patch? It is logically a
different change: we want to allow unlimited restarts of the kdump service,
since one can generate many CPU/memory events in a very short amount of time.
Thanks
Vivek