This patch adds a fadump howto document to kexec-tools. The document
is prepared with reference to the kexec-kdump-howto.txt document.
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini(a)linux.vnet.ibm.com>
---
fadump-howto.txt | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 250 insertions(+)
create mode 100644 fadump-howto.txt
diff --git a/fadump-howto.txt b/fadump-howto.txt
new file mode 100644
index 0000000..16f8549
--- /dev/null
+++ b/fadump-howto.txt
@@ -0,0 +1,250 @@
+Firmware assisted dump (fadump) HOWTO
+
+Introduction
+
+Firmware assisted dump is a new feature in the 3.4 mainline kernel, supported
+only on the powerpc architecture. The goal of firmware-assisted dump is to
+enable the dump of a crashed system from a fully-reset system, and to
+minimize the total elapsed time until the system is back in production use.
+Complete documentation of the implementation can be found in
+Documentation/powerpc/firmware-assisted-dump.txt in the upstream Linux kernel
+tree, from version 3.4 onward.
+
+Please note that the firmware-assisted dump feature is only available on Power6
+and above systems with recent firmware versions.
+
+Overview
+
+Fadump
+
+Fadump is a robust kernel crash dumping mechanism that obtains a reliable
+kernel crash dump with assistance from firmware. This approach does not use
+kexec; instead, firmware assists in booting the kdump kernel while preserving
+memory contents. Unlike kdump, the system is fully reset and loaded with a
+fresh copy of the kernel. In particular, PCI and I/O devices are reinitialized
+and are in a clean, consistent state. This second kernel, often called a
+capture kernel, boots with very little memory and captures the dump image.
+
+During OS initialization, the first kernel registers the sections of memory
+with the Power firmware for dump preservation. These registered sections of
+memory are reserved by the first kernel during early boot. When a system
+crashes, the Power firmware fully resets the system, preserves all the system
+memory contents, and saves the low memory (boot memory, sized as the larger of
+5% of system RAM or 256MB) to the previously registered region. It also saves
+system registers and hardware PTEs.
+
+Fadump is supported only on the ppc64 platform. The standard kernel and the
+capture kernel are one and the same on ppc64.
+
+If you're reading this document, you should already have kexec-tools
+installed. If not, you can install it via the following command:
+
+ # yum install kexec-tools
+
+Fadump Operational Flow:
+
+Like kdump, fadump also exports the ELF formatted kernel crash dump through
+/proc/vmcore. Hence the existing kdump infrastructure can be used to capture
+the fadump vmcore. The idea is to keep the functionality transparent to the
+end user. From the user's perspective, there is no change in the way the kdump
+init script works.
+
+However, unlike kdump, fadump does not pre-load the kdump kernel and initrd
+into reserved memory; instead, it always uses the default OS initrd during the
+second boot after a crash. Hence, for fadump, we rebuild a new kdump initrd and
+replace the default initrd with it. Before replacing the existing default
+initrd, we take a backup of the original, which is restored when the user
+decides to switch back to kdump. The dracut package has been enhanced to
+rebuild the default initrd with vmcore capture steps as per /etc/kdump.conf.
+
+The control flow of fadump works as follows:
+01. System panics.
+02. At the crash, kernel informs power firmware that kernel has crashed.
+03. Firmware takes the control and reboots the entire system preserving
+ only the memory (resets all other devices).
+04. The reboot follows the normal booting process (non-kexec).
+05. The boot loader loads the default kernel and initrd from /boot
+06. The default initrd loads and runs /init
+07. dracut-kdump.sh script present in fadump aware default initrd checks if
+ '/proc/vmcore' file exists before executing steps to capture vmcore.
+ (This check will help to bypass the vmcore capture steps during normal boot
+ process.)
+09. Captures dump according to /etc/kdump.conf
+10. Is dump capture successful (yes goto 12, no goto 11)
+11. Perform the default action specified in /etc/kdump.conf (Default action
+ is reboot, if unspecified)
+12. Reboot
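+
+The check in step 07 can be sketched in shell as follows. This is only an
+illustration, not the actual dracut-kdump.sh code: the function name
+should_capture_vmcore and its optional path argument are hypothetical, the
+latter added so the check can be exercised outside a crashed boot.
+

```shell
# Hypothetical helper: succeed only when a vmcore is exported, i.e. when this
# boot follows a crash. The path defaults to /proc/vmcore.
should_capture_vmcore() {
    vmcore_path="${1:-/proc/vmcore}"
    [ -e "$vmcore_path" ]
}

if should_capture_vmcore; then
    echo "crash boot: capture the dump as per /etc/kdump.conf"
else
    echo "normal boot: skip the vmcore capture steps"
fi
```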
+
+
+How to configure fadump:
+
+Again, we assume that if you're reading this document, you already have
+kexec-tools installed. If not, you can install it via the following command:
+
+ # yum install kexec-tools
+
+To be able to do much of anything interesting in the way of debug analysis,
+you'll also need to install the kernel-debuginfo package, of the same arch
+as your running kernel, and the crash utility:
+
+ # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash
+
+Next up, we need to modify some boot parameters to enable firmware assisted
+dump. With the help of grubby, it's very easy to append "fadump=on" to the end
+of your kernel boot parameters. Optionally, the user can also append
+'fadump_reserve_mem=X' to the kernel command line to specify the size of the
+memory to reserve for boot memory dump preservation.
+
+ # grubby --args="fadump=on" --update-kernel=/boot/vmlinuz-`uname -r`
+
+The term 'boot memory' means the size of the low memory chunk that is required
+for a kernel to boot successfully when booted with restricted memory. By
+default, the boot memory size will be the larger of 5% of system RAM or 256MB.
+Alternatively, the user can specify the boot memory size through the boot
+parameter 'fadump_reserve_mem=', which overrides the default calculated size.
+Use this option if the default boot memory size is not sufficient for the
+second kernel to boot successfully.
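+
+As a rough illustration of the default sizing rule above, the following shell
+sketch computes the larger of 5% of RAM or 256MB. The function name
+default_boot_mem_mb is hypothetical, the sketch ignores any alignment or
+rounding the kernel may apply, and MemTotal from /proc/meminfo only
+approximates the system RAM figure the kernel itself uses.
+

```shell
# Hypothetical helper: print the default boot memory size in MB for a given
# amount of system RAM in MB (larger of 5% of RAM or 256MB).
default_boot_mem_mb() {
    total_mb="$1"
    five_percent=$(( total_mb * 5 / 100 ))
    if [ "$five_percent" -gt 256 ]; then
        echo "$five_percent"
    else
        echo 256
    fi
}

# MemTotal is reported in kB; convert to MB before applying the rule.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    default_boot_mem_mb $(( total_kb / 1024 ))
fi
```

+
+By this rule, 16384 MB of RAM gives about 819 MB of boot memory, while
+4096 MB of RAM falls back to the 256 MB minimum.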
+
+After making said changes, reboot your system, so that the specified memory is
+reserved and left untouched by the normal system. Take note that the output of
+'free -m' will show X MB less memory than without this parameter, which is
+expected. If you see OOM (Out Of Memory) error messages while loading the
+capture kernel, then you should bump up the memory reservation size.
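+
+After the reboot, one way to confirm that the parameter took effect is to look
+for it in /proc/cmdline. A minimal sketch; the function name
+cmdline_has_fadump_on is hypothetical:
+

```shell
# Hypothetical helper: succeed if the given kernel command line contains the
# fadump=on parameter as a whole word.
cmdline_has_fadump_on() {
    case " $1 " in
        *" fadump=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /proc/cmdline ] && cmdline_has_fadump_on "$(cat /proc/cmdline)"; then
    echo "fadump=on is present on the kernel command line"
else
    echo "fadump=on not found on the kernel command line"
fi
```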
+
+Now that you've got that reserved memory region set up, you want to turn on
+the kdump init script:
+
+ # systemctl enable kdump.service
+
+Then, start up kdump as well:
+
+ # systemctl start kdump.service
+
+This should turn on the firmware assisted functionality in the kernel by
+echo'ing 1 to /sys/kernel/fadump_registered, leaving the system ready
+to capture a vmcore upon crashing. To test this out, you can force-crash
+your system by echo'ing a c into /proc/sysrq-trigger:
+
+ # echo c > /proc/sysrq-trigger
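+
+Since the crash test only produces a dump when fadump is registered, it can
+help to check the registration state first. A minimal sketch, assuming the
+sysfs file reads 1 when fadump is registered; the function name
+fadump_is_registered and its optional path argument are hypothetical:
+

```shell
# Hypothetical helper: succeed if the given sysfs file is readable and
# contains 1. The path defaults to /sys/kernel/fadump_registered.
fadump_is_registered() {
    sysfs_file="${1:-/sys/kernel/fadump_registered}"
    [ -r "$sysfs_file" ] && [ "$(cat "$sysfs_file")" = "1" ]
}

if fadump_is_registered; then
    echo "fadump is registered"
else
    echo "fadump is not registered"
fi
```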
+
+You should see some panic output, followed by the system resetting and booting
+into a fresh copy of the kernel. When the default initrd loads and runs /init,
+the vmcore should be copied out to disk (by default, in
+/var/crash/<YYYY.MM.DD-HH:MM:SS>/vmcore), and then the system reboots back
+into your normal kernel.
+
+Once back in your normal kernel, you can use the previously installed crash
+utility in conjunction with the previously installed kernel-debuginfo to
+perform postmortem analysis:
+
+ # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
+ /var/crash/2006-08-23-15:34/vmcore
+
+ crash> bt
+
+and so on...
+
+Saving vmcore-dmesg.txt
+----------------------
+Kernel log buffers are among the most important pieces of information
+available in a vmcore. Before the vmcore is saved, the kernel log buffers are
+extracted from /proc/vmcore and saved into a file named vmcore-dmesg.txt. The
+vmcore is saved after vmcore-dmesg.txt. The destination disk and directory for
+vmcore-dmesg.txt are the same as for the vmcore. Note that kernel log buffers
+will not be available if the dump target is a raw device.
+
+Dump Triggering methods:
+
+This section talks about the various ways, other than a Kernel Panic, in which
+fadump can be triggered. The following methods assume that fadump is configured
+on your system, with the scripts enabled as described in the section above.
+
+1) AltSysRq C
+
+Fadump can be triggered with the combination of the 'Alt', 'SysRq' and 'C'
+keyboard keys. Please refer to the following link for more details:
+
+http://kbase.redhat.com/faq/FAQ_43_5559.shtm
+
+In addition, on PowerPC boxes, fadump can also be triggered via the Hardware
+Management Console (HMC) using the 'Ctrl', 'O' and 'C' keyboard keys.
+
+2) Kernel OOPs
+
+If we want to generate a dump every time the kernel oopses, we can achieve
+this by setting the 'Panic On OOPs' option as follows:
+
+ # echo 1 > /proc/sys/kernel/panic_on_oops
+
+3) PowerPC specific methods:
+
+On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if
+XMON is configured). To configure XMON, either compile the kernel with the
+CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with CONFIG_XMON
+and boot the kernel with the xmon=on option.
+
+The following are the ways to remotely issue a soft reset on PowerPC boxes,
+which would drop you to XMON. Pressing 'X' (capital letter X) followed by
+'Enter' at the XMON prompt will trigger the dump.
+
+3.1) HMC
+
+The Hardware Management Console (HMC) available on Power4 and Power5 machines
+allows partitions to be reset remotely. This is especially useful in hang
+situations where the system is not accepting any keyboard input.
+
+Once you have HMC configured, the following steps will enable you to trigger
+fadump via a soft reset:
+
+On Power4
+ Using GUI
+
+ * In the right pane, right click on the partition you wish to dump.
+ * Select "Operating System->Reset".
+ * Select "Soft Reset".
+ * Select "Yes".
+
+ Using HMC Commandline
+
+ # reset_partition -m <machine> -p <partition> -t soft
+
+On Power5
+ Using GUI
+
+ * In the right pane, right click on the partition you wish to dump.
+ * Select "Restart Partition".
+ * Select "Dump".
+ * Select "OK".
+
+ Using HMC Commandline
+
+ # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar
+
+3.2) Blade Management Console for Blade Center
+
+To initiate a dump operation, go to the Power/Restart option under "Blade
+Tasks" in the Blade Management Console. Select the blade for which you want to
+initiate the dump and then click "Restart blade with NMI". This issues a
+system reset and invokes the xmon debugger.
+
+
+Advanced Setups & Default action:
+
+Kdump and fadump exhibit similar behavior in terms of setup & default action.
+For information on fadump advanced setup, see the "Advanced Setups" section in
+the "kexec-kdump-howto.txt" document. For information on the fadump default
+action, refer to the "Default action" section in the same document.
+
+Compression and filtering
+
+Refer to the "Compression and filtering" section in the
+"kexec-kdump-howto.txt" document. Compression and filtering are the same for
+kdump & fadump.
+
+
+Notes on rootfs mount:
+Dracut is designed to mount rootfs by default. If rootfs mounting fails it
+will refuse to go on. So fadump currently leaves rootfs mounting to dracut.
+For the time being, we make the assumption that a proper root= cmdline is
+being passed to the dracut initramfs. If you need to modify
+"KDUMP_COMMANDLINE=" in /etc/sysconfig/kdump, you will need to make sure that
+the appropriate root= options are copied from /proc/cmdline. In general it is
+best to append command line options using "KDUMP_COMMANDLINE_APPEND=" instead
+of replacing the original command line completely.