The current method for kdump memory debugging is the dracut option "rd.memdebug=[0-3]", which is not enough for debugging kernel modules. For example, when we want to find out which kernel module consumes a large amount of memory, "rd.memdebug" does not help much.
A better way is needed to meet this requirement; it would be very useful for kdump OOM debugging.
The principle of this patch series is to use kernel tracing to track slab and buddy allocation calls during kernel module loading (module_init), so that we can analyze the trace data and get the total memory consumption. A large slab allocation will probably fall through to the buddy allocator, so tracing "mm_page_alloc" alone should be enough for the purpose.
The trace events include memory calls under "tracing/events/":
  kmem/mm_page_alloc
We also inspect the following events to detect module loading:
  module/module_load
  module/module_put
We can get the module name and task pid from the "module_load" event, which also marks the beginning of the loading; a module_put called by the same task pid implies the end of the loading. So the memory events recorded in between by the same task pid are consumed by this module during loading (i.e. modprobe or module_init()).
With this information, we can record the total memory consumption (the larger, the more precise) involved in loading each kernel module.
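The windowing rule described above can be sketched in a few lines of shell and awk. This is only an illustration, not the script from this series: the function name count_module_pages and the trace lines in sample_trace are made up, and the column layout assumes the default ftrace "trace" output (task-pid, cpu, flags, timestamp, event).

```shell
#!/bin/sh
# Sketch only: attribute mm_page_alloc events to the module being loaded by
# the same task. Reads ftrace-style lines on stdin.
count_module_pages() {
    awk '
        /^#/ { next }                       # skip the ftrace header comments
        { task = $1; event = $5 }           # "comm-pid" column and event name
        event == "module_load:" {
            current[task] = $6              # module name follows the event name
            pages[$6] = 0
            names[++n] = $6                 # remember the loading order
            next
        }
        !(task in current) { next }         # this task is not loading a module
        event == "module_put:" {
            delete current[task]            # the loading window is closed
            next
        }
        event == "mm_page_alloc:" {
            for (i = 6; i <= NF; i++)
                if ($i ~ /^order=/) {
                    split($i, kv, "=")
                    pages[current[task]] += 2 ^ kv[2]   # order N means 2^N pages
                }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%d pages consumed by \"%s\"\n", pages[names[i]], names[i]
        }
    '
}

# Made-up sample data: only the two allocations inside the dm_mod window count.
sample_trace='# tracer: nop
modprobe-101 [000] .... 1.000001: module_load: dm_mod
modprobe-101 [000] .... 1.000002: mm_page_alloc: page=0x1 pfn=1 order=2 migratetype=0 gfp_flags=GFP_KERNEL
modprobe-101 [000] .... 1.000003: mm_page_alloc: page=0x2 pfn=8 order=0 migratetype=0 gfp_flags=GFP_KERNEL
systemd-udevd-77 [000] .... 1.000004: mm_page_alloc: page=0x3 pfn=9 order=3 migratetype=0 gfp_flags=GFP_KERNEL
modprobe-101 [000] .... 1.000005: module_put: dm_mod call_site=do_init_module refcnt=1
modprobe-101 [000] .... 1.000006: mm_page_alloc: page=0x4 pfn=10 order=1 migratetype=0 gfp_flags=GFP_KERNEL'

printf '%s\n' "$sample_trace" | count_module_pages
# prints: 5 pages consumed by "dm_mod"
```

The order=2 and order=0 allocations add up to 4 + 1 = 5 pages; the allocation by another task and the one after module_put are ignored.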
One major flaw of this method is that the trace ring buffer consumes a lot of memory. If it is too small, old records may be overwritten by subsequent records. The trace ring buffer is set to 5MB by default, but users can override it via the standard kernel boot parameter "trace_buf_size".
Users should increase the crash kernel memory reservation as needed after setting a large trace ring buffer size, in case OOM happens during debugging.
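The default-versus-override decision can be pictured as follows. This is a minimal sketch, assuming the command line is passed in as a string; the helper name default_trace_buf_kb is hypothetical, and the real script reads /proc/cmdline and writes the value to buffer_size_kb instead of printing it.

```shell
#!/bin/sh
# Sketch only: print the buffer size (in KB) to apply, or nothing when the
# user already passed trace_buf_size= on the kernel command line.
default_trace_buf_kb() {
    case " $1 " in
        *" trace_buf_size="*) ;;    # user override: keep the kernel's setting
        *) echo 5120 ;;             # 5MB default used by this series
    esac
}

default_trace_buf_kb "ro quiet rd.kodebug"                  # prints 5120
default_trace_buf_kb "ro rd.kodebug trace_buf_size=10M"     # prints nothing
```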
Usage:
1) Pass "rd.kodebug" to the kdump kernel cmdline using "KDUMP_COMMANDLINE_APPEND" in /etc/sysconfig/kdump.
2) Pass the extra "trace_buf_size=nn[KMG]" to specify the trace ring buffer size (per cpu) as needed.
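For illustration, the resulting entry in /etc/sysconfig/kdump might look like the excerpt below. The options before "rd.kodebug" are an assumption standing in for whatever is already configured on the system, and the 10M buffer size is only an example value.

```shell
# /etc/sysconfig/kdump (excerpt)
# "irqpoll nr_cpus=1 reset_devices" is a placeholder for the existing options;
# keep what is already there and append the two new parameters.
KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices rd.kodebug trace_buf_size=10M"
```

The kdump service has to be restarted afterwards so that the new command line is picked up.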
As an example, it prints out something like the following on my kvm machine:
== debug_mem for kernel modules during loading begin ==
4 pages consumed by "dm_mod" [load finished]
0 pages consumed by "dm_log" [load finished]
0 pages consumed by "dm_region_hash" [load finished]
0 pages consumed by "dm_mirror" [load finished]
9 pages consumed by "sunrpc" [load finished]
24 pages consumed by "floppy" [load finished]
0 pages consumed by "libata" [load finished]
0 pages consumed by "i2c_core" [load finished]
27 pages consumed by "ata_piix" [load finished]
0 pages consumed by "drm" [load finished]
1 pages consumed by "ttm" [load finished]
0 pages consumed by "drm_kms_helper" [load finished]
1604 pages consumed by "qxl" [load finished]
0 pages consumed by "virtio" [load finished]
0 pages consumed by "virtio_ring" [load finished]
10 pages consumed by "virtio_pci" [load finished]
1 pages consumed by "pata_acpi" [load finished]
0 pages consumed by "ata_generic" [load finished]
0 pages consumed by "serio_raw" [load finished]
0 pages consumed by "crc32c_intel" [load finished]
0 pages consumed by "crct10dif_common" [load finished]
0 pages consumed by "crct10dif_pclmul" [load finished]
278 pages consumed by "virtio_net" [load finished]
198 pages consumed by "virtio_console" [load finished]
0 pages consumed by "cdrom" [load finished]
6 pages consumed by "sr_mod" [load finished]
162 pages consumed by "virtio_blk" [load finished]
1 pages consumed by "fscache" [load finished]
0 pages consumed by "lockd" [load finished]
17 pages consumed by "nfs" [load finished]
0 pages consumed by "libcrc32c" [load finished]
0 pages consumed by "dns_resolver" [load finished]
8 pages consumed by "xfs" [load finished]
0 pages consumed by "nfsv4" [load finished]
== debug_mem for kernel modules during loading end ==
We can clearly see that loading "qxl" consumed more than 6MB of memory.
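The conversion behind that figure is simple arithmetic. The sketch below assumes 4 KiB pages, which is typical for x86_64 but not stated in this thread ("getconf PAGESIZE" shows the actual value); the helper name pages_to_kib is made up.

```shell
#!/bin/sh
# Sketch only: convert a page count to KiB.
# $1: number of pages, $2: page size in bytes (assumed 4096 when omitted)
pages_to_kib() {
    echo $(( $1 * ${2:-4096} / 1024 ))
}

pages_to_kib 1604    # prints 6416 (KiB), i.e. roughly 6.3 MiB for "qxl"
```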
Xunlei Pang (3):
  memdebug-ko: add dracut-memdebug-ko.sh to debug kernel module memory consumption
  module-setup: apply kernel module memory debug support
  kexec-kdump-howto: add the debugging tip for rd.kodebug

 dracut-kdump.sh        |   9 ++++
 dracut-memdebug-ko.sh  | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
 dracut-module-setup.sh |   9 ++++
 kdumpctl               |  14 ++++++
 kexec-kdump-howto.txt  |  37 ++++++++++++++++
 kexec-tools.spec       |   2 +
 6 files changed, 188 insertions(+)
 create mode 100755 dracut-memdebug-ko.sh
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading (module_init), so that we can analyze the trace data and get the total memory consumption. A large slab allocation will fall into buddy, so tracing "mm_page_alloc" only should be enough for the purpose.
One major flaw of this method is that it consumes a lot of memory; users should increase the crash kernel memory reservation or trace buffer size (via "trace_buf_size=nn[KMG]") as needed.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 dracut-memdebug-ko.sh | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kexec-tools.spec      |   2 +
 2 files changed, 119 insertions(+)
 create mode 100755 dracut-memdebug-ko.sh
diff --git a/dracut-memdebug-ko.sh b/dracut-memdebug-ko.sh
new file mode 100755
index 0000000..cd22be7
--- /dev/null
+++ b/dracut-memdebug-ko.sh
@@ -0,0 +1,117 @@
+# Try to find out kernel modules with large total memory allocation during loading.
+# For large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc"
+# only(this saves us lots of trace buffer memory) should be enough for the purpose.
+
+parse_trace_data=$1
+TRACE_BASE="/sys/kernel/debug"
+# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available.
+if [[ -d "/sys/kernel/tracing" ]]; then
+    TRACE_BASE="/sys/kernel"
+fi
+
+if ! [[ "$parse_trace_data" ]]; then
+    # old debugfs case.
+    if ! [[ -d "$TRACE_BASE/tracing" ]]; then
+        mount none -t debugfs $TRACE_BASE
+    # new tracefs case.
+    elif ! [[ -f $TRACE_BASE/tracing/tracing_on ]]; then
+        mount none -t tracefs "$TRACE_BASE/tracing"
+    fi
+
+    if ! [[ -f "$TRACE_BASE/tracing/tracing_on" ]]; then
+        warn "Mount trace failed for kernel module memory analyzing."
+        return 0
+    fi
+
+    # Prepare trace setup.
+    echo 1 > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/enable
+    echo 1 > $TRACE_BASE/tracing/events/module/module_load/enable
+    echo 1 > $TRACE_BASE/tracing/events/module/module_put/enable
+    echo 1 > $TRACE_BASE/tracing/tracing_on
+
+    # 5MB should be big enough for most cases?
+    # Users can override it via "trace_buf_size=nn[KMG]" boot command.
+    cat /proc/cmdline | grep -q "trace_buf_size="
+    if [[ $? -ne 0 ]]; then
+        echo 5120 > $TRACE_BASE/tracing/buffer_size_kb
+    fi
+
+    # Clear trace data
+    echo > $TRACE_BASE/tracing/trace
+    return 0
+fi
+
+# Begin to parse trace data.
+if ! [[ -d "$TRACE_BASE/tracing" ]]; then
+    warn "Can't activate trace, skip kernel module memory analyzing!"
+    return 0
+fi
+
+# Temporarily turn off tracing during copy.
+echo 0 > $TRACE_BASE/tracing/tracing_on
+TMPFILE=/tmp/tmp$$$$
+cp $TRACE_BASE/tracing/trace $TMPFILE -f
+echo 1 > $TRACE_BASE/tracing/tracing_on
+
+# Indexed by task pid.
+declare -A current_module
+
+# Indexed by module name.
+declare -A module_loaded
+declare -A nr_alloc_pages
+
+while read pid cpu flags ts function ;
+do
+    # Skip comment lines
+    if [[ $pid = "#" ]]; then
+        continue
+    fi
+
+    if [[ $function = module_load* ]]; then
+        # One module is being loaded, save the task pid for tracking.
+        module_name=${function#*: }
+        module_names+=" $module_name"
+        current_module[$pid]="$module_name"
+        [[ ${module_loaded[$module_name]} ]] && echo "\"$module_name\" was loaded multiple times!"
+        unset module_loaded[$module_name]
+        nr_alloc_pages[$module_name]=0
+    fi
+
+    if ! [[ ${current_module[$pid]} ]]; then
+        continue
+    fi
+
+    if [[ $function = module_put* ]]; then
+        # Mark the module as loaded
+        module_loaded[${current_module[$pid]}]=1
+        # Module has been loaded when module_put is called, untrack the task
+        unset current_module[$pid]
+        continue
+    fi
+
+    # Once we get here, the task is being tracked(is loading a module).
+    # Get the module name.
+    module_name=${current_module[$pid]}
+
+    if [[ $function = mm_page_alloc* ]]; then
+        order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
+        nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
+    fi
+done < $TMPFILE
+
+echo -e "\n\n== debug_mem for kernel modules during loading begin ==" >&2
+for i in $module_names; do
+    status="load finished"
+    if ! [[ ${module_loaded[$i]} ]]; then
+        status="loading"
+    fi
+    echo -e "${nr_alloc_pages[$i]} pages consumed by \"$i\" [$status]" >&2
+done
+echo -e "== debug_mem for kernel modules during loading end ==\n\n" >&2
+
+unset module_names
+unset module_loaded
+
+rm $TMPFILE -f
+
+return 0
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 1597071..691ad7a 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh
 Source104: dracut-kdump-emergency.service
 Source105: dracut-kdump-error-handler.service
 Source106: dracut-kdump-capture.service
+Source107: dracut-memdebug-ko.sh
 
 Requires(post): systemd-units
 Requires(preun): systemd-units
@@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
 cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
 cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}}
 cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
+cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
On 2016/10/31 at 15:15, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading(module_init), thus we can analyze all the trace data and get the total memory consumption. as for large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" only should be enough for the purpose.
One major flaw of this method is that it consumes a lot of memory, users should increase the crash kernel memory reservation or trace buffer size (via "trace_buf_size=nn[KMG]") as needed.
We can address this major flaw now, since we can use a filter to make the trace data very small. Here is the improved version:
Subject: [PATCH v2 1/3] memdebug-ko: add dracut-memdebug-ko.sh to debug kernel module memory consumption
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading (module_init), so that we can analyze the trace data and get the total memory consumption. A large slab allocation will fall into buddy, so tracing "mm_page_alloc" alone should be enough for the purpose.
There are three kinds of known applications for module loading: "systemd-udevd", "modprobe" and "insmod".
We utilize them as the mm_page_alloc filter, so that loads of events can be avoided. As a result, we get very small trace data.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 dracut-memdebug-ko.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kexec-tools.spec      |   2 +
 2 files changed, 113 insertions(+)
 create mode 100755 dracut-memdebug-ko.sh
diff --git a/dracut-memdebug-ko.sh b/dracut-memdebug-ko.sh
new file mode 100755
index 0000000..fc0a4ba
--- /dev/null
+++ b/dracut-memdebug-ko.sh
@@ -0,0 +1,111 @@
+# Try to find out kernel modules with large total memory allocation during loading.
+# For large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc"
+# alone should be enough for the purpose.
+
+TRACE_BASE="/sys/kernel/debug"
+# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available.
+if [[ -d "/sys/kernel/tracing" ]]; then
+    TRACE_BASE="/sys/kernel"
+fi
+
+# old debugfs case.
+if ! [[ -d "$TRACE_BASE/tracing" ]]; then
+    mount none -t debugfs $TRACE_BASE
+# new tracefs case.
+elif ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
+    mount none -t tracefs "$TRACE_BASE/tracing"
+fi
+
+if ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
+    warn "Mount trace failed for kernel module memory analyzing."
+    return 0
+fi
+
+MATCH_EVENTS="module:module_put module:module_load kmem:mm_page_alloc"
+SET_EVENTS=$(echo $(cat $TRACE_BASE/tracing/set_event))
+# Check if trace was properly setup, prepare it if not.
+if [[ $(cat $TRACE_BASE/tracing/tracing_on) != 1 ]] || \
+    [[ "$SET_EVENTS" != "$MATCH_EVENTS" ]]; then
+    # Set our trace events.
+    echo $MATCH_EVENTS > $TRACE_BASE/tracing/set_event
+
+    # There are three kinds of known applications for module loading:
+    # "systemd-udevd", "modprobe" and "insmod".
+    # Set them to the mm_page_alloc event filter.
+    page_alloc_filter="comm == systemd-udevd* || comm == modprobe* || comm == modprobe*"
+    echo $page_alloc_filter > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/filter
+
+    # Set the number of comm-pid. Thanks to filters, 4096 is big enough(also generally supported).
+    echo 4096 > $TRACE_BASE/tracing/saved_cmdlines_size
+
+    # Enable and clear trace data.
+    echo 1 > $TRACE_BASE/tracing/tracing_on
+    echo > $TRACE_BASE/tracing/trace
+    return 0
+fi
+
+# Indexed by task pid.
+declare -A current_module
+# Indexed by module name.
+declare -A module_loaded
+declare -A nr_alloc_pages
+
+# For large trace data, parsing tracing/trace turns out to be very slow,
+# so copy it out first and we parse the copy file to avoid this issue.
+TMPFILE=/tmp/kdump.trace.tmp.$$$$
+cp $TRACE_BASE/tracing/trace $TMPFILE -f
+while read pid cpu flags ts function ;
+do
+    # Skip comment lines
+    if [[ $pid = "#" ]]; then
+        continue
+    fi
+
+    if [[ $function = module_load* ]]; then
+        # One module is being loaded, save the task pid for tracking.
+        module_name=${function#*: }
+        # Remove the trailing after whitespace, there may be the module flags.
+        module_name=${module_name%% *}
+        module_names+=" $module_name"
+        current_module[$pid]="$module_name"
+        [[ ${module_loaded[$module_name]} ]] && warn "\"$module_name\" was loaded multiple times!"
+        unset module_loaded[$module_name]
+        nr_alloc_pages[$module_name]=0
+    fi
+
+    if ! [[ ${current_module[$pid]} ]]; then
+        continue
+    fi
+
+    if [[ $function = module_put* ]]; then
+        # Mark the module as loaded
+        module_loaded[${current_module[$pid]}]=1
+        # Module has been loaded when module_put is called, untrack the task
+        unset current_module[$pid]
+        continue
+    fi
+
+    # Once we get here, the task is being tracked(is loading a module).
+    # Get the module name.
+    module_name=${current_module[$pid]}
+
+    if [[ $function = mm_page_alloc* ]]; then
+        order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
+        nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
+    fi
+done < $TMPFILE
+
+echo "== debug_mem for kernel modules during loading begin ==" >&2
+for i in $module_names; do
+    status="load finished"
+    if ! [[ ${module_loaded[$i]} ]]; then
+        status="loading"
+    fi
+    echo "${nr_alloc_pages[$i]} pages consumed by "$i" [$status]" >&2
+done
+echo "== debug_mem for kernel modules during loading end ==" >&2
+
+unset module_names
+unset module_loaded
+rm $TMPFILE -f
+return 0
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 1597071..691ad7a 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh
 Source104: dracut-kdump-emergency.service
 Source105: dracut-kdump-error-handler.service
 Source106: dracut-kdump-capture.service
+Source107: dracut-memdebug-ko.sh
 
 Requires(post): systemd-units
 Requires(preun): systemd-units
@@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
 cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
 cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}}
 cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
+cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
On 2016/11/01 at 14:10, Xunlei Pang wrote:
On 2016/10/31 at 15:15, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading(module_init), thus we can analyze all the trace data and get the total memory consumption. as for large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" only should be enough for the purpose.
One major flaw of this method is that it consumes a lot of memory, users should increase the crash kernel memory reservation or trace buffer size (via "trace_buf_size=nn[KMG]") as needed.
We can address this major flaw now, as we can use filter to make the trace data very small. Here is the improved version:
Subject: [PATCH v2 1/3] memdebug-ko: add dracut-memdebug-ko.sh to debug kernel module memory consumption
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading(module_init), thus we can analyze all the trace data and get the total memory consumption. as for large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" alone should be enough for the purpose.
There are three kinds of known applications for module loading: "systemd-udevd", "modprobe" and "insmod".
We utilize them as the mm_page_alloc filter, so that loads of events can be avoided. As a result, we get very small trace data.
Signed-off-by: Xunlei Pang xlpang@redhat.com
dracut-memdebug-ko.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++ kexec-tools.spec | 2 + 2 files changed, 113 insertions(+) create mode 100755 dracut-memdebug-ko.sh
diff --git a/dracut-memdebug-ko.sh b/dracut-memdebug-ko.sh new file mode 100755 index 0000000..fc0a4ba --- /dev/null +++ b/dracut-memdebug-ko.sh @@ -0,0 +1,111 @@ +# Try to find out kernel modules with large total memory allocation during loading. +# For large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" +# alone should be enough for the purpose.
+TRACE_BASE="/sys/kernel/debug" +# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available. +if [[ -d "/sys/kernel/tracing" ]]; then
- TRACE_BASE="/sys/kernel"
+fi
+# old debugfs case. +if ! [[ -d "$TRACE_BASE/tracing" ]]; then
- mount none -t debugfs $TRACE_BASE
+# new tracefs case. +elif ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
- mount none -t tracefs "$TRACE_BASE/tracing"
+fi
+if ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
- warn "Mount trace failed for kernel module memory analyzing."
- return 0
+fi
+MATCH_EVENTS="module:module_put module:module_load kmem:mm_page_alloc" +SET_EVENTS=$(echo $(cat $TRACE_BASE/tracing/set_event)) +# Check if trace was properly setup, prepare it if not. +if [[ $(cat $TRACE_BASE/tracing/tracing_on) != 1 ]] || \
- [[ "$SET_EVENTS" != "$MATCH_EVENTS" ]]; then
- # Set our trace events.
- echo $MATCH_EVENTS > $TRACE_BASE/tracing/set_event
- # There are three kinds of known applications for module loading:
- # "systemd-udevd", "modprobe" and "insmod".
- # Set them to the mm_page_alloc event filter.
- page_alloc_filter="comm == systemd-udevd* || comm == modprobe* || comm == modprobe*"
Oops, this line should read as follows; it seems that having "*" doesn't work for the filter:
page_alloc_filter="comm == systemd-udevd || comm == modprobe || comm == insmod"
- echo $page_alloc_filter > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/filter
- # Set the number of comm-pid. Thanks to filters, 4096 is big enough(also generally supported).
- echo 4096 > $TRACE_BASE/tracing/saved_cmdlines_size
- # Enable and clear trace data.
- echo 1 > $TRACE_BASE/tracing/tracing_on
- echo > $TRACE_BASE/tracing/trace
- return 0
+fi
+# Indexed by task pid. +declare -A current_module +# Indexed by module name. +declare -A module_loaded +declare -A nr_alloc_pages
+# For large trace data, parsing tracing/trace turns out to be very slow, +# so copy it out first and we parse the copy file to avoid this issue. +TMPFILE=/tmp/kdump.trace.tmp.$$$$ +cp $TRACE_BASE/tracing/trace $TMPFILE -f +while read pid cpu flags ts function ; +do
- # Skip comment lines
- if [[ $pid = "#" ]]; then
continue
- fi
- if [[ $function = module_load* ]]; then
# One module is being loaded, save the task pid for tracking.
module_name=${function#*: }
# Remove the trailing after whitespace, there may be the module flags.
module_name=${module_name%% *}
module_names+=" $module_name"
current_module[$pid]="$module_name"
[[ ${module_loaded[$module_name]} ]] && warn "\"$module_name\" was loaded multiple times!"
unset module_loaded[$module_name]
nr_alloc_pages[$module_name]=0
- fi
- if ! [[ ${current_module[$pid]} ]]; then
continue
- fi
- if [[ $function = module_put* ]]; then
# Mark the module as loaded
module_loaded[${current_module[$pid]}]=1
# Module has been loaded when module_put is called, untrack the task
unset current_module[$pid]
continue
- fi
- # Once we get here, the task is being tracked(is loading a module).
- # Get the module name.
- module_name=${current_module[$pid]}
- if [[ $function = mm_page_alloc* ]]; then
order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
- fi
+done < $TMPFILE
+echo "== debug_mem for kernel modules during loading begin ==" >&2 +for i in $module_names; do
- status="load finished"
- if ! [[ ${module_loaded[$i]} ]]; then
status="loading"
- fi
- echo "${nr_alloc_pages[$i]} pages consumed by "$i" [$status]" >&2
+done +echo "== debug_mem for kernel modules during loading end ==" >&2
+unset module_names +unset module_loaded +rm $TMPFILE -f +return 0 diff --git a/kexec-tools.spec b/kexec-tools.spec index 1597071..691ad7a 100644 --- a/kexec-tools.spec +++ b/kexec-tools.spec @@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh Source104: dracut-kdump-emergency.service Source105: dracut-kdump-error-handler.service Source106: dracut-kdump-capture.service +Source107: dracut-memdebug-ko.sh
Requires(post): systemd-units Requires(preun): systemd-units @@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}} cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}} cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}} +cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
Hi Xunlei,
It looks nice to me. A few minor comments:
On Tuesday 01 November 2016 12:42 PM, Xunlei Pang wrote:
On 2016/11/01 at 14:10, Xunlei Pang wrote:
On 2016/10/31 at 15:15, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading(module_init), thus we can analyze all the trace data and get the total memory consumption. as for large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" only should be enough for the purpose.
One major flaw of this method is that it consumes a lot of memory, users should increase the crash kernel memory reservation or trace buffer size (via "trace_buf_size=nn[KMG]") as needed.
We can address this major flaw now, as we can use filter to make the trace data very small. Here is the improved version:
Subject: [PATCH v2 1/3] memdebug-ko: add dracut-memdebug-ko.sh to debug kernel module memory consumption
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading(module_init), thus we can analyze all the trace data and get the total memory consumption. as for large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" alone should be enough for the purpose.
There are three kinds of known applications for module loading: "systemd-udevd", "modprobe" and "insmod".
We utilize them as the mm_page_alloc filter, so that loads of events can be avoided. As a result, we get very small trace data.
Signed-off-by: Xunlei Pang xlpang@redhat.com
dracut-memdebug-ko.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++ kexec-tools.spec | 2 + 2 files changed, 113 insertions(+) create mode 100755 dracut-memdebug-ko.sh
diff --git a/dracut-memdebug-ko.sh b/dracut-memdebug-ko.sh new file mode 100755 index 0000000..fc0a4ba --- /dev/null +++ b/dracut-memdebug-ko.sh @@ -0,0 +1,111 @@ +# Try to find out kernel modules with large total memory allocation during loading. +# For large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc" +# alone should be enough for the purpose.
+TRACE_BASE="/sys/kernel/debug" +# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available. +if [[ -d "/sys/kernel/tracing" ]]; then
- TRACE_BASE="/sys/kernel"
+fi
+# old debugfs case. +if ! [[ -d "$TRACE_BASE/tracing" ]]; then
- mount none -t debugfs $TRACE_BASE
+# new tracefs case. +elif ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
- mount none -t tracefs "$TRACE_BASE/tracing"
+fi
+if ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
- warn "Mount trace failed for kernel module memory analyzing."
- return 0
+fi
+MATCH_EVENTS="module:module_put module:module_load kmem:mm_page_alloc" +SET_EVENTS=$(echo $(cat $TRACE_BASE/tracing/set_event)) +# Check if trace was properly setup, prepare it if not. +if [[ $(cat $TRACE_BASE/tracing/tracing_on) != 1 ]] || \
IIUC, we expect this 'if' branch to be executed when the memdebug-ko.sh cmdline hook is installed, right? i.e. inst_hook cmdline 00 "$moddir/memdebug-ko.sh"
Although there is currently no possibility that this hook is installed twice, we should still write the code so that, even if this script is called twice through inst_hook, it returns from the 'if'.
Maybe when no argument is passed to the hook, we can return after this 'if'.
- [[ "$SET_EVENTS" != "$MATCH_EVENTS" ]]; then
- # Set our trace events.
- echo $MATCH_EVENTS > $TRACE_BASE/tracing/set_event
- # There are three kinds of known applications for module loading:
- # "systemd-udevd", "modprobe" and "insmod".
- # Set them to the mm_page_alloc event filter.
- page_alloc_filter="comm == systemd-udevd* || comm == modprobe* || comm == modprobe*"
Oops, this line should read as follows; it seems that having "*" doesn't work for the filter:
page_alloc_filter="comm == systemd-udevd || comm == modprobe || comm == insmod"
- echo $page_alloc_filter > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/filter
- # Set the number of comm-pid. Thanks to filters, 4096 is big enough(also generally supported).
- echo 4096 > $TRACE_BASE/tracing/saved_cmdlines_size
- # Enable and clear trace data.
I do not see the possibility of any other events being enabled in the kdump kernel. Still, disabling them would be safer:
    echo 0 > $TRACE_BASE/tracing/events/enable
- echo 1 > $TRACE_BASE/tracing/tracing_on
- echo > $TRACE_BASE/tracing/trace
- return 0
+fi
+# Indexed by task pid. +declare -A current_module +# Indexed by module name. +declare -A module_loaded +declare -A nr_alloc_pages
+# For large trace data, parsing tracing/trace turns out to be very slow, +# so copy it out first and we parse the copy file to avoid this issue. +TMPFILE=/tmp/kdump.trace.tmp.$$$$ +cp $TRACE_BASE/tracing/trace $TMPFILE -f +while read pid cpu flags ts function ; +do
- # Skip comment lines
- if [[ $pid = "#" ]]; then
continue
- fi
- if [[ $function = module_load* ]]; then
# One module is being loaded, save the task pid for tracking.
module_name=${function#*: }
# Remove the trailing after whitespace, there may be the module flags.
module_name=${module_name%% *}
module_names+=" $module_name"
current_module[$pid]="$module_name"
[[ ${module_loaded[$module_name]} ]] && warn "\"$module_name\" was loaded multiple times!"
unset module_loaded[$module_name]
nr_alloc_pages[$module_name]=0
- fi
- if ! [[ ${current_module[$pid]} ]]; then
continue
- fi
- if [[ $function = module_put* ]]; then
# Mark the module as loaded
module_loaded[${current_module[$pid]}]=1
# Module has been loaded when module_put is called, untrack the task
unset current_module[$pid]
continue
- fi
- # Once we get here, the task is being tracked(is loading a module).
- # Get the module name.
- module_name=${current_module[$pid]}
- if [[ $function = mm_page_alloc* ]]; then
order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
- fi
+done < $TMPFILE
+echo "== debug_mem for kernel modules during loading begin ==" >&2 +for i in $module_names; do
- status="load finished"
- if ! [[ ${module_loaded[$i]} ]]; then
status="loading"
- fi
- echo "${nr_alloc_pages[$i]} pages consumed by "$i" [$status]" >&2
+done +echo "== debug_mem for kernel modules during loading end ==" >&2
+unset module_names +unset module_loaded +rm $TMPFILE -f +return 0 diff --git a/kexec-tools.spec b/kexec-tools.spec index 1597071..691ad7a 100644 --- a/kexec-tools.spec +++ b/kexec-tools.spec @@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh Source104: dracut-kdump-emergency.service Source105: dracut-kdump-error-handler.service Source106: dracut-kdump-capture.service +Source107: dracut-memdebug-ko.sh
Requires(post): systemd-units Requires(preun): systemd-units @@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}} cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}} cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}} +cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
~Pratyush
Hi Pratyush,
Xunlei has posted a new patch set to dracut list..
Thanks Dave
On Thursday 03 November 2016 08:39 AM, Dave Young wrote:
Hi Pratyush,
Xunlei has posted a new patch set to dracut list..
I am not subscribed to initramfs@vger.kernel.org. I will subscribe to it.
~Pratyush
On 10/31/16 at 03:15pm, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
+parse_trace_data=$1
+TRACE_BASE="/sys/kernel/debug"
+# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available.
+if [[ -d "/sys/kernel/tracing" ]]; then
+    TRACE_BASE="/sys/kernel"
+fi
I only read the patch roughly, and I am not familiar with debug tracing.
On my laptop it seems that only debugfs tracing is active, but /sys/kernel/tracing exists as well, so I am not sure whether the debugfs path should still get a chance to be tried.
[root@x1 tracing]# pwd
/sys/kernel/debug/tracing
[root@x1 tracing]# cat tracing_on
1
[root@x1 tracing]# cd /sys/kernel/tracing
[root@x1 tracing]# pwd
/sys/kernel/tracing
[root@x1 tracing]# ls
[root@x1 tracing]#
+if ! [[ "$parse_trace_data" ]]; then
+    # old debugfs case.
+    if ! [[ -d "$TRACE_BASE/tracing" ]]; then
+        mount none -t debugfs $TRACE_BASE
+    # new tracefs case.
+    elif ! [[ -f $TRACE_BASE/tracing/tracing_on ]]; then
+        mount none -t tracefs "$TRACE_BASE/tracing"
+    fi
+    if ! [[ -f "$TRACE_BASE/tracing/tracing_on" ]]; then
+        warn "Mount trace failed for kernel module memory analyzing."
+        return 0
+    fi
+    # Prepare trace setup.
+    echo 1 > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/enable
+    echo 1 > $TRACE_BASE/tracing/events/module/module_load/enable
+    echo 1 > $TRACE_BASE/tracing/events/module/module_put/enable
+    echo 1 > $TRACE_BASE/tracing/tracing_on
+    # 5MB should be big enough for most cases?
+    # Users can override it via "trace_buf_size=nn[KMG]" boot command.
+    cat /proc/cmdline | grep -q "trace_buf_size="
+    if [[ $? -ne 0 ]]; then
+        echo 5120 > $TRACE_BASE/tracing/buffer_size_kb
+    fi
+    # Clear trace data
+    echo > $TRACE_BASE/tracing/trace
+    return 0
+fi
+# Begin to parse trace data.
+if ! [[ -d "$TRACE_BASE/tracing" ]]; then
+    warn "Can't activate trace, skip kernel module memory analyzing!"
+    return 0
+fi
+# Temporarily turn off tracing during copy.
+echo 0 > $TRACE_BASE/tracing/tracing_on
+TMPFILE=/tmp/tmp$$$$
+cp $TRACE_BASE/tracing/trace $TMPFILE -f
+echo 1 > $TRACE_BASE/tracing/tracing_on
+# Indexed by task pid.
+declare -A current_module
+# Indexed by module name.
+declare -A module_loaded
+declare -A nr_alloc_pages
+while read pid cpu flags ts function ;
+do
+    # Skip comment lines
+    if [[ $pid = "#" ]]; then
+        continue
+    fi
+    if [[ $function = module_load* ]]; then
+        # One module is being loaded, save the task pid for tracking.
+        module_name=${function#*: }
+        module_names+=" $module_name"
+        current_module[$pid]="$module_name"
+        [[ ${module_loaded[$module_name]} ]] && echo "\"$module_name\" was loaded multiple times!"
+        unset module_loaded[$module_name]
+        nr_alloc_pages[$module_name]=0
+    fi
+    if ! [[ ${current_module[$pid]} ]]; then
+        continue
+    fi
+    if [[ $function = module_put* ]]; then
+        # Mark the module as loaded
+        module_loaded[${current_module[$pid]}]=1
+        # Module has been loaded when module_put is called, untrack the task
+        unset current_module[$pid]
+        continue
+    fi
+    # Once we get here, the task is being tracked(is loading a module).
+    # Get the module name.
+    module_name=${current_module[$pid]}
+    if [[ $function = mm_page_alloc* ]]; then
+        order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
+        nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
+    fi
+done < $TMPFILE
+echo -e "\n\n== debug_mem for kernel modules during loading begin ==" >&2
+for i in $module_names; do
+    status="load finished"
+    if ! [[ ${module_loaded[$i]} ]]; then
+        status="loading"
+    fi
+    echo -e "${nr_alloc_pages[$i]} pages consumed by "$i" [$status]" >&2
+done
+echo -e "== debug_mem for kernel modules during loading end ==\n\n" >&2
+unset module_names
+unset module_loaded
+rm $TMPFILE -f
+return 0
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 1597071..691ad7a 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh
 Source104: dracut-kdump-emergency.service
 Source105: dracut-kdump-error-handler.service
 Source106: dracut-kdump-capture.service
+Source107: dracut-memdebug-ko.sh
 Requires(post): systemd-units
 Requires(preun): systemd-units
@@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
 cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
 cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}}
 cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
+cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
--
1.8.3.1
_______________________________________________
kexec mailing list -- kexec@lists.fedoraproject.org
To unsubscribe send an email to kexec-leave@lists.fedoraproject.org
On 2016/11/01 at 16:37, Baoquan He wrote:
On 10/31/16 at 03:15pm, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
+parse_trace_data=$1
+TRACE_BASE="/sys/kernel/debug"
+# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available.
+if [[ -d "/sys/kernel/tracing" ]]; then
+    TRACE_BASE="/sys/kernel"
+fi
I only read the patch roughly, and I am not familiar with debug tracing.
On my laptop it seems that only debugfs tracing is active, but /sys/kernel/tracing exists as well, so I am not sure whether the debugfs path should still get a chance to be tried.
[root@x1 tracing]# pwd
/sys/kernel/debug/tracing
[root@x1 tracing]# cat tracing_on
1
[root@x1 tracing]# cd /sys/kernel/tracing
[root@x1 tracing]# pwd
/sys/kernel/tracing
[root@x1 tracing]# ls
[root@x1 tracing]#
Yes, "/sys/kernel/tracing" has priority: if it exists, the script mounts tracefs on /sys/kernel/tracing/ and then operates under that directory. There will also be trace files under "/sys/kernel/debug/tracing" if debugfs is mounted, but they actually have the same function.
IIUC, "/sys/kernel/tracing/" is just a new interface that is independent of debugfs.
Regards, Xunlei
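The case shown in the transcript above (an empty /sys/kernel/tracing next to a populated /sys/kernel/debug/tracing) can be handled by probing for a usable "tracing_on" file instead of testing only for the directory. The following is a minimal sketch of that idea, not the patch's code: the helper name find_tracing_dir and its optional root argument are hypothetical, and unlike the patch it does not try to mount anything.

```shell
# Hypothetical helper: print the first tracing directory that is
# actually populated, or return non-zero if none is usable.
# $1 optionally overrides the sysfs root (assumption, for dry runs only).
find_tracing_dir() {
    local root="${1:-/sys/kernel}"
    local dir
    # Prefer the standalone tracefs mount point, then the debugfs one.
    for dir in "$root/tracing" "$root/debug/tracing"; do
        # A mounted tracing directory always exposes tracing_on.
        if [ -f "$dir/tracing_on" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

A caller could fall back to mounting tracefs or debugfs, as the patch does, only when this helper fails.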
On 11/01/16 at 05:02pm, Xunlei Pang wrote:
Yes, "/sys/kernel/tracing" has priority: if it exists, the script mounts tracefs on /sys/kernel/tracing/ and then operates under that directory. There will also be trace files under "/sys/kernel/debug/tracing" if debugfs is mounted, but they actually have the same function.
IIUC, "/sys/kernel/tracing/" is just a new interface that is independent of debugfs.
Got it, thanks.
Regards, Xunlei
On 10/31/16 at 03:15pm, Xunlei Pang wrote:
Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
The principle is to use kernel trace to track buddy page allocation events during kernel module loading (module_init), so that we can analyze the trace data and get the total memory consumption. Large slab allocations fall through to the buddy allocator, so tracing only "mm_page_alloc" should be enough for this purpose.
One major flaw of this method is that it consumes a lot of memory; users should increase the crash kernel memory reservation or adjust the trace buffer size (via "trace_buf_size=nn[KMG]") as needed.
Signed-off-by: Xunlei Pang xlpang@redhat.com
dracut-memdebug-ko.sh | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++
I have not dug into the details; I just ran the script and got: ./dracut-memdebug-ko.sh: line 117: return: can only `return' from a function or sourced script
Thanks Dave
On 11/02/16 at 01:07pm, Dave Young wrote:
I have not dug into the details; I just ran the script and got: ./dracut-memdebug-ko.sh: line 117: return: can only `return' from a function or sourced script
Replacing every return with exit works for me.
Cool, it can be used as a standalone script without depending on dracut. Is it possible to extend it and create a standalone package?
Thanks Dave
On 2016/11/02 at 13:13, Dave Young wrote:
Replacing every return with exit works for me.
Cool, it can be used as a standalone script without depending on dracut. Is it possible to extend it and create a standalone package?
It actually contains three stages: 1) prepare and set up tracing; 2) collect enough trace data; 3) parse the trace data. It relies on dracut to determine the proper stages, or at least needs some manual setup.
Regards, Xunlei
Thanks Dave
On 11/02/16 at 01:24pm, Xunlei Pang wrote:
It actually contains three stages: 1) prepare and set up tracing; 2) collect enough trace data; 3) parse the trace data. It relies on dracut to determine the proper stages, or at least needs some manual setup.
I used it manually:
./dracut-memdebug-ko.sh
modprobe somemodule
./dracut-memdebug-ko.sh 1
So it seems it is still useful. This is just a random thought, though: there might be other page allocations during the module loading, and I'm not sure whether we can distinguish them.
Regards, Xunlei
Thanks Dave
On 2016/11/02 at 13:34, Dave Young wrote:
I used it manually:
./dracut-memdebug-ko.sh
modprobe somemodule
./dracut-memdebug-ko.sh 1
So it seems it is still useful. This is just a random thought, though: there might be other page allocations during the module loading, and I'm not sure whether we can distinguish them.
Yes, we can. This patch only accounts for the events recorded by the same task between "module_load" and the first following "module_put".
Regards, Xunlei
Regards, Xunlei
Thanks Dave
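The accounting rule described above (only events from the loading task, between "module_load" and the first following "module_put") can be tried out on a few made-up trace lines without touching the real trace buffer. The sketch below is an illustration under assumptions, not the patch's implementation: it uses awk rather than the bash loop from the patch, the function name account_module_pages is hypothetical, the sample lines are invented, and it assumes the usual "task-pid [cpu] flags timestamp: event: ..." layout with no spaces in the task name.

```shell
# Hypothetical helper: read ftrace text on stdin and print, per module,
# the pages allocated by the loading task between module_load and the
# first following module_put from that same task.
account_module_pages() {
    awk '
    /^#/ { next }                        # skip trace header comments
    $5 == "module_load:" {
        mod = $6
        if (!(mod in pages)) names[++n] = mod
        cur[$1] = mod                    # start tracking this task
        pages[mod] = 0
        finished[mod] = 0
        next
    }
    !($1 in cur) { next }                # task is not loading a module
    $5 == "module_put:" {
        finished[cur[$1]] = 1
        delete cur[$1]                   # stop tracking this task
        next
    }
    $5 == "mm_page_alloc:" && match($0, /order=[0-9]+/) {
        # An order-N allocation covers 2^N pages.
        pages[cur[$1]] += 2 ^ substr($0, RSTART + 6, RLENGTH - 6)
    }
    END {
        for (i = 1; i <= n; i++) {
            mod = names[i]
            printf("%d pages consumed by \"%s\" [%s]\n", pages[mod], mod,
                   (finished[mod] ? "load finished" : "loading"))
        }
    }'
}

# Invented sample: the kworker allocation and the allocation after
# module_put are ignored, so "qxl" is charged 8 + 1 pages.
account_module_pages <<'EOF'
# tracer: nop
 modprobe-101 [000] .... 1.000001: module_load: qxl
 modprobe-101 [000] .... 1.000002: mm_page_alloc: page=0x1 pfn=1 order=3 migratetype=0 gfp_flags=GFP_KERNEL
 kworker-7 [000] .... 1.000003: mm_page_alloc: page=0x2 pfn=2 order=0 migratetype=0 gfp_flags=GFP_KERNEL
 modprobe-101 [000] .... 1.000004: mm_page_alloc: page=0x3 pfn=3 order=0 migratetype=0 gfp_flags=GFP_KERNEL
 modprobe-101 [000] .... 1.000005: module_put: qxl call_site=do_init_module refcnt=1
 modprobe-101 [000] .... 1.000006: mm_page_alloc: page=0x4 pfn=4 order=1 migratetype=0 gfp_flags=GFP_KERNEL
EOF
```

Allocations made on behalf of the module by other tasks (for example a kworker) are not attributed to it under this rule, which is one reason the reported numbers are approximate.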
On 2016/11/02 at 13:07, Dave Young wrote:
I have not dug into the details; I just ran the script and got: ./dracut-memdebug-ko.sh: line 117: return: can only `return' from a function or sourced script
Hi Dave,
This is on purpose: the script always returns 0 (true), so you can run it by sourcing it, like this:
. dracut-memdebug-ko.sh
Strictly speaking, this script should be run in a dracut environment, since it uses helpers such as "warn" that are implemented by dracut. But normally it still works.
Regards, Xunlei
On 11/02/16 at 01:18pm, Xunlei Pang wrote:
Hi Dave,
This is on purpose: the script always returns 0 (true), so you can run it by sourcing it, like this:
. dracut-memdebug-ko.sh
To drop the "return", it would be better to add some functions and call them from this script; then neither return nor exit is needed at the top level any more.
Strictly speaking, this script should be run in a dracut environment, since it uses helpers such as "warn" that are implemented by dracut. But normally it still works.
Regards, Xunlei
On 2016/11/02 at 13:41, Dave Young wrote:
To drop the "return", it would be better to add some functions and call them from this script; then neither return nor exit is needed at the top level any more.
Yes, I will update it this way when preparing the dracut patch.
Regards, Xunlei
Strictly speaking, this script should be run in a dracut environment, since it uses helpers such as "warn" that are implemented by dracut. But normally it still works.
Regards, Xunlei
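The function-based layout suggested in this exchange can be sketched as follows. This is only an outline under assumptions, not the updated patch: the function names setup_trace, parse_trace and memdebug_ko_main are hypothetical, the two stage functions are placeholders that just print a message, and the "warn" fallback is an addition for runs outside dracut.

```shell
#!/bin/bash
# Fall back to a plain stderr message when the dracut "warn" helper
# is not available (for example when run outside the initramfs).
type warn >/dev/null 2>&1 || warn() { echo "$*" >&2; }

# Stage 1 placeholder: the real script would mount tracefs/debugfs
# and enable the trace events here.
setup_trace() {
    echo "setup stage"
    return 0
}

# Stage 3 placeholder: the real script would copy and parse the
# trace data here.
parse_trace() {
    echo "parse stage"
    return 0
}

# Hypothetical entry point: no argument means "set up tracing",
# any non-empty argument means "parse the collected data".
memdebug_ko_main() {
    if [ -z "${1:-}" ]; then
        setup_trace
    else
        parse_trace
    fi
    # "return" is legal here because we are inside a function, so the
    # script behaves the same whether it is sourced or executed.
    return 0
}

memdebug_ko_main "$@"
```

With this layout, both ". dracut-memdebug-ko.sh 1" from a dracut hook and "./dracut-memdebug-ko.sh 1" from a shell avoid the "can only `return' from a function or sourced script" error.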
If "rd.kodebug" is present, we monitor the kernel module memory consumption as follows:
1) Use an early dracut hook (installed via inst_hook) to set up tracing, as no kernel module is supposed to be loaded at this point.
2) Parse the trace data at kdump's pre_dump stage, as all needed kernel modules are supposed to be loaded at this point.
Signed-off-by: Xunlei Pang xlpang@redhat.com
---
 dracut-kdump.sh        |  9 +++++++++
 dracut-module-setup.sh |  9 +++++++++
 kdumpctl               | 14 ++++++++++++++
 3 files changed, 32 insertions(+)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 42ba37f..7ada205 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -33,6 +33,15 @@ do_kdump_pre()
     if [ -n "$KDUMP_PRE" ]; then
         "$KDUMP_PRE"
     fi
+
+    # If cmdline hook exists, we know that memdebug-ko.sh
+    # was activated.
+    #
+    # Execute memdebug-ko.sh before dumping, at this point
+    # all kernel modules are supposed to be loaded.
+    if [ -f /lib/dracut/hooks/cmdline/00-memdebug-ko.sh ]; then
+        . /lib/dracut/hooks/cmdline/00-memdebug-ko.sh 1
+    fi
 }
 do_kdump_post()
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index 68e0ff8..1785a12 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -733,4 +733,13 @@ install() {
     # target. Ideally all this should be pushed into dracut iscsi module
     # at some point of time.
     kdump_check_iscsi_targets
+
+    # Setup memdebug-ko if there is any "/tmp/.kdump.memdebug.ko.tmp*" which
+    # has been created by kdump temporarily as a hint.
+    ls /tmp/.kdump.memdebug.ko.tmp* > /dev/null 2>&1
+    if [[ $? -eq 0 ]]; then
+        # Prepare tracing kernel module memory consumption.
+        # NOTE: Change the filename in dracut-kdump.sh::do_kdump_pre() accordingly.
+        inst_hook cmdline 00 "$moddir/memdebug-ko.sh"
+    fi
 }
diff --git a/kdumpctl b/kdumpctl
index bf12d0d..171f252 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -178,12 +178,26 @@ rebuild_fadump_initrd()
 rebuild_kdump_initrd()
 {
+    # Clean up memdebug.ko temp file if any
+    rm /tmp/.kdump.memdebug.ko.tmp* -rf
+
+    echo "$(prepare_cmdline)" | grep -q "rd.kodebug"
+    if [ $? -eq 0 ]; then
+        # As a hint to inst_hook memdebug.ko.sh for dracut 99kdumpbase,
+        # see dracut-module-setup.sh.
+        touch /tmp/.kdump.memdebug.ko.tmp$$$$
+    fi
+
     $MKDUMPRD $TARGET_INITRD $kdump_kver
     if [ $? != 0 ]; then
         echo "mkdumprd: failed to make kdump initrd" >&2
+        # Clean up memdebug.ko temp file if any
+        rm /tmp/.kdump.memdebug.ko.tmp* -rf
         return 1
     fi
+    # Clean up memdebug.ko temp file if any
+    rm /tmp/.kdump.memdebug.ko.tmp* -rf
     return 0
 }
Update the doc to include the help for rd.kodebug.
Signed-off-by: Xunlei Pang xlpang@redhat.com
---
 kexec-kdump-howto.txt | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
diff --git a/kexec-kdump-howto.txt b/kexec-kdump-howto.txt
index f46563f..87f3821 100644
--- a/kexec-kdump-howto.txt
+++ b/kexec-kdump-howto.txt
@@ -706,4 +706,41 @@ Debugging Tips
 Now minicom should be logging serial console in file console-logs.
+- Debug the large memory consumption by kernel modules
+
+  Besides dracut "rd.memdebug=[0-3]" support to debug kdump system memory,
+  kdump supports a fine-grained memory monitor for kernel modules to help
+  figure out which kernel module consumes a large amount of memory (the larger,
+  the more precise the result is).
+
+  One can pass "rd.kodebug" to kdump kernel through "KDUMP_COMMANDLINE_APPEND"
+  defined in /etc/sysconfig/kdump, touch /etc/kdump.conf, restart kdump, then
+  the memory consumption information by kernel modules will be printed out to
+  the console before the vmcore dumping starts.
+
+  As an example, it prints out something like the following:
+  == debug_mem for kernel modules during loading begin ==
+  0 pages consumed by "pata_acpi" [load finished]
+  0 pages consumed by "ata_generic" [load finished]
+  0 pages consumed by "drm" [load finished]
+  1 pages consumed by "ttm" [load finished]
+  0 pages consumed by "drm_kms_helper" [load finished]
+  834 pages consumed by "qxl" [load finished]
+  0 pages consumed by "mii" [load finished]
+  5 pages consumed by "8139cp" [load finished]
+  0 pages consumed by "8139too" [load finished]
+  0 pages consumed by "virtio" [load finished]
+  0 pages consumed by "virtio_ring" [load finished]
+  10 pages consumed by "virtio_pci" [load finished]
+  0 pages consumed by "serio_raw" [load finished]
+  0 pages consumed by "crc32c_intel" [load finished]
+  199 pages consumed by "virtio_console" [load finished]
+  0 pages consumed by "libcrc32c" [load finished]
+  8 pages consumed by "xfs" [load finished]
+  == debug_mem for kernel modules during loading end ==
+
+  If you encounter OOM when using the feature, you can enlarge the reserved memory
+  via "crashkernel" and try again. If you find some kernel module is not included
+  in the output, you can enlarge the trace buffer via "trace_buf_size" (the default
+  size set is 5MB in total for all cpus), and try again.
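The report above counts pages, so comparing it against a "crashkernel" reservation usually means converting to bytes first. A small sketch of that conversion is shown below; it is not part of the patch, the helper name report_pages_kib is hypothetical, and it assumes the report lines have exactly the form shown in the example output.

```shell
# Hypothetical helper: read "N pages consumed by "mod" [...]" lines on
# stdin and print the module name with its consumption in KiB.
# $1 optionally overrides the page size in bytes; by default the
# running system's page size from getconf is used.
report_pages_kib() {
    local page_size
    page_size=${1:-$(getconf PAGESIZE)}
    awk -v ps="$page_size" '
    $2 == "pages" && $3 == "consumed" {
        # Field 5 is the quoted module name, field 1 the page count.
        printf("%s %d KiB\n", $5, $1 * ps / 1024)
    }'
}
```

For instance, with 4096-byte pages the "qxl" line from the example (834 pages) corresponds to 3336 KiB, which `echo '834 pages consumed by "qxl" [load finished]' | report_pages_kib 4096` reports as `"qxl" 3336 KiB`.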
On Monday 31 October 2016 12:45 PM, Xunlei Pang wrote:
+  One can pass "rd.kodebug" to kdump kernel through "KDUMP_COMMANDLINE_APPEND"
+  defined in /etc/sysconfig/kdump, touch /etc/kdump.conf, restart kdump,
Hummm...not sure...just thinking..if there is something which should force a rebuild... should that be in /etc/sysconfig/kdump?
~Pratyush
On 2016/11/03 at 11:07, Pratyush Anand wrote:
Hummm...not sure...just thinking..if there is something which should force a rebuild... should that be in /etc/sysconfig/kdump?
Yes, for such cases, it actually needs a kdump kernel reload.
Regards, Xunlei
On 2016/10/31 at 15:15, Xunlei Pang wrote:
The current method for debugging kdump memory usage is dracut's "rd.memdebug=[0-3]", which is not enough for debugging kernel modules. For example, when we want to find out which kernel module consumes a large amount of memory, "rd.memdebug" does not help much.
A better way is needed to meet this requirement; it would be very useful for kdump OOM debugging.
The principle of this patch series is to use kernel tracing to track slab and buddy allocation calls during kernel module loading (module_init), so that we can analyze the trace data and get the total memory consumption. A large slab allocation will most likely fall through to the buddy allocator, so tracing "mm_page_alloc" alone should be enough for this purpose.
The memory event traced under "tracing/events/" is: kmem/mm_page_alloc
We also inspect the following events to detect module loading: module/module_load, module/module_put
We can get the module name and task pid from the "module_load" event, which also marks the beginning of the loading, and a module_put called by the same task pid implies the end of the loading. So the memory events recorded in between by the same task pid are consumed by this module during loading (i.e. modprobe or module_init()).
With this information, we can record the total memory consumed by the loading of each kernel module (the larger the consumption, the more precise the result).
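To make the attribution rule above concrete, here is a minimal Python sketch. It is not the dracut-memdebug-ko.sh from this series (that is a shell script); the function name `pages_per_module`, the regular expressions, and the simplified trace line layout ("comm-pid [cpu] flags timestamp: event: details") are all assumptions made for illustration. It counts 2^order pages for every "mm_page_alloc" seen from a task between its "module_load" and its next "module_put".

```python
import re

# Assumed, simplified ftrace text layout (hypothetical, for illustration):
#   <comm>-<pid> [<cpu>] <flags> <timestamp>: <event>: <details>
LINE_RE = re.compile(
    r"^\s*\S+-(?P<pid>\d+)\s+\[\d+\].*?:\s+(?P<event>\w+):\s+(?P<rest>.*)$"
)
ORDER_RE = re.compile(r"\border=(\d+)")


def pages_per_module(trace_lines):
    """Return {module_name: pages allocated while that module was loading}."""
    loading = {}  # task pid -> name of the module that task is loading
    pages = {}    # module name -> page count, in order of first appearance
    for line in trace_lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # header, comment, or a line in an unexpected format
        pid = int(m.group("pid"))
        event = m.group("event")
        rest = m.group("rest")
        if event == "module_load":
            # The first word of the details is taken to be the module name.
            name = rest.split()[0]
            loading[pid] = name
            pages.setdefault(name, 0)
        elif event == "mm_page_alloc" and pid in loading:
            order = ORDER_RE.search(rest)
            if order is not None:
                # An order-N allocation covers 2**N pages.
                pages[loading[pid]] += 1 << int(order.group(1))
        elif event == "module_put" and pid in loading:
            # As described above: a module_put from the same task
            # is treated as the end of the loading.
            del loading[pid]
    return pages
```

Allocations made by other tasks in the same time window are ignored because their pid is not in `loading`, which matches the "same task pid" condition described above.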
One major flaw of this method is that the trace ring buffer consumes a lot of memory. If it is too small, old records may be overwritten by subsequent records. The trace ring buffer is set to 5MB by default, but users can override it via the standard kernel boot parameter "trace_buf_size".
Users should increase the crash kernel memory reservation as needed after setting a large trace ring buffer size, so that OOM does not happen during debugging.
Usage: 1) Pass "rd.kodebug" to the kdump kernel cmdline using "KDUMP_COMMANDLINE_APPEND" in /etc/sysconfig/kdump.
Dave once suggested integrating this feature into dracut's "rd.memdebug". Thinking about it more: this is implemented in a completely different way, it has one major flaw in that it consumes a lot of memory and may cause OOM on large systems, and we would also need to find a call site at which all the modules are supposed to have been loaded. I'd like to consider it on dracut's side if we can find some way to reduce the trace memory one day.
Thus I added a new "rd.kodebug" on the kdump side for this feature; after all, kdump is the one known user that wants it so far.
Regards, Xunlei
2) Pass the extra "trace_buf_size=nn[KMG]" to specify the trace ring buffer size (per cpu) as needed.
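Since "trace_buf_size" is given per cpu, the memory to add to the crash kernel reservation grows with the number of cpus the kdump kernel brings up. The following is only a rough Python sketch of that arithmetic; `parse_size` and `total_trace_buffer_bytes` are hypothetical helper names, and the estimate ignores any rounding or overhead the kernel adds on top of the requested size.

```python
import re

# Binary multipliers for the K/M/G suffixes of "nn[KMG]".
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(value):
    """Convert a string such as "512K", "5M" or "1G" into bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError("unrecognized size: %r" % value)
    return int(m.group(1)) * _UNITS[m.group(2).upper()]


def total_trace_buffer_bytes(trace_buf_size, nr_cpus):
    """Estimate ring buffer memory: per-cpu size times number of cpus."""
    return parse_size(trace_buf_size) * nr_cpus
```

For example, `total_trace_buffer_bytes("2M", 4)` gives 8MB, which is roughly the extra amount to account for in "crashkernel" under these assumptions.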
As an example, it prints out something below on my kvm machine:
== debug_mem for kernel modules during loading begin ==
4 pages consumed by "dm_mod" [load finished]
0 pages consumed by "dm_log" [load finished]
0 pages consumed by "dm_region_hash" [load finished]
0 pages consumed by "dm_mirror" [load finished]
9 pages consumed by "sunrpc" [load finished]
24 pages consumed by "floppy" [load finished]
0 pages consumed by "libata" [load finished]
0 pages consumed by "i2c_core" [load finished]
27 pages consumed by "ata_piix" [load finished]
0 pages consumed by "drm" [load finished]
1 pages consumed by "ttm" [load finished]
0 pages consumed by "drm_kms_helper" [load finished]
1604 pages consumed by "qxl" [load finished]
0 pages consumed by "virtio" [load finished]
0 pages consumed by "virtio_ring" [load finished]
10 pages consumed by "virtio_pci" [load finished]
1 pages consumed by "pata_acpi" [load finished]
0 pages consumed by "ata_generic" [load finished]
0 pages consumed by "serio_raw" [load finished]
0 pages consumed by "crc32c_intel" [load finished]
0 pages consumed by "crct10dif_common" [load finished]
0 pages consumed by "crct10dif_pclmul" [load finished]
278 pages consumed by "virtio_net" [load finished]
198 pages consumed by "virtio_console" [load finished]
0 pages consumed by "cdrom" [load finished]
6 pages consumed by "sr_mod" [load finished]
162 pages consumed by "virtio_blk" [load finished]
1 pages consumed by "fscache" [load finished]
0 pages consumed by "lockd" [load finished]
17 pages consumed by "nfs" [load finished]
0 pages consumed by "libcrc32c" [load finished]
0 pages consumed by "dns_resolver" [load finished]
8 pages consumed by "xfs" [load finished]
0 pages consumed by "nfsv4" [load finished]
== debug_mem for kernel modules during loading end ==
We can clearly see that loading "qxl" consumed more than 6MB of memory (1604 pages, assuming 4KB pages).
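For longer reports it can help to rank the modules and convert pages to MB automatically. The sketch below is only an illustration: `top_consumers` is a hypothetical helper, the regular expression is written against the sample output above, and the 4KB page size is an assumption that should be changed on systems with a different page size.

```python
import re

# Matches lines such as: 1604 pages consumed by "qxl" [load finished]
REPORT_RE = re.compile(r'(\d+) pages consumed by "([^"]+)"')

PAGE_SIZE = 4096  # assumption: 4KB pages


def top_consumers(report_text, limit=3, page_size=PAGE_SIZE):
    """Return the largest consumers as (module, pages, megabytes) tuples."""
    rows = [(name, int(count)) for count, name in REPORT_RE.findall(report_text)]
    # Sort by page count, largest first; ties keep their report order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return [
        (name, count, count * page_size / (1024 * 1024))
        for name, count in rows[:limit]
    ]
```

Fed the output above, the first entry is "qxl" with 1604 pages, which works out to about 6.27MB with 4KB pages.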
Xunlei Pang (3):
  memdebug-ko: add dracut-memdebug-ko.sh to debug kernel module memory consumption
  module-setup: apply kernel module memory debug support
  kexec-kdump-howto: add the debugging tip for rd.kodebug
 dracut-kdump.sh        |   9 ++++
 dracut-memdebug-ko.sh  | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
 dracut-module-setup.sh |   9 ++++
 kdumpctl               |  14 ++++++
 kexec-kdump-howto.txt  |  37 ++++++++++++++++
 kexec-tools.spec       |   2 +
 6 files changed, 188 insertions(+)
 create mode 100755 dracut-memdebug-ko.sh