Hi Xunlei,
It looks good to me. A few minor comments:
On Tuesday 01 November 2016 12:42 PM, Xunlei Pang wrote:
On 2016/11/01 at 14:10, Xunlei Pang wrote:
> On 2016/10/31 at 15:15, Xunlei Pang wrote:
>> Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
>>
>> The principle is to use kernel trace to track buddy page allocation
>> events during kernel module loading(module_init), thus we can analyze
>> all the trace data and get the total memory consumption. as for large
>> slab allocation, it will fall into buddy, thus tracing "mm_page_alloc"
>> only should be enough for the purpose.
>>
>> One major flaw of this method is that it consumes a lot of memory, users
>> should increase the crash kernel memory reservation or trace buffer size
>> (via "trace_buf_size=nn[KMG]") as needed.
> We can address this major flaw now, as we can use a filter to make the trace data very small.
> Here is the improved version:
>
> Subject: [PATCH v2 1/3] memdebug-ko: add dracut-memdebug-ko.sh to debug kernel
> module memory consumption
>
> Add dracut-memdebug-ko.sh, install it to the dracut kdump module.
>
> The principle is to use kernel trace to track buddy page allocation
> events during kernel module loading(module_init), thus we can analyze
> all the trace data and get the total memory consumption. as for large
> slab allocation, it will fall into buddy, thus tracing "mm_page_alloc"
> alone should be enough for the purpose.
>
> There are three kinds of known applications for module loading:
> "systemd-udevd", "modprobe" and "insmod".
>
> We utilize them as the mm_page_alloc filter, so that loads of events
> can be avoided. As a result, we get very small trace data.
>
> Signed-off-by: Xunlei Pang <xlpang(a)redhat.com>
> ---
> dracut-memdebug-ko.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++
> kexec-tools.spec | 2 +
> 2 files changed, 113 insertions(+)
> create mode 100755 dracut-memdebug-ko.sh
>
> diff --git a/dracut-memdebug-ko.sh b/dracut-memdebug-ko.sh
> new file mode 100755
> index 0000000..fc0a4ba
> --- /dev/null
> +++ b/dracut-memdebug-ko.sh
> @@ -0,0 +1,111 @@
> +# Try to find out kernel modules with large total memory allocation during loading.
> +# For large slab allocation, it will fall into buddy, thus tracing "mm_page_alloc"
> +# alone should be enough for the purpose.
> +
> +TRACE_BASE="/sys/kernel/debug"
> +# trace access through debugfs would be obsolete if "/sys/kernel/tracing" is available.
> +if [[ -d "/sys/kernel/tracing" ]]; then
> + TRACE_BASE="/sys/kernel"
> +fi
> +
> +# old debugfs case.
> +if ! [[ -d "$TRACE_BASE/tracing" ]]; then
> + mount none -t debugfs $TRACE_BASE
> +# new tracefs case.
> +elif ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
> + mount none -t tracefs "$TRACE_BASE/tracing"
> +fi
> +
> +if ! [[ -f "$TRACE_BASE/tracing/trace" ]]; then
> + warn "Mount trace failed for kernel module memory analyzing."
> + return 0
> +fi
> +
> +MATCH_EVENTS="module:module_put module:module_load kmem:mm_page_alloc"
> +SET_EVENTS=$(echo $(cat $TRACE_BASE/tracing/set_event))
> +# Check if trace was properly setup, prepare it if not.
> +if [[ $(cat $TRACE_BASE/tracing/tracing_on) != 1 ]] || \
IIUC, this 'if' branch is expected to run when the memdebug-ko.sh cmdline
hook is installed, right? i.e.
inst_hook cmdline 00 "$moddir/memdebug-ko.sh"
Although the hook cannot be installed twice at present, we should still
write the code so that, even if this script is called twice through
inst_hook, it returns from this 'if'. Maybe when no argument is passed to
the hook, we can return after this 'if'.
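Something along these lines, perhaps. This is only an untested sketch:
trace_ready, prepare_trace and analyze_trace are names I made up to stand
in for the corresponding parts of the script, and the "analyze" argument
is hypothetical as well.

```shell
#!/bin/bash
# Sketch only: the three helpers below are hypothetical stand-ins for the
# setup check, the setup code and the parsing code of dracut-memdebug-ko.sh.
TRACE_READY=0
trace_ready()   { [[ $TRACE_READY == 1 ]]; }
prepare_trace() { TRACE_READY=1; echo "prepared"; }
analyze_trace() { echo "analyzed"; }

memdebug_ko() {
    # First call: trace is not set up yet, so prepare it and stop here.
    if ! trace_ready; then
        prepare_trace
        return 0
    fi
    # No argument means we were called as the setup (cmdline) hook again:
    # return without parsing, even though the trace is already prepared.
    [[ -z $1 ]] && return 0
    # Hypothetical argument present: this is the analysis call.
    analyze_trace
}
```

With this shape, a second argument-less call is a no-op, and only a call
that passes an argument reaches the parsing loop.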
> + [[ "$SET_EVENTS" != "$MATCH_EVENTS" ]]; then
> + # Set our trace events.
> + echo $MATCH_EVENTS > $TRACE_BASE/tracing/set_event
> +
> + # There are three kinds of known applications for module loading:
> + # "systemd-udevd", "modprobe" and "insmod".
> + # Set them to the mm_page_alloc event filter.
> + page_alloc_filter="comm == systemd-udevd* || comm == modprobe* || comm == modprobe*"
Oops, this line should be as follows; it seems that "*" doesn't work in the filter:
page_alloc_filter="comm == systemd-udevd || comm == modprobe || comm == insmod"
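To avoid this kind of copy-and-paste slip, the filter string could be
built from a list of command names. A minimal sketch, assuming exact
"comm ==" matches as in the corrected line above; build_comm_filter is a
made-up helper name, not something in the patch:

```shell
#!/bin/bash
# Sketch only: build_comm_filter is a hypothetical helper.
# It joins its arguments into "comm == A || comm == B || ...".
build_comm_filter() {
    local filter="" comm
    for comm in "$@"; do
        # Prepend " || " only when the filter already has an entry.
        filter+="${filter:+ || }comm == $comm"
    done
    echo "$filter"
}
```

Usage would be something like
page_alloc_filter=$(build_comm_filter systemd-udevd modprobe insmod)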
> + echo $page_alloc_filter > $TRACE_BASE/tracing/events/kmem/mm_page_alloc/filter
> +
> + # Set the number of comm-pid. Thanks to filters, 4096 is big enough(also generally supported).
> + echo 4096 > $TRACE_BASE/tracing/saved_cmdlines_size
> +
> + # Enable and clear trace data.
I don't see how any other events could be enabled in the kdump kernel.
Still, disabling them would be safer:
echo 0 > $TRACE_BASE/tracing/events/enable
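A rough, untested sketch of the ordering I have in mind. Writing 0 to
events/enable turns off every event, including our own, so the disable
would have to come before set_event is written rather than after it.
set_trace_events is a name I made up for illustration:

```shell
#!/bin/bash
# Sketch only: set_trace_events is a hypothetical helper, not part of the patch.
# $1 is the tracing directory (e.g. "$TRACE_BASE/tracing"); the remaining
# arguments are the events to enable.
set_trace_events() {
    local tracing_dir=$1
    shift
    # Turn off every event first, so nothing enabled earlier fills the buffer.
    echo 0 > "$tracing_dir/events/enable"
    # Then enable only the events we want to track.
    echo "$*" > "$tracing_dir/set_event"
}
```

It would be called as
set_trace_events "$TRACE_BASE/tracing" $MATCH_EVENTS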
> + echo 1 > $TRACE_BASE/tracing/tracing_on
> + echo > $TRACE_BASE/tracing/trace
> + return 0
> +fi
> +
> +# Indexed by task pid.
> +declare -A current_module
> +# Indexed by module name.
> +declare -A module_loaded
> +declare -A nr_alloc_pages
> +
> +# For large trace data, parsing tracing/trace turns out to be very slow,
> +# so copy it out first and we parse the copy file to avoid this issue.
> +TMPFILE=/tmp/kdump.trace.tmp.$$$$
> +cp $TRACE_BASE/tracing/trace $TMPFILE -f
> +while read pid cpu flags ts function ;
> +do
> + # Skip comment lines
> + if [[ $pid = "#" ]]; then
> + continue
> + fi
> +
> + if [[ $function = module_load* ]]; then
> + # One module is being loaded, save the task pid for tracking.
> + module_name=${function#*: }
> + # Remove the trailing after whitespace, there may be the module flags.
> + module_name=${module_name%% *}
> + module_names+=" $module_name"
> + current_module[$pid]="$module_name"
> + [[ ${module_loaded[$module_name]} ]] && warn "\"$module_name\" was loaded multiple times!"
> + unset module_loaded[$module_name]
> + nr_alloc_pages[$module_name]=0
> + fi
> +
> + if ! [[ ${current_module[$pid]} ]]; then
> + continue
> + fi
> +
> + if [[ $function = module_put* ]]; then
> + # Mark the module as loaded
> + module_loaded[${current_module[$pid]}]=1
> + # Module has been loaded when module_put is called, untrack the task
> + unset current_module[$pid]
> + continue
> + fi
> +
> + # Once we get here, the task is being tracked(is loading a module).
> + # Get the module name.
> + module_name=${current_module[$pid]}
> +
> + if [[ $function = mm_page_alloc* ]]; then
> + order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
> + nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
> + fi
> +done < $TMPFILE
> +
> +echo "== debug_mem for kernel modules during loading begin ==" >&2
> +for i in $module_names; do
> + status="load finished"
> + if ! [[ ${module_loaded[$i]} ]]; then
> + status="loading"
> + fi
> + echo "${nr_alloc_pages[$i]} pages consumed by \"$i\" [$status]" >&2
> +done
> +echo "== debug_mem for kernel modules during loading end ==" >&2
> +
> +unset module_names
> +unset module_loaded
> +rm $TMPFILE -f
> +return 0
> diff --git a/kexec-tools.spec b/kexec-tools.spec
> index 1597071..691ad7a 100644
> --- a/kexec-tools.spec
> +++ b/kexec-tools.spec
> @@ -41,6 +41,7 @@ Source103: dracut-kdump-error-handler.sh
> Source104: dracut-kdump-emergency.service
> Source105: dracut-kdump-error-handler.service
> Source106: dracut-kdump-capture.service
> +Source107: dracut-memdebug-ko.sh
>
> Requires(post): systemd-units
> Requires(preun): systemd-units
> @@ -224,6 +225,7 @@ cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
> cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
> cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}}
> cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
> +cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
> chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
> chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
>
~Pratyush