On 7/7/22 16:13, Christian Hergert wrote:
Sysprof has modular data collection backends, and not everything
requires linking against libunwind.
For those not familiar with Sysprof, or profiling the desktop at large, generally a
single program is not the problem. The performance problems often exist across a number of
processes. That can be anything from a library used by multiple applications which
cumulatively waste resources, IPC across programs, thundering herds when files on disk
change, GPU usage, CPU frequency scaling, memory bandwidth, RAPL, etc.
So Sysprof has a binary logging format that is straightforward, efficient, and allows us
to record many different types of information within a single file. That file format is
used by a number of tools in the stack from GLib, Pango, Gtk, Mutter, GNOME Shell, GJS,
various libraries, and applications on top of it. It can capture counters, stack traces,
file contents, marks, logs, and a multitude of other data frames.
These capture files can also be muxed together at any point.
Some of the modular data collectors require libunwind, many do not. For example, the
memprof collector records the backtraces from malloc/free/etc. But the GJS data-collector
can use SpiderMonkey's internal APIs to get backtraces from a SIGPROF sigaction. The
most used collector, however, is the perf collector which is just reading from a perf fd
mmap'd into a ring buffer.
The perf collector doesn't record the whole stack because decoding a 30 second
system-wide capture with DWARF/etc is so slow that practically nobody
would use it.
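
For contrast, here is a minimal sketch of the cheap alternative this thread is about: following the chain of saved frame pointers instead of decoding unwind tables. It is a simulation, not Sysprof or kernel code. It assumes an x86-64-style layout where the word at the frame pointer holds the caller's frame pointer and the next word holds the return address. The name walk_frame_pointers and all addresses are hypothetical.

```python
WORD = 8  # bytes per stack slot on a 64-bit target (assumption)


def walk_frame_pointers(memory, fp, max_depth=64):
    """Collect return addresses by following saved frame pointers.

    memory maps an address to the 64-bit word stored there (simulated).
    fp is the frame pointer of the innermost frame.
    """
    addresses = []
    while fp != 0 and len(addresses) < max_depth:
        saved_fp = memory.get(fp)                # caller's frame pointer
        return_address = memory.get(fp + WORD)   # where this frame returns to
        if saved_fp is None or return_address is None:
            break  # unreadable memory: stop rather than guess
        addresses.append(return_address)
        # The stack grows down, so caller frames sit at higher addresses.
        # A chain that does not move upward is corrupt; stop walking.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return addresses


# Three simulated frames; a saved frame pointer of 0 ends the chain.
memory = {
    0x7000: 0x7040, 0x7008: 0x401136,  # innermost frame
    0x7040: 0x7080, 0x7048: 0x401200,
    0x7080: 0x0,    0x7088: 0x401300,  # outermost frame
}
stack = walk_frame_pointers(memory, 0x7000)
print([hex(address) for address in stack])
```

Each step costs two memory reads and needs no debug information, which is why this kind of walk can be done at sample time. It only works if every function on the stack actually maintains the frame pointer.
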
The best profiler is the one people will use.
We have an in-tree parser for ELF that allows us to avoid a lot of extraneous code when
extracting symbols. Partially because libunwind is incredibly slow (by profiler
requirements), and partially because historically we never had to stash stack frames for
contextual unwinding.
Could we write a new data collection module that does DWARF unwinding and stashes some
8 KB of stack? Sure. Would people use it? Probably not, because again, it's so slow
that people will start profiling by intuition again, which is probably the worst of all
options.
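
To give a rough sense of scale, the arithmetic below estimates how much raw stack data such a module would have to copy, and later unwind, for one capture. Only the 8 KiB per sample and the 30 second duration come from the text above. The sampling rate and CPU count are assumptions picked for illustration, and the estimate treats every CPU as busy on every sample, so it is an upper bound rather than a measurement.

```python
STACK_BYTES = 8 * 1024  # stack copied per sample (from the text above)
SECONDS = 30            # capture length (from the text above)
SAMPLE_HZ = 1000        # assumption: samples per second per CPU
CPUS = 8                # assumption: number of CPUs being sampled


def stashed_stack_bytes(stack_bytes, hz, cpus, seconds):
    """Upper bound on copied stack data, assuming every sample fires."""
    return stack_bytes * hz * cpus * seconds


total = stashed_stack_bytes(STACK_BYTES, SAMPLE_HZ, CPUS, SECONDS)
print(f"{total} bytes, about {total / 2**30:.2f} GiB")
```

Under these assumptions the capture carries close to 2 GiB of stack contents before any DWARF decoding has started.
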
Of course stashing the stack is not a good option. I just don’t
think frame pointers are a good solution either. The correct solution
(albeit the most difficult one) is to find a way to perform efficient
profiling without frame pointers. I do not have the resources to
write such a solution, but I am almost certain that Meta does.
Can we write an eBPF kernel module to decode symbols there? Maybe? Can
I? Probably not.
Somebody else could, though. And it would not make the people who do
not do system-wide profiling pay the price that frame pointers exact.
Windows can do profiling without having to use frame pointers.
There is no reason that Linux cannot as well.
Personally, I think some libraries should not be compiled with
-fno-omit-frame-pointer. However, I think that number is much smaller than the opposite.
Encryption, graphics drivers, etc. all seem like good candidates here to be explicit about
performance requirements.
Many encryption libraries will generally not have a frame pointer
because much of the actual encryption code is hand-written assembler.
glibc string functions do not maintain a frame pointer either.
--
Sincerely,
Demi Marie Obenour (she/her/hers)