On Mon, 06.01.20 11:22, Michael Catanzaro (mcatanzaro(a)gnome.org) wrote:
So I talked to Tejun Heo about this (kernel cgroups maintainer,
working for facebook with the people who did the PSI stuff, kernel mm
guy). Here's the gist:
- earlyoom might be OK as a short-term stopgap if people really want to
hurry something, as long as it watches only swap depletion (which it
pretty much does already). But it should then also decide what to
kill based on swap use and little else (which it apparently does
not). Without swap, though, it makes no sense to have it at all.
- Don't bother with the OOM score the kernel calculates for processes,
it doesn't take the swap use into account. That said, do take the
configurable OOM score *adjustment* into account, so that processes
which set it are respected, e.g. journald, udevd, and such. (Or in
other words, ignore /proc/$PID/oom_score, but respect
/proc/$PID/oom_score_adj.)
- Going down to 100ms poll intervals is a bad idea; 1s is sufficient,
and a longer interval may be fine too.
- facebook is working on making oomd something that just works for
everyone, they are in the final rounds of canonicalizing the
configuration so that it can just work for all workloads without
tuning. The last bits for this to be deployable are currently being
done on the kernel side ("iocost"), when that's in, they'll submit
oomd (or simplified parts of it) to systemd, so that it's just there
and works. It's their express intention to make this something
that also works for desktop stuff and requires no further
tuning. They will also do the necessary systemd work. Time frame:
half a year, maybe one year, but no guarantees.
- oomd currently still polls some parameters at regular intervals, too.
They are working on getting rid of that as well, so that
everything is event based via PSI. Given their own focus on servers
it's not a primary goal, but still a goal.
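
To make the victim-selection advice in the list above concrete, here is a
minimal Python sketch. It is not how earlyoom or oomd actually work. It
ranks processes by the VmSwap field of /proc/$PID/status, never reads
/proc/$PID/oom_score, and respects /proc/$PID/oom_score_adj. How exactly
the adjustment should be weighted was not specified; scaling it by 1/1000
of total swap (loosely mirroring how the kernel scales it against total
memory) is purely an assumption, as are all the helper names.

```python
import os
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class Proc(NamedTuple):
    pid: int
    swap_kib: int        # VmSwap from /proc/$PID/status
    oom_score_adj: int   # from /proc/$PID/oom_score_adj, range -1000..1000


_VMSWAP = re.compile(r"^VmSwap:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value in KiB, or 0 if absent (e.g. kernel threads)."""
    m = _VMSWAP.search(status_text)
    return int(m.group(1)) if m else 0


def badness(p: Proc, swap_total_kib: int) -> Optional[int]:
    """Swap-based score; None means the process must never be picked."""
    if p.oom_score_adj <= -1000:
        # -1000 disables OOM killing for the process entirely.
        return None
    # ASSUMPTION: each adjustment point counts as 1/1000 of total swap.
    return p.swap_kib + p.oom_score_adj * swap_total_kib // 1000


def pick_victim(procs: Iterable[Proc], swap_total_kib: int) -> Optional[Proc]:
    """Return the eligible process with the highest swap-based score."""
    best: Optional[Proc] = None
    best_score: Optional[int] = None
    for p in procs:
        score = badness(p, swap_total_kib)
        if score is None:
            continue
        if best_score is None or score > best_score:
            best, best_score = p, score
    return best


def scan_procfs(proc_root: str = "/proc") -> Iterator[Proc]:
    """Yield one Proc per readable numeric entry under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(f"{proc_root}/{entry}/status") as f:
                status = f.read()
            with open(f"{proc_root}/{entry}/oom_score_adj") as f:
                adj = int(f.read())
        except (OSError, ValueError):
            continue  # process exited in the meantime or is not readable
        yield Proc(int(entry), parse_vmswap_kib(status), adj)
```

A monitor built on this would call pick_victim(scan_procfs(), swap_total_kib)
roughly once per second, and only once swap is close to depleted; on a
system without swap there is nothing for it to rank.
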
Or in other words: oomd is the way to go in the long run, developed
alongside the kernel features backing it. You can use it already if
you like, but there are still too many knobs for generic
deployment. earlyoom might be a valid temporary stopgap if you want to
hurry this.
(And now I hope I paraphrased everything he said more or less
correctly...)
If you want to know more about fb's oomd:
https://cfp.all-systems-go.io/ASG2019/talk/DQX3DH/
(but before this enters systemd it's gonna be dumbed down, i.e.
less configuration, more "just works")
Lennart
--
Lennart Poettering, Berlin