We're running Fedora 19 with the 3.11.10-200.fc19.x86_64 kernel (just
the normal RPM) on large servers (128GB RAM over two NUMA nodes,
each with one hex-core processor) with a large number of processes
(more than 700, a couple hundred of which are active fairly
frequently). We encounter situations where the system gets
overwhelmed with the kernel's migrate/N threads, based on what we've
seen in top.

Here's what we've tried:
* tuned-adm with the latency-performance and virtual-host profiles
* kernel.sched_migration_cost_ns=5000000 (which tuned v3.3/Fedora 20
  sets for those profiles)
* numad
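For reference, kernel.sched_migration_cost_ns is the time after a task last ran during which the scheduler treats it as cache-hot and is less willing to migrate it; the kernel default is 500000 ns (0.5 ms), so 5000000 is a 10x increase. A minimal Python sketch of checking and setting it through /proc/sys follows. It assumes the knob lives at /proc/sys/kernel/sched_migration_cost_ns, as on 3.x kernels (newer kernels moved it to debugfs), and the helper names (sysctl_path, read_sysctl, write_sysctl) are hypothetical.

```python
from pathlib import Path

# Hypothetical helpers; assumes the 3.x-era location under /proc/sys.
PROC_SYS = Path("/proc/sys")
MIGRATION_COST = "kernel.sched_migration_cost_ns"
DEFAULT_NS = 500_000    # kernel default: 0.5 ms
TUNED_NS = 5_000_000    # value tried above: 5 ms


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file, e.g. kernel.x -> /proc/sys/kernel/x."""
    return PROC_SYS / name.replace(".", "/")


def read_sysctl(name: str) -> int:
    """Read an integer-valued sysctl."""
    return int(sysctl_path(name).read_text().strip())


def write_sysctl(name: str, value: int) -> None:
    """Set a sysctl at runtime (needs root; lost on reboot, like `sysctl -w`)."""
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    path = sysctl_path(MIGRATION_COST)
    if path.exists():
        current = read_sysctl(MIGRATION_COST)
        print(f"{MIGRATION_COST} = {current} ns ({current / DEFAULT_NS:g}x default)")
    else:
        print(f"{path} not found (newer kernels expose it under debugfs instead)")
```

Running write_sysctl(MIGRATION_COST, TUNED_NS) as root applies the same runtime change as the sysctl setting in the list; making it persistent is what /etc/sysctl.d or a tuned profile is for.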

Here's what we've used for analysis:
* powertop
* top/htop
* perf record -a -g
* SystemTap, with a script that prints migrations as they occur
* numatop
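As a lighter-weight complement to the SystemTap approach, per-process migration counts can be tallied from /proc. This Python sketch assumes a kernel built with CONFIG_SCHED_DEBUG, so that /proc/<pid>/sched exists with a "comm (pid, #threads: N)" header and an se.nr_migrations line; the function names are hypothetical. Its limits: counts are cumulative since each process started, cover only the thread-group leader (other threads are under /proc/<pid>/task/<tid>/sched), and vanish when a process exits, so short-lived sh and Ruby children spawned by Chef will be undercounted compared with tracing.

```python
import re
from collections import Counter
from pathlib import Path

# Header line of /proc/<pid>/sched, e.g. "chef-solo (4242, #threads: 2)".
HEADER_RE = re.compile(r"^(?P<comm>.+) \((?P<pid>\d+), #threads: \d+\)")
# Cumulative migration count for the task, e.g. "se.nr_migrations   :   37".
MIGRATIONS_RE = re.compile(r"^se\.nr_migrations\s*:\s*(\d+)", re.MULTILINE)


def parse_sched(text: str):
    """Return (comm, nr_migrations) from one /proc/<pid>/sched dump, or None."""
    header = HEADER_RE.match(text)
    migrations = MIGRATIONS_RE.search(text)
    if not header or not migrations:
        return None
    return header.group("comm"), int(migrations.group(1))


def migrations_by_command(proc: Path = Path("/proc")) -> Counter:
    """Sum se.nr_migrations per command name across all visible processes."""
    totals = Counter()
    for sched_file in proc.glob("[0-9]*/sched"):
        try:
            parsed = parse_sched(sched_file.read_text(errors="replace"))
        except OSError:  # process exited mid-scan or file not readable
            continue
        if parsed:
            comm, count = parsed
            totals[comm] += count
    return totals


if __name__ == "__main__":
    for comm, count in migrations_by_command().most_common(10):
        print(f"{count:>10}  {comm}")
```

Taking two snapshots a few seconds apart during a Chef run and subtracting the Counters would show which long-lived commands are migrating right now rather than over their whole lifetime.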

All we know is that the migration storms correlate with concurrent
Chef runs verifying/configuring containers on the system.
Chef obviously invokes many other programs, but most of the migrations
we see are for Chef, the Ruby interpreter, the sh interpreter, and Munin.
Responsiveness returns to normal after SystemTap reports a large burst
of chef-solo migrations, presumably as those Chef runs complete.
David Strauss
Pantheon Systems
Fedora Server Working Group