On Sat, Jun 6, 2020 at 4:18 AM David Kaufmann <astra(a)ionic.at> wrote:
> Hi!
>
> On Sat, Jun 06, 2020 at 01:15:35AM -0700, Samuel Sieb wrote:
> > 5GB of swap space that normally would be on disk is now taking less than 2G
> > of RAM. Instead of the usual 6G in the disk swap, now I have less than 2.
>
> What's bugging me about this thread is that quite a few people made
> comparisons resulting in "compressed zram is smaller than uncompressed
> disk swap". To me this seems quite obvious, and looks like flawed
> testing.
>
> Wouldn't it be more correct to also compress disk swap to compare sizes?
> This would probably make the size argument moot, as I'd expect them to
> be the same size. If they're not, that would be a useful data point.
>
> Also, I don't care about 10GB of stuff sitting in disk swap, whilst 1GB
> of RAM could be quite essential. This leaves overall performance as the
> main potential benefit of zram swap.
To me this sounds like too much dependency on swap. There is a
balance. You really do want some swap, because incidental and
occasional heavy swapping is merely inactive page eviction, and it's
good to get those pages out of the way because it frees memory for
real work. It also avoids reclaim, which is what must happen in the
noswap case. What people hate is slow swap. So the proposal is
basically: use some fast swap rather than a lot of slow swap. It's
mainly an optimization.

I expect special workloads will need to make adjustments, and in the
case of heavy persistent swapping it might be that zswap is better, or
it might be that nothing can help that workload except buying more RAM.
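As a sketch of what "some fast swap and not a lot of slow swap" can look like in practice: the kernel tries higher-priority swap devices first, so a high-priority zram device absorbs hot pages while a low-priority disk partition is only overflow. Device names here are hypothetical examples, and this requires root:

```shell
# Hypothetical setup (device names are examples; requires root).
modprobe zram
echo lz4 > /sys/block/zram0/comp_algorithm   # must be set before disksize
echo 4G  > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon --priority 100 /dev/zram0   # fast zram swap, tried first
swapon --priority 0 /dev/sda2      # hypothetical disk swap, overflow only
```

`swapon --show` and `zramctl` can then confirm which device is actually absorbing the swapped pages.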
> What almost no one wrote about is CPU usage, for which I saw two
> anecdotal references: "no change" and "doubled CPU usage".
For sure there is an impact on CPU. This exchanges IO-bound work for
CPU- and memory-bound work. But it's pretty lightweight compression.
And again, whatever counts as "too much" CPU hit for the workload
necessarily translates into either suffering the cost of IO-bound
swap-on-drive or paying for more memory. There is no free lunch. It
really isn't magic.
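To make the CPU-for-space trade concrete, here is a small sketch that compresses page-sized buffers and reports time and ratio. zram uses in-kernel lzo-rle or lz4; `zlib` here is only a stdlib stand-in to illustrate the trade, not a performance claim about zram:

```python
import time
import zlib

PAGE = 4096  # typical page size on x86-64

def compress_pages(n_pages: int, level: int = 1) -> tuple[float, float]:
    """Compress n_pages page-sized buffers; return (seconds, ratio)."""
    # Semi-repetitive content, roughly standing in for typical anonymous
    # pages; truly random pages would barely compress at all.
    data = (b"some fairly repetitive page content " * 200)[:PAGE]
    start = time.perf_counter()
    total_out = 0
    for _ in range(n_pages):
        total_out += len(zlib.compress(data, level))
    elapsed = time.perf_counter() - start
    return elapsed, (n_pages * PAGE) / total_out

elapsed, ratio = compress_pages(1000)
print(f"1000 pages in {elapsed:.3f}s, ratio {ratio:.1f}:1")
```

The point is only that each swapped page costs some CPU and buys some space; whether that beats waiting on a disk is exactly the workload-dependent question being discussed.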
One thing this might alter the calculus of is swappiness. Because the
zram device is so much faster, paging out and back in carries a lower
penalty than the reads caused by file reclaim. So instead of folks
thinking swappiness should be 1 (or even 0), it's now the opposite: it
probably should be 100.
See the swappiness section in the Chris Down article referenced in the proposal:
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
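A minimal sketch of checking and raising swappiness; the `sysctl.d` filename is just an example:

```shell
sysctl vm.swappiness                 # show the current value (default 60)
sysctl -w vm.swappiness=100          # takes effect immediately (root)
# persist across reboots (filename is an example):
echo 'vm.swappiness = 100' > /etc/sysctl.d/99-swappiness.conf
```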
> It would be interesting to see a comparison of swap, compressed swap and
> zram swap performance while having processes competing for swap.
>
> E.g. a system with 8 GB RAM:
>
> * 2 processes with 3.5 GB memory each
>   This would leave about 1 GB for the system itself, so with disk swap
>   there should not be much swapping, and with zram swap I'd expect
>   little swapping too.
> * 2 processes with 4 GB memory each
>   This would force light disk swap usage in the first two cases; I'm not
>   really sure what it would do to zram swap.
> * 2 processes with 4.5 GB memory each
>   This would definitely lead to constant swapping, but it should still
>   fit in zram swap: with a ratio of 2:1 it would theoretically fit in a
>   2 GB zram device, and realistically in a 4 GB zram device.
I'm not sure. Maybe. But this is the plus of having it available and
folks testing it. It is still as rudimentary as swap. It might be
useful to make it a bit smarter; look into some of the cgroupsv2
resource control work for hints on how to make the zram device bigger
or smaller. Some workloads might favor a zram device sized to 100% of
RAM, without difficulty.
> For the programs I'd think of something allocating memory and
> writing/reading random parts of it, running for N iterations.
> Interesting points would be: average CPU usage and wallclock time for
> each of the processes.
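A worker along those lines could be sketched as below (a hypothetical test harness, not an existing tool). One caveat worth noting for any zram comparison: filling pages with purely random bytes makes them nearly incompressible, which would unfairly handicap zram, so a realistic test should mix in compressible content:

```python
import random
import time

def stress(size_bytes: int, iterations: int) -> float:
    """Allocate size_bytes, then write and read back random 4 KiB
    chunks for the given number of iterations; return wallclock seconds.
    Note: random bytes are incompressible, so a zram benchmark should
    use partly compressible data instead."""
    buf = bytearray(size_bytes)
    chunk = 4096
    start = time.perf_counter()
    for _ in range(iterations):
        off = random.randrange(0, size_bytes - chunk)
        buf[off:off + chunk] = random.randbytes(chunk)  # write
        _ = bytes(buf[off:off + chunk])                 # read back
    return time.perf_counter() - start

# Quick self-check with a small buffer; the proposed test would use
# sizes near or above free RAM (e.g. 3.5-4.5 GiB per process).
print(f"demo: {stress(8 * 2**20, 1000):.3f}s")
```

Two such processes run side by side, with `/usr/bin/time -v` capturing CPU and wallclock figures, would roughly match the proposed scenarios.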
> I've left out the obvious cases of "not enough memory usage to hit swap
> at all" and "so much memory usage that it results in OOM". The OOM
> killer should not be invoked in daily usage anyway; I see that as a
> sign that something has gone seriously wrong, comparable with "program
> segfaults on start".
There are worse things than OOM: being stuck with a totally
unresponsive system and no OOM kill on the way. Hence earlyoom, and
the ongoing resource control work with cgroupsv2 isolation.
--
Chris Murphy