On Mon, Dec 21, 2020 at 12:49 PM Colin Walters <walters(a)verbum.org> wrote:
On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
>
> == Summary ==
>
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.
A bunch of points here:
- No, it's the default for one Edition. Others don't default to it. And even
for Workstation we can't *require* it because it's definitely supported to use
other filesystems and storage layouts.
- Orthogonal to this, I'd also note that xfs supports reflinks too.
Combining those I'd say instead e.g.: "Most Fedora Editions default to a
filesystem that supports reflinks, e.g. btrfs or xfs" (actually I think IoT defaults
to ext4 for...probably they didn't consider it?)
It'd be more accurate to say most Fedora variants default to Btrfs.
The only exceptions right now are Cloud, Server, and CoreOS. But yes,
Fedora Server's current default of XFS on LVM means it also supports
reflinks.
As an aside, I *really* hate this split of terminology we have among
Editions, Spins, and Labs. It's confusing to everyone. :(
- When talking about RPMs we need to think about container images,
which use overlayfs by default, which defers to the underlying filesystem for reflinks -
so should be fine, but should be explicitly written down (and tested)
- Generally incompatible RPM payload changes cause pain proportional to how far
they're "not backported", e.g. if support for this isn't in Fedora N-1
(e.g. Fedora 32) it will be harder for the current Koji/mock model. Nowadays many more people
use podman than mock, which e.g. if using a RHEL8 host will naturally avoid the dependency
on an updated RPM. But
Incomplete statement here?
That said, we don't have a problem in the Koji/Mock model anymore, as
bootstrap mode is now activated. Additionally, Mock uses
systemd-nspawn by default in all cases except with Koji (which
overrides this because it can't handle nspawn mode at the moment).
> # Decompression happens inline with download.
rpm-ostree does this by default today BTW (rpms are unpacked into local ostree commits in
parallel even).
> ## Regular RPMs use a compressed .cpio based payload. In contrast,
> extent based RPMs contain uncompressed data aligned to the fundamental
> page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> required for <code>FICLONERANGE</code> to work. Only files are
> represented in the payload, other directory entries like symlinks,
> device nodes etc are constructed entirely from rpm header information.
This is the core change; some interesting tradeoffs here. Python projects in particular
ship a lot of files smaller than 4k (classic example is `__init__.py` which is zero
sized). And ppc64le is 64KiB pages right? So there will be "zero space" to
align, right? Would need some math to see how much this would add up to, although I guess
the implementation could instead use holes?
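The "some math" could start from something like the following rough sketch. It assumes every non-empty file is padded up to the next page boundary and that zero-sized files take no payload space at all; neither assumption is confirmed by the proposal. The file sizes are made up for illustration.

```python
def padded_size(size, page):
    """Round size up to the next multiple of page (0 stays 0)."""
    return -(-size // page) * page


def padding_overhead(sizes, page):
    """Total bytes of alignment padding for a list of file sizes."""
    return sum(padded_size(s, page) - s for s in sizes)


# Hypothetical file sizes in bytes, including an empty __init__.py.
sizes = [0, 120, 4096, 5000, 70000]

for page in (4096, 65536):
    print(page, padding_overhead(sizes, page))
# 4096 10896
# 65536 248464
```

Running the same function over the real file sizes of a Python-heavy package set would show whether 64KiB alignment is a practical problem or whether holes make it moot.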
> Files are referenced by their digest, so identical files are
> de-duplicated.
But just inside a single RPM, right? It's interesting to compare with ostree which
does this by default; conceptually this is using reflinks inside a single RPM to do what
ostree does system wide with hardlinks.
BTW we learned a few things, notably zero sized files are tricky because there can be a
*lot* of them - see e.g.
https://github.com/ostreedev/ostree/pull/2197
That one was too many hardlinks, but how well do filesystems like btrfs/xfs handle
thousands of reflinks instead? The Python __init__.py thing is such a pathological
case...
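To make the per-package de-duplication concrete, here is a minimal sketch of indexing files by content digest. The function name, the use of SHA-256, and the in-memory dict layout are all assumptions for illustration; the actual payload format is not described at this level in the thread.

```python
import hashlib


def dedupe_by_digest(files):
    """Map each path to a content digest and store each distinct blob once.

    files: dict of path -> bytes (hypothetical in-memory package contents).
    Returns (blobs, index), where blobs maps digest -> bytes and
    index maps path -> digest.
    """
    blobs = {}
    index = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)  # keep only the first copy
        index[path] = digest
    return blobs, index


# Two empty __init__.py files collapse into a single stored blob.
package = {
    "pkg/__init__.py": b"",
    "pkg/sub/__init__.py": b"",
    "pkg/main.py": b"print('hello')\n",
}
blobs, index = dedupe_by_digest(package)
print(len(package), "paths,", len(blobs), "blobs")
```

Every path that shares a digest would then become one more reference to the same extent, which is exactly where the "thousands of reflinks to one empty file" question comes from.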
> # Disk space requirements are expected to be marginally higher than
> before: all new packages or updates will consume their installed size
> before installation instead of about half their size (regular rpms
> with payloads still cost space).
This won't matter much for small updates but could be quite noticeable for larger
system upgrades.
This all said, the more I think about this, wouldn't it be way simpler to change rpm
to support a "temporary root directory", e.g. `/usr/.rpmtemp` or whatever? Then
dnf/zypper/etc can do the unpack-and-download model without any format changes to RPM -
instead of reflinking it'd just be rename() into place. This is effectively what
rpm-ostree is doing today except with ostree commits instead of a temporary directory.
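A rough sketch of that stage-then-rename idea, purely for illustration: the function names and the `.rpmtemp` staging directory are hypothetical, and real rpm would also have to handle ownership, modes, scriptlets and rollback.

```python
import os
import tempfile


def stage_files(root, files, staging_name=".rpmtemp"):
    """Unpack step: write each payload into a staging directory under root.

    files: dict of relative destination path -> bytes.
    Returns a dict of relative destination path -> staged temporary path.
    """
    staging = os.path.join(root, staging_name)
    os.makedirs(staging, exist_ok=True)
    staged = {}
    for rel_path, data in files.items():
        fd, tmp_path = tempfile.mkstemp(dir=staging)
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        staged[rel_path] = tmp_path
    return staged


def commit_files(root, staged):
    """Install step: rename every staged file into its final location."""
    for rel_path, tmp_path in staged.items():
        dest = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # A rename within one filesystem is atomic and copies no data.
        os.replace(tmp_path, dest)
```

Keeping the staging directory on the same filesystem as the destination is what makes the final step a cheap rename rather than a copy.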
Sure, this makes some degree of sense, but it doesn't reduce the IOPS
for actually *doing* the installation. My understanding is that this
Change is intended to reduce the thrashing when doing package
transactions.
This is also a flaw with RPM-OSTree, since you have to fetch
everything individually and construct the root by shifting hardlinks
or reflinks around.
--
真実はいつも一つ!/ Always, there's only one truth!