Fedora 32 system-wide change proposal: reduce installation media size
by improving the compression ratio of SquashFS filesystem
by Bohdan Khomutskyi
Summary
Improve compression ratio of SquashFS filesystem on the installation media.
Owner
Name: Bohdan Khomutskyi
Email: bkhomuts(a)redhat.com
Current status
Targeted release: I propose this change for Fedora 32
Last updated: Jan 5 2020
Pagure.io issue: https://pagure.io/releng/issue/9127
I was unable to create an article in the Fedora wiki.
Detailed Description
As of Fedora 31, the LiveOS/squashfs.img file on the installation image is
compressed with the default mksquashfs settings. The standard configuration
uses the XZ algorithm with a block size of 128k and the BCJ filter enabled.
Those parameters can be adjusted to achieve a better compression
ratio and/or lower CPU usage at build time.
This is simple to achieve. Lorax recently gained support[1] for
adjusting the mksquashfs compression options via its configuration
file. The file should be altered as follows:
[compression]
bcj = yes
args = -b 1M -Xdict-size 1M -no-recovery
Here -b 1M and -Xdict-size 1M set the block and dictionary sizes respectively.
Both values can be adjusted.
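As a rough illustration, the configuration fragment above could be generated with different block and dictionary sizes. This is only a sketch: the helper name and its parameters are hypothetical, and it merely reproduces the section and keys shown above rather than anything from Lorax itself.

```python
import configparser
import io


def build_lorax_compression_conf(block_size="1M", dict_size="1M"):
    """Render the [compression] section shown above as INI text.

    The helper name and parameters are illustrative assumptions; only the
    section name, keys and mksquashfs arguments come from the proposal.
    """
    conf = configparser.ConfigParser()
    conf["compression"] = {
        "bcj": "yes",
        "args": f"-b {block_size} -Xdict-size {dict_size} -no-recovery",
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(build_lorax_compression_conf())
```

Calling the helper with other sizes, for example "512K", gives a quick way to produce variants for comparing image size against build time.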
Benefit to Fedora
- Reduction of the installation media size and the cost of storing and
  distributing Fedora.
- Reduction of the CPU usage at build time, depending on which compression
  parameters are chosen.
- See the graphs at https://pagure.io/releng/issue/9127.
Scope
- Proposal owners:
  The build environment should support adjusting the Lorax
  configuration file. Lorax is the program that produces the
  LiveOS/squashfs.img file on the installation media.
  One way to allow such customization is to add a feature to
  Pungi that passes the -c option to Lorax.
- Release engineering: #9127 <https://pagure.io/releng/issue/9127>
- Policies and guidelines: N/A
- Trademark approval: N/A
Upgrade/compatibility impact
- This change comes at the cost of higher memory usage during
  installation. Based on my estimates, this should not be an
  issue, since decompression should require up to 1 MiB per thread.
User Experience
- Increasing the block size in the current configuration, which uses an EXT4
  file system, should increase latency when accessing the EXT4 filesystem. The
  exact impact is to be evaluated.
- The latency impact will be reduced if the plain SquashFS option is
  chosen.
Dependencies
- N/A
Contingency Plan
- N/A
Documentation
https://pagure.io/releng/issue/9127.
mksquashfs(1)
Release notes
See also
https://pagure.io/releng/issue/8646
--
Bohdan Khomutskyi
Release Configuration Management engineer
Red Hat
Fedora 33 System-Wide Change proposal: Fedora-Retired-Packages
by Ben Cotton
https://fedoraproject.org/wiki/Changes/Fedora-Retired-Packages
== Summary ==
All retired packages are obsoleted by `fedora-retired-packages`.
== Owner ==
* Name: [[User:msuchy| Miroslav Suchý]]
* Email: msuchy(a)redhat.com
== Detailed Description ==
Right now `fedora-obsolete-packages` obsoletes packages which cause an
issue during an upgrade. We do nothing about all other retired
packages. Now imagine the following story (it has already happened many
times):
We have package "foo". It is a leaf package. No one requires it. It
uses just basic libraries.
A user installs it during F32 lifetime.
Around F35 the upstream dies. Around F37 the Fedora maintainer retires the
package (or orphans it, and it later becomes retired).
Because the package is a leaf package, it causes no pain during the
upgrade F37->F38. Not even during the upgrade to F39, F40, F41, F42. And
then during the upgrade to F43 it suddenly causes a problem. But because
it is .fc37, everyone will hesitate to add it to
fedora-obsolete-packages.fc43.
Additionally, during F38-F43, users may expect that their system is
fully updated and has no security issues. But that is not true
of package "foo", which no one maintains. And users are not aware
of that because they do not follow the fedora-devel mailing list.
Obviously.
What I propose is: As part of the retirement process we add the following to
fedora-retired-packages:
Obsoletes: foo < %{latestversion+1}
And during upgrade from F37->F38 the package will be removed.
If the user wants to preserve the package (e.g., because it moved to
Copr), they simply uninstall fedora-retired-packages and exclude it from
installation. But that will be an informed decision.
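To make the proposed Obsoletes entry concrete, here is a minimal sketch of how such a line might be generated at retirement time. It assumes one possible reading of `%{latestversion+1}`: keep the last built version and bump the leading integer of the release by one. The function name is hypothetical and epochs are ignored.

```python
import re


def obsoletes_line(name, version, release):
    """Build an Obsoletes entry that covers every build up to version-release.

    Assumption (not from the proposal): "%{latestversion+1}" is read as
    "last version, with the leading integer of the release bumped by one".
    """
    match = re.match(r"(\d+)", release)
    if match is None:
        raise ValueError(f"release {release!r} does not start with an integer")
    bumped = int(match.group(1)) + 1
    return f"Obsoletes: {name} < {version}-{bumped}"


# Last build of the retired package was foo-1.2-3.fc37.
print(obsoletes_line("foo", "1.2", "3.fc37"))
```

Bumping only the release keeps the entry as narrow as possible, so a package that later returns with a newer version is not obsoleted.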
The benefits are:
* we do not leave unmaintained packages on a user's machine.
* We make sure that archaic packages do not break upgrade between two
versions of Fedora.
== Feedback ==
After [https://bugzilla.redhat.com/show_bug.cgi?id=1816532#c5
discussion with fedora-obsolete-packages maintainer] I filed this
Change proposal to include a wider audience.
See relevant [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
thread on devel mailing list].
== Benefit to Fedora ==
* We do not leave unmaintained packages on a user's machine.
* We make sure that archaic packages do not break upgrade between two
versions of Fedora.
== Scope ==
* Proposal owners:
Create package `fedora-retired-packages` as sub-package of
`fedora-obsolete-packages`
[https://bugzilla.redhat.com/show_bug.cgi?id=1816532 BZ#1816532]
Edit https://fedoraproject.org/wiki/How_to_remove_a_package_at_end_of_life#Obs...
guidelines with:
The retired package should be obsoleted by one of:
* fedora-obsolete-packages - if the package can cause a problem during
the upgrade to the next version of Fedora
* fedora-retired-packages - in all other cases
It is enough to open an issue on
https://src.fedoraproject.org/rpms/fedora-obsolete-packages
* Other developers:
No other work should be necessary.
* Release engineering:
This is optional. I may work with rel-eng to change
https://pagure.io/releng/blob/master/f/docs/source/sop_retire_orphaned_pa...
to automatically create a PR for `fedora-obsolete-packages`
* Policies and guidelines: As stated above
https://fedoraproject.org/wiki/How_to_remove_a_package_at_end_of_life#Obs...
will need an update.
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
During an upgrade, all retired packages will be automatically removed.
Users may opt out by excluding the package in dnf.conf:
<pre>
$ cat /etc/dnf/dnf.conf
[main]
...
exclude=fedora-retired-packages
</pre>
== How To Test ==
1. Upgrade to next version of Fedora.
2. Check all retired packages are removed.
== User Experience ==
- Packages that are no longer maintained are removed during a
distribution upgrade.
== Dependencies ==
This update has no dependencies on any other package.
== Contingency Plan ==
* Contingency mechanism: Drop `fedora-retired-packages`. Or remove
`Obsoletes` from this sub-package.
* Contingency deadline: Beta freeze
* Blocks release? No
== Documentation ==
TBD
== Release Notes ==
TBD
--
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis
[ELN] Opt out python2.7 from ELN
by Miro Hrončok
Hello,
as a maintainer of the python2.7 package I was surprised to see it being built
for ELN, and I would like to start a discussion on whether and how I can opt this
deprecated package out of ELN.
Since there is no tracker or dedicated mailing list, I am following the advice
given elsewhere on devel to use this list, possibly with the [ELN] marker
in the subject.
Thanks,
--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
Lua 5.4.0
by Tom Callaway
I just built lua 5.4.0 in Rawhide. As with previous major updates of lua,
the package also includes a copy of the lua 5.3 libraries so that rawhide
does not just become broken deps. If you depend on lua, please rebuild your
packages in rawhide and let me know if you run into any issues.
Thanks,
Tom
Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM
by Ben Cotton
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary ==
Install the earlyoom package and enable it by default. This will cause
the kernel oomkiller to trigger sooner, but will not affect which
process it chooses to kill off. The idea is to recover from out of
memory situations sooner, rather than the typical complete system hang
in which the user has no other choice but to force power off.
== Owner ==
* Name: [[User:chrismurphy| Chris Murphy]]
* Email: bugzilla(a)colorremedies.com
== Detailed Description ==
The Workstation working group has discussed "better interactivity in
low-memory situations" for some months. In certain use cases,
typically compiling, if all RAM and swap are completely consumed,
system responsiveness becomes so abysmal that a reasonable user can
consider the system "lost", and resorts to forcing a power off. This
is objectively a very bad UX. The broad discussion of this problem, and
some ideas for near term and long term solutions, is located here:
Recent long discussions on "Better interactivity in low-memory situations"<br>
https://pagure.io/fedora-workstation/issue/98<br>
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...<br>
Fedora editions and spins have the in-kernel OOM (out-of-memory)
manager enabled. The manager's concern is keeping the kernel itself
functioning. It has no concern about user space function or
interactivity. This proposed change attempts to improve the user
experience, in the short term, by triggering the in-kernel process
killing mechanism sooner. Instead of the system becoming completely
unresponsive for tens of minutes, hours or days, the expectation is that an
offending process (determined by oom_score, same as now) will be
killed off within seconds or a few minutes. This is an incremental
improvement in user experience, but admittedly still suboptimal. There
is additional work on-going to improve the user experience further.
Workstation working group discussion specific to enabling earlyoom by default
https://pagure.io/fedora-workstation/issue/119
Other in-progress solutions:<br>
https://gitlab.freedesktop.org/hadess/low-memory-monitor<br>
Background information on this complicated problem:<br>
https://www.kernel.org/doc/gorman/html/understand/understand016.html<br>
https://lwn.net/Articles/317814/<br>
== Benefit to Fedora ==
There are two major benefits to Fedora:
* improved user experience by more quickly regaining control over
one's system, rather than having to force power off in low-memory
situations where there's aggressive swapping. Once a system becomes
unresponsive, it's completely reasonable for the user to assume the
system is lost, but that includes high potential for data loss.
* reducing forced poweroff as the main work around will increase data
collection, improving understanding of low memory situations and how
to handle them better
== Scope ==
* Proposal owners:
a. Modify {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}}
to include earlyoom package for Workstation.<br>
b. Modify {{code|https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset}}
to include:
<pre>
# enable earlyoom by default on workstation
enable earlyoom.service
</pre>
* Other developers:
Restricted to Workstation edition, unless other editions/spins want to opt-in.
* Release engineering: [https://pagure.io/releng/issues #9141] (a
check of an impact with Release Engineering is needed) <!-- REQUIRED
FOR SYSTEM WIDE CHANGES -->
* Policies and guidelines: N/A
* Trademark approval: N/A
== Upgrade/compatibility impact ==
earlyoom.service will be enabled on upgrade. An upgraded system should
exhibit the same behaviors as a clean installed system.
== How To Test ==
* Fedora 30/31 users can test today, any edition or spin:<br>
{{code|sudo dnf install earlyoom}}<br>
{{code|sudo systemctl enable --now earlyoom}}
And then attempt to cause an out of memory situation. Examples:<br>
{{code|tail /dev/zero}}<br>
{{code|https://lkml.org/lkml/2019/8/4/15}}
* Fedora Workstation 32 (and Rawhide) users will see this service is
already enabled. It can be toggled with {{code|sudo systemctl
start/stop earlyoom}} where start means earlyoom is running, and stop
means earlyoom is not running.
== User Experience ==
The most egregious instances this change is trying to mitigate:
a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued
--> earlyoom disabled, the user often gives up and forces power off
(in my own testing this condition lasts >30 minutes with no kernel
triggered oom killer and no recovery)
--> earlyoom enabled, the system likely still becomes unresponsive but
oom killer is triggered in much less time (seconds or a few minutes,
in my testing, after less than 10% RAM and 10% swap is remaining)
earlyoom starts sending SIGTERM once both memory and swap are below
their respective PERCENT setting, default 10%. It sends SIGKILL once
both are below their respective KILL_PERCENT setting, default 5%.
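
The two thresholds described above can be sketched as a small decision function. This is an illustration of the stated rule only, not earlyoom's actual code; the function and parameter names are hypothetical, and the defaults mirror the 10% and 5% figures given above.

```python
def earlyoom_action(mem_avail_pct, swap_free_pct, term_pct=10.0, kill_pct=5.0):
    """Return the signal name the rule above would pick, or None.

    Both memory and swap must be below a threshold for it to apply.
    Names and structure are assumptions made for illustration.
    """
    if mem_avail_pct < kill_pct and swap_free_pct < kill_pct:
        return "SIGKILL"
    if mem_avail_pct < term_pct and swap_free_pct < term_pct:
        return "SIGTERM"
    return None


for mem, swap in [(40.0, 80.0), (8.0, 9.0), (4.0, 8.0), (4.0, 3.0), (2.0, 50.0)]:
    print(mem, swap, earlyoom_action(mem, swap))
```

Note the last case: with memory nearly exhausted but plenty of swap left, neither threshold pair is met, so nothing is sent.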
The package includes configuration file /etc/default/earlyoom which
sets option {{code|-r 60}} causing a memory report to be entered into
the journal every minute.
== Dependencies ==
earlyoom package has no dependencies
== Contingency Plan ==
* Contingency mechanism: Owner will revert all changes
* Contingency deadline: Final freeze
* Blocks release? No
* Blocks product? No
== Documentation ==
{{code|man earlyoom}}<br><br>
https://www.kernel.org/doc/gorman/html/understand/understand016.html
== Release Notes ==
Earlyoom service is enabled by default, which will cause kernel
oom-killer to trigger sooner. To revert to previous behavior:<br>
{{code|sudo systemctl disable earlyoom.service}}
And to customize see {{code|man earlyoom}}.
--
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis
Headsup: dbus 1.12.10-1.fc29 is missing systemd dbus.service file,
breaking almost everything
by Hans de Goede
Hi All,
Just a quick headsup for users following Fedora 29, the
dbus 1.12.10-1.fc29 build is missing the systemd dbus.service
file, breaking almost everything.
Instead it contains a dbus-daemon.service file, but the
dbus.socket file expects a matching dbus.service, not
dbus-daemon.service.
So either hold off on applying updates until this is fixed
or exclude dbus.
Regards,
Hans
Fedora 33 System-Wide Change proposal: CMake to do out-of-source builds
by Ben Cotton
https://fedoraproject.org/wiki/Changes/CMake_to_do_out-of-source_builds
== Summary ==
The <code>%cmake</code> macro will be adjusted (<code>-B</code> parameter)
to use a separate build folder (the already standardized
<code>%{_vpath_builddir}</code> macro). Additionally,
<code>%cmake_build</code>, <code>%cmake_install</code> and
<code>%ctest</code> macros will be created (and backported to the older
supported Fedora releases) to perform various operations that are
commonly used with CMake in a backend-agnostic (Makefiles, Ninja,
etc.) way.
Packages that will stop building are trivial to fix and will be
adjusted either by maintainers or change owners.
== Owner ==
* Name: [[User:ignatenkobrain|Igor Raits]], [[User:besser82|Björn
Esser]], [[User:ngompa|Neal Gompa]]
* Email: ignatenkobrain(a)fedoraproject.org, besser82(a)fedoraproject.org,
ngompa13(a)gmail.com
== Detailed Description ==
Historically, software builds had a singular build configuration and
required running the build within the project root. Nowadays, there
are many build modes and options that can be configured in projects,
different build settings (e.g. compiler flags) / types (release,
debug) that can be applied and different tools that can be used to
actually execute builds (compilers like gcc/clang, build job
schedulers like make/ninja, and so on). Thus, CMake upstream strongly
discourages in-source builds and recommends
out-of-source builds.
From <code>cmake.1</code>:
<pre>
To maintain a pristine source tree, perform an out-of-source build by
using a separate dedicated build tree. An in-source build in which the
build tree is placed in the same directory as the source tree is also
supported, but discouraged.
</pre>
The other part of the change is the creation of a set of macros
that can build, install and run tests in a
backend-agnostic, vpath-aware (out-of-source, in-source) way.
=== Migration ===
==== <code>%cmake</code> + <code>%(make|ninja)_(build|install)</code> ====
There are multiple paths to complete the migration:
* Add <code>-C "%{_vpath_builddir}"</code> to the <code>%(make|ninja)_*</code> invocations
* Replace <code>%(make|ninja)_build</code> and
<code>%(make|ninja)_install</code> with <code>%cmake_build</code> and
<code>%cmake_install</code> respectively
* Redefine vpath builddir <code>%global _vpath_builddir .</code> to
continue performing in-source builds (and optionally convert to the
<code>%cmake_*</code> macros)
Depending on the package, one of these options may be used to adapt to
this change.
==== <code>%cmake -B builddir</code> +
<code>%(make|ninja)_(build|install) -C builddir</code> ====
No changes are needed.
== Benefit to Fedora ==
* Follow CMake upstream recommendations when building packages
* Brings Fedora package builds more in-line with how upstream projects
expect them to be built
* Improve compatibility with other RPM distributions that already do this
* Support backend-agnostic way of building CMake projects
== Scope ==
* Proposal owners: Implement necessary macros, try to build packages
that <code>BuildRequires: cmake</code> in a side tag, analyze failures
and fix the relevant ones (introduced by this change).
* Other developers: While proposal owners will try to fix all affected
packages, there might be some cases where a package is already FTBFS so
the fix can't be performed. Other package maintainers will have to fix
the issue themselves after they fix the FTBFS.
* Release engineering: [https://pagure.io/releng/issue/9524 #9524]
* Policies and guidelines: CMake page will be adjusted to mention
newly created macros and the documentation about relevant VPATH macros
needs to be restructured a bit (they are already documented on the
Meson page, they need to be moved to the separate page and referenced
both from CMake and Meson page).
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Existing packages can (and most likely will) become FTBFS, but
proposal owners will fix as many Fedora packages as possible. However
fixing third-party packages is not possible and out of scope.
Third-party packagers will need to adapt based on the recommendations
noted in this Change.
== How To Test ==
# Grab the new cmake RPM from the Koji sidetag (TBC)
# Try to build a package that uses the <code>%cmake</code>,
<code>%cmake_build</code>, <code>%cmake_install</code> and
<code>%ctest</code> macros
== User Experience ==
The end-users (non-packagers) will not notice any changes.
== Dependencies ==
There are around 1100 RPMs in Fedora that depend on CMake at
build-time. All proposal owners are provenpackagers so they are able
to commit necessary fixes. No external dependencies.
== Contingency Plan ==
* Contingency mechanism: Proposal owners will adjust macros to not do
out-of-source builds by default, but will preserve the newly created macros
(essentially to bring them to the targeted state of older supported
Fedora releases).
* Contingency deadline: Beta freeze.
* Blocks release? No
== Documentation ==
The only place that needs to be adjusted is packaging guidelines.
--
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis
Fedora 33 System-Wide Change proposal: swap on zram
by Ben Cotton
https://fedoraproject.org/wiki/Changes/SwapOnZRAM
== Summary ==
Swap is useful, except when it's slow. zram is a RAM drive that uses
compression. Create a swap-on-zram device during start-up, and no longer use
swap partitions by default.
== Owner ==
* Name: [[User:chrismurphy| Chris Murphy]]
* Email: chrismurphy(a)fedoraproject.org
== Detailed Description ==
==== zram Basic function ====
The zram† device, typically <span style=color:brown>/dev/zram0</span>,
has a size set at create time during early boot, by zram-generator†
per its configuration file. The memory used is not preallocated. It's
dynamically allocated and deallocated, on demand. Due to compression,
a full <span style=color:brown>/dev/zram0</span> uses half as much
memory as its size.
The <span style=color:brown>/dev/zram0</span> behaves like any other
block device. It can be formatted with a file system, or mkswap, which
is the intention with this change proposal.
The system will use RAM normally up until it's full, and then start
paging out to swap-on-zram, same as a conventional swap-on-drive. The
zram driver starts to allocate memory at roughly 1/2 the rate of page
outs, due to compression. But there is no free lunch. This means
swap-on-zram is not as effective at page eviction as swap-on-drive:
the eviction rate is ~50% instead of 100%. But it is at least an order
of magnitude faster than drive-based swap.
zram has about 0.1% overhead or ~1MiB/1GiB. If the workload never
touches swap, this overhead is the sole cost. In practice, when not
used at all, the feature owner has experienced ~0.04% overhead.
Example: A system has 16 GiB RAM. The proposed defaults suggest the
<span style=color:brown>/dev/zram0</span> device will be 4 GiB. If the
workload completely fills up swap with 4 GiB of anonymous pages,
what's happened? The <span style=color:red>zramctl</span> command will
display the true compression ratio. If 2:1 is really obtained, it
means 4 GiB of swap data is compressed to 2 GiB. Therefore 2 GiB is the
actual RAM usage, and is also the net effective eviction. That is, 4 GiB of
anonymous pages are evicted, but are then compressed and pinned into 2
GiB RAM, for a net memory savings of 2 GiB.
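
The arithmetic in this example generalizes to other compression ratios. The following is a small sketch of that calculation; the function name is hypothetical, and the ratio has to be measured (for instance with zramctl), since 2:1 is only the figure assumed in the example.

```python
GIB = 1024 ** 3


def zram_net_savings(swapped_bytes, compression_ratio):
    """Return (ram_used, net_freed) for data paged out to swap-on-zram.

    ram_used is the memory the compressed pages still occupy;
    net_freed is what the eviction actually gave back.
    """
    ram_used = swapped_bytes / compression_ratio
    return ram_used, swapped_bytes - ram_used


# The example from the text: 4 GiB of anonymous pages at a 2:1 ratio.
ram_used, net_freed = zram_net_savings(4 * GIB, 2.0)
print(ram_used / GIB, net_freed / GIB)
```

At 2:1 this reproduces the 2 GiB used and 2 GiB saved from the example; a poorer ratio shrinks the savings accordingly.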
†</br >
[https://www.kernel.org/doc/Documentation/blockdev/zram.txt kernel.org zram.txt]
[https://github.com/systemd/zram-generator Github zram-generator project]
==== Overview of the Feature ====
Using swap is a good idea†, but no one likes it when it's slow.
Anaconda and Fedora IoT have been using swap-on-zram by default for
years. This builds on their prior effort.
There are three components to the change:
# Install systemd rust-zram-generator† package. This does not enable
swap-on-zram, it only makes the generator available.</br >
# Install a default zram-generator configuration. When present,
swap-on-zram is set-up during startup.</br >
# Do not create swap partition/LV with default installations.
This proposal aims to apply all three, for all Fedora editions and
spins, by default.
It further aims to apply the first two, for upgrades and custom installations.
It might be useful to only make the generator available (1), should an
edition/spin wish to opt out, or as a fallback if applying the feature
to upgrades fails to withstand scrutiny.
†</br >
There is a tl;dr section at the top. Highly recommend reading the
whole article. [https://chrisdown.name/2018/01/02/in-defence-of-swap.html
In defence of swap: common misconceptions]
==== Default zram device configuration: ====
During startup, create a zram device <span
style=color:brown>/dev/zram0</span>, with a size equal to 50% RAM, but
capped† to 4 GiB, and with a higher than typical swap priority†.
These values seem reasonably conservative, and are based on prior work
in Fedora. Anaconda sets swap-on-drive sized to 50% RAM in the no
hibernation case, common outside x86. Fedora IoT's implementation also
sets swap-on-zram size to 50% RAM.
†</br >
[https://github.com/systemd/zram-generator/issues/10 RFE: should be
able to set a cap on zram device size #10]
[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]
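
The proposed sizing rule (50% of RAM, capped at 4 GiB) can be written out as a short sketch. This illustrates the rule as stated here, not the zram-generator implementation; the function and parameter names are hypothetical.

```python
GIB = 1024 ** 3


def default_zram_size(ram_bytes, fraction=0.5, cap_bytes=4 * GIB):
    """Size of /dev/zram0 under the proposed defaults: fraction of RAM, capped."""
    return min(int(ram_bytes * fraction), cap_bytes)


for ram_gib in (2, 4, 8, 16, 64):
    size = default_zram_size(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM -> {size / GIB:g} GiB zram")
```

The cap takes effect from 8 GiB of RAM upward, which is why the 16 GiB example system ends up with a 4 GiB device.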
==== Default installer behavior ====
The installer is currently responsible for creating a swap-on-drive
device. This will be dropped. The zram-generator + configuration file
will trigger the setup and activation of swap-on-zram. This means
hibernation isn't possible, even on systems that could support it.
Please see [https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md
Supporting hibernation in Workstation edition] for much more detailed
information, including why it's increasingly likely hibernation isn't
possible anyway, and a path to improving hibernation support.
==== Custom/Advanced partitioning installer behavior ====
The user can add swap using Custom partitioning at install time. This
is swap-on-drive. And the installer will also include the <span
style=color:red> resume=UUID </span> kernel parameter for this swap
device. No change in behavior here.
Since swap-on-zram is still enabled by default, there will be two
swaps: swap-on-zram, and swap-on-drive. The swap-on-zram will have
higher priority, thus being favored over drive based swap. The kernel
is smart enough to know it can't hibernate to a zram device, and will
instead use drive based swap.
==== How can it be disabled? ====
Immediately:</br >
<span style=color:red>swapoff /dev/zram0</span>
Permanently:</br >
<span style=color:red>rm /etc/systemd/zram-generator.conf</span>
== Feedback ==
==== You're enabling it on upgrades? ====
That's the current plan. As a technical matter, feature owner is
confident this feature will improve the experience of all users
regardless of configuration. As a non-technical matter, it's
recognized that (a) ''hey pal, you're messing with my customizations,
not cool!'' and (b) ''swap always stinks, I don't care if it has a 'Z'
in the name!'' may need more convincing.
There are possible risks.
* Workloads that expect full use of memory, and depend on 100% page
eviction. These may run slower if they really need full use of memory,
but some memory is used for the zram device instead. Such workloads
might favor zswap.
* Workloads with poorly compressible pages. In the worst case, this means
unnecessary work merely moving pages around.
* Workloads with memory full, and hibernation. Hibernation is already
stressful to memory-management subsystem and prone to bailing out in
such cases. The swap-on-zram will be favored for evictions in the
attempt to free memory to create the hibernation image. It could
increase instances of hibernation entry failure. This isn't a crash,
it just means the attempt doesn't succeed, and the system resumes
operation instead of hibernating.
While possible, it's difficult to estimate their probability. But this
is a significant consideration in the conservative default zram size.
Users can easily increase zram size as needed for their use case,
simply by editing <span
style=color:red>/etc/systemd/zram-generator.conf</span> and the change
takes effect at next boot.
==== Why systemd zram-generator? ====
It's the most upstream implementation to date, is fast and
lightweight. The zram-generator uses existing systemd infrastructure
to setup the zram block device, format it as swap, and swapon - all
during early boot. It's very similar in behavior to fstab-generator,
gpt-auto-generator, and cryptsetup-generator†.
Converging on one implementation avoids user confusion. And while the
alternatives are nice and work fine, a systemd generator is
particularly well suited for this use case compared to a systemd
service unit.†
Also, it's a reference implementation of a systemd generator written in Rust.
†</br >
[https://www.freedesktop.org/software/systemd/man/systemd.generator.html
freedesktop.org About systemd generators.]</br >
[https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
devel@ ''Re: swap-on-zram by default'' Zbigniew Jędrzejewski-Szmek,
systemd zram-generator author/maintainer]
==== Why not a bigger zram device? ====
The main idea of being conservative is to address concerns about
upgrades. It's possible some workloads will have less compressible
data. Hence, not going with <span style=color:brown>/dev/zram0</span>
sized to 100% of RAM at this time. Even a <span
style=color:brown>/dev/zram0</span> of 200% RAM is not unreasonable
*if* the compression ratio is at least 2:1. However, it's possible a
system can get "stuck" in a kind of swap thrashing similar to
conventional swap-on-drive, except it's CPU and memory bound, rather
than IO bound. Feature owner thinks it's better to just oom, instead
of getting overly aggressive with the zram device size.
Conversely it's possible to be too conservative with the size, and
result in more instances of OOM kill. If applying the feature to
upgrades is rejected, it's probably reasonable to increase the cap to
~8GiB. Of course more feedback and testing is needed, and it will be
taken into consideration.
Note that the kernel zram doc says an excessively sized zram device
does come with overhead. Users can increase the size easily
post-install, a capability they don't easily have with swap-on-drive.
The goal for Fedora 33 is a default that's useful and safe for the
vast majority of use cases.
==== Why not zswap? ====
Zswap† is a similar idea, but with a totally different implementation.
It is swap specific, uses a RAM cache, and requires a conventional
swap partition existing already. It might be true certain workloads
are better suited for using zswap. But swap-on-zram depends only on
volatile storage. This is simpler and it's more secure. Whereas zswap
"spills over" into swap-on-drive and will leak user data if that swap
device isn't encrypted. Some workloads may do better with zswap, and
it's a valid future feature for a new generator, or possibly extend
zram-generator to support it via the configuration file. Maybe the
generator could favor zswap when swap-on-drive already exists; and
fallback to swap-on-zram?
†</br >
[https://www.kernel.org/doc/Documentation/vm/zswap.txt kernel.org zswap.txt]
== Benefit to Fedora ==
* significantly improves system responsiveness, especially when swap
is under pressure;
* more secure, user data leaks into swap are on volatile media;
* without swap-on-drive, there's better utilization of a limited
resource: benefit of swap without the drive space consumption;
* complements on-going resource control work, including earlyoom;
* further reduces the time to out-of-memory kill, when workloads exceed limits;
* improves performance for both "no swap" and "existing swap" setups;
== Scope ==
* Proposal owners:
** add zram-generator package to comps and kickstarts as appropriate
** obsolete zram package (used by Fedora IoT)
** means of per edition/spin configurations, if needed
** coordinate a test day
* Other developers:
** Anaconda developers are agreeable to deprecating their built-in implementation
in favor of swap-on-zram
** RFEs for zram-generator: users are not worse off if they don't
happen. Open request for help, to make it possible. It's much
appreciated.</br >
[https://github.com/systemd/zram-generator/issues/10 RFE: should be
able to set a cap on zram device size #10]</br >
[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]
* Release engineering: [https://pagure.io/releng/issues #9495]
* Policies and guidelines: N/A
* Trademark approval: N/A
== Upgrade/compatibility impact ==
Add Supplements:fedora-release-common to zram-generator to pull it in
on upgrades.
Existing systems without swap will have swap-on-zram enabled.
Existing systems with swap-on-drive, will also have swap-on-zram
enabled (two swap devices), with higher priority for the zram device.
Existing swap-on-drive will not be removed.
'zram' package contains zram-swap.service and associated bash scripts,
and is currently used by Fedora IoT and ARM spins. It will be
obsoleted to avoid conflicting/duplicative swap-on-zram
implementations.
== How To Test ==
Any hardware. Any version of Fedora.
# dnf install zram-generator
# cp /usr/share/doc/zram-generator/zram-generator.conf.example
/etc/systemd/zram-generator.conf
# Edit the configuration
# Reboot
# Check that swap is on a zram device: zramctl, swapon
# Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
# Check that priority is higher than existing swap if two or more are
listed.
## (Enhancement is needed for this.)
Suggested configuration file values:<br />
<span style=color:red>[zram0]</span><br />
<span style=color:red>memory-limit = none</span><br />
<span style=color:red>zram-fraction = 0.5</span><br />
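As a rough guide to what these values imply, the sketch below estimates the zram device size from total RAM. It rests on an assumption taken from the zram-generator documentation rather than from this proposal: that <code>zram-fraction</code> scales the device size relative to <code>MemTotal</code> in <code>/proc/meminfo</code>. The function name <code>expected_zram_kib</code> is hypothetical.

```shell
# Estimate the zram device size in KiB for a given zram-fraction.
# Assumption: device size is roughly MemTotal * zram-fraction.
# $1: zram-fraction (e.g. 0.5)
# $2: file in /proc/meminfo format (defaults to /proc/meminfo)
expected_zram_kib() {
    awk -v frac="$1" '
        $1 == "MemTotal:" { printf "%d\n", $2 * frac }
    ' "${2:-/proc/meminfo}"
}
```

For example, <code>expected_zram_kib 0.5</code> prints half of the machine's MemTotal in KiB, which can be compared against the disk size that <code>zramctl</code> reports after a reboot.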
Feel free to run your usual workloads more aggressively or in
parallel. Suspend-to-RAM and suspend-to-drive are expected to continue
to work too (or at least hit all the same bugs as without zram being
used).
Also, you can see the actual compression ratio achieved with the
following command:<br />
<span style=color:red> zramctl </span>
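The priority check from the steps above can be scripted. The following is a minimal sketch and not part of zram-generator: the function name <code>zram_has_top_priority</code> is hypothetical, and it assumes the usual five-column layout of <code>/proc/swaps</code> (Filename, Type, Size, Used, Priority).

```shell
# Succeed (exit status 0) only if the swap entry with the highest
# priority is a zram device; fail if there is no swap at all.
# $1: file in /proc/swaps format (defaults to /proc/swaps)
zram_has_top_priority() {
    awk '
        NR > 1 {
            # Skip the header line; remember the highest priority seen.
            if (!seen || $5 + 0 > best) { best = $5 + 0; name = $1; seen = 1 }
        }
        END { exit !(seen && name ~ /^\/dev\/zram[0-9]+$/) }
    ' "${1:-/proc/swaps}"
}
```

On a test system, <code>zram_has_top_priority && echo "zram swap is preferred"</code> gives a quick pass/fail answer; passing a saved copy of <code>/proc/swaps</code> as the first argument checks output collected from another machine.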
==== Test Day ====
[https://pagure.io/fedora-qa/issue/632 QA: SwapOnzram Test Day] to
discover edge cases, and tweak the default configuration if necessary
to establish a good one-size-fits-all approach.
== User Experience ==
The user won't notice anything displeasing. If their usual workload
causes them to dread swap thrashing, they'll be surprised that
thrashing doesn't happen. The user might get curious if they don't
find a swap entry in /etc/fstab. Or if they 'swapon' and see swap
pointing to <span style=color:brown>/dev/zram0</span> instead of a
drive partition or LV.
== Dependencies ==
N/A
== Contingency Plan ==
* Contingency mechanism: Don't ship the generator = big hammer, but
easy. Preferable to ship the generator, but only selectively ship
configuration files = scalpel, pretty easy.
* Contingency deadline: Beta freeze
* Blocks release? No.
* Blocks product? No.
== Documentation ==
Consider adding a hint in an /etc/fstab comment? There is no man page
for this, and the documentation is also minimal, besides what's in
this feature proposal. It's an open question how the user should get
more information on how to configure and tweak it. But then, they
don't have that for swap today either. There's just institutional
knowledge.
Hence, a strong test day, with a lot of people and press coverage of
the feature, might help spread the word for institutional knowledge
changes coming.
Ideas welcome.
== Release Notes ==
Pending feedback and test day.
--
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis
Fedora 33 System-Wide Change proposal: Make btrfs the default file
system for desktop variants
by Ben Cotton
https://fedoraproject.org/wiki/Changes/BtrfsByDefault
== Summary ==
For laptop and workstation installs of Fedora, we want to provide file
system features to users in a transparent fashion. We want to add new
features, while reducing the amount of expertise needed to deal with
situations like [https://pagure.io/fedora-workstation/issue/152
running out of disk space.] Btrfs is well adapted to this role by
design philosophy, so let's make it the default.
== Owners ==
* Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal
Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre
Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich
Eickmeyer]], [[User:ignatenkobrain|Igor Raits]],
[[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]],
[[User:rdieter|Rex Dieter]], [[User:grinnz|Dan Book]],
[[User:nonamedotc|Mukundan Ragavan]]
* Emails: chrismurphy(a)fedoraproject.org, ngompa13(a)gmail.com,
josef(a)toxicpanda.com, michel(a)michel-slm.name, dcavalca(a)fb.com,
erich(a)ericheickmeyer.com, ignatenkobrain(a)fedoraproject.org,
fedora(a)raveit.de, zsun(a)fedoraproject.org, rdieter(a)gmail.com,
grinnz(a)gmail.com, nonamedotc(a)gmail.com
* Products: All desktop editions, spins, and labs
* Responsible WGs: Workstation Working Group, KDE Special Interest Group
== Detailed Description ==
Fedora desktop edition/spin variants will switch to using Btrfs as the
filesystem by default for new installs. Labs derived from these
variants inherit this change, and other editions may opt into this
change.
The change is based on the installer's custom partitioning Btrfs
preset. It's been well tested for 7 years.
'''''Current partitioning'''''<br />
<span style="color: tomato">vg/root</span> LV mounted at <span
style="color: tomato">/</span> and a <span style="color:
tomato">vg/home</span> LV mounted at <span style="color:
tomato">/home</span>. These are separate file system volumes, with
separate free/used space.
'''''Proposed partitioning'''''<br />
<span style="color: tomato">root</span> subvolume mounted at <span
style="color: tomato">/</span> and <span style="color:
tomato">home</span> subvolume mounted at <span style="color:
tomato">/home</span>. Subvolumes don't have a fixed size; they act
mostly like directories, and space is shared.
'''''Unchanged'''''<br />
<span style="color: tomato">/boot</span> will be a small ext4 volume.
A separate boot is needed to boot dm-crypt sysroot installations; it's
less complicated to keep the layout the same, regardless of whether
sysroot is encrypted. There will be no automatic snapshots/rollbacks.
If you choose to encrypt your data, LUKS (dm-crypt) will still be used
as it is today (with the small difference that Btrfs is used instead
of LVM+Ext4). There is upstream work on native encryption for Btrfs
that will be considered once ready, and is the subject of a different
change proposal in a future Fedora release.
=== Optimizations (Optional) ===
The detailed description above is the proposal. It's intended to be a
minimalist and transparent switch. It's also the same as was
[[Features/F16BtrfsDefaultFs|proposed]] (and
[https://lwn.net/Articles/446925/ accepted]) for Fedora 16. The
following optimizations improve on the proposal, but are not critical.
They are also transparent to most users. The general idea is to agree
to the base proposal first, and then consider these as enhancements.
==== Boot on Btrfs ====
* Instead of a 1G ext4 boot, create a 1G Btrfs boot.
* Advantage: Makes it possible to include boot in a snapshot and
rollback regime. GRUB has had stable support for Btrfs for 10+ years.
* Scope: Contingent on bootloader and installer team review and
approval. blivet should use <code>mkfs.btrfs --mixed</code>.
==== Compression ====
* Enable transparent compression using zstd on select directories:
<span style="color: tomato">/usr</span> <span style="color:
tomato">/var/lib/flatpak</span> <span style="color:
tomato">~/.local/share/flatpak</span>
* Advantage: Saves space and significantly increases the lifespan of
flash-based media by reducing write amplification. It may improve
performance in some instances.
* Scope: Contingent on installer team review and approval to enhance
anaconda to perform the installation using <code>mount -o
compress=zstd</code>, then set the proper XATTR for each directory.
The XATTR can't be set until after the directories are created via:
rsync, rpm, or unsquashfs based installation.
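One way the per-directory step could look from the shell is sketched below. This is an illustration, not the installer's implementation: it uses <code>btrfs property set</code>, which records the compression setting on the directory, and the function name <code>set_zstd_compression</code> and the <code>DRY_RUN</code> variable are hypothetical.

```shell
# Request zstd compression for files created under each directory.
# The directories must already exist on a Btrfs file system.
# Set DRY_RUN=1 to print the commands instead of running them.
set_zstd_compression() {
    for dir in "$@"; do
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set.
        ${DRY_RUN:+echo} btrfs property set "$dir" compression zstd
    done
}
```

For example, <code>DRY_RUN=1 set_zstd_compression /usr /var/lib/flatpak</code> prints the two commands without touching the file system. Running it for real requires root for system directories.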
==== Additional subvolumes ====
* <span style="color: tomato">/var/log/</span> <span style="color:
tomato">/var/lib/libvirt/images</span> and <span style="color:
tomato">~/.local/share/gnome-boxes/images/</span> will use separate
subvolumes.
* Advantage: Makes it easier to exclude them from snapshots,
rollbacks, and send/receive. (Btrfs snapshotting is not recursive, it
stops at a nested subvolume.)
* Scope: Anaconda knows how to do this already; just change the
kickstart to add additional subvolumes (minus the subvolume in <span
style="color: tomato">~/</span>). GNOME Boxes will need enhancement to
detect that the user home is on Btrfs and create <span style="color:
tomato">~/.local/share/gnome-boxes/images/</span> as a subvolume.
== Feedback ==
==== Red Hat doesn't support Btrfs? Can Fedora do this? ====
Red Hat supports Fedora well, in many ways. But Fedora already works
closely with, and depends on, upstreams. And this will be one of them.
That's an important consideration for this proposal. The community has
a stake in ensuring it is supported. Red Hat will never support Btrfs
if Fedora rejects it. Fedora necessarily needs to be first, and make
the persuasive case that it solves more problems than alternatives.
Feature owners believe it does, hands down.
The Btrfs community has users that have been using it for most of the
past decade at scale. It's been the default on openSUSE (and SUSE
Linux Enterprise) since 2014, and Facebook has been using it for all
their OS and data volumes, in their data centers, for almost as long.
Btrfs is a mature, well-understood, and battle-tested file system,
used on both desktop/container and server/cloud use-cases. We do have
developers of the Btrfs filesystem maintaining and supporting the code
in Fedora, one is a Change owner, so issues that are pinned to Btrfs
can be addressed quickly.
==== What about device-mapper alternatives? ====
dm-thin (thin provisioning):
[https://pagure.io/fedora-workstation/issue/152 Issue #152] still
happens, because the installer won't overprovision by default. It
still requires manual intervention by the user to identify and resolve
the problem. Upon growing a file system on dm-thin, the pool is
overcommitted, and file system sizes become a fantasy: they don't add up
to the total physical storage available. The truth of used and free
space is only known by the thin pool, and CLI and GUI programs are
unprepared for this. Integration points like rpm free space checks or
GNOME disk-space warnings would have to be adapted as well.
dm-vdo: is not yet merged, and isn't as straightforward to selectively
enable per directory and per file, as is the case on Btrfs using
<code>chattr +c</code> on <span style="color:
tomato">/var/lib/flatpak/</span>.
Btrfs solves the problems that need solving, with few side effects or
pitfalls for users. It has more features we can take advantage of
immediately and transparently: compression, integrity, and IO
isolation. Many Btrfs features and optimizations can be opted into
selectively per directory or file, such as compression and nodatacow,
rather than as a layer that's either on or off.
==== What about UI/UX and integration in the desktop? ====
If Btrfs isn't the default file system, there's no commitment, nor
reason to work on any UI/UX integration. There are ideas to make
certain features discoverable: selective compression; systemd-homed
may take advantage of either Btrfs online resize, or near-term planned
native encryption, which could make it possible to live convert
non-encrypted homes to encrypted; and system snapshot and rollbacks.
Anaconda already has sophisticated Btrfs integration.
==== What Btrfs features are recommended and supported? ====
The primary goal of this feature is to be largely transparent to the
user. It does not require or expect users to learn new commands, or to
engage in peculiar maintenance rituals.
The full set of Btrfs features that is considered stable and enabled
by default upstream will be enabled in Fedora. Fedora is a community
project. What is supported within Fedora depends on what the community
decides to put forward in terms of resources.
See the upstream [https://btrfs.wiki.kernel.org/index.php/Status Btrfs
feature status page] for details.
==== Are subvolumes really mostly like directories? ====
Subvolumes behave like directories in terms of navigation in both the
GUI and CLI, e.g. <code>cp</code>, <code>mv</code>, <code>du</code>,
owner/permissions, and SELinux labels. They also share space, just
like a directory.
But it is an incomplete answer.
A subvolume is an independent file tree, with its own POSIX namespace,
and has its own pool of inodes. This means inode numbers repeat
themselves on a Btrfs volume. Inodes are only unique within a given
subvolume. A subvolume has its own st_dev, so if you use <code>stat
FILE</code> it reports a device value referring to the subvolume the
file is in. And it also means hard links can't be created between
subvolumes. From this perspective, subvolumes start looking more like
a separate file system. But subvolumes share most of the other trees,
so they're not truly independent file systems. They're also not block
devices.
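The st_dev behavior described above can be observed from the shell. The following is a minimal sketch assuming GNU coreutils <code>stat</code>; the function name <code>same_st_dev</code> is hypothetical.

```shell
# Succeed (exit status 0) if both paths report the same st_dev.
# Returns 1 if the device values differ, 2 if either path cannot be read.
same_st_dev() {
    dev_a=$(stat -c '%d' -- "$1") || return 2
    dev_b=$(stat -c '%d' -- "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

On an installation using the proposed layout, <code>same_st_dev / /home</code> would be expected to fail, because the two paths are on different subvolumes. For the same reason, <code>ln</code> cannot create a hard link from a file under <code>/</code> to one under <code>/home</code>.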
== Benefit to Fedora ==
Problems Btrfs helps solve:
* Users running out of free space on either <span style="color:
tomato">/</span> or <span style="color: tomato">/home</span>
[https://pagure.io/fedora-workstation/issue/152 Workstation issue
#152]
** "one big file system": no hard barriers like partitions or logical volumes
** transparent compression: significantly reduces write amplification,
improves lifespan of storage hardware
** reflinks and snapshots are more efficient for use cases like
containers (Podman supports both)
* Storage devices can be flaky, resulting in data corruption
** Everything is checksummed and verified on every read
** Corrupt data results in EIO (input/output error), instead of
resulting in application confusion, and isn't replicated into backups
and archives
* Poor desktop responsiveness when under pressure
[https://pagure.io/fedora-workstation/issue/154 Workstation issue
#154]
** Currently only Btrfs has proper IO isolation capability via cgroups2
** Completes the resource control picture: memory, cpu, IO isolation
* File system resize
** Online shrink and grow are fundamental to the design
* Complex storage setups are... complicated
** Simple and comprehensive command interface. One master command
** Simpler to boot, all code is in the kernel, no initramfs complexities
** Simple and efficient file system replication, including incremental
backups, with <code>btrfs send</code> and <code>btrfs receive</code>
== Scope ==
* Proposal owners:
** Submit PRs to Anaconda to set <code>default_scheme =
BTRFS</code> in the proper product files.
** Multiple test days: build community support network
** Aid with documentation
* Other developers:
** Anaconda, review PRs and merge
** Bootloader team, review PRs and merge
** Recommended optimization <code>chattr +C</code> set on the
containing directory for virt-manager and GNOME Boxes.
* Release engineering: [https://pagure.io/releng/issue/9545 #9545]
* Policies and guidelines: N/A
* Trademark approval: N/A
== Upgrade/compatibility impact ==
Change will not affect upgrades.
Documentation will be provided for existing Btrfs users to "retrofit"
their setups to that of a default Btrfs installation (base plus any
approved options).
== How To Test ==
'''''Today'''''<br />
Do a custom partitioning installation; change the scheme drop-down
menu to Btrfs; click the blue "automatically create partitions"; and
install.<br />
Fedora 31, 32, Rawhide, on x86_64 and ARM.
'''''Once change lands'''''<br />
It should be simple enough to test, just do a normal install.
== User Experience ==
==== Pros ====
* Mostly transparent
* Space savings from compression
* Longer lifespan of hardware, also from compression.
* Utilities for used and free space, CLI and GUI, are expected to
behave the same. No special commands are required.
* More detailed information can be revealed by <code>btrfs</code>
specific commands.
==== Enhancement opportunities ====
[https://bugzilla.redhat.com/show_bug.cgi?id=906591 updatedb does not
index /home when /home is a bind mount] This can also affect rpm-ostree
installations, including Silverblue.
[https://gitlab.gnome.org/GNOME/gnome-usage/-/issues/49 GNOME Usage:
Incorrect numbers when using multiple btrfs subvolumes] This isn't
Btrfs specific; it happens with "one big ext4" volume as well.
[https://gitlab.gnome.org/GNOME/gnome-boxes/-/issues/88 GNOME Boxes,
RFE: create qcow2 with 'nocow' option when on btrfs /home] This is
Btrfs specific, and is a recommended optimization for both GNOME Boxes
and virt-manager.
[https://github.com/containers/libpod/issues/6563 containers/libpod:
automatically use btrfs driver if on btrfs]
== Dependencies ==
None.
== Contingency Plan ==
* Contingency mechanism: Owner will revert changes back to LVM+ext4
* Contingency deadline: Beta freeze
* Blocks release? Yes
* Blocks product? Workstation and KDE
== Documentation ==
Strictly speaking no documentation is required reading for users. But
there will be some Fedora documentation to help get the ball rolling.
For those who want to know more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs wiki main
page and full feature list.]
<code>man 5 btrfs</code> contains: mount options, features, swapfile
support, checksum algorithms, and more<br />
<code>man btrfs</code> contains an overview of the btrfs subcommands<br />
<code>man btrfs <nowiki><subcommand></nowiki></code> will show the man
page for that subcommand
NOTE: The btrfs command will accept partial subcommands, as long as
it's not ambiguous. These are equivalent commands:<br />
<code>btrfs subvolume snapshot</code><br />
<code>btrfs sub snap</code><br />
<code>btrfs su sn</code>
You'll discover your own convention. It might be preferable to write
out the full command on forums and lists, but then maybe some folks
don't learn about this useful shortcut?
For those who want to know a lot more:
[https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation
Btrfs developer documentation]<br />
[https://github.com/btrfs/btrfs-dev-docs/blob/master/trees.txt Btrfs trees]
== Release Notes ==
The default file system on the desktop is Btrfs.
--
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis