A tale of systemd and MaxProcs
by Kevin Fenzi
Greetings.
I thought I would share this in hopes we can come up with ways to
handle this sort of thing better. I do not blame anyone and I'd like to
focus on positive steps we can take rather than pointing fingers.
Just before the Fedora 25 development cycle started, I went and
upgraded our koji builders from Fedora 23 to Fedora 24. I've done this
every cycle since we moved them to Fedora instances, and it has
usually gone pretty smoothly.
Not this time. ;) The kernel maintainers started getting builds that
failed and got stuck in odd ways. The maintainers of a few other large
packages (glibc) also hit this issue. The mock logs simply showed a bunch
of "Fork failed" messages and a bunch of processes stuck in D state.
At first we thought it might be a kernel issue, so we tried various
kernels on the builders without any luck. Then the kernel maintainers
added some patches to see if they could get debugging from the deadlock
it was hitting. That fixed the processes stuck in D, but builds still
failed.
I started looking around for anything else that might be related and
saw there was a systemd update that mentioned task limits, which in
turn pointed us to looking at systemd and seeing this (from systemd 228
NEWS):
" * There's a new system.conf setting DefaultTasksMax= to
control the default TasksMax= setting for services and
scopes running on the system. (TasksMax= is the primary
setting that exposes the "pids" cgroup controller on systemd
and was introduced in the previous systemd release.) The
setting now defaults to 512, which means services that are
not explicitly configured otherwise will only be able to
create 512 processes or threads at maximum, from this
version on. Note that this means that thread- or
process-heavy services might need to be reconfigured to set
TasksMax= to a higher value. It is sufficient to set
TasksMax= in these specific unit files to a higher value, or
even "infinity". Similar, there's now a logind.conf setting
UserTasksMax= that defaults to 4096 and limits the total
number of processes or tasks each user may own
concurrently. nspawn containers also have the TasksMax=
value set by default now, to 8192. Note that all of this
only has an effect if the "pids" cgroup controller is
enabled in the kernel. The general benefit of these changes
should be a more robust and safer system, that provides a
certain amount of per-service fork() bomb protection."
We had systemd 229, and this was the root cause of the issue.
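A limit like this can be inspected directly, because the "pids" cgroup
controller exposes the task count and the limit as the files pids.current
and pids.max in each cgroup directory. What follows is only a minimal
sketch, not what was run on the builders: the function names, the 90%
threshold, and the example path in the usage note are all assumptions made
for illustration.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_pids_max(text: str) -> Optional[int]:
    """Return the task limit as an int, or None when the file says "max" (unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_task_usage(cgroup_dir: Path) -> Tuple[int, Optional[int]]:
    """Read (current task count, task limit) from one cgroup directory."""
    current = int((cgroup_dir / "pids.current").read_text().strip())
    limit = parse_pids_max((cgroup_dir / "pids.max").read_text())
    return current, limit


def near_limit(current: int, limit: Optional[int], threshold: float = 0.9) -> bool:
    """True when the cgroup uses at least `threshold` of its task limit.

    The 0.9 default is an arbitrary choice for this sketch.
    """
    if limit is None:
        return False
    return current >= limit * threshold
```

For example, read_task_usage(Path("/sys/fs/cgroup/pids/system.slice/kojid.service"))
would return the two numbers for that unit, assuming the pids controller is
mounted there and the unit has that name (both are guesses, so adjust the
path for the system at hand).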
In systemd 231 they changed this to:
" * In similar fashion TasksMax= takes percentage values now,
too. The value is taken relative to the configured maximum number of
processes on the system. The per-service task maximum has been changed
to 15% using this functionality. (Effectively this is an increase of
512 → 4915 for service units, given the kernel's default pid_max
setting.)"
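The 4915 figure in that note can be reproduced with a little arithmetic:
the kernel's default pid_max is 32768, and 15% of that is 4915.2. The
sketch below rounds down, which matches the quoted number, but the rounding
rule and the function name are assumptions here and not taken from the
systemd source.

```python
DEFAULT_PID_MAX = 32768  # kernel default for /proc/sys/kernel/pid_max


def tasks_max_from_percent(percent: int, pid_max: int = DEFAULT_PID_MAX) -> int:
    """Task limit for a percentage of pid_max, rounded down (assumed rounding)."""
    return pid_max * percent // 100


print(tasks_max_from_percent(15))  # 4915
```

If pid_max has been raised on a system, the same percentage yields a
proportionally larger limit, which is the point of the percentage form.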
So, how can we do better here?
* IMHO the initial upstream default didn't make sense for Fedora
* The feature doesn't seem to work right (it caused processes to go into
D state and deadlock). This is likely an RCU/kernel bug (or bugs), but
systemd shouldn't enable this unless it's robust, or it should have
some kind of check for a recent enough kernel with fixes.
* There isn't any logging showing that this was hit. Perhaps there's a
technical limitation with cgroups? This would be very handy in
tracking down things like this.
* This might have been a nice thing for release notes? I try to keep
up, but I don't read every systemd NEWS file as it comes out, or most
packages' NEWS files. Can we make adding release notes any easier?
* Perhaps after beta but before final we ping maintainers of
"important" packages asking what big changes have happened? Or
someone just goes thru the release notes for them all and proposes a
list of them?
* Your brilliant idea here.
kevin
7 years, 6 months
Cloud and Server Q&A
by Chris Murphy
Hi,
I was asked to start this in today's Server meeting. The genesis for
me was, I have more questions than answers and I'm fairly convinced
I'm not the only person who's kinda shrugging not knowing what all the
questions even are. Answers are important too, but good questions to
properly explore scope and liabilities have to come first.
Cloud WG folks had decided a while ago to focus on Atomic Host, and it
sounds like they now only want to do that, and form a new Atomic WG.
[1][2]
I see 8 base images for Cloud that aren't rpm-ostree based. Are they
in need of a new home? Who's using them? Are they all needed? Does it
make sense for Server WG to produce the non-Atomic Cloud deliverable
images?
At the last Cloud meeting, it was floated whether some Cloud people
should move over to Server, or vice versa. Should there be an
Atomic/Container WG? i.e. a fourth product deliverable?
Being contrary, I wondered about consolidation as a solution rather
than adding another WG and product. [3] Does anyone see Cloud WG, or
Server WG as spread too thinly? What estimate do you have for overlap
in work between Cloud and Server? Is there an economy of scale by
combining them? And is it both useful and practical to have subgroups
within a WG, to split out the sub variants of Server: hardware, cloud,
atomic host?
Server and Workstation WGs have expressed interest in moving to
rpm-ostree based deployments also. So I'm confused by what an Atomic
WG would produce that's unique. There are huge differences between
conventional and rpm-ostree deployments. Does it make sense for an
Atomic WG to have no outputs, and instead act as a liaison with Server
and Workstation WGs, QA, Docs, releng, and others, to help in the
transition to this new way of delivering and maintaining Fedora?
It might be that the Cloud and Server PRD refreshes help sort some of
this stuff out too.
OK I have more questions but this is long enough. I'm certain others
can ask better questions, or versions of these ones, and in particular
the questions I haven't asked.
[1]
https://meetbot.fedoraproject.org/fedora-meeting-1/2016-09-21/fedora_clou...
[2]
https://fedorahosted.org/cloud/ticket/170
[3]
(Combining) Cloud Atomic Server WGs
https://lists.fedoraproject.org/archives/list/cloud@lists.fedoraproject.o...
--
Chris Murphy
7 years, 6 months
USB writing changes: wiki instructions, tests
by Adam Williamson
Hey folks! Wanted to send a heads-up that I've made some fairly
significant changes to the wiki regarding writing Fedora to USB.
I've done (yet another) revision of the main wiki instructions for
writing USB sticks:
https://fedoraproject.org/wiki/How_to_create_and_use_Live_USB
Since the Fedora Media Writer tool - the rewrite (of the rewrite?) of
LiveUSB Creator - now looks to be working well in testing, and is
available on all our target platforms, the page now heavily promotes it
as the best tool to use in almost all cases. Some other methods have
been entirely removed - FMW should be a better choice for Windows and
macOS than the tools that were listed before.
I retained sections for:
* livecd-iso-to-disk, as it's now the only tool that supports non-
destructive write and data persistence
* gnome-disk-utility, for non-Fedora *nix without Flatpak support
* dd, for non-Fedora *nix without Flatpak or GNOME and people who just
like dd
* unetbootin, because the section basically exists to explicitly state
that we don't support it (maybe we should add a similar section for
Rufus...)
Please do tell me about any problems you note, or if you think I
removed something wrongly, or anything. Note that as of right now the
best version of mediawriter for F23 and F24 is in updates-testing, so I
had the instructions to install it include `--enablerepo=updates-testing`;
I'm expecting the updates will go stable soon and we can
remove that.
For validation testing folks, I have also revised the USB validation
test cases. I really kinda hated the way the Installation matrix was
set up with test names that didn't match the test case page names for
no good reason, and also didn't think we really needed all the
different test cases, so I've consolidated them into three
consistently-named test cases with more result columns:
https://fedoraproject.org/wiki/QA:Testcase_USB_dd
https://fedoraproject.org/wiki/QA:Testcase_USB_fmw
https://fedoraproject.org/wiki/QA:Testcase_USB_litd
The effect is pretty much the same as before - except that we now
include the FMW test instead of the LiveUSB Creator test - we test all
three methods for both BIOS and UEFI and for both live and DVD images,
but it's just arranged a little differently (and, I hope, better). I
combined the separate 'live' and 'dvd' versions of the dd and litd test
cases (which were barely different at all) into single test cases, and
consolidated common wording between all the test cases using templates.
I've put this change live on the current Installation validation page -
https://fedoraproject.org/wiki/Test_Results:Fedora_25_Branched_20160930.n...
again, please let me know of any problems you see with this change :)
Thanks everyone!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
7 years, 6 months
Retiring python-slimit in Rawhide
by Stephen Gallagher
Upstream is dead (website is gone), it hasn't seen a release in over three years
and nothing in Fedora relies on it anymore.
7 years, 6 months
F24, small backward steps
by Roger Wells
Just a couple of smallish things after upgrading (via dnf) from F23 to
F24 a couple of months ago:
1. deja-dup gui:
one has to deselect then reselect the Overview option in order
to be offered the "Backup Now" option.
The details option in the progress dialog will only display two
or three lines, is not resizeable, and does not follow resizing the
entire dialog
The progress dialog does not wait to be dismissed at the end,
causing any messages about problems (like failure to backup a particular
file) to not be seen
2. fingerprint identification:
The laptop has a fingerprint reader and it works fine. However
I prefer not to use it. The user setup specifies that fingerprint login
is disabled.
However whenever I am asked for a password the fingerprint
reader blinks until I swipe a finger over it (even after using a
password).
No fingerprint is registered.
This is different than F23 where it never blinked.
3. Scrolling issues:
This, edge and natural scrolling via the touchpad, was covered
nicely in a previous thread.
Solutions offered there work well but should be better
integrated as I am sure they will be.
Desktop is: gnome-desktop3-3.20.2-1.fc24.x86_64
laptop is Thinkpad X240 (Intel graphics)
Not to be a pita, just trying to help
I really like Fedora & the Gnome desktop
--
Roger Wells, P.E.
leidos
221 Third St
Newport, RI 02840
401-847-4210 (voice)
401-849-1585 (fax)
roger.k.wells(a)leidos.com
7 years, 6 months
python-xlib license change to LGPLv2+
by Orion Poplawski
python-xlib-0.17-1.fc26 changed its license from GPLv2+ to LGPLv2+.
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion(a)nwra.com
Boulder, CO 80301 http://www.nwra.com
7 years, 6 months
Re: pytest 3.0 in rawhide
by Adam Williamson
On Fri, 2016-09-30 at 23:22 +0200, Thomas Moschny wrote:
> Hi,
>
> this is a heads-up about the pytest update to version 3.0.3 that just
> hit rawhide.
>
> A number of incompatible changes were made in 3.0.0 compared to 2.9.2.
> See http://doc.pytest.org/en/latest/changelog.html for the full list of
> changes and new features.
>
> If you got this email directly, then your package (SRPM) depends on
> pytest. Please check whether it builds and works with the new pytest
> release. This holds especially for the pytest plugins, some of which
> definitely need to be updated to support pytest 3.0.
>
> Here's the list of packages that (according to dnf repoquery)
> build-depend on pytest:
>
> copr-frontend
> copr-keygen
> freeipa
> python-astropy
> python-coveralls
> python-django-pytest
> python-docopt
> python-gabbi
> python-lib389
> python-pytest-cache
> python-pytest-capturelog
> python-pytest-cov
> python-pytest-mock
> python-pytest-multihost
> python-pytest-pep8
> python-pytest-runner
> python-pytest-sourceorder
> python-pytest-spec
> python-pytest-testmon
> python-pytest-timeout
> python-pytest-watch
> python-pytest-xdist
> python3-pytest-asyncio
I don't think you made this query wide enough. It doesn't list fedfind,
for instance, which has:
BuildRequires: pytest
%if 0%{?with_python3}
BuildRequires: python3-pytest
%endif # if with_python3
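One way to widen the search is to scan spec files for every name the
dependency can appear under, not just one. The sketch below is only an
illustration of that idea, not the query Thomas ran: the set of aliases
(beyond "pytest" and "python3-pytest", which appear in the excerpt above)
and the function name are assumptions, and it deliberately ignores %if
conditionals, so it reports a dependency even when the conditional would
disable it.

```python
import re
from typing import List

# "pytest" and "python3-pytest" come from the fedfind excerpt; the other
# two names are assumed aliases added for illustration.
PYTEST_NAMES = {"pytest", "python-pytest", "python2-pytest", "python3-pytest"}

BUILDREQUIRES_RE = re.compile(r"^\s*BuildRequires:\s*(.+)$", re.IGNORECASE)


def pytest_build_requires(spec_text: str) -> List[str]:
    """Return the pytest-like names found on BuildRequires lines, in order."""
    found: List[str] = []
    for line in spec_text.splitlines():
        match = BUILDREQUIRES_RE.match(line)
        if not match:
            continue
        # One line may list several dependencies, separated by commas or
        # whitespace, possibly with version constraints such as ">= 2.8".
        for token in re.split(r"[,\s]+", match.group(1).strip()):
            if token in PYTEST_NAMES and token not in found:
                found.append(token)
    return found


SPEC_EXCERPT = """\
BuildRequires:  pytest
%if 0%{?with_python3}
BuildRequires:  python3-pytest
%endif # if with_python3
"""

print(pytest_build_requires(SPEC_EXCERPT))  # ['pytest', 'python3-pytest']
```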
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
7 years, 6 months
Fw: pytest 3.0 in rawhide
by Kevin Fenzi
Forwarding this on from announce for any interested parties.
Begin forwarded message:
Date: Fri, 30 Sep 2016 23:22:07 +0200
From: Thomas Moschny <thomas.moschny(a)gmx.de>
To: devel-announce(a)lists.fedoraproject.org
Cc: python-devel(a)lists.fedoraproject.org
Subject: pytest 3.0 in rawhide
Hi,
this is a heads-up about the pytest update to version 3.0.3 that just
hit rawhide.
A number of incompatible changes were made in 3.0.0 compared to 2.9.2.
See http://doc.pytest.org/en/latest/changelog.html for the full list of
changes and new features.
If you got this email directly, then your package (SRPM) depends on
pytest. Please check whether it builds and works with the new pytest
release. This holds especially for the pytest plugins, some of which
definitely need to be updated to support pytest 3.0.
Here's the list of packages that (according to dnf repoquery)
build-depend on pytest:
copr-frontend
copr-keygen
freeipa
python-astropy
python-coveralls
python-django-pytest
python-docopt
python-gabbi
python-lib389
python-pytest-cache
python-pytest-capturelog
python-pytest-cov
python-pytest-mock
python-pytest-multihost
python-pytest-pep8
python-pytest-runner
python-pytest-sourceorder
python-pytest-spec
python-pytest-testmon
python-pytest-timeout
python-pytest-watch
python-pytest-xdist
python3-pytest-asyncio
Thanks,
Thomas
7 years, 6 months
Re: Cloud and Server Q&A
by Josh Boyer
On Fri, Sep 30, 2016 at 4:41 PM, Josh Berkus <jberkus(a)redhat.com> wrote:
> On 09/30/2016 01:11 PM, Adam Miller wrote:
>> On Fri, Sep 30, 2016 at 8:35 AM, Matthew Miller
>> <mattdm(a)fedoraproject.org> wrote:
>>> On Thu, Sep 29, 2016 at 04:16:15PM -0700, Adam Williamson wrote:
>>>>> think QA clearly understands what cloud image(s) are release blocking,
>>>>> as previously they were just the non-atomic images.
>>>> Which images are prominent on the download pages and how much of a
>>>> relationship there is between that and 'release blocking' status is
>>>> *also* not my problem, but I'd agree with you (Chris) that it'd be
>>>> rather strange for the most prominently advertised deliverable for a
>>>> given product not to be a release-blocking one.
>>>
>>> I don't think that Atomic *needs* to be release blocking, because if it
>>> misses the grand unified release, we have the ability to update it at
>>> the next cycle, so it's less of a big deal. But if we collectively
>>> prefer to make sure everything is lined up on the release day... I can
>>> see arguments for that, too.
>
> Well, currently I'm working with the designers on a new page for Atomic
> F25. So if that's NOT going to be live the day of the F25 release, then
> it's something we need to know ahead of time.
>
> I also really don't like the message Atomic not being ready sends. We
> will have three branches for GetFedora: Workstation, Server, and Atomic.
> If Atomic isn't ready the day of the release, it looks pretty bad;
> that's saying we're ok with only being 2/3 ready, or that despite
> promoting Atomic to 1st class status we don't really believe it's important.
So... there is a pretty big disparity between what you just said and
what FESCo has been told in the past two meetings. Jan has been
trying to get release blocking deliverables for the Cloud WG (now
Atomic?) confirmed for a while [1]. Two weeks ago, Kushal confirmed
the existing base images are release blocking and Atomic is not. That
was repeated in today's meeting[2] as well:
16:44:56 <kushal> Cloud base image is the only blocking deliverable.
16:44:59 <kushal> Atomic is not.
I realize this WG is in the middle of rebooting itself, but to have
clearly conflicting information from the WG members is a bit
concerning.
josh
[1] https://fedorahosted.org/fesco/ticket/1626
[2] https://meetbot.fedoraproject.org/fedora-meeting/2016-09-30/fesco.2016-09...
7 years, 6 months