Hi folks!
We've had openQA testing of updates for stable and branched releases,
and gating based on those tests, enabled for a while now. I believe
this is going quite well, and I think we addressed the issues reported
when we first enabled gating - Bodhi's gating status updates work more
smoothly now, and openQA respects Bodhi's "re-run tests" button so
failed tests can be re-triggered.
A few weeks ago, I enabled testing of Rawhide updates in the openQA
lab/stg instance. This was to see how smoothly the tests run, how often
we run into unexpected failures or problems, and whether the hardware
resources we have are sufficient for the extra load.
So far this has been going more smoothly than I anticipated, if
anything. The workers seem to keep up with the test load, even though
one out of three worker systems for the stg instance is currently out
of commission (we're using it to investigate a bug). We do get
occasional failures which seem to be related to Rawhide kernel slowness
(e.g. operations timing out that don't usually time out), but
on the whole, the level of false failures is (I would say) acceptably
low, enough that my current regime of checking the test results daily
and restarting failed ones that don't seem to indicate a real bug
should be sufficient.
So, I'd like to propose that we enable Rawhide update testing on the
production openQA instance also. This would cause results to appear on
the Automated Tests tab in Bodhi, but they would be only informational
(and unless the update was gated by a CI test, or somehow otherwise
configured not to be pushed automatically, updates would continue to be
pushed 'stable' almost immediately on creation, regardless of the
openQA results).
More significantly, I'd also propose that we turn on gating on openQA
results for Rawhide updates. This would mean Rawhide updates would be
held from going 'stable' (and from being included in the next compose)
until the gating openQA tests had run and passed. We may want to do this a bit
after turning on the tests; perhaps Fedora 37 branch point would be a
natural time to do it.
Currently this would usually mean a wait from update submission to
'stable push' (which really means that the build goes into the
buildroot, and will go into the next Rawhide compose when it happens)
of somewhere between 45 minutes and a couple of hours. It would also
mean that if Rawhide updates for inter-dependent packages are not
correctly grouped, the dependent update(s) will fail testing and be
gated until the update they depend on has passed testing and been
pushed. The tests for the dependent update(s) would then need to be
re-run, either by someone hitting the button in Bodhi or an openQA admin
noticing and restarting them, before the dependent update(s) could be
pushed.
In the worst case, if updated packages A and B both need the other to
work correctly but the updates are submitted separately, both updates
may fail tests and be blocked. This could only be resolved by waiving
the failures, or replacing the separate updates with an update
containing both packages.
All of those considerations are already true for stable and branched
releases, but people are probably more used to grouping updates for
stable and branched than doing it for Rawhide, and the typical flow of
going from a build to an update provides more opportunity to create
grouped updates for branched/stable. For Rawhide, the easiest way to
group updates when you need to is to do the builds in a side tag and
use Bodhi's ability to create updates from a side tag.
As with branched/stable, only critical path updates would have the
tests run and be gated on the results. Non-critpath updates would be
unaffected. (There's a small allowlist of non-critpath packages for
which the tests are also run, but they are not currently gated on the
results).
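The policy in the paragraph above can be summarized as a small decision function. This is only a sketch of the rule as described here, not the real Bodhi or Greenwave configuration: the function name, the `Handling` enum and the example package sets are hypothetical, and it assumes an update counts as critical path if any package in it is critical path.

```python
from enum import Enum


class Handling(Enum):
    TESTED_AND_GATED = "tests run, update gated on results"
    TESTED_NOT_GATED = "tests run, results informational only"
    UNAFFECTED = "no openQA tests, no gating"


def handling_for(update_packages, critpath, allowlist):
    """Classify an update by the packages it contains.

    Assumption: one critical path package is enough to make the whole
    update critical path.
    """
    pkgs = set(update_packages)
    if pkgs & critpath:
        return Handling.TESTED_AND_GATED
    if pkgs & allowlist:
        return Handling.TESTED_NOT_GATED
    return Handling.UNAFFECTED


if __name__ == "__main__":
    # Illustrative sets only; the real lists are maintained elsewhere.
    critpath = {"kernel", "systemd"}
    allowlist = {"example-allowlisted-pkg"}
    for pkgs in (["kernel"], ["example-allowlisted-pkg"], ["some-leaf-pkg"]):
        print(pkgs, "->", handling_for(pkgs, critpath, allowlist).name)
```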
I think doing this could really help us keep Rawhide solid and avoid
introducing major compose-breaking bugs, at minimal cost. But it's a
significant change and I wanted to see what folks think. In particular,
if you find the existing gating of updates for stable/branched releases
to cause problems in any way, I'd love to hear about it.
Thanks folks!
--
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net
The current target release date is the early target date (2023-03-14).
Action summary
====================
Accepted blockers
-----------------
1. distribution — Workstation boot x86_64 image exceeds maximum size — ASSIGNED
ACTION: Workstation WG to reduce image size or increase the limit
2. kwin — kwin_wayland often crashed when used as the sddm Wayland
compositor and logging out of Plasma resulting in a black screen —
ON_QA
ACTION: QA to verify FEDORA-2023-81ff51758d
Proposed blockers
-----------------
1. gdb — Can't open file anon_inode:i915.gem which was expanded to
anon_inode:i915.gem during file-backed mapping note processing — NEW
ACTION: Maintainer to diagnose issue
2. pkgconf — Don't depend on system-rpm-config — ON_QA
ACTION: QA to verify FEDORA-2023-766817d642
Bug-by-bug detail
=================
Accepted blockers
-----------------
1. distribution — https://bugzilla.redhat.com/show_bug.cgi?id=2149246 — ASSIGNED
Workstation boot x86_64 image exceeds maximum size
Changes to linux-firmware briefly brought the Workstation image below
the limit. However, the Fedora-38-20230216.n.0 compose went above the
limit again.
2. kwin — https://bugzilla.redhat.com/show_bug.cgi?id=2168034 — ON_QA
kwin_wayland often crashed when used as the sddm Wayland compositor
and logging out of Plasma resulting in a black screen
Logging out of Plasma sometimes triggers a kwin crash.
FEDORA-2023-81ff51758d contains a candidate fix.
Proposed blockers
-----------------
1. gdb — https://bugzilla.redhat.com/show_bug.cgi?id=2172342 — NEW
Can't open file anon_inode:i915.gem which was expanded to
anon_inode:i915.gem during file-backed mapping note processing
Reporting a bug with gnome-abrt fails because no frame and no thread
are found in the backtrace that gdb produces.
2. pkgconf — https://bugzilla.redhat.com/show_bug.cgi?id=2172406 — ON_QA
Don't depend on system-rpm-config
pkgconf pulls in ~22 new packages as runtime dependencies, which don't
appear to be strictly necessary. FEDORA-2023-766817d642 contains a
candidate fix.
--
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis
I've spent several days working on this problem and I believe it's time to
raise it here, because it can possibly be a disaster for F38 desktops.
Please read these reports:
1. https://ask.fedoraproject.org/t/popular-third-party-rpms-fail-to-install-up…
2. https://bugzilla.redhat.com/show_bug.cgi?id=2170878
3. https://bugzilla.redhat.com/show_bug.cgi?id=2170839
I'm happy to provide further details or debug some additional use cases,
but overall this will need developers' attention at this point. Even though
it only concerns third-party software, and we don't block on third-party
software, it has such consequences for our OS and is so likely to happen
that I don't think we want to release F38 in this exact state.
Kamil
==============================================
#fedora-meeting-2: Workstation WG (2023-02-14)
==============================================
Meeting started by brainycmurf at 19:37:36 UTC. The full logs are
available at
https://meetbot.fedoraproject.org/fedora-meeting-2/2023-02-15/workstation.2…
.
Meeting summary
---------------
* Present members: Tom, Michael, Kalev, Owen, Chris, Allan, Matthias
(brainycmurf, 19:37:51)
* Guests: (brainycmurf, 19:37:51)
* Regrets: Jens, Neal (brainycmurf, 19:37:51)
* Secretary: Chris (brainycmurf, 19:37:52)
* Fedora 38 beta blockers (brainycmurf, 19:37:52)
* LINK:
https://qa.fedoraproject.org/blockerbugs/milestone/38/beta/buglist
(brainycmurf, 19:37:52)
* Fedora 38 final blockers (brainycmurf, 19:37:59)
* LINK:
https://qa.fedoraproject.org/blockerbugs/milestone/38/final/buglist
(brainycmurf, 19:38:01)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=2145219
(brainycmurf, 19:38:10)
* encryption of user data (excludes system) (brainycmurf, 19:38:14)
* LINK: https://pagure.io/fedora-workstation/issue/82 (brainycmurf,
19:38:16)
* encryption of system data (excludes user) (brainycmurf, 19:38:18)
* LINK: https://pagure.io/fedora-workstation/issue/136 (brainycmurf,
19:38:20)
* This is still a top goal but Workstation WG alone can't make this
happen, it needs broader support. (brainycmurf, 19:39:05)
* ACTION: Owen will summarize this discussion as an invitation to
explore it further. (brainycmurf, 19:39:10)
* Make the test day results more useful (brainycmurf, 19:39:16)
* LINK: https://pagure.io/fedora-workstation/issue/329 (brainycmurf,
19:39:18)
* Reconsider use of adwaita-qt (brainycmurf, 19:39:22)
* LINK: https://pagure.io/fedora-workstation/issue/351 (brainycmurf,
19:39:24)
* Announcements and status updates (brainycmurf, 19:39:28)
* Last week's meeting minutes posted (brainycmurf, 19:39:32)
* LINK:
https://meetbot.fedoraproject.org/fedora-meeting-2/2023-02-07/workstation.2…
(brainycmurf, 19:39:34)
* GNOME 44 UI freeze / beta release happened last Saturday
(brainycmurf, 19:39:39)
Meeting ended at 19:39:49 UTC.
Action Items
------------
* Owen will summarize this discussion as an invitation to explore it
further.
Action Items, by person
-----------------------
* **UNASSIGNED**
* Owen will summarize this discussion as an invitation to explore it
further.
People Present (lines said)
---------------------------
* brainycmurf (51)
* zodbot (7)
* Michael (0)
Generated by `MeetBot`_ 0.4
.. _`MeetBot`: https://fedoraproject.org/wiki/Zodbot#Meeting_Functions