> My suggestion would be that we make sure 'blockerbugs' includes
> lists of each type of blocker. Ahead of and at Go/No-Go meetings,
> we would want to have a formal assurance from the person
> responsible for fixing the bug that the fix would be provided by a
> certain time - say, one day or two days ahead of the release date -
> and it would be QA's responsibility to ensure the updates are
> tested promptly, and releng's responsibility to ensure they are
> pushed on time after being tested. I would suggest the Program
> Manager ought to have overall responsibility for keeping an eye on
> the 0Day and Stable blocker lists and making sure the maintainer,
> QA, and releng all did their jobs on time.
The biggest issue is this, I think. We probably need to encode
"Special Blockers" into the Go/No-Go process. I don't think that
assurance that it will be fixed on time is necessarily good enough.
Particularly given the time that it takes stable updates to make it to
the mirrors, I'd say that we probably want to say that any such
special blockers have to be queued for stable before the Go/No-Go
decision is made. (This may in some cases mean *during* the Go/No-Go
meeting, of course.)
Well, here's our latest mess-up:
https://bodhi.fedoraproject.org/updates/FEDORA-2015-e00b75e39f
dnf-plugin-system-upgrade-0.7.0-1.fc22 had enough karma for stable on Oct 29, which was
Go/No-Go day. Therefore it was considered "resolved". However, it was pushed to
testing on Nov 2 (4 days later) and to stable on Nov 5 (7 days later!), which was the
public release day. Since mirrormanager is configured to serve even last-but-one metadata
(i.e. up to 1-2 days old; releng can provide a more precise value), many of our users
upgraded on Nov 5 and Nov 6 using an older version of system-upgrade which broke their
systems. Just read the comments:
https://fedoramagazine.org/upgrading-from-fedora-22-to-fedora-23/#comments
I was very unhappy. We solved most of the issues, it was a lot of work, and yet a large
group of people was hit by those old, long-resolved problems, just because of bad timing
and slow repo pushes (for whatever reason).
So, that update was "queued for stable before the Go/No-Go" as you proposed, and
yet we have failed to deliver it. So if we really want to avoid such problems in the
future, we either need to insist on "pushed to stable by Go/No-Go, no
exceptions", or we need to have another check on release day and verify that all
required builds were pushed to stable at least 2 days before -- if not, do not announce
the release and wait a few more days. The first approach is slightly impractical (we
don't want to wait another week when it might be resolved in 2 more days; and do we lift
final freeze or not?). The second approach is confusing for the media (they announce
we're Go, and then nothing happens on the proclaimed release day).
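
To make the second option (the release-day check) concrete, here is a minimal sketch. It
assumes somebody has already collected the stable push date of every required build by
hand; it does not talk to Bodhi, and the function name, the second build and its date are
made up purely for illustration. Only the dnf-plugin-system-upgrade dates come from the
case above.

```python
from datetime import date, timedelta


def late_builds(stable_push_dates, release_day, min_lead=timedelta(days=2)):
    """Return the builds that were NOT in stable at least `min_lead` before release_day.

    `stable_push_dates` maps a build name to the date it was pushed to stable,
    or to None if it has not been pushed to stable yet.
    """
    late = []
    for build, pushed in stable_push_dates.items():
        # An unpushed build, or one pushed too close to release day, fails the check.
        if pushed is None or release_day - pushed < min_lead:
            late.append(build)
    return sorted(late)


required = {
    # Dates as described in this thread.
    "dnf-plugin-system-upgrade-0.7.0-1.fc22": date(2015, 11, 5),
    # Hypothetical build that made it in early, for contrast.
    "example-fixed-early-1.0-1.fc22": date(2015, 10, 30),
}
release_day = date(2015, 11, 5)

blocking = late_builds(required, release_day)
if blocking:
    print("Hold the announcement, not in stable long enough:", blocking)
else:
    print("All required builds were in stable in time.")
```

With these inputs the check flags dnf-plugin-system-upgrade, because it reached stable on
release day itself rather than 2 days before.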
What I see as a potential solution here is decoupling tasks that need to wait for the 0day
blockers and those which don't. So, at the Go/No-Go meeting, we can decide that it is
No-Go in general, but composes are final now and can be uploaded to proper locations for
mirrors to pick them up. I don't know exactly what else releng needs to do, but I
guess there will be other tasks that can be done. And in 2-3 days, we can have Go/No-Go
again, where we decide that even the 0day blockers have been addressed and pushed to
stable, and we can pronounce the whole release Go, and publish the announcement
immediately or the next day or whatever's appropriate (bearing in mind that there should
be a 2-day period after the 0day blockers are pushed to stable).
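
As a rough illustration of the timing in this decoupled scheme, the earliest announcement
date is simply the last 0day blocker's stable push plus the 2-day period. The sketch below
assumes the push dates are entered by hand; the function name and the `settle` parameter
are mine, not part of any existing tool.

```python
from datetime import date, timedelta


def earliest_announcement(zero_day_pushes, settle=timedelta(days=2)):
    """Return the earliest date the release could be announced, or None.

    `zero_day_pushes` holds the stable push date of each 0day blocker fix,
    with None for a fix that has not been pushed to stable yet.
    """
    pushes = list(zero_day_pushes)
    if any(pushed is None for pushed in pushes):
        return None  # at least one fix is still not in stable: stay No-Go
    if not pushes:
        return date.today()  # no 0day blockers at all, nothing to wait for
    # Wait out the settle period after the LAST fix reached stable.
    return max(pushes) + settle


# The single 0day fix from this thread, pushed to stable on Nov 5.
print(earliest_announcement([date(2015, 11, 5)]))
# A fix still stuck in testing keeps the overall release No-Go.
print(earliest_announcement([date(2015, 11, 5), None]))
```

The composes themselves do not appear in this calculation at all, which is the point:
they can be staged for the mirrors while the clock for the 0day fixes is still running.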
WDYT? Reasonable? Complicated? Bonkers? Off the mark?