This has kinda been an elephant in the room that I've talked about to a
few people but we haven't had much of a discussion about it yet. For the
sake of simplicity, I'm going to be talking specifically about depcheck
but most of this applies to upgradepath and possibly other tasks.
I have been pondering the same issues wrt upgradepath lately.
<snip>
I have some ideas about how to address this that are variations on a
slightly different scheduling mantra:
1. Collect update/build change notifications
2. Run depcheck on affected koji tags at most every X minutes
Yes, I think that's the most reasonable approach we can do at the moment.
3. Report changes in build/update status on a per-build/per-update
basis at every depcheck run
I don't understand this. Is it somehow different from what we already do?
This way, we'd be scheduling actual depcheck runs less often but in a
way that is closer to how it actually works. From a maintainer's
perspective, nothing should change significantly - notifications will
arrive shortly after changes to a build/update are submitted.
To accomplish this, I propose the following:
1. Add a separate buildbot builder
I'm not completely familiar with the term 'builder'. I read the docs [1], and I
see we have three builders on stage - all, i386 and x86_64 - but I'm not sure exactly
why. Autotest allowed us to select machines depending on arbitrary tags (arch,
distribution, virt capabilities, etc). I suppose we will need the same with Taskotron.
Will we add a new builder for every former tag and their combinations? Or why exactly do
we need to solve this on the builder level?
[1] http://docs.buildbot.net/0.8.1/Builder.html#Builder
to handle depcheck and similar tasks
by adding a "fuse" to the actual kickoff of the task. The first
received signal would start the fuse and after X minutes, the task
would actually start and depcheck would run on the entire tag.
Yes, this sounds good. That is the simple concept. A more advanced concept would be even
better:
* an incoming signal starts the fuse
* the fuse runs for X minutes
* after the job is executed (or finished), another fuse can't be started for Y minutes
(the next fuse timer X is ignited but frozen until Y expires)
With this, we can wait a short time (X) for additional signals (to mitigate the problem of
several signals arriving shortly after each other), and then wait a long time (Y) until a
new job can be started (thereby mandating a minimum period of time between jobs, to
lower the load).
But I guess that would be more difficult to implement and the simple fuse is good enough
for now.
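
To make the X/Y idea concrete, here is a minimal sketch of the fuse logic in Python. It
is not tied to buildbot or taskotron-trigger; the class and method names (FusedTrigger,
signal, poll, job_finished) are made up for illustration, and time is passed in
explicitly (in minutes) so the behaviour is easy to follow. Setting cooldown to 0 gives
the simple fuse.

```python
class FusedTrigger:
    """Hypothetical fuse (X) plus cooldown (Y) scheduler for a whole-tag task."""

    def __init__(self, fuse_delay, cooldown=0):
        self.fuse_delay = fuse_delay  # X: how long to wait for further signals
        self.cooldown = cooldown      # Y: minimum gap after a job finishes
        self.fire_at = None           # time at which the lit fuse burns down
        self.last_finished = None     # time at which the previous job finished
        self.running = False          # True while a job is executing

    def signal(self, now):
        """Record an incoming notification; light the fuse if none is lit."""
        if self.fire_at is None:
            start = now
            if self.last_finished is not None:
                # The fuse is ignited but frozen until the cooldown expires.
                start = max(now, self.last_finished + self.cooldown)
            self.fire_at = start + self.fuse_delay
        return self.fire_at

    def poll(self, now):
        """Return True exactly once when the job should be started."""
        if self.running or self.fire_at is None or now < self.fire_at:
            return False
        self.fire_at = None
        self.running = True
        return True

    def job_finished(self, now):
        """Mark the job as done and start the cooldown period."""
        self.running = False
        self.last_finished = now
        if self.fire_at is not None:
            # A signal arrived while the job was running: keep its fuse
            # frozen until the cooldown is over.
            self.fire_at = max(self.fire_at,
                               now + self.cooldown + self.fuse_delay)
```

With fuse_delay=5 and cooldown=30, signals at minutes 0 and 2 result in a single job at
minute 5. If that job finishes at minute 10 and a new signal arrives at minute 12, the
next job cannot start before minute 45 (cooldown ends at 40, then the fuse burns for 5).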
2. Enhance taskotron-trigger to add a concept of a "delayed trigger"
which would work with the existing bodhi and koji listeners
but instead of immediately scheduling tasks based on incoming
fedmsgs, use the fused builder as described in 1.
Just a note - currently, upgradepath is triggered on any Bodhi update stable or testing
request. That is not optimal. Optimal is:
a) Don't trigger on update testing request (until [2] is implemented)
b) _Do_ trigger on any build tagged in Rawhide (i.e. Koji notification)
I'm not sure how to tackle this right now, aside from 'control.autoqa'-like
files from the past, but we will need to deal with that. A lot of checks in the future
won't be as simple as "run on any new Koji build", but "run on any new
Koji build if X and Y and not Z".
It's not an immediate priority, everything should somehow work now, because
upgradepath runs on the whole tag. So even if we schedule it a bit too often (a) or a bit
too seldom (b), the results will still get computed sooner or later. But since we're
re-working the triggers a bit, it might be good to have this on our minds.
[2] https://phab.qadevel.cloud.fedoraproject.org/T153
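
One possible shape for such "run if X and Y and not Z" rules, sketched in Python. This
is only an illustration: the event dictionaries are a simplified stand-in and not the
real fedmsg message format, the "rawhide" tag name is a placeholder, and none of the
helper names exist in taskotron-trigger.

```python
RAWHIDE_TAG = "rawhide"  # placeholder, not the real Koji tag name


def is_bodhi_request(kind):
    """Match a (simplified) Bodhi update request of the given kind."""
    return lambda e: e.get("source") == "bodhi" and e.get("request") == kind


def is_koji_tag(tag):
    """Match a (simplified) Koji notification for a build tagged into `tag`."""
    return lambda e: e.get("source") == "koji" and e.get("tag") == tag


def any_of(*preds):
    return lambda e: any(p(e) for p in preds)


def all_of(*preds):
    return lambda e: all(p(e) for p in preds)


def negate(pred):
    return lambda e: not pred(e)


# Rules a) and b) from above: stable requests and Rawhide builds trigger
# upgradepath, testing requests do not.
TRIGGER_RULES = {
    "upgradepath": any_of(is_bodhi_request("stable"),
                          is_koji_tag(RAWHIDE_TAG)),
}


def tasks_for(event):
    """Return the names of all tasks whose rule matches the event."""
    return sorted(name for name, rule in TRIGGER_RULES.items() if rule(event))
```

More complex conditions can be composed from the same pieces, for example
all_of(is_koji_tag(RAWHIDE_TAG), negate(some_other_predicate)).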
Some changes to resultsdb would likely be needed as well but I don't
want to limit ourselves to what's currently available. When Josef and I
sat down and talked about results storage at Flock last year, we
decided to move forward with a simple resultsdb so that we'd have a
method to store results knowing full well that it would likely need
significant changes in the near future.
Thoughts? Counter-proposals? Other suggestions?
For upgradepath, I was thinking about implementing proper per-update checking (rather
than whole-tag checking). So, if there was a new update foo-1.2, I would check just this
update and nothing else. The execution would be (much) faster, but we would spawn (much)
more jobs.
It would require some changes in the trigger code (see paragraph above) and also we would
need to spawn upgradepath for _every single new build in Rawhide_ (because that Rawhide
build could fix some Bodhi update issue in stable releases).
I'm not really sure this is worth it. It's a lot of work and the necessity to run
upgradepath on every new Rawhide build deters me a bit. The test infra load is probably
comparable to or even higher than with the fuse-based solution. But the check results
would be available sooner. For the moment, I wouldn't see this as a high priority, even
if we decide to do it. But I wanted to mention that for upgradepath, a different approach
is possible (based purely on notifications and targeted testing, not testing the whole
tag), it's just not a clear winner when compared to tag-based testing.