Re: Fedmsg Emitting

Tuesday, 2 June 2015

On Tue, 2 Jun 2015 09:15:24 -0400 (EDT)
Kamil Paral <kparal(a)redhat.com> wrote:

...
 > > Before we start sending fedmsgs we need to discuss a few
 > > things.
 > > We don't have to find solutions to all these problems, just keep
 > > them in mind when designing the solution we're going to start
 > > with:
 > > 
 > > 1. How often do we send fedmsg
 > > a) per-task
 > > b) per-update
 > > c) per-build
 ...
 > 
 > That leaves us with c)

 Seems reasonable to me.

 > 
 > > I guess c) allows easier filtering in FMN.
 > 
 > c) not only allows for easier filtering in FMN but it's also more
 > compatible with how I think that releng would like to see build
 > gating done. Assuming that we eventually get into the rawhide
 > space, we'll have to start emitting stuff per-build anyways :)
 > 
 > I'm of the opinion that c) is going to be best here. In the past,
 > we've done a lot of results on a per-update basis but unless I'm
 > forgetting something, we could transition to more of a per-build
 > system.
 > 
 > For example - depcheck processes updates - if one build in that
 > update fails, the whole update fails. While I think that this is the
 > best choice, I also think that logic should be handled in bodhi
 > instead of us trying to emulate what bodhi is doing. As far as I
 > know, this is happening with bodhi2 - they're assuming that we'll
 > be emitting per-build fedmsgs and the logic for failing/passing an
 > update will lie in bodhi and not rely on our emulation of bodhi's
 > processes.

 That's great to hear.

 > 
 > > 2. Who do we target: users, systems or both
 > > 
 > > The issue here is with tasks that repeatedly test one update.
 > > Currently we check if there's a bodhi update comment with the same
 > > result already and if so, we don't post the comment again. To do
 > > something like that with fedmsgs we'd have to have code running
 > > somewhere that would check against its database whether an
 > > incoming result is a duplicate or not. The question is where the
 > > code would run. Bodhi comes to mind since it already has
 > > information about updates and so is good for tasks that work with
 > > bodhi updates. However, there might be tasks that work with
 > > something else, like composes. In this case we'd probably have
 > > the code on taskotron systems.
 > 
 > I think that how we handle scheduling of some of our current checks
 > (depcheck and upgradepath) is a byproduct of trying to make a
 > repo-level check look like a build/update-level check. I can't
 > think of many more tasks that would run into the same problem of
 > repeated runs.

 I agree that depcheck and upgradepath are somewhat "special" here.
 Once Fedora Infra sees how many results we publish daily, especially
 during freeze periods (when there are lots of packages pending
 stable), they might ask us to come up with a better solution, I'm
 afraid. I'm not sure if there are better ways to handle it; either
 way, there will probably always be some kind of check that will
 require this kind of constant re-running. But it seems reasonable to
 assume that it will be a small minority in the overall task pool.

 > 
 > For the majority of tasks, I see the process as being similar to:
 > 
 >   1. trigger task $x for $y
 >   2. run task $x with $y as input
 >   3. report result for $x($y)
 > 
 > With this, we'd be running $x for each $y and the reporting would
 > only happen for each unique ($x, $y) assuming that something wasn't
 > rescheduled or forced to re-run.
 > 
 > I think it would be best to have consistent behavior for our fedmsg
 > emitting. If most tasks will only emit fedmsgs once, we should take
 > our minority tasks that emit more than one fedmsg per item and
 > deduplicate before the messages are emitted.

 Or, you can say that most tasks emit fedmsgs always (even though that
 means just once), and therefore the minority tasks should also emit
 it always :) I agree with having a consistent behavior. But I think
 it's possible to find a solution side-stepping this. See below.

 > 
 > > So if we target systems we'd just send all results in fedmsgs and
 > > let the systems consume them and do whatever they want to do with
 > > them (e.g. bodhi can squash all the tasks relevant to specific
 > > update and notify the maintainer of the package via fedmsg about
 > > the result). If we target users, we'd have to have some logic to
 > > limit rate of fedmsgs ourselves but that would mean hiding some
 > > of the results (although duplicates) from the world.
 > 
 > I'd like to see us do the deduplication in resultsdb (assuming
 > that's where the fedmsg emission will be happening). I think that
 > we already have a table for items and I don't think that keeping
 > track of "is_emitted" and the last state emitted (so we can track
 > changes in state) would be too bad. Then again, I'm not the one
 > working in the code and I could be wrong :)

 We talked with Martin about this at length some time ago, and I
 raised the question of different consumers. I see two groups here -
 machines and humans. If I understand you correctly, what you propose
 up there is to hardcode the system to fit human preferences. If I
 misunderstood it, then the whole rest of the mail is based on wrong
 assumptions, but it's still an interesting topic :)

I think of it more as trying to ensure consistent behavior in the data
that we emit to external entities, whether that's a machine or a human.

In my mind, the core of the issue is that we're taking what are
essentially per-repo checks and hacking them until they emulate a
per-build or per-update check as closely as we can make it.

This means that we end up with a lot of "duplicate" results every time
that the check is run because we have to run it as check(repo) instead
of check(build) or check(update).

Since we're trying to emulate a per-build or per-update check (in the
case of upgradepath and depcheck) I don't see a reason why a human or
machine consumer would want or need the "duplicate" results beyond how
we emulate the per-build or per-update check that external entities are
expecting.

FWIW, I don't think we're going to have many more of these emulation
checks. I also think that we can get away with an "emulation mode" for
resultsdb that handles checks like upgradepath and depcheck differently
than other results so that they behave like other "one result per
change" checks but I'm not the one writing that code. If Martin thinks
that's not a tractable solution, I'm game for other options.
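
A minimal sketch of the "emit only on state change" idea discussed above. Everything in it is an assumption made for illustration: `StateChangeEmitter`, `record` and the message fields are hypothetical names, not resultsdb's actual schema or API, and `publish` stands in for whatever would eventually send the fedmsg.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (check name, item, arch); hypothetical layout


class StateChangeEmitter:
    """Publish a result only when the outcome for a key differs from the last one sent."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish
        self._last_emitted: Dict[Key, str] = {}

    def record(self, check: str, item: str, arch: str, outcome: str) -> bool:
        key = (check, item, arch)
        if self._last_emitted.get(key) == outcome:
            return False  # same state as the last emission: stay quiet
        self._last_emitted[key] = outcome
        self._publish({"check": check, "item": item, "arch": arch, "outcome": outcome})
        return True


sent = []
emitter = StateChangeEmitter(sent.append)
emitter.record("depcheck", "foo-1.0-1.fc22", "x86_64", "PASSED")
emitter.record("depcheck", "foo-1.0-1.fc22", "x86_64", "PASSED")  # repeated run, suppressed
emitter.record("depcheck", "foo-1.0-1.fc22", "x86_64", "FAILED")  # state change, emitted
print(len(sent))
```

With this shape, repeated runs of a repo-level check would still be stored, but an external consumer would only see the first result and later changes of state.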

...
 When targeting humans, I believe we will cut off some use cases for
 machines, which can benefit from duplicated (and thus very
 up-to-date) information. Some ideas off the top of my head describing
 what duplicated messages allow:
 * For some checks like depcheck, the machine (i.e. Bodhi) can not
 only display the outcome for a certain package, but also the time
 when this package was last tested (might be a very interesting piece
 of information, was it OK yesterday, or 14 days ago and not run
 since?).
 * Or maybe show a graph of all the outcomes in the last week - so
 that you can see that it passed 10 times and failed once, and decide
 that the failure was probably a random fluke which is not worth
 investigating further.

What would another interpretation of failing once and then passing for
all subsequent runs be? I can't think of a situation other than bugs in
the check that would be anything other than pass.

...
 * If the message passes through another system (e.g. Bodhi, Koji),
 the system in question can e.g. allow users to configure how they
 want to receive results - whether duplicated or deduplicated, how
 much deduplicated, how often, etc. This is mostly true for email, RSS
 or some other communication channels, because fedmsg bus itself is
 not configurable per individual users' needs.

I'm not sure there would be enough people interested in receiving that
kind of data to make it worth the effort supplying it through bodhi or
fedmsg would entail. If we were going to have a bunch of checks that
were emulating per-event checks, then maybe but for upgradepath and
depcheck? I just can't see the desire or demand.

...
 * It's possible to create some kind of package testing stats
 overview, live and without regular queries.

If I'm understanding you correctly, this has been in the back of my
mind for a while but I think it can be done without the duplicated data
- just showing current check state.

...
 You can argue that most of this is achievable without duplicated
 messages, by querying the ResultsDB. Yes, but it often means
 increased performance hit and you lose the "live" status. For
 example, in order to display the graph from the second point, you can
 choose to query ResultsDB for every page view, but that means a lot
 of computing demand. Or you can cache it and refresh it once an hour,
 but that loses the live status. With notifications, you can have it
 always perfectly up-to-date and you don't need to refresh it
 needlessly. You can put in a safeguard against lost fedmsgs like
 "refresh the graph if older than a week, just to be safe", but that's
 it.

 So, for machine processing, I see duplicated messages as a benefit. I
 don't insist we need to have it, but it seems to allow interesting
 tools to be written. (A different question is whether the volume
 won't be too high for fedmsg bus to process it, but that is a
 separate and a technical issue.)

 If some machine didn't want to see duplicated messages and wanted to
 be able to easily filter them out without keeping its own database or
 querying ours, we can add something like "duplicate=True" into the
 message body? Simple solution, for machines.
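
A minimal sketch of the "duplicate=True" idea, assuming the emitter remembers the last outcome per (check, item, arch). Only the `duplicate` field comes from the proposal above; `build_message` and the other field names are hypothetical.

```python
from typing import Dict, Tuple

# Last outcome seen per (check, item, arch); hypothetical in-memory stand-in
# for whatever storage the emitter would really use.
_last_outcome: Dict[Tuple[str, str, str], str] = {}


def build_message(check: str, item: str, arch: str, outcome: str) -> dict:
    """Always build a message, flagging it when it repeats the previous outcome."""
    key = (check, item, arch)
    duplicate = _last_outcome.get(key) == outcome
    _last_outcome[key] = outcome
    return {"check": check, "item": item, "arch": arch,
            "outcome": outcome, "duplicate": duplicate}


messages = [
    build_message("depcheck", "foo-1.0-1.fc22", "x86_64", "PASSED"),
    build_message("depcheck", "foo-1.0-1.fc22", "x86_64", "PASSED"),
    build_message("depcheck", "foo-1.0-1.fc22", "x86_64", "FAILED"),
]

# A consumer that does not care about repeats filters on the flag alone,
# without keeping its own database.
fresh = [m for m in messages if not m["duplicate"]]
print([m["outcome"] for m in fresh])
```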

 Now, let's imagine we still decide on message deduplication and we
 choose the human as our primary notification target. There are further
 issues with it. Let's imagine a simple scenario:

 1. A maintainer submits update U1 consisting of builds B1 and B2.
 2. Depcheck x86_64 runs on U1, reports results.
 3. Maintainer receives two fedmsg notifications, one for B1 and one
 for B2, from FMN (email or irc).
 4. Depcheck i386 runs on U1, reports results.
 5. Maintainer receives two fedmsg notifications, one for B1 and one
 for B2, from FMN (email or irc).
 6. Depcheck armhfp runs on U1, reports results.
 7. Maintainer receives two fedmsg notifications, one for B1 and one
 for B2, from FMN (email or irc).
 8. Upgradepath noarch runs on U1, reports results.
 9. Maintainer receives two fedmsg notifications, one for B1 and one
 for B2, from FMN (email or irc).

 As you can see, the maintainer receives "number of builds x number of
 architectures (except for noarch checks) x number of checks" results.
 And the notifications are distributed in time, not sent together at
 once.
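
The count for the scenario above can be worked out in a few lines; the builds, checks and architectures are the ones from the example, and nothing else is assumed.

```python
builds = ["B1", "B2"]
# Check name -> architectures it reports on, as in the scenario above.
checks = {
    "depcheck": ["x86_64", "i386", "armhfp"],
    "upgradepath": ["noarch"],
}

# One notification per build, per check, per architecture of that check.
notifications = [
    (build, check, arch)
    for build in builds
    for check, arches in checks.items()
    for arch in arches
]
print(len(notifications))  # 2 builds x (3 + 1) check runs = 8
```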

 So, if we really want to do a good job in informing the maintainer
 here, deduplication of future results is just one part of the story.
 We also need to combine:
 * individual build results, if they are part of a bigger object
 (update)
 * architecture results, for checks which are architecture dependent
 * individual check results, if we run multiple checks

 So that ideally:
 1. A maintainer submits update U1 consisting of builds B1 and B2.
 2. Depcheck x86_64 runs on U1, reports results.
 3. Depcheck i386 runs on U1, reports results.
 4. Depcheck armhfp runs on U1, reports results.
 5. Upgradepath noarch runs on U1, reports results.
 6. Maintainer receives a single fedmsg notification about U1, from
 FMN (email or irc).

 Unfortunately, this means we would have to implement a lot of
 external logic (i.e. Bodhi's "what is an update" logic), which is
 something we're trying to get away from (we have our unpleasant
 experience with bodhi comments feature which deals with lots of this
 stuff).

Yeah, this would be great but that's the exact thinking that gave us
the current bodhi comment code that needs to die in a fire :)

IIRC, one of the things that we agreed on a while back was that we
wouldn't try to do any of that by ourselves in the future.

...
 Taking all of this into account, it seems easier and more sensible
 to me to target machines with taskotron fedmsgs. Let's see:

 1. A maintainer submits update U1 consisting of builds B1 and B2.
 2. Taskotron gradually executes all available checks on B1 and B2.
 3. Taskotron emits fedmsgs for every completed check, for every
 architecture, for every build.
 4. Bodhi listens for Taskotron fedmsgs, marks internally (and
 possibly in the web UI) which builds were tested with what result,
 adds/updates links to logs.
 5. Once results for all builds x archs x checks were received, or
 once some timeout occurred (e.g. "wait at most 8 hours for test
 results"), Bodhi sends its own fedmsg.
 6. Maintainer listens for _Bodhi_ fedmsgs and receives a single
 notification that U1 testing is complete.
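
Steps 4 to 6 could be sketched on the consumer side roughly as follows. This is not Bodhi's real code: `UpdateTracker`, its methods and the summary fields are hypothetical, and times are plain seconds so the example stays self-contained.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple

Key = Tuple[str, str, str]  # (build, check, arch)


class UpdateTracker:
    """Hypothetical per-update tracker that notifies once per update."""

    def __init__(self, builds: Iterable[str], checks: Mapping[str, List[str]],
                 deadline: float) -> None:
        # Every (build, check, arch) combination a result is expected for.
        self.expected: Set[Key] = {
            (build, check, arch)
            for build in builds
            for check, arches in checks.items()
            for arch in arches
        }
        self.outcomes: Dict[Key, str] = {}
        self.deadline = deadline  # e.g. "wait at most 8 hours", in seconds
        self.notified = False

    def on_result(self, build: str, check: str, arch: str, outcome: str,
                  now: float) -> Optional[dict]:
        """Record one Taskotron result; a repeat overwrites the old outcome."""
        key = (build, check, arch)
        if key in self.expected:
            self.outcomes[key] = outcome
        return self.poll(now)

    def poll(self, now: float) -> Optional[dict]:
        """Return a summary exactly once: when all results are in or time is up."""
        if self.notified:
            return None
        missing = self.expected - set(self.outcomes)
        if missing and now < self.deadline:
            return None
        self.notified = True
        failed = sorted(k for k, v in self.outcomes.items() if v != "PASSED")
        return {"complete": not missing, "failed": failed, "missing": sorted(missing)}


checks = {"depcheck": ["x86_64", "i386", "armhfp"], "upgradepath": ["noarch"]}
tracker = UpdateTracker(["B1", "B2"], checks, deadline=8 * 3600)
summaries = []
for check, arches in checks.items():
    for arch in arches:
        for build in ("B1", "B2"):
            summaries.append(tracker.on_result(build, check, arch, "PASSED", now=60.0))
sent = [s for s in summaries if s is not None]
print(len(summaries), len(sent))
```

Eight raw results go in and a single summary comes out, which is the one notification the maintainer would see. Because a repeated result just overwrites the stored outcome, the tracker behaves the same whether or not the incoming messages are deduplicated.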

 Now, because of the fact that Bodhi is designed for publishing
 updates, it can tailor the messaging behavior nicely. It can either
 notify after all testing is complete, or it can notify immediately
 after the first failure. It can have timeouts in case some tests get
 stuck. I'm not sure if it can make some of these things configurable
 for the particular maintainer, I think that is no longer possible
 when using fedmsgs instead of emails. But it can publish under
 different topics (e.g. first failure vs testing complete) and
 maintainers can subscribe to what suits them. (And if they're feeling
 particularly tough, they can of course also subscribe to the flood of
 core taskotron fedmsgs).

 Furthermore, Bodhi can put additional logic into this, splitting
 checks into essential and non-essential groups, i.e. depcheck +
 upgradepath vs rpmlint + rpmgrill. The notifications can fire off
 after the essential testing is complete, or maybe they can wait for
 all testing but ignore potential failures in the non-essential group (and
 set the overall outcome to something like INFO, if e.g. only rpmlint
 failed).
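
A minimal sketch of that grouping, assuming the essential set is depcheck + upgradepath as in the example. The function name and the outcome strings other than INFO are hypothetical.

```python
from typing import Dict

ESSENTIAL = {"depcheck", "upgradepath"}  # grouping taken from the example above


def overall_outcome(results: Dict[str, str]) -> str:
    """Combine per-check outcomes ("PASSED" or "FAILED") into one update outcome."""
    failed = {check for check, outcome in results.items() if outcome != "PASSED"}
    if failed & ESSENTIAL:
        return "FAILED"
    if failed:
        return "INFO"  # only non-essential checks failed
    return "PASSED"


print(overall_outcome({"depcheck": "PASSED", "upgradepath": "PASSED",
                       "rpmlint": "FAILED", "rpmgrill": "PASSED"}))
```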

 With this approach, I like that the Bodhi logic is configured in
 Bodhi, and we're not trying to emulate it, we just supply raw data.
 People subscribe to Bodhi notifications. The same approach can be
 used with Koji or any other service - we're supplying data, they're
 deciding what to do with it, what is important and what is not, and
 they're sending final result notifications (or even partial ones, if
 they want and it makes sense).

Assuming that the bodhi devs are onboard with taking on most of the
maintenance of all that, I like the idea of keeping most of that in
bodhi or at least not in resultsdb. From the start, Josef and I
(maybe more folks, don't recall) agreed that resultsdb should not care
about whether a given item is a global pass/fail or overridden.
Resultsdb should only care that item $x was run with result $r.
Anything beyond that needs to be in a different system that is
capable of using resultsdb as input.

The thing I don't understand is how deduplicating our emulated results
from upgradepath or depcheck prevents any of what you've described. As
long as we're emitting results on state change, an external system
would be plenty capable of doing what you describe. Am I missing
something?

...
 But what about results which don't have a specific service, you
 ask? What if new glibc is submitted and existing firefox is tested
 against it using firefox-regression-suite check, where do these
 results go? Great question.

 I think the raw Taskotron fedmsgs are the answer here. Hopefully most
 of these checks will be one-shot execution (unlike continuous
 execution like depcheck). So if maintainers subscribe to our
 messages, they should receive one result per every arch at worst,
 i.e. 3 separate notifications for a single execution. Or, if they
 have some really special kind of check, they'd process the
 notifications on their own. Once we're there and checks like these
 are more common, we can talk about providing services for further
 deduplication. But still, even if we really need to do this in some
 specific cases, I think the general approach should be the one
 outlined above, where we don't notify people directly but send it
 through middle-man services with their own logic and special needs.

Yeah, I think that for the near future after we have package-specific
checks we'll be limited to taskotron fedmsgs. Once we have everything
in place to support stuff like this, we can see if there are other
reporting/status mechanisms that make sense.

...
 Now, after seeing the wall of text I've written, I wonder, have I
 actually kept to the original topic, or strayed away into a
 completely different area? :-)

I think most of it is broadly on the topic of how notifications and
results are presented to users. Not sure that it's all required for
fedmsg emission from resultsdb but I could be missing something :)

Tim
