This ended up being rather long, so as an executive summary:
I'm proposing that we change the 0.5.0 release of AutoQA to focus on getting the best information and the least noise to package maintainers. The focus would be on decreasing the number of emails that maintainers are receiving and improving the understandability of our logs (focusing on depcheck and upgradepath).
This is a proposal, and I'm hoping to spark a discussion with this thread, not dictate a change to our roadmap.
------------------------------------------------------------
After the thread in devel@ about the volume of email coming out of bodhi for AutoQA 'PASSED' comments [1], I started thinking more about some of my past experiences with user complaints about the information they're presented with.
My major concern is this: low signal to noise ratio (SNR) in output leads to users ignoring or refusing to use the tool as a whole. In my experience, once users start ignoring the tool it is a very difficult, uphill battle to get them to stop ignoring/hating/distrusting it.
At the moment, I think that we have two major SNR issues in AutoQA: comment emails coming out of bodhi and log files (especially depcheck and upgradepath).
I can't seem to find the book that discusses it right now, but one thing I believe strongly is this: testing output that is difficult for a human (with sufficient background knowledge) to understand isn't much better than not testing at all, and in some cases is actually worse (we're not at that point, though).
As I heard James Bach [2] put it, building software is like building a house. The developers are the construction workers and the testers are responsible for shining light on the places of the house that need work. The people with the light can't directly build the house, but the construction workers can build a better house when the light is shining on the most important things to fix.
When we have a low SNR, our output is muddled, which isn't far from putting tissue paper over the lights in the building analogy. The light might be in the right places, but it's really hard for the builders to tell exactly where the light is pointing, so they have a harder time fixing those issues.
So, what is the point of all this? Basically, I'm proposing that we hijack our current plans for 0.5.0 and re-focus on improving the SNR and usability of our current tests. Specifically, I would like to see us focus on two things:
1) Stop spamming maintainers with not-needed comment updates from bodhi
   - The current proposal is covered in #314 [3]
2) Improve our logging - focusing on depcheck and upgradepath
   - Goal 1: maintainers should be able to find the information they need about why their package failed within 30 seconds of opening the log file.
   - Goal 2: users should be able to easily find documentation on what a test is supposed to do and examples on how to triage a failure using our logging output.
How could we get there?
1) This seems pretty straightforward to me. I didn't think that the bodhi side of things would get implemented quite that quickly, or I would have said something earlier. From my conversations with Kamil and James, I don't think there is going to be much resistance to this solution, though.
2) This would take a bit more work. I'm not sure if the exact goal of 30 seconds is achievable, but I'm of the opinion that it is a good place to start. We could start by looking at what we want the output to look like, determine whether that is possible, and do some user testing to get input from actual maintainers (not directly involved with AutoQA).
I know why things are the way they are and I don't disagree with those design decisions. Blamestorming is pointless and counter-productive, anyway. I'm interested in finding a solution :)
And now, discussion time! Thoughts, suggestions, complaints?
Tim
[1] http://lists.fedoraproject.org/pipermail/devel/2011-April/150901.html
[2] http://www.satisfice.com/aboutjames.shtml
[3] https://fedorahosted.org/autoqa/ticket/314
I'm proposing that we change the 0.5.0 release of AutoQA to focus on getting the best information and the least noise to package maintainers. The focus would be on decreasing the number of emails that maintainers are receiving and improving the understandability of our logs (focusing on depcheck and upgradepath).
I talked about this with tflink at length yesterday on IRC and he convinced me that we should really spend some time looking into this. At first I was unwilling to waste more time on temporary solutions like bodhi comments, but finally I had to admit that the "proper" solutions will take some months. And in the meantime quite a few developers could really start ignoring anything AutoQA-related, or even hating it.
Therefore I'm inclined to re-think the work that needs to be done next and propose changes that won't require heavy architecture changes, but OTOH will improve the impression AutoQA makes on package maintainers. Like better log readability. Like documentation.
If we create a set of tickets relevant to this topic, I think we could make it the theme for 0.5.0. And move the current tickets from 0.5.0 to 0.6.0, or some similar approach.
- Stop spamming maintainers with not-needed comment updates from bodhi
  - The current proposal is covered in #314 [3]
This needs to be discussed properly, I'd probably create a new thread about it. Do we really want to discard emails for PASSED results? What if the test FAILED first and PASSED after that, does it change anything? And what about discarding emails for all results altogether? How do we expect maintainers to learn the result when we won't notify them (at least in some cases)? Can we make it configurable per-maintainer? Are we able to send a single email after all tests have passed? Is there a different approach?
Too many questions :) Anyway I agree we can improve in this area, we just need to carefully consider how.
- Improve our logging - focusing on depcheck and upgradepath
  - Goal 1: maintainers should be able to find the information they need about why their package failed within 30 seconds of opening the log file.
Possible tickets:
* split one big result log into small logs per-update
* create highlights per-update (relevant to ↑)
* search through command output (depcheck output is a good example) and look for possible lines indicating the failure, then offer those to the maintainer (e.g. highlight them) - a rough sketch of such a filter is included below
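[Editor's sketch] To make the last idea concrete, here is a minimal, hypothetical sketch of such a failure filter. This is not existing AutoQA code; the patterns are guesses based on the depcheck client.DEBUG output quoted later in this thread, and the script simply prints the matching lines with their line numbers.

    #!/usr/bin/python
    # Sketch only: scan a depcheck client.DEBUG log and print just the lines
    # that usually point at a failure, so a maintainer does not have to read
    # the whole file. The patterns are assumptions, not the real depcheck API.
    import re
    import sys

    FAILURE_PATTERNS = [
        re.compile(r'SKIPBROKEN: \S+ from \S+ has depsolving problems'),
        re.compile(r'-->\s*(Package|Requires|Removing|Updated By):'),
        re.compile(r'(Update|Build) FAILED:|Rejected:'),
    ]

    def failure_lines(log_path):
        """Return (line number, line) pairs that look failure-related."""
        hits = []
        for number, line in enumerate(open(log_path), 1):
            if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
                hits.append((number, line.rstrip()))
        return hits

    if __name__ == '__main__':
        for number, line in failure_lines(sys.argv[1]):
            sys.stdout.write('%6d  %s\n' % (number, line))

For the openscada example discussed below, something like this should surface the depsolving section (roughly lines 1283 to 1485) without the surrounding noise.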
- Goal 2: users should be able to easily find documentation on what a test is supposed to do and examples on how to triage a failure using our logging output.
Possible tickets:
* create a wiki page for every test with thorough description of the test, how to interpret the results, how to find the cause of the failure in the log, the explanation for most common failures, links to resources with correct procedure descriptions (e.g. packaging guidelines, etc)
* link those wiki pages from bodhi comments (can we use html in the comments, at least <a> tag?)
I think that's enough ideas for (brand new) 0.5.0, what do you think?
And now, discussion time! Thoughts, suggestions, complaints?
Yes, please.
Hi,
I completely agree that we should create documentation on 'how to read XYZ test results', especially for depcheck (since upgradepath is more or less straightforward), because sometimes it takes even me (the 'almighty autoqa developer') quite a significant time to decipher why it is failing.
I thought that we could write some 'filter' to show (highlight) only those lines which are relevant to the failing update, so I found one FAILED depcheck run and started to dig into it.
Let's take this one as an example: https://fedorahosted.org/pipermail/autoqa-results/2011-April/109935.html (I'll use http://autoqa.fedoraproject.org/results/89197-autotest/172.16.0.20/debug/cli... for line-number references)
---------------------
From the mail, we see that there's a new update, and it fails:
Update FAILED: openscada-0.7.1-3.fc15
Build FAILED: openscada-0.7.1-3.fc15
Rejected: openscada-0.7.1-3.fc15
Specifically, it's the build openscada-0.7.1-3.fc15 that fails.
----
From the client.DEBUG file, it's quite clear that there really is a new build in -pending (line 37):
04/28 18:15:22 INFO | depcheck:0082| Pending builds: ['openscada-0.7.1-3.fc15']
The build contains 88 RPMs, which are fetched (lines 722 to 810) along with the rest of the RPMs for the already accepted builds.
A simple search for "depsolving problems" takes me to line 1283 (and it goes on down to line 1485):
04/28 18:17:05 DEBUG| utils:0105| [stdout] SKIPBROKEN: openscada-demo-0.7.0.2-3.fc15.x86_64 from f15 has depsolving problems
04/28 18:17:05 DEBUG| utils:0105| [stdout] SKIPBROKEN: --> Package: openscada-demo-0.7.0.2-3.fc15.x86_64 (f15)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Requires: openscada-Protocol-HTTP = 0.7.0.2-3.fc15
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Removing: openscada-Protocol-HTTP-0.7.0.2-3.fc15.x86_64 (f15)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> openscada-Protocol-HTTP = 0.7.0.2-3.fc15
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Updated By: openscada-Protocol-HTTP-0.7.1-3.fc15.x86_64 (pending)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> openscada-Protocol-HTTP = 0.7.1-3.fc15
A careful reader will notice that it's actually _the old version_ of openscada-demo which is failing. What was not so clear to me, though, was why it is even taking the old version into account - it should be updated by openscada-0.7.1-3.fc15.
After an hour or so of basically pointless digging through all the logs from the depcheck run, and even trying to draw a 'dependencies diagram', I realized that all the 'depsolving problems' are just for that one RPM, 'openscada-demo-0.7.0.2-3.fc15.x86_64'. No other openscada RPM seemed to have dependency issues.
At first, I thought about some circular dependencies (hence the diagram), but in a moment of clarity, I went through the RPMs contained in the openscada-0.7.1-3.fc15 build (lines 722 to 810). You probably guessed it - openscada-demo is not an RPM in this build. And this is what causes it to fail.
---------------------
What have I taken from this?
It might be useful to aggregate all the "SKIPBROKEN: <ENVRA> from <repo> has depsolving problems" lines and de-duplicate the <package name>s, so we're sure which RPM is actually causing the trouble. This is really valuable information, because it not only tells you which build caused the update to fail, but also which particular RPMs are the root of that problem. This is IMHO generally useful information.
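[Editor's sketch] As an illustration only (this helper does not exist in AutoQA; the regex assumes the exact SKIPBROKEN wording shown in the log above), the aggregation could look roughly like this:

    import re

    SKIPBROKEN_RE = re.compile(
        r'SKIPBROKEN:\s+(?P<envra>\S+)\s+from\s+(?P<repo>\S+)\s+has depsolving problems')

    def broken_packages(log_lines):
        """Map package name -> set of full ENVRAs reported as broken."""
        broken = {}
        for line in log_lines:
            match = SKIPBROKEN_RE.search(line)
            if not match:
                continue
            envra = match.group('envra')
            # Strip '-<version>-<release>.<arch>' to get the plain package name
            # (naive, but good enough for a heuristic).
            name = envra.rsplit('-', 2)[0]
            broken.setdefault(name, set()).add(envra)
        return broken

For the openscada example, that should collapse the whole depsolving section down to a single name, openscada-demo.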
Then, we should probably describe some specific 'fail scenarios', and ways to spot them. In this particular case, it would be helpful to emphasise that an RPM which is not part of the update is causing the trouble, and that it might be a good idea to check whether the <N from ENVRA> is an RPM in the update. I can't really find a way to describe it in general terms, so taking this specific example (a rough sketch follows the list):
1) remember all RPMs which were a part of 'Pending builds' (lines 722 to 810)
2) openscada-demo-0.7.0.2-3.fc15.x86_64 has problems
3) take the 'name' part from the ENVRA (openscada-demo) and check whether an RPM with the name 'openscada-demo' was a part of 'Pending builds'
4) if it wasn't, then print out a message like "Depcheck might be failing, because openscada-demo is not a part of the openscada build."
5) if it was, then it's probably a different 'fail-case', and we should investigate further.
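[Editor's sketch] A minimal sketch of that heuristic, assuming we already have the set of RPM names from the 'Pending builds' section and the broken_packages() mapping from the earlier sketch (neither exists in AutoQA in this form yet):

    def explain_failures(broken, pending_rpm_names):
        """Yield one human-readable hint per broken package name."""
        for name, envras in sorted(broken.items()):
            if name not in pending_rpm_names:
                yield ("Depcheck might be failing because %s is not a part of "
                       "the pending build (broken RPMs: %s)."
                       % (name, ', '.join(sorted(envras))))
            else:
                yield ("%s is a part of the pending build; this is probably a "
                       "different fail-case and needs further investigation." % name)

For the openscada case this would print the 'not a part of the pending build' hint for openscada-demo, which is exactly the conclusion that took an hour of digging to reach by hand.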
So to sum it up - it might be good to set up some heuristic algorithms which would try to hint at what went wrong.
J.
Eh,
when I read it for the second time, it seems like a bunch of blabber, so let me explain.
Inspired by Tim's idea, I started to think about a 'simple filter' which would highlight only the lines relevant to a specified build/update, so the developers do not need to dig through a bunch of not-really-interesting stuff (e.g. other builds, and some depcheck 'noise'). An example of how the filtered document could look is at the end of this email.
Based on this 'simple filter' idea, my mind spiraled off, and I thought that we could sketch out some 'fail scenarios' (as a wiki document for a start, and possibly as a heuristic script later), based on the results we already have, so the developers can use them as a reference while researching the (possibly filtered) results.
In the previous email, I described one specific example of such a fail scenario, and my 'personal enlightenment' based on the results of my research.
Hope you find it at least a bit informative, and not too daunting and confusing :)
J.
----- Original Message -----
From: "Josef Skladanka" jskladan@redhat.com To: "AutoQA development" autoqa-devel@lists.fedorahosted.org Sent: Friday, April 29, 2011 1:31:12 PM Subject: Re: Proposed Change in Focus for 0.5.0
<snipped>
On Fri, 2011-04-29 at 07:31 -0400, Josef Skladanka wrote:
Hi,
I completely agree that we should create documentation on 'how to read XYZ test results', especially for depcheck (since upgradepath is more or less straightforward), because sometimes it takes even me (the 'almighty autoqa developer') quite a significant time to decipher why it is failing.
The feedback I've heard on upgradepath is that it's not clear where the results are for the specific update. There is too much scrolling required. It would be really helpful to link directly to a page that contains *only* the packages tested in the update. Or perhaps output.log could be HTML and have an <a href> anchor for each update in a table of contents?
The same applies to depcheck, and any test that operates over all updates, not just single packages. Perhaps we need to find a way to eliminate/reduce page scrolling for the test output (not the debug logs)?
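[Editor's sketch] To illustrate the table-of-contents idea, here is a rough, hypothetical sketch of how such an output.html could be generated. The results dict (update NVR mapped to per-update result text) is made up; AutoQA doesn't currently collect results this way, and the markup is only a guess:

    import cgi

    def render_results_html(results):
        """results: dict mapping update NVR -> plain-text result for that update."""
        parts = ['<html><body>', '<h1>Test results</h1>', '<ul>']
        # Table of contents: one link per update, pointing at an anchor below.
        for update in sorted(results):
            parts.append('<li><a href="#%s">%s</a></li>'
                         % (cgi.escape(update, True), cgi.escape(update)))
        parts.append('</ul>')
        # One anchored section per update, so a bodhi comment could link to
        # output.html#<update-nvr> directly.
        for update in sorted(results):
            parts.append('<h2 id="%s">%s</h2>'
                         % (cgi.escape(update, True), cgi.escape(update)))
            parts.append('<pre>%s</pre>' % cgi.escape(results[update]))
        parts.append('</body></html>')
        return '\n'.join(parts)

That would at least remove the scrolling problem: a maintainer could jump straight to their update from the top of the page.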
I thought that we could write some 'filter' to show (highlight) only those lines which are relevant to the failing update, so I found one FAILED depcheck run and started to dig into it.
Let's take this one as an example: https://fedorahosted.org/pipermail/autoqa-results/2011-April/109935.html (I'll use http://autoqa.fedoraproject.org/results/89197-autotest/172.16.0.20/debug/cli... for line-number references)
From the mail, we see that there's a new update, and it fails:
Update FAILED: openscada-0.7.1-3.fc15
Build FAILED: openscada-0.7.1-3.fc15
Rejected: openscada-0.7.1-3.fc15
Specifically, it's the build openscada-0.7.1-3.fc15 that fails.
From the client.DEBUG file, it's quite clear that there really is a new build in -pending (line 37):
04/28 18:15:22 INFO | depcheck:0082| Pending builds: ['openscada-0.7.1-3.fc15']
The build contains 88 RPMs, which are fetched (lines 722 to 810) along with the rest of the RPMs for the already accepted builds.
A simple search for "depsolving problems" takes me to line 1283 (and it goes on down to line 1485):
04/28 18:17:05 DEBUG| utils:0105| [stdout] SKIPBROKEN: openscada-demo-0.7.0.2-3.fc15.x86_64 from f15 has depsolving problems
04/28 18:17:05 DEBUG| utils:0105| [stdout] SKIPBROKEN: --> Package: openscada-demo-0.7.0.2-3.fc15.x86_64 (f15)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Requires: openscada-Protocol-HTTP = 0.7.0.2-3.fc15
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Removing: openscada-Protocol-HTTP-0.7.0.2-3.fc15.x86_64 (f15)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> openscada-Protocol-HTTP = 0.7.0.2-3.fc15
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> Updated By: openscada-Protocol-HTTP-0.7.1-3.fc15.x86_64 (pending)
04/28 18:17:05 DEBUG| utils:0105| [stdout] --> openscada-Protocol-HTTP = 0.7.1-3.fc15
I'm horrible at interpreting these results. To confirm, this is saying that the package openscada-demo has a dep problem. The problem is that openscada-demo has a strict version requirement on openscada-Protocol-HTTP = 0.7.0.2-3.fc15, and that requirement is not met since openscada-Protocol-HTTP is now updated to 0.7.1-3.fc15?
A careful reader will notice that it's actually _the old version_ of openscada-demo which is failing. What was not so clear to me, though, was why it is even taking the old version into account - it should be updated by openscada-0.7.1-3.fc15.
After an hour or so of basically pointless digging through all the logs from the depcheck run, and even trying to draw a 'dependencies diagram', I realized that all the 'depsolving problems' are just for that one RPM, 'openscada-demo-0.7.0.2-3.fc15.x86_64'. No other openscada RPM seemed to have dependency issues.
At first, I thought about some circular dependencies (hence the diagram), but in a moment of clarity, I went through the RPMs contained in the openscada-0.7.1-3.fc15 build (lines 722 to 810). You probably guessed it - openscada-demo is not an RPM in this build. And this is what causes it to fail.
What have I taken from this?
It might be useful to aggregate all the "SKIPBROKEN: <ENVRA> from <repo> has depsolving problems" lines and de-duplicate the <package name>s, so we're sure which RPM is actually causing the trouble. This is really valuable information, because it not only tells you which build caused the update to fail, but also which particular RPMs are the root of that problem. This is IMHO generally useful information.
Then, we should probably describe some specific 'fail scenarios', and ways to spot them. In this particular case, it would be helpful to emphasise that an RPM which is not part of the update is causing the trouble, and that it might be a good idea to check whether the <N from ENVRA> is an RPM in the update. I can't really find a way to describe it in general terms, so taking this specific example:
- remember all RPMs which were a part of 'Pending builds' (lines 722 to 810)
- openscada-demo-0.7.0.2-3.fc15.x86_64 has problems
- take the 'name' part from the ENVRA (openscada-demo) and check whether an RPM with the name 'openscada-demo' was a part of 'Pending builds'
- if it wasn't, then print out a message like "Depcheck might be failing, because openscada-demo is not a part of the openscada build."
- if it was, then it's probably a different 'fail-case', and we should investigate further.
So to sum it up - it might be good to set up some heuristic algorithms which would try to hint at what went wrong.
Great analysis Joza! If it's at all possible to make some informed guesses based on the yum output, I could see that going a long way to provide a concise actionable result for maintainers. Or at the very least, showing *only* the yum dep failure output for each update might go a long way.
Thanks, James
On 04/29/2011 08:13 AM, James Laska wrote:
Great analysis Joza! If it's at all possible to make some informed guesses based on the yum output, I could see that going a long way to provide a concise actionable result for maintainers. Or at the very least, showing *only* the yum dep failure output for each update might go a long way.
+1 on the great analysis and ideas!
From here, I propose we do the following:
- Each compile a list of the things that we would like to see the logs do and contain.
  - Mockups of what the logs could look like would also be awesome.
  - The more simple mockups we do, the more we have to choose from and learn from.
  - Email the mockups to the list (or to me, if you'd prefer).
  - Thoughts on how to do it would be great, too!
- Meet via phone on Tuesday or Wednesday.
  - Complete with some collaborative editing session (i.e. gobby).
Regarding times for said meeting, how about 13:30 UTC (09:30 EDT, 15:30 CEST) on either Tuesday (2011-05-03) or Wednesday (2011-05-04)? Any conflicts, preferences or other suggestions?
Tim
On 04/29/2011 02:50 AM, Kamil Paral wrote:
I'm proposing that we change the 0.5.0 release of AutoQA to focus on getting the best information and the least noise to package maintainers. The focus would be on decreasing the number of emails that maintainers are receiving and improving the understandability of our logs (focusing on depcheck and upgradepath).
I talked about this with tflink at length yesterday on IRC and he convinced me that we should really spend some time looking into this. At first I was unwilling to waste more time on temporary solutions like bodhi comments, but finally I had to admit that the "proper" solutions will take some months. And in the meantime quite a few developers could really start ignoring anything AutoQA-related, or even hating it.
On a political front, it also makes us look good by responding to issues raised outside of the team. Again, from my experience at least, if the devs rarely act in response to user-raised issues, those users are not as likely to come to the devs with ideas and suggestions.
I'm not saying that political stuff is the most important thing, and this might be obvious, but I'm of the mind that it's a bit of give and take. If we want input and help from users, it would help to respond to at least some popular requests with more than "we don't have time for that, but look at this shiny thing in the future".
Therefore I'm inclined to re-think the work that needs to be done next and propose changes that won't require heavy architecture changes, but OTOH will improve the impression AutoQA makes on package maintainers. Like better log readability. Like documentation.
I also think that aside from the bodhi comments part, the results from this will be useful for a while. Easily readable and understandable logs are always a good thing.
If we create a set of tickets relevant to this topic, I think we could make it the theme for 0.5.0. And move the current tickets from 0.5.0 to 0.6.0, or some similar approach.
Sounds like a plan to me. Do we want to continue the discussion a bit more before we start writing tickets or get started on that now?
- Stop spamming maintainers with not-needed comment updates from bodhi
  - The current proposal is covered in #314 [3]
This needs to be discussed properly, I'd probably create a new thread about it. Do we really want to discard emails for PASSED results? What if the test FAILED first and PASSED after that, does it change anything? And what about discarding emails for all results altogether? How do we expect maintainers to learn the result when we won't notify them (at least in some cases)? Can we make it configurable per-maintainer? Are we able to send a single email after all tests have passed? Is there a different approach?
Too many questions :) Anyway I agree we can improve in this area, we just need to carefully consider how.
Agreed on the discussion part. Honestly, I hadn't thought about the failed then passed situation and I bet that there are more situations that I hadn't thought of. I'll start a separate conversation for this and delete the review request.
Like I said before, I didn't expect the changes to bodhi to happen quite that fast. My email to lmacken was more of a 'could any of this be done' question than anything. It's not a criticism - more of a compliment and a thanks for the quick turnaround.
Personally, I'm still of the mind that this interface is going to be the best way for us to reduce the number of emails that are being sent out to maintainers, but you're right: our usage is going to require more thought. Not too much, though, because as you pointed out, ResultsDB is going to render this pretty much useless.
- Improve our logging - focusing on depcheck and upgradepath
  - Goal 1: maintainers should be able to find the information they need about why their package failed within 30 seconds of opening the log file.
Possible tickets:
* split one big result log into small logs per-update
* create highlights per-update (relevant to ↑)
* search through command output (depcheck output is a good example) and look for possible lines indicating the failure, then offer those to the maintainer (e.g. highlight them)
For depcheck in particular, I think that we can also make better use of the information we already have when we generate the logs. I haven't dug into this yet, but we might be able to generate more intelligent logs that are more easily filtered (as Joza is talking about later in the thread).
Another thought that James brought up is generating html logs instead of just plain text. AFAIK, our users are viewing results in a web browser and we could take advantage of that.
I'm not hearing any thoughts to the contrary, but figured that it wouldn't hurt to be clear: if we start highlighting stuff or splitting off sub-files, we still need to make sure that the raw data is still available for debugging purposes, or in case there is some situation where our newfangled output is hiding information.
- Goal 2: users should be able to easily find documentation on what a test is supposed to do and examples on how to triage a failure using our logging output.
Possible tickets:
* create a wiki page for every test with thorough description of the test, how to interpret the results, how to find the cause of the failure in the log, the explanation for most common failures, links to resources with correct procedure descriptions (e.g. packaging guidelines, etc)
* link those wiki pages from bodhi comments (can we use html in the comments, at least <a> tag?)
Those sound like good ideas to me. I imagine that we also want to be a bit proactive and start bugging maintainers on email lists, blogs etc - "this is AutoQA, we've made it better. Here are examples on how to effectively use the output, let us know if you have questions." kind of stuff.
I think that's enough ideas for (brand new) 0.5.0, what do you think?
I don't know, I want a pony ... does that fit into 0.5.0?
And now, discussion time! Thoughts, suggestions, complaints?
Yes, please.