On 04/14/2011 09:45 AM, James Laska wrote:
On Thu, 2011-04-14 at 06:44 -0400, Kamil Paral wrote:
> Hello,
>
> Marcela Mašláňová has spoken to me about "The Future of
> FTBFS". See this thread:
>
> http://lists.fedoraproject.org/pipermail/devel/2011-April/150310.html
>
>
> IIUC (Is there an abbreviation for "I'm not a developer"?) the problem
> is as follows:
>
> * Matt Domsch from Dell used to rebuild *all* packages from Rawhide
> periodically (so-called "mass rebuild"). When some package failed
> to build, he reported errors against that package.
>
> * This testing ensured that we often found build problems early in the
> release process. Without it there is a chance that we discover the
> build failures only when a new build of that package is required,
> which may be shortly before final release or even after that.
> That's a problem.
>
> * Mass-rebuilds in Koji are not done frequently (maybe once a
> year), so they can't cover this issue.
>
> * Matt can't do this testing anymore. Marcela asked me whether
> AutoQA could be used for that. Matt's tools (scripts, etc) should
> be available.
>
> * I asked Marcela to inquire more about some details. I have
> attached the discussion below (read from bottom up).
>
>
> What are your thoughts? Is that something AutoQA can and should
> handle? Do we (will we) have enough hardware to be able to do that?
> According to our current priorities, is that even something we are
> able to implement in some reasonable time (under a year)?
Could? Sure, we could make AutoQA handle anything we want it to :)

Should? Not sure on this one. I think that there would be value in
having AutoQA involved in the process (running tests on builds as they
are done) but I'm not so sure about the mass-rebuild process itself.
It feels a little outside the scope of AutoQA at the moment. AutoQA
seems to be more focused on individual builds and updates as they are
done in order to help keep a handle on package quality.

I guess it also comes down to priority. What is the order of our
priorities and how would facilitating a mass-rebuild fit into those
priorities? How important is a mass-rebuild of rawhide to the project?
Personally, I don't have a great feel for this ATM. I'll try to take a
look at Matt's code today or tomorrow to see what all was being done and
whether or not it would be a good idea to involve AutoQA.
> As for the last question, I think it clearly fits our current
> effort to provide generic Fedora-related tests. OTOH we still have
> many generic tests to finish (either un-started or semi-finished)
> and before that we need to concentrate on architecture first
> (ResultDB etc.). I'm afraid to have complex tests running without
> a solid architectural basis beneath them. In that respect unless we all
> agree this is a top-priority next-to-work-on test (and provided
> that we have enough hardware for it) I don't think we're able to
> run it soon.
I haven't had the chance to look through Matt's scripts yet but it MIGHT
be possible to run it sooner if we go 'to the cloud!' or get some
hardware lent to us. I emphasize might here and am in no way committing
to anything or suggesting that it's a good idea :)
> Do we need some more information I should ask Matt for?
I can't see a ton of technical limitations on why AutoQA couldn't
manage this workflow. The big issue that jumps out at me is
resources (human and hardware). I don't believe we have the hardware
capacity to run this workflow now. Can the current hardware be loaned
to Fedora? Additionally, running, maintaining and reviewing the
scripts/results appears to be a significant effort. Is there anyone
(rel-eng or devel) volunteering to maintain the scripts needed for
rebuilds?
I'm with you on the hardware part. Unless we have some extra HW that I
don't know of, I sincerely doubt that we have enough to do this AND the
normal autoqa stuff. We might have enough HW if we turned off all of
autoqa for the length of the mass-rebuild but I think that would be a
VERY bad idea. However, I think that Seth Vidal hit on an interesting
idea in that same thread on devel@ [1].
'Should' or 'should not' aside, if we're talking about doing
mass-rebuilds every so often, it might be worth looking into using
Rackspace Cloud or EC2 for the mass rebuild.
Another thing that might be worth looking into is setting up a
secondary, disposable koji+bodhi+autoqa infrastructure for the purpose
of the mass-rebuild. That way, extra scripts wouldn't have to be written
and maintained for the actual building.
I also agree that we would need help from others to, at the very least,
review the results from said mass-rebuild. How was it handled in the past?
I'm just talking about ideas here, though. Thoughts?
Tim
[1] http://lists.fedoraproject.org/pipermail/devel/2011-April/150319.html