Hello,
Marcela Mašláňová approached me about "The Future of FTBFS". See this thread:
http://lists.fedoraproject.org/pipermail/devel/2011-April/150310.html
IIUC (Is there an abbreviation for "I'm not a developer"?) the problem is as follows:
* Matt Domsch from Dell used to rebuild *all* packages from Rawhide periodically (a so-called "mass rebuild"). When a package failed to build, he reported the errors against that package.
* This testing ensured that we often found build problems early in the release process. Without it, we may discover a build failure only when a new build of that package is required, which may be shortly before the final release or even after it. That's a problem.
* Mass rebuilds in Koji are not done frequently (maybe once a year), so they can't cover this issue.
* Matt can't do this testing anymore. Marcela asked me whether AutoQA could be used for it. Matt's tools (scripts, etc.) should be available.
* I asked Marcela to find out more details. I have attached the discussion below (read from the bottom up).
What are your thoughts? Is that something AutoQA can and should handle? Do we (will we) have enough hardware to be able to do that? According to our current priorities, is that even something we are able to implement in some reasonable time (under a year)?
As for the last question, I think it clearly fits our current effort to provide generic Fedora-related tests. OTOH we still have many generic tests to finish (either not started or half-finished), and before that we need to concentrate on the architecture first (ResultDB etc.). I'm wary of running complex tests without a solid architectural basis beneath them. In that respect, unless we all agree that this is the top-priority test to work on next (and provided that we have enough hardware for it), I don't think we'll be able to run it soon.
Do we need some more information I should ask Matt for?
Thanks, Kamil
----- Forwarded Message -----
From: "Matt Domsch" Matt_Domsch@Dell.com
To: mmaslano@redhat.com
Cc: kparal@redhat.com, skvidal@fedoraproject.org
Sent: Tuesday, April 12, 2011 7:30:14 PM
Subject: RE: future of FTBFS
Seth was asking me the same question.
My environment consists of:
Builders: 10 PowerEdge 1955 servers, each with 2x 4-core 3.0GHz CPUs, 8GB RAM, 2x 144GB disks. Disk space on the builders is mostly used for the buildroots, and swap for large (or several parallel) jobs that can exceed 8GB in a tmpfs buildroot.
http and NFS server with space for the current rawhide tree (daily rsync), a hardlinked copy of the rawhide tree from the day the build starts (initially zero space, but growing to the size of the full rawhide tree as rawhide moves on), and space for the newly built tree results to land. ~250GB total.
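To illustrate why the hardlinked copy starts out at zero extra space: a hard link is only a second name for the same file data, so the snapshot costs nothing until a file in the live tree is replaced and the snapshot is left holding the only reference to the old version. Below is a minimal Python sketch of such a snapshot. The function name and the paths are hypothetical, and this is not taken from Matt's scripts; tools such as `cp -al` do the same job.

```python
import os


def hardlink_snapshot(src, dst):
    """Mirror the directory tree at src into dst, hard-linking every file.

    Hypothetical helper for illustration only. Both trees must live on the
    same filesystem, since hard links cannot cross filesystems. Symlinks and
    other special files are not handled specially.
    """
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # The new name points at the same data; no file content is copied.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))
```

Calling something like `hardlink_snapshot("/srv/rawhide", "/srv/rawhide-snapshot")` (made-up paths) on the day the build starts would freeze the package set being rebuilt while the live tree keeps moving.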
One more server as the "master" that kicks off all the jobs to the builders. This can be anything, technically even one of the builders.
It took this setup ~30 hours to build all 10,000 SRPMs twice, once for each of i386 and x86_64. That was before the tmpfs change in F14, which prevents mock from using tmpfs for its buildroots. Now it takes ~96 hours with disk-backed buildroots.
In my setup, each builder runs 4 jobs concurrently, two for each architecture. The jobs are mostly I/O-bound, which is why the disk-backed buildroots are so much slower. There is often plenty of memory and CPU left over, though not always: depending on the size of the jobs that happen to be handed to a builder at the same time, they are sometimes CPU-bound.
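As a back-of-the-envelope check on these figures, the sketch below converts the wall-clock times into average slot time per build. It assumes every one of the 40 job slots (10 builders x 4 jobs) is busy for the whole run, which is a simplification, and the function name is made up for illustration.

```python
def minutes_per_build(srpms, arches, builders, jobs_per_builder, wall_hours):
    """Average slot-minutes consumed per build, assuming all slots stay busy."""
    builds = srpms * arches              # each SRPM is built once per architecture
    slots = builders * jobs_per_builder  # concurrent jobs across the whole pool
    return wall_hours * 60 * slots / builds


# Figures quoted in the message: 10,000 SRPMs, 2 arches, 10 builders, 4 jobs each.
tmpfs = minutes_per_build(10_000, 2, 10, 4, 30)  # run with tmpfs buildroots
disk = minutes_per_build(10_000, 2, 10, 4, 96)   # run with disk-backed buildroots

print(f"tmpfs buildroots: {tmpfs:.1f} slot-minutes per build")
print(f"disk buildroots:  {disk:.1f} slot-minutes per build")
print(f"slowdown:         {disk / tmpfs:.1f}x")
```

Under that assumption the numbers work out to roughly 3.6 slot-minutes per build with tmpfs and about 11.5 with disk-backed buildroots, a 3.2x slowdown. Anyone sizing a replacement pool could plug in their own builder count.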
Seth was going to look into using cloud-based builders. I think this is a great idea, provided you have a place to store the build results outside of the builders themselves, and have network-local copies of both the SRPM tree you're starting from and the buildroot repositories.
Thanks for your interest in taking this on! -Matt
-- Matt Domsch Technology Strategist Dell | Office of the CTO
-----Original Message-----
From: Marcela Mašláňová [mailto:mmaslano@redhat.com]
Sent: Tuesday, April 12, 2011 4:02 AM
To: Domsch, Matt
Cc: kparal@redhat.com
Subject: future of FTBFS
Hello Matt,
I was speaking with Fedora QA (Kamil Páral) about FTBFS. It might be possible to run it as one of QA's projects, but they'd like to know some details.
What are the hardware requirements: how much disk space, how many machines are used to run it, how long does a run take (days?), and does Rawhide need to be installed?
Best regards, Marcela
-- Marcela Mašláňová BaseOS team Brno
On Thu, 2011-04-14 at 06:44 -0400, Kamil Paral wrote:
[snip]
I can't see a ton of technical limitations on why AutoQA couldn't manage this workflow. The big issue that jumps out at me is resources (human and hardware). I don't believe we have the hardware capacity to run this workflow now; can the current hardware be loaned to Fedora? Additionally, running, maintaining and reviewing the scripts/results appears to be a significant effort. Is there anyone (rel-eng or devel) volunteering to maintain the scripts needed for rebuilds?
Thanks, James
On Thu, 2011-04-14 at 11:45 -0400, James Laska wrote:
I can't see a ton of technical limitations on why AutoQA couldn't manage this workflow. The big issue that jumps out at me is resources (human and hardware). I don't believe we have the hardware capacity to run this workflow now; can the current hardware be loaned to Fedora? Additionally, running, maintaining and reviewing the scripts/results appears to be a significant effort. Is there anyone (rel-eng or devel) volunteering to maintain the scripts needed for rebuilds?
I think I can help with that (although I am afraid that my overcommit rate is high right now :)). I'd be happy to do that for Fedora, but I'd like to see the scripts and evaluate the effort before I commit to doing it.
Thanks, James
_______________________________________________
autoqa-devel mailing list
autoqa-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/autoqa-devel
On 04/14/2011 10:44 AM, Lucas Meneghel Rodrigues wrote: <snip>
I think I can help with that (although I am afraid that my overcommit rate is high right now :)). I'd be happy to do that for Fedora, but I'd like to see the scripts and evaluate the effort before I commit to doing it.
Matt posted his current code on devel@ [1] as a link to linux.dell.com [2].
I've downloaded it but haven't had much of a chance to go through it yet. Matt asserts that it would probably have to be at least somewhat re-written.
Tim
[1] http://lists.fedoraproject.org/pipermail/devel/2011-April/150316.html
[2] http://linux.dell.com/files/fedora/FixBuildRequires/ftbfs-nov08.tgz
On 04/14/2011 09:45 AM, James Laska wrote:
On Thu, 2011-04-14 at 06:44 -0400, Kamil Paral wrote:
[snip]
What are your thoughts? Is that something AutoQA can and should handle? Do we (will we) have enough hardware to be able to do that? According to our current priorities, is that even something we are able to implement in some reasonable time (under a year)?
Could? Sure, we could make AutoQA handle anything we want it to :)
Should? Not sure on this one. I think that there would be value in having AutoQA involved in the process (running tests on builds as they are done) but I'm not so sure about the mass-rebuild process itself.
It feels a little outside the scope of AutoQA at the moment. AutoQA seems to be more focused on individual builds and updates as they are done in order to help keep a handle on package quality.
I guess it also comes down to priority. What is the order of our priorities, and how would facilitating a mass-rebuild fit into them? How important is a mass-rebuild of rawhide to the project?
Personally, I don't have a great feel for this ATM. I'll try to take a look at Matt's code today or tomorrow to see what all was being done and whether or not it would be a good idea to involve AutoQA.
As for the last question, I think it clearly fits our current effort to provide generic Fedora-related tests. OTOH we still have many generic tests to finish (either not started or half-finished), and before that we need to concentrate on the architecture first (ResultDB etc.). I'm wary of running complex tests without a solid architectural basis beneath them. In that respect, unless we all agree that this is the top-priority test to work on next (and provided that we have enough hardware for it), I don't think we'll be able to run it soon.
I haven't had the chance to look through Matt's scripts yet but it MIGHT be possible to run it sooner if we go 'to the cloud!' or get some hardware lent to us. I emphasize might here and am in no way committing to anything or suggesting that it's a good idea :)
Do we need some more information I should ask Matt for?
I can't see a ton of technical limitations on why AutoQA couldn't manage this workflow. The big issue that jumps out at me is resources (human and hardware). I don't believe we have the hardware capacity to run this workflow now; can the current hardware be loaned to Fedora? Additionally, running, maintaining and reviewing the scripts/results appears to be a significant effort. Is there anyone (rel-eng or devel) volunteering to maintain the scripts needed for rebuilds?
I'm with you on the hardware part. Unless we have some extra HW that I don't know of, I sincerely doubt that we have enough to do this AND the normal autoqa stuff. We might have enough HW if we turned off all of autoqa for the length of the mass-rebuild, but I think that would be a VERY bad idea. However, I think that Seth Vidal hit on an interesting idea in that same thread on devel@ [1].
'Should' or 'should not' aside, if we're talking about doing mass-rebuilds every so often, it might be worth looking into using Rackspace Cloud or EC2 for the mass rebuild.
Another thing that might be worth looking into is setting up a secondary, disposable koji+bodhi+autoqa infrastructure for the purpose of the mass-rebuild. That way, extra scripts wouldn't have to be written and maintained for the actual building.
I also agree that we would need help from others to, at the very least, review the results from said mass-rebuild. How was it handled in the past?
I'm just talking about ideas here, though. Thoughts?
Tim
[1] http://lists.fedoraproject.org/pipermail/devel/2011-April/150319.html
Hello,
Marcela Mašláňová approached me about "The Future of FTBFS". See this thread:
http://lists.fedoraproject.org/pipermail/devel/2011-April/150310.html
The discussion died down a little, so I'll make a quick summary:
1. We don't have enough hardware to run this test at the moment. Maybe Matt would be willing to lend his hardware pool to Fedora Infrastructure; that would solve the problem.
2. We don't want to maintain the test itself, just the AutoQA integration part. The scripts were made public, and Lucas said he might volunteer to maintain them once he has evaluated the effort.
3. As Tim correctly pointed out, this test may be important, but it is not at the top of our priority list. We have many planned tests that have a better effort/benefit ratio.
My conclusion is that if items 1 and 2 are resolved, we will gladly try to hook this test up to AutoQA. But it might take quite some time (per item 3).
Marcela, does that answer the question?
Anyone, further comments?
Thanks, Kamil