On 07/25/2017 10:59 AM, Paul W. Frields wrote:
> I'd meant to raise this question last week, but it turned out
> several folks who'd probably want to discuss were out of pocket.
> One of the aspects of continuous integration[1] that impacts my
> team is the storage requirement. How much storage is required for
> keeping test results, composed trees and ostrees, and other
> artifacts? What is their retention policy?
> A policy of "keep everything ever made, forever" clearly isn't
> scalable. We don't do that in the non-CI realm either, e.g. with
> scratch builds. I do think we must retain everything we officially
> ship; that's well understood. But atop that, anything we keep costs
> storage, and over time this storage costs money. So we need to draw
> a reasonable line that balances thrift and service.
> A. Retention
> ============
>
> The second question is probably a good one to start with, so we can
> answer the first. We need to answer the retention question for some
> combination of:
> 1. candidate builds that fail a CI pipeline
> 2. candidate builds that pass a CI pipeline
> 3. CI composed testables
>    * a tree, ISO, AMI, or other image that's a single unit
>    * an ostree change, which is more like a delta (AIUI)
> 4. CI-generated logs
> 5. ...other stuff I may be forgetting
The other big bucket is the packages in the buildroot used to build
the builds. You may want to keep these as well if there is a desire
to be able to rebuild packages at a later point.
> My general thoughts are that these things are kept forever:
>
> * (2), but only if that build is promoted as an update or as part
>   of a shipped tree/ostree/image
> * (3), but only if the output is shipped to users
> * (4), but only if corresponding to an item in (2) or (3)
>
> Outside that, artifacts and logs are kept only for a reasonable
> amount of troubleshooting time. Say 30 days, but I'm not too
> worried about the actual time period. It could be adjusted based on
> factors we have yet to encounter.
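To make the discussion concrete, here is a minimal sketch of how that retention policy could be expressed as code. All of the artifact field names ("kind", "passed_ci", "shipped", "parent_shipped", "created") are hypothetical placeholders, not an existing Fedora or Koji schema, and the 30-day window is just the number floated above.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the retention rules above; field names are
# illustrative only, not an existing schema.
TROUBLESHOOTING_WINDOW = timedelta(days=30)

def keep_forever(artifact):
    """Items (2)-(4): keep forever only what was actually shipped."""
    kind = artifact["kind"]
    if kind == "build":
        # (2): passed CI *and* promoted as an update or as part of a
        # shipped tree/ostree/image.
        return artifact["passed_ci"] and artifact["shipped"]
    if kind == "compose":
        # (3): a tree/ISO/AMI/ostree output shipped to users.
        return artifact["shipped"]
    if kind == "log":
        # (4): logs corresponding to a kept (2) or (3) item.
        return artifact["parent_shipped"]
    return False

def should_retain(artifact, now):
    """Everything else lives only for a troubleshooting window."""
    if keep_forever(artifact):
        return True
    return now - artifact["created"] <= TROUBLESHOOTING_WINDOW
```

A periodic cleanup job could then simply delete anything for which should_retain() returns False, so adjusting the policy later means changing one function rather than a pile of ad-hoc cron scripts.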
How does this proposal compare to existing practice in Fedora?
>
> B. Storage - How much?
> ======================
>
> To get an idea of what this might look like, I think we might make
> estimates based on:
>
> * the number of builds currently happening per day
> * how many of these builds are within the definition for an officially
> shipped thing (like Atomic Host, Workstation, Server, etc.)
> * The average size of the sources + binaries, summed out over the ways
> we deliver them (SRPM + RPM, ostree binary, binary in another
> image), and multiplied out by arches
> * Then sum this out over the length of a Fedora release
>
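Those bullets amount to a simple multiplication, so here is a back-of-envelope sketch. Every input below is a made-up placeholder for the numbers rel-eng and CI contributors would supply, not a real Fedora measurement:

```python
# Back-of-envelope estimator following the bullets above. All
# inputs are placeholder values, not real Fedora measurements.

def estimate_storage_gb(builds_per_day, shipped_fraction,
                        avg_artifact_gb, num_arches, release_days):
    """Shipped builds per day, times average delivered size (SRPM +
    RPMs + image forms), times arches, summed over a release."""
    shipped_per_day = builds_per_day * shipped_fraction
    per_day_gb = shipped_per_day * avg_artifact_gb * num_arches
    return per_day_gb * release_days

# Made-up example: 100 builds/day, 5% within the officially shipped
# definition, 0.2 GB average across delivery forms, 3 arches,
# ~400 days of release lifetime.
total_gb = estimate_storage_gb(100, 0.05, 0.2, 3, 400)
```

Even rough inputs like these would tell us quickly whether we are talking about hundreds of GB or multiple TB per release.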
> This is the part I think will need information from the rel-eng and CI
> contributors, working together. My assumption is there are gaping
> holes in this concept, so don't take this as a full-on proposal.
> Rather, I'm looking for folks to help harden the concepts and fill
> in the missing pieces. I don't think we need a measurement down to the
> single GB; a broad estimate in 100s of GB (or even at the 1 TB order
> of magnitude) is likely good enough.
>
> I'm setting the follow-up to infrastructure@lists.fedoraproject.org,
> since that team has the most information about our existing storage
> and constraints.