The discussion on the devel list about ARM, and my work last week on reinstalling builders quickly and routinely, has raised a number of issues with how we manage our builders and how we should manage them in the future.
It is apparent that if we add ARM builders, they will be lots of physical systems (probably in a very small space), but physical nonetheless. So we need a sensible way to manage and reinstall these hosts routinely and quickly.
Additionally, we need to consider what the introduction of a largish number of ARM builders (and other ARM infrastructure) would do to our existing puppet setup; specifically, it would overload it pretty badly and make it not very manageable.
I'm making certain assumptions here and I'd like to be clear about what those are:
1. The builders need to be kept pristine.
2. Currently our builders are not freshly installed frequently enough.
3. The builders are relatively static in their configuration, and most changes are done with pkg additions.
4. Builder setups require at least two manual-ish steps by a koji admin, who can disable/enable/register the builder with the kojihub.
5. The builders are fairly different, networking- and setup-wise, from the rest of our systems.
So I am proposing that we consider the following as a general process for maintaining our builders:
1. Disable the builder in koji.
2. Make sure all jobs are finished.
3. Add installer entries into grub (or run the undefine/reinstall process if the builder is virt-based).
4. Reinstall the system.
5. Monitor for ssh to return.
6. Connect in and force our post-install configuration: identification, network, mount-point setup, ssl certs/keys for koji, etc.
7. Reboot.
8. Re-enable the host in koji.
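A rough sketch of the koji side of this (steps 1, 2 and 8), using the koji python API; the hub URL and cert paths below are placeholders, not our real ones:

import time

import koji

# Placeholder hub URL and credential paths -- not our real ones.
session = koji.ClientSession("https://koji.example.org/kojihub")
session.ssl_login("/etc/koji/admin.pem", None, "/etc/koji/serverca.crt")

def drain_builder(hostname):
    """Step 1: disable the builder, then step 2: wait until it is idle."""
    session.disableHost(hostname)
    host = session.getHost(hostname)
    # Poll until the host has no open tasks left.
    while session.listTasks(opts={"host_id": host["id"],
                                  "state": [koji.TASK_STATES["OPEN"]]}):
        time.sleep(60)

def reenable_builder(hostname):
    """Step 8: put the freshly reinstalled host back into rotation."""
    session.enableHost(hostname)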
We would do this with frequency and regularity, perhaps even having some percentage of our builders doing this at all times. I.e., with 1/10th of the boxes reinstalling at any given moment, the whole fleet gets reinstalled over ten such windows.
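One stateless way to pick the day's slice, e.g. from cron (builder names and the 1/10 fraction are just illustrative):

import time

# Made-up builder names; in practice the list would come from koji.
BUILDERS = ["builder%02d" % i for i in range(1, 31)]
FRACTION = 10  # aim for 1/10th of the fleet reinstalling per cycle

size = max(1, len(BUILDERS) // FRACTION)   # 3 hosts per day here
day = int(time.time() // 86400)            # days since the epoch
start = (day * size) % len(BUILDERS)
batch = BUILDERS[start:start + size]       # today's slice to drain/reinstall
print(batch)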
Additionally, this would mean these systems would NOT have a puppet management piece at all. Package updates would still be handled by pushes as we do now, if things were security critical, but barring the need for significant changes we could rely on the boxes simply being refreshed frequently enough that updates wouldn't need to be pushed.
What do folks think about this idea? It would dramatically reduce the node entries in our puppet config, and it would drop the number of hosts connecting to puppet, too. It will mean more systems being reinstalled, more often. It will also require some work to automate the steps I mention above; I think I can achieve that without too much difficulty, actually. I think, in general, it will increase our ability to scale up to more and more builders.
I'd like constructive input, please.
Thanks, -sv
On Tue, 20 Mar 2012 14:44:07 -0400, seth vidal skvidal@fedoraproject.org wrote:
The discussion on the devel list about ARM, and my work last week on reinstalling builders quickly and routinely, has raised a number of issues with how we manage our builders and how we should manage them in the future.
It is apparent that if we add ARM builders, they will be lots of physical systems (probably in a very small space), but physical nonetheless. So we need a sensible way to manage and reinstall these hosts routinely and quickly.
Today there is no way to do an anaconda install on any ARM system, though hopefully we will have that by deployment time.
Additionally, we need to consider what the introduction of a largish number of ARM builders (and other ARM infrastructure) would do to our existing puppet setup; specifically, it would overload it pretty badly and make it not very manageable.
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
I'm making certain assumptions here and I'd like to be clear about what those are:
1. The builders need to be kept pristine.
2. Currently our builders are not freshly installed frequently enough.
3. The builders are relatively static in their configuration, and most changes are done with pkg additions.
4. Builder setups require at least two manual-ish steps by a koji admin, who can disable/enable/register the builder with the kojihub.
5. The builders are fairly different, networking- and setup-wise, from the rest of our systems.
So I am proposing that we consider the following as a general process for maintaining our builders:
1. Disable the builder in koji.
2. Make sure all jobs are finished.
3. Add installer entries into grub (or run the undefine/reinstall process if the builder is virt-based).
4. Reinstall the system.
5. Monitor for ssh to return.
6. Connect in and force our post-install configuration: identification, network, mount-point setup, ssl certs/keys for koji, etc.
7. Reboot.
8. Re-enable the host in koji.
We would do this with frequency and regularity, perhaps even having some percentage of our builders doing this at all times. I.e., with 1/10th of the boxes reinstalling at any given moment, the whole fleet gets reinstalled over ten such windows.
Honestly, we could do this instead of the monthly updates; just rebuild them.
Additionally, this would mean these systems would NOT have a puppet management piece at all. Package updates would still be handled by pushes as we do now, if things were security critical, but barring the need for significant changes we could rely on the boxes simply being refreshed frequently enough that updates wouldn't need to be pushed.
I'm OK with that; I'm pretty sure FAS will scale to the extra boxes. Do we drop monitoring of the builders? What about collectd, etc.?
What do folks think about this idea? It would dramatically reduce the node entries in our puppet config, and it would drop the number of hosts connecting to puppet, too. It will mean more systems being reinstalled, more often. It will also require some work to automate the steps I mention above; I think I can achieve that without too much difficulty, actually. I think, in general, it will increase our ability to scale up to more and more builders.
The main issue is that today we are not 100% sure of how we will install the ARM boxes. How do we deal with all the non-puppet-managed systems? We also need to look into how we can better scale koji itself; when we go from 20 to 200+ builders we need to make sure that load doesn't cause koji to fall over.
All the ARM boxes will have management consoles, but today I'm not 100% sure what access to them would look like. We would also need to deploy Fedora to any ARM-based systems. Something else we need to reconsider is networking: today the storage network and the builder networks are /24s, so we can use 253 nodes, and I suspect we will go over that on the build network. We could skip the storage network on the ARM builders, since it is really only needed for createrepo. But we may need to look at expanding kojipkgs to more nodes, or at increasing its network throughput with multiple bonded gig network ports. Think of a mass rebuild with 100 or 200 buildroots initializing at once; it will stress our resources on all levels. But the flexibility of so many nodes could let us deploy solid solutions to scale, and show that Fedora is still the leader in open infrastructure and sets industry best practices.
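For the address math (the prefixes below are examples, not our real allocations):

import ipaddress

for prefix in ("10.0.0.0/24", "10.0.0.0/23"):
    net = ipaddress.ip_network(prefix)
    # .hosts() excludes the network and broadcast addresses; subtracting
    # one more for the gateway is where the 253 figure comes from.
    print(prefix, "->", len(list(net.hosts())) - 1, "usable nodes")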
Dennis
On Tue, 20 Mar 2012 21:38:13 -0500 Dennis Gilmore dennis@ausil.us wrote:
Today there is no way to do an anaconda install on any ARM system, though hopefully we will have that by deployment time.
I would hope so. :)
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
Centrally logging the builders is probably unnecessary. Especially if we're bouncing them all the time.
Honestly, we could do this instead of the monthly updates; just rebuild them.
Sure - but I'm thinking of the emergency "oh look at that nightmare" updates.
I'm OK with that; I'm pretty sure FAS will scale to the extra boxes. Do we drop monitoring of the builders? What about collectd, etc.?
Collectd - off. We're not gaining much by having that punish the syslog server. We can monitor the builders w/o needing all of the copious info that collectd provides.
fas I'm not very worried about - though I suspect a couple of things will change w/how we get the dbs onto the hosts.
The main issue is that today we are not 100% sure of how we will install the ARM boxes. How do we deal with all the non-puppet-managed systems?
I think, if the playbooks are working well, we can use ansible to do this.
We also need to look into how we can better scale koji itself; when we go from 20 to 200+ builders we need to make sure that load doesn't cause koji to fall over.
okay - but I think that's more something for the kojidevs than fedora infra?
All the ARM boxes will have management consoles, but today I'm not 100% sure what access to them would look like. We would also need to deploy Fedora to any ARM-based systems. Something else we need to reconsider is networking: today the storage network and the builder networks are /24s, so we can use 253 nodes, and I suspect we will go over that on the build network. We could skip the storage network on the ARM builders, since it is really only needed for createrepo. But we may need to look at expanding kojipkgs to more nodes, or at increasing its network throughput with multiple bonded gig network ports. Think of a mass rebuild with 100 or 200 buildroots initializing at once; it will stress our resources on all levels. But the flexibility of so many nodes could let us deploy solid solutions to scale, and show that Fedora is still the leader in open infrastructure and sets industry best practices.
So one thing I'm not sure I understand - why would we need so many arm builders? Is it b/c there are so many more arm archs so there will need to be more pkgs built?
-sv
On Wed, 21 Mar 2012 10:08:38 -0400 seth vidal skvidal@fedoraproject.org wrote:
On Tue, 20 Mar 2012 21:38:13 -0500 Dennis Gilmore dennis@ausil.us wrote:
Today there is no way to do an anaconda install on any ARM system, though hopefully we will have that by deployment time.
I would hope so. :)
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
Centrally logging the builders is probably unnecessary. Especially if we're bouncing them all the time.
I think it could be useful for capacity planning and for detecting when things go bad(TM), but I wouldn't cry if we do not have it.
Honestly, we could do this instead of the monthly updates; just rebuild them.
Sure - but I'm thinking of the emergency "oh look at that nightmare" updates.
I'm OK with that; I'm pretty sure FAS will scale to the extra boxes. Do we drop monitoring of the builders? What about collectd, etc.?
Collectd - off. We're not gaining much by having that punish the syslog server. We can monitor the builders w/o needing all of the copious info that collectd provides.
fas I'm not very worried about - though I suspect a couple of things will change w/how we get the dbs onto the hosts.
The main issue is that today we are not 100% sure of how we will install the ARM boxes. How do we deal with all the non-puppet-managed systems?
I think, if the playbooks are working well, we can use ansible to do this.
We also need to look into how we can better scale koji itself; when we go from 20 to 200+ builders we need to make sure that load doesn't cause koji to fall over.
okay - but I think that's more something for the kojidevs than fedora infra?
Not really. It's not that koji itself won't scale, but that we will likely need to look at load balancing again, or at an internal hub or two. Each builder checks in every 10 seconds to see if there is anything to do; with 300 builders that alone is roughly 30 extra hub requests per second. All state and everything else is stored in the db, so adding multiple hubs reading and writing to the same db is OK. But I want to make sure that 300 hosts checking in, plus all the public traffic for koji, get handled gracefully.
All the ARM boxes will have management consoles, but today I'm not 100% sure what access to them would look like. We would also need to deploy Fedora to any ARM-based systems. Something else we need to reconsider is networking: today the storage network and the builder networks are /24s, so we can use 253 nodes, and I suspect we will go over that on the build network. We could skip the storage network on the ARM builders, since it is really only needed for createrepo. But we may need to look at expanding kojipkgs to more nodes, or at increasing its network throughput with multiple bonded gig network ports. Think of a mass rebuild with 100 or 200 buildroots initializing at once; it will stress our resources on all levels. But the flexibility of so many nodes could let us deploy solid solutions to scale, and show that Fedora is still the leader in open infrastructure and sets industry best practices.
So one thing I'm not sure I understand - why would we need so many arm builders? Is it b/c there are so many more arm archs so there will need to be more pkgs built?
Two reasons why we will be looking at so many. First, hardware and software floating point are incompatible, so builders building hardware floating point only build hardware floating point, and the same for software floating point. Second, while we are looking at quad-core 1.5GHz-2.0GHz builders with 4GB of RAM to start with, they are still not quite as powerful as their x86 counterparts. Since they are low power, 3-10 watts per node as opposed to 200-300 watts for the existing builders, I want to err on the side of too many rather than not having enough and having people complain that they have to wait for an ARM builder. Realistically, mass rebuilds are when it will be most noticeable. At a minimum I want at least double the number of x86 nodes for each arch, so ~80 total. I do have it on my list to come up with some reporting from ARM koji and primary koji on what the average build time is, so we know that what we deploy will be faster than what we have today.
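That report could be a small script against the hub API; the URL here is a placeholder, and it assumes a hub new enough to return the *_ts timestamp fields from listBuilds:

import koji

session = koji.ClientSession("https://koji.example.org/kojihub")  # placeholder
builds = session.listBuilds(completeAfter="2012-03-01",
                            state=koji.BUILD_STATES["COMPLETE"])
durations = [b["completion_ts"] - b["creation_ts"]
             for b in builds
             if b.get("completion_ts") and b.get("creation_ts")]
if durations:
    print("average build time: %.0f seconds across %d builds"
          % (sum(durations) / len(durations), len(durations)))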
Dennis
On Tue, 20 Mar 2012 21:38:13 -0500 Dennis Gilmore dennis@ausil.us wrote:
...snip...
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
Yeah.
...snip...
I'm OK with that; I'm pretty sure FAS will scale to the extra boxes. Do we drop monitoring of the builders? What about collectd, etc.?
There are a few things we could do on FAS load:
a) Add more FAS servers.
b) Reduce the number of runs. How often do we change someone in sysadmin-noc, sysadmin-main, or sysadmin-build?
c) Move to a system where we only re-run fasClient when there is a change.
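A sketch of what (c) might look like as a push from lockbox after a group change; the host names are made up, and it assumes fasClient's usual -i (install) run:

import subprocess

# Made-up host names; the real set would be the sysadmin-ish hosts.
HOSTS = ["build01.example.org", "build02.example.org"]

def push_account_sync(hosts):
    """After a relevant FAS group change, re-run fasClient on each host."""
    for host in hosts:
        subprocess.check_call(["ssh", "root@" + host, "fasClient", "-i"])

push_account_sync(HOSTS)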
I'd agree, collectd off probably. Or at least a separate instance if we needed to monitor them.
The main issue is that today we are not 100% sure of how we will install the ARM boxes. How do we deal with all the non-puppet-managed systems? We also need to look into how we can better scale koji itself; when we go from 20 to 200+ builders we need to make sure that load doesn't cause koji to fall over.
Yeah.
All the ARM boxes will have management consoles, but today I'm not 100% sure what access to them would look like. We would also need to deploy Fedora to any ARM-based systems. Something else we need to reconsider is networking: today the storage network and the builder networks are /24s, so we can use 253 nodes, and I suspect we will go over that on the build network. We could skip the storage network on the ARM builders, since it is really only needed for createrepo. But we may need to look at expanding kojipkgs to more nodes, or at increasing its network throughput with multiple bonded gig network ports. Think of a mass rebuild with 100 or 200 buildroots initializing at once; it will stress our resources on all levels. But the flexibility of so many nodes could let us deploy solid solutions to scale, and show that Fedora is still the leader in open infrastructure and sets industry best practices.
Yeah, we could hopefully have another network that's larger than a /24 for the ARM builders.
I'm sure some of this will be a process of 'oh no, what we have now doesn't scale, let's fix it'. Of course some of it we can get ready for up front too.
Overall I like the idea of the automated builder re-install and think it will get us more ready for things like a large arm cluster.
kevin
On Wed, 21 Mar 2012 08:33:51 -0600 Kevin Fenzi kevin@scrye.com wrote:
There are a few things we could do on FAS load:
a) Add more FAS servers.
b) Reduce the number of runs. How often do we change someone in sysadmin-noc, sysadmin-main, or sysadmin-build?
c) Move to a system where we only re-run fasClient when there is a change.
I'm thinking for the hosts which are sysadmin-ish only, do (c).
For the public-ish hosts, continue to poll FAS directly.
So:
- hosted, people, bastion, publictests == poll
- everything else is a set built and pushed to them.
I'd agree, collectd off probably. Or at least a separate instance if we needed to monitor them.
I'm not sure what benefit we get from collectd on transient builders, though.
On our long-running hosts I understand but not on the builders.
Yeah, we could hopefully have another network that's larger than a /24 for the ARM builders.
I can imagine various network changes should easily allow us to allocate larger than a /24 to the internal build network.
I'm sure some of this will be a process of 'oh no, what we have now doesn't scale, let's fix it'. Of course some of it we can get ready for up front too.
yay for planning! :)
Overall I like the idea of the automated builder re-install and think it will get us more ready for things like a large arm cluster.
Then I will get crackin' on making it work.
-sv
On Wed, 21 Mar 2012 11:03:24 -0400 seth vidal skvidal@fedoraproject.org wrote:
On Wed, 21 Mar 2012 08:33:51 -0600 Kevin Fenzi kevin@scrye.com wrote:
There are a few things we could do on FAS load:
a) Add more FAS servers.
b) Reduce the number of runs. How often do we change someone in sysadmin-noc, sysadmin-main, or sysadmin-build?
c) Move to a system where we only re-run fasClient when there is a change.
I'm thinking for the hosts which are sysadmin-ish only, do (c).
For the public-ish hosts, continue to poll FAS directly.
so:
- hosted, people, bastion, publictests == poll
- everything else is a set built and pushed to them.
Yeah, the trick is knowing when there is a change that affects them...
I wonder if we could make FAS smarter. Have a serial # for each group. It pulls and keeps track of that. Then it pulls again but just asks "what serial # do you have for groups x, y, z". Probably too much added complexity, I guess.
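Something like this on the client side, if FAS grew that (currently hypothetical) per-group serial call:

import json
import os

CACHE = "/var/cache/fas-group-serials.json"  # made-up cache path

def needs_sync(fetch_serials, groups):
    """fetch_serials(groups) would hit the hypothetical FAS serial
    endpoint and return {group: serial}; only trigger a full fasClient
    run when a serial we care about has moved."""
    current = fetch_serials(groups)
    previous = {}
    if os.path.exists(CACHE):
        with open(CACHE) as f:
            previous = json.load(f)
    with open(CACHE, "w") as f:
        json.dump(current, f)
    return any(current.get(g) != previous.get(g) for g in groups)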
I'd agree, collectd off probably. Or at least a separate instance if we needed to monitor them.
I'm not sure what benefit we get from collectd on transient builders, though.
On our long-running hosts I understand but not on the builders.
Yeah, the only case I can see is so we could see how loaded they are... and we might have better ways to tell that.
Yeah, we could hopefully have another network that's larger than a /24 for the ARM builders.
I can imagine various network changes should easily allow us to allocate larger than a /24 to the internal build network.
Yeah.
I'm sure some of this will be a process of 'oh no, what we have now doesn't scale, let's fix it'. Of course some of it we can get ready for up front too.
yay for planning! :)
Overall I like the idea of the automated builder re-install and think it will get us more ready for things like a large arm cluster.
Then I will get crackin' on making it work.
Sounds good.
kevin
On Wed, 21 Mar 2012, Kevin Fenzi wrote:
I'd agree, collectd off probably. Or at least a separate instance if we needed to monitor them.
I'm not sure what benefit we get from collectd on transient builders, though.
On our long-running hosts I understand but not on the builders.
Yeah, the only case I can see is so we could see how loaded they are... and we might have better ways to tell that.
Yeah, we could hopefully have another network that's larger than a /24 for the ARM builders.
I can imagine various network changes should easily allow us to allocate larger than a /24 to the internal build network.
Yeah.
I'm sure some of this will be a process of 'oh no, what we have now doesn't scale, let's fix it'. Of course some of it we can get ready for up front too.
yay for planning! :)
Overall I like the idea of the automated builder re-install and think it will get us more ready for things like a large arm cluster.
Then I will get crackin' on making it work.
Sounds good.
I wanted to come back around to this discussion to close it out, as we are most of the way complete here:
In the last few weeks I've set up a system that deploys a new builder, provisions it, and gets it ready in a single command.
It's in the builder git repository. This repo is on lockbox but it is only accessible to sysadmin-main and sysadmin-releng.
I've posted a site-specific sanitized version of the script I'm using here: http://fedorapeople.org/cgit/skvidal/public_git/scripts.git/tree/ansible/sta...
and I'll be happy to post the playbooks I'm using to provision these hosts.
The repo is restricted b/c it contains some certs/ssl keys that we aren't going to give away to everyone :)
The process for reinstalling a host is incredibly trivial; we built all the hosts for the latest mass rebuild using that process. It takes a single command and you walk away.
(other than any enabling of the build in koji).
The next step is to put this process into a cron job so we, ideally, can reinstall a certain percentage of our builders at any/all times.
We're using ansible for all of the command/control and it has been remarkably stable for our use case. It does require ssh keys on the hosts but we have that set via kickstarts now for the builders.
After some discussion we took the step of removing FAS and all Fedora accounts from the builders. We couldn't come up with a compelling reason to keep these throw-away hosts coupled to FAS, since the only folks connecting to them were sysadmin-main/releng; it was a waste of time to set up and keep the FAS db on the hosts current. Furthermore, it was an additional risk that a rogue package could try to snatch up our FAS db and crack the passwords.
If anyone has any questions about how this works or would like any piece of the infrastructure for doing it (other than the certs/keys :)) please email to this list and ask.
-sv
On 3/20/12 7:38 PM, Dennis Gilmore wrote:
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
Do we know how well kojihub will scale with 300+ builders? I know we've had trouble before, where a large number of builders causes some interesting issues when they are all pinging home to see if there is any work to be done.
On 3/21/12 11:42 AM, Jesse Keating wrote:
On 3/20/12 7:38 PM, Dennis Gilmore wrote:
Probably we would be adding 100-300 systems. Not only do we need to consider overloading of puppet, but also logging and monitoring. I guess it's more: how do we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that?
Do we know how well kojihub will scale with 300+ builders? I know we've had trouble before, where a large number of builders causes some interesting issues when they are all pinging home to see if there is any work to be done.
Disregard, I see this was already discussed.