Greetings.
So, over the holiday break I did some massive cleanup on our ansible repo. I took an initial patch from janeznemanic to fix old syntax and went from there. I got all the deprecated syntax fixed (a few strays might remain). I also moved accelerate into global.yml, so it should apply to all playbooks. The needed package and firewall port should be set in the kickstarts now.
Next I took a simple script to run --check --diff on each host and group playbook and got it up and running. It takes about an hour to run against our host/group playbooks when it's run one at a time. We could just fire them all off but that might swamp lockbox01.
Ideally, what I would like to see from a run of this script is all hosts/groups reachable and 0 items changed. This is the state we should strive for. ;)
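The script itself is not shown in this thread. A minimal Python sketch of that kind of wrapper might look like the following. It assumes the recap lines printed by ansible-playbook look like `host : ok=N changed=N unreachable=N failed=N` (the exact layout can vary between ansible versions), and the names `parse_recap`, `summarize`, `check_playbook` and `check_all` are made up for illustration.

```python
import re
import subprocess

# Assumed shape of one PLAY RECAP line; adjust if your ansible prints it differently.
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def parse_recap(output):
    """Return {host: {"ok": n, "changed": n, "unreachable": n, "failed": n}}."""
    results = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            fields = match.groupdict()
            host = fields.pop("host")
            results[host] = {key: int(value) for key, value in fields.items()}
    return results


def summarize(results):
    """Return (unreachable_hosts, changed_hosts), each sorted by name."""
    unreachable = sorted(h for h, c in results.items() if c["unreachable"] > 0)
    changed = sorted(h for h, c in results.items() if c["changed"] > 0)
    return unreachable, changed


def check_playbook(path):
    """Dry-run one playbook with --check --diff and parse its recap."""
    proc = subprocess.run(
        ["ansible-playbook", "--check", "--diff", path],
        capture_output=True,
        text=True,
    )
    return parse_recap(proc.stdout)


def check_all(playbook_paths):
    """Run the playbooks one at a time (slow, but gentle on the control host)."""
    merged = {}
    for path in playbook_paths:
        merged.update(check_playbook(path))
    return summarize(merged)
```

The goal state described above corresponds to `check_all(...)` returning two empty lists.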
* The following hosts are unreachable:
209.132.184.158 (see jenkins note below)
209.132.184.209 (see jenkins note below)
arm03-packager01.arm.fedoraproject.org (will fix)
arm03-packager02.arm.fedoraproject.org (will fix)
arm03-qa01.arm.fedoraproject.org (will fix)
buildvm-27.phx2.fedoraproject.org (test buildvm, expected down)
jenkins-cloud jenkins-slaves (these look to need a bit of tweaking)
lists-dev.cloud.fedoraproject.org (is up, but / is 100% full)
mailman01.stg.phx2.fedoraproject.org
releng01.phx2.fedoraproject.org (is down since we don't have a branched right now)
* The following hosts have changed > 0:
209.132.184.144
209.132.184.153
209.132.184.157
arm03-qa00.arm.fedoraproject.org
arm03-qa02.arm.fedoraproject.org
arm03-qa03.arm.fedoraproject.org
arm03-releng00.arm.fedoraproject.org
arm03-releng01.arm.fedoraproject.org
arm03-releng02.arm.fedoraproject.org
arm03-releng03.arm.fedoraproject.org
backup03.phx2.fedoraproject.org
beaker01.qa.fedoraproject.org
bkernel01.phx2.fedoraproject.org
bkernel02.phx2.fedoraproject.org
buildvm-01.phx2.fedoraproject.org
buildvmhost-10.phx2.fedoraproject.org
buildvmhost-11.phx2.fedoraproject.org
buildvmhost-12.phx2.fedoraproject.org
bvirthost07.phx2.fedoraproject.org
copr-be-dev.cloud.fedoraproject.org
copr-fe-dev.cloud.fedoraproject.org
db02.stg.phx2.fedoraproject.org
docs-backend01.phx2.fedoraproject.org
fedocal01.phx2.fedoraproject.org
fedocal01.stg.phx2.fedoraproject.org
fedocal02.phx2.fedoraproject.org
gallery01.stg.phx2.fedoraproject.org
kernel01.qa.fedoraproject.org
kernel02.qa.fedoraproject.org
keys01.fedoraproject.org
mailman01.stg.phx2.fedoraproject.org
notifs-backend01.stg.phx2.fedoraproject.org
notifs-web01.stg.phx2.fedoraproject.org
notifs-web02.stg.phx2.fedoraproject.org
nuancier01.phx2.fedoraproject.org
nuancier01.stg.phx2.fedoraproject.org
nuancier02.phx2.fedoraproject.org
nuancier02.stg.phx2.fedoraproject.org
releng02.phx2.fedoraproject.org
taskotron-dev01.qa.fedoraproject.org
virthost15.phx2.fedoraproject.org
I'll work with others to get those all fixed up in the coming weeks.
That said, how do we want to run our non-manual ansible jobs?
a) run a --check --diff once a day and yell about unreachable or changed>0 (I could commit this now)
b) just run them once a day and yell about anything that changes. (I could commit this now)
c) Trigger them on git commits. This would take work to figure out what was affected by the commit, or just fire off a run of everything.
d) set up a file somewhere that the sysadmin group can create; a cron job picks it up and does the run the next time the job fires. This would allow someone to commit something, schedule a run, and give a bit of time for someone to notice a problem with it before it runs.
Thoughts?
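For option d, the cron side could be sketched roughly as below. This is only an illustration under assumptions: the flag path is whatever file the sysadmin group is allowed to create, `run` stands for whatever actually fires the playbooks, and the optional `min_age_seconds` grace period is an extra that is not part of the proposal above.

```python
import os
import time


def request_is_due(flag_path, min_age_seconds=0, now=None):
    """True if the flag file exists and is at least min_age_seconds old."""
    try:
        mtime = os.stat(flag_path).st_mtime
    except FileNotFoundError:
        return False
    now = time.time() if now is None else now
    return now - mtime >= min_age_seconds


def run_if_requested(flag_path, run, min_age_seconds=0, now=None):
    """Called from cron: consume the flag file and call run() if a run is due."""
    if not request_is_due(flag_path, min_age_seconds, now):
        return False
    try:
        # Remove the flag before running, so a request made during the run
        # is picked up by the next cron invocation instead of being lost.
        os.remove(flag_path)
    except FileNotFoundError:
        return False  # someone else consumed it first
    run()
    return True
```

Someone who wants a run would simply create the flag file after committing; deleting it again before cron fires cancels the request.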
As far as roadmap for migration:
I'm going to try and work on splitting out everything that is still on app* servers to their own ansible instances. Once the app servers are fully migrated we can tackle proxy*, then virthosts, then various singletons. Then we can see where we are, and work a final push to get everything left moved over. ;)
kevin
On 01/08/2014 09:10 PM, Kevin Fenzi wrote:
> a) run a --check --diff once a day and yell about unreachable or changed>0 (I could commit this now)
+1, but allow setting exceptions. For example, I expect copr-fe-dev and copr-be-dev to differ from the ansible config, because I very often break them on purpose during development. On the other hand, I would love to get warnings about the production copr machines.
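One way such exceptions could work is a small list of hostname patterns that are filtered out before anyone gets yelled at. A sketch, assuming shell-style patterns and a hypothetical `hosts_to_report` helper (neither is an existing part of the repo):

```python
from fnmatch import fnmatch

# Hosts that are expected to drift from the ansible config (dev boxes).
EXPECTED_TO_DIFFER = ["copr-fe-dev.*", "copr-be-dev.*"]


def hosts_to_report(changed_hosts, exceptions=EXPECTED_TO_DIFFER):
    """Drop hosts matching any exception pattern; keep the rest in order."""
    return [
        host
        for host in changed_hosts
        if not any(fnmatch(host, pattern) for pattern in exceptions)
    ]
```

With this, copr-fe-dev.cloud.fedoraproject.org and copr-be-dev.cloud.fedoraproject.org would stay quiet, while any host not matching a pattern would still be reported.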
On Wed, Jan 08, 2014 at 01:10:40PM -0700, Kevin Fenzi wrote:
> That said, how do we want to run our non-manual ansible jobs?
> a) run a --check --diff once a day and yell about unreachable or changed>0 (I could commit this now)
:+1:
> b) just run them once a day and yell about anything that changes. (I could commit this now)
+0, this could be fine.. but it would be a shame if it ran at a time when we were all asleep, or when there happened to be load, or...
> c) Trigger them on git commits. This would take work to figure out what was affected by the commit, or just fire off a run of everything.
I think I saw you mention that a run of everything takes ~1 hour? That's probably too much for a per-commit action. A selective run per commit could be cool though :)
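A selective run needs some mapping from changed files to playbooks. A rough sketch, assuming a layout where role files live under `roles/<name>/...` and the caller supplies a dict of playbook path to the set of roles it uses (both the layout and the `affected_playbooks` name are assumptions, not a description of the actual repo). The changed-file list could come from something like `git diff --name-only OLD NEW`.

```python
def affected_playbooks(changed_files, playbook_roles):
    """Return the sorted playbooks that a set of changed files touches.

    playbook_roles maps each playbook path to the set of role names it uses.
    Any change that cannot be attributed falls back to running everything.
    """
    all_playbooks = set(playbook_roles)
    affected = set()
    for path in changed_files:
        parts = path.split("/")
        if path in all_playbooks:
            # The playbook itself changed.
            affected.add(path)
        elif parts[0] == "roles" and len(parts) > 2:
            # A file inside a role changed: run every playbook using that role.
            role = parts[1]
            affected.update(
                playbook
                for playbook, roles in playbook_roles.items()
                if role in roles
            )
        else:
            # Shared vars, inventory, etc.: be safe and run everything.
            return sorted(all_playbooks)
    return sorted(affected)
```

The fallback keeps the "just fire off a run of everything" behaviour for commits that touch shared files, so only clearly scoped commits get the faster path.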
> d) set up a file somewhere that the sysadmin group can create; a cron job picks it up and does the run the next time the job fires. This would allow someone to commit something, schedule a run, and give a bit of time for someone to notice a problem with it before it runs.
Push... and then wait for the doom.
infrastructure@lists.fedoraproject.org