On Thu, Oct 21, 2021 at 5:03 AM David Duncan <davdunc@fedoraproject.org> wrote:

> Mark and I have talked in the past about moving some of the aws
> provisioning we are doing into ansible, and it came up again this
> morning in #fedora-admin.
>
> We have many (but not all) maintainer-test instances, a bunch of proxies
> and all the copr machines in aws that are currently in ansible.

I was looking over issue #10267 after Leo [https://pagure.io/user/leo] offered to work on it and noticed that it was being provisioned manually. As you said, some of the instances on AWS are provisioned manually, but this seemed like a great candidate for being managed through the fedora-infra/ansible playbooks. It made me want to know more.

>
> I think we could do something similar to tasks/virt_instance_create.yml
> (which we have for using virt-install and installing libvirt vms).
> We do have a tasks/aws_cloud.yml that copr instances use, but it doesn't
> provision the instance, just sets up things for ansible.
>

This might be a little different from what I was expecting, but yeah, I can see how that would work.

> Copr folks: if we expand aws_cloud.yml to provision also (if the host is
> not up/reachable on its dns name/ip) would you be ok using that as
> well? Or should we separate out copr from maintainer-test/proxies?
>
> Also, there's the question of auth. batcave01 will need creds for those
> things. Sadly, I think this means making a user and getting a token, but
> if there's some better way to handle auth there that might be nice.

This would likely be the case. I had something like this set up before and used a .boto file with the user creds. It would have to be a limited user, but that should be OK, as RBAC would help us with access a little here.

>
> Finally, we are using ansible-2.9.x currently. We would need any
> solution to be able to work with that and newer ansible, which we likely
> will be switching to before long.
>
I am personally thumbs-up for this for a number of reasons and here is what I was thinking off the top of my head:

1) Deployments can be done either through

a) AWS CloudFormation templates or the new Cloud API managed with Ansible or
b) through the Ansible community.aws and amazon.aws collections and
their respective modules (works in 2.9+)

We have used Ansible for AWS instances in the ARC team: https://pagure.io/fedora-infra/arc/blob/main/f/ansible

It would take a bit of a remodel to use the new collections, but the base of the work can be copied from there.
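
For option (b), a task pair along these lines could sit next to tasks/aws_cloud.yml. This is only a rough sketch under assumptions, not what the ARC playbooks actually do: every variable (aws_region, aws_ami_id, aws_instance_type, aws_subnet_id, aws_key_name) and the FedoraGroup tag are placeholders made up for illustration, and depending on the collection version the ec2_instance module may live in community.aws rather than amazon.aws.

```yaml
# Hypothetical sketch: provision an instance only if it does not answer on ssh.
# All aws_* variables and the FedoraGroup tag are placeholders, not existing
# fedora-infra names.
- name: check whether the host already answers on ssh
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    timeout: 10
  delegate_to: localhost
  register: host_reachable
  ignore_errors: true

- name: provision the instance in aws if it is not reachable
  amazon.aws.ec2_instance:
    name: "{{ inventory_hostname }}"
    region: "{{ aws_region }}"
    image_id: "{{ aws_ami_id }}"
    instance_type: "{{ aws_instance_type }}"
    vpc_subnet_id: "{{ aws_subnet_id }}"
    key_name: "{{ aws_key_name }}"
    tags:
      FedoraGroup: maintainer-test
    state: running
    wait: true
  delegate_to: localhost
  when: host_reachable is failed
```

Both tasks run on the control host (batcave01 in our case), so that is where the limited AWS credentials would need to be available.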

2) The EC2 inventory plugin (aws_ec2) is very reliable. It's possible to create a
filter to ensure that we are managing only instances that are:

a) manageable by the deploying IAM user (resource groups can ensure proper authorization)
b) tagged according to the requirements of the ec2 inventory plugin
configuration (that can be specific to copr or maintainer instances, etc)

This is something I've also used before and can vouch for. It is very helpful.
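
To make point 2 a bit more concrete, an inventory source for the aws_ec2 plugin could look roughly like the following. Treat it as a sketch: the file name, the region, and the FedoraGroup tag with its values are assumptions for illustration, not anything we have configured today. On plain ansible-2.9 without the collection installed, the plugin line would be `plugin: aws_ec2` instead.

```yaml
# demo.aws_ec2.yml (hypothetical name; the plugin expects the file name to
# end in aws_ec2.yml or aws_ec2.yaml)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1            # placeholder region
filters:
  # only pick up running instances that carry our (made-up) tag values
  instance-state-name: running
  "tag:FedoraGroup":
    - copr
    - maintainer-test
keyed_groups:
  # one inventory group per tag value, prefixed with "aws"
  - key: tags.FedoraGroup
    prefix: aws
```

Running `ansible-inventory -i demo.aws_ec2.yml --graph` would then show which instances the deploying IAM user can actually see and how they are grouped.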

3) Instance availability isn't guaranteed. There are many reasons for a
single instance to require replacement. It could be an underlying
hardware failure or a required reboot; whatever the reason, it happens
occasionally, and the requirement is often best satisfied with a
redeployment. Kicking that off from an Ansible playbook saves a lot of
headache for, say, the newest on-call person who doesn't want to learn a
new skill before they can go back to sleep.

4) It fits with the expectations of the Open Practice Library and the
pillars for Operational Excellence and Reliability.

+1 on both points