Condor Cloud for Aeolus
-----------------------
I wrote up a quick summary of the current issues/ideas around this project so we can get things rolling. If you don't want to read the whole document please at least look at the section you are interested in. :) I CC'd a few condor guys as well.
Goal
----
The idea here is to create a simple cloud provider driven through deltacloud that we can use in Aeolus. The first iteration will be extremely simple indeed and basically just make use of the facilities offered by condor with some glue to make things work.
- VMs will use KVM via Condor.
- Condor will keep track of all instances for us and offer scalability.
- Instances will be stateless. Condor will copy VM images before running them and NOT save state when done.
- Simple mechanism for uploading images to machine running condor.
- Deltacloud driver to interface directly with condor or through simple agent (to handle MAC/IP mappings, authentication etc).
Image Warehouse Integration
---------------------------
So far the general consensus is to use something simple like NFS or scp to copy images to the Condor central manager server. This will consist of an upload directory and a staging directory. While uploading, files go into the upload directory and are then moved to staging once the upload is complete. This prevents us from trying to start a half-copied image, although that is generally not possible from the Aeolus UI.
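To make the upload/staging idea concrete, here is a rough sketch in Python. The function name and directory layout are made up for illustration, and it assumes both directories live on the same filesystem so the final move is a rename rather than a second copy. With scp the move would have to run on the server once the transfer finishes.

```python
import os
import shutil


def publish_image(src_path, upload_dir, staging_dir):
    """Copy an image into upload_dir, then move it into staging_dir.

    Hypothetical helper: names and layout are assumptions, not existing code.
    """
    name = os.path.basename(src_path)
    uploading = os.path.join(upload_dir, name)
    staged = os.path.join(staging_dir, name)

    # Slow part: the file in upload_dir may be incomplete while this runs.
    shutil.copyfile(src_path, uploading)

    # On POSIX a rename within one filesystem is atomic, so anything
    # scanning staging_dir sees either no file or the complete file.
    os.replace(uploading, staged)
    return staged
```

Anything that launches instances would then only ever look in the staging directory.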
The only open question here is that we may need some info on which OS the image is targeting so we can do the drive mapping correctly etc. The iwhd guys may know of more issues as well.
Deltacloud Driver
-----------------
The deltacloud driver can interact directly with condor commands to start, stop, and query the state of instances running in condor. Ruby code to do everything but the query is already in conductor, but generally we use condor_submit to start a new job/instance, condor_q -xml to get the state of jobs on the system (parsing xml output), and condor_rm to stop an instance.
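As an illustration of the three commands, here is a rough Python sketch (the real driver would be Ruby). The function names are made up. The parser assumes the classad XML layout that condor_q -xml emits, where each job is a <c> element holding <a n="..."> attributes with a typed child, and uses the standard JobStatus codes. Treat it as a starting point, not as tested code.

```python
import subprocess
import xml.etree.ElementTree as ET

# Standard Condor JobStatus codes mapped to plain names (names are ours).
JOB_STATUS = {1: "idle", 2: "running", 3: "removed", 4: "completed", 5: "held"}


def parse_condor_q_xml(xml_text):
    """Return one dict of attribute name -> text per job in the XML."""
    jobs = []
    for ad in ET.fromstring(xml_text).iter("c"):
        attrs = {}
        for a in ad.findall("a"):
            # Each attribute wraps one typed child such as <i>, <s> or <r>.
            attrs[a.get("n")] = a[0].text if len(a) else None
        code = int(attrs.get("JobStatus") or 0)
        attrs["State"] = JOB_STATUS.get(code, "unknown")
        jobs.append(attrs)
    return jobs


def start_instance(submit_file):
    """Submit a job description file; returns condor_submit's output."""
    result = subprocess.run(["condor_submit", submit_file],
                            capture_output=True, text=True, check=True)
    return result.stdout


def list_instances():
    """Query the queue and parse the XML output."""
    result = subprocess.run(["condor_q", "-xml"],
                            capture_output=True, text=True, check=True)
    return parse_condor_q_xml(result.stdout)


def stop_instance(job_id):
    """Remove a job by cluster id (or cluster.proc)."""
    subprocess.run(["condor_rm", str(job_id)], check=True)
```

Keeping the parsing separate from the command calls means it can be exercised against captured output without a running condor pool.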
Now, the difficulty comes in when we start looking at doing things like mapping MACs to IPs, authentication etc. If we do the condor commands directly in the driver then the driver would have to handle these tasks as well. I'm not sure if that is desirable.
The other idea is to put some kind of agent in between the driver and condor which can handle MAC/IP translation, authentication, perhaps image registration etc. The benefit of this is that we can then work on adding features to the 'cloud' without having to change the deltacloud driver itself as often. I'm starting to think this is the way to go even though it may be slightly more work up front.
I also looked into having the IP address set in the job via condor but I do not see a clean/easy way to do this.
IP Discovery
------------
There are many possible solutions to this problem but none of them are obviously better than any other. The three that seem the most promising are:
1) Using an agent in the VMs to register with a central server thereby letting us know its IP.
2) Having control of the DHCP server so we can see which leases are out for which MACs.
3) Using a config file which maps MACs to IPs and then configuring dhcpd with the same mapping.
Of these I'm thinking we should implement #3 first. I think this should get us up and running quickly and it can be configured in almost any environment. Second option would be #1. I think in the end we will need to implement all of these and have it be configurable by the admin.
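For option #3, a rough Python sketch of what the shared mapping could look like. The file format (one "MAC IP" pair per line, # for comments) and the function names are assumptions for illustration. The generated stanzas use the usual dhcpd.conf host declaration with hardware ethernet and fixed-address.

```python
def parse_mac_table(text):
    """Parse 'MAC IP' lines into a dict; blank lines and # comments are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mac, ip = line.split()
        table[mac.lower()] = ip
    return table


def dhcpd_host_entries(table):
    """Render one dhcpd.conf host declaration per MAC/IP pair."""
    entries = []
    for n, (mac, ip) in enumerate(sorted(table.items())):
        # The host names (vm0, vm1, ...) are arbitrary labels.
        entries.append(
            "host vm%d {\n"
            "  hardware ethernet %s;\n"
            "  fixed-address %s;\n"
            "}" % (n, mac, ip)
        )
    return "\n".join(entries)
```

Generating the dhcpd entries from the same file the cloud reads would keep the two from drifting apart when an admin edits the table.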
Using the config file to map MACs to IPs means having to keep track of which MACs are in use. We can do this easily by querying the jobs, which contain this information. However, this lookup would have to be done before the job is submitted to condor, so this duty would fall on either the deltacloud driver itself or the condor cloud agent.
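The lookup itself could be as small as the Python sketch below. The function name is made up, and it assumes the caller has already collected the MACs of the current jobs from the queue. How those are read out of the job ads is left open here.

```python
def pick_free_mac(table, macs_in_use):
    """Return the first (mac, ip) pair from table not used by a job, or None.

    table: dict of MAC -> IP from the admin's config file.
    macs_in_use: MACs collected from the jobs currently in the queue.
    """
    in_use = {mac.lower() for mac in macs_in_use}
    for mac in sorted(table):
        if mac.lower() not in in_use:
            return mac, table[mac]
    return None  # every configured address is taken
```

Usage would look like this:

```python
table = {"52:54:00:00:00:01": "192.168.122.11",
         "52:54:00:00:00:02": "192.168.122.12"}
print(pick_free_mac(table, ["52:54:00:00:00:01"]))
```

One thing to watch: two submissions arriving at the same time could pick the same MAC, so whichever component owns this lookup would need to serialize the pick and the submit.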
Installation/Configuration
--------------------------
I'm hoping I can get some help on using a puppet script to ease installation similar to what we are doing with Aeolus Conductor now. While I don't have all the details yet, in general we will need it to:
- Configure condor.
- Set up deltacloud correctly with the right driver etc.
- Set up a network bridge for VMs.
- Set up the MAC/IP address table for admins to edit.
Other Docs
----------
Please also see Matthew's document on Condor as a cloud provider:
http://spinningmatt.files.wordpress.com/2010/04/matthewfarrelleeopensourcecl...
Summary
-------
I think I've touched on the major issues/talking points here. Hopefully we can come up with the best solutions quickly and get to writing the parts soon.
Really the only parts that are still up in the air are the IP/MAC mapping (though I'm fine with the config file approach for now) and how to architect the driver, e.g. whether we need an agent in between the driver and condor. I think the rest we can work out as we go.
Ian