On Mar 11, 2011, at 10:04 PM, Ian Main wrote:
Condor Cloud for Aeolus
I wrote up a quick summary of the current issues/ideas around this project so we can get things rolling. If you don't want to read the whole document please at least look at the section you are interested in. :) I CC'd a few condor guys as well.
Goal
The idea here is to create a simple cloud provider driven through deltacloud that we can use in Aeolus. The first iteration will be extremely simple indeed and basically just make use of the facilities offered by condor with some glue to make things work.
- VMs will use KVM via Condor.
- Condor will keep track of all instances for us and offer scalability.
- Instances will be stateless. Condor will copy VM images before running
them and NOT save state when done.
- Simple mechanism for uploading images to machine running condor.
- Deltacloud driver to interface directly with condor or through simple
agent (to handle MAC/IP mappings, authentication etc).
Image Warehouse Integration
So far the general consensus is to use something simple like NFS or scp to copy images to the Condor central manager server. This will consist of an upload directory and a staging directory. While an upload is in progress the file lives in the upload directory, and it is moved to staging once the upload is complete. This prevents trying to start a half-copied image, although that is generally not possible from the aeolus UI.
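The upload-then-stage step could be sketched roughly as follows. This is only an illustration: the function name and directory layout are hypothetical, and it assumes both directories sit on the same filesystem so the move is a rename rather than a copy.

```ruby
require 'fileutils'

# Move a fully uploaded image from the upload directory to staging.
# Hypothetical helper; assumes upload_dir and staging_dir share a filesystem.
def stage_image(upload_dir, staging_dir, name)
  name = File.basename(name) # ignore any directory components in the name
  src = File.join(upload_dir, name)
  dst = File.join(staging_dir, name)
  raise ArgumentError, "no such upload: #{src}" unless File.file?(src)

  FileUtils.mkdir_p(staging_dir)
  # rename(2) is atomic within one filesystem, so a reader of the staging
  # directory never sees a partially written image.
  File.rename(src, dst)
  dst
end
```

The caller would invoke this only after scp (or the NFS copy) has finished, e.g. `stage_image('/var/lib/condor-cloud/upload', '/var/lib/condor-cloud/staging', 'f14.img')`, where both paths are made up for the example.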
Only question here is that we may need some info on what OS this is targeting so we can do the drive mapping correctly etc. The iwhd guys may know more issues as well.
Deltacloud Driver
The deltacloud driver can interact directly with condor commands to start, stop, and query the state of instances running in condor. Ruby code to do everything but the query is already in conductor, but generally we use condor_submit to start a new job/instance, condor_q -xml to get the state of jobs on the system (parsing xml output), and condor_rm to stop an instance.
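As a rough sketch of the query side, the XML from `condor_q -xml` could be reduced to a list of instance ids and states like this. The function name and the state names on the right-hand side of the table are illustrative assumptions, not part of the existing conductor code; the numeric codes are the standard Condor JobStatus values.

```ruby
require 'rexml/document'

# Condor JobStatus codes mapped to instance states. The state names are
# illustrative and would need to match whatever the driver reports.
JOB_STATES = {
  1 => 'PENDING',  # Idle
  2 => 'RUNNING',  # Running
  3 => 'STOPPED',  # Removed
  4 => 'STOPPED',  # Completed
  5 => 'PENDING'   # Held
}.freeze

# Turn `condor_q -xml` output into an array of { id:, state: } hashes.
def parse_condor_jobs(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//c').map do |job|
    attrs = {}
    # Each <a n="Name"> wraps one typed value element such as <i> or <s>.
    job.each_element('a') do |a|
      value = a.elements[1]
      attrs[a.attributes['n']] = value && value.text
    end
    {
      id: "#{attrs['ClusterId']}.#{attrs['ProcId']}",
      state: JOB_STATES.fetch(attrs['JobStatus'].to_i, 'UNKNOWN')
    }
  end
end
```

A driver would feed it the command output, along the lines of `parse_condor_jobs(IO.popen(%w[condor_q -xml], &:read))`.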
In the Deltacloud API I *need* to have these collections defined:
* realms
  I thought this should be the Condor location (in other words, the machine running Condor). To start with it can be something like 'default' and just report state (AVAILABLE, UNAVAILABLE).
* hardware_profiles
  Dunno how we get this information. We can get it from Libvirt. Like:
  - maximum amount of memory which can be used for a VM
  - maximum number of CPU cores available
  - storage?
* images
  Image Warehouse here? This information is required:
  - id and name (could be the same)
  - description?
  - state (UP, PENDING/BUILDING...)
  - owner_id (!)
* instances
  This information is required:
  - state
  - id and name (could be the same)
  - image_id, realm_id, hardware_profile_id
  - public_addresses (MAC->IP translation here)
  - authentication (password/SSH keys? for Linux, administrator password for Windows)
Also there are some optional collections like storage_snapshots and storage_volumes.
Now, the difficulty comes in when we start looking at doing things like mapping MACs to IPs, authentication etc. If we do the condor commands directly in the driver then the driver would have to handle these tasks as well. I'm not sure if that is desirable.
Let me do some 'brainstorming' here. My idea is to have something in the middle, between the DC API and Condor. Something that will do:
POST /condor -> Pass command to Condor using condor_q -xml.
The reason for this is to have cleaner Ruby code in the driver. I'm a little wary of running `condor_q -xml ...` in every method or in some client library, because that driver or client lib will look terribly messy. Another point is that with this 'thing' you don't need to have Condor on the same machine as the Deltacloud API, and the DC API will not need to store anything driver specific. Implementation of this call in Ruby could be about 10 lines of Sinatra code like this:

    require 'sinatra'
    require 'shellwords'

    post '/condor' do
      content_type 'application/xml'
      # Split the query into arguments and skip the shell, so request
      # parameters cannot inject extra commands.
      IO.popen(['condor_q', '-xml', *Shellwords.split(params[:q].to_s)], &:read)
    end
This 'middleware' can also do IP translation for the driver (a simple key->value DB), as well as image registration and other things needed alongside the driver.
The other idea is to put some kind of agent in between the driver and condor which can handle MAC/IP translation, authentication, perhaps image registration etc. The benefit of this is that then we can work on adding features to the 'cloud' without having to change the deltacloud driver itself as often. I'm starting to think this is the way to go even tho it may be slightly more work up front.
I also looked into having the IP address set in the job via condor but I do not see a clean/easy way to do this.
IP Discovery
There are many possible solutions to this problem but none of them are obviously better than any other. The three that seem the most promising are:
- Using an agent in the VMs to register with a central server thereby letting us know its IP.
RHEV-M is doing this but I don't like this idea at all, because it complicates things: you need to install something in the templates, and this agent needs to know where the IP translation app resides (to report its IP).
- Having control of the DHCP server so we can see which leases are out for which MACs.
- Using a config file which maps MACs to IPs and then configuring dhcpd with the same mapping.
Of these I'm thinking we should implement #3 first. I think this should get us up and running quickly and it can be configured in almost any environment. Second option would be #1. I think in the end we will need to implement all of these and have it be configurable by the admin.
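For option #3, a single mapping file could drive both the lookup table and the dhcpd configuration. The sketch below assumes a hypothetical file format (one "MAC IP" pair per line, '#' for comments) and hypothetical host names (vm0, vm1, ...); only the `host { hardware ethernet ...; fixed-address ...; }` declaration is standard ISC dhcpd syntax.

```ruby
# Parse the hypothetical mapping file: one "MAC IP" pair per line,
# with '#' starting a comment. Returns a Hash of MAC => IP.
def parse_mac_map(text)
  text.each_line.with_object({}) do |line, map|
    mac, ip = line.sub(/#.*/, '').split
    map[mac.downcase] = ip if mac && ip
  end
end

# Render dhcpd.conf host declarations that pin each MAC to its IP.
def dhcpd_hosts(map)
  map.each_with_index.map do |(mac, ip), i|
    "host vm#{i} { hardware ethernet #{mac}; fixed-address #{ip}; }"
  end.join("\n")
end
```

The admin would then edit just the mapping file, and the generated declarations could be included from dhcpd.conf.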
Using the config file to map MACs to IPs means having to keep track of which MACs are in use. We can do this easily by querying the jobs, which contain this information. However, this lookup would have to be done before the job is submitted to condor, so this duty would fall on either the deltacloud driver itself or the condor cloud agent.
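The pre-submit lookup might look something like this minimal sketch. The function name is hypothetical; `table` stands for the MAC => IP mapping from the config file and `macs_in_use` for the MACs gathered from the job query.

```ruby
# Pick the first MAC/IP pair from the table that no existing job is using.
# `table` is a Hash of MAC => IP; `macs_in_use` is an Array of MAC strings.
def allocate_mac(table, macs_in_use)
  used = macs_in_use.map(&:downcase)
  pair = table.find { |mac, _ip| !used.include?(mac.downcase) }
  raise 'no free MAC/IP pair left in the table' unless pair

  { mac: pair[0], ip: pair[1] }
end
```

Note that two submissions running at the same time could pick the same pair, so whichever component does this (driver or agent) would need to serialize the lookup and the submit.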
Installation/Configuration
I'm hoping I can get some help on using a puppet script to ease installation similar to what we are doing with Aeolus Conductor now. While I don't have all the details yet, in general we will need it to:
- Configure condor.
- Set up deltacloud correctly with the right driver etc.
- Set up a network bridge for VMs.
- Set up the MAC/IP address table for admins to edit.
Other Docs
Please also see Matthew's document on Condor as a cloud provider:
http://spinningmatt.files.wordpress.com/2010/04/matthewfarrelleeopensourcecl...
Summary
I think I've touched on the major issues/talking points here. Hopefully we can come up with the best solutions quickly and get to writing the parts soon.
Really the only parts that are still up in the air are the IP/MAC mapping (possibly tho I'm fine with the config file thing for now), and how to architect the driver - eg whether we need an agent in between the driver and condor. I think the rest we can work out as we go.
Ian
------------------------------------------------------ Michal Fojtik, mfojtik@redhat.com Deltacloud API: http://deltacloud.org