On Mar 14, 2011, at 8:57 PM, Ian Main wrote:
On Mon, Mar 14, 2011 at 11:07:53AM +0100, Michal Fojtik wrote:
On Mar 11, 2011, at 10:04 PM, Ian Main wrote:
Condor Cloud for Aeolus
I wrote up a quick summary of the current issues/ideas around this project so we can get things rolling. If you don't want to read the whole document please at least look at the section you are interested in. :) I CC'd a few condor guys as well.
Goal
The idea here is to create a simple cloud provider driven through deltacloud that we can use in Aeolus. The first iteration will be extremely simple indeed and basically just make use of the facilities offered by condor with some glue to make things work.
- VMs will use KVM via Condor.
- Condor will keep track of all instances for us and offer scalability.
- Instances will be stateless. Condor will copy VM images before running
them and NOT save state when done.
- Simple mechanism for uploading images to machine running condor.
- Deltacloud driver to interface directly with condor or through simple
agent (to handle MAC/IP mappings, authentication etc).
Image Warehouse Integration
So far the general consensus is to use something simple like NFS or scp to copy images to the Condor central manager server. This will consist of an uploading directory and a staging directory. While uploading, files go into the uploading directory and are then moved to staging once the upload is complete. This prevents us from trying to start a half-copied image, although that is generally not possible from the Aeolus UI.
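To make the handoff concrete, here is a minimal sketch of the two-directory scheme (paths and the image name are invented for illustration; a temp dir stands in for the real tree). The point is that publishing is a rename, and mv is atomic within one filesystem, so staging never exposes a half-copied image:

```shell
set -e
ROOT=$(mktemp -d)
UPLOAD_DIR="$ROOT/uploading"
STAGING_DIR="$ROOT/staging"
mkdir -p "$UPLOAD_DIR" "$STAGING_DIR"

# An scp/NFS upload would land the image here first; simulate with a dummy file.
dd if=/dev/zero of="$UPLOAD_DIR/fedora14.img" bs=1K count=1 2>/dev/null

# Publish by renaming into staging -- atomic, so condor can never pick up
# a partially written image.
mv "$UPLOAD_DIR/fedora14.img" "$STAGING_DIR/fedora14.img"
ls "$STAGING_DIR"
```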
The only question here is that we may need some info on which OS the image targets so we can do the drive mapping correctly, etc. The iwhd guys may know of more issues as well.
Deltacloud Driver
The deltacloud driver can interact directly with condor commands to start, stop, and query the state of instances running in condor. Ruby code to do everything but the query is already in conductor, but generally we use condor_submit to start a new job/instance, condor_q -xml to get the state of jobs on the system (parsing xml output), and condor_rm to stop an instance.
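As a very rough sketch, the driver's condor interaction could look like the following. The vm_* attribute names follow condor's VM universe, but the exact set we need is still TBD, so treat the job file below as an assumption, not a tested example:

```ruby
# Build a condor submit description for a new instance. Attribute names
# (vm_type, vm_memory, vm_disk, ...) are from condor's VM universe, but
# the precise file we need is an open question.
def submit_description(image_path, memory_mb)
  <<SUBMIT
universe = vm
vm_type = kvm
vm_memory = #{memory_mb}
vm_networking = true
vm_disk = #{image_path}:hda:w
queue
SUBMIT
end

# The driver would then shell out roughly like this:
#   system('condor_submit', job_file)   # start an instance
#   xml = `condor_q -xml`               # query job/instance state (parse XML)
#   system('condor_rm', job_id)         # stop an instance

puts submit_description('/var/lib/condor-cloud/staging/fedora14.img', 1024)
```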
In the Deltacloud API I *need* to have these collections defined:
- realms
I thought it should be the Condor location (in other words, the machine running Condor). For a start it can be something like 'default' and just report its state (AVAILABLE, UNAVAILABLE).
I think for now we just stick with giving the whole cloud one realm name? I suspect that is how it will go anyway. If you wanted another installation someplace else you could install another condor and change that realm name and conductor could figure out what to do there.
Yes, this makes sense to me; we can keep it as 'default' or just use the hostname where Condor resides.
- hardware_profiles
I'm not sure how we get this information. We could get it from libvirt, e.g.:
- maximum amount of memory which can be used for VM
- maximum number of CPU cores available
- storage?
I think we're going to have to just define some reasonable sizes. There is a condor config for setting the maximums though. Probably a configuration option again eventually.
I played with the libvirt Ruby gem recently and I found that you can easily get these sizes (limits) via the libvirt API:
<snip>
result = Nokogiri::XML(lvirt.capabilities)
profiles << HardwareProfile.new((result/'/capabilities/host/cpu/model').text) do
  architecture lvirt.node_get_info.model
  cpu (1..lvirt.node_get_info.cpus.to_i)
  memory (1..lvirt.node_get_info.memory.to_i)
  storage 1000
end
</snip>
The magic thing there is 'capabilities', which produces nice XML from which we can get all this information. It would be nice to pass this to the Condor driver, so the client will know how much memory is left in the hypervisor 'pool'.
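To show what that capabilities XML looks like without needing a libvirt host, here is a sketch parsing a trimmed, hand-written sample with stdlib REXML (the host values are invented; a real driver would feed lvirt.capabilities in instead):

```ruby
require 'rexml/document'

# Trimmed sample of libvirt's capabilities XML (values are made up).
caps = <<XML
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
    </cpu>
  </host>
</capabilities>
XML

doc   = REXML::Document.new(caps)
arch  = REXML::XPath.first(doc, '/capabilities/host/cpu/arch').text
model = REXML::XPath.first(doc, '/capabilities/host/cpu/model').text
puts "#{model} (#{arch})"   # prints "Opteron_G3 (x86_64)"
```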
- images
Image Warehouse here?
- This information is required:
- id and name (could be the same)
- description?
- state (UP, PENDING/BUILDING...)
- owner_id (!)
Yes, we will have the image warehouse. I can get you all the other stuff as well, I think. Is the owner id really important? It looks like we'd need a bit of metadata to go along with our images.
Owner: I realized that the owner is not really important in Condor/libvirt, because I think we will not mess with user authentication anyway. So we can set it to 'root' IMHO.
Metadata: Example representation of Deltacloud API Image model looks like:
<image href="http://localhost:3001/api/images/img1" id="img1">
  <name>Fedora 10</name>
  <owner_id>fedoraproject</owner_id>
  <description>Fedora 10</description>
  <architecture>x86_64</architecture>
  <state>AVAILABLE</state>
  <actions>
    <link href="http://localhost:3001/api/instances;image_id=img1" method="post" rel="create_instance"/>
  </actions>
</image>
- instances
- This information is required:
- state
- id and name (could be the same)
- image_id, realm_id, hardware_profile_id
- public_addresses (MAC->IP translation here)
- authentication (password/SSH keys? for Linux, administrator password for Windows)
Yes, I can get you all that stuff given an intermediate agent. Everything is available through condor except IP and authentication.
I bet that ImageWarehouse can produce images with built-in SSH key authentication. Maybe we can do authentication in the 'middle-layer' app along with IP management. This application will have a pool of MAC->IP mappings, and we can also associate an SSH key with each IP. ImageWarehouse can then connect to this application, fetch the SSH key, and bundle it into the image. (I assume that ImageWarehouse will get the MAC address for the final VM.)
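A toy sketch of what that middle-layer pool could look like (class, method names, and the sample MAC/IP are all invented for illustration): it hands out MAC->IP pairs and remembers the SSH key attached to each IP, so the warehouse can fetch the key later.

```ruby
# Toy MAC->IP address pool with per-IP SSH key association.
class AddressPool
  def initialize(pairs)
    @free = pairs.dup   # [[mac, ip], ...] not yet handed out
    @keys = {}          # ip => ssh public key
  end

  # Reserve the next MAC/IP pair and associate an SSH key with the IP.
  def lease(ssh_key)
    pair = @free.shift
    raise 'address pool exhausted' if pair.nil?
    mac, ip = pair
    @keys[ip] = ssh_key
    [mac, ip]
  end

  # What ImageWarehouse would call to fetch the key to bundle into the image.
  def key_for(ip)
    @keys[ip]
  end
end

pool = AddressPool.new([['52:54:00:aa:bb:01', '192.168.1.10']])
mac, ip = pool.lease('ssh-rsa AAAA... user@host')
puts "#{mac} -> #{ip}"
```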
-- Michal
------------------------------------------------------ Michal Fojtik, mfojtik@redhat.com Deltacloud API: http://deltacloud.org