All,
This is an RFC patch series to remove Condor from the conductor. In short,
Condor presents problems for our project because it is an external project, it
is written in C++ (while most of our developers work in Ruby), and it is too
complex for our current needs.
The new scheduling approach is described pretty well in patch 1, so I
won't go into it here. This is an RFC series because there are two known
problems and the code has only been lightly tested.
The first known problem is that, because we now make the deltacloud create
calls inline in the conductor, a slow call can cause the UI itself to time
out. This is going to be a problem with the VMware backend in particular,
since we know that the create call there can take a long time. Possible
solutions include making the call from a separate thread or process, but
there may be others.
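To illustrate the separate-thread option, here is a minimal sketch, not code
from the patch series: slow_create and launch_async are hypothetical names,
and slow_create merely stands in for the real deltacloud create call. The
caller gets control back immediately, and the result (or any exception) is
collected later through Thread#value.

```ruby
# Hypothetical stand-in for the deltacloud create call; it just sleeps
# to simulate a slow backend such as VMware.
def slow_create(name)
  sleep 0.2
  "#{name}-created"
end

# Hypothetical helper: run the create call on a background thread and
# return the Thread right away, so the request is not blocked by it.
def launch_async(name)
  Thread.new { slow_create(name) }
end

started = Time.now
worker = launch_async("instance-1")
elapsed = Time.now - started
puts "launch_async returned after #{elapsed.round(3)}s"

# Later, the result can be collected. Thread#value joins the thread and
# returns the block's result, re-raising any exception raised inside it.
result = worker.value
puts result
```

If the background thread touches ActiveRecord, it would need to check out its
own database connection (for example via
ActiveRecord::Base.connection_pool.with_connection). A separate process avoids
that concern but needs some queue or polling mechanism to hand back results.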
The second known problem is that, for reasons I don't really understand,
updating the instance row in the database (using instance.save!) has some
surprising results. One example is public_addresses: the public_addresses
field I get from the deltacloud backend looks correct
(ec2-50-7-27-214.compute-1.amazonaws.com, or whatever), but when it is saved
into the database it looks odd (---\n- ec2-50-7-27-214.compute-1.amazonaws.com\n).
Since dbomatic is not doing this manipulation itself, I can only presume that
some observer is mangling the value, but I can't see how. A second example of
this problem is that the public key created when the instance is launched
disappears from the UI as soon as dbomatic runs.
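For the public_addresses case, the stored string is exactly what Ruby's YAML
library produces for a one-element Array. One unverified hypothesis is that
the backend hands back public_addresses as an Array rather than a String, and
the value gets YAML-encoded on its way into a string column (older ActiveRecord
versions fall back to to_yaml when quoting values of types they don't handle
specially). The sketch below only demonstrates the serialization; the Array
assumption is not confirmed.

```ruby
require 'yaml'

# Assumption: the deltacloud client returns public_addresses as an Array
# of hostnames, not as a single String.
addresses = ["ec2-50-7-27-214.compute-1.amazonaws.com"]

# YAML-dumping that Array produces the odd-looking string seen in the DB
# (with Psych, the default YAML engine since Ruby 1.9.3).
dumped = addresses.to_yaml
p dumped

# One possible workaround: flatten the Array to a plain String (here,
# comma-separated) before assigning it to a string column.
flat = addresses.join(",")
p flat
```

If this hypothesis holds, joining the addresses (or declaring the column with
serialize) before calling instance.save! would avoid the surprise.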
So far I have only tested this with the EC2 backend. Apart from the two
bugs above, things work pretty well there: I can start and stop deployments
from the UI, and the state gets updated along the way. Before this goes into
the repository we would need to test it on the other backends, particularly
RHEV-M and VMware.
Comments and questions about the patchset and what we are trying to
accomplish here are welcome.
Chris Lalancette