On 04/05/2012 12:02 PM, Mo Morsi wrote:
>>>>
>>>> Earlier today I proposed in the thread "RFC feature planning - robust
>>>> instance launching" to replace dbomatic with a background job tool
>>>> which supports both delayed and recurring jobs:
>>>> - delayed jobs would be used for launching instances
>>>> - recurring jobs would be used for all stuff done by dbomatic
>>>>
>>>> The code in the "check_one_account" method could be part of the
>>>> ProviderAccount model, so it can be covered by the common rspec
>>>> tests we have.
>>>> The rest of the script is just about parsing params and forking
>>>> processes at an interval - we get all this logic for free when
>>>> using a bg job tool.
>>>>
>>>> Jan
>>>
>>> Thanks for the feedback, Jan. I like the idea of replacing dbomatic
>>> with a background job tool. Placing a task on a queue to launch an
>>> instance is a much better approach than having dbomatic poll in the
>>> background. It also eliminates the dbomatic timeout issues when
>>> launching a large number of instances.
>>>
Agreed. Though as I understand it, that's only part of the problem with
the 30s time window: if there are many instances for a provider account,
30s might be too short to check all of them. We want to make sure that
an instance is checked every X seconds (the current 60s sounds good to
me), but the check of all instances should also finish before the next
check starts (IOW, it shouldn't take more than 60s).
I don't know how long it takes on average to check one instance through
the DC API for EC2/RHEV/vSphere, so I'm not sure how many instances can
be checked in 60s. 50? 100? More? But I guess the 'get instance' time
will not be too different from the 'launch instance' time, for which
there was a problem with the 30s time window.
A simple solution might be to make this check interval configurable, but
that's not ideal because instance state would then be updated with an
even bigger delay. Or we could use several parallel queues per provider
account.
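
To put rough numbers on this, here is a back-of-the-envelope sketch in
Ruby. It assumes every check takes about the same time and that parallel
workers split the instances evenly. The method names and the latencies
you feed in are placeholders of mine, not measurements from any
provider.

```ruby
# Rough capacity estimate for one check interval.
# Assumption: each instance check takes roughly seconds_per_check, and
# each worker (queue) checks its instances one after another.

# How many instances can be checked before the interval runs out?
def max_instances_per_interval(interval_s, seconds_per_check, workers = 1)
  (interval_s.to_f / seconds_per_check).floor * workers
end

# How long does one full pass over all instances take?
# The slowest worker gets ceil(instance_count / workers) instances.
def full_pass_seconds(instance_count, seconds_per_check, workers = 1)
  (instance_count.to_f / workers).ceil * seconds_per_check
end
```

For example, max_instances_per_interval(60, 0.5) tells us how many
instances fit into the 60s interval if a single check took half a
second, and passing workers = 4 shows what several parallel queues per
provider account would buy us. Once we measure real 'get instance' times
we can plug them in.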
>>> As for the instance status checks, do we need recurring jobs at
>>> all?
>>> I wonder if we can simply create a delayed_job that runs and then
>>> queues itself again upon completion.
>>>
>>>
>> Would requiring all recurring jobs to reschedule the next iteration
>> be less reliable than making sure the job scheduler component can
>> handle recurring jobs natively? We could get around that a bit by
>> making sure we wrap the whole job impl in a "rescue everything" block
>> which queues the next iteration no matter what. What do other
>> Rails-based projects do about recurring background jobs?
>>
>> Scott
>>
It depends on the situation. If you take a look at this page:
https://www.ruby-toolbox.com/categories/scheduling it seems that cron is
a popular choice, but we can't use it because our schedule interval is
too short and reloading the whole Rails env for each run is too
expensive for us. Other than cron, people pick one of the background job
tools; there doesn't seem to be a single best one.
Here are a couple of links to start:
http://www.tobinharris.com/past/2009/3/9/6-ways-to-run-background-jobs-in...
http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
>
> I have to say, I don't immediately like the idea of having the job
> need to reschedule itself. It feels error-prone and tightly coupled to
> me.
Agreed: if a task gets interrupted and lost (someone pulls the plug on a
machine), it stays lost forever.
+1 for using a tool which supports recurring jobs natively.
There is a patch for delayed_job which adds recurring job support by
re-enqueueing jobs before running them. This minimizes the risk that a
job is not rescheduled because of an interruption while it is being
processed, but it still looks hacky and dangerous to me.
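
To make the failure mode concrete, here is a minimal sketch of the
"rescue everything and queue the next iteration" idea from Scott's mail.
ToyQueue and SelfReschedulingJob are hypothetical names of mine and the
queue is just an in-memory array. This is not the delayed_job API and
not the patch mentioned above.

```ruby
# Toy in-memory queue standing in for a real job backend (hypothetical).
class ToyQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Remember that `job` should run at time `run_at`.
  def enqueue(job, run_at)
    @jobs << [job, run_at]
  end
end

# A job that queues its own next run no matter how the work ends.
class SelfReschedulingJob
  def initialize(queue, interval, &work)
    @queue = queue
    @interval = interval
    @work = work
  end

  def perform(now)
    @work.call
  rescue StandardError => e
    # Swallow the error so one failed check does not break the chain.
    warn "check failed: #{e.message}"
  ensure
    # Runs after success and after a rescued error, but NOT if the
    # process is killed or the machine loses power during @work.call.
    @queue.enqueue(self, now + @interval)
  end
end
```

The ensure block covers exceptions, but nothing in the job itself can
cover a killed process: if the worker dies while the work is running,
the next iteration is never queued. Re-enqueueing before the work
starts narrows that window, and a scheduler that owns the recurrence
removes it from the job entirely.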
> I also strongly feel a full task/feature/whatever is needed here
> to:
> * list out what we would actually use such a daemon for, initial
> candidates off the top of my head include everything currently in
> dbomatic plus ldap user/group syncing, though I am sure there are
> others.
What I know about:
recurring tasks: instance checking, realms checking, LDAP checking
delayed (background) tasks: instance start/stop, at least when launching
a deployment. Not sure if it should also be used for start/stop/reboot
actions when a user selects one or more instances in the UI, but I tend
to think it should be.
Did we ever take care of handling instance actions that get invoked
outside the scope of Aeolus? E.g. if an instance is started on a cloud
provider, can we detect it? I can see an external tool (i.e. outside of
conductor, such as dbomatic) being useful in this sort of situation.
-Mo
No, dbomatic only takes care of instances created from conductor (IOW,
an instance must exist in the conductor DB to be monitored).
Jan