RFC: Background Processing

Monday, 9 April 2012

Hi,

This expands on some of the notes Jan provided in other RFCs. 
delayed_job and resque appear to be the most commonly deployed solutions.

I listed what I thought should be the requirements for a background 
processing solution. For each requirement I then added some details on 
how well delayed_job and resque could satisfy it.

Resque contains most of the features we need. It requires Redis, which 
is an open source project sponsored by VMware. Redis is available in 
Fedora, but I don't see it in RHEL, and getting it into RHEL is the big 
question mark.

https://www.aeolusproject.org/redmine/projects/aeolus/wiki/Background_Pro...

---

Background Processing

# Summary

The two most common solutions are delayed_job and resque. GitHub has a 
good write-up comparing background processing solutions and explaining 
why they eventually steered towards delayed_job and then resque: 
https://github.com/blog/542-introducing-resque.

The primary differences between delayed_job and resque are:

At the moment, delayed_job doesn't have support for recurring jobs. 
Resque does support recurring jobs through the resque-scheduler 
extension/gem.

resque provides a Sinatra app to monitor the queue. delayed_job doesn't 
provide monitoring tools out of the box, but we could potentially build 
something on top of Rails or simply look at the contents of the database 
table.

resque requires multiple components and could potentially be more 
difficult to support. Recurring jobs require a second gem, 
resque-scheduler. It also uses Redis as its backend, and Redis is 
currently not available in RHEL. This may be the deal breaker.

# Requirements

1. Bucket jobs into different queues. A long-running job to check 
instance status for 1000 instances should not hold up other jobs. The 
solution should also support multiple workers, which would minimize the 
impact of longer-running jobs, but using different queues will offer 
finer-grained control.

* delayed_job: supports multiple queues through named queues starting 
with version 3.0. Can start up multiple workers for all queues or for 
specific queues.
* resque: supports multiple queues and workers.
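
To make the queue separation concrete, here is a minimal sketch using only the Ruby standard library. It is not delayed_job or resque code; the helper `drain_queues` and the queue and job names are hypothetical. It assumes one worker thread per named queue, so a slow status check in one queue cannot delay an instance start in another.

```ruby
require "thread"

# Hypothetical helper: start one worker thread per named queue. Each worker
# drains only its own queue. Returns job labels in the order they finished.
def drain_queues(queues)
  finished = Queue.new
  workers = queues.map do |_name, jobs|
    Thread.new do
      until jobs.empty?
        label, work = jobs.pop
        work.call
        finished << label
      end
    end
  end
  workers.each(&:join)
  Array.new(finished.size) { finished.pop }
end

# Illustrative queue names; the slow job lives in its own queue.
queues = { "status" => Queue.new, "instance_lifecycle" => Queue.new }
queues["status"] << [:check_1000_instances, -> { sleep 0.2 }]
queues["instance_lifecycle"] << [:start_instance, -> { }]

order = drain_queues(queues)
puts order.inspect
```

With a real library the workers would be separate processes, but the isolation argument is the same: each worker only ever pulls from the queues it was assigned.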

2. Jobs should persist in some way. If a crash occurs, we should be able 
to restart the system and continue with processing incomplete jobs in 
the queue.

* delayed_job: Jobs persist as objects stored in ActiveRecord entries.
* resque: Jobs persist as JSON objects in Redis entries. Storing JSON 
instead of actual objects, whose classes may have advanced to a different 
version, potentially makes updating the application easier.
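
The difference between storing live objects and storing JSON can be sketched as follows. This is an illustration only, not resque's internals: the job class `StartInstance` and the helpers `serialize_job` and `run_serialized` are hypothetical, and the payload shape (a class name plus plain arguments) is an assumption made for the example.

```ruby
require "json"

# Hypothetical job class, for illustration only.
class StartInstance
  def self.perform(instance_id)
    "started #{instance_id}"
  end
end

# Store only the class name and plain arguments, never a live object.
def serialize_job(klass, *args)
  JSON.generate("class" => klass.name, "args" => args)
end

# Rebuild the call from the stored payload and run it against whatever
# version of the class is currently loaded.
def run_serialized(payload)
  data = JSON.parse(payload)
  Object.const_get(data["class"]).perform(*data["args"])
end

payload = serialize_job(StartInstance, 42)
puts payload
puts run_serialized(payload)
```

Because the payload holds no instance state, a job enqueued before an application upgrade can still be run afterwards, as long as the class name and argument list remain compatible.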

3. Recurring jobs.

* delayed_job: Not available, in development.
* resque: Through resque-scheduler extension.
* whenever: A potential alternative to do cron style scheduling [6].

4. Alerts. Failures should be presented to the user in some way (email, 
conductor UI) so that appropriate actions can be taken.

* delayed_job: Supports code hooks for different stages in the process. 
Hooks can be added for error, failure, and success. By default workers 
will retry a job 25 times. We should use a lower number; there is no 
sense in retrying that many times and holding up the queue if there is a 
hard failure somewhere in the system. By default it also deletes failed 
jobs, but it can be configured to leave them in the queue with a flag to 
indicate failure.
* resque: Failed jobs can go through additional processing using 
different failure backends (redis, syslog, custom, etc.).
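
A bounded retry with a failure hook could look roughly like the sketch below. It is library-independent; `MAX_ATTEMPTS`, `run_with_retries`, and the `on_failure` callback are hypothetical names, and the limit of 3 is only an example of "a lower number".

```ruby
# Hypothetical limit, far below a default of 25 attempts.
MAX_ATTEMPTS = 3

# Run the job, retrying on error up to max_attempts. When the limit is
# reached, call the failure hook (e.g. to send an alert) and give up.
def run_with_retries(job, max_attempts: MAX_ATTEMPTS, on_failure:)
  attempts = 0
  begin
    attempts += 1
    job.call
  rescue StandardError => e
    retry if attempts < max_attempts
    on_failure.call(e, attempts)
    :failed
  end
end

alerts = []
notify = ->(error, attempts) { alerts << "gave up after #{attempts} attempts: #{error.message}" }
status = run_with_retries(-> { raise "provider unreachable" }, on_failure: notify)
puts status.inspect
puts alerts.inspect
```

In practice the hook would send the email or record the failure for the conductor UI; here it only collects messages in an array.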

5. A mechanism to requeue a failed job once the underlying issue has 
been resolved. For example, if an instance start job fails because of a 
network failure to a provider, we should be able to requeue that job 
once the network is back online. Not sure if this should be automated 
or if this should be a button somewhere that lets a user manually 
requeue all or selected failed jobs.

* custom
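
Since this would be custom code, here is one possible shape for it, as a sketch only. `FailedJob` and `FailureStore` are hypothetical names, and a plain array stands in for the real queue. Failed jobs are kept with their error, and `requeue` pushes either all of them or only the ones selected by a block back onto the queue.

```ruby
# Hypothetical record of a failed job and why it failed.
FailedJob = Struct.new(:name, :args, :error)

class FailureStore
  attr_reader :failed

  def initialize(queue)
    @queue = queue
    @failed = []
  end

  def record(name, args, error)
    @failed << FailedJob.new(name, args, error)
  end

  # Requeue failed jobs matching the block (all of them when no block is
  # given). Returns the number of jobs requeued.
  def requeue
    selected, @failed = @failed.partition { |job| !block_given? || yield(job) }
    selected.each { |job| @queue << [job.name, job.args] }
    selected.size
  end
end

queue = []
store = FailureStore.new(queue)
store.record("start_instance", [1], "network timeout")
store.record("sync_realms", [], "bad credentials")

# Once the network is back, requeue only the network-related failures.
requeued = store.requeue { |job| job.error.include?("network") }
puts requeued
puts queue.inspect
```

The block form covers the "select failed jobs" case; calling `requeue` without a block covers "requeue all", whether triggered automatically or from a button.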

6. Monitor job status. We should have some way to see what is in the queue.

* delayed_job: Can only view the queue through ActiveRecord database 
entries. There is no UI, so it is more difficult to see what is going on.
* resque: Provides a sinatra app to monitor queues, jobs, and workers.

7. Should not enqueue duplicate jobs.

* custom
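
One way the custom duplicate check could work is to derive a key from the job name and its arguments and refuse to enqueue a job whose key is already pending. The sketch below is an in-memory illustration; `UniqueQueue` is a hypothetical class, and a real implementation would keep the keys in the same backend as the queue.

```ruby
require "json"
require "set"

# Hypothetical in-memory queue that rejects jobs already waiting.
class UniqueQueue
  def initialize
    @jobs = []
    @keys = Set.new
  end

  # Returns true if the job was added, false if an identical job is pending.
  def enqueue(name, *args)
    key = JSON.generate([name, args])
    return false unless @keys.add?(key)
    @jobs << [name, args]
    true
  end

  # Remove the next job; its key is released so it can be enqueued again.
  def pop
    job = @jobs.shift
    @keys.delete(JSON.generate(job)) if job
    job
  end

  def size
    @jobs.size
  end
end

q = UniqueQueue.new
puts q.enqueue("check_status", 7)   # added
puts q.enqueue("check_status", 7)   # rejected as a duplicate
puts q.size
```

Releasing the key on `pop` means a job can be enqueued again as soon as a worker has picked it up; whether that is the right moment, or whether it should wait until the job finishes, is a design decision left open here.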

8. Ability to remove jobs from the queues and to place a pause on the 
queues or jobs.

* custom
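
As a rough sketch of what the custom pause and remove support might involve, the hypothetical `PausableQueue` below holds jobs in memory, hands out nothing while paused, and lets the caller remove pending jobs that match a block. None of these method names come from delayed_job or resque.

```ruby
# Hypothetical queue with pause/resume and removal of pending jobs.
class PausableQueue
  def initialize
    @jobs = []
    @paused = false
  end

  def enqueue(job)
    @jobs << job
  end

  def pause!
    @paused = true
  end

  def resume!
    @paused = false
  end

  def paused?
    @paused
  end

  # Remove pending jobs for which the block is true; returns how many.
  def remove(&matcher)
    before = @jobs.size
    @jobs.reject!(&matcher)
    before - @jobs.size
  end

  # Workers get nothing while the queue is paused.
  def pop
    return nil if @paused
    @jobs.shift
  end
end

q = PausableQueue.new
q.enqueue("start_instance 1")
q.enqueue("start_instance 2")
q.pause!
puts q.pop.inspect                       # nothing is handed out while paused
puts q.remove { |job| job.end_with?("2") }
q.resume!
puts q.pop.inspect
```

Pausing individual jobs rather than whole queues would need a per-job flag instead of the single `@paused` switch used here.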

9. Supportable in Fedora and RHEL

* delayed_job: We used it in the past. Will need to carry the gem.
* resque: Will need to carry the gem. In addition it requires Redis as 
the backend. Redis is available in Fedora but not in RHEL. Redis is an 
open source project sponsored by VMware [4].

# Use Cases

1. Dbomatic replacement for instance and realm checking and RHEV 
instance start.

Each RHEV instance that is created will also lead to a job that is 
enqueued to start that instance.

Create a new job to perform instance status check. Create a status check 
job for each provider account. Allow status check job to be 
disabled/enabled per provider account.

Create a new job to sync realms for all providers. This can be broken up 
to a job per provider if needed.

Create two queues: one for managing instance lifecycle and a second 
queue for all other jobs. Start with two workers per queue. Make the 
number of workers configurable so that it may be adjusted when needed.
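
The queue layout described above could be captured in a small configuration structure. This is a sketch under assumptions: the queue names, the constant `DEFAULT_WORKERS`, and the helper `worker_plan` are all hypothetical, and in a real deployment the overrides would come from a config file rather than a method argument.

```ruby
# Hypothetical defaults: two queues, two workers each.
DEFAULT_WORKERS = {
  "instance_lifecycle" => 2,
  "default"            => 2
}.freeze

# Merge configured overrides into the defaults and list the workers to start.
def worker_plan(overrides = {})
  counts = DEFAULT_WORKERS.merge(overrides)
  counts.flat_map do |queue, count|
    (1..count).map { |i| "#{queue}-worker-#{i}" }
  end
end

puts worker_plan.inspect
# Raise the worker count for one queue without touching the other.
puts worker_plan("default" => 3).inspect
```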

2. LDAP syncing

3. Generic instance start and stop

# Reference

[1] https://github.com/collectiveidea/delayed_job/wiki/Named-Queues-Proposal

[2] https://github.com/blog/542-introducing-resque
Discusses GitHub's use of different background job solutions.

[3] https://github.com/bvandenbos/resque-scheduler

[4] http://redis.io/

[5] http://blog.railsupgrade.com/2011/08/replace-delayedjob-with-resque.html

[6] https://github.com/javan/whenever
