This expands on some of the notes Jan provided in other RFCs.
delayed_job and resque appear to be the most commonly deployed solutions.
I listed what I thought should be the requirements for a background
processing solution. For each requirement I then added some details on
how well delayed_job and resque could satisfy it.
Resque contains most of the features we need. It requires Redis, which
is an open source project sponsored by VMware. Redis is available in
Fedora, but I don't see it available in RHEL, and getting it into RHEL
is the big question mark.
The two most common solutions are delayed_job and resque. There is a
good write-up from GitHub comparing other background processing
solutions and explaining why they eventually steered towards delayed_job
and then resque.
The primary differences between delayed_job and resque are:
* Recurring jobs: at the moment, delayed_job doesn't support recurring
jobs. Resque supports them through the resque-scheduler extension.
* Monitoring: resque provides a Sinatra app to monitor the queue.
delayed_job doesn't provide monitoring tools out of the box, but we
could potentially build something on top of Rails or simply look at the
contents of the database.
* Supportability: resque requires multiple components and could be more
difficult to support. It requires a second gem, resque-scheduler, and
it uses Redis as its backend, which is currently not available in RHEL.
This may be the deal breaker.
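To make the recurring-jobs difference concrete, here is a minimal interval scheduler sketch in Python. It only illustrates the kind of work resque-scheduler adds on top of resque. It is not the API of resque-scheduler or whenever, and the job names and intervals are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecurringJob:
    name: str            # hypothetical job name
    every_seconds: int   # interval between runs
    next_run: float = 0.0


class IntervalScheduler:
    """Decides which recurring jobs are due; the caller enqueues them."""

    def __init__(self, jobs: List[RecurringJob]):
        self.jobs = jobs

    def due(self, now: float) -> List[str]:
        ready = []
        for job in self.jobs:
            if now >= job.next_run:
                ready.append(job.name)
                # Schedule relative to `now`, so ticks missed while the
                # scheduler was down are not replayed in a burst.
                job.next_run = now + job.every_seconds
        return ready


scheduler = IntervalScheduler([
    RecurringJob("instance_status_check", 30),
    RecurringJob("realm_sync", 300),
])
runs = [scheduler.due(t) for t in (0, 30, 299, 300)]
```

Each due name would then be enqueued as an ordinary job. The whenever gem takes a different route and generates crontab entries instead of running its own scheduler process.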
1. Bucket jobs into different queues. A long-running job that checks
instance status for 1000 instances should not hold up other jobs. The
solution should also support multiple workers, which would minimize the
impact of longer-running jobs, but using different queues will offer
finer-grained control.
* delayed_job: supports multiple queues through named queues starting
with version 3.0. Can start up multiple workers for all queues or for
specific queues.
* resque: supports multiple queues and workers.
2. Jobs should persist in some way. If a crash occurs, we should be able
to restart the system and continue processing any incomplete jobs.
* delayed_job: Jobs persist as serialized objects stored in ActiveRecord
entries.
* resque: Jobs persist as JSON objects in Redis. Storing JSON (a class
name plus arguments) instead of serialized objects, whose classes may
have changed in a newer version of the application, potentially makes
updating the application easier.
3. Recurring jobs.
* delayed_job: Not available; in development.
* resque: Through the resque-scheduler extension.
* whenever: A potential alternative for cron-style scheduling.
4. Alerts. Failures should be presented to the user in some way (email,
conductor UI) so that appropriate actions can be taken.
* delayed_job: Supports code hooks for different stages in the process;
hooks can be added for error, failure, success, etc. By default, workers
retry a job 25 times. We should use a lower number: there is no sense in
retrying that many times and holding up the queue if there is a hard
failure somewhere in the system. By default it also deletes failed jobs,
but it can be configured to leave them in the queue with a flag
indicating that they failed.
* resque: Failed jobs can go through additional processing using
different failure backends (Redis, syslog, custom, etc.).
5. A mechanism to requeue a failed job once the underlying issue has
been resolved. For example, if an instance start job fails because of a
network failure to a provider, we should be able to requeue those jobs
once the network is back online. Not sure if this should be automated
or if there should be a button somewhere that lets a user manually
requeue all or selected failed jobs.
6. Monitor job status. We should have some way to see what is in the queue.
* delayed_job: The queue can only be viewed through ActiveRecord
database entries. There is no UI, so it is more difficult to see what is
going on.
* resque: Provides a sinatra app to monitor queues, jobs, and workers.
7. Should not enqueue duplicate jobs.
8. Ability to remove jobs from the queues and to pause queues or
individual jobs.
9. Supportable in Fedora and RHEL.
* delayed_job: We used it in the past. Will need to carry the gem.
* resque: Will need to carry the gem. In addition, it requires Redis as
the backend. Redis is available in Fedora but not in RHEL. Redis is an
open source project sponsored by VMware.
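Several of the requirements above can be made concrete with a small in-memory sketch, written in Python for brevity. Every name in it is hypothetical (`JobQueues`, `MAX_ATTEMPTS`, the handler names), and it is not the API of delayed_job or resque. It illustrates the following, with comments tagged by requirement number:

* named queues drained by a worker (1)
* resque-style JSON payloads of class name plus arguments (2)
* a lower retry limit with a failure hook, keeping failed jobs flagged rather than deleted (4)
* requeuing all or selected failed jobs (5)
* duplicate suppression (7)
* pausing queues and removing jobs (8)

```python
import json
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

MAX_ATTEMPTS = 3  # hypothetical; delayed_job's default is 25


@dataclass
class Job:
    id: int
    queue: str
    payload: str  # JSON: {"class": <job class name>, "args": [...]}
    attempts: int = 0
    failed: bool = False
    last_error: str = ""


class JobQueues:
    """In-memory stand-in for a persistent backend (a DB table or Redis)."""

    def __init__(self, handlers: Dict[str, Callable]):
        self.handlers = handlers  # job class name -> callable
        self.queues: Dict[str, deque] = {}
        self.paused: Set[str] = set()
        self.failed: List[Job] = []
        self.next_id = 1

    def enqueue(self, queue: str, job_class: str, *args) -> Optional[int]:
        payload = json.dumps({"class": job_class, "args": list(args)})
        # (7) Skip a job whose payload is already waiting in this queue.
        if any(j.payload == payload for j in self.queues.get(queue, ())):
            return None
        job = Job(self.next_id, queue, payload)
        self.next_id += 1
        self.queues.setdefault(queue, deque()).append(job)
        return job.id

    # (8) Pause/resume a whole queue and remove individual jobs.
    def pause(self, queue: str) -> None:
        self.paused.add(queue)

    def resume(self, queue: str) -> None:
        self.paused.discard(queue)

    def remove(self, queue: str, job_id: int) -> bool:
        q = self.queues.get(queue, deque())
        match = next((j for j in q if j.id == job_id), None)
        if match is None:
            return False
        q.remove(match)
        return True

    def work_off(self, queues: List[str],
                 on_failure: Optional[Callable[[Job], None]] = None) -> int:
        """(1) Drain the given, unpaused queues; return the number of successes."""
        succeeded = 0
        for name in queues:
            q = self.queues.get(name, deque())
            while name not in self.paused and q:
                job = q.popleft()
                data = json.loads(job.payload)
                while job.attempts < MAX_ATTEMPTS:
                    job.attempts += 1
                    try:
                        self.handlers[data["class"]](*data["args"])
                        succeeded += 1
                        break
                    except Exception as exc:  # real workers back off between retries
                        job.last_error = str(exc)
                else:
                    # (4) Keep the failed job, flagged, and raise an alert.
                    job.failed = True
                    self.failed.append(job)
                    if on_failure is not None:
                        on_failure(job)
        return succeeded

    def requeue_failed(self, ids: Optional[Set[int]] = None) -> List[int]:
        """(5) Requeue all failed jobs, or only those whose id is in `ids`."""
        moved = []
        for job in list(self.failed):
            if ids is None or job.id in ids:
                job.failed, job.attempts, job.last_error = False, 0, ""
                self.failed.remove(job)
                self.queues.setdefault(job.queue, deque()).append(job)
                moved.append(job.id)
        return moved
```

Because the payload stores only a class name and plain arguments, a job enqueued before an upgrade can still be dispatched by name afterwards. This is the property the resque bullet under requirement 2 points to. A real backend would also retry with a delay instead of immediately.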
# Use Cases
1. Dbomatic replacement for instance and realm checking and RHEV
Each RHEV instance that is created will also lead to a job that is
enqueued to start that instance.
Create a new job to perform instance status check. Create a status check
job for each provider account. Allow status check job to be
disabled/enabled per provider account.
Create a new job to sync realms for all providers. This can be broken up
to a job per provider if needed.
Create two queues: one for managing the instance lifecycle and a second
for all other jobs. Start with two workers per queue. Make the number of
workers configurable so that it can be adjusted when needed.
2. ldap syncing
3. Generic instance start and stop
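The dbomatic replacement plan in use case 1 can be sketched as follows (Python, for brevity). The queue names, job class names, and environment variable names are hypothetical. Placing status checks and realm syncs on the "all other jobs" queue is also an assumption, since the plan does not say which queue they belong to.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Tuple

LIFECYCLE_QUEUE = "instance_lifecycle"  # hypothetical queue names
DEFAULT_QUEUE = "default"

Job = Tuple[str, str, int]  # (queue, job class name, argument)


@dataclass
class ProviderAccount:
    id: int
    provider_id: int
    status_check_enabled: bool = True  # per-account enable/disable switch


def worker_counts(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Two workers per queue unless overridden (hypothetical variable names)."""
    return {
        LIFECYCLE_QUEUE: int(env.get("LIFECYCLE_WORKERS", "2")),
        DEFAULT_QUEUE: int(env.get("DEFAULT_WORKERS", "2")),
    }


def rhev_start_job(instance_id: int) -> Job:
    # Each newly created RHEV instance gets a start job on the lifecycle queue.
    return (LIFECYCLE_QUEUE, "StartRhevInstance", instance_id)


def periodic_jobs(accounts: List[ProviderAccount]) -> List[Job]:
    # One status check per enabled provider account.
    jobs = [(DEFAULT_QUEUE, "InstanceStatusCheck", a.id)
            for a in accounts if a.status_check_enabled]
    # One realm sync per provider, deduplicated across its accounts.
    for provider_id in sorted({a.provider_id for a in accounts}):
        jobs.append((DEFAULT_QUEUE, "RealmSync", provider_id))
    return jobs
```

Splitting realm syncs per provider, as above, is the optional breakdown the plan mentions. A single job for all providers would work equally well as a starting point.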
Discusses GitHub's use of different background job solutions.
To Whom It May Concern,
Whilst I was using the application, I came to the realization that some of the links in the footer served no purpose. They went nowhere, and there wasn't even an apparent place that they *should* go.
I entered https://bugzilla.redhat.com/show_bug.cgi?id=787280 and proceeded to remove the links and the associated entries in en.yml.
I thank you for your kind consideration of this matter.
This issue was really hard to debug: first, I was able to reproduce it only in the production environment, not in development. It's still unclear why the environment affects Internet Explorer's behaviour.
As it turned out, IE caches every Ajax request. This means that, e.g., on the deployments#show page, clicking the Properties tab, then the Instances tab, and then the Properties tab again does not send any request to the server to update the content of the Properties tab. Needless to say, this works fine in Firefox and Chrome.
The solution is quite simple: jQuery provides a 'cache' option for the global ajaxSetup config. If it's false, jQuery appends a timestamp to the URL, which bypasses the IE browser cache.
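In jQuery the global setting is `$.ajaxSetup({ cache: false })`. For GET and HEAD requests, jQuery then appends a `_=<timestamp>` parameter to each request URL. The Python sketch below only illustrates that URL rewriting. It is not jQuery's implementation, and the URLs used with it are hypothetical.

```python
import re
import time
from typing import Optional


def bust_cache(url: str, now_ms: Optional[int] = None) -> str:
    """Add (or refresh) a `_=<timestamp>` query parameter so every URL is unique."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    # Replace an existing "_=" parameter if there is one...
    replaced, count = re.subn(r"([?&])_=[^&]*", lambda m: f"{m.group(1)}_={ts}", url)
    if count:
        return replaced
    # ...otherwise append one, using "&" if the URL already has a query string.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}_={ts}"
```

Because each request URL differs, IE can no longer serve the Properties tab from its cache, while Firefox and Chrome behave as before.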
Our Apache config was invalid for /fonts, though the problem wasn't apparent because the current UI doesn't use them. As we work towards a unified UI, we're using a web font and it wasn't showing in RPM builds due to the invalid reference. This fixes the Alias, tells Apache to not proxy those requests to Rails, and also sets Cache-Control headers on the fonts. I had to add a MIME type for *.woff manually to appease Apache.
This is not associated with a BZ. There is no need to carry this in the existing product release; it's only useful going forward when we add web fonts.
This patchset adds a deployment state attribute. This attribute will be used for
tracking a deployment's state, which will be needed especially for doing rollback
when launching a deployment. It also replaces the 'status' method, which computed a
deployment's state from the states of all its instances.
This patchset introduces state transitions (IOW, it's now not possible to change a
deployment's state from 'running' to 'pending' or from 'stopped' to 'shutting down').
A deployment's state is now mostly changed by a user's action (e.g. a user clicks the
stop button -> state is changed to 'shutting down'), though some transitions are done
automatically ('pending' -> 'running', 'shutting down' -> 'stopped'; all rollback
state transitions will be automatic too).
A drawback of setting the state when an action is performed is that the deployment's
state may not describe the real state of all instances properly. For example, if a
deployment is running and all instances are suddenly changed to 'stopped' (from outside
of Conductor), then the deployment's state will be 'incomplete' until a user
stops/deletes the deployment from Conductor.
Which reminds me: I've added another deployment state, 'incomplete'. This state is used
when a deployment is running and then some (or all) instances unexpectedly change state
(the wiki page Robust_instance_launching is updated too).
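A minimal state-machine sketch (Python) shows how guarded transitions reject moves such as 'running' -> 'pending'. The transition table is an assumption built only from the transitions named above. Rollback states and deletion are not modelled, and Conductor's actual table may differ.

```python
from typing import Dict, Set

# Assumed transition table, limited to the transitions named in this patch description.
ALLOWED: Dict[str, Set[str]] = {
    "pending": {"running"},                      # automatic
    "running": {"shutting down", "incomplete"},  # user stop / instances changed outside Conductor
    "incomplete": {"shutting down"},             # user stops the deployment
    "shutting down": {"stopped"},                # automatic
    "stopped": set(),
}


class InvalidTransition(Exception):
    pass


class Deployment:
    def __init__(self) -> None:
        self.state = "pending"

    def transition(self, new_state: str) -> str:
        # Reject any move that is not listed for the current state.
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state!r} -> {new_state!r}")
        self.state = new_state
        return self.state
```

Keeping the table as data makes it easy to add rollback states later without touching the guard logic.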
One of the tasks I've been working on is #3178, "Assess the viability of
importing and launching images from a openstack provider." We don't have
many OpenStack implementation tasks in this sprint, but I was kind of
raring to see something working, so I wanted to take inventory of where
things stand today.
With the patches adding support applied to master a bit ago, I pulled up
a Rails console and saw how far I could get.  lists the steps I took.
The first obstacle I hit was some exception in the Deltacloud logs,
raised when I tried to list instances. Marios helped me debug this
and indicated that the problem was that I was using an older version of
the Deltacloud package; it was using the Rackspace driver rather than
the OpenStack one. (See the filename in line 20 of the link.)
Until this is fixed in the next Deltacloud release, I've worked around
it by building a local Deltacloud gem and restarting with that. That
much worked great.
My next issue was that we do not appear to have anything in place to
collect any credentials for those setting up an OpenStack provider
account. Thus I cheated and used the Rails console for now, talking to
Deltacloud directly. And with that, basic Deltacloud API operations work
-- I can list images and instances, for example.
Complicating things, the credentials we need are semi-variable. If you
are running Keystone, we need to collect your username, password, and
tenant name. If you're not running Keystone, we don't need your tenant
name. (OpenStack also allows authentication via key + secret key, but
Deltacloud does not presently support this.) I'm not sure what the most
elegant solution is here just yet, nor if there's a good way to proceed
without provider-specific logic. As far as Deltacloud is concerned, the
tenant name is just added to the username, so we could do the same,
except it might not be particularly clear to end users.
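One way to handle the semi-variable credentials is to collect the tenant name as a separate, optional field and fold it into the username only when Keystone is in use. The Python sketch below does this. The `+` separator is an assumption about how Deltacloud's OpenStack driver expects the tenant to be joined to the username and should be checked against the driver. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenStackCredentials:
    username: str
    password: str
    tenant_name: Optional[str] = None  # only needed when Keystone is in use


def deltacloud_username(creds: OpenStackCredentials, uses_keystone: bool,
                        separator: str = "+") -> str:
    """Build the username sent to Deltacloud (separator is an assumption)."""
    if not uses_keystone:
        return creds.username
    if not creds.tenant_name:
        raise ValueError("a tenant name is required when using Keystone")
    return f"{creds.username}{separator}{creds.tenant_name}"
```

Keeping the tenant name as its own field in the UI would avoid asking end users to type a combined string, which addresses the clarity concern above.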
I attempted to add Credential Definitions and add a provider account,
but that resulted in me receiving some exceptions about hardware
I'd like to investigate further, but I think I'm fairly far afield from
my stated objective of investigating the viability of importing images.
It may be that I'm just around the bend from getting it working, but I'm
starting to think that the conclusion here is that, in its present form,
the answer to the task is "Not very viable at the moment." I'd be happy
to carry on, but only if people think it's a worthwhile endeavor before
Factory adds support for building for OpenStack.
Note that you need the version of the Deltacloud client newly pushed to
Fedora 16 to see this error. This patch should not cause 0.4 to fail for
those still on that older version (you need to upgrade soon, btw). As
Fedora 15 is closing in on EOL, you can always gem install this, or
grab the rpm from koji for version 0.5. The exact test to run is in
the RM issue, but for your convenience in testing, it is: