On 02/03/2011 06:38 PM, Chris Lalancette wrote:
On 02/03/11 - 03:53:59PM, jprovazn(a)redhat.com wrote:
> From: Jan Provaznik<jprovazn(a)redhat.com>
>
> While-true loop w/o any sleep eats all system resources.
Hm, I'm not sure that this patch is what is needed. The fact of the matter is
that in both condor_refreshd and dbomatic, they should be blocking based on
events. While your patch will stop them from consuming all resources, it may
mask an underlying problem.
condor_refreshd should block indefinitely waiting from an update from condor
(which should only happen once every 15 minutes). Similarly, dbomatic should
block indefinitely waiting on a change to EventLog (which again, shouldn't
happen that often).
What's the failure scenario that you saw? Is it repeatable?
Actually sleep doesn't mask any problem (wrapping in rescue w/o logging
is example of masking), it just reduces "damages" of while-true loop ->
especially filling disk with big logfile.
condor_refreshd - weakness is in calling condormatic_classads_sync
before waiting for data on socket. Examples of failing scenarios:
- cloud_account has nil instance_key
- DB is not initialized (there are no tables in DB) when you are
starting the script
dbomatic - haven't caught error with this, but only because notifier
apparently wraps code in its own rescue block, but it doesn't matter -
generally you can't rely on that code in 'while block' won't crash or
that it will be blocked by waiting for data so I believe it's always
good idea to add sleep into busy-waiting blocks.
Jan