On Wed, May 18, 2011 at 13:51, Adam M. Dutko <dutko.adam(a)gmail.com> wrote:
> I think this is a good test to see what is the problem. The deadlocks
> and OOM's seem to happen at 0400 when other virtual systems are
Hrm... so all of these are xen instances and they're doing backups at
the same time. If the rsync processes are going into a D state I'd
think it's an I/O exhaustion problem. Would it be possible to alter
the backup schedule and stagger them if the scheduler change doesn't
work?
I believe the backup jobs are staggered. The issue is more with the
daily jobs in cron.daily that fire off at 0400. That does not seem to
be the cause every time, but it has been for at least a couple of the
incidents. I am guessing that I/O exhaustion is going on and the OOM is
because the kernel could not talk to swap for over 120 s and went on
a killing frenzy.
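
One way to check the I/O exhaustion guess would be to list the processes
sitting in the D (uninterruptible sleep) state around 0400. The following is
only a rough sketch, not something that has been run on these hosts. It
assumes a Linux /proc filesystem, and the function names parse_stat and
d_state_processes are made up for illustration.

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so use the first "(" and the last ")" as boundaries.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable; skip it
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

If rsync and the cron.daily jobs show up together in that list at 0400, that
would fit the I/O exhaustion theory. An empty list would not rule it out,
since this is only a single snapshot.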
-Adam
--
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Let us be kind, one to another, for most of us are fighting a hard
battle." -- Ian MacLaren