On Fri, 18 Apr 2008, seth vidal wrote:
> So xen1 went down today and I was helping bring things back up. I didn't know to look in /var/log/messages for the messages from xenGuestsRunning.sh. I was wondering this:
> Would it make sense to have xenGuestsRunning run every hour and re-make the symlinks in /etc/xen/auto for the guests which should be running on the machine? Also, if for some reason the xen guests can't be started up automatically due to other complexities (iSCSI, memory overcommit, etc.), we could have xenGuestsRunning auto-generate a script which can be run to re-make the xen guests which should be running.
> I'd be willing to put the script together. I just wanted to ask if there was a good reason NOT to do this, so I don't waste time if I've missed something.
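For discussion, the symlink half of that proposal could look roughly like the sketch below. It is only an illustration: the function name sync_auto_links is made up, it assumes each guest's config is a file named after the guest in /etc/xen, and where the list of guests that should be running comes from is left open.

```python
import os


def sync_auto_links(expected_guests, config_dir="/etc/xen", auto_dir="/etc/xen/auto"):
    """Make auto_dir hold one symlink per expected guest and nothing stale.

    Hypothetical helper; assumes each guest config is config_dir/<guest name>.
    Returns (added, removed) lists of guest names.
    """
    os.makedirs(auto_dir, exist_ok=True)
    expected = set(expected_guests)
    added, removed = [], []

    # Drop links for guests that are no longer supposed to start on this host.
    for name in sorted(os.listdir(auto_dir)):
        path = os.path.join(auto_dir, name)
        if os.path.islink(path) and name not in expected:
            os.unlink(path)
            removed.append(name)

    # Add missing links, but only for guests whose config file really exists.
    for name in sorted(expected):
        target = os.path.join(config_dir, name)
        link = os.path.join(auto_dir, name)
        if os.path.isfile(target) and not os.path.lexists(link):
            os.symlink(target, link)
            added.append(name)

    return added, removed
```

Run from cron every hour, this would keep /etc/xen/auto in step with whatever list it is given. It deliberately does not start anything itself, so it does not address the double-start problem discussed below.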
The only reason we haven't done this already is that we can't detect whether a guest is already up on another host (which is something we need anyway). Consider this scenario:
app1 running on xen1 (which is having high load from koji1 also on xen1)
People complain about the wiki.
We move app1 to a more free box, xen7.
High load causes xen1 to CRASH.
xen1 reboots and attempts to bring app1 up (it's already up on xen7).
Two machines try to write to the same disk - DOOM.
There is a bit of hope in this. It's happened before, and it seems that the second guest sees the disk is already mounted and gets stuck at an fsck shell. As long as we realize that that condition potentially means the box is already up and needs to be checked... we're fine. If someone tries to type the root password and fsck the disk... DOOM.
This is all a sign of a larger problem: the lack of open source management tools for virtualization on more than one host at a time. I'm a huge fan of automation, so in general I'd like to see the plan above implemented, but I think we need to alter the xm creation scripts (I'm not sure what this involves) so that they make sure guests don't come up on the wrong xen host.
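To make the kind of check I mean concrete, here is a rough sketch, not a finished tool. It assumes we can ssh to every xen host as a user allowed to run "xm list", and that the output has the usual header line followed by one row per domain with the name in the first column. All of the function names are invented for illustration.

```python
import subprocess


def parse_xm_list(output):
    """Return the set of guest names found in raw `xm list` output."""
    names = set()
    for line in output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if fields and fields[0] != "Domain-0":
            names.add(fields[0])
    return names


def hosts_running(guest, listings):
    """listings maps host name -> raw `xm list` output from that host."""
    return sorted(h for h, out in listings.items() if guest in parse_xm_list(out))


def safe_to_start(guest, this_host, listings):
    """True only if no host other than this_host reports the guest as running."""
    return not [h for h in hosts_running(guest, listings) if h != this_host]


def collect_listings(hosts):
    """Gather `xm list` from each host over ssh (assumes key-based access).

    Raises if any host cannot be queried: if we can't ask a host, we can't
    prove the guest isn't running there, so the caller should not start it.
    """
    listings = {}
    for host in hosts:
        result = subprocess.run(
            ["ssh", host, "xm", "list"],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode != 0:
            raise RuntimeError("could not query %s: %s" % (host, result.stderr.strip()))
        listings[host] = result.stdout
    return listings
```

In the scenario above, xen1 would call collect_listings on the other hosts after its reboot, see app1 in xen7's output, and skip starting it. Note that this only narrows the window: two hosts checking at the same moment could still both decide it is safe, so a real fix probably needs some kind of lock on the shared disk as well.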
-Mike