On Wed, Aug 31, 2011 at 9:40 AM, Jerry James loganjerry@gmail.com wrote:
I have a Fedora 15 x86_64 system, 8 GB of RAM, 4 CPUs, 1 terabyte disk. I run several virtual machines on it, used to check cross-system compatibility of some software I develop for my employer. I run "yum upgrade" on the host every work day, so it is up to date as of today (August 31).
I have a RHEL 6.1 guest that I am using very heavily right now. I run it with virt-manager. The guest's disk is a logical volume with no (host) filesystem. The guest's memory is drawn from a 1 GB hugetlbfs. The host has a Nehalem CPU, which is exposed (to the extent possible) to the guest. The display uses spice + qxl drivers in the guest.
Recently, the guest has gotten stuck from time to time. I'll be typing away and suddenly the guest's display will freeze. When this happens, I can still use the mouse to perform functions on the host, but pressing keys has no effect. Exactly 150 seconds later, the guest will unfreeze (sort of; see below) and I can change keyboard focus to other host applications again. At that point, the guest window will *partially* repaint, but will still not change in response to mouse or keyboard actions. I have to close that window and then click in virt-manager to open a new window. At that point, I can see that all of the key presses I made while it was frozen were received and acted on by the guest, so it is only the display that froze.
I have other Linux guests. None of them display this behavior. Is this some kind of incompatibility between the RHEL 6.1 qxl drivers and Fedora 15 spice? Does a 150-second timeout ring a bell with anyone? Is there some way to get the keyboard focus away from the guest when this happens so I can at least do something useful on the host?
So nobody else is seeing this? I just had it happen with a Rawhide x86_64 guest that had been running for several hours. So while it seems to happen with much greater frequency with the RHEL 6.1 guest for some reason, it's affecting other guests, too. A quick scan through the spice source code reveals two 150-second timeouts:
    DETACH_TIMEOUT
    DISPLAY_CLIENT_TIMEOUT
both defined in server/red_worker.c. The first seems unlikely; does anyone know what hitting that second timeout implies?
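
For anyone who wants to repeat that scan on their own checkout, here is a rough sketch of one way to do it in Python. It only assumes that the timeouts are plain object-like #define macros with TIMEOUT somewhere in the name; the helper names (find_timeout_defines, scan_tree) are made up for this example and are not part of spice. It prints the raw macro text and does not try to interpret units.

```python
import re
from pathlib import Path

# Matches object-like macros such as "#define SOME_TIMEOUT <value>".
# The value is kept as raw text; units are not interpreted.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w*TIMEOUT\w*)\s+(.+?)\s*$")


def find_timeout_defines(text):
    """Return (line_number, name, raw_value) for each timeout-like #define."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = DEFINE_RE.match(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


def scan_tree(root):
    """Yield (path, line_number, name, raw_value) for .c/.h files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in {".c", ".h"} and path.is_file():
            text = path.read_text(errors="replace")
            for lineno, name, value in find_timeout_defines(text):
                yield path, lineno, name, value
```

Calling scan_tree("server") from the top of a source tree and printing each tuple should list every candidate along with its file and line number, which makes it easy to see where each one is used.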