We have a pretty gnarly issue with the KVM paravirt clock at the moment.
Basically, cpufreq can knock the TSCs out of sync across CPUs, and that
confuses the hell out of guests because the current code assumes the
same TSC rate on all CPUs.
The problem manifests itself as completely random hangs and guest
crashes, with the current workaround being to boot the guest with
Glommer, Gerd, Juan and Marcelo are all trying to figure out the best
fix, with the latest candidate being:
But we'd really like to add this temporary patch to rawhide (and maybe
F10 if we don't fix it soon) ... any objections?
From: Glauber Costa <glommer(a)redhat.com>
Date: Thu, 29 Jan 2009 12:39:22 -0500
Subject: [PATCH] Disable kvmclock for non constant tsc cpus.
Currently, this code path is posing us big troubles,
and we won't have a decent patch in time. So, temporarily
There's a module parameter for the adventurous who want to force
Signed-off-by: Glauber Costa <glommer(a)redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti(a)redhat.com>
Signed-off-by: Mark McLoughlin <markmc(a)redhat.com>
arch/x86/kvm/x86.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc17546..2e22ac9 100644
@@ -957,6 +957,9 @@ out:
+static int force_kvmclock = 0;
+module_param(force_kvmclock, bool, 0644);
int kvm_dev_ioctl_check_extension(long ext)
@@ -967,7 +970,6 @@ int kvm_dev_ioctl_check_extension(long ext)
- case KVM_CAP_CLOCKSOURCE:
@@ -992,6 +994,9 @@ int kvm_dev_ioctl_check_extension(long ext)
r = iommu_found();
+ case KVM_CAP_CLOCKSOURCE:
+ r = force_kvmclock || boot_cpu_has(X86_FEATURE_CONSTANT_TSC);
r = 0;
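
For anyone who wants to see what the patched check would report on a given host, here is a rough user-space sketch of the same decision. It is not kernel code: has_cpu_flag() and kvmclock_advertised() are names made up for this example, it assumes the constant-TSC feature shows up as the constant_tsc flag in a /proc/cpuinfo "flags" line, and the force argument merely stands in for the force_kvmclock module parameter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole word in a /proc/cpuinfo style
 * "flags : ..." line, 0 otherwise. Hypothetical helper for illustration.
 */
int has_cpu_flag(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags_line;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* A match only counts if it is delimited on both sides. */
        int starts = (p == flags_line) || p[-1] == ' ' ||
                     p[-1] == '\t' || p[-1] == ':';
        char end = p[len];
        int ends = (end == '\0' || end == ' ' ||
                    end == '\t' || end == '\n');

        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

/*
 * Mirrors the patched KVM_CAP_CLOCKSOURCE case:
 *   r = force_kvmclock || boot_cpu_has(X86_FEATURE_CONSTANT_TSC);
 */
int kvmclock_advertised(int force, const char *flags_line)
{
    return force || has_cpu_flag(flags_line, "constant_tsc");
}
```

Feeding it the "flags" line of each CPU from /proc/cpuinfo gives a quick idea of whether a host would still advertise kvmclock with this patch applied and force_kvmclock left at 0.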
I enabled CONFIG_PCI_STUB (#482792), but the build failed on ppc:
drivers/gpu/drm/nouveau/nouveau_state.c: In function 'nouveau_load':
drivers/gpu/drm/nouveau/nouveau_state.c:487: error: implicit declaration of
make: *** [drivers/gpu/drm/nouveau/nouveau_state.o] Error 1
I considered disabling nouveau again, but I see the last few commits
haven't been built so I just left it be.
2.6.28 has turned out to be a bit buggy. Also 2.6.27 has been chosen to be
a long-term supported kernel upstream. This means we can leave F9 and F10
on .27 and concentrate on getting .29 into shape for F10 and F11. With the
extra resources available from not trying to fix up .28 we can make .29 even
On Fri, 2009-01-23 at 18:49 +0000, Kyle McMartin wrote:
> * Fri Jan 23 2009 Kyle McMartin <kyle(a)redhat.com>
> - disable intel_iommu by default (enable with "intel_iommu=on")
We're pretty much guaranteed that no-one will enable this ... do we know
of specific hardware that this is breaking?
One of the major virt features in F11 needs this:
I thought there was a request to make the memory resource controller
(CONFIG_CGROUP_MEM_RES_CTLR) available in the Fedora 10 timeframe.
(But not configured.)
I'd like to request that the memory cgroup be configured in Fedora 11.
(I'm sorry if too late.)
Comparing the current implementation (2.6.28) with the version from half a year ago,
- There is no pointer from struct page.
IIRC, this pointer was a big obstacle for the merge request.
BTW, what kernel version will Fedora 11 be based on?
I'd prefer the 2.6.29-rc version of the memory cgroup over 2.6.28 ;)
I'm almost certain that this is not the right place to ask this question,
but if RedHat/Fedora's kernel engineers can't help me, I'm truly
I'm using two Intel 10GbE (ixgbe) cards to passively monitor 10GbE
lines (under RHEL 5.2), either using the in-kernel dev_add_pack interface
(built-in ixgbe driver) or using a slightly modified ixgbe driver
(built around Intel's latest ixgbe driver).
However, I'm experiencing odd performance issues - namely, once I
configure the driver to use MSI-X w/ multi-queue [MQ] (forcing pci=msi)
and assign each IRQ to one CPU core (irq cpu affinity), my software
requires -10x- more CPU cycles (measured using rdtsc; compared to
multiple GbE links and/or w/ MSI-X/MQ disabled) to process each packet,
causing massive packet loss induced by missed IRQs (rx_missed_errors).
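
To make the affinity step concrete: pinning an IRQ to one core means writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. A minimal sketch of building that mask follows. affinity_mask_hex() is a name invented for this example, and it only handles CPUs 0-31 (a single 32-bit group), so it is an illustration rather than a complete tool.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the smp_affinity bitmask that selects exactly one CPU.
 * Returns 0 on success, -1 if the CPU index does not fit in a single
 * 32-bit group or the buffer is too small. Hypothetical helper.
 */
int affinity_mask_hex(unsigned int cpu, char *buf, size_t len)
{
    int n;

    if (cpu >= 32)
        return -1;

    n = snprintf(buf, len, "%x", 1u << cpu);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

The resulting string (for example "8" for CPU 3) is what would be written to /proc/irq/<irq>/smp_affinity for each queue's IRQ; the IRQ numbers themselves can be read from /proc/interrupts.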
Looking at mpstat, I can see that each CPU core is handling a fairly low
number of interrupts (200-1000) while spending most of its time in
softIRQ (>90%, most likely within my own code).
I decided to check newer kernels so I've installed F10 (24C Xeon-MP
Intel S7000FC4U) and F9 (16C Opteron DL585G5, *) on two machines, but
even with 2.6.27 kernels I'm experiencing the same performance
Given the fact that the same code is used to process packets - no matter
what type of links are being used, my first instinct was to look at the
CPU cores themselves. (E.g. L1 & L2 dcache miss rates; TLB flushes;
I tried using oprofile, but I failed to make it work.
On one machine (Xeon-MP, F10), oprofile failed to identify the
Dunnington CPU (switching to timer mode) and on the other (Barcelona
8354, F9), even though it was configured to report dcache statistics
[1,2] opreport returns empty reports.
In order to verify that oprofile works at all on the Opteron machine, I
reconfigured oprofile to report CPU usage, but even then, oprofile
either returns empty results or hard-locks the machine.
A. Anyone else seeing the same odd behavior once MSI-X/MQ is enabled on
Intel's 10G cards? (P.S. MQ cannot be enabled on both machines unless I
add pci=msi to the kernel's command line)
B. Any idea why oprofile refuses to generate cache statistics and/or
what I did wrong?
C. Before I dive into AMD's and Intel's MSR/PMC documentation and spend
the next five days trying to decipher which architectural /
non-architectural counter needs to be set/used and how, do you have any
idea how I can access the performance counters without writing the code
[1] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/220.127.116.11-73.fc9.x86_64/vmlinux --event=DATA_CACHE_ACCESS:1000:0:1:1
[2] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/18.104.22.168-73.fc9.x86_64/vmlinux --event=L2_CACHE_MISS:1000:0:1:1
[3] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/22.214.171.124-73.fc9.x86_64/vmlinux --event=CPU_CLK_UNHALTED:10000000:0:1:1
* F10 seems to dislike the DL585G5; the issue has already been reported against anaconda. (#480638)