On 07/13/17 at 11:50am, Xunlei Pang wrote:
On 05/09/2017 at 07:52 PM, Xunlei Pang wrote:
> We met a problem on AMD machines, when using "nr_cpus=4" for
> kdump, and crash happens on cpus other than cpu0, kdump kernel
> will fail to boot and eventually reset.
> After some debugging, we found that it got stuck in the kernel path
> do_boot_cpu()-> ... ->wakeup_secondary_cpu_via_init():
> that is, it got stuck sending INIT from AP to BP and reset, which
> is actually what "disable_cpu_apicid=X" tries to solve. Printing
> the value of @phys_apicid showed that it was the value of "apicid"
> rather than that of "initial apicid" shown by /proc/cpuinfo.
> As described in x86 specification:
> "In MP systems, the local APIC ID is also used as a processor ID by the
> BIOS and the operating system. Some processors permit software to modify
> the APIC ID. However, the ability of software to modify the APIC ID is
> processor model specific. Because of this, operating system software
> should avoid writing to the local APIC ID register. The value returned by
> bits 31-24 of the EBX register (when the CPUID instruction is executed with a
> source operand value of 1 in the EAX register) is always the Initial APIC ID
> (determined by the platform initialization). This is true even if software
> has changed the value in the Local APIC ID register."
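For readers who want to see where "initial apicid" comes from, here is a minimal sketch of the bit extraction the quoted text describes. The helper names initial_apic_id_from_ebx() and read_initial_apic_id() are hypothetical and not part of the patch; the second helper assumes GCC or Clang on x86 and uses the compiler's <cpuid.h>.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* Initial APIC ID: bits 31-24 of EBX after CPUID with EAX = 1. */
static inline uint8_t initial_apic_id_from_ebx(uint32_t ebx)
{
	return (uint8_t)(ebx >> 24);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/*
 * Read the initial APIC ID of the CPU this code happens to run on.
 * Returns 0 on success, -1 if CPUID leaf 1 is not available.
 */
static inline int read_initial_apic_id(uint8_t *id)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	*id = initial_apic_id_from_ebx(ebx);
	return 0;
}
#endif
```

The result only describes the CPU the thread is currently running on, so a userspace check for one particular CPU would have to pin the thread to that CPU first.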
> From kernel commit 151e0c7de("x86, apic, kexec: Add disable_cpu_apicid
> kernel parameter"), we can see in generic_processor_info(), it uses
> a)@apicid and b)read_apic_id() to compare with @disabled_cpu_apicid.
> a)@apicid which is actually @phys_apicid above-mentioned is from the
> following calltrace(on the problematic AMD machine):
> The value of @apicid (from the ACPI MADT) is equal to the value of "apicid"
> shown by /proc/cpuinfo, as proved by our debug printk.
> b)read_apic_id() gets the value from LAPIC ID register which is "apicid"
> as well.
> The value of "initial apicid", by contrast, comes from the cpuid instruction.
> One example of "apicid" and "initial apicid" of cpu0 on an AMD machine:
> apicid : 32
> initial apicid : 0
> Therefore, we should assign /proc/cpuinfo "apicid" to
> We've never met such issue before, because we usually tested
> and mostly on Intel machines, and "apicid" and "initial apicid" have the
> same value in most cases on Intel machines.
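To make the distinction above concrete, here is a rough sketch of how one might pull both fields for a given processor out of /proc/cpuinfo text and compare them. The helpers cpuinfo_field() and lookup_apicids() are hypothetical names for illustration only; the sketch assumes the usual "key : value" line layout and that each "processor" line precedes the fields belonging to it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Match one /proc/cpuinfo line of the form "key<whitespace>: value".
 * Returns 1 and stores the integer value if the key matches, else 0.
 * "initial apicid" lines do not match the key "apicid" because the
 * comparison is anchored at the start of the line.
 */
static int cpuinfo_field(const char *line, const char *key, int *value)
{
	size_t klen = strlen(key);
	const char *p;

	if (strncmp(line, key, klen) != 0)
		return 0;
	p = line + klen;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != ':')
		return 0;
	return sscanf(p + 1, "%d", value) == 1;
}

/*
 * Scan cpuinfo text for the given processor number and report its
 * "apicid" and "initial apicid". Returns 1 if both were found, else 0.
 */
static int lookup_apicids(const char *text, int cpu, int *apicid, int *initial)
{
	const char *line = text;
	int cur = -1, v, found = 0;

	while (*line) {
		if (cpuinfo_field(line, "processor", &v)) {
			cur = v;
		} else if (cur == cpu && cpuinfo_field(line, "apicid", &v)) {
			*apicid = v;
			found |= 1;
		} else if (cur == cpu && cpuinfo_field(line, "initial apicid", &v)) {
			*initial = v;
			found |= 2;
		}
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return found == 3;
}
```

Fed the cpu0 entry quoted above, lookup_apicids() would report apicid 32 and initial apicid 0, and the first of the two is the value the patch description argues for.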
According to previous discussions, using "apicid" in this patch looks correct.
Ping Dave, can we have this one now?
I do not know much about the x86 specifics, so I will leave it to you. Since
Hatayama is also fine with it:
Acked-by: Dave Young <dyoung(a)redhat.com>
Qiao, can you help to do more testing on both Intel and AMD machines
before we apply it?