System runs the latest FC5 x86-64 kernel (2.6.17-1.2139_FC5).
The system might suddenly hang hard or reboot.
Seems to blurt sporadic errors to syslog.
Console messages:

Message from syslogd@m1 at Thu Jun 29 00:56:55 2006 ...
m1 kernel: Oops: 0000 [1] SMP

Message from syslogd@m1 at Thu Jun 29 00:56:55 2006 ...
m1 kernel: CR2: 0000000000000020
Dmesg shows this oops:

Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
<ffffffff8022021c>{copy_process+3132}
PGD 56690067 PUD 54fbe067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/hdd/removable
CPU 0
Modules linked in: ipv6 autofs4 dm_mirror dm_mod video button battery acpi_memhotplug ac lp parport_pc parport sg i2c_nforce2 forcedeth floppy i2c_core raid1 ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 27939, comm: get-errors.sh Not tainted 2.6.17-1.2139_FC5 #1
RIP: 0010:[<ffffffff8022021c>] <ffffffff8022021c>{copy_process+3132}
RSP: 0018:ffff810056239d78 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff810056bfa700 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff81005642c648 RDI: ffff81007ac52420
RBP: ffff81007ac52420 R08: ffff81005593a000 R09: 00000000000559e1
R10: 0000000000000000 R11: 0000000000000001 R12: ffff81007dccf0c0
R13: ffff810037e6e400 R14: ffff81005642c590 R15: ffff81007b5be080
FS: 00002aaaaaab2d50(0000) GS:ffffffff8069c000(0000) knlGS:00000000f7e45b60
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000058a1e000 CR4: 00000000000006e0
Process get-errors.sh (pid: 27939, threadinfo ffff810056238000, task ffff8100639980c0)
Stack: 00000000ffffffff 00002aaaaaab2de0 0000000000000000 ffff810056239f58
       00007fffbe0c1fd0 0000000001200011 0000000000000000 ffff81007dccf0c0
       ffff810037e6e400 ffff81007ac522c8
Call Trace:
 <ffffffff8024ade2>{sprintf+81} <ffffffff8026a183>{_spin_unlock_irq+9}
 <ffffffff802331d7>{do_fork+208} <ffffffff802697db>{__mutex_lock_slowpath+868}
 <ffffffff80254017>{do_pipe+610} <ffffffff80269469>{__mutex_unlock_slowpath+522}
 <ffffffff8021454a>{generic_file_llseek+127} <ffffffff80262d8e>{system_call+126}
 <ffffffff8026309b>{ptregscall_common+103}

Code: 48 8b 40 20 f0 ff 43 28 f6 45 29 08 74 07 f0 ff 88 34 03 00
RIP <ffffffff8022021c>{copy_process+3132} RSP <ffff810056239d78>
CR2: 0000000000000020
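A note on reading this oops, offered as a sketch rather than a diagnosis: the first Code: bytes, 48 8b 40 20, decode as mov 0x20(%rax),%rax, and RAX is zero in the register dump, so the load would fault at 0 + 0x20, which is the address in CR2. This assumes the Code: dump starts at the faulting RIP, which appears to be the case for x86-64 kernels of this era. The arithmetic can be checked in the shell with values copied from the oops:

```shell
#!/bin/sh
# Values copied from the oops; the decoding of the Code: bytes is an assumption
# stated in the text above.
rax=0x0000000000000000   # RAX at the time of the fault
disp=0x20                # displacement byte in "48 8b 40 20" (mov 0x20(%rax),%rax)
cr2=0x0000000000000020   # faulting address reported in CR2

# The load reads from RAX + displacement.
fault=$(( $rax + $disp ))
printf 'computed fault address: 0x%016x\n' "$fault"

if [ "$fault" -eq $(( $cr2 )) ]; then
    echo "matches CR2: NULL base pointer plus a 0x20 field offset"
fi
```

In other words, copy_process followed a pointer that was NULL and tried to read a field 0x20 bytes into the structure it should have pointed to.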
get-errors.sh runs smartctl on both existing and nonexisting disks.
It seems to have hung on hdd.

hdd: CDU5211, ATAPI CD/DVD-ROM drive
hdd: ATAPI 52X CD-ROM drive, 120kB Cache, UDMA(33)
relevant part of get-errors.sh:
-----
for device in "hda" "hdb" "hdc" "hdd" "hde" "hdf"
do
  exists=0
  smartctl -i /dev/$device | grep -i support | grep -ci enabled > $tempfile1
  exists=`cat $tempfile1`

  # Continue only if SMART support is enabled
  if [ "$exists" != "0" ]; then
    smartctl -H /dev/$device > $tempfile2
    smartctl -c /dev/$device >> $tempfile2
    errtmp=`grep -ci PASSED $tempfile2`

    # Log a failure if the health check did not report PASSED
    if [ "$errtmp" = "0" ]; then
      echo "SMART: /dev/$device failed smart tests" >> $logfile
      smarterr=$((smarterr + 1))
    fi
  fi
done
-----
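Since the script probes device nodes that may not exist, one possible variation is to test for a block device before calling smartctl at all. This is only a sketch: probe_device and smart_enabled are hypothetical helper names that are not part of get-errors.sh, and the grep pattern simply mirrors the one in the script. It would not skip hdd here, because the CD-ROM drive does exist, and it is not meant as an explanation of the oops.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from get-errors.sh.

# Succeeds if the smartctl -i output on stdin reports SMART support as enabled.
smart_enabled() {
    grep -i support | grep -qi enabled
}

# Prints "skip" for a missing device node, otherwise "enabled" or "disabled".
probe_device() {
    dev=$1
    if [ ! -b "$dev" ]; then
        echo "skip"
        return 0
    fi
    if smartctl -i "$dev" 2>/dev/null | smart_enabled; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

for device in hda hdb hdc hdd hde hdf; do
    echo "/dev/$device: $(probe_device "/dev/$device")"
done
```

The rest of the loop could then run the -H and -c checks only for devices reported as "enabled".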
The system has rebooted twice in 3 days. This time it did not reboot, but some processes have hung. Running ps ax was a bad idea, as that hung too.
The crash utility just fails with the following error:
crash: cannot resolve "cpu_pda"
For more info just ask. I might reboot it but I'm sure it's not going to be far between crashes.
-HK
On Thu, 2006-06-29 at 13:21 -0400, Alan Cox wrote:
> On Thu, Jun 29, 2006 at 06:39:34PM +0200, Hans Kristian Rosbach wrote:
> > Message from syslogd@m1 at Thu Jun 29 00:56:55 2006 ...
> > m1 kernel: CR2: 0000000000000020
> > Unable to handle kernel NULL pointer dereference at 0000000000000020
>
> Does the box have ECC memory?
No, but it does not fail memtest86+ (the version that is on the FC5 CD).
It is unstable on at least both of these:
kernel-2.6.16-1.2133_FC5
kernel-2.6.17-1.2139_FC5
We are also having trouble with an IBM 4x P3 Xeon 700MHz box using the same two kernels. It's located 10 hours from here, so I'm not very likely to get access to oops messages (their technician is on vacation, so I have to coach whoever is available when something goes wrong). But this does come up on boot:
setup_irq: irq handler mismatch
<c0447c64> setup_irq+0xef/0xfc
<c0540601> serial8250_interrupt+0x0/0xe2
<c0447dcc> request_irq+0x6d/0x89
<c0540269> serial8250_startup+0x2f3/0x434
<c053cf5b> uart_startup+0x65/0x10a
<c053d1c1> uart_open+0x1c1/0x3d0
<c0524475> init_dev+0x3c6/0x4f6
<c0525881> tty_open+0x181/0x2d4
<c046ed80> chrdev_open+0x182/0x19e
<c046ebfe> chrdev_open+0x0/0x19e
<c04666db> __dentry_open+0xc6/0x1aa
<c0466823> nameidata_to_filp+0x19/0x28
<c046685d> do_filp_open+0x2b/0x31
<c047b686> dput+0xf0/0x210
<c0466955> do_sys_open+0x3c/0xa9
<c04669ef> sys_open+0x16/0x18
<c0403d2f> syscall_call+0x7/0xb
Not sure whether it's related or not.
00:00.0 Host bridge: Broadcom CNB20HE Host Bridge (rev 21)
00:00.1 Host bridge: Broadcom CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: Broadcom CNB20HE Host Bridge
00:00.3 Host bridge: Broadcom CNB20HE Host Bridge
00:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 44)
00:06.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)
00:0f.0 ISA bridge: Broadcom OSB4 South Bridge (rev 4f)
00:0f.1 IDE interface: Broadcom OSB4 IDE Controller
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 04)
02:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
05:02.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 02)
06:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 02)
-HK
On Fri, Jun 30, 2006 at 09:32:52AM +0200, Hans Kristian Rosbach wrote:
> have to coach the whoever-is-available staff when something goes wrong).
> But this does come up on boot:
>
> setup_irq: irq handler mismatch
> <c0447c64> setup_irq+0xef/0xfc <c0540601> serial8250_interrupt

Are both the serial port and one of the other devices assigned the same legacy IRQ (3 or 4)?
On Fri, 2006-06-30 at 05:22 -0400, Alan Cox wrote:
> On Fri, Jun 30, 2006 at 09:32:52AM +0200, Hans Kristian Rosbach wrote:
> > have to coach the whoever-is-available staff when something goes wrong).
> > But this does come up on boot:
> >
> > setup_irq: irq handler mismatch
> > <c0447c64> setup_irq+0xef/0xfc <c0540601> serial8250_interrupt
>
> Are both the serial port and one of the other devices assigned the same legacy IRQ (3 or 4)?
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:    3991989    3982729    3957498    4014977    IO-APIC-edge  timer
  1:          5         10          7          2    IO-APIC-edge  i8042
  3:          0          0          1          0   IO-APIC-level  acpi
  8:          0          0          1          0    IO-APIC-edge  rtc
 12:         79         83         84         83    IO-APIC-edge  i8042
 14:     238014     238873     236932     239364    IO-APIC-edge  ide0
177:      19742      19363      19022      19420   IO-APIC-level  megaraid
193:      47552      47200      46515      47467   IO-APIC-level  eth1
201:          0          0          0          0   IO-APIC-level  ohci_hcd:usb1
NMI:          0          0          0          0
LOC:   15947085   15947084   15947083   15947082
ERR:          0
MIS:          0
# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial
0374-0375 : pnp 00:0d
0377-0377 : pnp 00:0d
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0430-0437 : pnp 00:01
0438-0439 : pnp 00:01
0440-0447 : piix4_smbus
0480-0483 : PM1b_EVT_BLK
0488-048b : PM_TMR
04e0-04e3 : PM1a_EVT_BLK
04e4-04e5 : PM1a_CNT_BLK
04f0-04f7 : GPE0_BLK
0600-0600 : pnp 00:0d
0700-070f : 0000:00:0f.1
  0700-0707 : ide0
  0708-070f : ide1
0900-090f : pnp 00:0d
0f50-0f58 : pnp 00:0d
2200-221f : 0000:00:05.0
  2200-221f : pcnet32_probe_pci
4000-40ff : 0000:02:06.0
  4000-40ff : r8169
# dmesg
Linux version 2.6.17-1.2139_FC4smp (bhcompile@hs20-bc1-5.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Fri Jun 23 21:12:13 EDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff9380 (usable)
BIOS-e820: 000000007fff9380 - 0000000080000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009ddd0
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 524281
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 294905 pages, LIFO batch:31
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP (v000 IBM ) @ 0x000fdfd0
ACPI: RSDT (v001 IBM SERASSLT 0x00001001 IBM 0x45444f43) @ 0x7fffff80
ACPI: FADT (v001 IBM SERASSLT 0x00001001 IBM 0x45444f43) @ 0x7fffff00
ACPI: MADT (v001 IBM SERASSLT 0x00001001 IBM 0x45444f43) @ 0x7ffffe80
ACPI: DSDT (v001 IBM SERASSLT 0x00001001 MSFT 0x0100000b) @ 0x00000000
ACPI: PM-Timer IO Port: 0x488
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x03] enabled)
Processor #3 6:10 APIC version 17
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:10 APIC version 17
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 6:10 APIC version 17
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 6:10 APIC version 17
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
ACPI: IRQ3 used by override.
Enabling APIC mode: Flat. Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Built 1 zonelists
Kernel command line: ro lapic root=LABEL=/
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec01000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c07b0000 soft=c07d0000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 701.902 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2068736k/2097124k available (2061k kernel code, 27168k reserved, 1437k data, 236k init, 1179620k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1405.10 BogoMIPS (lpj=2810203)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: 0383f3ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
CPU0: Intel Pentium III (Cascades) stepping 01
SMP alternatives: switching to SMP code
Booting processor 1/0 eip 3000
CPU 1 irqstacks, hard=c07b1000 soft=c07d1000
Initializing CPU#1
Calibrating delay using timer specific routine.. 1403.59 BogoMIPS (lpj=2807180)
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: 0383f3ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium III (Cascades) stepping 01
SMP alternatives: switching to SMP code
Booting processor 2/1 eip 3000
CPU 2 irqstacks, hard=c07b2000 soft=c07d2000
Initializing CPU#2
Calibrating delay using timer specific routine.. 1403.61 BogoMIPS (lpj=2807225)
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: 0383f3ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: Intel Pentium III (Cascades) stepping 01
SMP alternatives: switching to SMP code
Booting processor 3/2 eip 3000
CPU 3 irqstacks, hard=c07b3000 soft=c07d3000
Initializing CPU#3
Calibrating delay using timer specific routine.. 1403.61 BogoMIPS (lpj=2807235)
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: 0383f3ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: Intel Pentium III (Cascades) stepping 01
Total of 4 processors activated (5615.92 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
checking TSC synchronization across 4 CPUs: passed.
Brought up 4 CPUs
migration_cost=4000
checking if image is initramfs... it is
Freeing initrd memory: 1170k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd32c, last bus=8
Setting up standard PCI resources
ACPI: Subsystem revision 20060127
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:00:06.0
ACPI: PCI Interrupt Routing Table [_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LMXC] (IRQs *7)
ACPI: PCI Interrupt Link [LMXD] (IRQs *7)
ACPI: PCI Interrupt Link [LPET] (IRQs *10)
ACPI: PCI Interrupt Link [LMVI] (IRQs *18), disabled.
ACPI: PCI Interrupt Link [LP1A] (IRQs) *0
ACPI: PCI Interrupt Link [LM1B] (IRQs *15)
ACPI: PCI Interrupt Link [LPUS] (IRQs *4)
ACPI: PCI Root Bridge [PCI2] (0000:02)
PCI: Probing PCI hardware (bus 02)
ACPI: PCI Interrupt Routing Table [_SB_.PCI2._PRT]
ACPI: PCI Interrupt Link [LPSA] (IRQs) *0
ACPI: PCI Interrupt Link [LPSB] (IRQs) *0
ACPI: PCI Interrupt Link [LP5A] (IRQs) *0
ACPI: PCI Interrupt Link [LM5B] (IRQs *11)
ACPI: PCI Interrupt Link [LP6A] (IRQs *11)
ACPI: PCI Interrupt Link [LM6B] (IRQs *15)
ACPI: PCI Root Bridge [PCI5] (0000:05)
PCI: Probing PCI hardware (bus 05)
ACPI: PCI Interrupt Routing Table [_SB_.PCI5._PRT]
ACPI: PCI Interrupt Link [LP2A] (IRQs *9)
ACPI: PCI Interrupt Link [LM2B] (IRQs *4)
ACPI: PCI Interrupt Link [LP3A] (IRQs) *0
ACPI: PCI Interrupt Link [LM3B] (IRQs *9)
ACPI: PCI Interrupt Link [LP4A] (IRQs) *0
ACPI: PCI Interrupt Link [LM4B] (IRQs *10)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 16 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
pnp: 00:01: ioport range 0x438-0x439 has been reserved
pnp: 00:01: ioport range 0x430-0x437 has been reserved
pnp: 00:0d: ioport range 0x600-0x600 has been reserved
pnp: 00:0d: ioport range 0x900-0x90f has been reserved
pnp: 00:0d: ioport range 0x374-0x375 has been reserved
pnp: 00:0d: ioport range 0x377-0x377 has been reserved
pnp: 00:0d: ioport range 0xf50-0xf58 has been reserved
PCI: Bridge: 0000:05:02.0
IO window: disabled.
MEM window: eb000000-ec0fffff
PREFETCH window: ecf00000-ecffffff
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2621440 bytes)
TCP bind hash table entries: 65536 (order: 8, 1310720 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
IBM machine detected. Enabling interrupts during APM calls.
apm: BIOS not found.
audit: initializing netlink socket (disabled)
audit(1151597294.096:1): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux: Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key A81E23FA1B61875E
- User ID: Red Hat, Inc. (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: unable to determine aperture size.
agpgart: agp_backend_initialize() failed.
agpgart-serverworks: probe of 0000:00:00.0 failed with error -22
agpgart: unable to determine aperture size.
agpgart: agp_backend_initialize() failed.
agpgart-serverworks: probe of 0000:00:00.1 failed with error -22
agpgart: ServerWorks CNB20HE is unsupported due to lack of documentation.
agpgart: ServerWorks CNB20HE is unsupported due to lack of documentation.
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
pnp: Device 00:05 activated.
00:05: ttyS0 at I/O 0x3f8 (irq = 14) is a 16550A
pnp: Device 00:06 activated.
00:06: ttyS1 at I/O 0x2f8 (irq = 15) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks OSB4: IDE controller at PCI slot 0000:00:0f.1
SvrWks OSB4: chipset revision 0
SvrWks OSB4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0x0708-0x070f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: LTN485S, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: ATAPI 48X CD-ROM drive, 120kB Cache, (U)DMA
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver libusual
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI wakeup devices:
PCI0
ACPI: (supports S0 S4 S5)
Freeing unused kernel memory: 236k freed
Write protecting the kernel read-only data: 947k
input: AT Translated Set 2 keyboard as /class/input/input0
SCSI subsystem initialized
megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
megaraid: probe new device 0x1000:0x0407:0x1000:0x0531: bus 6:slot 0:func 0
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 21 (level, low) -> IRQ 177
megaraid: fw version:[414C] bios version:[H429]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
Vendor: IBM Model: YGLv3 S2 Rev: 0
Type: Processor ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
Vendor: IBM Model: YGHv3 S2 Rev: 0
Type: Processor ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 2 [Phy 2] for non-raid devices
Vendor: IBM Model: EXP300 S160 Rev: D014
Type: Processor ANSI SCSI revision: 03
scsi[0]: scanning scsi channel 3 [Phy 3] for non-raid devices
Vendor: IBM Model: EXP300 S160 Rev: D014
Type: Processor ANSI SCSI revision: 03
scsi[0]: scanning scsi channel 4 [virtual] for logical drives
Vendor: MegaRAID Model: LD 0 RAID5 34G Rev: 414C
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 71077888 512-byte hdwr sectors (36392 MB)
sda: Write Protect is off
sda: Mode Sense: 00 00 00 00
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 71077888 512-byte hdwr sectors (36392 MB)
sda: Write Protect is off
sda: Mode Sense: 00 00 00 00
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:4:0:0: Attached scsi disk sda
Vendor: MegaRAID Model: LD 1 RAID5 70G Rev: 414C
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sdb: 143622144 512-byte hdwr sectors (73535 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 00 00 00
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 143622144 512-byte hdwr sectors (73535 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 00 00 00
sdb: asking for cache data failed
sdb: assuming drive cache: write through
sdb: sdb1
sd 0:4:1:0: Attached scsi disk sdb
Vendor: MegaRAID Model: LD 2 RAID5 225G Rev: 414C
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sdc: 462006272 512-byte hdwr sectors (236547 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 00 00 00
sdc: asking for cache data failed
sdc: assuming drive cache: write through
SCSI device sdc: 462006272 512-byte hdwr sectors (236547 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 00 00 00
sdc: asking for cache data failed
sdc: assuming drive cache: write through
sdc: sdc1
sd 0:4:2:0: Attached scsi disk sdc
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
audit(1151597306.733:2): selinux=0 auid=4294967295
0:0:8:0: Attached scsi generic sg0 type 3
0:1:9:0: Attached scsi generic sg1 type 3
0:2:15:0: Attached scsi generic sg2 type 3
0:3:15:0: Attached scsi generic sg3 type 3
sd 0:4:0:0: Attached scsi generic sg4 type 0
sd 0:4:1:0: Attached scsi generic sg5 type 0
sd 0:4:2:0: Attached scsi generic sg6 type 0
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
pcnet32.c:v1.32 18.Mar.2006 tsbogend@alpha.franken.de
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 185
pcnet32: PCnet/FAST III 79C975 at 0x2200, 00 06 29 f6 2b fd assigned IRQ 185.
pcnet32: Found PHY 0000:6b60 at address 30.
eth0: registered as PCnet/FAST III 79C975
pcnet32: 1 cards_found.
r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
ACPI: PCI Interrupt 0000:02:06.0[A] -> GSI 25 (level, low) -> IRQ 193
eth1: Identified chip type is 'RTL8169s/8110s'.
eth1: RTL8169 at 0xf8806c00, 00:08:a1:95:75:3a, IRQ 193
piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
piix4_smbus 0000:00:0f.0: Unusual config register value
piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1 if you experience problems
piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration (or code out of date)!
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt 0000:00:0f.2[A] -> GSI 19 (level, low) -> IRQ 201
ohci_hcd 0000:00:0f.2: OHCI Host Controller
ohci_hcd 0000:00:0f.2: new USB bus registered, assigned bus number 1
ohci_hcd 0000:00:0f.2: irq 201, io mem 0xfebfe000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
ACPI: Power Button (FF) [PWRF]
ibm_acpi: ec object not found
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: 4.6.0-ioctl (2006-02-17) initialised: dm-devel@redhat.com
EXT3 FS on sda3, internal journal
SGI XFS with ACLs, security attributes, large block numbers, no debug enabled
SGI XFS Quota Management subsystem
XFS mounting filesystem sda6
Ending clean XFS mount for filesystem: sda6
XFS mounting filesystem sdb1
Ending clean XFS mount for filesystem: sdb1
XFS mounting filesystem sdc1
Ending clean XFS mount for filesystem: sdc1
Adding 2096440k swap on /dev/sda5. Priority:-1 extents:1 across:2096440k
r8169: eth1: link up
audit(1151590121.991:3): audit_pid=1928 old=0 by auid=4294967295
lp: driver loaded but no devices found
setup_irq: irq handler mismatch
<c0447c64> setup_irq+0xef/0xfc
<c0540601> serial8250_interrupt+0x0/0xe2
<c0447dcc> request_irq+0x6d/0x89
<c0540269> serial8250_startup+0x2f3/0x434
<c053cf5b> uart_startup+0x65/0x10a
<c053d1c1> uart_open+0x1c1/0x3d0
<c0524475> init_dev+0x3c6/0x4f6
<c0525881> tty_open+0x181/0x2d4
<c046ed80> chrdev_open+0x182/0x19e
<c046ebfe> chrdev_open+0x0/0x19e
<c04666db> __dentry_open+0xc6/0x1aa
<c0466823> nameidata_to_filp+0x19/0x28
<c046685d> do_filp_open+0x2b/0x31
<c047b686> dput+0xf0/0x210
<c0466955> do_sys_open+0x3c/0xa9
<c04669ef> sys_open+0x16/0x18
<c0403d2f> syscall_call+0x7/0xb
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth1: no IPv6 routers present
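On the shared-IRQ question: the dmesg output reports ttyS0 at irq 14, which is the same IRQ that ide0 claims, and ttyS1 at irq 15. That overlap may be what the setup_irq mismatch is complaining about, though this is a guess. A rough way to spot such overlaps is to pull the IRQ numbers out of the relevant dmesg lines and print any that appear more than once. The function name shared_irqs is made up for this sketch, and the two patterns are assumptions matched to the line shapes in this particular log, not a general dmesg parser.

```shell
#!/bin/sh
# Sketch: list IRQ numbers claimed by more than one line of a dmesg excerpt.
# Only two line shapes from this log are recognised (an assumption):
#   "00:05: ttyS0 at I/O 0x3f8 (irq = 14) is a 16550A"
#   "ide0 at 0x1f0-0x1f7,0x3f6 on irq 14"
shared_irqs() {
    sed -n \
        -e 's/.*(irq = \([0-9][0-9]*\)).*/\1/p' \
        -e 's/.* on irq \([0-9][0-9]*\)$/\1/p' |
    sort -n | uniq -d
}

# Sample lines copied from the dmesg output.
shared_irqs <<'EOF'
00:05: ttyS0 at I/O 0x3f8 (irq = 14) is a 16550A
00:06: ttyS1 at I/O 0x2f8 (irq = 15) is a 16550A
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
EOF
```

For the three sample lines this prints 14, the IRQ shared by ttyS0 and ide0. On a live system the same function could be fed from dmesg directly, for example dmesg | shared_irqs.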
On Thu, 2006-06-29 at 18:39 +0200, Hans Kristian Rosbach wrote:
> System runs latest FC5 x86-64 kernel (2.6.17-1.2139_FC5)
> System might suddenly hang hard or reboot.
I have a ThinkPad T42p (Pentium M cpu).
With kernel-2.6.17-1.2139_FC5 I can get about 5 mins of uptime (about enough time to open my email and browser) before it freezes solid.
I'm going to set up netconsole (no serial port on this laptop) and see if I can capture anything useful.
Looks like others are having trouble too:
https://bugzilla.redhat.com/bugzilla/buglist.cgi?bug_status=NEW&bug_stat...
On Thu, 2006-06-29 at 18:39 +0200, Hans Kristian Rosbach wrote:
> System runs latest FC5 x86-64 kernel (2.6.17-1.2139_FC5)
> System might suddenly hang hard or reboot.
Now on 2.6.17-1.2145_FC5 it still shows problems. This time it complains about page_mapcount being -1, and after that there is a Kernel BUG message, curiously mentioning /block/hdd/removable as well. But this time it seems it's mysqld that crashed?
I'm confused as to what is causing this, since memtest86+ does not find anything wrong. There are also no new BIOS versions available.
-HK
Losing some ticks... checking if CPU frequency changed.
Eeek! page_mapcount(page) went negative! (-1)
page->flags = 100000000000060
page->count = 1
page->mapping = ffff81006ebca741
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/rmap.c:560
invalid opcode: 0000 [1] SMP
last sysfs file: /block/hdd/removable
CPU 1
Modules linked in: ipv6 autofs4 dm_mirror dm_mod video button battery acpi_memhotplug ac lp parport_pc parport sg i2c_nforce2 i2c_core floppy forcedeth raid1 ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 20302, comm: mysqld Not tainted 2.6.17-1.2145_FC5 #1
RIP: 0010:[<ffffffff8020a721>] <ffffffff8020a721>{page_remove_rmap+115}
RSP: 0018:ffff81005c08fdd8 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff81007f474c40 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff80548ad0
RBP: ffff81007992a400 R08: 00000000000000a0 R09: ffff81005c08fb28
R10: 0000000000000010 R11: 0000000000000000 R12: 00002aaad440d000
R13: ffff81005642b068 R14: 00002aaad4600000 R15: ffff810001020480
FS: 000000004514c940(0063) GS:ffff81007debd740(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaad4865000 CR3: 000000007688f000 CR4: 00000000000006e0
Process mysqld (pid: 20302, threadinfo ffff81005c08e000, task ffff810066db8040)
Stack: ffff81007f474c40 ffffffff80207b10 0000000000000000 ffff81005c08fec0
       00002aaad4866000 00002aaad3e01000 ffff81005e2ff928 ffff81005c08fec8
       00000000001f3000 0000000000000000
Call Trace:
 <ffffffff80207b10>{unmap_vmas+1063} <ffffffff802122ea>{unmap_region+185}
 <ffffffff802114bd>{do_munmap+528} <ffffffff802169ae>{sys_munmap+86}
 <ffffffff80262d8e>{system_call+126}

Code: 0f 0b 68 ca d0 47 80 c2 30 02 5b 48 83 ce ff bf 20 00 00 00
RIP <ffffffff8020a721>{page_remove_rmap+115} RSP <ffff81005c08fdd8>