ksool Guru


Joined: 27 May 2006 Posts: 337 Location: Cambridge, MA
Posted: Thu Nov 29, 2007 3:10 am Post subject: Deciphering mcelog after hard lock up
I've got a dual Opteron with 8 GB of RAM that occasionally locks up hard under heavy load (no response over Ethernet, no terminal).
I thought the problem was bad RAM, so I ran memtest86 for three days, but it didn't find anything.
Here's the mcelog output:
Code:
cat /var/log/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 56d93012b7c9
ADDR 1f61bfe40
Data cache ECC error (syndrome 51)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d428c00000000833 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit TSC 56d93012c175
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 56d93012c535
ADDR 1f61bfe78
Northbridge ECC error
ECC syndrome = 51
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9428c00100000813 MCGSTATUS 0
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 741f0900e073
RIP 33:5cabec ADDR 1f4a77bf0
Northbridge ECC error
ECC syndrome = 5
bit45 = uncorrected ecc error
bit61 = error uncorrected
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS f402a00000000a13 MCGSTATUS 7
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 73ac2df94de49
ADDR 1d2b97c78
Northbridge ECC error
ECC syndrome = 5b
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 942dc00100000813 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 76e2f811f9c8e
ADDR 1d09b7f40
Data cache ECC error (syndrome a4)
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS 9452400000000833 MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 7742a4bf1d2c3
ADDR 1f307ff70
Northbridge ECC error
ECC syndrome = 57
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 942bc00100000813 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC d4a65cbc90c
RIP 33:5ce0b4 ADDR 1f321f778
Northbridge ECC error
ECC syndrome = 6
bit45 = uncorrected ecc error
bit61 = error uncorrected
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS f403200000000a13 MCGSTATUS 7
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 8b4f97d40194f
RIP 33:5caceb ADDR 1dd5b74b8
Northbridge ECC error
ECC syndrome = c
bit45 = uncorrected ecc error
bit61 = error uncorrected
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS b406200000000a13 MCGSTATUS 7
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit TSC 229d6e2c9cb09
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 229d6e2c9d58c
ADDR 1f42777b8
Northbridge ECC error
ECC syndrome = 5d
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 942ec00100000813 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC c178d1d8781
ADDR 1dd2bff78
Northbridge ECC error
ECC syndrome = 5b
bit46 = corrected ecc error
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 942dc00000000a13 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 60cafdf541181
ADDR 15fab7770
Northbridge ECC error
ECC syndrome = 54
bit46 = corrected ecc error
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 942a400000000a13 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 62a48ee35c73b
ADDR 1938d7cc0
Data cache ECC error (syndrome bc)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d45e400000000833 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit TSC 7dd527768b691
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS 9000400000000863 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 723f90fe3872
RIP 23:818e05b ADDR 1f91577b8
Northbridge ECC error
ECC syndrome = a
bit45 = uncorrected ecc error
bit61 = error uncorrected
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS b405200000000a13 MCGSTATUS 7
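For anyone else trying to decipher a log like this: the STATUS value at the end of each record is the raw 64-bit machine-check status register that mcelog turned into the text above it. Below is a minimal Python sketch that redoes that decoding, assuming the AMD K8 (Opteron) layout as this log labels it: the architectural flags in bits 57-63, the corrected/uncorrected ECC flags in bits 46/45, the ECC syndrome in bits 54:47, and a bus-error code (0000 1PPT RRRR IILL) in the low 16 bits. `decode_status` and the label tables are hypothetical names, not mcelog's own code, and the wording of labels that never appear in this log is my own guess.

```python
# Sketch: decode a raw AMD K8 (Opteron) MCi_STATUS value into labels like
# the ones mcelog printed. decode_status and the tables are hypothetical.

# Architectural MCi_STATUS flag bits.
ARCH_BITS = {
    63: "valid",
    62: "error overflow (multiple errors)",
    61: "error uncorrected",
    60: "error enabled",
    59: "misc register valid",
    58: "address register valid",
    57: "processor context corrupt",
}

# Assumption: K8-specific bits, labelled the way mcelog labels them in this log.
# Bit 32 ("err cpu0") is only meaningful for the northbridge bank.
K8_BITS = {
    46: "corrected ecc error",
    45: "uncorrected ecc error",
    32: "err cpu0",
}

# Fields of a bus-error code, 0000 1PPT RRRR IILL in the low 16 bits.
# Labels not seen in the log above are assumptions.
PARTICIPATION = ["local node origin", "local node response",
                 "local node observed", "generic"]
REQUEST = {0x0: "generic", 0x1: "generic read", 0x2: "generic write",
           0x3: "data read", 0x4: "data write", 0x5: "instruction fetch",
           0x6: "prefetch", 0x7: "evict", 0x8: "snoop"}
MEM_IO = ["mem", "reserved", "i/o", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "level generic"]


def bit(value: int, n: int) -> int:
    """Return bit n of value (0 or 1)."""
    return (value >> n) & 1


def decode_status(status: int) -> dict:
    labels = {**ARCH_BITS, **K8_BITS}
    # List the set flags, highest bit first.
    flags = [labels[n] for n in sorted(labels, reverse=True) if bit(status, n)]
    result = {"flags": flags}

    # The ECC syndrome (bits 54:47) is only worth reading if an ECC bit is set.
    if bit(status, 46) or bit(status, 45):
        result["syndrome"] = (status >> 47) & 0xFF

    code = status & 0xFFFF
    if (code & 0xF800) == 0x0800:  # bus-error format: 0000 1PPT RRRR IILL
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0b11],
            "timed_out": bool(bit(code, 8)),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 0b11],
            "level": LEVEL[code & 0b11],
        }
    return result
```

For example, `decode_status(0xf402a00000000a13)` (the STATUS of the MCE 3 record) gives syndrome 0x05, "error uncorrected" and "uncorrected ecc error" among the flags, and a "local node response" bus error on a "generic read", which matches what mcelog printed for that record.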
Any ideas what it could be if it's not bad RAM? TIA
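One quick check on the "is it the RAM?" question is whether the ADDR values cluster in one region, assuming (as is usual for these banks) that ADDR holds the physical address the error was logged against. A minimal Python sketch, with a hypothetical `addr_summary` helper, that pulls every ADDR out of the log text and reports the lowest and highest one in GiB:

```python
import re

# Matches "ADDR <hex>", including lines like "RIP 33:5cabec ADDR 1f4a77bf0".
ADDR_RE = re.compile(r"\bADDR ([0-9a-f]+)")
GIB = 1 << 30


def addr_summary(log_text: str):
    """Hypothetical helper: return the sorted ADDR values found in an mcelog
    dump plus the lowest and highest one in GiB (None, None if there are none)."""
    addrs = sorted(int(h, 16) for h in ADDR_RE.findall(log_text))
    if not addrs:
        return [], None, None
    return addrs, addrs[0] / GIB, addrs[-1] / GIB
```

Run on the log above, every ADDR falls between roughly 5.5 and 7.9 GiB, i.e. all above the 4 GiB mark. Which node, memory controller, or DIMM that corresponds to depends on the BIOS memory layout (node interleaving, memory hole remapping), so treat it as a hint rather than a diagnosis.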
BradN Advocate


Joined: 19 Apr 2002 Posts: 2391 Location: Wisconsin (USA)
Posted: Thu Nov 29, 2007 3:33 am
I'm seeing a lot of "Northbridge ECC error" entries; these would seem to suggest a bad motherboard or processor, or possibly a bad power supply.
Also, I'm not sure whether "CPU 1" means the second CPU is reporting the error, but it might be worth disabling the second processor and seeing if the problem goes away. If that fixes it, you could then swap the physical processors to mostly rule out (or confirm) a bad processor: if the problem follows the chip, it's the processor; if it stays with the same socket, suspect the motherboard.
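On the "CPU 1" question: as far as I can tell from mcelog's K8 decoder, each header line reads `CPU <cpu> <bank> <bank name>`, so "CPU 1 4 northbridge" is machine-check bank 4 (the northbridge) as reported on the kernel's logical CPU 1, and "CPU 1 0 data cache" is bank 0 on the same CPU. Whether logical CPU 1 is the second socket depends on whether these Opterons are single-core, which is an assumption here. A minimal Python sketch, assuming that header layout, that tallies the records per CPU and bank and counts the uncorrected ones; `tally` and `HEADER_RE` are hypothetical names:

```python
import re
from collections import Counter

# Assumed header layout (mcelog K8 decoder): "CPU <cpu> <bank> <bank name> TSC <tsc>"
HEADER_RE = re.compile(r"^CPU (\d+) (\d+) (.+?) TSC ", re.MULTILINE)


def tally(log_text: str):
    """Count records per (cpu, bank, bank name), plus how many records
    mcelog decoded with bit 61 (error uncorrected) set."""
    per_bank = Counter(
        (int(cpu), int(bank), name)
        for cpu, bank, name in HEADER_RE.findall(log_text)
    )
    uncorrected = log_text.count("bit61 = error uncorrected")
    return per_bank, uncorrected
```

Fed the log above (e.g. `tally(open("/var/log/mcelog").read())`), every record comes out under CPU 1: 10 from bank 4 (northbridge), 3 from bank 0 (data cache), and 3 from bank 2 (bus unit), with 4 records flagged uncorrected. That all of them come from one CPU is consistent with trying the disable-and-swap test above.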