Gentoo Forums
Hardware error after installation
rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sat Feb 09, 2013 3:06 pm    Post subject: Hardware error after installation

Hi. I've just installed Gentoo on my PC, and it keeps printing messages like this from time to time:

[Hardware Error]: CPU0 MC2_STATUS[Over|CE|-|-|AddrV]: 0xd4000000000015
[Hardware Error]: MC2_ADDR: 0x000000002130297e8
[Hardware Error]: Bus Unit Error: INSN error in a Page Descriptor Cache or Guest TLB.
[Hardware Error]: cache level: L1, tx: INSN

What does it mean, and what did I do wrong? What should I do to fix this? Thanks for your help.
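
A rough sketch of how the bracketed flags in an MCn_STATUS line relate to the register value, in case it helps when reading messages like the one above. The bit positions used (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) are the architectural x86 machine-check status bits. The value as posted has only 14 hex digits, so the example uses a hypothetical full 64-bit value, 0xd400000000000015, on the assumption that two zeros were dropped when the message was typed out; that is a guess, not something the thread confirms. The function names are made up for illustration.

```python
# Architectural x86 MCi_STATUS flag bits (name -> bit position).
MCI_STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier error record was overwritten
    "UC": 61,     # error was NOT corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # MCi_MISC holds valid extra data
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status: int) -> dict:
    """Return each flag name mapped to True/False for a 64-bit status word."""
    return {name: bool((status >> bit) & 1) for name, bit in MCI_STATUS_FLAGS.items()}


def summarize(status: int) -> str:
    """Build a flag string in the same field order as the posted log line."""
    flags = decode_status(status)
    parts = [
        "Over" if flags["OVER"] else "-",
        "UE" if flags["UC"] else "CE",
        "MiscV" if flags["MISCV"] else "-",
        "PCC" if flags["PCC"] else "-",
        "AddrV" if flags["ADDRV"] else "-",
    ]
    return "|".join(parts)


if __name__ == "__main__":
    # ASSUMPTION: hypothetical 16-digit value, not the literal value from the post.
    assumed_status = 0xD400000000000015
    print(summarize(assumed_status))
```

With that assumed value the UC bit is clear, which is what the log shows as CE: the hardware reports the error as corrected.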
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Feb 09, 2013 6:02 pm

rozyk,

Taken at face value, it means that CPU0 is dead. It really doesn't get any worse, as CPU0 is the only CPU you cannot turn off in a multi-core CPU.

If you are lucky, it may be RAM.
Run memtest86+ from CD or USB, if it will run.

If nothing works, remove RAM sticks until only one remains.
Try each stick in turn in the same slot. ... then in another slot.

If everything is still dead, it's the CPU, motherboard, or PSU.

Don't go spending any money until you post back with your findings.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sat Feb 09, 2013 6:08 pm

Hmm, OK, I'll follow your advice.

For now, I think it would be very strange if the CPU were dead, because everything seems to work. I have run OCCT for over 10 hours on Windows without a single error. I keep getting this error every ~3 minutes, but the system doesn't hang and everything works.

I'll post further findings soon.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Feb 09, 2013 6:14 pm

rozyk,

Odd. Check the CPU temperature. You may need to install lm-sensors.

Is this a bought system or a home-built system?
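
For the temperature check, a minimal sketch of reading sensor values straight from the kernel's hwmon sysfs tree, assuming a suitable hwmon driver for the board or CPU is loaded. Under the hwmon sysfs convention, `temp*_input` files hold millidegrees Celsius. Which sensor corresponds to the CPU depends on the driver, so the output still needs interpreting; the helper names here are made up for illustration.

```python
import glob
import os


def millidegrees_to_celsius(raw: str) -> float:
    """Convert the text of a hwmon temp*_input file to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_hwmon_temperatures(root: str = "/sys/class/hwmon") -> dict:
    """Return {sysfs path: temperature in Celsius} for every readable sensor."""
    readings = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                readings[path] = millidegrees_to_celsius(handle.read())
        except (OSError, ValueError):
            # Some sensors are exposed but not readable; skip them.
            continue
    return readings


if __name__ == "__main__":
    for path, celsius in read_hwmon_temperatures().items():
        print(f"{path}: {celsius:.1f} C")
```

If nothing is printed, no hwmon driver is exposing temperatures yet, which is the situation where setting up lm-sensors would help.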
_________________
Regards,

NeddySeagoon

rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sat Feb 09, 2013 6:17 pm

Home-built system. CPU temperature is OK: ~43 °C right now, 55 °C max under OCCT. Voltages are also correct. There hasn't been a single hang or restart, but the error keeps showing up. I even lowered the RAM and CPU frequencies, but nothing changed.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Feb 09, 2013 6:38 pm

rozyk,

That's good.

Don't place too much reliance on the voltages reported by software. They are fairly long-term averages, over milliseconds anyway.
What's important is the dynamic regulation of the voltages provided to the CPU.
There is not a lot you can do to check that; you would need £1000s worth of 'scope.

If you happen to have a spare PSU, it's worth trying a PSU swap.
_________________
Regards,

NeddySeagoon

rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sat Feb 09, 2013 11:15 pm

I bumped Vcore a little, but with no effect. I will run memtest overnight. But is this warning really important and dangerous if the system is working normally? Maybe it's just some kind of bug?
rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sat Feb 09, 2013 11:40 pm

Update: I've also noticed that this error shows up every 5 minutes (exactly 5 minutes = 300 seconds). Isn't that weird? :roll:

Edit: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/993758 They have the same error here on Ubuntu, but the bug report was closed for a reason I don't understand. :(

Quote:
I've got an AMD Phenom(tm) II X4 B45 Processor, it's the unlocked X2 545, running at 1.43v with NB of 1.25 to keep the ram stable (at 1600).

I can pass any ram/memory test, stress tests work fine too. I've run prime for over 24hrs straight so I know this works well. I've got it at stock speeds, 3GHz, I can run it at ~1.45 @3.6GHz but don't due to seeing no point in the gain. I haven't had any hardware failures, crashes, no corruption, the only thing I ever dealt with in the last 3 or so years of doing this is 5 bad sectors in my 640G HDD. They weren't bad either, just corrupt, forced writes in linux and a new format later and it reports everything clean in SMART. I haven't checked it in a while but I'm sure it should be close to the same, I could do a quick check if anyone wants to see the info on it. It's an ASRock m3a770de motherboard, 3 hdds, two over 5 years old and still going strong (the third is the newish 640G). I'm running a GTX460, bios mod to increase voltage but can only do that in windows (due to lack of OC tools released in linux). I can't think of any other background history on the PC that should be relevant, BIOS is latest, that about sums it up.....

The problem is I keep getting MCEs, or minor kernel errors output to terminal through syslog. I've currently disabled syslog to keep the messages out, as they seem to happen randomly. Days go by, nothing happens, some days it's every second flooding my terminal. You can see why it's annoying. The thing is, they are all related to the CPU or the data bus. I haven't bothered to log them all, too lazy, though I wonder if this is related to my unlocked cores. My thoughts on this is that the cores require higher voltages, they don't require the 1.43v that I have. I can get away with the 1.425 (or whatever) and it runs fine stable there. 1.4v doesn't seem to work, so I figure if 1.42 is good I'll bump it up the one above just in case. I should also mention they only started happening with newer kernels, if I revert back to before 2.6.38 ( I believe?) they do not appear.

Now everything runs fine as I said, I was just wondering if there is a more elegant way of suppressing the messages. I also have a concern about failing chipset (MB) or CPU, though it seems to run everything just fine with no crashing. I'll take the off chance that it'll fail sooner than later, I paid $80 for a quad core when they had just come out (C2 revision, I know it's a first model). My other thought on it was that it might be the IPC, since it has troubles with the RAM at 1600 and that's why I upped the NB to 1.25... Any thoughts on this? If it's failing slowly, I don't care, within a year I'll probably go AM3+ and get a Phenom II x6, I don't see the reason currently to buy a BD or anything else when games play acceptably on this now. Maybe in 2-3 years, I just don't see games utilizing threads enough to say I need an x6 right now and a Phenom II x4 is still a nice chip.

Oh, I forgot, idle temps are generally at 32C, max runs around ~50C.


@found on the net
rozyk
n00b


Joined: 07 May 2010
Posts: 44

Posted: Sun Feb 10, 2013 12:24 am

Update2: I emerged mcelog and after running it I get:

Quote:
mcelog: AMD Processor family 16: Please load edac_mce_amd module.
: Success
CPU is unsupported


Maybe that's the reason and there isn't anything to worry about?
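
One way to follow up on that mcelog hint is to check whether `edac_mce_amd` is currently loaded as a module. The sketch below parses `/proc/modules`, where the first field of each line is the module name. The helper names are made up, and one caveat applies: a driver compiled into the kernel does not appear in `/proc/modules`, so a False result does not prove the decoder is absent. The already-decoded `[Hardware Error]` text earlier in the thread may well come from that in-kernel decoder, but the thread does not confirm it.

```python
def module_in_proc_modules(proc_modules_text: str, name: str) -> bool:
    """Return True if `name` is the first field of any line in /proc/modules text."""
    wanted = name.replace("-", "_")  # loaded module names use underscores
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


def module_is_loaded(name: str, proc_modules_path: str = "/proc/modules") -> bool:
    """Check the running system; returns False if the file cannot be read."""
    try:
        with open(proc_modules_path) as handle:
            return module_in_proc_modules(handle.read(), name)
    except OSError:
        return False


if __name__ == "__main__":
    print("edac_mce_amd loaded as module:", module_is_loaded("edac_mce_amd"))
```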
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Feb 10, 2013 11:07 am

rozyk,

Without reading the AMD CPU data sheet, I'm fairly sure that the internal CPU data structures, the caches and so on, have error correction.
Provided the errors are such that the error correction can handle them, the system will work almost normally and you will see these warnings.
It may be slightly slower, depending on the error correction, as the correction may take several clock cycles to operate.

The five-minute interval may well be the logger, rate-limiting to minimise log spam.

If it were my system, and the CPU were still under warranty, I would return the CPU for a replacement.
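
To confirm the spacing rather than eyeball it, here is a small sketch that measures the gap between successive error reports in kernel log text. It assumes the log lines carry the usual `[seconds.micros]` printk timestamps, which the lines quoted in this thread do not show, so the sample input in the test is hypothetical. Another possible explanation for a steady 300 seconds, if I recall the kernel's x86 machine-check sysfs documentation correctly, is that corrected errors are collected by a periodic poll whose `check_interval` defaults to five minutes; treat that as an assumption to verify, not a diagnosis.

```python
import re

# Matches a leading printk timestamp such as "[  600.000123]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def error_intervals(lines, marker="MC2_STATUS"):
    """Return the gaps in seconds between consecutive lines containing `marker`.

    Only the status line of each report is counted, so a multi-line
    report contributes a single timestamp.
    """
    times = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and marker in line:
            times.append(float(match.group(1)))
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    import sys

    # Example use: dmesg | python3 this_script.py
    print(error_intervals(sys.stdin))
```

A list of values all close to 300 would point at a timer of some kind rather than at errors that happen to occur at that rate.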
_________________
Regards,

NeddySeagoon

All times are GMT