Gentoo Forums
[solved] Identifying hardware issue
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Fri Dec 08, 2017 4:17 pm    Post subject: [solved] Identifying hardware issue

Dear Community,

I split this off from the GCC-7.2 thread to keep that topic focused on GCC matters.

Several segmentation faults occur when compiling different packages. Most commonly, llvm and libreoffice fail.

I re-checked my RAM with Memtest (for the 4th time), and now it failed again, although the previous check had finished without errors. This makes me believe that some component other than the RAM might be the issue, since a RAM fault should be reproducible every time, I guess?


Maybe my Intel Core i7 was damaged because the thermal paste had degraded (it ran at 99 °C for a while; it is cool again now that I have replaced the thermal paste. My PC is 4 years old and I would like to try to repair it).

Do you have an idea how I can check all components? Would burn-in tests like Prime95 or something similar be a good choice?

Maybe the package "CPU-burn"?
_________________
___________________
Regards

Spargeltarzan

Notebook: Lenovo YOGA 900-13ISK: Gentoo stable amd64, GNOME systemd, KVM/QEMU
Desktop-PC: Intel Core i7-4770K, 8GB Ram, AMD Radeon R9 280X, ZFS Storage, GNOME openrc, Dantrell, Xen


Last edited by Spargeltarzan on Wed Jan 17, 2018 7:13 pm; edited 1 time in total
krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Fri Dec 08, 2017 4:46 pm

I would say the best package for this is app-admin/mcelog.
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Sun Dec 10, 2017 10:17 am

Thanks for the advice. I recompiled llvm with mcelog running:

[2849/3151] /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_OBJC_REWRITER -DCLANG_ENABLE_STAT$
ninja: build stopped: subcommand failed.


The recompile was unsuccessful, but mcelog detected no errors. (Remember that llvm compiled successfully on my system only about 1-2 times out of 8 attempts.)

Any idea how I could proceed?
krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Sun Dec 10, 2017 11:21 am

It works like this: the kernel logs CPU MCEs (machine check exceptions) in dmesg, and mcelog reads dmesg to output the logged errors.
This means your kernel should have CONFIG_X86_MCE and CONFIG_X86_MCE_INTEL enabled; I'm not sure whether the ebuild checks that they are.
But if they are in your kernel, it means your CPU didn't report anything nasty (which, well, is good news).
Intel CPUs also report heat errors through MCE (so again good news: your paste has done its job).
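
For anyone wanting to verify those options quickly, here is a minimal sketch. It assumes the kernel config is available as a plain-text file (on Gentoo that is often /usr/src/linux/.config, but the path is an assumption), and check_kconfig is just a name made up for illustration:

```shell
#!/bin/sh
# Report whether each given option is set to "y" in a kernel config file.
# Usage: check_kconfig CONFIG_FILE OPTION...
# (check_kconfig is a hypothetical helper, not part of any package.)
check_kconfig() {
    config_file=$1
    shift
    for option in "$@"; do
        # Anchor both ends so CONFIG_X86_MCE does not match CONFIG_X86_MCE_INTEL.
        if grep -q "^${option}=y\$" "$config_file"; then
            echo "${option}: enabled"
        else
            echo "${option}: NOT enabled"
        fi
    done
}

# Example call (the config path is an assumption, adjust to your system):
#   check_kconfig /usr/src/linux/.config CONFIG_X86_MCE CONFIG_X86_MCE_INTEL
```

This only inspects the config file it is given, so it says nothing about the running kernel if that was built from a different config.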

But not all compile errors come from hardware: if you hit OOM, it's not a hardware error, yet the result is still a build failure.
I would say, try building it with -j1, because 8 GB can be used up easily. If it succeeds, repeat 8 times :)
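
One way to tell an OOM kill apart from a hardware fault is to search the kernel log for OOM-killer messages right after a failed build. A rough sketch, assuming the kernel log can be read with dmesg; count_oom_lines is a made-up helper name:

```shell
#!/bin/sh
# Count kernel log lines (read from stdin) that mention the OOM killer.
# Prints 0 when there are none.
# (count_oom_lines is a hypothetical helper name.)
count_oom_lines() {
    # grep -c exits non-zero when nothing matches; "|| true" hides that
    # so the function is safe to use under "set -e".
    grep -ciE 'out of memory|oom-killer' || true
}

# On the affected machine (may need root):
#   dmesg | count_oom_lines
```

A result of 0 after a failed build suggests the failure was not an OOM kill, although an old kernel log that has already wrapped could hide earlier events.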
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Sun Dec 10, 2017 11:33 am

Krinn,

I had to enable X86_MCELOG_LEGACY in order to be able to start mcelog. I have now checked CONFIG_X86_MCE and CONFIG_X86_MCE_INTEL, and they are enabled too.

I repeated the build with emerge -1 -j1 llvm, and it crashed already at 52/3151 with "subcommand failed" after a few seconds. (My system had not been rebooted since the last crash.) Could this tell us something?

Again, no log entry in /var/log/mcelog or from mcelog --client.

I will reboot now and try again; maybe the last "subcommand failed" left some fault in the build environment before the reboot.

ADD: After the reboot it again crashed instantly at 52/3151. I am now compiling with "NINJAOPTS="-j2" emerge -1av llvm", because my mapping from MAKEOPTS to NINJAOPTS is not working correctly (I noticed this earlier; any idea what could be misconfigured?). It is currently at 570/3151; I will let you know if it succeeds.

ADD: With NINJAOPTS="-j2" it crashed (while I ran a second emerge of another package).
ADD: With NINJAOPTS="-j4" it crashed (while browsing the Internet).
ADD: With NINJAOPTS="-j1" it was successful, in 3 h 7 min instead of the usual 50 min.
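
To put a number on how often the build fails at a given job count, the same command can be repeated and the failures counted. A rough sketch; count_failures is a hypothetical helper, not part of Portage:

```shell
#!/bin/sh
# Run a command a given number of times and print how many runs failed.
# Usage: count_failures RUNS COMMAND [ARGS...]
# (count_failures is a hypothetical helper name.)
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Any non-zero exit status counts as a failed run.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (as root; each llvm run may take hours):
#   count_failures 8 env NINJAOPTS="-j1" emerge -1 llvm
```

The build output is discarded here to keep the sketch short; for real troubleshooting it would make more sense to keep the logs of the failed runs.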

On my notebook, which also has 8 GB of RAM, llvm always compiles smoothly.
mcelog is always clean.

My RAM is never more than ~60% used; with NINJAOPTS -j1 only around ~38%.
I do not want to believe llvm is so sensitive on my system... 3 h for llvm.
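
For reference, a usage figure like the ones above can be computed from /proc/meminfo. This is only a sketch, assuming a kernel that exposes MemAvailable there; mem_used_percent is a made-up name and not necessarily how the numbers above were obtained:

```shell
#!/bin/sh
# Print the percentage of RAM in use, computed from a meminfo-style file
# as (MemTotal - MemAvailable) / MemTotal, rounded down.
# (mem_used_percent is a hypothetical helper name.)
mem_used_percent() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
    ' "$meminfo"
}

# Sample every 5 seconds while a build runs in another terminal:
#   while sleep 5; do mem_used_percent; done
```

Sampling like this can miss short spikes, so a low reading does not fully rule out a brief out-of-memory situation.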
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sun Dec 10, 2017 8:00 pm

You seem to have a K-series CPU like mine, though mine is quite a bit older.

Try underclocking and see if it makes any difference.

I had to replace the heatsink compound once already though luckily I replaced it before any compile errors occurred. I just noted the temperatures were increasing for no good reason; after cleaning the heat sink really well and replacing the thermal interface material, temperatures went back down to what they used to be.

The newer Intel CPUs should throttle when hot, so this isn't a good sign. You might also want to consider motherboard or PSU issues.
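
Whether the CPU has actually been throttling can be checked from the throttle counters the kernel keeps in sysfs. A minimal sketch, assuming an x86 kernel that exposes thermal_throttle/core_throttle_count under /sys/devices/system/cpu (this depends on the kernel configuration); total_throttle_events is a made-up name:

```shell
#!/bin/sh
# Sum the per-CPU thermal throttle event counters below a sysfs CPU directory.
# The default path is an assumption: the counters exist only on x86 kernels
# built with thermal monitoring support.
# (total_throttle_events is a hypothetical helper name.)
total_throttle_events() {
    cpu_root=${1:-/sys/devices/system/cpu}
    total=0
    for f in "$cpu_root"/cpu*/thermal_throttle/core_throttle_count; do
        [ -r "$f" ] || continue   # skip when the glob matched nothing
        count=$(cat "$f")
        total=$((total + count))
    done
    echo "$total"
}

# Compare the value before and after a long compile:
#   total_throttle_events
```

The counters are kept per logical CPU, so the absolute total matters less than whether it grows during a build.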
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Sun Dec 10, 2017 9:54 pm

I have already cleaned my heat sink and replaced the thermal paste, so temperatures are good now (under load too). Before that, the maximum was 99 °C, and the CPU throttled at this threshold.

Since mcelog shows no issues (as Krinn already said), there is probably no CPU damage.
I will try your suggestion to underclock the machine.

Currently the hard facts are:
-) Compilation succeeds only with NINJAOPTS -j1 (I will re-test this more often, as Krinn proposed; a single test takes ~3 h).
-) Memtest86 reports memory errors, but not reproducibly across several tests.

I am considering motherboard and PSU issues, but I have neither available for testing. So I am thinking of taking my computer to a service center...
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sun Dec 10, 2017 10:16 pm

Oddly enough, if you run out of memory and have to swap, during the dead time the CPU gets a breather and can cool down...

:)
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Sun Dec 10, 2017 10:30 pm

Unfortunately, I do not run out of memory :D
Spargeltarzan
Guru

Joined: 23 Jul 2017
Posts: 317

Posted: Mon Dec 11, 2017 9:00 pm

Today I took my RAM to my retailer, and he found faults too. The modules are under warranty, with a 4-6 week wait... many thanks to Corsair!

ADD: With the new RAM there are no more segmentation faults. The issue can be closed. Thanks for the support!
All times are GMT