Author |
Message |
treffer Apprentice
Joined: 14 Dec 2004 Posts: 150
|
Posted: Fri Jan 24, 2014 10:37 pm Post subject: What to do after removing bad ram? |
|
|
Hi,
I recently noticed that my Gentoo laptop was extremely unstable: emerge would randomly fail to build a package, the Android source tree was nearly impossible to build, and there were random graphics glitches under load and full system crashes.
It turns out that one memory module was faulty. Only a few cells out of 16 GB of RAM were bad, few enough to go undetected for at least a month.
I do not trust binaries/data produced on the system. (Obviously)
Is there a recommended way to gain at least some confidence about the state of my system?
I'm currently recompiling the kernel, since bit flips and errors in that binary could damage hardware. I'm also thinking about re-emerging @system. Is there anything else I could try, short of a heavy @world recompile? (My @world is HUGE: KDE, GNOME, MATE, all the TeX packages, Java + IDEs, ...) _________________ root@localhost# whois POEM-RIPE55-SONG
root@localhost# : ( ) { : | : & } ; : |
|
|
|
PaulBredbury Watchman
Joined: 14 Jul 2005 Posts: 7310
|
Posted: Fri Jan 24, 2014 10:53 pm Post subject: |
|
|
Running memtest overnight can give some confidence.
It's not a comprehensive test, because other system components aren't being stressed at the same time. |
|
|
|
treffer Apprentice
Joined: 14 Dec 2004 Posts: 150
|
Posted: Fri Jan 24, 2014 11:35 pm Post subject: |
|
|
PaulBredbury wrote: | Running memtest overnight can give some confidence. |
That's how I found the broken RAM. The problem is that I know it corrupted builds. What I don't know is whether a broken binary made it through a build (e.g. a .o file was damaged inside a large function and the resulting .so will crash any caller). _________________ root@localhost# whois POEM-RIPE55-SONG
root@localhost# : ( ) { : | : & } ; : |
|
|
|
Logicien Veteran
Joined: 16 Sep 2005 Posts: 1555 Location: Montréal
|
Posted: Sat Jan 25, 2014 1:17 am Post subject: |
|
|
Now that you have removed the bad RAM, if you no longer see the behavior you described while it was in service
treffer wrote: | (emerge would randomly fail to build a package, the android source tree was near impossible to build, random graphics glitches under load and full system crashes) |
that is an indication that your emerge builds are good. If you are not completely sure, you can use the emerge option
Code: | --emptytree (-e)
Reinstalls target atoms and their entire deep dependency tree, as though no packages are currently installed. You should run this with --pretend first to make sure the
result is what you expect. |
But some doubt remains even with this option: if the tools used for the rebuild (GCC, glibc, ld and so on) are themselves broken, the resulting binaries can be broken too. At that point I see no solution other than using another Gentoo host to recompile your entire world, if possible, or reinstalling Gentoo from scratch. _________________ Paul |
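As a rough sketch of the --emptytree advice above: preview the rebuild with --pretend first, and only start the real run if the preview succeeds. The function name `full_rebuild` and the `EMERGE` override variable are assumptions made for this illustration (the override only exists so the sketch can be tried without touching the system); the emerge options themselves are the ones quoted from the man page.

```shell
#!/bin/sh
# Sketch only: preview, then run, a full --emptytree rebuild of @world.
# EMERGE is a hypothetical override so the function can be exercised
# harmlessly (e.g. EMERGE=echo); by default it calls the real emerge.
EMERGE="${EMERGE:-emerge}"

full_rebuild() {
    # Step 1: list what would be rebuilt, without changing anything.
    "$EMERGE" --pretend --emptytree @world || return 1
    # Step 2: the actual rebuild of every package and its deep dependencies.
    "$EMERGE" --emptytree @world
}
```

Running `EMERGE=echo; full_rebuild` only prints the two argument lists, which is a cheap way to check the order before calling `full_rebuild` for real as root. As noted above, this does not help if the compiler and libc doing the rebuild are themselves damaged.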
|
|
|
shazeal Apprentice
Joined: 03 May 2006 Posts: 206 Location: New Zealand
|
Posted: Sat Jan 25, 2014 8:13 am Post subject: |
|
|
treffer wrote: | PaulBredbury wrote: | Running memtest overnight can give some confidence. |
That's how I found the broken RAM. The problem is that I know it corrupted builds. What I don't know is whether a broken binary made it through a build (e.g. a .o file was damaged inside a large function and the resulting .so will crash any caller). |
You can 'emerge -e world' as above. I had a similar issue some time ago. I ran emerge -e, but gcc and glibc were themselves corrupted, so things did not build correctly. I ended up reinstalling the system using the old world file, as I didn't trust my backups either.
If you can run a full -e then your system should be fine. _________________ CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go" |
|
|
|
PaulBredbury Watchman
Joined: 14 Jul 2005 Posts: 7310
|
Posted: Sat Jan 25, 2014 10:25 am Post subject: |
|
|
You should run memtest again, like I said, to gain confidence that your system is now reliable.
Then recompile everything, from the bottom up. |
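One possible reading of "from the bottom up" is: toolchain first, then @system, then @world. The sketch below only illustrates that ordering. The `run` helper, the `DRY_RUN` switch and the function name `bottom_up_rebuild` are hypothetical, and the choice of toolchain packages (sys-devel/binutils, sys-devel/gcc, sys-libs/glibc) is an assumption about what counts as the "bottom", not something stated in the thread.

```shell
#!/bin/sh
# Sketch only: rebuild in "bottom up" order under the assumptions above.
# DRY_RUN is a hypothetical switch: 1 (the default) prints the commands,
# anything else executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"   # show the command instead of running it
    else
        "$@"
    fi
}

bottom_up_rebuild() {
    # 1. Toolchain first, so later builds use freshly compiled tools.
    run emerge --oneshot sys-devel/binutils sys-devel/gcc sys-libs/glibc || return 1
    # 2. The base system.
    run emerge --emptytree @system || return 1
    # 3. Everything else.
    run emerge --emptytree @world
}
```

With the default `DRY_RUN=1`, calling `bottom_up_rebuild` just prints the three emerge commands in order; setting `DRY_RUN=0` would execute them. If the existing compiler is already corrupted, step 1 may fail or produce bad tools, which is the case where a reinstall or a second host was suggested earlier in the thread.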
|
|
|
|