Gentoo Forums
sys-apps/memtest86+-5.01 just rebooting
Gentoo Forums Forum Index » Kernel & Hardware
Quincy
Apprentice


Joined: 02 Jun 2005
Posts: 201
Location: Germany

Posted: Tue Feb 03, 2015 10:45 am    Post subject: sys-apps/memtest86+-5.01 just rebooting

Hi all,

I'm trying to track down RAM problems in some of my old hardware (Athlon 64 X2 Dual Core on an Asus A8N-E, 4x 512MB Kingston HyperX RAM), so I tried to install memtest86+-5.01 (testing). It builds and installs without errors, but when I reboot and select its entry in the grub1 menu, it just shows a blank screen and immediately reboots. Version 4.20-r1 (stable) works as expected.
I also tried installing it on a newer system, with the same result.

Has anybody managed to get this working? I have rarely found "testing" packages to be unusable, and I could not find any open bug about this.
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Tue Feb 03, 2015 3:09 pm

Just use a live CD like SystemRescueCd, which includes memtest.

And if you want to track down memory problems, I would not start with all modules fitted. I would remove as many components as possible and then check.

Just remove as much as possible and try again.

It could be an inadequate power supply (the power budget was miscalculated, so the PSU is undersized), conflicts between components, or other issues.

If you only have a few components in use and they work flawlessly, you can add more. If even these few do not work at all, it is the mobo or some other fault. Remove every piece that is not needed for basic operation: all network cards, sound cards, ...
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Tue Feb 03, 2015 7:58 pm

Quincy,

RAM problems may not be due to the RAM itself. To work properly the RAM depends on lots of other things.

As tw04l124 says. Remove everything you don't need. Then test with just one memory module fitted.
A favourite failure is the capacitors in the regulator for the CPU Vcore. These are located near the CPU.
Domed tops, bulging bottoms and/or leaking contents are all bad signs.

The Asus A8N-E is only a 4-layer board. These parts can be replaced if you get good-quality parts and are confident with a soldering iron.
I have a 12-year-old Asus A8N-E.

If you post some pictures of the CPU area, I'll look them over.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Quincy
Apprentice


Joined: 02 Jun 2005
Posts: 201
Location: Germany

Posted: Tue Feb 03, 2015 9:39 pm

Thank you both for your advice!

(Un?)fortunately I already know that my problem is the RAM timings, so there is no need to find the source. The problem so far is that I can only reproduce the error by copying files and comparing checksums afterwards, which reveals spurious bit flips (they go away completely with very relaxed RAM timings). memtest86+-4.20-r1 does not find any errors even after hours of testing, so I wanted to give the new version a try.
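For anyone wanting to repeat the copy-and-compare check described above, here is a minimal Python sketch. The function names, chunk size and run count are illustrative assumptions, not what was actually used in this thread: it hashes the source once, copies it repeatedly, and counts copies whose SHA-256 differs.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_compare(src: Path, dst: Path, runs: int = 10) -> int:
    """Copy src to dst `runs` times; return how many copies had a different checksum."""
    reference = sha256_of(src)
    mismatches = 0
    for i in range(runs):
        shutil.copyfile(src, dst)
        if sha256_of(dst) != reference:
            mismatches += 1
            print(f"run {i + 1}: checksum mismatch", file=sys.stderr)
    return mismatches
```

Usage would be something like `copy_and_compare(Path("/path/to/test.avi"), Path("/tmp/copy.avi"), runs=20)` (paths hypothetical). Note that on Linux the copy may be read back from the page cache, i.e. from RAM, which arguably still exercises the suspect memory; to force a read from disk you could drop caches between runs as root with `echo 3 > /proc/sys/vm/drop_caches`.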
I have now also tried the precompiled binary from the memtest86+ homepage (instead of the ebuild, just swapping the binaries) on my system, and it promptly reboots, too. A bootable USB stick, which I finally got working on the other system, does not boot on the Asus board.
So my "mission" is still to get it installed on the HDD...

@NeddySeagoon: I once had an Abit BE6 whose capacitors blew up after some years, so I know what you are talking about. I don't use it any more, but it had its capacitors replaced several times :-)
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Tue Feb 03, 2015 10:17 pm

Quincy,

Unplug a pair of RAM modules and test.
Swap the fitted pair for the removed pair, using the same motherboard sockets. Test again.

Are all four modules *identical* ?

Look at the chips on the RAM modules. There will be a long, difficult-to-read part number on them.
It will end with -x or -xx, where x or xx are digits. This indicates the speed of the chips. Are they all identical?

A few BIOSes only read the SPD PROM on the first memory stick and assume the rest are the same.
This can cause unintentional overclocking if the first one is faster than the others.

With the right app and kernel module, you can read the SPD PROMs yourself.
It's been a while since I needed to do it.
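On Linux, `decode-dimms` from i2c-tools (with a suitable EEPROM kernel driver loaded) can decode SPD contents directly. If you already have raw SPD dumps of each module, a rough Python sketch like the following (helper names are hypothetical) can check whether the modules report identical data and whether each dump's checksum is valid. It assumes the SDR/DDR/DDR2 SPD layout, where byte 2 is the memory type and byte 63 is a checksum over bytes 0 to 62; DDR3 and later use a different checksum scheme.

```python
from typing import List

# SPD byte 2 (fundamental memory type) codes for a few JEDEC generations.
MEMORY_TYPES = {
    0x04: "SDR SDRAM",
    0x07: "DDR SDRAM",
    0x08: "DDR2 SDRAM",
    0x0B: "DDR3 SDRAM",
}


def memory_type(spd: bytes) -> str:
    """Decode SPD byte 2 into a memory type name."""
    return MEMORY_TYPES.get(spd[2], f"unknown (0x{spd[2]:02X})")


def spd_checksum_ok(spd: bytes) -> bool:
    """SDR/DDR/DDR2 layout: byte 63 holds the low 8 bits of the sum of bytes 0..62."""
    return len(spd) >= 64 and (sum(spd[:63]) & 0xFF) == spd[63]


def differing_offsets(a: bytes, b: bytes) -> List[int]:
    """Byte offsets at which two SPD dumps differ (extra trailing bytes count as differences)."""
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    diffs.extend(range(common, max(len(a), len(b))))
    return diffs
```

An empty `differing_offsets()` result for every pair of modules only means they report identical SPD data; it says nothing about whether the chips actually behave identically.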

If it's been OK for years, I suspect it's a gradual deterioration in the transient response of the Vcore and/or RAM power supply on the motherboard.
That's back to those capacitors again. It may be fixed by moving the RAM modules around, which will 'wipe' the connectors, and by removing and replacing the power connectors to the motherboard, again wiping the contacts, which reduces the contact resistance for a while.
While the power plugs are removed, inspect them for signs of overheating. This is a sign that the contacts have failed. It's unusual to find the black wires (0 V) with a problem, as there are so many of them, but the others may fail. The hot spot forms when the contact resistance increases.
The voltage loss is proportional to the current and the resistance. Eventually, the on-board regulator no longer has enough input voltage to regulate properly. At first, the transient performance is affected, producing poor performance when the load changes rapidly, like when the CPU goes from idle to flat out in one clock cycle.
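The relationship described above is just Ohm's law plus Joule heating. With R_c the contact resistance of a connector pin, I the current through it and V_PSU the voltage delivered by the PSU (the numbers in the last line are purely illustrative, not measurements from this board):

```latex
\begin{aligned}
V_{\text{drop}} &= I\,R_c \\
V_{\text{in}} &= V_{\text{PSU}} - I\,R_c \\
P_{\text{contact}} &= I^2 R_c \\[4pt]
I = 10\,\text{A},\; R_c = 0.05\,\Omega \;&\Rightarrow\; V_{\text{drop}} = 0.5\,\text{V},\quad P_{\text{contact}} = 5\,\text{W}
\end{aligned}
```

The I^2 term is why a slightly degraded contact under heavy load turns into a visible hot spot, while the same contact at idle looks fine.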

If you have a multimeter, you can measure the voltages on the backs of the connectors (where the wires from the PSU go in) and again on the motherboard.
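If you take those two readings, a small Python sketch (hypothetical helper names) turns them into the voltage lost across the connector and, if you can estimate the current on that rail, a contact resistance. The ±5% band used here is the usual ATX tolerance for the +3.3 V, +5 V and +12 V rails; treat it as a rough guide rather than a pass/fail verdict.

```python
def connector_drop(v_wire_side: float, v_board_side: float) -> float:
    """Voltage lost between the back of the connector (wire side) and the board side."""
    return v_wire_side - v_board_side


def contact_resistance(v_wire_side: float, v_board_side: float, current_a: float) -> float:
    """Estimate contact resistance in ohms via Ohm's law, R = V / I."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return connector_drop(v_wire_side, v_board_side) / current_a


def within_tolerance(nominal: float, measured: float, tolerance: float = 0.05) -> bool:
    """True if measured is within +/- tolerance (as a fraction) of nominal."""
    return abs(measured - nominal) <= tolerance * abs(nominal)


if __name__ == "__main__":
    # Hypothetical readings on a +12 V pin at an assumed 10 A load.
    wire, board = 12.05, 11.80
    print(f"drop: {connector_drop(wire, board):.2f} V")
    print(f"R_contact: {contact_resistance(wire, board, 10.0) * 1000:.1f} mOhm")
    print(f"board side within ATX tolerance: {within_tolerance(12.0, board)}")
```

Estimating the current is the weak point; without it, the raw drop between the two readings is still a useful comparison between pins and between before/after reseating the plugs.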
Regards,

NeddySeagoon

Quincy
Apprentice


Joined: 02 Jun 2005
Posts: 201
Location: Germany

Posted: Wed Feb 04, 2015 1:52 am

Thanks again for your advice, but I am already a few steps further in making the system usable again. The point of this thread was getting memtest86+-5.01 to run, and I did not want to bother you with the underlying problem you are now digging into.

To explain the story a bit more, and to make clear that we don't need to discuss the memory issue at that level of detail (I will still keep the possible electrical issue in mind):
- I originally noticed that files were no longer identical after copying them on this hardware from one HDD to another. The odd thing is that out of 70 GB of data in ~18K files, certain files are affected almost every time, plus a few others on top of that, but in total only a few files per run are affected at all. Additionally, the changes are mostly in bit 7 and, less often, bit 0 of a byte, in both directions, with certain "value pairs" being most prominent (0x7F -> 0xFF and 0x80 -> 0x00). So there is some kind of system behind it.
- There were also spurious changes in the files that came and went, and even errors during checking that could not be reproduced in a second check of the very same data. So I reasoned that the data was being changed somewhere in between. Swapping SATA cables and even copying data from a PATA disk did not change anything, so I suspected the RAM.
- I went into the BIOS, checked the RAM timings and realized that they were OK for one pair (dual channel) of my Kingston HyperX KHX3200AK2/1G (DDR1, timings 2-3-2-6-1) but not for 2 pairs of the very same modules occupying all 4 slots of the mainboard (according to various overclockers on the net). "Automatic" settings decrease the RAM clock from 400 MHz to 333 MHz as a kind of safety/stability measure when all slots are used. With these automatic settings I could not find any errors, but the system was significantly slower. So I wanted to figure out which settings are still stable and what causes the errors (and why the errors are somewhat systematic).
- Copying and comparing this much data to reproduce the errors and test the timings did not look very attractive, so I started running memtest86+-4.20 (the Gentoo stable version). Surprisingly, it did not find anything in 10 passes over almost 9 hours. At that point I wanted to try the testing version in the tree (5.01) to see whether it makes any difference. While doing that, I ran into the memtest problem described in my original post, which is still unresolved.
- After that I tried copying just a single file of random data (10 GB from /dev/urandom) to trigger errors, but this never produced any error with the original RAM settings. Copying the whole set of files in the same session still failed.
- In the meantime I experimented a bit and found that I can use a single file (AVI, 23 MB) that seems to be special: it gets corrupted almost every time (>90% of runs), which makes it an even quicker test than memtest. I do not see what makes this file prone to these errors, but neither memtest's random patterns nor 10 GB of urandom data seem to contain the "magic pattern".
- I also ruled out that an HDD-to-HDD transfer is required: it also fails when the file is simply copied on the same HDD.
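The bit-7/bit-0 statistics and "value pairs" listed above can be collected with a short Python sketch like this one. The function name and approach are illustrative assumptions: it simply XORs the original and the corrupted copy byte by byte and tallies which bit positions flipped and which (original, copy) byte pairs occurred.

```python
from collections import Counter
from typing import Tuple


def analyze_bit_flips(original: bytes, copy: bytes) -> Tuple[Counter, Counter]:
    """Compare two equal-length buffers byte by byte.

    Returns (bit_counts, pair_counts):
      bit_counts  maps bit position 0..7 to how often that bit differed,
      pair_counts maps (original_byte, copied_byte) to how often that change occurred.
    """
    if len(original) != len(copy):
        raise ValueError("buffers must have equal length")
    bit_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for a, b in zip(original, copy):
        diff = a ^ b  # set bits mark flipped positions
        if diff:
            pair_counts[(a, b)] += 1
            for bit in range(8):
                if diff & (1 << bit):
                    bit_counts[bit] += 1
    return bit_counts, pair_counts
```

For a 23 MB file it is fine to read both versions with `Path.read_bytes()` and pass them in; for multi-gigabyte files you would compare in chunks instead. A 0x7F -> 0xFF or 0x80 -> 0x00 change both show up as a flip of bit 7.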

So my open questions are:
- Gentoo-wise: Why does memtest86+-5.01 reboot immediately on this system (and on others too)?
- Out of curiosity: Why are the errors
a) somewhat systematic?
b) occurring only with "special" data?
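One way to probe question b) would be to generate test files that repeat the byte values seen in the corrupted pairs and push them through the same copy-and-compare procedure as the real files. This is only an experiment idea, not a known cause; the pattern, file name and sizes are assumptions. A minimal Python sketch:

```python
from pathlib import Path


def write_pattern_file(path: Path, pattern: bytes, size: int, chunk_reps: int = 65536) -> None:
    """Write exactly `size` bytes of `pattern` repeated; the last repetition may be cut short.

    Data is written in chunks of `chunk_reps` pattern repetitions so that
    multi-gigabyte files do not have to be built in memory first.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    if size < 0 or chunk_reps < 1:
        raise ValueError("size must be >= 0 and chunk_reps >= 1")
    chunk = pattern * chunk_reps  # a whole multiple of the pattern, so alignment is kept
    remaining = size
    with path.open("wb") as f:
        while remaining > 0:
            piece = chunk[:remaining]
            f.write(piece)
            remaining -= len(piece)
```

For example, `write_pattern_file(Path("pattern_7f80.bin"), b"\x7f\x80", 100 * 1024 * 1024)` would create a hypothetical 100 MB file alternating the two most frequently corrupted byte values.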
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Wed Feb 04, 2015 9:01 am

Overclocking your RAM can cause these effects; basically, you answered your whole question yourself. Though I have only read the last paragraph of your post...

Quote:
"Automatic" settings decrease RAM clock from 400 MHz to 333 MHz as a kind of security/stability measure when all slots are used. With this automatic settings I could not find an error anymore,


Quote:
(Un?)fortunately I already know that my problem are RAM timings, there is no need to find the source.


Well, my advice: just get faster second-hand RAM, and maybe you can increase your RAM voltage. But that is overclocking and another topic...

Hardware from some brands can have weird effects and can behave differently because something degraded over time. Yes, that's a fishy explanation, but it sums it up.
Quincy
Apprentice


Joined: 02 Jun 2005
Posts: 201
Location: Germany

Posted: Wed Feb 04, 2015 9:43 am

You should read the full story first. The RAM is not overclocked per se as it is specified for this speed: http://www.ec.kingston.com/ecom/configurator_new/PartsInfo_disc.asp?root=&LinkBack=&ktcpartno=KHX3200AK2/1G
Furthermore, this has nothing to do with memtest86+-5.01 not running, as that a) happens on another computer as well and b) also happens at 333 MHz.
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Wed Feb 04, 2015 10:04 am

No offense.

I sometimes read overclocker pages, and even at standard specs some RAM needs more voltage than specified, e.g. OCZ or whatever they are called. Hardware suppliers claim a lot to be within spec, but some brands do not work well where others do. In an ideal world everything would run fine.

Does your box have any issues when the RAM is running at 333 MHz or not?

Off topic: what people tend to forget is that every piece of hardware has its own logic and firmware, and the firmware can be buggy as hell. Even when it is claimed to work, it may or may not work...

Just out of curiosity: why don't you get some other RAM from the second-hand market, or just borrow some for a few tests, so you can be absolutely sure it is not the mobo...

The BIOS is up to date, right?

Another out-of-curiosity question: why don't you ditch the mobo / RAM / CPU and get something second-hand that works? I assume the hardware is kind of old, and it seems to have a defect somewhere that cannot be tracked down at all. You claim it is the RAM, but it could be the BIOS / mobo or the RAM...


Maybe I just don't get it, so I want to ask again:

Do you believe memtest is faulty?
Or is your hardware maybe faulty?
Quincy
Apprentice


Joined: 02 Jun 2005
Posts: 201
Location: Germany

Posted: Wed Feb 04, 2015 1:06 pm

tw04l124 wrote:
I sometimes read overclocker pages, and even at standard specs some RAM needs more voltage than specified, e.g. OCZ or whatever they are called. Hardware suppliers claim a lot to be within spec, but some brands do not work well where others do. In an ideal world everything would run fine.

Yeah, it would be best if PC hardware would "just work" :-)

tw04l124 wrote:
Does your box have any issues when the RAM is running at 333 MHz or not?

As far as I can judge, it does not have any issues, except being slower in terms of clock frequency and memory bandwidth (I did not do a "real use" test).

tw04l124 wrote:
Off topic: what people tend to forget is that every piece of hardware has its own logic and firmware, and the firmware can be buggy as hell. Even when it is claimed to work, it may or may not work...

That's one of the reasons why things don't "just work". Firmware and hardware are designed and implemented by humans, and humans make mistakes.

tw04l124 wrote:
Just out of curiosity: why don't you get some other RAM from the second-hand market, or just borrow some for a few tests, so you can be absolutely sure it is not the mobo...

I don't want to invest money in that old system, even if it is only 10 EUR. The other option would be to buy a completely new system (with new bugs :D).

tw04l124 wrote:
The BIOS is up to date, right?

Yep.

tw04l124 wrote:
Another out-of-curiosity question: why don't you ditch the mobo / RAM / CPU and get something second-hand that works? I assume the hardware is kind of old, and it seems to have a defect somewhere that cannot be tracked down at all. You claim it is the RAM, but it could be the BIOS / mobo or the RAM...

See above. Plus, getting things to work again, or at least understanding why they are broken, is better than just throwing them away.
After searching the net for hours (mostly overclocker forums, even though I care more about stable than fast), it seems to be a common problem to have all 4 RAM slots populated on this (and other) boards. The clock has to be reduced, or at least the timings have to be relaxed (which doesn't help in my case).

tw04l124 wrote:
Do you believe memtest is faulty?

Yes, and now there are even two things I would like to know:
1. (This was my original question here!) Why is memtest version 5.01 not working at all on several PCs when installed via the ebuild? Should I file a bug, or is there anybody here in the forums for whom it "just works", or who found that something simple had to be changed?
2. Why is memtest 4.20 not finding any problem with the mobo/RAM/timing combination that I can demonstrate with the file copy-and-compare method? Is it just testing in a different way and/or testing the wrong thing? Should we skip memtest and rather use methods like this to test system stability/reliability, even when the problem is clearly memory-related?

tw04l124 wrote:
Or is your hardware maybe faulty?

I wouldn't say "faulty" but "badly designed" (see above), as this seems to be known to some extent and is not a specifically broken board or memory module.

Therefore I would like answers to my two questions about memtest, rather than bothering you and others with my old hardware, which was obviously a poor choice years ago. It's not that I don't appreciate your help, but I wanted to keep this short for others from the very beginning (which obviously didn't work).
All times are GMT