Gentoo Forums
Boot problems with 2004.0 and 4GB RAM
mieses
Tux's lil' helper

Joined: 28 Feb 2004
Posts: 86

Posted: Wed Mar 03, 2004 11:27 am    Post subject: Boot problems with 2004.0 and 4GB RAM

I am having problems with 4x1GB RAM on 2 separate SK8V boards, and 2 different CPUs: Opteron 140 and 148 (FX51). In each case, only one SATA drive is attached to the onboard VIA VT8237 SATA RAID, one Nvidia Quadro4 is installed in the AGP slot, and one DVD+RW is installed. It is a very basic setup except for the large amount of RAM.

The 2004.0 install CD and the 20040223 LiveCD both boot and install very nicely with a pair of 1GB modules, but both freeze up during boot when all 4 slots are filled with 1GB modules. I can install and boot into Gentoo with 2x1GB. But when I power off and add the other 2x1GB, it freezes on bootup.

The memory modules passed memtest86+ when tested individually. I did have to tweak a few BIOS settings to get all 4 modules to pass memtest86+ together:
- DRAM over 4G remapping=Enabled
- DDR Voltage=2.7V
- Master ECC=Enable
- DRAM ECC=Enabled
- ECC Chipkill=Enabled

The 350W Antec PSU is a possible culprit. Each DIMM reportedly draws between 25 and 29 amps. Next I will experiment with a 450W PSU.
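As a rough sanity check, the figures quoted above can be multiplied out with P = V × I. This is only a sketch that takes the post's numbers at face value (2.7V DDR voltage, 25 to 29 amps per DIMM, 4 DIMMs, 350W supply); it ignores regulator losses and every other component in the system, and the function name is made up for illustration.

```python
# Sanity check of the figures quoted in the post, taken at face value.
DDR_VOLTAGE_V = 2.7            # "DDR Voltage=2.7V" BIOS setting from the post
AMPS_PER_DIMM = (25.0, 29.0)   # per-DIMM current range quoted in the post
DIMM_COUNT = 4
PSU_WATTS = 350.0              # rating of the Antec supply mentioned in the post


def dimm_power_watts(volts, amps, count):
    """Power drawn by `count` DIMMs at the given voltage and current (P = V * I)."""
    return volts * amps * count


low = dimm_power_watts(DDR_VOLTAGE_V, AMPS_PER_DIMM[0], DIMM_COUNT)
high = dimm_power_watts(DDR_VOLTAGE_V, AMPS_PER_DIMM[1], DIMM_COUNT)
print(f"{DIMM_COUNT} DIMMs: {low:.1f} W to {high:.1f} W of a {PSU_WATTS:.0f} W supply")
```

Taken literally, the memory alone would account for roughly 270 to 313 watts, which is most of the supply's rating. That seems implausibly high for four DIMMs, so the quoted per-DIMM current is probably worth double-checking against the module datasheet before blaming the PSU.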

When booting gentoo-nofb with all 4GB installed, the following messages have appeared:

CPU0: Machine Check Exception: 4 Bank 0 : f6172...
TSC21f972ec9d
CPU0: Machine Check Exception: 4 Bank 2 : d0004...
TSC21f972f08a

hde: 312581808 sectors (160041MB) w/8192KiB Cache, ... /target/lun0:

... a long pause of about 10 to 15 seconds, then it very slowly steps through:

dma_timer_expiry : dma status = 0x21
hde : DMA timeout error
hde : DMA timeout error : status = 0xd0 {Busy}
hde : DMA disabled
ide2 : reset success

... long pause ...

dma_timer_expiry : dma status = 0x21

... then it scrolls too fast for me to record until stopping on

---Attempting to mount CD

None of these errors appear when booting with just 2x1GB modules, everything else being the same.
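If the boot messages can be captured to a text file (for example over a serial console, which is an assumption and not something done in this thread), a short script can tally the Machine Check Exception lines per CPU and bank so that 2x1GB and 4x1GB boots can be compared. This is a minimal sketch: the regular expression is based only on the two lines quoted above, and `count_mce_by_bank` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in the post, e.g.
# "CPU0: Machine Check Exception: 4 Bank 0 : f6172..."
MCE_RE = re.compile(
    r"CPU(?P<cpu>\d+): Machine Check Exception:\s*[0-9a-fA-F]+\s+Bank\s+(?P<bank>\d+)"
)


def count_mce_by_bank(log_text):
    """Return a Counter mapping (cpu, bank) to the number of MCE lines seen."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MCE_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), int(match.group("bank")))] += 1
    return counts


# Sample input copied from the messages in the post (truncated values kept as-is).
sample = """CPU0: Machine Check Exception: 4 Bank 0 : f6172...
TSC21f972ec9d
CPU0: Machine Check Exception: 4 Bank 2 : d0004...
TSC21f972f08a
hde : DMA timeout error
"""

for (cpu, bank), n in sorted(count_mce_by_bank(sample).items()):
    print(f"CPU{cpu} bank {bank}: {n} event(s)")
```

The script only counts events; what each bank number means depends on the CPU and should be looked up in the processor documentation.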

When booting with the regular gentoo or smp kernels, it freezes on the splash screen. The progress bar gutter appears, but the progress bar itself never appears.

Any ideas as to what is happening?
Hello
n00b

Joined: 15 Jan 2003
Posts: 5

Posted: Sat Mar 06, 2004 6:26 am

I have a similar problem with my ASUS SK8N motherboard. I have 4 sticks of 512MB Corsair RAM for a total of 2 gigabytes. With ECC disabled, memory errors cause all sorts of problems, from programs not compiling, to kernel panics, to filesystem corruption. Enabling ECC eliminates my problems (or at least all the problems that I can notice), but I do get memory error messages at the terminal that say something to the effect of "you have a memory error, but ECC fixed it".

From my understanding of hardware, most motherboards have trouble handling 4 DIMMs of unregistered memory due to electrical interference on the motherboard. Motherboards with registered memory should be able to handle plenty of DIMMs, but this assumes the motherboard is designed and built properly.

Usually, the solution is to lower the speed or latency of the RAM, but sometimes the motherboard just cannot run stably with the 4 DIMMs. In this case, use fewer DIMMs or try changing motherboards. One idea that you might try is to increase the cooling in your case: heat is never good and can increase instability.

As for your settings, I don't think remapping the memory will help you with memory errors. Increasing the voltage can improve stability (at the expense of longevity and added heat), but it can also decrease stability.

See whether there is some sort of auto-correct option in your BIOS. This is the setting most likely to help. If your motherboard is automatically correcting errors but you still have memory corruption, it means your system has extremely serious memory problems (ECC can correct one error at a time, but cannot correct two errors at a time).
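The "one error corrected, two errors not corrected" behaviour can be illustrated with a toy single-error-correct, double-error-detect (SECDED) code. The sketch below uses an extended Hamming code over 4 data bits purely for illustration; it is not the code a real memory controller uses (ECC DIMMs protect 64 data bits with 8 check bits, and Chipkill is a different scheme again), and the function names are made up.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # data bits sit at the non-power-of-two positions


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    bits = [0] * 8                        # index 0: overall parity; 1..7: Hamming positions
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions with bit 0 set
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions with bit 1 set
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions with bit 2 set
    bits[0] = sum(bits[1:]) % 2            # makes the parity of the whole word even
    return bits


def decode(bits):
    """Return (status, nibble); nibble is None when the error cannot be corrected."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                # XOR of set positions is 0 for a valid word
    overall = sum(bits) % 2                # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1                # syndrome 0 here means the parity bit itself flipped
        status = "corrected"
    else:
        return "uncorrectable", None       # two flips: detected but not correctable
    nibble = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1

print(decode(word))       # clean word
print(decode(one_flip))   # single-bit error
print(decode(two_flips))  # double-bit error
```

With a single flipped bit the decoder recovers the original value; with two flipped bits it can only report that the word is bad, which matches the point that corruption getting past ECC implies multiple simultaneous errors.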
mieses
Tux's lil' helper

Joined: 28 Feb 2004
Posts: 86

Posted: Sat Mar 06, 2004 11:44 am

Which 512 MB Corsair modules are you using?

I am able to run 4x512MB Corsair PC3200 ECC Registered with no problems on the SK8V. It's very stable. No errors at boot, and no errors in memtest86+. I'm using 2 kits of TWINX1024RE-3200LL.

The 4x1GB modules that gave me problems must not be up to it. Maybe they generate too much heat or require too much power. They are in fact ECC Registered modules, but it doesn't seem to matter. I have enabled all the error-correcting options available in the BIOS. Strangely, they do work better when used with an Opteron 140 than with an FX-51 in the same motherboard.

The Opteron and all AMD64 chips have a memory controller built into the CPU. This means that the motherboard has very little effect on the memory access. HyperTransport links the CPU directly to the memory. There's no northbridge anymore.

Shuttle (?) makes a motherboard with a hard-drive style power connector right next to the memory slots, to provide them with extra power. Such a feature seems attractive right now.
Hello
n00b

Joined: 15 Jan 2003
Posts: 5

Posted: Sat Mar 06, 2004 8:31 pm

I'm also using 2 kits of TWINX1024RE-3200LL. From what I read on various forums, the SK8V has fewer problems with memory than the SK8N. This makes sense to me because the SK8N was one of the first motherboards to get to market, so it is more likely to have some design flaws in it.

The memory controller is on the CPU, but the memory still plugs into the motherboard. My educated guess is that the motherboard is either not reading the memory correctly from the slot, is not transmitting correctly, or possibly both.

I tried running my computer with the case open (which lowers my case temperature about 10 C). It may have helped, but it did not solve my problem because I still got plenty of memory errors.

I never really thought of power being a problem, because I have a quality 400W PSU and am using one 7200 RPM hard drive and a low-power video card (Matrox G550), with no PCI slots filled. But I think you're right that it's a power issue. I think the motherboard simply cannot supply enough voltage, or cannot supply it cleanly enough, to run the RAM properly.

The outlet that I plug my computer into is pretty noisy. So a nice line regulator (which costs $50 to $100) or a UPS with line regulation may solve the problem. Maybe a beefier power supply would compensate for the motherboard's inadequate power circuitry. Maybe a BIOS update would help.

The Shuttle (?) board looks like it addresses the issue of the motherboard supplying power to the RAM. After a quick look on the web, I did not find any such motherboard. This may sound like a silly question, but are you sure the extra power connector is for the RAM? Opteron boards take the P4 power connector (to supply extra voltage to the CPU), and sometimes manufacturers put a hard-drive style power connector on the board to use in lieu of the P4 connector.

One possible reason that the Opteron 140 runs with fewer memory errors than the FX-51 is that the Opteron 140 memory controller uses a 333 MHz bus, whereas the FX-51 uses a 400 MHz bus. Even though the memory is rated to run at a certain speed, and supposedly can run at that speed, if there are incompatibilities with the motherboard, then lowering the speed can solve stability problems. For me, lowering the latency settings and MHz of the RAM did not help at all.

I have one small correction to make: HyperTransport does not link the CPU directly to the memory. HyperTransport links the CPU to the southbridge chipset and, I think, the on-die memory controller. The memory controller definitely uses the standard 333 MHz or 400 MHz bus to connect to the RAM.
mieses
Tux's lil' helper

Joined: 28 Feb 2004
Posts: 86

Posted: Sat Mar 06, 2004 10:51 pm

It turns out that the Shuttle I mentioned is a two-year-old Socket A board. According to these links, the connector was for additional power to both CPU and memory:
http://www.lostcircuits.com/motherboard/shuttle_ak37/
http://www.overclockers.com/articles696/

Thanks for all the info, especially about the line regulator. Is there a simple, cheap tool to test line quality and PSU issues?
Hello
n00b

Joined: 15 Jan 2003
Posts: 5

Posted: Sun Mar 07, 2004 3:14 am

Check out: http://www.firingsquad.com/hardware/building_gaming_opteron_2003_Part1/page9.asp

I've seen the PC1000 model for $50 at either www.bestbuy.com or www.buy.com. I don't actually own one, but I was thinking of getting one. So let me know how it goes if you decide to try one.

I know my power line is dirty because my speakers have a low hissing sound to them (even when no source is attached). Though, I think every power line is "dirty"; it's just a question of how much.
All times are GMT