Gentoo Forums
A7N8x / NForce2 / SCSI / MD Raid Hell
matt2413
n00b


Joined: 12 Apr 2003
Posts: 48

Posted: Tue Jul 08, 2003 7:37 pm    Post subject: A7N8x / NForce2 / SCSI / MD Raid Hell

Trying to get relevant search terms into the subject. :)

Brief Synopsis:

Gentoo server running at home. Crashing under somewhat heavy I/O across more than one IDE controller. Have a Promise 20268 installed to add another pair of IDE channels. Booting off of Adaptec SCSI. System is fine and responsive otherwise: it will compile kernels, complete MD rebuilds, or copy 200GB of data from one disk to another w/o errors. It's that second or sometimes third simultaneous data xfer that kills me.
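For anyone trying to reproduce the trigger described in the synopsis (one transfer is fine, a second or third overlapping one kills the box), here is a minimal Python sketch. It is an assumption about how to generate the load, not what was actually run on this machine. The helper names `read_all` and `concurrent_read` are made up for illustration.

```python
import os
import tempfile
import threading


def read_all(path, totals, chunk_size=1 << 20):
    """Read one file (or block device) front to back, recording bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    totals[path] = total


def concurrent_read(paths):
    """Run one reader thread per path so that all transfers overlap."""
    totals = {}
    threads = [threading.Thread(target=read_all, args=(p, totals)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


if __name__ == "__main__":
    # Stand-in data: two small temp files. On the real machine these would be
    # large files on different arrays (hypothetical choice: one on md3, one on md4).
    tmpdir = tempfile.mkdtemp()
    paths = []
    for i in range(2):
        path = os.path.join(tmpdir, "blob%d" % i)
        with open(path, "wb") as f:
            f.write(os.urandom(1 << 20))
        paths.append(path)
    for path, total in concurrent_read(paths).items():
        print(path, total)
```

To actually hit the disks rather than the page cache, the files would need to be larger than RAM, with one path per controller.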

Meat and tech details:

So I've got a bunch of hardware and a good functioning Gentoo install. I think it was 1.4RC3 based, but it might have been RC2. It was initially built in late April of this year.

I've got 2 ASUS A7N8X Deluxes, one 1600XP, one 2400XP and 3 512MB sticks of DDR. One working set of this was/is my main workstation, so that gear is known to be of solid reliability. The other CPU & stick of mem was originally driving the Gentoo server, but in an Epox 8KHA (VIA KT266A based).

The other hardware installed:

GF2 MX
Adaptec 19160 in PCI #3
Intel MT1000 in PCI #5
Promise TX2 (using the kernel driver for 20268) in PCI #4

HDD's:
/dev/md0 = 100mb /boot mirror off scsi drives

/dev/md1 = 500mb /var mirror off scsi drives

/dev/md2 is 5x36gb SCSI - this is set up as LSR (using MD, not LVM) RAID5 & is the root filesystem.

/dev/md3 is LSR of /dev/hdb1 & /dev/hdd1 (both Seagate Cuda IV's)

/dev/md4 is LSR of /dev/hde1 & /dev/hdg1 (both Maxtor 120G 7200rpm) & the only drives on the promise card.

/dev/hda is a 200GB drive (WD)
/dev/hdc is a 200GB drive (WD)
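To make it easier to see which drives share a cable, here is a small Python sketch that maps the old-style `hdX` names above to IDE channel and master/slave position. It assumes the usual Linux IDE naming (hda/hdb on ide0, hdc/hdd on ide1, hde/hdf on ide2, hdg/hdh on ide3). The function names and the `LAYOUT` dict are made up for illustration.

```python
def ide_position(dev):
    """Map an old-style IDE name like 'hdd1' or '/dev/hde' to (channel, role)."""
    name = dev.split("/")[-1]
    if len(name) < 3 or not name.startswith("hd") or not ("a" <= name[2] <= "z"):
        raise ValueError("not an hdX device name: %r" % dev)
    index = ord(name[2]) - ord("a")
    return index // 2, "master" if index % 2 == 0 else "slave"


def channel_map(devices):
    """Group device names by IDE channel number."""
    channels = {}
    for dev in devices:
        channel, role = ide_position(dev)
        channels.setdefault(channel, []).append((dev, role))
    return channels


# Device names taken from the drive list above.
LAYOUT = {
    "md3": ["hdb1", "hdd1"],
    "md4": ["hde1", "hdg1"],
    "standalone": ["hda", "hdc"],
}

if __name__ == "__main__":
    all_devs = [d for devs in LAYOUT.values() for d in devs]
    for channel, members in sorted(channel_map(all_devs).items()):
        print("ide%d:" % channel, members)
```

If that naming holds here, each md3 member is a slave sharing a cable with one of the 200GB WD drives, while the md4 members each get a Promise channel to themselves.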

I've disabled all onboard resources, floppy, both nics, sound, game port, parallel etc.

An Antec True Power 430 is driving all of this. It's never clicked off due to overload, and runs warm, but not hot - the case is a lian-li and has good airflow.

All this gear was working fine, but slowish in the Via based system, and all of it was working fine under redhat 7.3 on an Intel 815 system w/ 512 & 1Ghz P3.

I compiled a new kernel (2.4.21-ac4) w/ the proper AMD IDE driver & no ACPI, and changed the hardware around. I first tried the 1600 and the backup board. Almost immediately after significant I/O I experienced complete OS lockup. No flashing KB lights, no kernel panic messages, just an IDE activity light stuck on, and an unresponsive system.

Sometimes I'd get to look at the system before it died, or as it was dying. I'm getting DMA timeouts on /dev/hde (on the promise) and on the two drives on the secondary mobo IDE controller. Then the channel would reset and the RAID rebuild would continue, or stop because the MD driver had failed the drive & was running in "degraded" mode. As far as those go it's mostly /dev/hdd. This generally was happening after a crash and during the MD rebuild (viewable in cat /proc/mdstat). I'd up the rebuild max to 500000, and let them all go at it. The SCSI array would rebuild at 25MBps, the array on the mobo at around 23MBps and the array on the promise at around 20MBps. After
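For watching the rebuilds described above without eyeballing `cat /proc/mdstat`, here is a rough Python sketch that pulls out degraded arrays, failed members and rebuild speed. The `SAMPLE` text is invented for illustration (it is not output from this machine) and assumes the common mdstat layout. The parser name `parse_mdstat` is hypothetical. The "rebuild max" is presumably `/proc/sys/dev/raid/speed_limit_max`, which takes a value in KB/s.

```python
import re

STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
SPEED_RE = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?speed=(\d+)K/sec")

# Invented sample in the usual /proc/mdstat layout; numbers are made up.
SAMPLE = """\
Personalities : [raid1] [raid5]
md3 : active raid1 hdd1[1](F) hdb1[0]
      39082560 blocks [2/1] [U_]
md4 : active raid1 hdg1[1] hde1[0]
      120060736 blocks [2/2] [UU]
      [=>...................]  resync =  8.5% (10205162/120060736) finish=91.2min speed=20075K/sec
"""


def parse_mdstat(text):
    """Return {array: info} with members, failed members, degraded flag, rebuild state."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            rest = header.group(2)
            current = {
                "members": re.findall(r"(\w+)\[\d+\]", rest),
                "failed": re.findall(r"(\w+)\[\d+\]\(F\)", rest),
                "degraded": False,
                "progress": None,   # percent complete, if rebuilding
                "speed_kps": None,  # rebuild speed in KB/s, if rebuilding
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        status = STATUS_RE.search(line)
        if status:
            # An underscore in the [UU_] map marks a missing or failed slot.
            current["degraded"] = "_" in status.group(3)
        speed = SPEED_RE.search(line)
        if speed:
            current["progress"] = float(speed.group(2))
            current["speed_kps"] = int(speed.group(3))
    return arrays


if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(SAMPLE).items()):
        print(name, info)
```

On a live system the text would come from reading `/proc/mdstat` instead of `SAMPLE`.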

I tried changing the cards around on the PCI bus, but I have to keep the promise card AFTER the scsi HBA or the mobo BIOS wants to boot off that. The GIGe card has floated all over, but no combo seemed to work.

So I pulled the board / cpu / mem out of my workstation case, and plugged it all into the server case. Fired it all up and was greeted w/ the same madness.

I've tried the following kernels:
2.4.20-r5 Gentoo Sources - Somewhat stable but hangs on hdparm at boot sometimes.
2.4.21 Vanilla Sources - NO DMA via Hdparm as others have reported.
2.4.21-ac4 - the "best" so far, but still crashes.
linux-2.4.22-pre3-ac1 - same behavior as 2.4.21-ac3
linux-2.5.74-mm2 as a wild chance, but didn't like the LSR raid5 set.

I'm at a loss here, as I'd generally be saying a bad piece of hardware was to blame, but I've tried a couple of different Promise controllers, incl. an ATA66 based one. I haven't tried a different SCSI HBA, but something tells me that's not the issue, as the I/O that kills me is IDE based. It's also easier for me to believe that the Promise driver is buggy as opposed to the Adaptec 7XXX driver.

Does anyone have any bright ideas towards this solution? Getting what's in the machine stable is the goal, but I still need to add sound and a 100 base NIC to the mix to be done. I'd like to use the soundstorm and the 3com (loading either module results in immediate hard lock), but I'll take what I can get at this point.

I've got a couple of other questions, but I'll post those in separate topics for better searchability.

Thanks to all for making it this far. :)

Matt
All times are GMT