Gentoo Forums
ext2, ext3, xfs corruption - hardware related???
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Thu Mar 06, 2003 9:23 am    Post subject: ext2, ext3, xfs corruption - hardware related???

I have a brand-new system here:

* MSI K7D Master-L (BIOS v1.5)
* 2x AMD Athlon MP 2000+
* 2x 512MB hynix PC2100 CL2,5 ECC
* 2x Promise Ultra 133 TX2 (PDC20269 with latest BIOS)
* 4x Maxtor 6E030L0 Ultra DMA 133 7200 RPM - 30GB
* Enermax PSU EG-651P-VE (550W)
* Nvidia Riva-TNT2 M64 32MB

I had no problems installing Gentoo 1.2 and 1.4_rc3 on LVM on software RAID 5 (3 RAID disks, 1 spare). But when I create an ext2 or ext3 filesystem larger than 30GB, e2fsck -f finds tens of thousands of errors without the filesystem ever having been mounted.
BTW, xfs behaves strangely too. Creating large files (>1GB) of random numbers and checking them with md5sum causes the system to fail. After that, many executables are no longer found, even if they are on other partitions. After a restart (by powering off, since the reboot command could no longer be found) and a RAID resync, everything is fine again. The same happens with ext2 and ext3.
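
For anyone who wants to repeat the md5sum test, here is a minimal sketch of a write-then-verify check. It is not the exact command used above. `TARGET_DIR`, `SIZE_MB` and `verify_random_file` are made-up names, and it reads from /dev/urandom rather than /dev/random on the assumption that the latter may block while waiting for entropy.

```shell
#!/bin/sh
# Sketch of a write-then-verify test. TARGET_DIR and SIZE_MB are
# hypothetical names; point TARGET_DIR at the filesystem under test.
TARGET_DIR="${TARGET_DIR:-/tmp}"
SIZE_MB="${SIZE_MB:-4}"   # the post used files larger than 1 GB, i.e. 1024+

verify_random_file() {
    # $1 = directory to write into, $2 = file size in MB
    file="$1/md5test.$$"
    bytes=$(( $2 * 1024 * 1024 ))
    # Checksum the random data while it is being written to the file ...
    sum_written=$(head -c "$bytes" /dev/urandom | tee "$file" | md5sum)
    sync
    # ... and checksum it again when it is read back from the file.
    sum_read=$(md5sum < "$file")
    rm -f "$file"
    [ -n "$sum_written" ] && [ "$sum_written" = "$sum_read" ]
}

if verify_random_file "$TARGET_DIR" "$SIZE_MB"; then
    echo "OK: data read back matches data written"
else
    echo "MISMATCH: data changed between write and read-back"
fi
```

A small file may be read back from the page cache without touching the disk, so a size well above the installed RAM is probably needed before this says anything about the drives, cables or controller.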

Another command that kills the system in the same way is 'tar -cWvplf' on large trees like portage or src.
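
The tar test can be sketched in the same spirit, assuming GNU tar. The `-W` flag is `--verify`, which makes tar compare the archive against the files on disk right after writing it. Long options are spelled out here because the meaning of some short flags differs between tar versions. `archive_and_verify` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: archive a directory tree and have GNU tar verify the archive
# against the files on disk immediately after writing it.
archive_and_verify() {
    # $1 = directory to archive
    # $2 = absolute path of the archive to create (must be outside $1)
    ( cd "$1" && tar --create --verify --file="$2" . )
}
```

Something like `archive_and_verify /usr/portage /mnt/test/portage.tar` would exercise it, where both paths are only examples. A non-zero exit status means tar hit an error or found a difference during verification.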

What I did to find the error:
* HD check with Powermax from Maxtor
* changed UDMA 133 cables
* tried with only one DIMM
* changed CPUs
* tried with each CPU alone
* tried another UDMA controller (CMD649) and the onboard IDE controller instead of the Promise controllers
* tried with UDMA5 (-X69) instead of UDMA6
* tried kernels: gentoo, 2.4.21pre5-ac1, redhat, 2.4.19 ....

memtest86 has now been running the extended tests for 12 hours with no error messages.

I do not know what to try next!!!
:?: Could anybody give me a hint please :?:
bLanark
Apprentice

Joined: 27 Aug 2002
Posts: 181
Location: Royal Berkshire, UK

Posted: Thu Mar 06, 2003 10:19 am    Post subject: Probably hardware

I suspect hardware here.

Is there any chance that you can take the drives out and put them in a Windows machine for some non-destructive testing using the utilities on the IBM or Maxtor site? Or if your BIOS has a S.M.A.R.T. test option (possibly on the CD that came with it?) then run that.

I had a laptop HDD go and the symptoms were the same - repeated fscks for no reason. I had a server IDE drive go but the symptoms were entirely different - any process accessing the drive would just hang.

Of course, it might not be hardware. Can you remove LVM from the equation and create partitions on each drive in turn?

BTW, I use LVM with ReiserFS with a partition greater than 100G without problems.
_________________
.sig: access denied
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Thu Mar 06, 2003 10:23 am

I ran all the Powermax tests from Maxtor's website (low-level format included) - no errors!!
bLanark
Apprentice

Joined: 27 Aug 2002
Posts: 181
Location: Royal Berkshire, UK

Posted: Thu Mar 06, 2003 10:57 am    Post subject: Sorry

Quote:
I ran all the Powermax tests from Maxtor's website (low-level format included) - no errors!!

Hmm, yes, missed that at first.

OK, the drives are 30 GB and the errors only appear when the partitions are bigger than 30 GB - right?

Anything in the syslog? Can you turn up the logging level for the LVM stuff? I can't see how to, but it must be possible, certainly most of the utilities (e.g. vgdisplay) have a -v or -vv option.
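
One thing that can be looked for in the syslog, independent of LVM, is CRC errors from the IDE driver. Below is a minimal sketch. `count_badcrc` is a made-up helper, and the `BadCRC` pattern is an assumption about what the 2.4 IDE driver logs when a UDMA transfer fails its checksum, so treat a zero count as inconclusive.

```shell
#!/bin/sh
# Sketch: count log lines that mention an IDE CRC error.
# The "BadCRC" pattern is an assumption about the driver's wording.
count_badcrc() {
    # $1 = path to a kernel/system log file; prints the number of matches.
    # grep exits with 1 when nothing matches; treat that as success so
    # only real errors (unreadable file, exit status 2) are reported.
    grep -c 'BadCRC' "$1" || [ "$?" -eq 1 ]
}
```

For example `count_badcrc /var/log/messages`, assuming that is where the kernel messages end up on the system in question.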

You might want to try EVMS instead of LVM, if you're at a stage where you can afford to start again.

Ally
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Thu Mar 06, 2003 11:27 am

My first attempt at installing Gentoo 1.2 was directly on RAID 1 and RAID 5 partitions, so I don't think LVM is the culprit.

On partitions up to 25GB (I don't know the exact limit) e2fsck -f works without problems after creating the filesystem. But the strange errors described above (tar, md5sum) occur on those partitions too.
On partitions of about 50GB (RAID 5) e2fsck -f finds tens of thousands of errors like:

"Inode XXX is in use, but has dtime set", "Inode XXX has compression flag set on filesystem without compression support", "Inode XXX has illegal block(s)", "Too many illegal blocks in inode XXX"

even though the filesystem was never mounted.
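
When scripting repeated e2fsck runs, the exit status is easier to compare than the error text. The sketch below decodes it using the values listed in the e2fsck(8) manual page, where the status is a sum of the individual conditions. `describe_e2fsck_status` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: decode the exit status of e2fsck, which is a sum of the
# conditions documented in the e2fsck(8) manual page.
describe_e2fsck_status() {
    # $1 = numeric exit status from e2fsck
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( status & 1 )) -ne 0 ]   && echo "file system errors corrected"
    [ $(( status & 2 )) -ne 0 ]   && echo "file system errors corrected, system should be rebooted"
    [ $(( status & 4 )) -ne 0 ]   && echo "file system errors left uncorrected"
    [ $(( status & 8 )) -ne 0 ]   && echo "operational error"
    [ $(( status & 16 )) -ne 0 ]  && echo "usage or syntax error"
    [ $(( status & 32 )) -ne 0 ]  && echo "canceled by user request"
    [ $(( status & 128 )) -ne 0 ] && echo "shared-library error"
    return 0
}
```

Usage would look like `e2fsck -f -n /dev/vg0/test; describe_e2fsck_status $?`, where /dev/vg0/test is only a placeholder device name. The `-n` option opens the filesystem read-only and answers "no" to every question, so the check itself changes nothing on disk.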

I can't try anything at the moment because memtest86 v3.0 is running (test 11) and I'm looking forward to the results --> probably no errors, I bet!
taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Thu Mar 06, 2003 11:28 am

hardware.. or perhaps KERNEL
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Thu Mar 06, 2003 11:40 am

:twisted: thx a lot taskara for your v e r y helpful posting!

:?: what kind of hardware?
:?: do you have a kernel config for me - or what do you mean by KERNEL?
bLanark
Apprentice

Joined: 27 Aug 2002
Posts: 181
Location: Royal Berkshire, UK

Posted: Thu Mar 06, 2003 2:31 pm

OK, for what it's worth, I've got this version of LVM, and this kernel:

Code:

altair root # vgcreate --version
vgcreate: Logical Volume Manager 1.0.5
Heinz Mauelshagen, Sistina Software  15/07/2002 (IOP 10)

Code:

altair root # uname -a
Linux altair 2.4.19-gentoo-r9


I am NOT currently spanning more than one HDD. I have a single 120 GB drive, one volume group and one physical volume, if I remember the terminology correctly.

The other drive I was using is dodgy and is back with the vendor for "testing". I am using reiserFS, not xfs or ext2 or ext3.

I'm NOT using RAID either.

Sorry I can't be of more help.

Oh, this machine is running Gentoo 1.2, not 1.4, i.e.:
Code:

gcc version 2.95.3 20010315 (release)

taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Thu Mar 06, 2003 8:45 pm

Ace_of_Spades wrote:
:twisted: thx a lot taskara for your v e r y helpful posting!

:?: what kind of hardware?
:?: do you have a kernel config for me - or what do you mean by KERNEL?


:P

I'm just saying that maybe the kernel you are using is corrupting the filesystems.

try a vanilla kernel, or beta kernel..

is it happening when you INSTALL Gentoo.. or AFTER you have made your system and put in your own kernel?
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Thu Mar 06, 2003 8:52 pm

thx, but as mentioned in my first post I tried nearly every 2.4 kernel on the market.

memtest86 v3.0 has finished all tests (extended included) without errors!
taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Fri Mar 07, 2003 12:03 am

try a new hard drive
try a different ide controller
upgrade your power supply
get a shotgun
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Fri Mar 07, 2003 8:31 am

I did another install last night which seems to work without problems.

The following two things were changed:

    * partition tables were created with cfdisk instead of fdisk
    * two 60cm (24") UDMA133 cables were replaced by two 45cm (18") shielded ones

The system now runs Gentoo 1.4_rc3 with 2.4.20-gentoo-r1 on ext3 on LVM on RAID 5 without any errors.

:wink: Special thanks to my little pink dancing elephant who put me back on the right way!

Finally - can anybody explain to me which of the above changes did the trick? (If cfdisk is not just a frontend to fdisk, could it be that fdisk is buggy?)
taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Fri Mar 07, 2003 12:00 pm

I would say it was the cables.. fdisk has nothing to do with your filesystem.. so it doesn't seem like the culprit.

long IDE cables are notorious for problems.. especially if they aren't shielded.

then again.. maybe there was a new package released since you did your first install.. who knows!? At least it works! ;)
klimg
n00b

Joined: 21 Sep 2002
Posts: 55

Posted: Fri Mar 07, 2003 12:09 pm

I wouldn't bet that you are out of the woods. I always get I/O errors from the HD after 3-4 months with ext2/3 filesystems on a Samsung drive, which does point to a defective drive.
The first time that happened I ran an HD test from Cerberus, which is supposed to destroy defective hardware, for a week flat. No problems. With reiserfs everything works fine.
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Fri Mar 07, 2003 12:18 pm

Quote:
I always get I/O errors from the HD after 3-4 months with ext2/3 filesystems

You are right - e2fsck reported errors again, but the other tests (tar; generating 4 files of 4GB each from /dev/random simultaneously and checking them with md5sum) work fine at the moment.

Think I will switch to xfs.
taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Fri Mar 07, 2003 1:01 pm

I went to reiserfs.. it's nice :)
klimg
n00b

Joined: 21 Sep 2002
Posts: 55

Posted: Fri Mar 07, 2003 3:54 pm

I am pretty sure in my case it's some issue with my cheapass everything-onboard mobo (I've seen some stuff about I/O errors with the SiS chipset on the kernel list). But reiser never gave me any trouble.
taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Fri Mar 07, 2003 10:14 pm

OOOOHHHH you didn't mention you have an SiS chipset!!! lol ;)
dweigert
Guru

Joined: 04 Oct 2002
Posts: 369
Location: Somerset, NJ USA

Posted: Sun Mar 09, 2003 7:24 pm

The MAXIMUM length for an IDE cable is 18 inches. Beyond that you do get corruption.

Dan
Ace_of_Spades
n00b

Joined: 06 Feb 2003
Posts: 15
Location: Eichgraben, AUSTRIA

Posted: Fri Mar 14, 2003 12:04 pm

Solved the problem myself, with the help of Google.

The solution is posted in:
https://forums.gentoo.org/viewtopic.php?t=41321
ben_h
Tux's lil' helper

Joined: 26 Nov 2002
Posts: 118
Location: Australia

Posted: Sat Mar 15, 2003 2:45 am

Jeez that's interesting. Glad you've got it solved!

Although, I think the PS/2 issue was only half your problem -- the other half was probably the IDE cables being too long.

In any case, enjoy your (now stable) dual MP box :D