Gentoo Forums
hdd errors, faulty cable? [SOLVED]

 
PietdeBoer
Apprentice


Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

Posted: Thu Feb 14, 2008 3:17 pm    Post subject: hdd errors, faulty cable? [SOLVED]

Hey guys


my system crashes about once a week, giving messages like "this is NOT a software error"...

when the system is running i get this in my dmesg:

Code:

0x400
ata5: CPB 0: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 1: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 2: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 3: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 4: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 5: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 6: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 7: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 8: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 9: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 10: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 11: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 12: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 13: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 14: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 15: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 16: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 17: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 18: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 19: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 20: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 21: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 22: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 23: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 24: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 25: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 26: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 27: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 28: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 29: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 30: ctl_flags 0x1f, resp_flags 0x1
ata5: Resetting port
ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x19d0000 action 0x2 frozen
ata5.00: cmd 60/80:00:3f:01:00/00:00:00:00:00/40 tag 0 cdb 0x0 data 65536 in
         res 40/00:08:3f:02:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata5.00: cmd 60/80:08:3f:02:00/00:00:00:00:00/40 tag 1 cdb 0x0 data 65536 in
         res 40/00:08:3f:02:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata8: Hotplug event, freezing
ata8: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500
ata8: CPB 0: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 1: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 2: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 3: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 4: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 5: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 6: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 7: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 8: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 9: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 10: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 11: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 12: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 13: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 14: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 15: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 16: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 17: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 18: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 19: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 20: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 21: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 22: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 23: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 24: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 25: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 26: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 27: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 28: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 29: ctl_flags 0x1f, resp_flags 0x1
ata8: CPB 30: ctl_flags 0x1f, resp_flags 0x1
ata8: Resetting port
ata8.00: exception Emask 0x10 SAct 0x1 SErr 0x19d0000 action 0x2 frozen
ata8.00: cmd 60/00:00:3f:01:00/01:00:00:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:3f:01:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata5: soft resetting port
ata8: soft resetting port
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: configured for UDMA/133
ata5: EH complete
ata8.00: configured for UDMA/133
ata8: EH complete
SCSI device sdf: 976773168 512-byte hdwr sectors (500108 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdi: 976773168 512-byte hdwr sectors (500108 MB)
sdi: Write Protect is off
sdi: Mode Sense: 00 3a 00 00
SCSI device sdi: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x19d0000 action 0x2 frozen
ata5.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 0
         res 50/00:00:00:00:00/00:01:01:00:00/e0 Emask 0x10 (ATA bus error)
ata5: soft resetting port
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0x80 on port 0xFFFFC2000003C49C
ATA: abnormal status 0x80 on port 0xFFFFC2000003C49C
ATA: abnormal status 0x80 on port 0xFFFFC2000003C49C
ATA: abnormal status 0x80 on port 0xFFFFC2000003C49C
ATA: abnormal status 0x80 on port 0xFFFFC2000003C49C
ata5.00: revalidation failed (errno=-2)
ata5: failed to recover some devices, retrying in 5 secs
ata5: hard resetting port
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: configured for UDMA/133
ata5: EH complete
SCSI device sdf: 976773168 512-byte hdwr sectors (500108 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata8.00: exception Emask 0x10 SAct 0x0 SErr 0x19d0000 action 0x2 frozen
ata8.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 0
         res 50/00:00:00:00:00/00:01:01:00:00/e0 Emask 0x10 (ATA bus error)
ata8: soft resetting port
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0x80 on port 0xFFFFC2000003E59C
ATA: abnormal status 0x80 on port 0xFFFFC2000003E59C
ATA: abnormal status 0x80 on port 0xFFFFC2000003E59C
ATA: abnormal status 0x80 on port 0xFFFFC2000003E59C
ATA: abnormal status 0x80 on port 0xFFFFC2000003E59C
ata8.00: revalidation failed (errno=-2)
ata8: failed to recover some devices, retrying in 5 secs
ata8: hard resetting port
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata8.00: configured for UDMA/133
ata8: EH complete
SCSI device sdi: 976773168 512-byte hdwr sectors (500108 MB)
sdi: Write Protect is off
sdi: Mode Sense: 00 3a 00 00
SCSI device sdi: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
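
The SErr value that repeats in the log (0x19d0000) can be split into its individual flags. The following is only a minimal sketch: the bit positions follow the diagnostic half (bits 16 to 26) of the SATA SError register, the descriptions are paraphrased, and the names SERR_DIAG_BITS and decode_serr are made up for this illustration.

```python
# Diagnostic bits (16-26) of the SATA SError register.
# Descriptions are paraphrased; the dict and function names are hypothetical.
SERR_DIAG_BITS = {
    16: "PHY ready state changed",
    17: "PHY internal error",
    18: "COMWAKE detected",
    19: "10b-to-8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unknown FIS type",
    26: "device exchanged",
}


def decode_serr(value: int) -> list[str]:
    """Return the descriptions of all diagnostic bits set in an SErr value."""
    return [
        name
        for bit, name in sorted(SERR_DIAG_BITS.items())
        if value & (1 << bit)
    ]


# SErr value taken from the "exception Emask" lines in the log.
for flag in decode_serr(0x19D0000):
    print(flag)
```

All the flags set here are link-level conditions, which would fit a link that keeps dropping and coming back rather than bad sectors on the disk, though the log alone does not prove the cause.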


the crashes started occurring when i added 4 new SATA disks

output from SeaTools (all SATA disks are Seagate)
Code:

fileserver ~ # ./st -l

Drive information:

/dev/sg0 FUJITSU  MAX3036NP        HPF1 71132959 blocks
/dev/sg1 ATA      ST3400620NS      3.AE 781422767 blocks
/dev/sg2 ATA      ST3400620NS      3.AE 781422767 blocks
/dev/sg3 ATA      ST3400620NS      3.AE 781422767 blocks
/dev/sg4 ATA      ST3400620NS      3.AE 781422767 blocks
/dev/sg5 ATA      ST3500630AS      3.AA 976773167 blocks
/dev/sg6 ATA      ST3500630AS      3.AA 976773167 blocks
/dev/sg7 ATA      ST3500630AS      3.AA 976773167 blocks
/dev/sg8 ATA      ST3500630AS      3.AA 976773167 blocks


where ata5, ata6, ata7 and ata8 are the new disks

since it only gives errors on ata5 and ata8, i suspect there's something wrong with the connectors on the motherboard or the SATA cables?
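
To check which ports are actually affected, the exception lines in dmesg can be tallied per port. This is a rough sketch, assuming the dmesg output has been saved as text; count_exceptions and the regular expression are hypothetical helpers written for this example, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as "ata5.00: exception Emask 0x10 ..." and captures the port.
# re.search is used so that a leading kernel timestamp would not break the match.
EXCEPTION_RE = re.compile(r"\b(ata\d+)\.\d+: exception Emask")


def count_exceptions(dmesg_text: str) -> Counter:
    """Count libata exception lines per ATA port in a chunk of dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x19d0000 action 0x2 frozen
ata8.00: exception Emask 0x10 SAct 0x1 SErr 0x19d0000 action 0x2 frozen
ata5: soft resetting port
ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x19d0000 action 0x2 frozen
ata8.00: exception Emask 0x10 SAct 0x0 SErr 0x19d0000 action 0x2 frozen
"""

for port, hits in sorted(count_exceptions(sample).items()):
    print(port, hits)
```

Running it over a longer capture would show whether ata6 and ata7 ever appear as well, which helps tell a single bad cable apart from a problem shared by several drives.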

does anyone have a good idea where this is coming from?

cheers!
_________________
_ Got Root? _


Last edited by PietdeBoer on Fri Feb 15, 2008 7:27 pm; edited 1 time in total
alex.blackbit
Advocate


Joined: 26 Jul 2005
Posts: 2397

Posted: Thu Feb 14, 2008 3:59 pm

since you get errors on more than one drive at a time, i guess it is not a cable.
could it be a power problem? it seems you have a lot of devices in your system. maybe the PSU is at its limit?
sageman
Guru


Joined: 04 May 2005
Posts: 363
Location: New Hampshire

Posted: Thu Feb 14, 2008 9:11 pm

alex.blackbit wrote:
since you get errors on more than one drive at a time, i guess it is not a cable.
could it be a power problem? it seems you have a lot of devices in your system. maybe the PSU is at its limit?


That's the first thing that jumped to my mind. What sort of wattage is your PSU?
_________________
Carlton Stedman
Gentoo Metalheads on Last.fm: http://www.last.fm/group/Gentoo+Metalheads
Cyker
Veteran


Joined: 15 Jun 2006
Posts: 1746

Posted: Thu Feb 14, 2008 11:10 pm

It might also be them going into some sort of re-cal mode, or forcing a quick SMART check.

It turns out this is one of the main differences between consumer drives and 'enterprise' grade drives (Where they charge 50% more for what is really the same drive...).

I get these every now and then on my RAID array, as the disks are always active unlike the non-RAID'd drives on my system, but it only happens after a month or so of continuous uptime.

So far, the kernel's just reset the drive and then everything's carried on as normal. I've not even had to re-add it to the RAID array!
PietdeBoer
Apprentice


Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

Posted: Fri Feb 15, 2008 11:23 am

it runs on a 450 W Zalman PSU


further specs of the system are

AMD 3200+, 1 GB memory and about 7 x 9mm fans

could the PSU be at its limit with this number of drives?
_________________
_ Got Root? _
fangorn
Veteran


Joined: 31 Jul 2004
Posts: 1886

Posted: Fri Feb 15, 2008 1:00 pm

From the sum of watts used, I'd say no. But as all the drives share one power rail, I'd say it is quite possible. If you have another PSU at hand, try running with the case open and powering some of the drives from the second PSU.
_________________
Video Encoding scripts collection | Project page
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54244
Location: 56N 3W

Posted: Fri Feb 15, 2008 1:10 pm

PietdeBoer,

The "450W Zalman" figure on its own is not very useful. You will get power problems whenever a single output voltage gets near its limit.
There are also power combination rules, which may mean you get power problems well before you reach the 450W maximum PSU output.

e.g. each of your drives will want about 8W (from the +12V) to spin the motor and 4W (from the +5V) to run the rest of the electronics.
Head movements will draw additional pulses from the +12V.

Your CPU is probably powered from the +12V too. Investigate the load on the +12V and the PSU's capability to supply it.
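
As a rough sketch of that arithmetic, the steady-state current per rail can be estimated from the per-drive figures above. The drive count of 9 is taken from the "st -l" listing in the first post; rail_current is a hypothetical helper, and spin-up surges, seek pulses and the CPU load are not included.

```python
def rail_current(n_drives: int, watts_per_drive: float, rail_volts: float) -> float:
    """Steady-state current in amps that n_drives draw from a single rail."""
    return n_drives * watts_per_drive / rail_volts


N_DRIVES = 9  # assumption: all nine drives from the "st -l" listing

amps_12v = rail_current(N_DRIVES, 8.0, 12.0)  # about 8 W per drive for the motor
amps_5v = rail_current(N_DRIVES, 4.0, 5.0)    # about 4 W per drive for the electronics

print(f"+12V: {amps_12v:.1f} A, +5V: {amps_5v:.1f} A")
```

These numbers are only a lower bound. They are meant to be compared with the per-rail amp ratings printed on the PSU label, and the same sum can be done per power cable for the drives hanging off each one.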
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
PietdeBoer
Apprentice


Joined: 20 Oct 2005
Posts: 244
Location: Eindhoven, the Netherlands

Posted: Fri Feb 15, 2008 7:27 pm

I rearranged the power cables a bit, making sure there aren't too many HDDs on one cable.

the server has now run for 1.5 hours. i've done some heavy copy work to test whether it remains stable, and it does. the error messages have disappeared from dmesg

thanks for your help guys, the cause was an overloaded power cable
_________________
_ Got Root? _
alex.blackbit
Advocate


Joined: 26 Jul 2005
Posts: 2397

Posted: Sat Feb 16, 2008 12:37 am

good to hear. ;)
it does not happen too often that 4 gentooers have the same opinion AND that it's right.
have a nice day.
All times are GMT