Gentoo Forums
Hard Drives may be going out (solved)
Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sat Dec 29, 2012 4:09 pm    Post subject: Hard Drives may be going out (solved)

I had a disk fail (sdb - wouldn't spin up) and I sent it off for replacement. In the meantime I had a spare drive available and put it in for the rebuild. Things were going normally and the rebuild was running at full speed (19000K/s). Then at about 30% it dropped to about 1900K/s. I began to research and found this in the two remaining drives' SMART logs:

/dev/sda:
Code:
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1154010999
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0


/dev/sdc:
Code:
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1040456479
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0


The Hardware_ECC_Recovered raw value on both drives seems to climb steadily while the rebuild is in progress.

Is this what is causing the slowdown?

If so I want to replace them, but I have no idea if the manufacturer would honor the warranty based on a low-level diagnostic like this, especially if the drive is still responding. How would I word this to the manufacturer when I request an RMA?


Last edited by Bigun on Fri Jan 04, 2013 5:56 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sat Dec 29, 2012 4:31 pm

Bigun,

The raw data doesn't always mean a great deal. It varies from vendor to vendor and often has several data items bit mapped into the same 32 bit field.
Check the drive vendor's web site to see if they tell you how to decode the raw data. It may not be Hardware_ECC_Recovered at all.

The data in the three columns ending THRESH are normalised - how this is done is vendor specific but the interpretation rules are the same everywhere.
When a VALUE is less than or equal to THRESH, that value has failed the SMART test.
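
The VALUE/THRESH rule can be checked mechanically. The sketch below is only an illustration: it assumes the usual smartctl attribute column order (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw), which the listings in this thread appear to follow, and the function names are made up for the example.

```python
# Rows copied from the /dev/sdc listing earlier in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1040456479
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""


def parse_smart_rows(text):
    """Split smartctl-style attribute rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a 10-column attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),   # normalised VALUE
            "worst": int(fields[4]),   # normalised WORST
            "thresh": int(fields[5]),  # normalised THRESH
            "raw": fields[9],          # vendor-specific raw data, left as text
        })
    return rows


def failed_attributes(rows):
    """Names of attributes whose normalised VALUE is at or below THRESH."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


rows = parse_smart_rows(SAMPLE)
print(failed_attributes(rows))  # prints [] for the sample rows
```

Note that the raw column is deliberately not interpreted: none of the sample attributes has failed by this rule, even though one of them has a very large raw number.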

Modern hard drives are 'zoned': they have more sectors per track near the outside of the drive than near the spindle. As the drives rotate at a constant rate, the data rate at the outside is higher than at the inside. Expect to see a variation of 2 to 3 times in actual data rates. It's a feature, not a fault.

Keep an eye on
Code:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
it means that the drive has one sector that it would like to remap.
That's something that drives do all through their useful life but, just occasionally, the drive will leave it too late and the data is lost because the drive cannot read the sector.
It's not a reason for an RMA unless it gets worse or you have failed reads.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sat Dec 29, 2012 4:41 pm

Ok, so why the slowdown then?

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Sat Dec 29, 2012 4:49 pm

I'd also check whether your PSU is failing, since you've had two disks develop issues recently.

Often, HDD manufacturers want an error code from their diagnostic tool to start an RMA. These diagnostic tools often use SMART to grab data as well, so you should try them to see if they complain about anything.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sat Dec 29, 2012 5:20 pm

Bigun,

The data rate that the drive can sustain drops as the head moves towards the spindle. There are fewer sectors per revolution of the platter.
The sustained data rate is defined by the head/platter interface and it's not constant because of the zoning.

If you are using the PC for other things, the other things take their toll on bandwidth, especially seeks to find/write other data.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sat Dec 29, 2012 5:32 pm

NeddySeagoon wrote:
Bigun,

The data rate that the drive can sustain drops as the head moves towards the spindle. There are fewer sectors per revolution of the platter.
The sustained data rate is defined by the head/platter interface and it's not constant because of the zoning.

If you are using the PC for other things, the other things take their toll on bandwidth, especially seeks to find/write other data.


Right now the drive set isn't even mounted.

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Sat Dec 29, 2012 6:08 pm

Zone bit recording doesn't really do 10:1; as said, it's more like 2 to 3:1, with the outside of the platter about 2 to 3x as fast as the inside. At a 10:1 ratio something else is affecting speeds...

What interface do these drives use? On my PATA disks (using libata) I get much faster than even 19MB/sec during RAID5 resync... well into the 20MB/s range (my SATA RAID5 is in the 50MB/s range).

Are these all the same drives?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sat Dec 29, 2012 6:52 pm

My raid5 resync, if I do nothing, starts out at about 130Mb/sec and falls to just over 40Mb/sec.

When I use the PC at the same time, the sequential reads and writes that the raid5 resync does are interrupted by seeks for other purposes.
Just reading mail and posting on the forums makes between 10x and 100x speed difference. These drops are not sustained but /proc/mdstat shows them. The sync speed recovers once it's the only operation again.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sat Dec 29, 2012 7:15 pm

Started out at 140 Mb/s (SATA 300 drives), and now it's 14 Mb/s.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sat Dec 29, 2012 7:43 pm

Bigun,

Pastebin dmesg. Provided that resyncing the raid is all it's doing, there is something wrong.
Post /proc/mdstat too.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sun Dec 30, 2012 11:42 am

Thank God I got an offsite backup of the entire drive before I started the rebuild:

Code:
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sdb1[3](S) sdc1[4](F) sda1[2]
      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/1] [__U]

md124 : active raid1 sde1[1] sdd1[0]
      96256 blocks [2/2] [UU]

md125 : active raid1 sde2[1] sdd2[0]
      979840 blocks [2/2] [UU]

md126 : active raid1 sde3[1] sdd3[0]
      77074304 blocks [2/2] [UU]

unused devices: <none>


dmesg output

Is it done?

Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 2294
Location: Dallas area

Posted: Sun Dec 30, 2012 12:15 pm

Code:
[299323.783377] ata5.00: failed command: READ FPDMA QUEUED
[299323.783384] ata5.00: cmd 60/00:00:0f:67:15/04:00:81:00:00/40 tag 0 ncq 524288 in
[299323.783384]          res 51/40:db:34:68:15/00:02:81:00:00/40 Emask 0x409 (media error) <F>
[299323.783387] ata5.00: status: { DRDY ERR }
[299323.783389] ata5.00: error: { UNC }
[299323.796537] ata5.00: configured for UDMA/133
[299323.796592] sd 4:0:0:0: [sdc] Unhandled sense code
[299323.796594] sd 4:0:0:0: [sdc] 
[299323.796596] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[299323.796597] sd 4:0:0:0: [sdc] 
[299323.796599] Sense Key : Medium Error [current] [descriptor]
[299323.796602] Descriptor sense data with sense descriptors (in hex):
[299323.796603]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[299323.796609]         81 15 68 34
[299323.796612] sd 4:0:0:0: [sdc] 
[299323.796615] Add. Sense: Unrecovered read error - auto reallocate failed
[299323.796616] sd 4:0:0:0: [sdc] CDB:
[299323.796617] Read(10): 28 00 81 15 67 0f 00 04 00 00
[299323.796623] end_request: I/O error, dev sdc, sector 2165663796


Looks like your drive is giving up the ghost, so to speak.
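
For anyone wanting to cross-check the numbers in that log, here is a small sketch that decodes the Read(10) CDB and the last four bytes of the sense dump. It assumes the standard READ(10) layout (opcode 0x28, big-endian 4-byte LBA at bytes 2-5, 2-byte transfer length at bytes 7-8) and that the trailing sense bytes are the low half of the information field; the helper names are made up for illustration.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    sector_count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return start_lba, sector_count


def sense_info_lba(info_hex):
    """Interpret the trailing sense bytes as a big-endian sector number."""
    return int.from_bytes(bytes.fromhex(info_hex), "big")


# Values copied from the dmesg excerpt above.
start, count = decode_read10("28 00 81 15 67 0f 00 04 00 00")
failed = sense_info_lba("81 15 68 34")

print(start, count)    # 2165663503 1024
print(count * 512)     # 524288, matching "ncq 524288 in"
print(failed)          # 2165663796, matching the end_request line
print(failed - start)  # 293 sectors into the 1024-sector request
```

So the kernel asked for 1024 sectors in one NCQ read and the drive gave up on a sector about a third of the way into that request.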

Get the Samsung disk tool, run it, and see what it says.
As said above, most disk manufacturers will want to see that before an RMA is issued.
They will also be able to tell you if the disk is in warranty (according to them).

Good luck


Note: I had a Hitachi that had sectors pending and I reformatted the whole drive with their disk utility
and it reallocated things properly and it went from over 100 "reallocated" sectors to 24
and has held steady at that for the last several months.
YMMV
_________________
Asus m5a99fx, FX 8320 - amd64-multilib, 3.15.9-zen, glibc-2.17, gcc-4.7.3-r1, eudev
xorg-server-1.16, openbox w/lxpanel, nouveau, oss4

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sun Dec 30, 2012 3:01 pm

Bigun,

Code:
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sdb1[3](S) sdc1[4](F) sda1[2]
      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/1] [__U]
shows that you have a three disk raid5 set with only one drive. That's really bad news.
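
The `[3/1] [__U]` part of that line is what carries the bad news. As a rough sketch of how to read it (the regular expression and function names are illustrative assumptions, not anything md itself provides): the first bracket is total members / working members, and each `_` in the second bracket is a missing member.

```python
import re

# Status line copied from the /proc/mdstat output above.
MDSTAT_LINE = "      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/1] [__U]"


def parse_md_status(line):
    """Extract member counts and the U/_ map from an mdstat status line."""
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
    if m is None:
        return None
    flags = m.group(3)
    return {
        "total": int(m.group(1)),    # members the array is supposed to have
        "active": int(m.group(2)),   # members currently working
        "missing": flags.count("_"),
        "flags": flags,
    }


def raid5_survivable(status):
    """RAID5 can run with at most one member missing."""
    return status["missing"] <= 1


status = parse_md_status(MDSTAT_LINE)
print(status["total"], status["active"], status["missing"])  # 3 1 2
print(raid5_survivable(status))                              # False
```

With two of three members gone, a raid5 set has lost more than its single disk of redundancy, which is why the rebuild could not continue.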

dmesg shows
Code:
[299323.796623] end_request: I/O error, dev sdc, sector 2165663796
[299323.796626] md/raid:md127: read error not correctable (sector 2165663728 on sdc1).
[299323.796629] md/raid:md127: Disk failure on sdc1, disabling device.
[299323.796629] md/raid:md127: Operation continuing on 1 devices.

The recovery failed and /dev/sdc1 shows
Code:
[299323.796632] md/raid:md127: read error not correctable (sector 2165663736 on sdc1).
[299323.796634] md/raid:md127: read error not correctable (sector 2165663744 on sdc1).
[299323.796635] md/raid:md127: read error not correctable (sector 2165663752 on sdc1).
[299323.796637] md/raid:md127: read error not correctable (sector 2165663760 on sdc1).
[299323.796639] md/raid:md127: read error not correctable (sector 2165663768 on sdc1).
[299323.796641] md/raid:md127: read error not correctable (sector 2165663776 on sdc1).
[299323.796642] md/raid:md127: read error not correctable (sector 2165663784 on sdc1).
[299323.796644] md/raid:md127: read error not correctable (sector 2165663792 on sdc1).
[299323.796646] md/raid:md127: read error not correctable (sector 2165663800 on sdc1).
a bad patch on the drive.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Sun Dec 30, 2012 6:35 pm

NeddySeagoon wrote:
Bigun,

Code:
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sdb1[3](S) sdc1[4](F) sda1[2]
      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/1] [__U]
shows that you have a three disk raid5 set with only one drive. That's really bad news.

dmesg shows
Code:
[299323.796623] end_request: I/O error, dev sdc, sector 2165663796
[299323.796626] md/raid:md127: read error not correctable (sector 2165663728 on sdc1).
[299323.796629] md/raid:md127: Disk failure on sdc1, disabling device.
[299323.796629] md/raid:md127: Operation continuing on 1 devices.

The recovery failed and /dev/sdc1 shows
Code:
[299323.796632] md/raid:md127: read error not correctable (sector 2165663736 on sdc1).
[299323.796634] md/raid:md127: read error not correctable (sector 2165663744 on sdc1).
[299323.796635] md/raid:md127: read error not correctable (sector 2165663752 on sdc1).
[299323.796637] md/raid:md127: read error not correctable (sector 2165663760 on sdc1).
[299323.796639] md/raid:md127: read error not correctable (sector 2165663768 on sdc1).
[299323.796641] md/raid:md127: read error not correctable (sector 2165663776 on sdc1).
[299323.796642] md/raid:md127: read error not correctable (sector 2165663784 on sdc1).
[299323.796644] md/raid:md127: read error not correctable (sector 2165663792 on sdc1).
[299323.796646] md/raid:md127: read error not correctable (sector 2165663800 on sdc1).
a bad patch on the drive.


Like I said, good thing I had a backup. I actually did a fresh backup before I unmounted and started the rebuild.

I had already ordered another drive on a premonition that something like this would happen. That explains the slowdown then.

Neddy,

You may wanna start adding links to your signature of people who benefited from keeping up to date backups.

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Mon Dec 31, 2012 1:44 am

Yay, saved by the backup.

I found when having RAID:

1. RAID is not a replacement for backups. This is a very important point people neglect.
2. A hot spare is nice. RAID6 counts toward a hot spare.
3. Having a COLD spare on hand helps a LOT: it closes your exposure window if you don't have a hot spare, and since it has fewer hours on it, it will have different failure characteristics than the drives that have been powered up and spinning. Test your cold spare before putting it on the shelf.
4. Having good PSUs is also mandatory... You don't want a common, bad PSU killing all your drives at the same time...

Unfortunately PSU testers are hard to come by... Really need an o-scope and a load board to test them correctly.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Mon Dec 31, 2012 1:53 am

eccerr0r,

... and that only gets you static testing. Dynamic testing is just as important in a PC.
A good rule of thumb is to buy mid-priced PSUs. They are a commodity and you get what you pay for.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Tue Jan 01, 2013 12:40 pm

Real quick, I'm building the new array. It was going really fast, averaging about 40-60MB/s, then slowed down to about 4MB/s. I checked the lights, and on a temp spare drive I grabbed from work, the HD light seems to be staying on. I checked smartctl on that drive:

Code:
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0


Nothing seems weird, but I'm wondering if I need to RMA this thing for work. Here is the whole smartctl entry for that drive.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Tue Jan 01, 2013 1:00 pm

Bigun
dmesg and /proc/mdstat would be useful
Your SMART does have one warning sign .... Western Digital Green.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Tue Jan 01, 2013 2:35 pm

NeddySeagoon wrote:
Bigun
dmesg and /proc/mdstat would be useful
Your SMART does have one warning sign .... Western Digital Green.


It's sped back up to about 11MB/s:

Code:
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sde1[3] sdb1[1] sda1[0]
      2930269184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [============>........]  recovery = 61.2% (897263620/1465134592) finish=861.6min speed=10983K/sec
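
As a rough cross-check of the finish= figure in that recovery line, the remaining time can be recomputed from the block counts and the reported speed. This is only a sketch: it assumes the two numbers in parentheses are done/total in the same 1K units that speed= is reported in, and the function names are invented for the example.

```python
import re

# Recovery line copied from the /proc/mdstat output above.
RECOVERY_LINE = ("      [============>........]  recovery = 61.2% "
                 "(897263620/1465134592) finish=861.6min speed=10983K/sec")


def parse_recovery(line):
    """Return (done, total, speed_k_per_sec) from an mdstat recovery line."""
    m = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def minutes_left(done, total, speed_k_per_sec):
    """Remaining time in minutes if the current speed were to hold."""
    if speed_k_per_sec == 0:
        return None
    return (total - done) / speed_k_per_sec / 60.0


done, total, speed = parse_recovery(RECOVERY_LINE)
print(round(100.0 * done / total, 1))             # 61.2
print(round(minutes_left(done, total, speed), 1))
```

The result lands within a minute of the kernel's own finish=861.6min. The estimate only holds while the speed stays constant, so it will swing around whenever the resync rate changes.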

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Tue Jan 01, 2013 5:21 pm

Bigun,

Look in dmesg for failures and recoveries that indicate write retries.
This involves recalibrating the head, which usually produces an audible 'click'.
The click of death is produced by constant recalibrations.

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Wed Jan 02, 2013 9:03 am

NeddySeagoon wrote:
Bigun,

Look in dmesg for failures and recoveries that indicate write retries.
This involves recalibrating the head, which usually produces an audible 'click'.
The click of death is produced by constant recalibrations.


I don't see anything, also the RAID finished. The WD Green drive is temporary until my other drive gets RMA'd. I also don't plan on writing anything to the drive until the RMA'd drive gets here and the RAID rebuilds again. I just need it to stream media to the WDTV live. :D

Bigun
Veteran

Joined: 21 Sep 2003
Posts: 1961

Posted: Thu Jan 03, 2013 1:44 pm

Oh FFS, it's the WD Green drive, but come on!

During recovery of the data on the RAID set:
Code:
[219380.354814] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[219380.354819] ata5.00: failed command: SMART
[219380.354823] ata5.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
[219380.354823]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[219380.354825] ata5.00: status: { DRDY }
[219380.354830] ata5: hard resetting link
[219385.700776] ata5: link is slow to respond, please be patient (ready=0)
[219387.534026] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[219387.560506] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[219387.560512] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[219387.565768] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[219387.565774] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[219387.566030] ata5.00: configured for UDMA/133
[219387.566045] ata5: EH complete
[228367.972299] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[228367.972305] ata5.00: failed command: SMART
[228367.972312] ata5.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
[228367.972312]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[228367.972315] ata5.00: status: { DRDY }
[228367.972330] ata5: hard resetting link
[228373.318358] ata5: link is slow to respond, please be patient (ready=0)
[228378.004336] ata5: COMRESET failed (errno=-16)
[228378.004343] ata5: hard resetting link
[228383.350411] ata5: link is slow to respond, please be patient (ready=0)
[228388.035392] ata5: COMRESET failed (errno=-16)
[228388.035399] ata5: hard resetting link
[228393.381463] ata5: link is slow to respond, please be patient (ready=0)
[228411.053153] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[228411.062538] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[228411.062544] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[228411.080905] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[228411.080911] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[228411.081781] ata5.00: configured for UDMA/133
[228411.081795] ata5: EH complete
[229681.016622] ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[229681.016628] ata5.00: failed command: READ FPDMA QUEUED
[229681.016634] ata5.00: cmd 60/68:00:d7:01:fc/02:00:aa:00:00/40 tag 0 ncq 315392 in
[229681.016634]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[229681.016637] ata5.00: status: { DRDY }
[229681.016651] ata5: hard resetting link
[229683.969236] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[229683.982469] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[229683.982475] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[229684.001288] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20120320/psargs-359)
[229684.001294] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node f54571c8), AE_NOT_FOUND (20120320/psparse-536)
[229684.001455] ata5.00: configured for UDMA/133
[229684.001475] sd 4:0:0:0: [sdb]
[229684.001477] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[229684.001480] sd 4:0:0:0: [sdb]
[229684.001482] Sense Key : Aborted Command [current] [descriptor]
[229684.001485] Descriptor sense data with sense descriptors (in hex):
[229684.001487]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[229684.001502]         00 00 00 00
[229684.001505] sd 4:0:0:0: [sdb]
[229684.001507] Add. Sense: No additional sense information
[229684.001508] sd 4:0:0:0: [sdb] CDB:
[229684.001509] Read(10): 28 00 aa fc 01 d7 00 02 68 00
[229684.001515] end_request: I/O error, dev sdb, sector 2868642263
[229684.001560] ata5: EH complete


I'm starting to wonder about the PSU, any recommended PSU testers?

If it does wind up being the PSU, any recommended PSU? I only need 300W or so.

Akkara
Administrator

Joined: 28 Mar 2006
Posts: 5174
Location: &akkara

Posted: Thu Jan 03, 2013 2:05 pm

A few things off the top of my head that could be exacerbating your problems:

(1) Loose, poorly-contacting power connector (that plugs into the drive)
(2) Too many drives on any given power connector string (the ones toward the end are often affected). Try to keep it to 2 drives per connector chain.
(3) Loose or jiggle-able sata connector (either on the drive, or on the motherboard. I strap mine down with velcro ties)
(4) Drives that spin down, and later induce a momentary power sag when spinning back up (often goes along with points (1) or (2)).

Many of these are "power supply problems" but not necessarily a bad power supply itself. Although a good quality power supply might fix the issues, it might simply be that the thicker gauge wires it comes with cope better with having lots of drives on a connector chain.

And regardless of gauge, wire inductance is still there (dependent mostly on length, and especially if it's been coiled up into a "neat bundle"). I've seen many a "hard resetting link" message come in just as I spin up a drive that's been inserted into the removable caddy. I wasn't running raid at the time so it only caused a momentary hiccup. But the only way I got rid of it was to power the removable caddy from its own dedicated cable from the power supply. You might be having similar issues, in addition to bad drives.
_________________
echo 'long long long x;' | gcc -x c -c -

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Thu Jan 03, 2013 6:27 pm

Bigun,

You may only need 300w steady state but that's total steady state.
Most PSUs cannot deliver their full rated output because one voltage, or a combination of two, hits its power limit first.
e.g. a random PSU, pulled out of my gander box, says
+5v 32A 160w
+3.3v 20A 66w
+12v 16A 192w
-12v 0.8A
-5v 0.3A
5v Stby 3A

That looks like a 300W PSU ... but it goes on in smaller print ...
5v and 3.3v shall not exceed 165w
5v+3.3v+12v shall not exceed 280w
It's these combinations that are the limiting factor, not the advertised load that a PSU can support.
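
To make the fine print concrete, here is a minimal sketch that checks a load against both the per-rail current limits and the combined wattage limits quoted from that label. The rail figures come from the label above; the two example loads and all names in the code are invented purely for illustration.

```python
# Per-rail limits from the label above: rail -> (volts, max amps).
RAILS = {"+5V": (5.0, 32.0), "+3.3V": (3.3, 20.0), "+12V": (12.0, 16.0)}

# Combined limits from the small print: (rails in the group, max watts).
COMBINED_LIMITS = [
    (("+5V", "+3.3V"), 165.0),
    (("+5V", "+3.3V", "+12V"), 280.0),
]


def rail_watts(rail, amps):
    """Power drawn on one rail at the given current."""
    return RAILS[rail][0] * amps


def check_load(load_amps):
    """Return a list of limits that the given load (rail -> amps) exceeds."""
    problems = []
    for rail, amps in load_amps.items():
        if amps > RAILS[rail][1]:
            problems.append(f"{rail}: {amps}A exceeds {RAILS[rail][1]}A")
    for group, limit in COMBINED_LIMITS:
        total = sum(rail_watts(r, load_amps.get(r, 0.0)) for r in group)
        if total > limit:
            problems.append(f"{'+'.join(group)}: {total:.1f}W exceeds {limit}W")
    return problems


# Every rail at its individual maximum: allowed per rail, not in combination.
full_tilt = {"+5V": 32.0, "+3.3V": 20.0, "+12V": 16.0}
print(sum(rail_watts(r, a) for r, a in full_tilt.items()))  # about 418 W on paper
print(len(check_load(full_tilt)))                            # 2 combined limits broken

# A hypothetical, more modest load that stays inside every limit.
modest = {"+5V": 10.0, "+3.3V": 5.0, "+12V": 15.0}
print(check_load(modest))                                    # []
```

The per-rail figures add up to far more than the supply can actually deliver at once, which is the point being made about the combined limits.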

PSU derating is a good thing too, the PSU runs cooler, lasts longer and produces better regulation.

In short, if you need 300w, don't look at any PSUs under 500w and read the fine print.

Factor in the drive motor stalled load too ... that's about 2A on the 12v. Drive labels usually only give the steady state current.

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Thu Jan 03, 2013 8:46 pm

I had one drive that kept getting kicked from my 500x4 array but, checked by itself, always turned out good. Since I keep swapping sata cables around I never noticed the pattern that it was always one particular hotswap bay that kept "failing", but eventually I root-caused it to a poorly fitting SATA power connector at the hotswap bay (not internal to the hotswap bay). Cleaning the connector with connector cleaner fixed the issue.

I feel that I have the same problem with one of the drives in my molex powered 120x4 array... After many years of use these connectors start to feel loose, and that's a bad sign... I hope the kicked 120G disk survives another 30K hours and reaches 100K power on hours and beyond :D
All times are GMT