Gentoo Forums
Bad Blocks? (Solved)
nlsa8z6zoz7lyih3ap
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

Posted: Fri Jul 06, 2012 4:22 pm    Post subject: Bad Blocks? (Solved)

/dev/mapper/sdc is a Western Digital Caviar Green 2TB hard drive (a few years old) that is encrypted with dm-crypt and formatted with ext4. It has no partitions, so all of /dev/mapper/sdc is a single ext4 filesystem.

A recent routine boot-time fsck reported problems and asked for a manual fsck.

I ran e2fsck -vcp /dev/mapper/sdc with these results:

Quote:
e2fsck -vcp /dev/mapper/sdc
Error reading block 350224385 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.

/dev/mapper/sdc: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)


I then copied everything off this drive to a spare disk,
reformatted it (mke2fs -t ext4 /dev/mapper/sdc), and then started rerunning
Quote:
e2fsck -vcp /dev/mapper/sdc



There has been no output so far. However, I tested it with dumpe2fs (while e2fsck was running) and got the following:

Quote:
sudo dumpe2fs /dev/mapper/sdc|grep -3 bad
dumpe2fs 1.42 (29-Nov-2011)
Free blocks: 24674304-24707071
Free inodes: 6168577-6176768
Group 754: (Blocks 24707072-24739839) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0xbad3, unused inodes 8192
Block bitmap at 24641538 (bg #752 + 2), Inode bitmap at 24641554 (bg #752 + 18)
Inode table at 24642592-24643103 (bg #752 + 1056)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 58589184-58621951
Free inodes: 14647297-14655488
Group 1789: (Blocks 58621952-58654719) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0x5bad, unused inodes 8192
Block bitmap at 58195981 (bg #1776 + 13), Inode bitmap at 58195997 (bg #1776 + 29)
Inode table at 58202656-58203167 (bg #1776 + 6688)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 114425856-114458623
Free inodes: 28606465-28614656
Group 3493: (Blocks 114458624-114491391) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0xbad5, unused inodes 8192
Block bitmap at 114294789 (bg #3488 + 5), Inode bitmap at 114294805 (bg #3488 + 21)
Inode table at 114297376-114297887 (bg #3488 + 2592)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 159711232-159743999
Free inodes: 39927809-39936000
Group 4875: (Blocks 159744000-159776767) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0xbad9, unused inodes 8192
Block bitmap at 159383563 (bg #4864 + 11), Inode bitmap at 159383579 (bg #4864 + 27)
Inode table at 159389216-159389727 (bg #4864 + 5664)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 229277696-229310463
Free inodes: 57319425-57327616
Group 6998: (Blocks 229310464-229343231) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0x6bad, unused inodes 8192
Block bitmap at 229113862 (bg #6992 + 6), Inode bitmap at 229113878 (bg #6992 + 22)
Inode table at 229116960-229117471 (bg #6992 + 3104)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 237862912-237895679
Free inodes: 59465729-59473920
Group 7260: (Blocks 237895680-237928447) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0xbadf, unused inodes 8192
Block bitmap at 237502476 (bg #7248 + 12), Inode bitmap at 237502492 (bg #7248 + 28)
Inode table at 237508640-237509151 (bg #7248 + 6176)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
--
Free blocks: 281313280-281346047
Free inodes: 70328321-70336512
Group 8586: (Blocks 281346048-281378815) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0x9bad, unused inodes 8192
Block bitmap at 281018378 (bg #8576 + 10), Inode bitmap at 281018394 (bg #8576 + 26)
Inode table at 281023520-281024031 (bg #8576 + 5152)
32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes


Does this mean that the drive is failing and should be replaced, or could something else be involved?
I don't mind replacing it, but would hate to put in a new disc and find the same problem right away.

PS: /dev/mapper/sdb (which is also a 2TB Western Digital Caviar Green) is also showing bad blocks with the same tests,
but all partitions on /dev/mapper/sda (which is a Seagate 500GB) show no bad blocks.
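A note on the dumpe2fs output quoted above: every line that grep matched is a Checksum field whose hex value happens to contain the letters "bad" (0xbad3, 0x5bad, 0xbad5 and so on), so those hits do not by themselves indicate bad blocks. The following is a minimal sketch of a stricter filter. It assumes that dumpe2fs prints the recorded bad-block list on a line starting with "Bad blocks:" when that list is non-empty; the function name bad_block_lines is made up for illustration.

```shell
# Keep only the line on which dumpe2fs lists blocks recorded as bad.
# Assumption: that line starts with "Bad blocks:". Checksum lines such as
# "Checksum 0xbad3" are ignored because they do not start with that label.
bad_block_lines() {
    grep '^Bad blocks:'
}

# Example with sample text, so no device is needed:
printf '%s\n' \
  '  Checksum 0xbad3, unused inodes 8192' \
  'Bad blocks: 1234, 5678' | bad_block_lines
```

On the real filesystem this would be run as dumpe2fs /dev/mapper/sdc | bad_block_lines. Alternatively, dumpe2fs -b /dev/mapper/sdc prints only the blocks that the filesystem has reserved as bad.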


Last edited by nlsa8z6zoz7lyih3ap on Mon Jul 09, 2012 3:04 pm; edited 1 time in total

aCOSwt
Advocate


Joined: 19 Oct 2007
Posts: 2035
Location: Between the keyboard and the chair

Posted: Fri Jul 06, 2012 4:38 pm    Post subject:

Why don't you just ask badblocks (from sys-fs/e2fsprogs) to tell you?
_________________
In theory there are no differences between theory and practice. In practice, there are.
Don't try to understand my posts. Immanuel Kant never did, he thinks that only music and laughter do not have to mean anything.

nlsa8z6zoz7lyih3ap
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

Posted: Fri Jul 06, 2012 6:47 pm    Post subject:

Thanks.

The last line of output from
Quote:
sudo badblocks -s -v /dev/mapper/sdc
is
Quote:
100556555one, 1:00:24 elapsed. (86/0/0 errors)


So it appears that there are bad blocks.

QUESTION:
Code:
e2fsck -vcp /dev/mapper/sdc

is (as I understand it) supposed to instruct the ext4 filesystem not to use those blocks.
Is it considered safe to carry on using the disc after this, or would it be standard practice to just replace the disc?

Do you know exactly what the output "(86/0/0 errors)" means?
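For reference, the three numbers that badblocks prints in parentheses are its running counts of read errors, write errors and corruption errors, in that order. So "(86/0/0 errors)" reports 86 read errors and no write or corruption errors, which is what a read-only scan would be expected to show. A minimal sketch that splits such a status line into named fields follows; the function name badblocks_counts is hypothetical.

```shell
# Split a badblocks status line into its three error counters.
# Reads text on stdin and prints "read=N write=N corruption=N"
# for every line that contains a "(N/N/N errors)" group.
badblocks_counts() {
    sed -n 's|.*(\([0-9]*\)/\([0-9]*\)/\([0-9]*\) errors).*|read=\1 write=\2 corruption=\3|p'
}

# The status line quoted in this post:
echo '100556555one, 1:00:24 elapsed. (86/0/0 errors)' | badblocks_counts
# prints: read=86 write=0 corruption=0
```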

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 29961
Location: 56N 3W

Posted: Fri Jul 06, 2012 7:00 pm    Post subject:

nlsa8z6zoz7lyih3ap,

get smartmontools and ask the drive.

Check your warranty status too. I bought 5 of these drives for a media server. So far I have had two warranty replacements; they both failed after about 9 months.

The drive should remap bad blocks when they are predicted to be failure prone, so you never actually see any bad blocks at the OS level.
A write to the affected blocks should force a remap too.
While the above is all very interesting, check your warranty before you try any 'fixes' and post your smartctl output.
There is no point in messing with an iffy drive that qualifies for a free replacement.
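As a rough illustration of "asking the drive": the SMART attributes that track remapping are usually 5 (Reallocated_Sector_Ct), 196 (Reallocated_Event_Count), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). The sketch below pulls their raw values out of smartctl -A style text. The function name sector_health is made up, and the sample lines are copied from the smartctl -a output posted further down this thread.

```shell
# Print NAME=RAW_VALUE for the attributes that track remapped or
# unreadable sectors (IDs 5, 196, 197, 198).
# Reads `smartctl -A` style text on stdin; the raw value is taken
# to be the last field on the attribute line.
sector_health() {
    awk '($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) && $2 ~ /^[A-Z]/ { print $2 "=" $NF }'
}

# Sample attribute lines, so no device is needed:
printf '%s\n' \
  '  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585' \
  '197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272' \
  '199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0' | sector_health
```

Against the real drive this would be smartctl -A /dev/sdc | sector_health (as root). Non-zero raw values for these attributes mean the drive has already remapped sectors or has sectors it could not read.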
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

nlsa8z6zoz7lyih3ap
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

Posted: Fri Jul 06, 2012 11:09 pm    Post subject:

Thanks for steering me to smartctl.

After subjecting the drive to several tests I read the log as follows:
Quote:
smartctl -l selftest /dev/sdc
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  90%        6128             31187072
# 2  Extended offline    Completed: read failure  90%        6128             31187076
# 3  Extended offline    Completed: read failure  90%        6128             31187072
# 4  Short offline       Completed: read failure  90%        6128             31187076
# 5  Extended offline    Completed: read failure  90%        6128             31187072
# 6  Conveyance offline  Completed: read failure  90%        6127             31187076
# 7  Short offline       Completed: read failure  90%        6127             31187072


Thanks in advance for your interpretation of this.
PS Twice in the last month e2fsck has found errors on this drive during boot, after a clean shutdown.
Once, my VMware virtual machine (which lives on this drive) mysteriously refused to boot. (Fortunately I make frequent backups, and so lost nothing.)
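The self-test log above keeps stopping at the same two addresses. A small sketch that reduces such a log to its distinct failing LBAs is shown here; the function name failing_lbas is hypothetical. Note that these are LBAs on the raw device /dev/sdc, not ext4 block numbers on /dev/mapper/sdc.

```shell
# List the distinct LBA_of_first_error values from `smartctl -l selftest`
# style text on stdin, considering only entries that ended in a read failure.
# The LBA is taken to be the last field of each log entry.
failing_lbas() {
    awk '/^#/ && /read failure/ { print $NF }' | sort -n -u
}

# Three entries from the log quoted in this post:
printf '%s\n' \
  '# 1 Extended offline Completed: read failure 90% 6128 31187072' \
  '# 2 Extended offline Completed: read failure 90% 6128 31187076' \
  '# 3 Extended offline Completed: read failure 90% 6128 31187072' | failing_lbas
```

On the real drive the input would come from smartctl -l selftest /dev/sdc.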

Ant P.
Veteran


Joined: 18 Apr 2009
Posts: 1917
Location: UK

Posted: Fri Jul 06, 2012 11:16 pm    Post subject:

The drive's almost certainly dying. `smartctl -a /dev/sdc` would be useful to see too.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 29961
Location: 56N 3W

Posted: Fri Jul 06, 2012 11:16 pm    Post subject:

nlsa8z6zoz7lyih3ap,

The smartmon log is more useful ... from memory, it's the -x option.

nlsa8z6zoz7lyih3ap
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

Posted: Fri Jul 06, 2012 11:44 pm    Post subject:

Quote:
The smartmon log is more useful ... from memory, it's the -x option.




Quote:
smartctl -x /dev/sdc
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARS-00MVWB0
Serial Number: WD-WCAZA0780869
LU WWN Device Id: 5 0014ee 2af9ea618
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Fri Jul 6 16:46:23 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (38400) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   103   103   051    -    49173
  3 Spin_Up_Time            POS--K   173   164   021    -    6350
  4 Start_Stop_Count        -O--CK   100   100   000    -    557
  5 Reallocated_Sector_Ct   PO--CK   173   173   140    -    585
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   092   092   000    -    6129
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    545
192 Power-Off_Retract_Count -O--CK   200   200   000    -    136
193 Load_Cycle_Count        -O--CK   182   182   000    -    56324
194 Temperature_Celsius     -O---K   114   111   000    -    36
196 Reallocated_Event_Count -O--CK   001   001   000    -    264
197 Current_Pending_Sector  -O--CK   200   001   000    -    272
198 Offline_Uncorrectable   ----CK   200   199   000    -    228
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   134   134   000    -    17802
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 6 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error log]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa0 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa1 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa2 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa3 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa4 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa5 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa6 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa7 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaa has 1 sectors [Device vendor specific log]
GP/S Log at address 0xab has 1 sectors [Device vendor specific log]
GP/S Log at address 0xac has 1 sectors [Device vendor specific log]
GP/S Log at address 0xad has 1 sectors [Device vendor specific log]
GP/S Log at address 0xae has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaf has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb1 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb2 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb3 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb4 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb5 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb6 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb7 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP Log at address 0xc1 has 93 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 8442 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8442 [17] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7e 7d e8 e0 00 Error: UNC 8 sectors at LBA = 0x8e7e7de8 = 2390654440

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7e 7d e8 e0 0a 1d+02:02:33.123 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 7d e0 e0 0a 1d+02:02:32.535 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 7d d8 e0 0a 1d+02:02:32.032 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 7d d0 e0 0a 1d+02:02:31.440 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 7d c8 e0 0a 1d+02:02:31.440 READ DMA EXT

Error 8441 [16] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7e 24 08 e0 00 Error: UNC 8 sectors at LBA = 0x8e7e2408 = 2390631432

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7e 24 08 e0 0a 1d+02:02:22.971 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 24 00 e0 0a 1d+02:02:22.971 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 23 f8 e0 0a 1d+02:02:22.970 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 23 f0 e0 0a 1d+02:02:22.970 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 23 e8 e0 0a 1d+02:02:22.970 READ DMA EXT

Error 8440 [15] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7e 12 70 e0 00 Error: UNC 8 sectors at LBA = 0x8e7e1270 = 2390626928

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7e 12 70 e0 0a 1d+02:02:14.262 READ DMA EXT
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:02:14.262 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 00 00 00 00 a0 0a 1d+02:02:14.243 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 0a 1d+02:02:14.243 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:02:14.243 READ NATIVE MAX ADDRESS EXT

Error 8439 [14] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7e 12 70 e0 00 Error: UNC 8 sectors at LBA = 0x8e7e1270 = 2390626928

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7e 12 70 e0 0a 1d+02:02:11.216 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 12 68 e0 0a 1d+02:02:11.216 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 12 60 e0 0a 1d+02:02:11.216 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 12 58 e0 0a 1d+02:02:11.215 READ DMA EXT
25 00 00 00 08 00 00 8e 7e 12 50 e0 0a 1d+02:02:11.215 READ DMA EXT

Error 8438 [13] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7d 80 f8 e0 00 Error: UNC 8 sectors at LBA = 0x8e7d80f8 = 2390589688

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7d 80 f8 e0 0a 1d+02:01:55.828 READ DMA EXT
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:55.828 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 00 00 00 00 a0 0a 1d+02:01:55.808 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 0a 1d+02:01:55.806 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:55.806 READ NATIVE MAX ADDRESS EXT

Error 8437 [12] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7d 80 f8 e0 00 Error: UNC 8 sectors at LBA = 0x8e7d80f8 = 2390589688

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7d 80 f8 e0 0a 1d+02:01:52.913 READ DMA EXT
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:52.913 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 00 00 00 00 a0 0a 1d+02:01:52.894 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 0a 1d+02:01:52.894 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:52.894 READ NATIVE MAX ADDRESS EXT

Error 8436 [11] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7d 80 f8 e0 00 Error: UNC 8 sectors at LBA = 0x8e7d80f8 = 2390589688

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7d 80 f8 e0 0a 1d+02:01:49.860 READ DMA EXT
25 00 00 00 08 00 00 8e 7d 80 f0 e0 0a 1d+02:01:49.860 READ DMA EXT
25 00 00 00 08 00 00 8e 7d 80 e8 e0 0a 1d+02:01:49.860 READ DMA EXT
25 00 00 00 08 00 00 8e 7d 80 e0 e0 0a 1d+02:01:49.860 READ DMA EXT
25 00 00 00 08 00 00 8e 7d 80 d8 e0 0a 1d+02:01:49.860 READ DMA EXT

Error 8435 [10] occurred at disk power-on lifetime: 6129 hours (255 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 8e 7d 68 08 e0 00 Error: UNC 8 sectors at LBA = 0x8e7d6808 = 2390583304

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 8e 7d 68 08 e0 0a 1d+02:01:45.275 READ DMA EXT
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:45.275 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 00 00 00 00 a0 0a 1d+02:01:45.256 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 0a 1d+02:01:45.256 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 00 00 00 00 e0 0a 1d+02:01:45.256 READ NATIVE MAX ADDRESS EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  90%        6128             31187072
# 2  Extended offline    Completed: read failure  90%        6128             31187076
# 3  Extended offline    Completed: read failure  90%        6128             31187072
# 4  Short offline       Completed: read failure  90%        6128             31187076
# 5  Extended offline    Completed: read failure  90%        6128             31187072
# 6  Conveyance offline  Completed: read failure  90%        6127             31187076
# 7  Short offline       Completed: read failure  90%        6127             31187072

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 36 Celsius
Power Cycle Min/Max Temperature: 19/37 Celsius
Lifetime Min/Max Temperature: 19/40 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (203)

Index Estimated Time Temperature Celsius
204 2012-07-06 08:49 35 ****************
... ..(131 skipped). .. ****************
336 2012-07-06 11:01 35 ****************
337 2012-07-06 11:02 36 *****************
... ..( 22 skipped). .. *****************
360 2012-07-06 11:25 36 *****************
361 2012-07-06 11:26 37 ******************
... ..( 4 skipped). .. ******************
366 2012-07-06 11:31 37 ******************
367 2012-07-06 11:32 36 *****************
368 2012-07-06 11:33 37 ******************
... ..( 3 skipped). .. ******************
372 2012-07-06 11:37 37 ******************
373 2012-07-06 11:38 36 *****************
374 2012-07-06 11:39 37 ******************
... ..( 11 skipped). .. ******************
386 2012-07-06 11:51 37 ******************
387 2012-07-06 11:52 36 *****************
388 2012-07-06 11:53 36 *****************
389 2012-07-06 11:54 36 *****************
390 2012-07-06 11:55 37 ******************
... ..( 35 skipped). .. ******************
426 2012-07-06 12:31 37 ******************
427 2012-07-06 12:32 36 *****************
... ..( 25 skipped). .. *****************
453 2012-07-06 12:58 36 *****************
454 2012-07-06 12:59 34 ***************
... ..( 27 skipped). .. ***************
4 2012-07-06 13:27 34 ***************
5 2012-07-06 13:28 35 ****************
... ..(197 skipped). .. ****************
203 2012-07-06 16:46 35 ****************

Warning: device does not support SCT Error Recovery Control command
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x8000 4 95356 Vendor specific
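The error log above gives each failing address in both hex and decimal, for example 0x8e7e7de8 = 2390654440. The sketch below converts such an LBA to decimal and to a byte offset, assuming the 512-byte logical sector size reported in the information section. The function name lba_info is hypothetical, and the offset applies to the raw device /dev/sdc; the dm-crypt mapping may start at a different offset.

```shell
# Convert an LBA (hex such as 0x8e7e7de8, or decimal) to decimal and to
# a byte offset on the raw device.
# Assumption: 512-byte logical sectors, as reported by smartctl above.
lba_info() {
    lba=$(( $1 ))
    echo "lba=$lba byte_offset=$(( lba * 512 ))"
}

# The most recent error in the log (Error 8442):
lba_info 0x8e7e7de8
# prints: lba=2390654440 byte_offset=1224015073280
```

The decimal LBA can also be passed to a read test such as dd if=/dev/sdc of=/dev/null bs=512 skip=2390654440 count=8 to check whether those 8 sectors are still unreadable.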


nlsa8z6zoz7lyih3ap
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

Posted: Fri Jul 06, 2012 11:45 pm    Post subject:

Quote:
The drive's almost certainly dying. `smartctl -a /dev/sdc` would be useful to see too




Code:

 smartctl -a /dev/sdc
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA0780869
LU WWN Device Id: 5 0014ee 2af9ea618
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jul  6 16:45:05 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (38400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   103   103   051    Pre-fail  Always       -       49158
  3 Spin_Up_Time            0x0027   173   164   021    Pre-fail  Always       -       6350
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       557
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6129
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       545
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       136
193 Load_Cycle_Count        0x0032   182   182   000    Old_age   Always       -       56324
194 Temperature_Celsius     0x0022   114   111   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
198 Offline_Uncorrectable   0x0030   200   199   000    Old_age   Offline      -       228
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   134   134   000    Old_age   Offline      -       17802

SMART Error Log Version: 1
ATA Error Count: 3718 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3718 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:17.589  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:17.570  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:17.570  SET FEATURES [Set transfer mode]

Error 3717 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:14.760  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:14.741  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:14.741  SET FEATURES [Set transfer mode]

Error 3716 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:11.931  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:11.912  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:11.912  SET FEATURES [Set transfer mode]

Error 3715 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:09.102  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:09.083  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:09.083  SET FEATURES [Set transfer mode]

Error 3714 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:06.261  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:06.242  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:06.242  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6128         31187072
# 2  Extended offline    Completed: read failure       90%      6128         31187076
# 3  Extended offline    Completed: read failure       90%      6128         31187072
# 4  Short offline       Completed: read failure       90%      6128         31187076
# 5  Extended offline    Completed: read failure       90%      6128         31187072
# 6  Conveyance offline  Completed: read failure       90%      6127         31187076
# 7  Short offline       Completed: read failure       90%      6127         31187072

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Log reformatted from quote to code for easy reading by NeddySeagoon
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 29961
Location: 56N 3W

PostPosted: Sat Jul 07, 2012 10:44 am    Post subject: Reply with quote

nlsa8z6zoz7lyih3ap,

The important stuff first ...
Code:
Warranty Inquiry for : Canada
Serial Number     Model Number     Warranty Status  Warranty Exp Date
WCAZA0780869     WD20EARS-00MVWB0     IN WARRANTY   10/01/2013


In the UK at least, WD will ship you a new drive before you send your old one in. They need a credit card number in case the old drive is not returned.
Return postage is at your cost, and it's worth insuring the scrap drive too, since you will be billed if it's not received.

Since you have a few months yet to return the drive, the following is for interest only.
The
Code:
VALUE WORST THRESH
columns provide the interesting data. These are normalised numbers and can be read the same for all drive vendors.
VALUE shows the current value of a parameter, WORST is the closest to failing the parameter has been in the drive's life, and THRESH is the value considered to be a failure.
That is, if VALUE or WORST <= THRESH, the parameter has failed. RAW_VALUE is vendor or even drive specific, since it's a 48 bit field that may contain several bit fields, e.g. six 8 bit values.
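
For anyone who wants to apply that rule mechanically, here is a rough Python sketch. It assumes the column layout shown in the smartctl table in this thread (ID, name, flag, VALUE, WORST, THRESH, ..., RAW_VALUE); the function names are made up for illustration and the sample rows are copied from the log above.

```python
# Sample rows copied from the smartctl output earlier in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   103   103   051    Pre-fail  Always       -       49158
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
"""


def parse_attributes(text):
    """Return one dict per SMART attribute row (assumed 10-column layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and has at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def has_failed(attr):
    """The rule from the post: failed if VALUE or WORST <= THRESH."""
    return attr["value"] <= attr["thresh"] or attr["worst"] <= attr["thresh"]


def margin(attr):
    """Distance between WORST and THRESH; smaller means closer to failing."""
    return attr["worst"] - attr["thresh"]


attrs = parse_attributes(SAMPLE)
failed = [a["name"] for a in attrs if has_failed(a)]
closest = sorted(attrs, key=margin)[:2]

for a in attrs:
    print(f'{a["id"]:3d} {a["name"]:<24} margin={margin(a):3d} failed={has_failed(a)}')
```

On these rows nothing has actually crossed its threshold, which matches the overall PASSED result, but attributes 196 and 197 have a WORST only one step above THRESH.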

Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272

This shows the drive has already reallocated some sectors and has more it would like to reallocate.

Get WD to send you a warranty replacement before you send your drive in. Use ddrescue to image the old drive onto the new one.
As it's a whole-drive image, you will need to tell it to write the logfile somewhere else.
When/if you get all your data back, or you give up trying, send the dead drive back.

Once ddrescue is down to doing retries, it's worth moving the dead drive around while ddrescue runs. Try it on all four edges, upside down and any other attitudes you can easily prop it up in. You just need one more read.

The error count in
Code:
Error 3718 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.
is incremented for every failed command, so each time
Code:
Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328
is read you get a new error record.
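
As an aside, the LBA in that line can be worked out from the register dump printed just above it in each error record. A small Python sketch, assuming plain 28-bit LBA addressing (SN, CL and CH carry the low 24 bits, the low nibble of DH carries the top four); the function name is made up for illustration:

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23 and the low
    nibble of DH holds bits 24-27 (assumes LBA28 addressing).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register dump from the error records above: ER ST SC SN CL CH DH
registers = "40 51 08 c0 c6 fc eb"
er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in registers.split())

lba = lba28_from_registers(sn, cl, ch, dh)
print(f"UNC {sc} sectors at LBA = 0x{lba:08x} = {lba}")
```

All five logged errors carry the same registers, so they all decode to the same LBA: the same read being retried every few seconds.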
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
nlsa8z6zoz7lyih3ap
Apprentice
Apprentice


Joined: 25 Sep 2007
Posts: 230
Location: Canada

PostPosted: Mon Jul 09, 2012 3:04 pm    Post subject: Reply with quote

Thanks so much for explaining all this to me. It was very helpful indeed.


Quote:
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.


I do frequent backups of everything, so I do not have to try to get data off the drive.

Thanks again.
All times are GMT