Gentoo Forums
Impending HD failure?

Robert S
Guru
Joined: 15 Aug 2004
Posts: 412
Location: Canberra Australia

Posted: Mon Apr 09, 2012 9:04 pm    Post subject: Impending HD failure?

I'm starting to get these:
Code:
Apr 10 00:36:57 myserver smartd[21225]: Device: /dev/sda [SAT], 3 Currently unreadable (pending) sectors
Apr 10 00:36:57 myserver smartd[21225]: Device: /dev/sda [SAT], 3 Offline uncorrectable sectors
Apr 10 01:06:57 myserver smartd[21225]: Device: /dev/sda [SAT], 3 Currently unreadable (pending) sectors
Apr 10 01:06:57 myserver smartd[21225]: Device: /dev/sda [SAT], 3 Offline uncorrectable sectors

What is the best way of testing my HD? I'm currently doing a backup of the entire disk with a view to restoring it on another HD. I'll do a reboot with an automatic fsck when I've finished. Any other suggestions?

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Mon Apr 09, 2012 10:57 pm

I have been seeing these errors on 2 of my disks since installing gentoo back in 2008.
I thought it might have something to do with dual booting with windoze, as on one occasion I restarted my PC from Windoze straight into Gentoo and there was a lock on the ntfs disk I was trying to mount. The solution was to shut down windoze, then boot into Gentoo, and I would subsequently get access to the disk.
I tried defragmenting windoze to see if that would resolve it, but no joy.
The number of bad sectors always seems to stay the same, with no increase over the years.
Just as long as you have made a backup of your important stuff, then I would not worry too much about this.
It sure as hell scared the ***t out of me when I first saw this info, but it has not escalated since the first warning, so I am not too worried.
_________________
Whatever you do, do it properly!

Jaglover
Advocate
Joined: 29 May 2005
Posts: 3979
Location: Saint Amant, Acadiana

Posted: Mon Apr 09, 2012 11:11 pm

You should run something like this
Code:
smartctl --all /dev/sda | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"

to see if the drive is going bad. In my experience, once the error counts get out of hand the drive is going to die soon.
_________________
Please learn how to denote units correctly!

srs5694
Guru
Joined: 08 Mar 2004
Posts: 310
Location: Woonsocket, RI

Posted: Mon Apr 09, 2012 11:11 pm

I strongly advise both of you to run a full SMART diagnostic on the disk. This can be done with tools like smartctl (text-mode), GSmartControl (GUI), or Palimpsest Disk Utility (SMART options are buried in a menu somewhere). IIRC, smartctl and Palimpsest are available in portage, but for some reason GSmartControl isn't. You might also be able to run a SMART test using a utility provided by the disk manufacturer, but that's likely to be written for Windows. This might be OK if you dual-boot, but on a Linux-only system, this could be problematic.

Unfortunately, SMART diagnostic results can be difficult to interpret. Some manufacturers put weird values in some fields that make things look worse than they are. Some fields are strangely named, and utilities often provide poor descriptions of what they mean. As a general rule, the GUI tools make the results easier to interpret than do the text-mode tools.

If the SMART tool gives you anything but "passed" for its overall assessment, you should probably replace the disk ASAP. Likewise if individual tests look troubling and you get confirmation from an expert that this reflects a real problem. The whole point of SMART is to detect disks that are just starting to flake out, so that you can replace the hardware before it fails entirely. It's possible to go for days, weeks, or even months with a disk that SMART says is problematic, but such disks are much more likely to go south very quickly than is a disk that gets a clean bill of health from a SMART test.
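
For anyone who prefers the command line, here is a rough sketch of that workflow with smartctl. The helper name overall_verdict is made up for illustration, /dev/sda is just the device from this thread, and the matched text assumes the usual ATA wording of the `smartctl -H` output, so adjust it for your drive.

```shell
# Print the drive's overall verdict from `smartctl -H` output on stdin.
# Prints UNKNOWN if the expected result line is not present.
overall_verdict() {
    awk -F': ' '/overall-health self-assessment test result/ { print $2; found = 1 }
                END { if (!found) print "UNKNOWN" }'
}

# Typical use, as root:
#   smartctl -t long /dev/sda         # start the extended self-test in the background
#   smartctl -l selftest /dev/sda     # read the self-test log once it has finished
#   smartctl -H /dev/sda | overall_verdict
```

Anything other than PASSED from the last command is the "replace the disk ASAP" case described above.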

Edit: I posted just seconds after Jaglover. By "both of you" in my first paragraph, I'm referring to the first two posters.

BillWho
Veteran
Joined: 03 Mar 2012
Posts: 1576
Location: US

Posted: Mon Apr 09, 2012 11:15 pm

Robert S,

I've had similar errors on a disk for close to three years now. I have gentoo installed as a test-and-break system, so there's nothing important on it.

I saved the output of /usr/sbin/smartctl --log=error /dev/sdb and it still reports the exact same info today.

That disk could live another several years with no problems or it could crash and burn tomorrow.

If you have any critical data on it then for sure back it up - don't take any chances.

Good luck :wink:

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Mon Apr 09, 2012 11:26 pm

In my case all the disks which are reporting errors
Code:
Apr 10 00:06:49 pig smartd[3636]: Device: /dev/sda [SAT], 1 Offline uncorrectable sectors
Apr 10 00:06:49 pig smartd[3636]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Apr 10 00:06:49 pig smartd[3636]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Apr 10 00:06:49 pig smartd[3636]: Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors
Apr 10 00:06:49 pig smartd[3636]: Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors

are disks which were initially installed / utilised by Windoze. (i.e. they are ntfs)
These disks are not mounted at boot time. I mount these disks via nautilus, and they are used / shared between Windoze / Gentoo for documents, pictures, music etc.
In my case, I think the unreadable sector errors are because they are not mounted.
I think asking palimpsest or other programs to repair them will bork my ntfs disks.
_________________
Whatever you do, do it properly!

Mad Merlin
Veteran
Joined: 09 May 2005
Posts: 1134

Posted: Tue Apr 10, 2012 4:58 am

Those errors are exactly what they sound like: a sector is unreadable on the hard drive. That sector might be part of your swap file (probably won't matter) or it could be part of your /boot/grub/grub.conf (not so good). Reads to that sector will fail. The next write made to that sector will cause the hard drive either to rewrite it successfully in place or to transparently remap it to a spare sector, and everything will be normal again.

Now, hard drives have a relatively small number of spare sectors (think dozens), and eventually it will run out. What happens after that is left as an exercise to the reader. Ideally, you will replace the drive before you are able to find out.

This might sound bad, but bad sectors are a fact of life, just like dead pixels on your monitor. Hard drives will deal with them just fine in small quantities. In general, if you see a small number of offline uncorrectable sectors and that number is not rising over time, the drive is probably fine. If you see a number that's steadily (or quickly) rising over time, toss the drive; it's going to eat your data.
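
One way to watch whether the count is rising is to record the raw value on each run and compare it with the last one. This is a minimal sketch, assuming the attribute table layout that smartctl prints elsewhere in this thread (raw value in the 10th column). The function names and the state-file path are hypothetical.

```shell
# Print the raw value (10th column) of one SMART attribute from
# `smartctl -A` output supplied on stdin.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Compare a new count with the one stored in a state file (path chosen
# by the caller), store the new count, and report the trend.
pending_trend() {
    state_file=$1
    current=$2
    previous=$(cat "$state_file" 2>/dev/null || echo "$current")
    previous=${previous:-$current}    # treat an empty file like a first run
    echo "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "rising: $previous -> $current"
    else
        echo "stable: $current"
    fi
}

# Typical use, as root, e.g. from a daily cron job:
#   count=$(smartctl -A /dev/sda | smart_raw Current_Pending_Sector)
#   pending_trend /var/tmp/sda.pending "$count"
```

A "rising" line on several consecutive runs is the signal to stop trusting the drive.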

Of course, I would point out that I've seen plenty of drives die completely out of the blue (SMART had no complaints right up until the drive's block device disappeared). Consequently, it's always a good time to test your backups.
_________________
Game! - Where the stick is mightier than the sword!

Robert S
Guru
Joined: 15 Aug 2004
Posts: 412
Location: Canberra Australia

Posted: Tue Apr 10, 2012 8:15 am

Here's the output.

Code:
myserver robert # smartctl --all /dev/sda | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   036   024   000    Old_age   Always       -       77071283
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
myserver robert # /usr/sbin/smartctl --log=error /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.12-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged


My problem is that I'm going overseas for a few weeks soon and I can't afford to have this bomb. It might be easier to bite the bullet and get another HD.

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Wed Apr 11, 2012 12:34 pm

Code:
195 Hardware_ECC_Recovered  0x001a   036   024   000    Old_age   Always       -       77071283

That particular line does give a little cause for concern.
Like you say, back up all the important stuff on /dev/sda; it would probably be a good idea to replace said disk.
I spent 4 hours last night going through all my windoze partitions, defragmenting and running scan disks. Windoze reported no problems with the disk / partitions, but as soon as I come back into Gentoo, smartd still throws out warnings.
My situation is more akin to BillWho's, as my errors are in the 1 - 5 range and have been since the installation of a brand new disk, so I am not worrying too much.
_________________
Whatever you do, do it properly!

BillWho
Veteran
Joined: 03 Mar 2012
Posts: 1576
Location: US

Posted: Wed Apr 11, 2012 1:12 pm

Thistled,

I don't believe that you can attribute the errors to winblows. I have a winblows installation on my disk and no errors are reported with smartctl.
Code:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63    20482874    10241406   27  Hidden NTFS WinRE
/dev/sda2   *    20484096   336990191   158253048    7  HPFS/NTFS/exFAT


Code:
root@gentoo-gateway bill # /usr/sbin/smartctl --log=error /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.3.0-rc7] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged


This is the original installed hd with a vista installation along with a recovery partition and then later upgraded to win7.

srs5694
Guru
Joined: 08 Mar 2004
Posts: 310
Location: Woonsocket, RI

Posted: Wed Apr 11, 2012 2:33 pm

I agree with Mad Merlin: Back up your data and either replace the drive ASAP or be prepared to lose it suddenly.

One more point: SMART tools work with the disk hardware itself to detect problems. As such, SMART works at a much lower level than filesystem drivers. SMART can detect errors in parts of the disk that are unused -- unused parts of a filesystem or even gaps between partitions. Thus, you can spend all day running fsck in Linux or defragmenting files in Windows and there's no guarantee that you'll touch the affected sectors. The same applies if the bad sectors are in the middle of a big file that happens not to be moved by a defragment operation.

The best way to ensure that you do something with a sector that's going bad is to do a raw write operation to the whole disk, as in:

Code:

dd if=/dev/zero of=/dev/sdb


This is, however, a destructive operation -- it zeroes out the entire disk! If your disk holds important data, you obviously don't want to do this. If you replace the disk, though, and you want to discover how bad it is and perhaps salvage some life from the disk in a non-critical capacity, you could do this and see what happens to the SMART test results. If the "pending sectors" count drops to 0, then it could be there were just a handful of bad sectors and the disk will be good for a while longer. If the values skyrocket, OTOH, then you'll know the disk was in bad shape and you replaced it just in time. (The latter happened to me recently, FWIW. Fortunately, the disk was still under warranty, so now I've got a replacement drive waiting to be used.)

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Thu Apr 12, 2012 12:33 am

I am a little confused by all of this. This 1st disk is my Winblows disk, and is barely used by Linux, but I can mount it if I want to install any apps using Wine.
Code:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63    41945714    20972826    7  HPFS/NTFS/exFAT
/dev/sda2        41945715   265168889   111611587+   7  HPFS/NTFS/exFAT
/dev/sda3       265168890   488392064   111611587+   7  HPFS/NTFS/exFAT

and smartctl reports the following:
Code:
/usr/sbin/smartctl --log=error /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 10 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 10 occurred at disk power-on lifetime: 6151 hours (256 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 9b 4b 4c e2  Error: UNC at LBA = 0x024c4b9b = 38554523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d8 01 9b 4b 4c e0 08      00:28:29.100  READ VERIFY SECTOR(S) EXT
  42 d8 02 9d 4b 4c e0 08      00:28:29.100  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:29.100  READ DMA EXT
  42 d8 02 9b 4b 4c e0 08      00:28:24.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:24.700  READ DMA EXT

Error 9 occurred at disk power-on lifetime: 6151 hours (256 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 9b 4b 4c e2  Error: UNC at LBA = 0x024c4b9b = 38554523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d8 02 9b 4b 4c e0 08      00:28:24.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:24.700  READ DMA EXT
  25 d8 01 00 00 00 e0 08      00:28:24.700  READ DMA EXT
  42 d8 04 9b 4b 4c e0 08      00:28:20.100  READ VERIFY SECTOR(S) EXT
  42 d8 04 97 4b 4c e0 08      00:28:20.100  READ VERIFY SECTOR(S) EXT

Error 8 occurred at disk power-on lifetime: 6151 hours (256 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 9b 4b 4c e2  Error: UNC at LBA = 0x024c4b9b = 38554523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d8 04 9b 4b 4c e0 08      00:28:20.100  READ VERIFY SECTOR(S) EXT
  42 d8 04 97 4b 4c e0 08      00:28:20.100  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:20.000  READ DMA EXT
  42 d8 08 97 4b 4c e0 08      00:28:15.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:15.700  READ DMA EXT

Error 7 occurred at disk power-on lifetime: 6151 hours (256 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 9b 4b 4c e2  Error: UNC at LBA = 0x024c4b9b = 38554523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d8 08 97 4b 4c e0 08      00:28:15.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:15.700  READ DMA EXT
  42 d8 08 8f 4b 4c e0 08      00:28:15.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:15.600  READ DMA EXT
  42 d8 10 8f 4b 4c e0 08      00:28:11.200  READ VERIFY SECTOR(S) EXT

Error 6 occurred at disk power-on lifetime: 6151 hours (256 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 9b 4b 4c e2  Error: UNC at LBA = 0x024c4b9b = 38554523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d8 10 8f 4b 4c e0 08      00:28:11.200  READ VERIFY SECTOR(S) EXT
  42 d8 10 7f 4b 4c e0 08      00:28:11.200  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:11.200  READ DMA EXT
  42 d8 20 7f 4b 4c e0 08      00:28:06.700  READ VERIFY SECTOR(S) EXT
  25 d8 01 00 00 00 e0 08      00:28:06.700  READ DMA EXT

For my "main" Linux disk. i.e. Boot Swap and Root:
Code:
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63      417689      208813+  83  Linux
/dev/sdb2          417690     4401809     1992060   82  Linux swap / Solaris
/dev/sdb3         4401810   312576704   154087447+  83  Linux

smartctl reports:
Code:
/usr/sbin/smartctl --log=error /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 166 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 166 occurred at disk power-on lifetime: 20434 hours (851 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ed ed ab ea  Error: UNC at LBA = 0x0aabeded = 179039725

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 ea ed ab ea 00      03:00:24.681  READ DMA
  27 00 00 00 00 00 e0 00      03:00:24.681  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      03:00:24.623  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      03:00:24.622  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      03:00:21.710  READ NATIVE MAX ADDRESS EXT

Error 165 occurred at disk power-on lifetime: 20434 hours (851 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ed ed ab ea  Error: UNC at LBA = 0x0aabeded = 179039725

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 ea ed ab ea 00      03:00:18.565  READ DMA
  27 00 00 00 00 00 e0 00      03:00:15.546  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      03:00:15.546  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      03:00:15.546  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      03:00:21.710  READ NATIVE MAX ADDRESS EXT

Error 164 occurred at disk power-on lifetime: 20434 hours (851 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ed ed ab ea  Error: UNC at LBA = 0x0aabeded = 179039725

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 ea ed ab ea 00      03:00:18.565  READ DMA
  27 00 00 00 00 00 e0 00      03:00:15.546  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      03:00:15.546  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      03:00:15.546  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      03:00:15.546  READ NATIVE MAX ADDRESS EXT

Error 163 occurred at disk power-on lifetime: 20434 hours (851 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ed ed ab ea  Error: UNC at LBA = 0x0aabeded = 179039725

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 ea ed ab ea 00      03:00:15.545  READ DMA
  ca 00 10 ba 30 9d ea 00      03:00:15.546  WRITE DMA
  ca 00 08 82 4b 9e ea 00      03:00:15.546  WRITE DMA
  ca 00 08 d2 4b 9e ea 00      03:00:15.546  WRITE DMA
  ca 00 08 7a 44 9b ea 00      03:00:15.546  WRITE DMA

Error 162 occurred at disk power-on lifetime: 17827 hours (742 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 bd 82 31 ed  Error: UNC at LBA = 0x0d3182bd = 221348541

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 ba 82 31 ed 00      02:17:54.884  READ DMA
  27 00 00 00 00 00 e0 00      02:17:54.828  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      02:17:54.825  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 02      02:17:51.922  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      02:17:51.854  READ NATIVE MAX ADDRESS EXT

and finally, the disk which is a combination of ntfs and ext3, which is my Linux /home partition (sdc2):
Code:
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63   244187999   122093968+   7  HPFS/NTFS/exFAT
/dev/sdc2       244188000   349044254    52428127+  83  Linux
/dev/sdc3       349044255   418718159    34836952+   7  HPFS/NTFS/exFAT
/dev/sdc4       418718160   488392064    34836952+   7  HPFS/NTFS/exFAT

smartctl reports:
Code:
/usr/sbin/smartctl --log=error /dev/sdc
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged


So what is with all of the Errors which
Code:
occurred at disk power-on lifetime

?
_________________
Whatever you do, do it properly!

Jaglover
Advocate
Joined: 29 May 2005
Posts: 3979
Location: Saint Amant, Acadiana

Posted: Thu Apr 12, 2012 12:49 am

Alright, have you run a self-test on this drive? Did it finish? If the drive is bad, the test usually will not complete.
Below is a sample of a healthy drive.

Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4742         -
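
To compare a drive against that healthy sample without reading the log by eye, the entries can be tallied. This is only a sketch: the function name selftest_failures is invented here, and it assumes a failed entry has the word "failure" in its Status column, as in "Completed: read failure".

```shell
# Count logged self-tests and how many of them reported a failure,
# reading `smartctl -l selftest` output from stdin.
selftest_failures() {
    awk '/^# *[0-9]+ / {
             total++
             if ($0 ~ /failure/) bad++
         }
         END { printf "%d of %d logged self-tests reported a failure\n", bad, total }'
}

# Typical use, as root:
#   smartctl -l selftest /dev/sdb | selftest_failures
```

Fed the healthy sample above, it reports 0 of 1.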

_________________
Please learn how to denote units correctly!

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Thu Apr 12, 2012 1:00 am

Well palimpsest reports I have 12 bad sectors on both sdb and sdc

and

Code:
pig ~ # smartctl --attributes --log=selftest --quietmode=errorsonly /dev/sda
pig ~ # smartctl --attributes --log=selftest --quietmode=errorsonly /dev/sdb
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     33311         221348541
# 2  Extended offline    Completed: read failure       90%     33311         221348541
# 3  Short offline       Completed: read failure       80%     33310         221348541
# 4  Short offline       Completed: read failure       80%     24093         221348541
# 5  Short offline       Completed: read failure       80%     23407         221348541
# 6  Short offline       Completed: read failure       80%     22024         221348541
# 7  Short offline       Completed: read failure       80%     22024         221348541
# 8  Short offline       Completed: read failure       80%     22024         221348541
# 9  Short offline       Completed: read failure       80%     20657         221348541
#10  Short offline       Completed: read failure       80%     19723         221348541
#11  Short offline       Completed: read failure       80%     19070         221348541
#12  Short offline       Completed: read failure       80%     18120         221348541

pig ~ # smartctl --attributes --log=selftest --quietmode=errorsonly /dev/sdc
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     39657         6163165
# 2  Short offline       Completed: read failure       90%     31057         6163165
# 3  Extended offline    Completed: read failure       90%     31057         6163165
# 4  Short offline       Completed: read failure       90%     30648         6163165
# 5  Short offline       Completed: read failure       90%     30648         6163165
# 6  Short offline       Completed: read failure       90%     28018         6163165
# 7  Short offline       Completed: read failure       90%     12615         250373360
# 8  Extended offline    Completed: read failure       90%     12615         250373360
# 9  Short offline       Completed: read failure       90%     12615         250373360
#10  Extended offline    Completed: read failure       90%     12612         250373360
#11  Short offline       Completed: read failure       90%     12612         250373360
#12  Short offline       Completed: read failure       90%     12612         250373360
#13  Short offline       Completed: read failure       90%     11543         250373360
#14  Extended offline    Completed: read failure       90%     10086         250373360
#15  Short offline       Completed: read failure       90%     10079         250373360
#16  Short offline       Completed: read failure       90%     10079         250373360
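
The LBA_of_first_error column can be matched against the partition tables posted earlier to see which partition each bad sector lives in. A small sketch, assuming the fdisk layout shown in this thread (device, optional "*" boot flag, then Start and End in sectors); the function name lba_to_partition is made up for illustration.

```shell
# Given an LBA as $1 and fdisk partition rows on stdin, print the
# partition containing that sector, or "unpartitioned" if none does.
lba_to_partition() {
    awk -v lba="$1" '
        /^\/dev\// {
            # Skip the optional "*" boot flag so Start and End line up.
            i = ($2 == "*") ? 3 : 2
            first = $i + 0
            last = $(i + 1) + 0
            if (lba + 0 >= first && lba + 0 <= last) { print $1; found = 1 }
        }
        END { if (!found) print "unpartitioned" }'
}

# Typical use, as root:
#   fdisk -l /dev/sdb | lba_to_partition 221348541
```

With the tables above, 221348541 on sdb lands in /dev/sdb3 (the Linux root), while on sdc 6163165 lands in /dev/sdc1 (ntfs) and 250373360 in /dev/sdc2 (the Linux partition), so the bad sectors are not confined to the ntfs partitions.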

_________________
Whatever you do, do it properly!

Hu
Watchman
Joined: 06 Mar 2007
Posts: 7616

Posted: Thu Apr 12, 2012 2:20 am

Thistled wrote:
So what is with all of the Errors which
Code:
occurred at disk power-on lifetime
?
The drive failed to complete a command that was sent to it by the OS. This is a bad sign. The "disk power-on lifetime" bit is so you can determine whether the error was reported yesterday or last year. The drive tells you how many power-on hours it has accumulated, so you can work out from that how recently an error occurred.
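
As a worked example of that arithmetic: the newest self-test entry for /dev/sdb was logged at 33311 power-on hours, and Error 166 on the same disk at 20434 hours. Taking 33311 as the current figure is an assumption; the Power_On_Hours attribute (ID 9) is the proper source. The function name is hypothetical.

```shell
# Powered-on time elapsed between a logged error and now.
# $1 = current power-on hours, $2 = power-on hours when the error occurred.
hours_since_error() {
    current_hours=$1
    error_hours=$2
    delta=$((current_hours - error_hours))
    echo "$delta hours (~$((delta / 24)) days) of powered-on time ago"
}

hours_since_error 33311 20434
# prints: 12877 hours (~536 days) of powered-on time ago
```

Note that this counts only time the drive was powered on, so the calendar age of the error is at least that long.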

Thistled
Guru
Joined: 06 Jan 2011
Posts: 433
Location: Scotland

Posted: Wed Apr 18, 2012 10:19 pm

But it has been like this since the day I bought the disk.
Palimpsest has always reported the current pending sector error.
Like I said in earlier posts, all my important stuff is backed up on my server, so I am kind of taking the same approach as BillWho.
Quote:
Robert S,

I've had similar errors on a disk for close to three years now. I have gentoo installed as test and break system so there's nothing important on it.

I saved the output of /usr/sbin/smartctl --log=error /dev/sdb and it still reports the exact same info today.

That disk could live another several years with no problems or it could crash and burn tomorrow.

If you have any critical data on it then for sure back it up - don't take any chances.

Good luck


I would not be surprised to discover this is because I am overclocking a 2.77 GHz CPU to 3.16 GHz, as I am fully aware that overclocking can put stress on gear.
_________________
Whatever you do, do it properly!

All times are GMT