Gentoo Forums
Reading SMART hard drives through USB
Gentoo Forums Forum Index :: Documentation, Tips & Tricks
dufeu
l33t


Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Mon May 02, 2011 9:12 pm    Post subject: Reading SMART hard drives through USB

This post is the result of my search for more information on the status of the SMART-capable hard drives across several of my systems. For a write-up on SMART hard drive technology, see the Wikipedia article.

As noted in the article, there isn't a standard for accessing SMART information from drives connected via USB-based controllers. This is because the ATA commands used to gather SMART information are not part of the USB standards for interacting with hard drives; USB mass-storage devices are addressed with SCSI commands instead. This was clearly a major oversight on the part of the relevant standards body - 'nuff said.

To determine which SMART-capable hard drives are available on a system, you would normally execute:
Code:
# smartctl --scan
and receive results similar to:
Code:
/dev/sda -d scsi [SCSI]
/dev/sdb -d scsi [SCSI]
/dev/sdc -d scsi [SCSI]
/dev/sdd -d scsi [SCSI]

Yet, on the very same system, executing:
Code:
# df -h
reveals:
Code:
Filesystem               Size  Used Avail Use% Mounted on
rootfs                   363G  114G  232G  33% /
/dev/root                363G  114G  232G  33% /
rc-svcdir                1.0M  132K  892K  13% /lib64/rc/init.d
udev                      10M  292K  9.8M   3% /dev
shm                      3.7G  292K  3.7G   1% /dev/shm
/dev/sdb1                917G  730G  188G  80% /home
/dev/sda4                559G  478G   80G  86% /pub00
/dev/sdc1                1.8T  1.5T  347G  82% /pub01
/dev/sdd1                1.8T  1.7T   95G  95% /pub02
/dev/sde1                917G  196G  722G  22% /pubu01
/dev/sdf1                1.8T  1.8T   35G  99% /pubu02
/dev/sdg1                1.4T  1.1T  299G  79% /pubu03
/dev/sda1                 31M  6.5M   23M  23% /boot

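This is quite a discrepancy: 'smartctl --scan' lists only /dev/sda through /dev/sdd, while df shows filesystems on /dev/sda through /dev/sdg.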
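A quick way to spot the drives that 'smartctl --scan' skipped is to compare its list with the partitions df reports. This is a minimal sketch; 'missing_from_scan' is a hypothetical helper (not part of smartmontools), and it assumes both outputs were saved to files and that disks are named /dev/sdX:

```shell
# missing_from_scan: hypothetical helper, not part of smartmontools.
# Prints whole-disk names (/dev/sdX) that own a partition listed by df
# but were not listed by `smartctl --scan`.
#   $1: file containing `smartctl --scan` output
#   $2: file containing `df` output
missing_from_scan() {
    awk '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # disks smartctl found
        $1 ~ /^\/dev\/sd[a-z]+[0-9]+$/ {              # df partition rows
            disk = $1
            sub(/[0-9]+$/, "", disk)                  # /dev/sde1 -> /dev/sde
            if (!(disk in known)) print disk
        }
    ' "$1" "$2" | sort -u
}
```

Usage: run 'smartctl --scan > /tmp/scan.txt' and 'df -h > /tmp/df.txt', then 'missing_from_scan /tmp/scan.txt /tmp/df.txt'. With the two outputs shown above it lists /dev/sde, /dev/sdf and /dev/sdg.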

Hard drives /dev/sde, /dev/sdf and /dev/sdg are attached through USB ports. They are, respectively, 1T Seagate, 2T Seagate and 1.5T WD external USB 2.0 hard drives. So how can we get SMART status information from these hard drives?

While there is no standard for doing so, several USB chip manufacturers support passing raw ATA commands through to the hard drives attached to them. To be completely clear, the bottleneck for SMART status information is not the USB chips/logic on your motherboard, but rather the USB bridge chip inside the external device at hand. Fortunately, a fair amount of effort has gone into determining which USB chips permit the pass-through of raw ATA commands and which devices these chips are present in. In addition, 'smartctl' has options that tell it to pass these commands through in raw form and retrieve the resulting status information. A list of the known capable devices resides at the smartmontools wiki.

Also, execute:
Code:
# man smartctl
to read the instructions for using smartctl.

The device-type modifier is passed to smartctl with the -d option (in the examples below it follows the name of the device) and can be one of these:
Code:
-d usbcypress
-d usbjmicron
-d sat
Cypress USB bridge chips use a vendor-specific format called ATACB (ATA Command Block) to pass raw ATA commands through USB.

I didn't see what JMicron calls their method.

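SAT stands for SCSI/ATA Translation, a T10 standard. It tells smartctl that the drive is an ATA device sitting behind a SCSI-to-ATA translation layer, so smartctl wraps its ATA commands in the SCSI ATA PASS-THROUGH command. Despite the name, it has nothing to do with PATA versus SATA.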
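Since the deciding factor is the bridge chip in the enclosure, one rough starting point is the USB vendor ID that 'lsusb' prints. The sketch below is a hypothetical heuristic, not smartctl's own detection logic. It assumes the enclosure reports the bridge maker's vendor ID (04b4 for Cypress, 152d for JMicron), which is not always true; many enclosures report the drive maker's ID instead. Treat the result as the first type to try and check the smartmontools wiki list for the real answer:

```shell
# suggest_dtype: hypothetical heuristic mapping a USB "vendor:product" ID
# (as printed by lsusb) to a smartctl -d type worth trying first.
suggest_dtype() {
    vendor=${1%%:*}                # "152d:2338" -> "152d"
    case $vendor in
        04b4) echo usbcypress ;;   # Cypress Semiconductor
        152d) echo usbjmicron ;;   # JMicron Technology
        *)    echo sat ;;          # otherwise try SCSI/ATA Translation
    esac
}
```

Usage: run 'lsusb', pick the line for the external enclosure and pass its ID, e.g. 'suggest_dtype 152d:2338' prints usbjmicron.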

The full command format might then look like:
Code:
# smartctl -a /dev/sde -d usbcypress
or perhaps
Code:
# smartctl -a /dev/sde -d sat

These are typical 'fail' results:
Code:
# smartctl -a /dev/sdg -d usbcypress
smartctl 5.40 2010-10-16 r3189 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
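smartctl's exit status is a bitmask whose bits are described in the RETURN VALUES section of 'man smartctl', so a failed probe like this one can be told apart from a drive that answered but reported problems. A minimal decoder sketch; the function name is hypothetical and the bit meanings are paraphrased from the man page:

```shell
# decode_smartctl_status: hypothetical helper. Prints one line per bit
# set in smartctl's exit status ($1), lowest bit first.
decode_smartctl_status() {
    status=$1
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed, no IDENTIFY DEVICE data, or low-power mode" \
        "a SMART or other ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attribute(s) at or below threshold" \
        "attribute(s) at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

Usage: run 'smartctl -a /dev/sdg -d usbcypress' and then 'decode_smartctl_status $?'. An identity failure like the one above would be expected to show up as bit 1.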

These are typical 'success' results:
Code:
# smartctl -a /dev/sdg -d sat
smartctl 5.40 2010-10-16 r3189 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green family
Device Model:     WDC WD15EADS-11R6B1
Serial Number:    WD-WCAVY2024420
Firmware Version: 80.00A80
User Capacity:    1,500,301,910,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon May  2 16:31:12 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (30000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   142   141   021    Pre-fail  Always       -       9858
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1564
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1516
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   197   197   000    Old_age   Always       -       9014
194 Temperature_Celsius     0x0022   094   092   000    Old_age   Always       -       58
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
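Output like this is easy to mine with awk: attribute rows start with the numeric ID, and RAW_VALUE is the tenth whitespace-separated field. The hypothetical 'smart_attr' helper below prints the raw value of one attribute, looked up by ID or by name. Only the first token of the raw value is kept, so a suffix such as '(Min/Max 26/32)' is dropped:

```shell
# smart_attr: hypothetical helper. Reads `smartctl -A` (or -a) output on
# stdin and prints the first token of RAW_VALUE (field 10) for the
# attribute whose ID ($1 field) or name ($2 field) matches the argument.
smart_attr() {
    awk -v key="$1" '
        $1 ~ /^[0-9]+$/ && NF >= 10 && ($1 == key || $2 == key) {
            print $10
            exit
        }
    '
}
```

Usage: 'smartctl -A -d sat /dev/sdg | smart_attr Temperature_Celsius'. For the output above this yields 58.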

In addition to the above device types, some types accept options of their own, appended after a comma. For '-d sat', the number selects the length of the SCSI ATA PASS-THROUGH command smartctl sends: 12 or 16 bytes, with 16 as the default. Some USB bridges only understand the 12-byte form, so '-d sat,12' is worth trying when plain '-d sat' fails. For '-d usbjmicron', ',x' enables 48-bit ATA commands, and a trailing port number (0 or 1) selects which drive to address when two drives share one bridge. Such commands might look like:
Code:
# smartctl -a /dev/sdg -d usbjmicron,x
# smartctl -a /dev/sdg -d sat,12
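When it is not obvious which type an enclosure needs, the types can simply be tried in turn, including the 'sat,12' variant that finally worked for the 2T Seagate. The sketch below is a hypothetical helper, not a smartctl feature. It runs 'smartctl -i' with each candidate type and prints the first one that exits with status 0. The candidate order is an assumption, and since unsupported commands are being sent to the bridge, it is best done on an idle drive. Set SMARTCTL to point at a different smartctl binary if needed:

```shell
# probe_dtype: hypothetical helper, not a smartctl feature.
# Tries each candidate -d type and prints the first one for which
# `smartctl -i` exits with status 0. The candidate order is an assumption.
probe_dtype() {
    dev=$1
    for t in sat sat,12 usbcypress usbjmicron; do
        if ${SMARTCTL:-smartctl} -i -d "$t" "$dev" >/dev/null 2>&1; then
            echo "$t"
            return 0
        fi
    done
    return 1      # no candidate type worked
}
```

Usage: 'probe_dtype /dev/sde'. A bridge that rejects the 16-byte pass-through but accepts the 12-byte form would be reported as sat,12.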

So... now you may be wondering why I was so interested in finding this information out. OK, I know you're not really interested, but I'll tell you anyway.

It's my opinion that the single most causative factor in electronics failure is excess heat. This is more true for hard disks than for any other piece of electronic equipment. While installing some 2T hard drives and subsequently rearranging older drives, I happened to notice that some of the drives were pretty darn hot to the touch. A quick command:
Code:
# smartctl -a /dev/sdb | grep Temper
displayed:
Code:
194 Temperature_Celsius     0x0032   046   253   000    Old_age   Always       -       56
56 Celsius {OUCH!!!} That's more than 'pretty darned hot'!

This particular case is configured such that the hard drives are spaced with 1/2" gaps minimum. I usually consider this sufficient for adequate cooling. It turned out that all the non-Samsung drives in that case were running at 55-56C while the Samsung drives were running at 47-48C. This is much too hot, so it was back to the stripped-parts bin looking for a suitable fan to stick in front of the hard drive bay. These are the final results:
Code:
pyrotekk ~ # smartctl -a /dev/sda -d ata | grep Temper
190 Airflow_Temperature_Cel 0x0022   054   032   045    Old_age   Always   In_the_past 46
194 Temperature_Celsius     0x0022   104   082   000    Old_age   Always       -       46
pyrotekk ~ # smartctl -a /dev/sdb -d ata | grep Temper
194 Temperature_Celsius     0x0032   046   253   000    Old_age   Always       -       40
pyrotekk ~ # smartctl -a /dev/sdc -d ata | grep Temper
194 Temperature_Celsius     0x0032   046   253   000    Old_age   Always       -       40
pyrotekk ~ # smartctl -a /dev/sdd -d ata | grep Temper
190 Airflow_Temperature_Cel 0x0022   067   050   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   139   088   000    Old_age   Always       -       33
pyrotekk ~ # smartctl -a /dev/sdf -d ata | grep Temper
190 Airflow_Temperature_Cel 0x0022   067   048   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   139   082   000    Old_age   Always       -       33
pyrotekk ~ # smartctl -a /dev/sdg -d ata | grep Temper
190 Airflow_Temperature_Cel 0x0022   069   057   000    Old_age   Always       -       31 (Min/Max 26/31)
194 Temperature_Celsius     0x0022   068   056   000    Old_age   Always       -       32 (Min/Max 26/32)

Things are much improved! Note that /dev/sda is in a drive bay separate from the other drives, hence the higher temperature.
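Checking drives one at a time gets tedious on a box with this many disks. A minimal loop sketch, assuming attribute 194 (Temperature_Celsius) carries the temperature in the first token of its raw value, as it does for every drive shown here. 'check_temps' and its 'device:type' argument syntax are made up for this example, and the 46C default follows the author's later remark that he tries to avoid running any drive over 46C:

```shell
# check_temps: hypothetical helper. Each argument is "device" or
# "device:type" (e.g. /dev/sdb or /dev/sde:sat,12); the type is passed to
# smartctl's -d option. Compares attribute 194's raw value with TEMP_LIMIT.
check_temps() {
    limit=${TEMP_LIMIT:-46}
    for spec in "$@"; do
        dev=${spec%%:*}
        dtype=
        case $spec in
            *:*) dtype=${spec#*:} ;;
        esac
        if [ -n "$dtype" ]; then
            attrs=$(${SMARTCTL:-smartctl} -A -d "$dtype" "$dev" 2>/dev/null)
        else
            attrs=$(${SMARTCTL:-smartctl} -A "$dev" 2>/dev/null)
        fi
        # First token of attribute 194's RAW_VALUE (field 10).
        temp=$(printf '%s\n' "$attrs" | awk '$1 == 194 && NF >= 10 { print $10; exit }')
        if [ -z "$temp" ]; then
            echo "$dev: no temperature attribute found"
        elif [ "$temp" -gt "$limit" ]; then
            echo "$dev: ${temp}C exceeds ${limit}C"
        else
            echo "$dev: ${temp}C ok"
        fi
    done
}
```

Usage: 'check_temps /dev/sda /dev/sdb /dev/sdf:sat /dev/sde:sat,12' covers the pyrodyno drives below with the device types that worked there; set TEMP_LIMIT=40 for a stricter limit.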

BTW - the examples displayed here to show temperature readings are from a different system than the original examples used to display USB query results. The similar temperature results from that system are:
Code:
pyrodyno pubroot # smartctl -a /dev/sda | grep Tempera
190 Airflow_Temperature_Cel 0x0022   065   055   045    Old_age   Always       -       35 (Min/Max 34/38)
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (0 23 0 0)
pyrodyno pubroot # smartctl -a /dev/sdb | grep Tempera
190 Airflow_Temperature_Cel 0x0022   072   069   000    Old_age   Always       -       28 (Min/Max 27/29)
194 Temperature_Celsius     0x0022   072   067   000    Old_age   Always       -       28 (Min/Max 26/30)
pyrodyno pubroot # smartctl -a /dev/sdc | grep Tempera
194 Temperature_Celsius     0x0002   064   062   000    Old_age   Always       -       34 (Min/Max 28/38)
pyrodyno pubroot # smartctl -a /dev/sdd | grep Tempera
194 Temperature_Celsius     0x0002   064   063   000    Old_age   Always       -       32 (Min/Max 27/37)
pyrodyno pubroot # smartctl -a /dev/sdf -d sat | grep Tempera
190 Airflow_Temperature_Cel 0x0022   057   042   045    Old_age   Always   In_the_past 43 (0 122 50 37)
194 Temperature_Celsius     0x0022   043   058   000    Old_age   Always       -       43 (0 19 0 0)
pyrodyno pubroot # smartctl -a /dev/sdg -d sat | grep Tempera
194 Temperature_Celsius     0x0022   100   092   000    Old_age   Always       -       52

Note how much warmer the external hard drives /dev/sdf and /dev/sdg are. As I noted earlier, /dev/sdf is a Seagate and /dev/sdg is a WD. Both drives stand on their own, close to but not next to each other, with plenty of available airflow. The 2T Seagate external drive doesn't appear to be accessible for SMART status information. I suspect either the Cypress USB chip has issues supporting such large drives or Seagate used a different USB chip. You can see on the wiki page of known devices that the 2T Seagate external drive is an open question.

{edit: same day 2 hours later} - The 2T Seagate ended up readable by adding the ',12' modifier:
Code:
pyrodyno pubroot # smartctl -a /dev/sde -d sat,12 | grep Tempera
190 Airflow_Temperature_Cel 0x0022   055   042   045    Old_age   Always   In_the_past 45 (0 122 50 37)
194 Temperature_Celsius     0x0022   045   058   000    Old_age   Always       -       45 (0 19 0 0)

It's running at 45C: acceptable, though I'd prefer under 40C. The WD external drive was unchanged at 52C, which is still quite disappointing since the drives were purchased within days of each other and are directly comparable in terms of technology generation.
{end edit}

{edit: same day 3 hours later} - Points of information:
  1. The SMART technology wikipedia article referenced in the first paragraph includes a list of SMART attributes and what they {most likely} mean. {different manufacturers may not ascribe identical meanings to the same numbered attributes}
  2. Some of these SMART attributes are highlighted to indicate those attributes which are more relevant in terms of predicting imminent failure.
  3. Temperature is a good attribute to monitor when assembling a system or adding a new drive. Air flow is important and you don't want a drive inadvertently stuck in a local hot spot.
{end edit}
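Beyond temperature, the failure-predictive attributes are worth a periodic glance. A minimal sketch that flags any non-zero raw value on a watch list. The default list (5, 10, 196, 197, 198) is an assumption, picked from attributes that appear in the WD output above; compare it with the highlighted rows in the article and override it with WATCH_IDS as needed:

```shell
# flag_attrs: hypothetical helper. Reads `smartctl -A` output on stdin and
# prints "ID NAME RAW" for each watched attribute whose raw value is > 0.
flag_attrs() {
    awk -v ids="${WATCH_IDS:-5 10 196 197 198}" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) watch[a[i]] = 1 }
        $1 ~ /^[0-9]+$/ && NF >= 10 && ($1 in watch) && $10 + 0 > 0 {
            print $1, $2, $10
        }
    '
}
```

Usage: 'smartctl -A -d sat /dev/sdg | flag_attrs' prints nothing for the WD drive above, since all five watched raw values are 0.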

I hope you find the information here helpful.
_________________
People whom think M$ is mediocre, don't know the half of it.


Last edited by dufeu on Mon May 02, 2011 10:36 pm; edited 3 times in total
BradN
Advocate


Joined: 19 Apr 2002
Posts: 2391
Location: Wisconsin (USA)

Posted: Mon May 02, 2011 9:29 pm

I remember reading an article about Google's hard drive management statistics (failure rates of drives compared to various pieces of data, including drive temperature).

Check it out here: http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/disk_failures.pdf

On page 6 there are a couple graphs about temperature vs failure rate. Their findings seem to show that temperatures *below* 30C increase failure rates in the first couple years of operation, and temperatures above 40C increase failure rates in years 3-4 of operation.

I would suspect an increase in low temperature failures would be due to mechanical failure (bearing/lubrication, etc not being as effective when colder), as controller chips are generally quite happy at low temperatures, but I could be wrong.

Also, note that since these are statistics across their entire range of drives (many brands/models), that doesn't automatically mean that a particular drive is more likely to fail at a given temperature (some might be perfectly capable of reliable operation at 55C, who knows), just that the aggregate of their drive pool exhibits those rates.
dufeu
l33t


Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Mon May 02, 2011 9:53 pm

BradN wrote:
On page 6 there are a couple graphs about temperature vs failure rate. Their findings seem to show that temperatures *below* 30C increase failure rates in the first couple years of operation, and temperatures above 40C increase failure rates in years 3-4 of operation.

Absolutely correct.

I usually don't consider the 'too cool' case because I never have anything running below 26C {regular room temps}.

FWIW - I've experienced a lower failure rate with Samsung drives due, I believe, to their cooler running temperatures. In all of my systems with mixed drives, the Samsungs typically run at 28-34C while everything else typically runs at 38-46C. I usually ensure at least 1/2" spacing gaps between drives, or forced airflow for drive bays with close spacing.

In this particular case {pyrotekk}, I only noticed during this incident that the area of the front panel in front of the drive bay is a solid sheet. There are no breather holes at all. I cut some slots and stuck a scrapped 120mm fan there. Whatever works. Dropping from 56C to 33C is a big plus!

FWIW, I'd try to avoid running any hard drive over 46C .. which means I'm not really happy with the WD external drive. :(
All times are GMT