smartctl 'R'eport and output for Seagate HDD [enlightened]

dufeu · Post by **dufeu** » Fri Jan 25, 2019 10:31 pm

For a long time now, I've been using smartctl to monitor my hard drives to identify failing drives. I do two things.

I configure smartd (the smartmontools daemon) so that it runs a short self test automatically once a day in the early morning, using this line in smartd.conf:

DEVICESCAN -a -I 194 -R 5! -W 2,45,50 -s S/../.././03
# Full list at end of file. This is interpreted as:
#       -a              monitor all default attributes
#       -I 194          ignore changes in the normalized value of att# 194 (temp celsius)
#       -R 5!           report any change in the raw value of att# 5 (reallocated sectors);
#                       '!' marks the change as 'critical'
#       -W 2,45,50      warn of temperature (celsius):
#                       2 - changes of greater than 2 degrees
#                       informational warning: above 45
#                       critical warning: above 50
#       -s S/../.././03
#                       perform short test every day between 03:00-04:00
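
The `-s` pattern is a regular expression that smartd compares against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week, hour). As a rough way to see which hours a pattern would select, the comparison can be imitated in the shell. This is only a sketch: the function names `schedule_matches` and `current_stamp` are made up for illustration, and it assumes the pattern has to match the whole string.

```shell
#!/bin/bash

# schedule_matches REGEX STAMP
#   Succeeds when STAMP (e.g. "S/01/25/5/03") matches REGEX as a whole.
#   Anchoring with ^( )$ is an assumption about how smartd applies the pattern.
schedule_matches() {
    local regex=$1 stamp=$2
    printf '%s\n' "$stamp" | grep -Eq "^(${regex})\$"
}

# current_stamp TYPE
#   Builds the stamp for the current hour; %u is the day of week, 1 = Monday.
current_stamp() {
    local type=${1:-S}
    printf '%s/%s\n' "$type" "$(date +%m/%d/%u/%H)"
}
```

For example, `schedule_matches 'S/../.././03' "$(current_stamp S)" && echo "short test hour"` prints the message only during the 03:00 hour. A pattern with an extra `../` field has five date fields instead of four and would not match any stamp under this assumption.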

I use a custom script to produce a report on every drive:

#!/bin/bash

# Script provides meaningful report regarding current status of all attached SHD and SSD
# devices. It is assumed SMART is installed and the 'smartd' daemon is running. It is
# also assumed /path/to/smartd.conf is configured appropriately.

#       clear results of previous run
#               this should probably live in /var/log and be saved/rotated
#               if so, then run date and time initialized in log name?

RPTNAME="smartdisk.rpt"
#       '>' truncates any earlier report, so no separate touch/rm is needed
date > "$RPTNAME"
echo " "

#       Select and process SHD and SSD drives. It is not meaningful to process other types
#       of block devices. Using 'lsblk' does this perfectly.
#               -dn     these options suppress lsblk header and partition details
#               TODO    Not yet sure what's needed here. Connect a USB card reader and see.
#                       (i.e. - check how other block devices are reported and then filter
#                       if needed)

for f in $(lsblk -dn | awk '{print $1}')

#       We want the report to show:
#               make, model and serial number info
#               select status info
#                       all errors
#                       temperature
#                       'Unknown' - so we can pick up the HE status for helium drives.
#                               Normal value should be '100'
#               daily self test results
#                       '## Short' - lists prior self test results. SMART enabled drives retain
#                               the last 21 entries. We will report only the last 5 entries.
#                       'result' - Health check overall result. This is meaningless without
#                               first running at least one short/long test.

do

        echo " " >> $RPTNAME
        echo "Processing disk /dev/$f ..." >> $RPTNAME
        smartctl -a "/dev/$f" | grep -E 'Model|Serial|Version|atabase|Unknown|ogged|eallocated|Spin_Retry|Read_Error|Celsius| [12345]  Short'  >> $RPTNAME
        smartctl -H "/dev/$f" | grep 'result'  >> $RPTNAME

done

#       close the report

echo " "  >> $RPTNAME
echo "report complete" >> $RPTNAME
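
The TODO in the script about other block devices could be approached by asking lsblk for the device type as well as the name and keeping only entries of type `disk`. The following is a sketch under that assumption; `filter_disks` and `list_disks` are hypothetical helper names, and an empty card-reader slot may still be reported as type `disk`, so this is at best a partial filter.

```shell
#!/bin/bash

# filter_disks
#   Reads "NAME TYPE" pairs on stdin and prints the names whose type is "disk",
#   dropping loop devices, optical drives (rom) and similar entries.
filter_disks() {
    awk '$2 == "disk" { print $1 }'
}

# list_disks
#   Hypothetical wrapper: asks lsblk for whole devices only (-d), without a
#   header (-n), restricted to the NAME and TYPE columns.
list_disks() {
    lsblk -dn -o NAME,TYPE | filter_disks
}
```

The loop header in the script would then read `for f in $(list_disks)`.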

The output per drive looks like this when the drive is good:

Processing disk /dev/sdb ...
Device Model:     MD1TBLSSHD
Serial Number:    MD302334551
Firmware Version: 03.01A02
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   104   094   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# 1  Short offline       Completed without error       00%      9142         -
# 2  Short offline       Completed without error       00%      9118         -
# 3  Short offline       Completed without error       00%      9094         -
# 4  Short offline       Completed without error       00%      9070         -
# 5  Short offline       Completed without error       00%      9046         -
SMART overall-health self-assessment test result: PASSED

or like this when a disk fails:

Processing disk /dev/sdaj ...
Device Model:     WL6000GSA6457
Serial Number:    WOL240332217
Firmware Version: A3.00F.0
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 3.0 Gb/s (current: 3.0 Gb/s)
  1 Raw_Read_Error_Rate     0x002f   200   191   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   194   194   140    Pre-fail  Always       -       195
 10 Spin_Retry_Count        0x0033   100   253   051    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   117   096   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
SMART Error Log Version: 1
No Errors Logged
# 1  Short offline       Completed: read failure       60%     23226         987648
# 2  Short offline       Completed: read failure       60%     23202         987648
# 3  Short offline       Completed: read failure       60%     23179         987648
# 4  Short offline       Completed: read failure       60%     23155         987648
# 5  Short offline       Completed: read failure       60%     23131         987648
SMART overall-health self-assessment test result: PASSED
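
Note that the failing drive above still shows an overall-health result of PASSED, even though every short test ended in a read failure and the raw reallocated sector count is 195. It can therefore help to check those lines directly. A minimal sketch, assuming the ATA-style `smartctl -a` layout shown above (raw value in the tenth column); `flag_ata_problems` is a made-up name:

```shell
#!/bin/bash

# flag_ata_problems
#   Reads ATA-style "smartctl -a" output on stdin and prints one line per
#   finding. Prints nothing when no problem is detected.
flag_ata_problems() {
    awk '
        # Attribute table row: the raw value is the 10th whitespace-separated field.
        $2 == "Reallocated_Sector_Ct" && $10 + 0 > 0 {
            printf "reallocated sectors: %d\n", $10
        }
        # Self-test log row: count every short test that did not finish cleanly
        # (this includes aborted or interrupted tests, not only read failures).
        $1 == "#" && $3 == "Short" && !/Completed without error/ {
            unclean++
        }
        END {
            if (unclean > 0) printf "unclean short self-tests: %d\n", unclean
        }
    '
}
```

Usage would be along the lines of `smartctl -a /dev/sdaj | flag_ata_problems`, with any output treated as a reason to look at the drive more closely.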

I haven't used Seagate drives for a long time (not since the fiasco with the 2TB drives that had a life span of less than one year). I recently installed four 6TB Seagate drives, and their smartctl output does NOT conform to the format I'm used to. I've tried the various smartctl output format options to extract specific drive attribute data, but have failed miserably. I know some of the same data is collected and available on the Seagate drives, but ...

Is it actually still possible to extract specific attribute data from Seagate drives? The default smartctl output for Seagate drives now looks like this:

# smartctl -a /dev/sdaz
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.16-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST6000NM0034
Revision:             E005
Compliance:           SPC-4
User Capacity:        6,001,175,126,016 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 2 protection
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500835ec477
Serial number:        Z4D1RXYQ0000R535WXLM
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Jan 25 17:40:09 2019 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     37 C
Drive Trip Temperature:        60 C

Manufactured in week 14 of year 2015
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  32
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1457
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2524347320
  Blocks received from initiator = 2261760576
  Blocks read from cache and sent to initiator = 2208709482
  Number of read and write commands whose size <= segment size = 2850162024
  Number of read and write commands whose size > segment size = 8848917

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 27745.60
  number of minutes until next internal SMART test = 36

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   500918108        0         0  500918108          0    1770151.992           0
write:         0        0         0         0          0    1805802.472           0
verify:       64        0         0        64          0          0.000           0

Non-medium error count:      338

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   27731                 - [-   -    -]
# 2  Background short  Completed                   -   27707                 - [-   -    -]
# 3  Background short  Completed                   -   27691                 - [-   -    -]

Long (extended) Self-test duration: 40836 seconds [680.6 minutes]

addendum - 2019-01-25

I can tune my script so that I get most of the equivalent information. The resulting report for a Seagate drive looks like this:

Processing disk /dev/sdbk ...
Vendor:               SEAGATE
Product:              ST6000NM0034
Revision:             E005
Serial number:        Z4D1P08Y0000R536MR9N
Transport protocol:   SAS (SPL-3)
Temperature Warning:  Enabled
Current Drive Temperature:     37 C
Drive Trip Temperature:        60 C
read:   963713629        0         0  963713629          0    1746531.708           0
write:         0        0         0         0          0    1801339.923           0
verify:    10336        0         0     10336          0          0.021           0
# 1  Background short  Completed                   -   27755                 - [-   -    -]
# 2  Background short  Completed                   -   27731                 - [-   -    -]
# 3  Background short  Completed                   -   27716                 - [-   -    -]

You can replace the equivalent 'smartctl -a' line in the script above with this:

        smartctl -a "/dev/$f" | grep -E 'Vendor:|Product|odel|Serial|Version|Revision|Transport|atabase|Unknown|ogged|eallocated|Spin_Retry|Read_Error|Celsius|Temperature|read:|write:|verify:| [12345]  Short| [12345]  Background'  >> $RPTNAME
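
Besides grepping whole lines, a few of the counters in the SAS-format output shown earlier (health status, "Elements in grown defect list", the uncorrected error totals and the non-medium error count) can be pulled out as key=value pairs, which are easier to compare between runs. This is a sketch that assumes the line layout shown in the output above; `scsi_summary` is a hypothetical name.

```shell
#!/bin/bash

# scsi_summary
#   Reads SCSI/SAS-style "smartctl -a" output on stdin and prints selected
#   values as key=value lines, in the order they appear in the input.
scsi_summary() {
    awk '
        # Return the text after the first colon, without leading blanks.
        function value(line) { sub(/^[^:]*:[ \t]*/, "", line); return line }

        { sub(/[ \t\r]+$/, "") }    # drop trailing whitespace on every line

        /^SMART Health Status:/           { print "health=" value($0) }
        /^Elements in grown defect list:/ { print "grown_defects=" value($0) }
        /^Non-medium error count:/        { print "non_medium_errors=" value($0) }

        # Error counter log rows: the last column is "Total uncorrected errors".
        /^(read|write|verify):/ {
            n = split($0, col, /[ \t]+/)
            name = col[1]
            sub(/:$/, "", name)
            print name "_uncorrected=" col[n]
        }
    '
}
```

For example, `smartctl -a "/dev/$f" | scsi_summary >> $RPTNAME` would append the summary lines to the report for each SAS drive.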

The unanswered question is: does the 'R'eport directive (-R) of 'smartd' work for these Seagate drives? This smartd directive is supposed to let you add reporting conditions for when a user-specified attribute changes. Typically, attribute 5 (Reallocated_Sector_Ct) is the attribute of greatest interest. If these Seagate drives don't report attributes anymore, what happens to the smartd functionality?

molletts · Post by **molletts** » Wed Jan 30, 2019 12:17 pm

The reason the smartctl output is in a different form is because it's a SCSI (SAS) disk rather than an ATA one - what you're seeing is the standard smartctl output for SCSI drives.

The equivalent of the "Reallocated Sector Count" attribute is "Elements in grown defect list".

I don't know how you get smartd to report on that, though - I haven't used it for many years.
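
One way to get a change report for the grown defect list without relying on smartd is to remember the last value seen and compare it on each run. The following is a minimal sketch of that idea, not a statement about what smartd itself can do for SAS drives; `report_if_changed`, `grown_defects` and the state-file location are hypothetical.

```shell
#!/bin/bash

# report_if_changed STATE_FILE LABEL VALUE
#   Stores VALUE in STATE_FILE and prints a message when it differs from the
#   value stored by the previous run. Prints nothing on the first run.
report_if_changed() {
    local state_file=$1 label=$2 value=$3 previous=""
    if [ -s "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    printf '%s\n' "$value" > "$state_file"
    if [ -n "$previous" ] && [ "$previous" != "$value" ]; then
        printf 'CRITICAL: %s changed from %s to %s\n' "$label" "$previous" "$value"
    fi
    return 0
}

# grown_defects
#   Reads SCSI/SAS-style "smartctl -a" output on stdin and prints the number
#   from the "Elements in grown defect list" line.
grown_defects() {
    awk -F': *' '/^Elements in grown defect list:/ { print $2 }'
}
```

A daily cron job could then run something like `report_if_changed /var/tmp/sdaz.defects "/dev/sdaz grown defects" "$(smartctl -a /dev/sdaz | grown_defects)"` and mail any output.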

dufeu · Post by **dufeu** » Wed Feb 20, 2019 4:24 pm

molletts wrote: The reason the smartctl output is in a different form is because it's a SCSI (SAS) disk rather than an ATA one - what you're seeing is the standard smartctl output for SCSI drives.

The equivalent of the "Reallocated Sector Count" attribute is "Elements in grown defect list".

I don't know how you get smartd to report on that, though - I haven't used it for many years.

Aaack! I completely missed this when I ordered the drives; I was only looking at the price (used, 6TB, $96 each). This is the first time I've ever purchased a SAS drive. Fortunately, I popped them into a chassis that can handle them, with the correct controller!

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)

I have a Chenbro NR40700 (48 drive top loader) and a Norco 24 drive expansion chassis.

Thanks for the clarification! Now I have a better idea where to look/search for more information.