Gentoo Forums
Two SSDs going bad?

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Fri Feb 15, 2019 3:22 pm    Post subject: Two SSDs going bad?

So I encountered this from my logs:
disk log:
[disk_maintenance.sh - BALANCE] BTRFS: [/dev/sdd].corruption_errs  23
[disk_maintenance.sh - BALANCE] BTRFS: [/dev/sde].corruption_errs  20
[disk_maintenance.sh - BALANCE] BTRFS: [/dev/sde].generation_errs  22
That is generated via:
root shell:
# echo /dev/sd{a,b,c,d,e,f} | xargs -n 1 btrfs dev stats
[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     0
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  0
[/dev/sda].generation_errs  0
[/dev/sdb].write_io_errs    0
[/dev/sdb].read_io_errs     0
[/dev/sdb].flush_io_errs    0
[/dev/sdb].corruption_errs  0
[/dev/sdb].generation_errs  0
[/dev/sdc].write_io_errs    0
[/dev/sdc].read_io_errs     0
[/dev/sdc].flush_io_errs    0
[/dev/sdc].corruption_errs  0
[/dev/sdc].generation_errs  0
[/dev/sdd].write_io_errs    0
[/dev/sdd].read_io_errs     0
[/dev/sdd].flush_io_errs    0
[/dev/sdd].corruption_errs  23
[/dev/sdd].generation_errs  0
[/dev/sde].write_io_errs    0
[/dev/sde].read_io_errs     0
[/dev/sde].flush_io_errs    0
[/dev/sde].corruption_errs  20
[/dev/sde].generation_errs  22
[/dev/sdf].write_io_errs    0
[/dev/sdf].read_io_errs     0
[/dev/sdf].flush_io_errs    0
[/dev/sdf].corruption_errs  0
[/dev/sdf].generation_errs  0
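
With six devices and five counters each, the non-zero lines are easy to miss. A minimal sketch for filtering them out of output like the above; `nonzero_stats` is a made-up helper name (not part of btrfs-progs), and it assumes every line has the two-column `[device].counter value` shape shown here.

```shell
# Hypothetical helper: keep only the counters whose value is not zero.
# Expects "[/dev/sdX].counter  N" lines on stdin, as printed by `btrfs dev stats`.
nonzero_stats() {
    awk 'NF == 2 && $2 + 0 != 0'
}

# Usage (needs root and the real devices, so left as a comment):
# echo /dev/sd{a,b,c,d,e,f} | xargs -n 1 btrfs dev stats | nonzero_stats
```

On the output above this would leave just the three lines for /dev/sdd and /dev/sde.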


Then
root shell:
# echo /dev/sd{d,e} | xargs -n 1 smartctl -a
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.65-gentoo-wren] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A480G
Serial Number:    50026B725C04894F
LU WWN Device Id: 5 0026b7 25c04894f
Firmware Version: 605ABBF2
User Capacity:    480 103 981 056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 15 17:21:33 2019 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (    0) seconds.
Offline data collection
capabilities:           (0x7d) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Abort Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  48) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x0025)   SCT Status supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/149071042
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   078   078   000    Old_age   Always       -       19723h+46m+29.690s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1107
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       135
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   037   058   000    Old_age   Offline      -       37 (Min/Max 15/58)
194 Temperature_Celsius     0x0022   037   058   000    Old_age   Always       -       37 (Min/Max 15/58)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/149071042
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/149071042
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/149071042
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       77309411328
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       20038
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       23691
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       23691
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       31884
244 Unknown_Attribute       0x0000   099   099   010    Old_age   Offline      -       5242917

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     19704         -
# 2  Short offline       Completed without error       00%     19680         -
# 3  Short offline       Completed without error       00%     19656         -
# 4  Short offline       Completed without error       00%     19632         -
# 5  Short offline       Completed without error       00%     19608         -
# 6  Short offline       Completed without error       00%     19584         -
# 7  Short offline       Completed without error       00%     19561         -
# 8  Short offline       Completed without error       00%     19549         -
# 9  Short offline       Completed without error       00%     19525         -
#10  Short offline       Completed without error       00%     19501         -
#11  Short offline       Completed without error       00%     19477         -
#12  Short offline       Completed without error       00%     19453         -
#13  Short offline       Completed without error       00%     19429         -
#14  Short offline       Completed without error       00%     19405         -
#15  Short offline       Completed without error       00%     19381         -
#16  Short offline       Completed without error       00%     19369         -
#17  Short offline       Completed without error       00%     19345         -
#18  Short offline       Completed without error       00%     19321         -
#19  Short offline       Completed without error       00%     19297         -
#20  Short offline       Completed without error       00%     19273         -
#21  Short offline       Completed without error       00%     19249         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.65-gentoo-wren] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A480G
Serial Number:    50026B725C0487E4
LU WWN Device Id: 5 0026b7 25c0487e4
Firmware Version: 605ABBF2
User Capacity:    480 103 981 056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 15 17:21:33 2019 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (    0) seconds.
Offline data collection
capabilities:           (0x7d) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Abort Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  48) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x0025)   SCT Status supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/148989892
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   078   078   000    Old_age   Always       -       19723h+28m+32.230s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1107
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       135
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   037   059   000    Old_age   Offline      -       37 (Min/Max 15/59)
194 Temperature_Celsius     0x0022   037   059   000    Old_age   Always       -       37 (Min/Max 15/59)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/148989892
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/148989892
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/148989892
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       98784247808
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       20376
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       23656
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       23656
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       31532
244 Unknown_Attribute       0x0000   099   099   010    Old_age   Offline      -       5177382

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     19704         -
# 2  Short offline       Completed without error       00%     19680         -
# 3  Short offline       Completed without error       00%     19656         -
# 4  Short offline       Completed without error       00%     19632         -
# 5  Short offline       Completed without error       00%     19608         -
# 6  Short offline       Completed without error       00%     19584         -
# 7  Short offline       Completed without error       00%     19561         -
# 8  Short offline       Completed without error       00%     19549         -
# 9  Short offline       Completed without error       00%     19525         -
#10  Short offline       Completed without error       00%     19501         -
#11  Short offline       Completed without error       00%     19477         -
#12  Short offline       Completed without error       00%     19453         -
#13  Short offline       Completed without error       00%     19429         -
#14  Short offline       Completed without error       00%     19405         -
#15  Short offline       Completed without error       00%     19381         -
#16  Short offline       Completed without error       00%     19369         -
#17  Short offline       Completed without error       00%     19345         -
#18  Short offline       Completed without error       00%     19321         -
#19  Short offline       Completed without error       00%     19297         -
#20  Short offline       Completed without error       00%     19273         -
#21  Short offline       Completed without error       00%     19249         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Not sure, but the SMART data looks OK... Can anyone confirm?
Should I start considering swapping the two SSDs out for new ones? Sadly those are among the newer ones, but also the biggest, thus getting more I/O than the smaller ones in the pool. Maybe it was my mistake to buy identical disks at the same time. :(
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54099
Location: 56N 3W

Posted: Fri Feb 15, 2019 6:18 pm

Zucca,

The SMART output looks OK.

The short offline test is almost useless. Run the long test on both drives (it takes about 48 minutes) and post the SMART data again.

With a 480GB drive and 23691GiB of writes, that's an average of only about 50 erase cycles spread over the drive.
The drive will spread it like that too, swapping out erase blocks that would normally only ever be written once with other erase blocks, so wear is levelled over the entire drive, not just the unused block pool.

It's unlikely the drives are worn out.
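
The arithmetic behind that estimate, as a rough sketch using the figures from the first SMART dump (attribute 241 and the User Capacity line). It counts host writes only and ignores whatever write amplification or compression the controller applies, which is an assumption; `drive_writes` is just an illustrative name.

```shell
# Rough count of full-drive writes: lifetime host writes divided by capacity.
# $1 = lifetime writes in GiB, $2 = drive capacity in bytes
drive_writes() {
    awk -v gib="$1" -v bytes="$2" 'BEGIN { printf "%.1f\n", gib * 1024^3 / bytes }'
}

drive_writes 23691 480103981056   # prints 53.0
```

Treating both numbers as plain GB (23691 / 480) gives about 49 instead; either way it is on the order of 50 passes over the drive.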
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Fri Feb 15, 2019 7:35 pm

Thanks for the reply, Neddy.

The errors btrfs reported aren't very serious.
smartd runs long tests once a month.
I also parsed the logfiles using awk: the values btrfs reported first appeared on January 21st and have not changed since...

I'll run the tests and report back.
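
For anyone wanting to do the same log check, the awk step could look roughly like this. It is a sketch, not the actual script: it assumes each log line ends with a `[device].counter value` pair as in the disk log in the first post, with a timestamp somewhere earlier on the line. The name `first_seen` and the log path in the usage comment are hypothetical.

```shell
# Print the first log line on which each distinct non-zero counter value shows up.
# Assumes the last two fields of a line are "[device].counter" and its value.
first_seen() {
    awk 'NF >= 2 && $(NF-1) ~ /_errs$/ && $NF + 0 != 0 {
        key = $(NF-1) " " $NF                  # e.g. "[/dev/sdd].corruption_errs 23"
        if (!(key in seen)) { seen[key] = 1; print }
    }'
}

# Usage (hypothetical log location):
# first_seen < /var/log/disk/maintenance.log
```

If a counter had kept growing, each new value would produce another line, so a single line per counter means it has been stable since it first appeared.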

BitJam
Advocate
Joined: 12 Aug 2003
Posts: 2508
Location: Silver City, NM

Posted: Fri Feb 15, 2019 8:48 pm

It is likely there is a common source for those errors. IOW, the fault likely lies somewhere other than in the two SSDs. For example, there could be a problem with RAM.

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Sat Feb 16, 2019 10:02 am

tail of 'grep -vi temper /var/log/disk/smart/current':
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata1-host0-target0:0:0-0:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata3-host2-target2:0:0-2:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:19:10+0200 [smartd] Device: /dev/disk/by-path/ata6-host5-target5:0:0-5:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata1-host0-target0:0:0-0:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], self-test in progress, 10% remaining_                             
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata3-host2-target2:0:0-2:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], self-test in progress, 70% remaining_                             
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], self-test in progress, 70% remaining_                             
2019-02-15T21:49:11+0200 [smartd] Device: /dev/disk/by-path/ata6-host5-target5:0:0-5:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata1-host0-target0:0:0-0:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata3-host2-target2:0:0-2:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], self-test in progress, 50% remaining_                             
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], skip scheduled Long Self-Test; 50% remaining of current Self-Test._
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], self-test in progress, 50% remaining_                             
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], skip scheduled Long Self-Test; 50% remaining of current Self-Test._
2019-02-15T22:19:10+0200 [smartd] Device: /dev/disk/by-path/ata6-host5-target5:0:0-5:0:0:0 [SAT], starting scheduled Long Self-Test._                               
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata1-host0-target0:0:0-0:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], self-test in progress, 10% remaining_                             
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata3-host2-target2:0:0-2:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], self-test in progress, 20% remaining_                             
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], self-test in progress, 20% remaining_                             
2019-02-15T22:49:10+0200 [smartd] Device: /dev/disk/by-path/ata6-host5-target5:0:0-5:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T23:19:10+0200 [smartd] Device: /dev/disk/by-path/ata2-host1-target1:0:0-1:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T23:19:11+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], self-test in progress, 10% remaining_                             
2019-02-15T23:19:11+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], self-test in progress, 10% remaining_                             
2019-02-15T23:49:10+0200 [smartd] Device: /dev/disk/by-path/ata4-host3-target3:0:0-3:0:0:0 [SAT], previous self-test completed without error_                       
2019-02-15T23:49:10+0200 [smartd] Device: /dev/disk/by-path/ata5-host4-target4:0:0-4:0:0:0 [SAT], previous self-test completed without error_


And...
root shell:
# echo /dev/sd{d,e} | xargs -n 1 smartctl --attributes
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/149110891
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   078   078   000    Old_age   Always       -       19742h+22m+16.660s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1107
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       135
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   034   058   000    Old_age   Offline      -       34 (Min/Max 15/58)
194 Temperature_Celsius     0x0022   034   058   000    Old_age   Always       -       34 (Min/Max 15/58)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/149110891
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/149110891
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/149110891
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       77309411328
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       20048
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       23712
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       23712
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       31884
244 Unknown_Attribute       0x0000   099   099   010    Old_age   Offline      -       5242917

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.65-gentoo-wren] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/149029343
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   078   078   000    Old_age   Always       -       19742h+04m+14.570s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1107
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       135
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   034   059   000    Old_age   Offline      -       34 (Min/Max 15/59)
194 Temperature_Celsius     0x0022   034   059   000    Old_age   Always       -       34 (Min/Max 15/59)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/149029343
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/149029343
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/149029343
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       98784247808
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       20386
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       23677
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       23677
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       31532
244 Unknown_Attribute       0x0000   099   099   010    Old_age   Offline      -       5177382

I don't see anything alarming.

In addition to the SMART tests, my system runs btrfs scrub weekly.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54099
Location: 56N 3W

Posted: Sat Feb 16, 2019 11:18 am

Zucca,

That's all good. I tend to go with the theory BitJam posted above: something the stored data had in common on its way to disk suffered an incident at some point, and the damage was then written out to your BTRFS filesystem, but the underlying drives are fine.

Think RAM or CPU. The CPU will run ECC internally, but what about your RAM?
Cosmic rays still cause soft errors, but those are very rare now compared to early DRAMs.

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Sat Feb 16, 2019 2:33 pm

NeddySeagoon wrote:
Think RAM or CPU. The CPU will run ECC internally, but what about your RAM?
Cosmic rays still cause soft errors, but those are very rare now compared to early DRAMs.

That's what I was thinking too. Remember? I had something like that on my server.

I don't have ECC RAM.
At least the counters haven't changed. I'll keep my eye on them.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54099
Location: 56N 3W

Posted: Sat Feb 16, 2019 5:01 pm

Zucca,

I recall that thread. Do you keep your hardware somewhere with a high Radon count?

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Sat Feb 16, 2019 5:17 pm

Well...
See this.
I live near Lahti.

So every fourth sample taken around here came out at over 200Bq/m³.
But I have no clue whether that's a lot or a little.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54099
Location: 56N 3W

Posted: Sat Feb 16, 2019 7:36 pm

Zucca,

google wrote:
One Becquerel means one radioactive disintegration per second, and 4 pCi/L equals to 148 Bq/m3

Another site says that at 4 pCi/L, you should do something about it, so you may be above the action level for your country.

At 200Bq/m³, you get 200 times the decays that a region at 1Bq/m³ sees.
In terms of effects on DRAMs, I don't know if that's statistically significant.
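
To put the two units side by side: 1 pCi is 0.037 Bq and 1 m³ is 1000 L, so 1 pCi/L works out to 37 Bq/m³, which matches the quoted 4 pCi/L = 148 Bq/m³. A small sketch of the conversion; the function name is only illustrative.

```shell
# Convert a radon concentration from Bq/m³ to pCi/L (1 pCi/L = 37 Bq/m³).
bq_to_pci_per_l() {
    awk -v bq="$1" 'BEGIN { printf "%.1f\n", bq / 37 }'
}

bq_to_pci_per_l 200   # prints 5.4
```

So 200Bq/m³ is about 5.4 pCi/L, a little above the 4 pCi/L figure mentioned above.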

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Sat Feb 16, 2019 8:32 pm

135 power failures is a bit high… that's more than one per week. Any chance it's a dirty mains power supply to blame?
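
The per-week figure can be checked against the SMART dump in the first post: attribute 174 shows 135 unexpected power losses and attribute 9 shows 19723 power-on hours. A quick sketch of that calculation; note it is per week of power-on time, not calendar time, and `losses_per_week` is just an illustrative name.

```shell
# Unexpected power losses per week of power-on time (168 hours in a week).
# $1 = power loss count, $2 = power-on hours
losses_per_week() {
    awk -v n="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", n * 168 / hours }'
}

losses_per_week 135 19723   # prints 1.15
```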

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sat Feb 16, 2019 8:46 pm

If I understand Btrfs correctly, .corruption_errs and/or .generation_errs may result from Btrfs or kernel crashes. Look at StackExchange: What are btrfs generation_errs?.

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3311
Location: Rasi, Finland

Posted: Sun Feb 17, 2019 11:05 pm

Ant P. wrote:
135 power failures is a bit high… that's more than one per week. Any chance it's a dirty mains power supply to blame?
My PSUs are made by Super Flower. A pretty reputable brand, I think.

I guess I need to fix my online UPS then... :\

All times are GMT