Gentoo Forums
[SOLVED] Inaccurate SMART data on SSD
Gentoo Forums Forum Index » Other Things Gentoo
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Sun Jun 13, 2021 10:54 pm    Post subject: [SOLVED] Inaccurate SMART data on SSD

Hi,

This is going to sound strange, but I have an SSD that I've been using since 2012, and the power-on hours in its SMART data don't make any sense:

Code:

  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       34421


That's just under 4 years, but I've had it for close to 10 now, so I'm wondering what could cause this data to be erased or lost, and how much I can trust the other values.

Any other ideas on how to check, or how to get more info?

The only thing I can think of is that I updated the firmware because of a bug they had with it. Could that have erased the data?
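For reference, a raw Power_On_Hours value converts to years of continuous (24/7) operation with simple arithmetic. A minimal Python sketch, assuming the raw value is a plain hour count (some firmware packs other data into the raw field, so that is not guaranteed):

```python
# Average year length in hours, counting leap days (365.25 days per year).
HOURS_PER_YEAR = 24 * 365.25


def poh_to_years(power_on_hours: float) -> float:
    """Convert a SMART Power_On_Hours raw value to years of 24/7 operation."""
    return power_on_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # Raw value from the smartctl line above.
    print(f"{poh_to_years(34421):.2f} years")  # 3.93 years
```

If the machine is only on for part of each day, the same hour count covers proportionally more calendar time.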


Last edited by Troopo on Wed Jun 16, 2021 7:04 pm; edited 1 time in total
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sun Jun 13, 2021 11:43 pm

This data is apparently corruptible for some drives. One of my SSDs has nearly a million power on hours on it due to corruption. Likewise I've had it for about a decade. SSD still works just fine but it reached well past its "working lifetime" according to onboard SMART. My others are pretty much spot on to how many hours I think they should be.

Really just depends on the firmware and how it deals with corruption if anything at all.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Mon Jun 14, 2021 9:50 pm

eccerr0r wrote:
This data is apparently corruptible for some drives. One of my SSDs has nearly a million power on hours on it due to corruption. Likewise I've had it for about a decade. SSD still works just fine but it reached well past its "working lifetime" according to onboard SMART. My others are pretty much spot on to how many hours I think they should be.

Really just depends on the firmware and how it deals with corruption if anything at all.


Thanks. So what you're saying is basically: don't trust SMART, which in turn means I can't really know or estimate when the drive will die.
So I would wait until some bad signs start showing up and then assume it will happen soon, but having backups is a must at any stage.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Mon Jun 14, 2021 10:46 pm

Always back up, even if the drive still appears to be new...

Firmware bugs are also possible; now I think my particular drive suffers from one...
Aiken
Apprentice


Joined: 22 Jan 2003
Posts: 239
Location: Toowoomba/Australia

Posted: Mon Jun 14, 2021 11:19 pm

I have a WD Blue 250G SSD that has been running close enough to 24x7 since June 20, 2019. That should put its power-on hours over 17,000. As I type, its power-on hours are 2054; 2 hours ago they were 2172. Not a typo: its hours jumped backwards this morning.
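A minimal Python sketch of the expected-hours check, assuming the drive really was powered 24x7 and taking the date of this post (June 14, 2021) as the end point:

```python
from datetime import datetime


def expected_power_on_hours(start: datetime, end: datetime,
                            hours_per_day: float = 24.0) -> float:
    """Hours a drive should have accumulated between two dates.

    hours_per_day is how long the machine runs each day (24.0 = 24x7).
    """
    days = (end - start).total_seconds() / 86400
    return days * hours_per_day


if __name__ == "__main__":
    start = datetime(2019, 6, 20)  # date the drive went into service
    end = datetime(2021, 6, 14)    # date of this post
    print(round(expected_power_on_hours(start, end)))  # 17400
```

The 725 days in between (2020 was a leap year) give 17,400 hours, so "over 17,000" holds.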

I found I have 582 days of SMART data for that drive and have been looking at it this morning. 582 days is less than the 2 years the drive has been powered on. After running somewhere over 150 days full time, its power-on hours dropped to 20 before climbing back up and then dropping again. I have spent the last half hour costing a new drive, as I don't trust the SMART data in it.
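With SMART values logged over time like this, a counter that goes backwards is easy to flag automatically. A minimal Python sketch, assuming a hypothetical log already parsed into (timestamp, hours) pairs, oldest first; how the samples are collected (cron plus smartctl, a monitoring system, etc.) is left open:

```python
from typing import List, Tuple

# Each sample is (timestamp label, Power_On_Hours raw value), oldest first.
Sample = Tuple[str, int]


def find_backward_jumps(samples: List[Sample]) -> List[Tuple[str, int, int]]:
    """Return (label, previous_value, new_value) wherever the counter decreased.

    A healthy power-on-hours counter should never go down between samples.
    """
    jumps = []
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        if cur < prev:
            jumps.append((label, prev, cur))
    return jumps


if __name__ == "__main__":
    # Hypothetical hourly log; the 2172 -> 2054 drop mirrors the one described above.
    log = [("07:00", 2171), ("08:00", 2172), ("09:00", 2054), ("10:00", 2055)]
    for label, prev, cur in find_backward_jumps(log):
        print(f"{label}: {prev} -> {cur}")  # 09:00: 2172 -> 2054
```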
_________________
Beware the grue.
dmpogo
Advocate


Joined: 02 Sep 2004
Posts: 3267
Location: Canada

Posted: Tue Jun 15, 2021 3:57 am

Troopo wrote:
Thanks. So what you're saying is basically: don't trust SMART, which in turn means I can't really know or estimate when the drive will die.
So I would wait until some bad signs start showing up and then assume it will happen soon, but having backups is a must at any stage.


I'd say the only real indication of a drive dying is the appearance of bad blocks. The rest, like power-on time, is not really actionable. It may still last years, or it may die tomorrow.
Oops, sorry, you are talking about SSDs; bad blocks are more about old spinners. And, well, you know how long your drive was on anyway :)
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Tue Jun 15, 2021 2:49 pm

I've found that, in general, the POH data is fairly accurate for all SMART devices. It's just that I saw that corruption on the one SSD I have, and I had another HDD that computed POH oddly because it didn't want to divide by 10... (it counted 1 hour as 64 minutes elapsed instead of 60 minutes!)
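If a drive's counter really does tick once every 64 minutes, the reported figure can be scaled back to real hours. A minimal Python sketch, assuming the tick interval is exactly 64 minutes as described:

```python
def corrected_power_on_hours(reported_hours: float,
                             minutes_per_tick: float = 64.0) -> float:
    """Real hours elapsed when the firmware adds 1 to POH every minutes_per_tick minutes.

    A correct counter ticks every 60 minutes; one that ticks every 64 minutes
    under-reports by a factor of 60/64 (about 6%).
    """
    return reported_hours * minutes_per_tick / 60.0


if __name__ == "__main__":
    print(corrected_power_on_hours(15))  # 16.0
```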
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Wed Jun 16, 2021 5:00 pm

eccerr0r wrote:
Always back up, even if the drive still appears to be new...

Firmware bugs are also possible; now I think my particular drive suffers from one...


Yeah...

Aiken wrote:
I have a WD Blue 250G SSD that has been running close enough to 24x7 since June 20, 2019. That should put its power-on hours over 17,000. As I type, its power-on hours are 2054; 2 hours ago they were 2172. Not a typo: its hours jumped backwards this morning.

I found I have 582 days of SMART data for that drive and have been looking at it this morning. 582 days is less than the 2 years the drive has been powered on. After running somewhere over 150 days full time, its power-on hours dropped to 20 before climbing back up and then dropping again. I have spent the last half hour costing a new drive, as I don't trust the SMART data in it.


I've checked that, and the counter works as intended, so this behavior doesn't happen with mine, but thanks for sharing that.

dmpogo wrote:
I'd say the only real indication of a drive dying is the appearance of bad blocks. The rest, like power-on time, is not really actionable. It may still last years, or it may die tomorrow.
Oops, sorry, you are talking about SSDs; bad blocks are more about old spinners. And, well, you know how long your drive was on anyway :)


The SMART data is actually good for this drive, and I don't see any signs of problems when using it. However, given how old and used I know the drive is, can I really trust that SMART data? Probably not...

eccerr0r wrote:
I've found that, in general, the POH data is fairly accurate for all SMART devices. It's just that I saw that corruption on the one SSD I have, and I had another HDD that computed POH oddly because it didn't want to divide by 10... (it counted 1 hour as 64 minutes elapsed instead of 60 minutes!)


I didn't do all the possible calculations, but since I don't have the original SMART data, my estimate was that my drive is in the 7th year of a 3-year warranty life span. Then again, that rating is for maximum load, and I only run Gentoo off it, so I have no idea how much Gentoo actually writes on a daily basis to guess how long it has left to live. So far so good, though.
dmpogo
Advocate


Joined: 02 Sep 2004
Posts: 3267
Location: Canada

Posted: Wed Jun 16, 2021 5:47 pm

Troopo wrote:


The SMART data is actually good for this drive, and I don't see any signs of problems when using it. However, given how old and used I know the drive is, can I really trust that SMART data? Probably not...



My point is that you cannot make any decisions based on this data (this is what I meant by "not actionable").
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Wed Jun 16, 2021 7:04 pm

dmpogo wrote:
My point is that you cannot make any decisions based on this data (this is what I meant by "not actionable").


So it seems; it's worse than I initially thought.

I'm gonna mark this thread as resolved.

Thanks everyone
Goverp
Advocate


Joined: 07 Mar 2007
Posts: 2004

Posted: Thu Jun 17, 2021 8:28 am

As NeddySeagoon has often pointed out, you shouldn't try to interpret the raw data; it's often packed into fields shared with other values. IIUC, the smartmontools drive database is full of rules to help parse the data for particular bits of hardware, but it may get it wrong.

IIUC, the reliable and important data are the VALUE, WORST and THRESH numbers, where values of 100 are good (typical for new kit), and values at or below the threshold are bad.
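Checking the normalized columns rather than the raw value can be scripted. A minimal Python sketch, assuming the standard `smartctl -A` table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) as in the output posted later in this thread; the sample rows are copied from that output, and the failure rule (VALUE at or below a nonzero THRESH) is a simplified version of smartctl's convention:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attribute:
    id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value seen
    thresh: int  # vendor failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attributes(text: str) -> List[Attribute]:
    """Parse the attribute table printed by `smartctl -A`."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(Attribute(
            id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            # Some raw values contain spaces (e.g. "1015 201 813"), so rejoin the rest.
            raw=" ".join(fields[9:]),
        ))
    return attrs


def status(attr: Attribute) -> str:
    """Judge an attribute by its normalized numbers, not its raw value."""
    if attr.thresh == 0:
        return "no threshold"  # treated as informational here: nothing to cross
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "failed in the past"
    return "ok"


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
173 Wear_Leveling_Count     0x0033   087   087   010    Pre-fail  Always       -       390
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for a in parse_attributes(SAMPLE):
        print(f"{a.id:3d} {a.name:<24} {a.value:3d}/{a.thresh:3d}  {status(a)}")
```

In practice the text would come from `smartctl -A /dev/sdX` for the drive in question.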
_________________
Greybeard
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Thu Jun 17, 2021 8:43 am

Goverp wrote:
As NeddySeagoon has often pointed out, you shouldn't try to interpret the raw data; it's often packed into fields shared with other values. IIUC, the smartmontools drive database is full of rules to help parse the data for particular bits of hardware, but it may get it wrong.

IIUC, the reliable and important data are the VALUE, WORST and THRESH numbers, where values of 100 are good (typical for new kit), and values at or below the threshold are bad.


I agree, but the data is strange, so I don't know if I can trust it, you know?
Take a look (10-year-old SSD):
Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       34477
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       4608
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0033   087   087   010    Pre-fail  Always       -       390
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       812
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       1015 201 813
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       48
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Perc_Rated_Life_Used    0x0018   087   087   001    Old_age   Offline      -       13
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0

eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Thu Jun 17, 2021 12:14 pm

That's nearly 4 years of POH, but what makes you think it should be higher anyway? If you turned the machine off at night, this would be reasonable. If it was on 24/7, then it'd be another story perhaps, sort of like my HDD that had that 64/60 issue, though indeed quite a bit farther off than this...

How did you treat this apparently MLC disk: did you use it like an HDD, or did you try to save write cycles on it? It's showing signs of wear, but it's still got a lot of life in it.
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Thu Jun 17, 2021 1:00 pm

eccerr0r wrote:
That's nearly 4 years of POH, but what makes you think it should be higher anyway? If you turned the machine off at night, this would be reasonable. If it was on 24/7, then it'd be another story perhaps, sort of like my HDD that had that 64/60 issue, though indeed quite a bit farther off than this...

How did you treat this apparently MLC disk: did you use it like an HDD, or did you try to save write cycles on it? It's showing signs of wear, but it's still got a lot of life in it.


Very good point. It runs about 12h a day, and I got it back in 2012; that's 9 years, but let's even go with 8:
8 * 365 * 12 = 35040
It is pretty close.
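The same back-of-the-envelope check can be run in reverse: divide the reported POH by the ownership period to get the average daily runtime it implies. A minimal Python sketch, assuming 365-day years to match the arithmetic above:

```python
def expected_poh(years: float, hours_per_day: float) -> float:
    """Power-on hours expected after `years` of use at `hours_per_day`."""
    return years * 365 * hours_per_day


def implied_hours_per_day(power_on_hours: float, years: float) -> float:
    """Average daily runtime implied by a POH reading over `years` of ownership."""
    return power_on_hours / (years * 365)


if __name__ == "__main__":
    print(expected_poh(8, 12))  # 35040
    for years in (8, 9):
        # 34477 is the Power_On_Hours raw value posted above.
        print(years, round(implied_hours_per_day(34477, years), 1))
```

With the 34477-hour reading, 8 years works out to roughly 11.8 hours a day and 9 years to roughly 10.5, both in line with the ~12 hours a day described.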

It ran Win7 for maybe 3-4 years, but since it's 64GB I had to split user\appdata off onto an HDD, so I'm guessing most of the writes went there.
After that it has run Gentoo ever since, with TRIM, so that's mostly logs and configs.

The problem is it's old and doesn't have TLW, so I had to calculate that from the raw data and estimate the numbers. I'm still not sure how much life it has left, but it isn't showing many signs of wear.
This is why I asked.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Thu Jun 17, 2021 3:20 pm

Ah... so yes, it makes perfect sense since you shut it off at night. My weird corrupted SSD has over 900K hours of POH according to SMART, which, however, does not make sense. If it were true, it would have been in service before the invention of ENIAC... imagine what ENIAC could do with hundreds of GB of storage...
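The ENIAC remark holds up with a quick calculation. A minimal Python sketch, assuming the 900K hours had accumulated continuously (24/7) up to 2021:

```python
HOURS_PER_YEAR = 24 * 365.25  # average year, counting leap days


def implied_start_year(power_on_hours: float, report_year: float) -> float:
    """Year the drive would have been switched on if POH were real and continuous."""
    return report_year - power_on_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    start = implied_start_year(900_000, 2021)
    print(f"{start:.0f}")  # 1918
    print(start < 1945)    # True: earlier than ENIAC's completion in 1945
```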

Fortunately your SSD does report an average erase count. This is saying explicitly that you have 87% of its life left, so nothing to worry about. A 64GB SSD with today's OS loads will of course turn over quite a bit quicker than larger SSDs. I have another SSD in my Atom netbook which is only 32GB. It runs Gentoo, and for this SSD I have no clue how often it's been erased; it still appears to report 100 for its SMART fields without telling me which one is tracking wear.
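The two wear attributes in the table above can be combined into a rough endurance estimate. A minimal Python sketch, assuming that Wear_Leveling_Count's raw value (390) is the average erase count and that Perc_Rated_Life_Used (13) grows linearly with it; neither assumption is confirmed in this thread, and vendors are free to compute these attributes differently:

```python
def estimate_rated_cycles(avg_erase_count: float, percent_life_used: float) -> float:
    """Rated erase cycles implied if life used scales linearly with average erase count."""
    if percent_life_used <= 0:
        raise ValueError("need a nonzero percentage of life used")
    return avg_erase_count / (percent_life_used / 100.0)


if __name__ == "__main__":
    # Raw values from the table above: Wear_Leveling_Count 390, Perc_Rated_Life_Used 13.
    rated = estimate_rated_cycles(390, 13)
    print(round(rated))        # 3000
    print(round(rated - 390))  # 2610 cycles left, under the same assumption
```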

Based on how I've used this 32GB SSD, I'd imagine each block has been erased at least 100 times on average by now; fortunately it is an MLC device as well, so it should likewise have a lot of life left.

On another note, "modern" larger SSDs in the high-hundreds-of-GB to terabyte range are invariably TLC or worse, and can only be erased hundreds of times at best. But that means you'd have to rewrite many terabytes before it's time to replace one. Your MLC drive could likewise take the same load before failing, because it can take more erase cycles than TLC flash.
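The "many terabytes" point is just capacity times erase cycles. A minimal Python sketch that ignores write amplification by default; the 3000- and 300-cycle ratings in the example are illustrative assumptions, not datasheet figures for any particular drive:

```python
def approx_total_writes_tb(capacity_gb: float, erase_cycles: float,
                           write_amplification: float = 1.0) -> float:
    """Rough host data (in TB) a drive can absorb before its flash is worn out.

    Each erase cycle lets the whole capacity be rewritten once; a write
    amplification above 1 means the controller erases more than the host writes.
    """
    return capacity_gb * erase_cycles / write_amplification / 1000.0


if __name__ == "__main__":
    print(approx_total_writes_tb(32, 100))    # 32GB netbook SSD after ~100 erases per block
    print(approx_total_writes_tb(64, 3000))   # 64GB MLC, assumed 3000-cycle rating
    print(approx_total_writes_tb(1000, 300))  # 1TB TLC, assumed 300-cycle rating
```

Under those assumptions a 64GB MLC drive and a 1TB TLC drive land in the same order of magnitude (roughly 190 vs 300 TB), which is the comparison made above.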
Troopo
Guru


Joined: 14 Jun 2015
Posts: 310

Posted: Thu Jun 17, 2021 10:51 pm

eccerr0r wrote:
Ah... so yes, it makes perfect sense since you shut it off at night. My weird corrupted SSD has over 900K hours of POH according to SMART, which, however, does not make sense. If it were true, it would have been in service before the invention of ENIAC... imagine what ENIAC could do with hundreds of GB of storage...

Fortunately your SSD does report an average erase count. This is saying explicitly that you have 87% of its life left, so nothing to worry about. A 64GB SSD with today's OS loads will of course turn over quite a bit quicker than larger SSDs. I have another SSD in my Atom netbook which is only 32GB. It runs Gentoo, and for this SSD I have no clue how often it's been erased; it still appears to report 100 for its SMART fields without telling me which one is tracking wear.

Based on how I've used this 32GB SSD, I'd imagine each block has been erased at least 100 times on average by now; fortunately it is an MLC device as well, so it should likewise have a lot of life left.

On another note, "modern" larger SSDs in the high-hundreds-of-GB to terabyte range are invariably TLC or worse, and can only be erased hundreds of times at best. But that means you'd have to rewrite many terabytes before it's time to replace one. Your MLC drive could likewise take the same load before failing, because it can take more erase cycles than TLC flash.


Thanks for the reassurance. I had already suspected/calculated that much; only the night-time part wasn't taken into account until you mentioned it :)
All times are GMT