Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Sun Jun 13, 2021 10:54 pm Post subject: [SOLVED] Inaccurate SMART data on SSD
|
Hi,
This is going to sound strange, but I have an SSD that I've been using since 2012, and the power-on hours in its SMART data don't make any sense:
Code:
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 34421
That's just under 4 years, but I've had the drive for close to 10 now, so I'm wondering what could cause this data to be erased or lost, and how much I can trust the other values.
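A quick way to sanity-check a Power_On_Hours value is to convert it into years of 24/7 running, and into the average daily on-time it implies over the ownership period. This is a minimal sketch that assumes the raw value is a plain hour count (that is firmware-specific) and ignores leap days; the 8 to 10 year ownership spread is only illustrative.

```python
# Sanity-check a Power_On_Hours raw value.
# Assumption: the raw value is a plain count of hours (firmware-dependent).
HOURS_PER_YEAR = 24 * 365  # leap days ignored


def poh_years(poh_hours: int) -> float:
    """Years of continuous (24/7) running that the counter represents."""
    return poh_hours / HOURS_PER_YEAR


def avg_hours_per_day(poh_hours: int, years_owned: float) -> float:
    """Average daily on-time needed to reach poh_hours over years_owned."""
    return poh_hours / (years_owned * 365)


poh = 34421  # raw value from the smartctl line above
print(f"{poh_years(poh):.2f} years if powered 24/7")
for years in (8, 9, 10):
    print(f"owned {years} years -> {avg_hours_per_day(poh, years):.1f} h/day")
```

Over roughly 9 years of ownership the counter works out to about 10.5 hours a day, so a total that looks low is not by itself proof of corruption.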
Any other ideas on how to check this or get more info?
The only thing I can think of is that I updated the firmware because of a bug it had. Could that have erased the data?
Last edited by Troopo on Wed Jun 16, 2021 7:04 pm; edited 1 time in total
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Sun Jun 13, 2021 11:43 pm
|
|
This data is apparently corruptible on some drives. One of my SSDs shows nearly a million power-on hours due to corruption. Like yours, I've had it for about a decade. The SSD still works just fine, but according to its onboard SMART it is well past its "working lifetime". My other drives are pretty much spot on with how many hours I think they should have.
It really just depends on the firmware and how it deals with corruption, if it does at all.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
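A corrupted counter like that can be caught with a simple upper bound: a drive cannot have more power-on hours than wall-clock hours have passed since it was acquired. This is a minimal sketch; the acquisition date and the 900,000-hour figure below are illustrative values, not readings from a real drive.

```python
from datetime import datetime


def max_possible_hours(acquired: datetime, now: datetime) -> float:
    """Wall-clock hours since acquisition: a hard ceiling on POH."""
    return (now - acquired).total_seconds() / 3600


def poh_is_plausible(poh_hours: int, acquired: datetime, now: datetime) -> bool:
    """True if the counter fits between zero and the elapsed wall-clock hours."""
    return 0 <= poh_hours <= max_possible_hours(acquired, now)


def implied_start_year(poh_hours: int, now_year: int) -> float:
    """Year the drive would have had to start running 24/7 to reach poh_hours."""
    return now_year - poh_hours / (24 * 365.25)


acquired = datetime(2011, 6, 1)  # hypothetical purchase date
now = datetime(2021, 6, 15)
print(poh_is_plausible(900_000, acquired, now))      # False
print(round(implied_start_year(900_000, now.year)))  # 1918
```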
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Mon Jun 14, 2021 9:50 pm
|
|
eccerr0r wrote: | This data is apparently corruptible on some drives. One of my SSDs shows nearly a million power-on hours due to corruption. Like yours, I've had it for about a decade. The SSD still works just fine, but according to its onboard SMART it is well past its "working lifetime". My other drives are pretty much spot on with how many hours I think they should have.
It really just depends on the firmware and how it deals with corruption, if it does at all. |
Thanks. So what you're saying is basically don't trust SMART, which in turn means I can't really know or estimate when the drive will die.
So I'd wait until some bad signs start showing up and then assume it will happen soon, but having backups is a must at every stage.
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Mon Jun 14, 2021 10:46 pm
|
|
Always back up, no matter whether it still appears to be new...
Firmware bugs are also possible; now I think my particular drive suffers from one...
|
|
Aiken Apprentice
Joined: 22 Jan 2003 Posts: 239 Location: Toowoomba/Australia
Posted: Mon Jun 14, 2021 11:19 pm
|
|
I have a WD Blue 250GB SSD that has been running close enough to 24x7 since June 20, 2019. That should put its power-on hours over 17,000. As I type, its power-on hours reads 2054. Two hours ago it was 2172. Not a typo: its hours jumped backwards this morning.
I found I have 582 days of SMART data for that drive and have been looking through it this morning (582 days is less than the 2 years the drive has been powered on). After running full time for somewhere over 150 days, its power-on hours had dropped to 20 before climbing back up and then dropping again. I have spent the last half hour pricing a new drive, as I don't trust the SMART data in it.
_________________
Beware the grue.
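With a history of SMART samples like this, a backwards jump can be detected mechanically: power-on hours should never decrease between chronologically ordered samples. The sketch assumes samples are stored as (timestamp, hours) pairs; the timestamps are illustrative, while the 2172 and 2054 readings and the June 20, 2019 start date come from the post.

```python
from datetime import datetime


def backward_jumps(samples):
    """Return consecutive sample pairs where the POH counter went down.

    samples: list of (timestamp, power_on_hours), oldest first.
    """
    return [
        (prev, cur)
        for prev, cur in zip(samples, samples[1:])
        if cur[1] < prev[1]
    ]


def expected_hours_24x7(since: datetime, now: datetime) -> float:
    """Hours a drive running nonstop since `since` should have logged."""
    return (now - since).total_seconds() / 3600


log = [
    (datetime(2021, 6, 14, 21, 0), 2172),  # illustrative timestamps
    (datetime(2021, 6, 14, 23, 0), 2054),
]
print(backward_jumps(log))
print(expected_hours_24x7(datetime(2019, 6, 20), datetime(2021, 6, 14)))  # 17400.0
```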
|
|
dmpogo Advocate
Joined: 02 Sep 2004 Posts: 3267 Location: Canada
Posted: Tue Jun 15, 2021 3:57 am
|
|
Troopo wrote: | Thanks. So what you're saying is basically don't trust SMART, which in turn means I can't really know or estimate when the drive will die.
So I'd wait until some bad signs start showing up and then assume it will happen soon, but having backups is a must at every stage. |
I'd say the only real indication of a drive dying is the appearance of bad blocks. The rest, like power-on time, is not really actionable. It may still last years, or it may die tomorrow.
Oops, sorry, you are talking about SSDs; bad blocks are more about old spinners. And, well, you know how long your drive was on anyway.
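In smartctl output, bad blocks show up as a handful of counters, and SSD firmware tracks them too (the attribute table posted later in this thread includes Grown_Failing_Block_Ct, for example). A minimal check, assuming the attribute IDs and names below (they are vendor-specific, so adjust them for the drive at hand), flags any of them with a non-zero raw value:

```python
# Attribute IDs that commonly count remapped or failing blocks.
# The ID -> meaning mapping varies by vendor; treat it as an assumption.
BAD_BLOCK_ATTRS = {
    5: "Reallocated_Sector_Ct",
    170: "Grown_Failing_Block_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def bad_block_signals(raw_by_id: dict) -> dict:
    """Return {attribute name: raw value} for every non-zero bad-block counter."""
    return {
        name: raw_by_id[attr_id]
        for attr_id, name in BAD_BLOCK_ATTRS.items()
        if raw_by_id.get(attr_id, 0) > 0
    }


healthy = {5: 0, 170: 0, 187: 0, 197: 0, 198: 0, 9: 34477}
print(bad_block_signals(healthy))  # {}
print(bad_block_signals({5: 12, 197: 1, 9: 100}))
```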
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Tue Jun 15, 2021 2:49 pm
|
|
In general I've found the POH data to be fairly accurate on all SMART devices. It's just that I saw that corruption on the one SSD, and I had another HDD that computed POH oddly because it didn't want to divide by 10... (It counted 1 hour as 64 minutes elapsed instead of 60 minutes!)
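The 64-versus-60-minute quirk is easy to correct for once it is known: if the counter advances once every 64 minutes, the real elapsed hours are the reported value times 64/60, so the drive under-reports by about 6.7%. A one-function sketch, with a made-up reading:

```python
def corrected_hours(reported_hours: float, minutes_per_tick: int = 64) -> float:
    """Real elapsed hours for a counter that ticks every `minutes_per_tick`
    minutes instead of every 60."""
    return reported_hours * minutes_per_tick / 60


print(corrected_hours(15000))  # 16000.0 (15000 is a hypothetical reading)
```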
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Wed Jun 16, 2021 5:00 pm
|
|
eccerr0r wrote: | Always back up, no matter whether it still appears to be new...
Firmware bugs are also possible; now I think my particular drive suffers from one... |
Yeah...
Aiken wrote: | I have a WD Blue 250GB SSD that has been running close enough to 24x7 since June 20, 2019. That should put its power-on hours over 17,000. As I type, its power-on hours reads 2054. Two hours ago it was 2172. Not a typo: its hours jumped backwards this morning.
I found I have 582 days of SMART data for that drive and have been looking through it this morning (582 days is less than the 2 years the drive has been powered on). After running full time for somewhere over 150 days, its power-on hours had dropped to 20 before climbing back up and then dropping again. I have spent the last half hour pricing a new drive, as I don't trust the SMART data in it. |
I've checked, and the counter on mine works as intended, so this doesn't happen with my drive, but thanks for sharing.
dmpogo wrote: | I'd say the only real indication of a drive dying is the appearance of bad blocks. The rest, like power-on time, is not really actionable. It may still last years, or it may die tomorrow.
Oops, sorry, you are talking about SSDs; bad blocks are more about old spinners. And, well, you know how long your drive was on anyway. |
The SMART data on this drive actually looks good, and I don't see any signs of problems when using it. However, since I know how old and well-used the drive is, can I really trust that SMART data? Probably not...
eccerr0r wrote: | In general I've found the POH data to be fairly accurate on all SMART devices. It's just that I saw that corruption on the one SSD, and I had another HDD that computed POH oddly because it didn't want to divide by 10... (It counted 1 hour as 64 minutes elapsed instead of 60 minutes!) |
I didn't do all the possible calculations, but since I don't actually have the original SMART data, my estimate was that the drive is in the 7th year of a 3-year warranty lifespan. Then again, that rating assumes maximum load, and I only run Gentoo off it, so I have no idea how much Gentoo actually writes per day to guess how long it has left. So far so good.
|
|
dmpogo Advocate
Joined: 02 Sep 2004 Posts: 3267 Location: Canada
Posted: Wed Jun 16, 2021 5:47 pm
|
|
Troopo wrote: | The SMART data on this drive actually looks good, and I don't see any signs of problems when using it. However, since I know how old and well-used the drive is, can I really trust that SMART data? Probably not... |
My point is that you cannot make any decisions based on this data (that's what I meant by "not actionable").
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Wed Jun 16, 2021 7:04 pm
|
|
dmpogo wrote: | My point is that you cannot make any decisions based on this data (that's what I meant by "not actionable"). |
So it seems it's worse than I initially thought.
I'm going to mark this thread as resolved.
Thanks, everyone!
|
|
Goverp Advocate
Joined: 07 Mar 2007 Posts: 2004
Posted: Thu Jun 17, 2021 8:28 am
|
|
As Neddy Seagoon has often pointed out, you shouldn't try to interpret the raw data; it's often packed into fields shared with other values. IIUC, the smartmontools drive database is full of rules to help parse the data for particular bits of hardware, but it may get them wrong.
IIUC, the reliable and important data are the VALUE, WORST and THRESH numbers, where values of 100 are good (typical for new kit) and values below the threshold are bad.
_________________
Greybeard
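The VALUE/WORST/THRESH rule can be written down directly. As far as I understand smartmontools' convention, a THRESH of 0 never fails, VALUE at or below a non-zero THRESH means failing now, and WORST at or below THRESH means it failed at some point in the past. The sketch below follows that convention but is a simplification, not smartctl's actual code:

```python
def attribute_state(value: int, worst: int, thresh: int) -> str:
    """Classify a normalized SMART attribute (simplified smartctl-style rule)."""
    if thresh == 0:
        return "ok"              # a threshold of 0 is treated as "never fails"
    if value <= thresh:
        return "failing_now"     # current normalized value has reached the threshold
    if worst <= thresh:
        return "failed_in_past"  # it reached the threshold at some earlier point
    return "ok"


print(attribute_state(87, 87, 10))  # ok
print(attribute_state(8, 8, 10))    # failing_now
print(attribute_state(90, 9, 10))   # failed_in_past
```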
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Thu Jun 17, 2021 8:43 am
|
|
Goverp wrote: | As Neddy Seagoon has often pointed out, you shouldn't try to interpret the raw data; it's often packed into fields shared with other values. IIUC, the smartmontools drive database is full of rules to help parse the data for particular bits of hardware, but it may get them wrong.
IIUC, the reliable and important data are the VALUE, WORST and THRESH numbers, where values of 100 are good (typical for new kit) and values below the threshold are bad. |
I agree, but the data is strange, so I don't know if I can trust it, you know?
Take a look (10-year-old SSD):
Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 34477
12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 4608
170 Grown_Failing_Block_Ct 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 001 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 001 Old_age Always - 0
173 Wear_Leveling_Count 0x0033 087 087 010 Pre-fail Always - 390
174 Unexpect_Power_Loss_Ct 0x0032 100 100 001 Old_age Always - 812
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 1015 201 813
183 SATA_Iface_Downshift 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 001 Old_age Always - 48
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 0
202 Perc_Rated_Life_Used 0x0018 087 087 001 Old_age Offline - 13
206 Write_Error_Rate 0x000e 100 100 001 Old_age Always - 0
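To check a table like this programmatically, the rows can be parsed into fields. The raw column of attribute 181 holds three numbers, so a parser has to treat everything after WHEN_FAILED as the raw value. A sketch using a few rows copied from the output above; the parsing approach is an assumption, and it simply skips any line that does not start with a numeric attribute ID:

```python
ROWS = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 34477
173 Wear_Leveling_Count 0x0033 087 087 010 Pre-fail Always - 390
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 1015 201 813
202 Perc_Rated_Life_Used 0x0018 087 087 001 Old_age Offline - 13
"""


def parse_attributes(text: str) -> dict:
    """Map attribute ID -> parsed fields from `smartctl -A`-style rows."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip the header and anything that is not an attribute row
        attrs[int(parts[0])] = {
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "raw": " ".join(parts[9:]),  # raw value may contain spaces
        }
    return attrs


attrs = parse_attributes(ROWS)
life_left = 100 - int(attrs[202]["raw"])  # percent of rated life remaining
print(attrs[181]["raw"], life_left)
```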
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Thu Jun 17, 2021 12:14 pm
|
|
That's nearly 4 years of POH, but what makes you think it should be higher anyway? If you turned the machine off at night, this would be reasonable. If it was on 24/7, then it'd be another story perhaps, sort of like my HDD with that 64/60 issue, though quite a bit farther off than that one...
How did you treat this apparently MLC disk: did you use it like an HDD, or did you try to save write cycles on it? It's showing signs of wear, but it's still got a lot of life in it.
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Thu Jun 17, 2021 1:00 pm
|
|
eccerr0r wrote: | That's nearly 4 years of POH, but what makes you think it should be higher anyway? If you turned the machine off at night, this would be reasonable. If it was on 24/7, then it'd be another story perhaps, sort of like my HDD with that 64/60 issue, though quite a bit farther off than that one...
How did you treat this apparently MLC disk: did you use it like an HDD, or did you try to save write cycles on it? It's showing signs of wear, but it's still got a lot of life in it. |
Very good point. It runs about 12 hours a day and I got it back in 2012; that's 9 years, but let's even go with 8:
8 * 365 * 12 = 35040
That's pretty close to the reported 34477.
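The same estimate, generalized: expected POH is years × days per year × hours per day. A tiny sketch comparing the reported 34477 hours from the attribute table against 8 and 9 years at 12 hours a day (leap days ignored):

```python
def expected_poh(years: float, hours_per_day: float, days_per_year: int = 365) -> float:
    """Power-on hours expected from a usage pattern (leap days ignored)."""
    return years * days_per_year * hours_per_day


reported = 34477
for years in (8, 9):
    est = expected_poh(years, 12)
    print(f"{years} years: expected {est:.0f} h, reported is {reported / est:.0%} of that")
```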
It ran Win7 for maybe 3-4 years, but since it's only 64GB I had to move user\appdata to an HDD, so I'm guessing most of the writes went there.
Since then it has run Gentoo with TRIM, so the writes are mostly logs and configs.
The problem is that it's old and doesn't report TLW (Total LBAs Written), so I had to calculate that from the raw data and estimate the numbers. I'm still not sure how long it has to live, but it isn't showing many signs of wear.
This is why I asked.
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Thu Jun 17, 2021 3:20 pm
|
|
Ah... so yes, it makes perfect sense since you shut it off at night. My weird corrupted SSD has over 900K hours of POH according to SMART, which makes no sense: if it were true, it would have been in service before ENIAC was invented... imagine what ENIAC could do with hundreds of GB of storage...
Fortunately your SSD does report an average erase count, and it's saying explicitly that you have 87% of its life left, so there's nothing to worry about. Of course, a 64GB SSD will turn over quite a bit quicker under today's OS loads than a larger SSD would. I have another SSD, only 32GB, in my Atom netbook. It runs Gentoo, and I have no clue how often this SSD has been erased, yet it still appears to report 100 for its SMART fields without telling me which one, if any, tracks wear.
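Reading Wear_Leveling_Count's raw value as the average erase count, as described here, the two wear numbers in the posted table agree with each other: 390 erases at 13% of rated life used implies a rating of about 3000 P/E cycles. Multiplying the average erase count by capacity also gives a rough figure for data written at the NAND level, which can stand in for a missing total-writes counter. Both are estimates under that interpretation, with the 64GB capacity taken from the thread:

```python
def implied_rated_cycles(avg_erase_count: float, pct_life_used: float) -> float:
    """Rated P/E cycles consistent with the average erase count and % life used."""
    return avg_erase_count * 100 / pct_life_used


def nand_written_tb(avg_erase_count: float, capacity_gb: float) -> float:
    """Rough NAND-level data written in TB (includes write amplification,
    so host writes are usually lower)."""
    return avg_erase_count * capacity_gb / 1000


print(implied_rated_cycles(390, 13))  # 3000.0
print(nand_written_tb(390, 64))       # 24.96
```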
Based on how I've used this 32GB SSD, I'd imagine each block has been erased at least 100 times on average by now. Fortunately it is an MLC device as well, so it should likewise have a lot of life left.
On another note, "modern" larger SSDs, from large fractions of a terabyte up to terabytes, are invariably TLC or worse and can only be erased hundreds of times at best. But that means you'd have to rewrite many terabytes before it's time to replace one. Your MLC drive could likewise take a similar load before failing, because MLC flash can take more erase cycles than TLC.
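The capacity-versus-cycles trade-off can be put in numbers: a rough endurance figure is capacity × rated P/E cycles ÷ write amplification. The cycle ratings below (3000 for MLC, 500 for TLC) and the write amplification of 1 are assumptions for illustration, not specs for any particular drive:

```python
def endurance_tb(capacity_gb: float, pe_cycles: int, write_amplification: float = 1.0) -> float:
    """Approximate host data (TB) that can be written before rated wear-out."""
    return capacity_gb * pe_cycles / write_amplification / 1000


mlc_64gb = endurance_tb(64, 3000)  # small MLC drive, assumed rating
tlc_1tb = endurance_tb(1000, 500)  # large TLC drive, assumed rating
print(mlc_64gb, tlc_1tb)           # 192.0 500.0
```

With these assumed figures both land in the hundreds of terabytes: fewer cycles on a much larger drive still add up to a lot of writes.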
|
|
Troopo Guru
Joined: 14 Jun 2015 Posts: 310
Posted: Thu Jun 17, 2021 10:51 pm
|
|
eccerr0r wrote: | Ah... so yes, it makes perfect sense since you shut it off at night. [...] |
Thanks for the reassurance. I'd already suspected/calculated that much; only the night part wasn't taken into account until you mentioned it.