sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Sat Feb 19, 2022 8:00 pm Post subject: HDD current pending sector back to 0 after random pass
Hi,
A couple of days ago, the SMART status of one of my old (2012) 3.5" HDDs showed 4 "current pending sector". I quickly replaced it with a new one. Today, I decided to check the old disk again with a short SMART test, which quickly failed at the same LBA.
Then I ran a full /dev/random pass to erase it before putting it in my dead HDD box.
But to my surprise, after the random pass, I noticed the SMART value for current pending sector was back to 0. So I ran a full SMART test this time, and it finished without error.
What could explain this behaviour? Or is it maybe expected? I always assumed that when that value went up (as with "Reallocated Sector Count" or "Uncorrectable Error Count"), the HDD was heading to the grave.
Last edited by sdauth on Sat Feb 19, 2022 10:38 pm; edited 1 time in total

mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
Posted: Sat Feb 19, 2022 9:20 pm
I can only guess.
An HDD can remap faulty sectors. To remap a faulty sector, the drive must first read it; only then can it write the data to a spare sector. If the drive can't read a sector that needs remapping, it increases the count of current pending sectors.
If you send a write command to the sector that needs to be remapped, the HDD can skip reading the faulty sector. It will write the new contents directly to the spare sector and then decrease the count of current pending sectors.
I'm not surprised the number of current pending sectors went down to zero after you copied /dev/random to your disk.
See: https://harddrivegeek.com/current-pending-sector-count/, section "Can You Fix/Lower Your Pending Sectors Count?"
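The bookkeeping described above can be illustrated with a toy model. This is only a sketch, not how any real firmware works: the class `ToyDrive` and all of its names are hypothetical. It also assumes (this is not stated in the post above) that a drive may simply rewrite a sector in place when the medium itself is fine, and only consumes a spare when the medium is damaged.

```python
class ToyDrive:
    """Toy model of pending-sector bookkeeping (hypothetical, not real firmware)."""

    def __init__(self, unreadable, spares=8):
        # `unreadable` maps LBA -> True if the medium itself is damaged,
        # False if only the stored data is unreadable.
        self.unreadable = dict(unreadable)
        self.spares = spares
        self.pending = set()      # what Current_Pending_Sector would count
        self.reallocated = set()  # what Reallocated_Sector_Ct would count

    def read(self, lba):
        """Return True on success; a failed read marks the sector as pending."""
        if lba in self.unreadable:
            self.pending.add(lba)
            return False
        return True

    def write(self, lba):
        """New data replaces the old contents, so the old data need not be read."""
        if lba in self.unreadable:
            medium_damaged = self.unreadable.pop(lba)
            if medium_damaged:
                if self.spares == 0:
                    # No spare left to remap to: the write fails, sector stays bad.
                    self.unreadable[lba] = True
                    return False
                self.spares -= 1
                self.reallocated.add(lba)
        self.pending.discard(lba)
        return True


drive = ToyDrive({100: False, 101: True})
drive.read(100)
drive.read(101)
print(len(drive.pending))                          # 2
for lba in (100, 101):
    drive.write(lba)
print(len(drive.pending), len(drive.reallocated))  # 0 1
```

In this model a full overwrite always empties the pending set, while the reallocated count only grows for sectors whose medium is actually damaged.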

sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Sat Feb 19, 2022 10:47 pm
Thanks for the explanation, appreciated. From your link: "Pending sectors are the precursor to reallocated sectors which can be a strong indicator of a dead hard drive on the horizon."
For now that value is at 0, so all good, but from now on I will only use that disk for cold storage and closely monitor the SMART values when I power it on. At least I can still use it to store some files.

figueroa Advocate
Joined: 14 Aug 2005 Posts: 2964 Location: Edge of marsh USA
Posted: Sun Feb 20, 2022 5:28 am
I have a 500 GB WDC WD5000AAKS-00UU3A0 with over 10 years of uptime that has been reporting
Code: | Device: /dev/sda, 4 Currently unreadable (pending) sectors |
and
Code: | Device: /dev/sda, 2 Offline uncorrectable sectors |
for many years now. Actually, sometimes the unreadable count is 2 or 3 or 4 and the uncorrectable count is 1 or 2, going back and forth. I'm not sure how smart SMART is sometimes.
The drive was one of a pair; the other failed catastrophically last year with no warning. Everything had a current backup. The computer is a server that is remote to me, used primarily for storing redundant backups, and has no active user, just me, the system administrator. There is a man on-site who can do drive swaps and other hands-on maintenance.
I've got those sectors isolated into a small partition that isn't used. It's kind of an irritant but I chose to watch it until it fails. I get regular smartd daemon error reports by email. Probably not my smartest decision. If it was a user's desktop computer, I would have swapped the drive immediately. _________________ Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi |

Hu Moderator
Joined: 06 Mar 2007 Posts: 21624
Posted: Sun Feb 20, 2022 5:31 pm
figueroa wrote: | I've got those sectors isolated into a small partition that isn't used. It's kind of an irritant but I chose to watch it until it fails. I get regular smartd daemon error reports by email. Probably not my smartest decision. If it was a user's desktop computer, I would have swapped the drive immediately. |
If you forcibly overwrite every sector in the scrap partition with zeroes, does the drive raise errors (and fail the write) or does it remap the bad sectors? If the latter, that might calm the SMART complaints.

eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Tue Mar 01, 2022 3:03 am
As long as it still has spares in the same replacement zone as the pending sectors, a write to a failed sector will cause a remap and clear that pending sector. Copying /dev/zero or anything else (including a sequential rebuild in a RAID) to a hard disk to erase or rewrite it will therefore clear every pending sector and attempt to remap it.
Usually there shouldn't be a SMART error at BIOS boot or from smartd just for having a few bad sectors. A SMART warning is only flagged when an attribute's WORST value drops below its THRESHOLD. Bad sectors don't count... yet...
I have a 2TB disk with 150 pending sectors, 15 offline uncorrectables, 364 reallocations that already occurred in the past, and 408 reallocated sectors so far. This is what's called a slow-moving train wreck: I keep forcing reallocations and new bad sectors pop up. The reallocations have already tipped off SMART that this disk is on the verge of failing, as it set WORST to 1 for this field. TBH, this disk is effectively dead already. It's unfortunate, I could use another 2TB of disk space... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
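The threshold rule mentioned above can be written down as a small function. This is a sketch under assumptions: the function name and return strings are made up for illustration, the comparison uses "at or below" (my understanding of how smartmontools fills the WHEN_FAILED column, which differs slightly from the strict "<" in the post), and a threshold of 0 is treated as "can never trip".

```python
def attribute_state(value, worst, thresh):
    """Classify one SMART attribute from its normalized VALUE/WORST/THRESH."""
    if thresh == 0:
        # Assumption: a zero threshold means the attribute never raises a failure.
        return "ok"
    if value <= thresh:
        return "failing_now"
    if worst <= thresh:
        return "failed_in_the_past"
    return "ok"


# Values as they appear in a typical WD attribute table:
print(attribute_state(200, 200, 140))  # Reallocated_Sector_Ct  -> ok
print(attribute_state(200, 200, 0))    # Current_Pending_Sector -> ok

# Hypothetical numbers for a disk whose WORST dropped to 1 (threshold assumed 140):
print(attribute_state(150, 1, 140))    # -> failed_in_the_past
```

With a threshold of 0 on Current_Pending_Sector, the normalized value can never trip a health failure in this model, which matches the remark that a few bad sectors "don't count" on their own.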

figueroa Advocate
Joined: 14 Aug 2005 Posts: 2964 Location: Edge of marsh USA
Posted: Tue Mar 01, 2022 4:42 am
The only reasons I haven't yet acted on Hu's suggestion are finding the time to do it and the need to be careful, as this is a remote server. Although that drive has 4 pending and 2 offline uncorrectable sectors, the reallocated sector count is zero. On the other hand, your (eccerr0r) 2TB drive should be demoted to toy status. _________________ Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi |

eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Tue Mar 01, 2022 4:58 am
It's tough to fix sectors while the drive is mounted read/write; you really need to make sure the filesystem knows what you're doing, in case it has a write pending from an open file to that sector. On the other hand, the data is already lost, so it might be enough to just find out which file (or directory or metadata) contains the bad sector... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |

sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Fri Jul 29, 2022 3:06 pm
So I just powered it on today since I need more storage for some backups. I ran a short SMART test (via a USB adapter) and it seems OK.
Would you use it ?
Code: | smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.57-gentoo-gnu-x200] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Green
Device Model: WDC WD30EZRX-00DC0B0
Serial Number:
LU WWN Device Id: 5 0014ee 603750894
Firmware Version: 80.00A80
User Capacity: 3 000 592 982 016 bytes [3,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database 7.3/5387
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 29 17:00:31 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (38880) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 390) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       76
  3 Spin_Up_Time            0x0027   183   175   021    Pre-fail  Always       -       5833
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3127
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       17043
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2940
192 Power-Off_Retract_Count 0x0032   197   197   000    Old_age   Always       -       2511
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       621
194 Temperature_Celsius     0x0022   121   101   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
SMART Error Log Version: 1
ATA Error Count: 436 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 436 occurred at disk power-on lifetime: 7807 hours (325 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 88 40 41 00 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 88 40 41 00 00 00:08:36.869 SET FEATURES [Set transfer mode]
ef 03 0c 88 40 41 00 00 00:08:36.869 SET FEATURES [Set transfer mode]
ec 03 68 88 40 41 00 00 00:08:36.869 IDENTIFY DEVICE
ef 03 46 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
Error 435 occurred at disk power-on lifetime: 7807 hours (325 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 88 40 41 00 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 88 40 41 00 00 00:08:36.869 SET FEATURES [Set transfer mode]
ec 03 68 88 40 41 00 00 00:08:36.869 IDENTIFY DEVICE
ef 03 46 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
ef 03 0c 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
Error 434 occurred at disk power-on lifetime: 7807 hours (325 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 08 84 00 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
ef 03 0c 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
ec 03 08 00 08 84 00 00 00:08:36.861 IDENTIFY DEVICE
ef 03 46 d8 b2 84 00 00 00:08:05.538 SET FEATURES [Set transfer mode]
Error 433 occurred at disk power-on lifetime: 7807 hours (325 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 08 84 00 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 08 84 00 00 00:08:36.862 SET FEATURES [Set transfer mode]
ec 03 08 00 08 84 00 00 00:08:36.861 IDENTIFY DEVICE
ef 03 46 d8 b2 84 00 00 00:08:05.538 SET FEATURES [Set transfer mode]
ef 03 0c d8 b2 84 00 00 00:08:05.538 SET FEATURES [Set transfer mode]
Error 432 occurred at disk power-on lifetime: 7807 hours (325 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 d8 b2 84 00 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 d8 b2 84 00 00 00:08:05.538 SET FEATURES [Set transfer mode]
ef 03 0c d8 b2 84 00 00 00:08:05.538 SET FEATURES [Set transfer mode]
ec 03 e0 d8 b2 84 00 00 00:08:05.538 IDENTIFY DEVICE
ef 03 46 b0 f9 02 00 00 00:08:05.525 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     17043         -
# 2  Extended offline    Completed without error       00%     17043         -
# 3  Short offline       Completed: read failure       90%     16997         1228116656
# 4  Short offline       Completed: read failure       90%     16997         1228116657
# 5  Extended offline    Completed without error       00%     16869         -
# 6  Short offline       Completed without error       00%     16832         -
# 7  Short offline       Completed without error       00%     16494         -
# 8  Extended offline    Completed without error       00%     16470         -
# 9  Extended offline    Completed without error       00%     16410         -
#10  Extended offline    Completed without error       00%     15640         -
#11  Short offline       Completed without error       00%     15590         -
#12  Short offline       Completed without error       00%     15441         -
#13  Short offline       Completed without error       00%     15430         -
#14  Short offline       Completed without error       00%     15426         -
#15  Short offline       Completed without error       00%     15415         -
#16  Extended offline    Completed without error       00%     15177         -
#17  Extended offline    Completed without error       00%     14232         -
#18  Extended offline    Completed without error       00%     14195         -
#19  Extended offline    Completed without error       00%     12773         -
#20  Short offline       Completed without error       00%     12766         -
#21  Extended offline    Completed without error       00%     10310         -
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 2
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. |
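For anyone who wants to track the sector-related counters from a report like the one above over time, the attribute table can be parsed with a few lines of Python. This is a minimal sketch, assuming the ten-column layout printed by `smartctl -A`; the function name `parse_attributes` and the dictionary keys are made up for illustration.

```python
def parse_attributes(text):
    """Parse rows of a smartctl-style attribute table into a dict keyed by ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns, a numeric ID and a hex FLAG.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],  # kept as text; some drives append extra detail
        }
    return attrs


# Three rows copied from the report above.
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

attrs = parse_attributes(sample)
for attr_id in (5, 197, 198):
    print(attr_id, attrs[attr_id]["name"], attrs[attr_id]["raw"])
```

The check on the FLAG column is there so that rows from the ATA error log, which also start with digits, are not mistaken for attributes.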

pjp Administrator
Joined: 16 Apr 2002 Posts: 20067
Posted: Fri Jul 29, 2022 4:42 pm
sdauth wrote: | Would you use it ? |
How important is the data, or how difficult would replacing the data be in the event of hardware failure? _________________ Quis separabit? Quo animo?

NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54234 Location: 56N 3W
Posted: Fri Jul 29, 2022 5:13 pm
sdauth,
Code: | === START OF INFORMATION SECTION ===
Model Family: Western Digital Green |
That says it all really.
Code: | ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 |
Whatever was wrong just fixed itself in place. There are no reallocated sectors.
The faulty sectors became good by magic.
The short test is almost worthless. Don't even think about anything less than the long test.
The error log does not suggest media errors but the self test log suggests internal drive problems.
Code: | SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error -
# 3 Short offline Completed: read failure 90% 16997 1228116656
# 4 Short offline Completed: read failure 90% 16997 1228116657 |
I wouldn't trust it. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
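As a side note on the two failing LBAs in the self-test log: the drive reports 512-byte logical and 4096-byte physical sectors, so a little arithmetic shows where those LBAs sit. This is a sketch that assumes physical sectors are aligned to LBA 0, which the smartctl output does not confirm; the function name `locate` is made up for illustration.

```python
LOGICAL = 512    # bytes per logical sector, from the "Sector Sizes" line
PHYSICAL = 4096  # bytes per physical sector, from the same line


def locate(lba):
    """Return (byte offset, physical sector index, offset inside that sector)."""
    byte_offset = lba * LOGICAL
    # Assumption: physical sector 0 starts at byte 0 (no alignment offset).
    return byte_offset, byte_offset // PHYSICAL, byte_offset % PHYSICAL


for lba in (1228116656, 1228116657):
    print(lba, locate(lba))
```

Under that alignment assumption both LBAs map to the same physical sector index (153514582), at offsets 0 and 512 within it.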

sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Fri Jul 29, 2022 5:47 pm
pjp wrote: | sdauth wrote: | Would you use it ? | How important is the data, or how difficult would replacing the data be in the event of hardware failure? |
It is an archive of my DVDs stored in *.iso format. Well, it would be fairly easy to replace since I still have the discs in the basement, although it would still take a while to dump them again. Overall, not critical but painful.
NeddySeagoon wrote: | Whatever was wrong just fixed itself in place. There are no reallocated sectors.
The faulty sectors became good by magic.
The short test is almost worthless. Don't even think about anything less than the long test.
I wouldn't trust it. |
Alright. Well, the latest long test (run when I opened this thread) returned the same value.
I'm currently filling it up with the files and will run a fresh extended test later. Would it make any difference if I connected it directly to a SATA port?
Anyway, after that, it will only be mounted read-only.
It's not ideal, but I really need to get back those precious TB on my main desktop, and after all I can still get the data back if it croaks.

NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54234 Location: 56N 3W
Posted: Fri Jul 29, 2022 6:51 pm
sdauth,
A non-zero Current_Pending_Sector count is a count of the sectors that the drive has tried to read but failed.
It would therefore relocate them, if only it could read them.
In short, the drive cannot read its own writing.
It's grounds for a warranty return unless the warranty has expired.
It's not quite that simple. Every time the drive goes to read a sector, it also gets a 'measure of difficulty' associated with the read.
At some threshold, difficult-to-read sectors are reallocated and the originals abandoned, with the data still in them.
Sometimes a read fails and the sector is added to the Current_Pending_Sector count. Then a subsequent read works and it's below the relocation threshold, so the sector is removed from the Current_Pending_Sector list without being relocated. This is what happened to you.
The Current_Pending_Sector count is only the sectors the drive has tried to read and failed.
There may be many more that it's not tried to read yet, so are unknown. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |

pjp Administrator
Joined: 16 Apr 2002 Posts: 20067
Posted: Fri Jul 29, 2022 7:07 pm
sdauth wrote: | I'm currently filling it up with the files |
Sounds like an opportunity to shop for a replacement ;) _________________ Quis separabit? Quo animo?

sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Fri Jul 29, 2022 8:30 pm
@pjp
Yeah I'm going to buy a new one very soon.
@NeddySeagoon
The HDD is currently being filled to the max (1/3 done so far). Once it is done, I assume that if I generate a hash for each file, it will 100% trigger a read failure (or most likely several) at some point, right?
All of this is really a bad idea haha and I should just buy a new drive right now.
At least I've been warned. Thanks again for the exhaustive explanation.

NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54234 Location: 56N 3W
Posted: Fri Jul 29, 2022 8:57 pm
sdauth,
As long as you build the hash from the sources and don't save it on the failed drive, maybe.
All hash functions have a finite length. That means that there must be collisions in the hash space.
(Several different inputs will have the same hash value.)
You will notice bad blocks in your media collection; the application will stop playback.
A hash is not required.
How will you know if the media collection is bad or the hash is bad if there is a mismatch in future? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
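For readers who still want to try the hashing approach discussed above, here is a minimal sketch that follows the advice to build the hashes from the sources and keep them off the suspect drive. The function names and the choice of SHA-256 are illustrative assumptions, not something proposed in the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in chunks so large ISO images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir):
    """Map each file's relative path to its hash, computed from the SOURCE files."""
    root = Path(source_dir)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def mismatches(manifest, copy_dir):
    """List files on the copy that are missing, unreadable, or hash differently."""
    root = Path(copy_dir)
    bad = []
    for relative, expected in manifest.items():
        try:
            matches = sha256_of(root / relative) == expected
        except OSError:
            # Covers missing files and I/O errors raised while reading.
            matches = False
        if not matches:
            bad.append(relative)
    return bad
```

A typical use would be to call `build_manifest()` on the original files, store the result on a different disk, and run `mismatches()` against the copy later. Note that verifying this way reads every byte of the copy, so it also acts as a full read test of the files.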

sdauth Guru
Joined: 19 Sep 2018 Posts: 569 Location: Ásgarðr
Posted: Sat Dec 24, 2022 11:14 am
I finally replaced the disk. I mounted the failing drive read-only (plugged directly into a SATA port instead of using my USB3 adapter) and was able to transfer everything to the new one without any issue.
No read errors during the transfer, so everything looks good.