[SOLVED] Clone a hard disk

Vieri · Post by **Vieri** » Wed May 18, 2022 7:17 am

Hi,

I'm having trouble cloning a hard disk with an old Gentoo system on it.
I have the disk and a blank target disk (both mechanical SATA - target is bigger than source) connected via USB. They show up as /dev/sdb (source) and /dev/sdc (target).

So I ran a simple dd command, as shown below:

Code: Select all

# dd if=/dev/sdb of=/dev/sdc status=progress
711426560 bytes (711 MB, 678 MiB) copied, 120 s, 5,9 MB/s                       
dd: error reading '/dev/sdb': Input/output error                                
1389632+0 records in                                                            
1389632+0 records out                                                           
711491584 bytes (711 MB, 679 MiB) copied, 145,221 s, 4,9 MB/s

The target disk partitions are as expected:

Code: Select all

Disk /dev/sdc: 465,8 GiB, 500107862016 bytes, 976773168 sectors                 
Units: sectors of 1 * 512 = 512 bytes                                           
Sector size (logical/physical): 512 bytes / 512 bytes                           
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x05dbeb60

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sdc1  *         63    208844    208782   102M 83 Linux
/dev/sdc2        208845   2184839   1975995 964,9M 82 Linux swap / Solaris
/dev/sdc3       2184840 488375999 486191160 231,9G 83 Linux

I can boot the target HDD, but it cannot mount "root". It stays there forever with the message "Mounting root...".

So when I connect the source and target disks to my "rescue" system again, I can confirm that it is not possible to mount root.

Code: Select all

# mount /dev/sdc3 ./disk
mount: /root/disk: wrong fs type, bad option, bad superblock on /dev/sdc3, missing codepage or helper program, or other error.

Doing the same on the source partition works fine (I can mount and list the files):

Code: Select all

# mount /dev/sdb3 ./disk

Also, mounting the target disk's boot partition works:

Code: Select all

# mount /dev/sdc1 ./disk
# ls ./disk/

So the problem is ONLY with the root partition.

Code: Select all

# dumpe2fs -h /dev/sdc3
dumpe2fs 1.44.1 (24-Mar-2018)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sdc3
Couldn't find valid filesystem superblock.

Code: Select all

# mke2fs -n /dev/sdc
mke2fs 1.44.1 (24-Mar-2018)
Found a dos partition table in /dev/sdc
Proceed anyway? (y,N) y
Creating filesystem with 122096646 4k blocks and 30531584 inodes
Filesystem UUID: 0e78bf4a-bbff-474e-84a6-83034ca45bac
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Code: Select all

# fsck -b 32768 /dev/sdc
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdc

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a dos partition table in /dev/sdc

No matter which superblock backup I try I still get the same output.

What can I try?

Vieri

[EDIT]
BTW I know fsck found a DOS partition table in /dev/sdc, but keep in mind that the target disk is 500GB in size.

Post by **NeddySeagoon** » Wed May 18, 2022 7:28 am

Vieri,

You have an error at

711491584 bytes (711 MB, 679 MiB) copied, 145,221 s, 4,9 MB/s

down the source drive. The copy stopped there.
dmesg will tell much more. Put the whole thing onto a pastebin.
The partition table is in the first block of the drive, so that is in the part that copied.
The copy failed in the swap partition, so copying root has not started yet.
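
As a quick check of that reasoning, here is a minimal Python sketch that maps the byte offset where dd stopped onto the partition table from the first post. The function name and the hard-coded table are only for illustration, and the sector size is assumed to be 512 bytes, as fdisk reported.

```python
SECTOR_SIZE = 512  # bytes per sector, as reported by fdisk

# (label, first sector, last sector), copied from the fdisk listing above.
# The labels are illustrative only.
PARTITIONS = [
    ("partition 1 (boot)", 63, 208844),
    ("partition 2 (swap)", 208845, 2184839),
    ("partition 3 (root)", 2184840, 488375999),
]

def partition_at(byte_offset):
    """Return the label of the partition containing byte_offset, or None."""
    sector = byte_offset // SECTOR_SIZE
    for label, first, last in PARTITIONS:
        if first <= sector <= last:
            return label
    return None

stopped_at = 711491584  # bytes copied when dd hit the I/O error
print(stopped_at // SECTOR_SIZE, partition_at(stopped_at))
# prints: 1389632 partition 2 (swap)
```

The sector number matches the "1389632+0 records in" that dd printed, since dd was using its default 512-byte block size.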

Code: Select all

smartctl -x /dev/sdb

will be informative too. That may not work over USB2.

There are better tools than dd for copying from problem disks but that's for another post, when we know what we are up against.

sdauth · Post by **sdauth** » Wed May 18, 2022 7:34 am

edit : NeddySeagoon was faster

Code: Select all

dd: error reading '/dev/sdb': Input/output error

Looks like it only copied the partition table and boot partition from sdb to sdc, then halted because of the I/O error?
You could try to make an archive of the root filesystem of sdb3 with fsarchiver then restore it to sdc3.

Code: Select all

fsarchiver -v -j$(nproc) -Z1 savefs sdb_root.fsa /dev/sdb3

Then restore (make sure /dev/sdc3 is really the target device):

Code: Select all

fsarchiver -v -j$(nproc) restfs sdb_root.fsa id=0,dest=/dev/sdc3

fsarchiver will automatically recreate the filesystem with the UUID, xattrs, etc. preserved.
Once done, run resize2fs -p /dev/sdc3 to expand the root filesystem if needed.

Then manually recreate swap on /dev/sdc2 with

Code: Select all

mkswap -U *old UUID from /dev/sdb2* /dev/sdc2

Vieri · Post by **Vieri** » Wed May 18, 2022 7:55 am

OK, thanks for pointing that out. I guess my source hard disk is failing:

Code: Select all

[ 2040.168570] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 2040.168575] sd 4:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 21 68 00 00 00 08 00
[ 2040.168577] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 2040.168632] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2040.168635] Buffer I/O error on dev sdb3, logical block 512, async page read

even if it does boot fine and seems to work OK.

I'll check out the fsarchiver that you're mentioning while I'm posting this reply (you're faster than my dd process).
Thanks

Vieri · Post by **Vieri** » Wed May 18, 2022 7:57 am

Could I just add "conv=sync,noerror" to the dd command?

sdauth · Post by **sdauth** » Wed May 18, 2022 8:04 am

Some cheap SATA-to-USB controllers can also cause issues, especially when both are in use at the same time.

If the source disk is fine, then fsarchiver is ok to make an archive of the rootfs.
Otherwise, ddrescue can also be used if there are really errors on the source disk.

The safest route would be to only connect the source disk, make an archive of the rootfs (/dev/sdb3), once done, unplug it and connect the target disk to restore the rootfs archive. One at a time.

sdauth · Post by **sdauth** » Wed May 18, 2022 8:07 am

Vieri wrote:Could I just add "conv=sync,noerror" to the dd command?

That's what I use. I also set bs=4096 for a faster copy. But of course, you will have to wait a bit longer since it will do a full copy.

(And also, if another I/O error happens, you're good to start over again.)
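
For anyone wondering what conv=sync,noerror actually changes: noerror makes dd carry on after a read error, and sync pads every short or failed input block with zeros up to the block size, so later data keeps its offset on the target. Here is a rough Python sketch of that behaviour on a simulated device. `copy_noerror_sync` and `flaky_read` are made-up stand-ins for illustration, not anything dd provides.

```python
def copy_noerror_sync(read_block, n_blocks, bs):
    """Copy n_blocks of bs bytes, zero-filling any block that cannot be read."""
    out = bytearray()
    bad = []
    for i in range(n_blocks):
        try:
            data = read_block(i)
        except OSError:
            data = b""                 # noerror: keep going instead of aborting
            bad.append(i)
        out += data.ljust(bs, b"\0")   # sync: pad up to a full block with NULs
    return bytes(out), bad

def flaky_read(i, bs=4):
    """Simulated device with 4-byte blocks whose block 1 is unreadable."""
    if i == 1:
        raise OSError("Input/output error")
    return bytes([65 + i]) * bs

image, bad_blocks = copy_noerror_sync(flaky_read, 3, 4)
print(image, bad_blocks)
# prints: b'AAAA\x00\x00\x00\x00CCCC' [1]
```

The trade-off, as far as I understand it, is that a larger bs means more data is zero-filled around each bad sector, which is one reason ddrescue is usually the better tool for a failing disk.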

Vieri · Post by **Vieri** » Wed May 18, 2022 8:15 am

Well, it seems my source disk is in bad shape. Here's a snippet with sync,noerror:

Code: Select all

712406016 bytes (712 MB, 679 MiB) copied, 392,85 s, 1,8 MB/s
712406528 bytes (712 MB, 679 MiB) copied, 393 s, 1,8 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+11 records in
1391419+0 records out
712406528 bytes (712 MB, 679 MiB) copied, 416,989 s, 1,7 MB/s
712407040 bytes (712 MB, 679 MiB) copied, 417 s, 1,7 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+12 records in
1391420+0 records out
712407040 bytes (712 MB, 679 MiB) copied, 441,495 s, 1,6 MB/s
712407552 bytes (712 MB, 679 MiB) copied, 441 s, 1,6 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+13 records in
1391421+0 records out
712407552 bytes (712 MB, 679 MiB) copied, 465,834 s, 1,5 MB/s
712408064 bytes (712 MB, 679 MiB) copied, 466 s, 1,5 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+14 records in
1391422+0 records out
712408064 bytes (712 MB, 679 MiB) copied, 490,032 s, 1,5 MB/s
712408576 bytes (712 MB, 679 MiB) copied, 490 s, 1,5 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+15 records in
1391423+0 records out

I'm not feeling too comfortable with this, so I'll try either ddrescue or fsarchiver. Will then try to boot the new disk, but I'm afraid I'll have some kind of data loss.

Anon-E-moose · Post by **Anon-E-moose** » Wed May 18, 2022 10:16 am

I'd pull the sata cables off completely (both ends) and reseat, just to make sure that it's not a "connection" problems vs failing disk.

Or do a smartctl check on disk.

sdauth · Post by **sdauth** » Wed May 18, 2022 10:16 am

Vieri wrote:OK, thanks for pointing that out. I guess my source hard disk is failing:

Code: Select all

[ 2040.168570] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 2040.168575] sd 4:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 21 68 00 00 00 08 00
[ 2040.168577] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 2040.168632] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2040.168635] Buffer I/O error on dev sdb3, logical block 512, async page read

even if it does boot fine and seems to work OK.

I remember having this kind of message with some old HDDs, but they were OK after all.
Except for this one, which is the first time I've seen it:

Code: Select all

Buffer I/O error on dev sdb3, logical block 512, async page read

Could you try switching the SATA-to-USB controller, and/or try another USB port, to see if you get the same output?
I only used ddrescue once, so I'm not that familiar with it. It will take more time, of course.
Usage is quite simple: ddrescue /dev/sdb path/to/image.dd path/to/log.txt

But first, you should try a short smartctl test on your source drive :

Code: Select all

smartctl -t short /dev/sdb

and post the output here (smartctl -a /dev/sdb)

To make sure the drive is ok.
If it is not, then with some luck, you could at least image the rootfs with fsarchiver if the problematic sectors are not located on it.

Post by **NeddySeagoon** » Wed May 18, 2022 4:04 pm

Vieri,

smartctl will tell us much more about your HDD.

At the moment, all we know is that there is a problem between the PC and the drive.
It may or may not be the drive.

ddrescue will do a much better job of reading your drive as it has extensive error handling and workarounds.

You don't care about the content of swap, so a dirty hack is to dd partition 3, not the whole drive.
If that bit of the drive is OK, that might be enough. If it's not, ddrescue will make a much better job of data recovery than dd.

Vieri · Post by **Vieri** » Wed May 18, 2022 10:33 pm

The ddrescue process completed after more than 8 hours.

Code: Select all

# ddrescue -f -n /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
     ipos:   46667 MB, non-trimmed:        0 B,  current rate:    1365 B/s
     opos:   46667 MB, non-scraped:    16384 B,  average rate:   8130 kB/s
non-tried:        0 B,  bad-sector:     4096 B,    error rate:      21 B/s
  rescued:  250059 MB,   bad areas:        8,        run time:  8h 32m 35s
pct rescued:   99.99%, read errors:       12,  remaining time:          4s
                              time since last successful read:          0s
Finished

With smartctl on source disk I get:

Code: Select all

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Maxtor DiamondMax 21
Device Model:     MAXTOR STM3250310AS
Serial Number:    6RY1RF4S
Firmware Version: 3.AAC
User Capacity:    250.059.350.016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Thu May 19 02:59:01 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  92) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   068   006    Pre-fail  Always       -       122747846
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       225
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       14
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       171164356
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       126459
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       222
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       239
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   049   045    Old_age   Always       -       34 (Min/Max 30/34)
194 Temperature_Celsius     0x0022   034   051   000    Old_age   Always       -       34 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   117   046   000    Old_age   Always       -       148942409
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       11
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       11
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 239 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 239 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 39 ce 6e e0  Error: UNC at LBA = 0x006ece39 = 7261753

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 38 ce 6e e0 00      00:57:33.670  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:58.456  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:54.366  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:50.281  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:46.189  READ DMA EXT

Error 238 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 39 ce 6e e0  Error: UNC at LBA = 0x006ece39 = 7261753

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 38 ce 6e e0 00      00:57:33.670  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:29.589  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:54.366  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:50.281  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:46.189  READ DMA EXT

Error 237 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 39 ce 6e e0  Error: UNC at LBA = 0x006ece39 = 7261753

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 38 ce 6e e0 00      00:57:33.670  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:29.589  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:25.499  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:50.281  READ DMA EXT
  25 03 08 40 ce 6e e0 00      00:57:46.189  READ DMA EXT

Error 236 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 39 ce 6e e0  Error: UNC at LBA = 0x006ece39 = 7261753

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 38 ce 6e e0 00      00:57:33.670  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:29.589  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:25.499  READ DMA EXT
  25 03 08 40 ce 6e e0 00      00:57:21.412  READ DMA EXT
  25 03 08 48 ce 6e e0 00      00:57:46.189  READ DMA EXT

Error 235 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 39 ce 6e e0  Error: UNC at LBA = 0x006ece39 = 7261753

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 38 ce 6e e0 00      00:57:33.670  READ DMA EXT
  25 03 08 38 ce 6e e0 00      00:57:29.589  READ DMA EXT
  25 03 08 40 ce 6e e0 00      00:57:25.499  READ DMA EXT
  25 03 08 48 ce 6e e0 00      00:57:21.412  READ DMA EXT
  25 03 08 50 ce 6e e0 00      00:57:17.322  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%     60922         91147833
# 2  Short offline       Completed: read failure       90%     60922         91147833

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

This HDD is connected to the computer via a USB docking station.
I've yet to test it by connecting it straight to the Motherboard's SATA ports.
Is it worth the try?

In any case, this looks bad.
However, I can now mount /dev/sdc3 which is the target disk, so I guess ddrescue did a pretty good job.
I still need to boot the system with the new HDD to see if there has been any data corruption or loss.

Thanks again!

figueroa · Post by **figueroa** » Thu May 19, 2022 4:33 am

In my opinion, attempting to clone is the wrong solution for the use case. The right solution is to restore from current backup. You can search the forums here for "system backup" for recent discussions, or just follow this link: https://wiki.archlinux.org/index.php/rs ... tem_backup

Not having current backups is the source of endless grief.

Irre · Post by **Irre** » Thu May 19, 2022 9:31 am

I used ddrescue when I cloned my defective hard disk. There were several errors I wasn't aware of, but it worked.

sdauth · Post by **sdauth** » Thu May 19, 2022 11:37 am

Vieri wrote:The ddrescue process completed after more than 8 hours.

[...]

This HDD is connected to the computer via a USB docking station.
I've yet to test it by connecting it straight to the Motherboard's SATA ports.
Is it worth the try?

In any case, this looks bad.
However, I can now mount /dev/sdc3 which is the target disk, so I guess ddrescue did a pretty good job.
I still need to boot the system with the new HDD to see if there has been any data corruption or loss.

Thanks again!

Your source drive is indeed starting to die.

This would explain the error with standard dd.
After all, it finished fine with ddrescue, so your USB docking station is not the cause; it's most likely the drive itself. You can still try to connect it directly via SATA, but I don't think it will make much difference: the smartctl report is rather explicit.
What you could do next is identify which files were impacted by the "current pending sectors" using debugfs on your source drive (here is a good explanation: https://mellowhost.com/blog/identifying ... linux.html )
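
As a rough sketch of the first step of that approach: the LBA reported by SMART is relative to the whole disk, while debugfs wants a block number relative to the filesystem. Assuming root starts at sector 2184840 (from the partition table earlier in the thread), 512-byte sectors, and a 4096-byte filesystem block size (that last one is an assumption, check it with dumpe2fs -h on the source), the conversion looks like this in Python:

```python
SECTOR_SIZE = 512       # bytes per disk sector (from the smartctl output)
PART3_START = 2184840   # first sector of the root partition (from fdisk)
FS_BLOCK_SIZE = 4096    # ASSUMED filesystem block size, verify with dumpe2fs -h

def lba_to_fs_block(lba, part_start=PART3_START, block_size=FS_BLOCK_SIZE):
    """Translate a whole-disk LBA into a block number inside the filesystem."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * SECTOR_SIZE // block_size

# LBA_of_first_error from the SMART self-test log
print(lba_to_fs_block(91147833))
# prints: 11120374
```

The resulting block number can then be passed to the icheck command in debugfs to find the owning inode, and ncheck turns that inode into a path.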

Post by **NeddySeagoon** » Thu May 19, 2022 4:01 pm

Vieri,

Here are the interesting bits from your SMART data and what they mean ...

Code: Select all

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       14
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       126459
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       11

The drive has already remapped 14 sectors because they were getting difficult to read. It's supposed to do that, so that bad sectors are never visible to the operating system.
However, there are 11 sectors that the drive would like to remap but can't, because it can't read them. That's 11 sectors that it knows about because it has tried to read them in the course of normal system operation. There may be many others.
In short, the drive can no longer reliably read its own writing.
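
To pull those numbers out of a longer smartctl report without reading the table by eye, something like the following Python sketch would do. It assumes the column layout shown above, with RAW_VALUE as the tenth whitespace-separated field. The watch list of attribute IDs is just an illustrative choice.

```python
SMART_LINES = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       14
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       126459
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       11
"""

# Attributes where any non-zero raw value is a warning sign (illustrative list)
WATCH = {5, 197, 198}

def raw_values(text):
    """Map attribute ID -> (name, raw value) for each attribute row in text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 fields
        if len(fields) >= 10 and fields[0].isdigit():
            result[int(fields[0])] = (fields[1], int(fields[9]))
    return result

attrs = raw_values(SMART_LINES)
for attr_id in sorted(attrs.keys() & WATCH):
    name, raw = attrs[attr_id]
    if raw > 0:
        print(f"{name}: {raw}")
# prints:
# Reallocated_Sector_Ct: 14
# Current_Pending_Sector: 11
```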

That drive has 126,459 operating hours on the clock. I get nervous at 70,000 hours, so it's expected to be past its best.

The smart tests abort at first fail, so

Code: Select all

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%     60922         91147833
# 2  Short offline       Completed: read failure       90%     60922         91147833

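It's interesting that the log shows the failures at 60922 hours. That looks like over 60,000 operating hours ago, but the lifetime counter in these logs is only 16 bits wide and wraps at 65536 hours, so 60922 here most likely means 60922 + 65536 = 126,458 hours, i.e. the tests you have just run.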

The ddrescue log will be interesting. It will tell us where the remaining errors are.
The summary

Code: Select all

bad-sector:     4096 B  bad areas:        8

tells us that ddrescue failed to read eight sectors (4096 B at 512 B per sector), while the drive knew about at least 11 unreadable sectors, so ddrescue did its thing and coaxed another read from some of those bad blocks.

What next?

Post your recovery.log file. It will tell us where the bad sectors are. If they are inside the swap partition it won't matter, unless you have a hibernate image there that you want back.
If not, we can make ddrescue try harder, just on the bits not yet recovered. That will fill in the 'holes' in your image on /dev/sdc.

Vieri · Post by **Vieri** » Tue May 24, 2022 6:23 am

Yes, recovering from a backup is a better approach. It's not that I don't have one, but it's one year old. I'm not worried about losing the service, just hoping to save time not having to update the 1-year-old system. Recovering this failing disk would help. It also serves as an exercise to see what can actually be done in these extreme cases.

The ddrescue recovery file contains:

Code: Select all

# cat recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -f -n /dev/sdb /dev/sdc ./recovery.log
# Start time:   2022-05-18 12:57:39
# Current time: 2022-05-18 21:30:14
# Finished
# current_pos  current_status  current_pass
0xADD9C8000     +               1
#      pos        size  status
0x00000000  0x2A688000  +
0x2A688000  0x00000200  -
0x2A688200  0x00000C00  /
0x2A688E00  0x00000200  -
0x2A689000  0x000DE000  +
0x2A767000  0x00000200  -
0x2A767200  0x00000C00  /
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E000  +
0xA88AD6000  0x00000200  -
0xA88AD6200  0x00001C00  /
0xA88AD7E00  0x00000200  -
0xA88AD8000  0x54EEF000  +
0xADD9C7000  0x00000200  -
0xADD9C7200  0x00000C00  /
0xADD9C7E00  0x00000200  -
0xADD9C8000  0x2F5B166000  +

I guess I could run the same ddrescue command, but with the -A, --try-again parameter, right?

[EDIT]

I might not need to use -A. I ran the following command:

Code: Select all

# ddrescue -f -n -r3 /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 20480 B, bad-sector: 4096 B, bad areas: 8

     ipos:   46667 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   46667 MB, non-scraped:    16384 B,  average rate:       0 B/s
non-tried:        0 B,  bad-sector:     4096 B,    error rate:      21 B/s
  rescued:  250059 MB,   bad areas:        8,        run time:      9m 45s
pct rescued:   99.99%, read errors:       24,  remaining time:         n/a
                              time since last successful read:         n/a
Finished

Code: Select all

# cat ./recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -f -n -r3 /dev/sdb /dev/sdc ./recovery.log
# Start time:   2022-05-24 11:24:41
# Current time: 2022-05-24 11:34:26
# Finished
# current_pos  current_status  current_pass
0xADD9C7E00     +               3
#      pos        size  status
0x00000000  0x2A688000  +
0x2A688000  0x00000200  -
0x2A688200  0x00000C00  /
0x2A688E00  0x00000200  -
0x2A689000  0x000DE000  +
0x2A767000  0x00000200  -
0x2A767200  0x00000C00  /
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E000  +
0xA88AD6000  0x00000200  -
0xA88AD6200  0x00001C00  /
0xA88AD7E00  0x00000200  -
0xA88AD8000  0x54EEF000  +
0xADD9C7000  0x00000200  -
0xADD9C7200  0x00000C00  /
0xADD9C7E00  0x00000200  -
0xADD9C8000  0x2F5B166000  +

Post by **NeddySeagoon** » Tue May 24, 2022 10:04 am

Vieri,

Your partition table is

Code: Select all

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sdc1  *         63    208844    208782   102M 83 Linux
/dev/sdc2        208845   2184839   1975995 964,9M 82 Linux swap / Solaris
/dev/sdc3       2184840 488375999 486191160 231,9G 83 Linux

Converting the start sectors into bytes we have, in decimal
32256
106928640
1118638080

Or in hex, which is going to be more useful for the rest of this post.

Code: Select all

0x7E00
0x65f9a00
0x42AD1000
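
For anyone who wants to reproduce that conversion, a few lines of Python will do it. This is only a sketch: the dictionary and function name are made up for the example, and it assumes the 512-byte sector size that fdisk reported.

```python
SECTOR_SIZE = 512  # bytes per logical sector, per the fdisk output

# Start sectors copied from the partition table above
start_sectors = {"sdc1": 63, "sdc2": 208845, "sdc3": 2184840}

def start_offset(sector):
    """Byte offset of a partition's first sector."""
    return sector * SECTOR_SIZE

for name, sector in start_sectors.items():
    offset = start_offset(sector)
    print(f"{name}: {offset} = 0x{offset:X}")
# prints:
# sdc1: 32256 = 0x7E00
# sdc2: 106928640 = 0x65F9A00
# sdc3: 1118638080 = 0x42AD1000
```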

From your ddrescue log we have

Code: Select all

#      pos        size  status
0x00000000  0x2A688000  +
0x2A688000  0x00000200  -
0x2A688200  0x00000C00  /
0x2A688E00  0x00000200  -
0x2A689000  0x000DE000  +
0x2A767000  0x00000200  -
0x2A767200  0x00000C00  /
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E000  +
0xA88AD6000  0x00000200  -
0xA88AD6200  0x00001C00  /
0xA88AD7E00  0x00000200  -
0xA88AD8000  0x54EEF000  +
0xADD9C7000  0x00000200  -
0xADD9C7200  0x00000C00  /
0xADD9C7E00  0x00000200  -
0xADD9C8000  0x2F5B166000  +

Matching the above starts with your partition table
0x00000000 0x2A688000 + covers all of partition 1

Code: Select all

0x2A688000  0x00000200  -
0x2A688200  0x00000C00  /
0x2A688E00  0x00000200  -
0x2A689000  0x000DE000  +
0x2A767000  0x00000200  -
0x2A767200  0x00000C00  /
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E000  +

All fall into partition 2 which is swap.

Code: Select all

0xA88AD6000  0x00000200  -
0xA88AD6200  0x00001C00  /
0xA88AD7E00  0x00000200  -
0xA88AD8000  0x54EEF000  +
0xADD9C7000  0x00000200  -
0xADD9C7200  0x00000C00  /
0xADD9C7E00  0x00000200  -
0xADD9C8000  0x2F5B166000  +

are all in partition 3
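
The matching above can also be done mechanically. The following Python sketch takes the start position of every region that ddrescue marked '-' or '/' and looks up which partition it falls in. The byte ranges are derived from the partition table with 512-byte sectors, and the names are only illustrative.

```python
SECTOR_SIZE = 512  # bytes per sector, per the fdisk output

# Partition byte ranges [start, end), derived from the partition table above
PARTITIONS = {
    1: (63 * SECTOR_SIZE, 208845 * SECTOR_SIZE),
    2: (208845 * SECTOR_SIZE, 2184840 * SECTOR_SIZE),
    3: (2184840 * SECTOR_SIZE, 488376000 * SECTOR_SIZE),
}

# Start positions of the regions marked '-' or '/' in the first mapfile
UNRECOVERED = [
    0x2A688000, 0x2A688200, 0x2A688E00,
    0x2A767000, 0x2A767200, 0x2A767E00,
    0xA88AD6000, 0xA88AD6200, 0xA88AD7E00,
    0xADD9C7000, 0xADD9C7200, 0xADD9C7E00,
]

def partition_of(pos):
    """Return the number of the partition containing byte position pos."""
    for number, (start, end) in PARTITIONS.items():
        if start <= pos < end:
            return number
    return None

for pos in UNRECOVERED:
    print(f"0x{pos:X} -> partition {partition_of(pos)}")
```

The first six positions come out as partition 2 (swap) and the last six as partition 3 (root).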

Doing some sorting, these regions are recovered.

Code: Select all

0x00000000  0x2A688000  +
0x2A689000  0x000DE000  +
0x2A768000  0xA5E36E000  +
0xA88AD8000  0x54EEF000  +
0xADD9C8000  0x2F5B166000  +

These are 'failed'. They are all one disc block each: 512 decimal is 200 hex.

Code: Select all

0x2A688000  0x00000200  -
0x2A688E00  0x00000200  -
0x2A767000  0x00000200  -
0x2A767E00  0x00000200  -
0xA88AD6000  0x00000200  -
0xA88AD7E00  0x00000200  -
0xADD9C7000  0x00000200  -
0xADD9C7E00  0x00000200  -

These regions are not recovered yet but ddrescue can try harder.

Code: Select all

0x2A688200  0x00000C00  /
0x2A767200  0x00000C00  /
0xA88AD6200  0x00001C00  /
0xADD9C7200  0x00000C00  /
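
The same sorting can be scripted. Below is a small Python sketch that totals the region sizes per status character from the mapfile posted earlier. In ddrescue's mapfile, '+' means finished, '-' is a bad sector and '/' is non-scraped. The parser is deliberately simple and the function name is made up: it skips comments and the current-position line by only accepting three-field lines whose second field is a hex size.

```python
from collections import defaultdict

MAPFILE = """\
#      pos        size  status
0x00000000  0x2A688000  +
0x2A688000  0x00000200  -
0x2A688200  0x00000C00  /
0x2A688E00  0x00000200  -
0x2A689000  0x000DE000  +
0x2A767000  0x00000200  -
0x2A767200  0x00000C00  /
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E000  +
0xA88AD6000  0x00000200  -
0xA88AD6200  0x00001C00  /
0xA88AD7E00  0x00000200  -
0xA88AD8000  0x54EEF000  +
0xADD9C7000  0x00000200  -
0xADD9C7200  0x00000C00  /
0xADD9C7E00  0x00000200  -
0xADD9C8000  0x2F5B166000  +
"""

def totals_by_status(mapfile_text):
    """Sum region sizes in bytes for each status character."""
    totals = defaultdict(int)
    for line in mapfile_text.splitlines():
        fields = line.split()
        # Data rows look like: <pos> <hex size> <status>
        if len(fields) != 3 or not fields[1].startswith("0x"):
            continue
        totals[fields[2]] += int(fields[1], 16)
    return dict(totals)

totals = totals_by_status(MAPFILE)
print(totals["-"], totals["/"])
# prints: 4096 16384
```

Those two totals agree with the "bad-sector: 4096 B" and "non-scraped: 16384 B" figures in the ddrescue summary.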

Let's tell ddrescue to try harder. A good command is

Code: Select all

ddrescue -MAd -r 16 ...

using the same input, output and log files as you already have.
ddrescue will only try to fill in the holes in the already recovered data.

There is more.
As you have a conventional HDD with lots of hours on it, it is likely that the mechanics are worn, so that the alignment is poor in its normal operating position.
Run the above command six times, moving the drive each time so that it's face up, face down and operating on all four edges.
This way, gravity may help the alignment to coax one last read, even from the 'failed' sectors.
Be kind to the drive. Let it spin down before you move it.

If you still don't have all your data back, add the -R option

Code: Select all

ddrescue -MAdR -r 16 ...

and do all six passes again.

Post the recovery.log when that completes.

Vieri · Post by **Vieri** » Tue May 24, 2022 10:53 pm

I didn't know the trick about changing the HDD's position to try to read bad sectors. Nice... at least until we're left with SSDs only.

However, I couldn't change the drive's orientation because the docking station doesn't firmly hold the disks in. I have to keep them vertically -- any other position is risky.
My other option is to connect it via SATA cable directly to the motherboard. In that case I can easily try out a few positions.

Anyway, I ran this:

Code: Select all

# ddrescue -MAdf -r 16 /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0

     ipos:  711494 kB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:  711494 kB, non-scraped:        0 B,  average rate:       3 B/s
non-tried:        0 B,  bad-sector:     5632 B,    error rate:      21 B/s
  rescued:  250059 MB,   bad areas:        4,        run time:  1h 13m 52s
pct rescued:   99.99%, read errors:      190,  remaining time:         n/a
                              time since last successful read:  1h 10m 12s
Finished

and it gave me this log:

Code: Select all

# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -MAdf -r 16 /dev/sdb /dev/sdc ./recovery.log
# Start time:   2022-05-24 16:29:36
# Current time: 2022-05-24 17:43:28
# Finished
# current_pos  current_status  current_pass
0x2A688C00     +               16
#      pos        size  status
0x00000000  0x2A688A00  +
0x2A688A00  0x00000200  -
0x2A688C00  0x000DF200  +
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E400  +
0xA88AD6400  0x00001000  -
0xA88AD7400  0x54EEFE00  +
0xADD9C7200  0x00000200  -
0xADD9C7400  0x2F5B166C00  +

Getting there...

I will try the R option next.

Thanks

Vieri · Post by **Vieri** » Wed May 25, 2022 10:37 am

It didn't budge much:

Code: Select all

# ddrescue -MAdRf -r 16 /dev/sdb /dev/sdc ./recovery.log
[sudo] password for hmaninf:
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0

     ipos:   46667 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   46667 MB, non-scraped:        0 B,  average rate:       0 B/s
non-tried:        0 B,  bad-sector:     5632 B,    error rate:      20 B/s
  rescued:  250059 MB,   bad areas:        4,        run time:  1h 13m  1s
pct rescued:   99.99%, read errors:      188,  remaining time:         n/a
                              time since last successful read:         n/a
Finished

Code: Select all

# cat ./recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -MAdRf -r 16 /dev/sdb /dev/sdc ./recovery.log
# Start time:   2022-05-25 03:31:21
# Current time: 2022-05-25 04:44:22
# Finished
# current_pos  current_status  current_pass
0xADD9C7200     +               16
#      pos        size  status
0x00000000  0x2A688A00  +
0x2A688A00  0x00000200  -
0x2A688C00  0x000DF200  +
0x2A767E00  0x00000200  -
0x2A768000  0xA5E36E400  +
0xA88AD6400  0x00001000  -
0xA88AD7400  0x54EEFE00  +
0xADD9C7200  0x00000200  -
0xADD9C7400  0x2F5B166C00  +

I'll give it another try without R, but I think I'll just settle with what I got.

Thanks, everyone.

Post by **NeddySeagoon** » Wed May 25, 2022 4:21 pm

Vieri,

If you are ready to settle, there is one more thing to do.

Code: Select all

ddrescue -MAdRf -r 256 ...

This time, pick up the drive and rotate it at right angles to the spin axis while ddrescue runs.
You will feel the gyroscopic force due to the rotating mass of the platters.

It's normally a very bad thing to do to the drive, but if you are ready to settle, what is there to lose?

If you are really lucky, the regions not yet recovered are unallocated space.

Vieri · Post by **Vieri** » Thu May 26, 2022 4:51 pm

I've booted the cloned disk, and everything seems to be working fine. Time will tell.

Thank you very much.