Author |
Message |
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 18, 2022 7:17 am Post subject: [SOLVED] Clone a hard disk |
|
|
Hi,
I'm having trouble cloning a hard disk with an old Gentoo system on it.
I have the disk and a blank target disk (both mechanical SATA - target is bigger than source) connected via USB. They show up as /dev/sdb (source) and /dev/sdc (target).
So I run a simple dd command as shown here below:
Code: | # dd if=/dev/sdb of=/dev/sdc status=progress |
711426560 bytes (711 MB, 678 MiB) copied, 120 s, 5,9 MB/s
dd: error reading '/dev/sdb': Input/output error
1389632+0 records in
1389632+0 records out
711491584 bytes (711 MB, 679 MiB) copied, 145,221 s, 4,9 MB/s |
The target disk partitions are as expected:
Code: | Disk /dev/sdc: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x05dbeb60
Device Boot Start End Sectors Size Id Type
/dev/sdc1 * 63 208844 208782 102M 83 Linux
/dev/sdc2 208845 2184839 1975995 964,9M 82 Linux swap / Solaris
/dev/sdc3 2184840 488375999 486191160 231,9G 83 Linux |
I can boot the target HDD, but it cannot mount "root". It stays there forever with the message "Mounting root...".
So when I connect the source and target disks to my "rescue" system again, I can confirm that it is not possible to mount root.
Code: | # mount /dev/sdc3 ./disk
mount: /root/disk: wrong fs type, bad option, bad superblock on /dev/sdc3, missing codepage or helper program, or other error. |
Doing the same on the source partition works fine (I can mount and list the files):
Code: | # mount /dev/sdb3 ./disk |
Also, mounting the target disk's boot partition works:
Code: | # mount /dev/sdc1 ./disk
# ls ./disk/ |
So the problem is ONLY with the root partition.
Code: | # dumpe2fs -h /dev/sdc3
dumpe2fs 1.44.1 (24-Mar-2018)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sdc3
Couldn't find valid filesystem superblock. |
Code: | # mke2fs -n /dev/sdc
mke2fs 1.44.1 (24-Mar-2018)
Found a dos partition table in /dev/sdc
Proceed anyway? (y,N) y
Creating filesystem with 122096646 4k blocks and 30531584 inodes
Filesystem UUID: 0e78bf4a-bbff-474e-84a6-83034ca45bac
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000 |
Code: | # fsck -b 32768 /dev/sdc
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdc
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
Found a dos partition table in /dev/sdc |
No matter which superblock backup I try I still get the same output.
What can I try?
Vieri
[EDIT]
BTW I know fsck found a DOS partition table in /dev/sdc, but keep in mind that the target disk is 500GB in size.
Last edited by Vieri on Wed May 25, 2022 10:38 am; edited 2 times in total |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Wed May 18, 2022 7:28 am Post subject: |
|
|
Vieri,
You have an error at Quote: | 711491584 bytes (711 MB, 679 MiB) copied, 145,221 s, 4,9 MB/s | down the source drive. The copy stopped there.
dmesg will tell much more. Put the whole thing onto a pastebin.
The partition table is in the first block of the drive, so that is in the part that copied.
The copy failed in the swap partition, so copying root has not started yet.
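The arithmetic behind that statement can be checked by hand. Here is a minimal sketch, assuming dd's default 512-byte block size and the partition boundaries from the fdisk listing earlier in the thread; `partition_for_sector` is a made-up helper name, not part of any tool:

```shell
#!/bin/sh
# Partition boundaries (in 512-byte sectors) copied from the fdisk listing.
# partition_for_sector is a hypothetical helper used only for illustration.
partition_for_sector() {
    sector=$1
    if [ "$sector" -ge 63 ] && [ "$sector" -le 208844 ]; then
        echo "partition 1 (boot)"
    elif [ "$sector" -ge 208845 ] && [ "$sector" -le 2184839 ]; then
        echo "partition 2 (swap)"
    elif [ "$sector" -ge 2184840 ] && [ "$sector" -le 488375999 ]; then
        echo "partition 3 (root)"
    else
        echo "outside any partition"
    fi
}

# dd reported 1389632 full 512-byte records before the read error,
# so the first sector it could not read is sector number 1389632.
records=1389632
echo "bytes copied: $((records * 512))"
partition_for_sector "$records"
```

The byte count matches the 711491584 bytes that dd reported, and the sector lands between the start and end of the swap partition.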
Code: | smartctl -x /dev/sdb | will be informative too. That may not work over USB2.
There are better tools than dd for copying from problem disks but that's for another post, when we know what we are up against. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
sdauth Guru
Joined: 19 Sep 2018 Posts: 568 Location: Ásgarðr
|
Posted: Wed May 18, 2022 7:34 am Post subject: |
|
|
edit : NeddySeagoon was faster
Code: | dd: error reading '/dev/sdb': Input/output error |
Looks like it only copied the partition table and boot partition from sdb to sdc, then halted because of the I/O error?
You could try to make an archive of the root filesystem of sdb3 with fsarchiver then restore it to sdc3.
Code: | fsarchiver -v -j$(nproc) -Z1 savefs sdb_root.fsa /dev/sdb3 |
then restore (make sure /dev/sdc3 is really the target device.. )
Code: | fsarchiver -v -j$(nproc) restfs sdb_root.fsa id=0,dest=/dev/sdc3 |
fsarchiver will automatically recreate the filesystem with its UUID, xattrs etc. preserved.
Once done, run resize2fs -p /dev/sdc3 to expand the root filesystem if needed.
Then manually recreate swap on /dev/sdc2 with Code: | mkswap -U *old UUID from /dev/sdb2* /dev/sdc2 |
|
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 18, 2022 7:55 am Post subject: |
|
|
OK, thanks for pointing that out. I guess my source hard disk is failing:
Code: | [ 2040.168570] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 2040.168575] sd 4:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 21 68 00 00 00 08 00
[ 2040.168577] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 2040.168632] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2040.168635] Buffer I/O error on dev sdb3, logical block 512, async page read |
even if it does boot fine and seems to work OK.
I'll check out the fsarchiver that you're mentioning while I'm posting this reply (you're faster than my dd process).
Thanks |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 18, 2022 7:57 am Post subject: |
|
|
Could I just add "conv=sync,noerror" to the dd command? |
|
sdauth Guru
Joined: 19 Sep 2018 Posts: 568 Location: Ásgarðr
|
Posted: Wed May 18, 2022 8:04 am Post subject: |
|
|
Some cheap SATA-to-USB controllers can cause issues too, especially when two of them are in use at the same time.
If the source disk is fine, then fsarchiver is ok to make an archive of the rootfs.
Otherwise, ddrescue can also be used if there are really errors on the source disk.
The safest route would be to only connect the source disk, make an archive of the rootfs (/dev/sdb3), once done, unplug it and connect the target disk to restore the rootfs archive. One at a time. |
|
sdauth Guru
Joined: 19 Sep 2018 Posts: 568 Location: Ásgarðr
|
Posted: Wed May 18, 2022 8:07 am Post subject: |
|
|
Vieri wrote: | Could I just add "conv=sync,noerror" to the dd command? |
That's what I use; I also set bs=4096 for a faster copy. Of course, you will have to wait a bit longer since it will do a full copy (and if another I/O error happens, you're good to start over again).
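One trade-off to keep in mind: GNU dd with conv=noerror,sync is generally understood to replace each failed read with a full block of zeros, so the block size also sets how much data is blanked per error. A rough sketch of that arithmetic follows. The error count is an illustrative number, the padding behaviour is an assumption, and `zero_filled_bytes` is a hypothetical helper:

```shell
#!/bin/sh
# zero_filled_bytes ERRORS BLOCK_SIZE
# Upper bound on the bytes replaced by zeros, assuming every failed read
# pads one whole block (assumed behaviour of conv=noerror,sync).
zero_filled_bytes() {
    echo $(( $1 * $2 ))
}

errors=15   # illustrative number of failed reads, not a measurement
for bs in 512 4096; do
    echo "bs=$bs: up to $(zero_filled_bytes "$errors" "$bs") bytes zero-filled"
done
```

A larger block size will also turn several neighbouring bad sectors into a single failed read, so this is a per-error bound and not a prediction of total loss.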
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 18, 2022 8:15 am Post subject: |
|
|
Well, it seems my source disk is in bad shape. Here's a snippet with sync,noerror:
Code: | 712406016 bytes (712 MB, 679 MiB) copied, 392,85 s, 1,8 MB/s
712406528 bytes (712 MB, 679 MiB) copied, 393 s, 1,8 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+11 records in
1391419+0 records out
712406528 bytes (712 MB, 679 MiB) copied, 416,989 s, 1,7 MB/s
712407040 bytes (712 MB, 679 MiB) copied, 417 s, 1,7 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+12 records in
1391420+0 records out
712407040 bytes (712 MB, 679 MiB) copied, 441,495 s, 1,6 MB/s
712407552 bytes (712 MB, 679 MiB) copied, 441 s, 1,6 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+13 records in
1391421+0 records out
712407552 bytes (712 MB, 679 MiB) copied, 465,834 s, 1,5 MB/s
712408064 bytes (712 MB, 679 MiB) copied, 466 s, 1,5 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+14 records in
1391422+0 records out
712408064 bytes (712 MB, 679 MiB) copied, 490,032 s, 1,5 MB/s
712408576 bytes (712 MB, 679 MiB) copied, 490 s, 1,5 MB/s
dd: error reading '/dev/sdb': Input/output error
1391408+15 records in
1391423+0 records out |
I'm not feeling too comfortable with this, so I'll try either ddrescue or fsarchiver. Will then try to boot the new disk, but I'm afraid I'll have some kind of data loss. |
|
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6097 Location: Dallas area
|
Posted: Wed May 18, 2022 10:16 am Post subject: |
|
|
I'd pull the SATA cables off completely (both ends) and reseat them, just to make sure that it's not a "connection" problem rather than a failing disk.
Or do a smartctl check on disk. _________________ PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland |
|
sdauth Guru
Joined: 19 Sep 2018 Posts: 568 Location: Ásgarðr
|
Posted: Wed May 18, 2022 10:16 am Post subject: |
|
|
Vieri wrote: | OK, thanks for pointing that out. I guess my source hard disk is failing:
Code: | [ 2040.168570] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 2040.168575] sd 4:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 21 68 00 00 00 08 00
[ 2040.168577] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 2040.168632] blk_update_request: I/O error, dev sdb, sector 2189312 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2040.168635] Buffer I/O error on dev sdb3, logical block 512, async page read |
even if it does boot fine and seems to work OK.
|
I remember getting this kind of message with some old HDDs, but they were OK after all.
Except for this one, which I'm seeing for the first time: Code: | Buffer I/O error on dev sdb3, logical block 512, async page read
|
Could you try to switch the SATA to USB controller ? Or / and try another USB port ? To see if you get the same output.
I only used ddrescue once so I'm not that familiar with it. It will take more time of course.
Usage is quite simple : ddrescue /dev/sdb path/to/image.dd path/to/log.txt
But first, you should try a short smartctl test on your source drive :
Code: | smartctl -t short /dev/sdb |
and post the output here (smartctl -a /dev/sdb)
To make sure the drive is ok.
If it is not, then with some luck, you could at least image the rootfs with fsarchiver if the problematic sectors are not located on it. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Wed May 18, 2022 4:04 pm Post subject: |
|
|
Vieri,
smartctl will tell us much more about your HDD.
At the moment, all we know is that there is a problem between the PC and the drive.
It may or may not be the drive.
ddrescue will do a much better job of reading your drive as it has extensive error handling and workarounds.
You don't care about the content of swap, so a dirty hack is to dd partition 3, not the whole drive.
If that bit of the drive is OK, that might be enough. If it's not, ddrescue will make a much better job of data recovery than dd. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 18, 2022 10:33 pm Post subject: |
|
|
The ddrescue process completed after more than 8 hours.
Code: | # ddrescue -f -n /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
ipos: 46667 MB, non-trimmed: 0 B, current rate: 1365 B/s
opos: 46667 MB, non-scraped: 16384 B, average rate: 8130 kB/s
non-tried: 0 B, bad-sector: 4096 B, error rate: 21 B/s
rescued: 250059 MB, bad areas: 8, run time: 8h 32m 35s
pct rescued: 99.99%, read errors: 12, remaining time: 4s
time since last successful read: 0s
Finished |
With smartctl on source disk I get:
Code: | === START OF INFORMATION SECTION ===
Model Family: Seagate Maxtor DiamondMax 21
Device Model: MAXTOR STM3250310AS
Serial Number: 6RY1RF4S
Firmware Version: 3.AAC
User Capacity: 250.059.350.016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Thu May 19 02:59:01 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 92) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 068 006 Pre-fail Always - 122747846
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 225
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 14
7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 171164356
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 126459
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 222
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 239
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 066 049 045 Old_age Always - 34 (Min/Max 30/34)
194 Temperature_Celsius 0x0022 034 051 000 Old_age Always - 34 (0 18 0 0 0)
195 Hardware_ECC_Recovered 0x001a 117 046 000 Old_age Always - 148942409
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 11
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 11
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 239 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 239 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 39 ce 6e e0 Error: UNC at LBA = 0x006ece39 = 7261753
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 08 38 ce 6e e0 00 00:57:33.670 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:58.456 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:54.366 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:50.281 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:46.189 READ DMA EXT
Error 238 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 39 ce 6e e0 Error: UNC at LBA = 0x006ece39 = 7261753
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 08 38 ce 6e e0 00 00:57:33.670 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:29.589 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:54.366 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:50.281 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:46.189 READ DMA EXT
Error 237 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 39 ce 6e e0 Error: UNC at LBA = 0x006ece39 = 7261753
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 08 38 ce 6e e0 00 00:57:33.670 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:29.589 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:25.499 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:50.281 READ DMA EXT
25 03 08 40 ce 6e e0 00 00:57:46.189 READ DMA EXT
Error 236 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 39 ce 6e e0 Error: UNC at LBA = 0x006ece39 = 7261753
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 08 38 ce 6e e0 00 00:57:33.670 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:29.589 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:25.499 READ DMA EXT
25 03 08 40 ce 6e e0 00 00:57:21.412 READ DMA EXT
25 03 08 48 ce 6e e0 00 00:57:46.189 READ DMA EXT
Error 235 occurred at disk power-on lifetime: 60922 hours (2538 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 39 ce 6e e0 Error: UNC at LBA = 0x006ece39 = 7261753
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 08 38 ce 6e e0 00 00:57:33.670 READ DMA EXT
25 03 08 38 ce 6e e0 00 00:57:29.589 READ DMA EXT
25 03 08 40 ce 6e e0 00 00:57:25.499 READ DMA EXT
25 03 08 48 ce 6e e0 00 00:57:21.412 READ DMA EXT
25 03 08 50 ce 6e e0 00 00:57:17.322 READ DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed: read failure 90% 60922 91147833
# 2 Short offline Completed: read failure 90% 60922 91147833
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. |
This HDD is connected to the computer via a USB docking station.
I've yet to test it by connecting it straight to the Motherboard's SATA ports.
Is it worth the try?
In any case, this looks bad.
However, I can now mount /dev/sdc3 which is the target disk, so I guess ddrescue did a pretty good job.
I still need to boot the system with the new HDD to see if there has been any data corruption or loss.
Thanks again! |
|
figueroa Advocate
Joined: 14 Aug 2005 Posts: 2961 Location: Edge of marsh USA
|
Posted: Thu May 19, 2022 4:33 am Post subject: |
|
|
In my opinion, attempting to clone is the wrong solution for the use case. The right solution is to restore from current backup. You can search the forums here for "system backup" for recent discussions, or just follow this link: https://wiki.archlinux.org/index.php/rsync#Full_system_backup
Not having current backups is the source of endless grief. _________________ Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi |
|
Irre Guru
Joined: 09 Nov 2013 Posts: 434 Location: Stockholm
|
Posted: Thu May 19, 2022 9:31 am Post subject: |
|
|
I used ddrescue when I cloned my defective hard disk. There were several errors I wasn't aware of, but it worked. |
|
sdauth Guru
Joined: 19 Sep 2018 Posts: 568 Location: Ásgarðr
|
Posted: Thu May 19, 2022 11:37 am Post subject: |
|
|
Vieri wrote: | The ddrescue process completed after more than 8 hours.
[...]
This HDD is connected to the computer via a USB docking station.
I've yet to test it by connecting it straight to the Motherboard's SATA ports.
Is it worth the try?
In any case, this looks bad.
However, I can now mount /dev/sdc3 which is the target disk, so I guess ddrescue did a pretty good job.
I still need to boot the system with the new HDD to see if there has been any data corruption or loss.
Thanks again! |
Your source drive is indeed starting to die. This would explain the error with standard dd.
After all, it finished fine with ddrescue so your USB docking station is not the cause, but most likely the drive itself. You can still try to connect it directly via SATA but I don't think it will make much difference, the smartctl report is rather explicit.
What you could do next is to identify which files were impacted by the "current pending sectors" using debugfs on your source drive (here is a good explanation : https://mellowhost.com/blog/identifying-file-inode-by-sector-block-number-in-linux.html ) |
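The lookup described there comes down to converting a disk LBA into a filesystem block number. This is a minimal sketch, assuming 512-byte sectors, a 4096-byte filesystem block size (worth confirming with dumpe2fs -h /dev/sdb3), and the partition 3 start sector from the fdisk listing. `lba_to_fs_block` is a hypothetical helper, and the LBA is the first-error address from the SMART self-test log above:

```shell
#!/bin/sh
# lba_to_fs_block LBA PART_START_SECTOR SECTOR_SIZE FS_BLOCK_SIZE
# Converts an absolute disk LBA into a block number inside the filesystem.
lba_to_fs_block() {
    echo $(( ($1 - $2) * $3 / $4 ))
}

block=$(lba_to_fs_block 91147833 2184840 512 4096)
echo "filesystem block: $block"

# Commands to run by hand afterwards (printed only, nothing is executed here):
echo "debugfs -R \"icheck $block\" /dev/sdb3    # block -> inode"
echo "debugfs -R \"ncheck <inode>\" /dev/sdb3   # inode -> path"
```

If icheck reports no inode for the block, the sector is probably in free space or filesystem metadata and not inside a file.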
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Thu May 19, 2022 4:01 pm Post subject: |
|
|
Vieri,
Here are the interesting bits from your SMART data and what they mean ...
Code: |
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 14
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 126459
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 11 |
The drive has already remapped 14 sectors because they were getting difficult to read. It's supposed to do that, so that bad sectors are never visible to the operating system.
However, there are 11 sectors that the drive would like to remap but can't because it can't read them. That's 11 sectors that it knows about because it's tried to read them in the course of normal system operation. There may be many others.
In short, the drive can no longer reliably read its own writing.
That drive has 126,459 operating hours on the clock. I get nervous at 70,000 hours, so it's expected to be past its best.
The smart tests abort at first fail, so
Code: | SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed: read failure 90% 60922 91147833
# 2 Short offline Completed: read failure 90% 60922 91147833 | It's interesting that the drive was starting to fail at 60922, or over 60,000 operating hours ago.
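An alternative reading is possible here. Assuming the lifetime field in the SMART logs is a 16-bit counter (an assumption, not confirmed anywhere in this thread), it would wrap every 65536 hours, and 60922 could then stand for a much more recent test. A small sketch of that arithmetic, with `unwrap_hours` as a hypothetical helper:

```shell
#!/bin/sh
# unwrap_hours LOGGED CURRENT
# Latest hour count not later than CURRENT that a 16-bit counter
# (wrapping at 65536) would display as LOGGED. Assumes CURRENT >= LOGGED.
unwrap_hours() {
    echo $(( $2 - ( ($2 - $1) % 65536 ) ))
}

# LOGGED is the self-test LifeTime(hours); CURRENT is the Power_On_Hours raw value.
unwrap_hours 60922 126459
```

Under that assumption the logged self-tests would have run about one hour before the report was taken, which fits the short tests requested earlier in the thread.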
The ddrescue log will be interesting. It will tell where the remaining errors are.
The summary Code: | bad-sector: 4096 B bad areas: 8 | tells that ddrescue failed to read eight sectors and the drive knew about at least 11 unreadable sectors, so ddrescue did its thing and coaxed another read from some of those bad blocks.
What next?
Post your recovery.log file. It will tell where the bad sectors are. If they are inside the swap partition it won't matter, unless you have a hibernate image there you want back.
If not, we can make ddrescue try harder, just on the bits not yet recovered. That will fill in the 'holes' in your image on /dev/sdc. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue May 24, 2022 6:23 am Post subject: |
|
|
Yes, recovering from a backup is a better approach. It's not that I don't have one, but it's one year old. I'm not worried about losing the service, just hoping to save time not having to update the 1-year-old system. Recovering this failing disk would help. It also serves as an exercise to see what can actually be done in these extreme cases.
The ddrescue recovery file contains:
Code: | # cat recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -f -n /dev/sdb /dev/sdc ./recovery.log
# Start time: 2022-05-18 12:57:39
# Current time: 2022-05-18 21:30:14
# Finished
# current_pos current_status current_pass
0xADD9C8000 + 1
# pos size status
0x00000000 0x2A688000 +
0x2A688000 0x00000200 -
0x2A688200 0x00000C00 /
0x2A688E00 0x00000200 -
0x2A689000 0x000DE000 +
0x2A767000 0x00000200 -
0x2A767200 0x00000C00 /
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E000 +
0xA88AD6000 0x00000200 -
0xA88AD6200 0x00001C00 /
0xA88AD7E00 0x00000200 -
0xA88AD8000 0x54EEF000 +
0xADD9C7000 0x00000200 -
0xADD9C7200 0x00000C00 /
0xADD9C7E00 0x00000200 -
0xADD9C8000 0x2F5B166000 + |
I guess I could run the same ddrescue command, but with the -A, --try-again parameter, right?
[EDIT]
I might not need to use -A. I ran the following command:
Code: | # ddrescue -f -n -r3 /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 20480 B, bad-sector: 4096 B, bad areas: 8
ipos: 46667 MB, non-trimmed: 0 B, current rate: 0 B/s
opos: 46667 MB, non-scraped: 16384 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 4096 B, error rate: 21 B/s
rescued: 250059 MB, bad areas: 8, run time: 9m 45s
pct rescued: 99.99%, read errors: 24, remaining time: n/a
time since last successful read: n/a
Finished |
Code: | # cat ./recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -f -n -r3 /dev/sdb /dev/sdc ./recovery.log
# Start time: 2022-05-24 11:24:41
# Current time: 2022-05-24 11:34:26
# Finished
# current_pos current_status current_pass
0xADD9C7E00 + 3
# pos size status
0x00000000 0x2A688000 +
0x2A688000 0x00000200 -
0x2A688200 0x00000C00 /
0x2A688E00 0x00000200 -
0x2A689000 0x000DE000 +
0x2A767000 0x00000200 -
0x2A767200 0x00000C00 /
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E000 +
0xA88AD6000 0x00000200 -
0xA88AD6200 0x00001C00 /
0xA88AD7E00 0x00000200 -
0xA88AD8000 0x54EEF000 +
0xADD9C7000 0x00000200 -
0xADD9C7200 0x00000C00 /
0xADD9C7E00 0x00000200 -
0xADD9C8000 0x2F5B166000 + |
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Tue May 24, 2022 10:04 am Post subject: |
|
|
Vieri,
Your partition table is
Code: | Device Boot Start End Sectors Size Id Type
/dev/sdc1 * 63 208844 208782 102M 83 Linux
/dev/sdc2 208845 2184839 1975995 964,9M 82 Linux swap / Solaris
/dev/sdc3 2184840 488375999 486191160 231,9G 83 Linux |
Converting the start sectors into bytes we have, in decimal
32256
106928640
1118638080
Or in hex, which is going to be more useful for the rest of this post.
Code: | 0x7E00
0x65f9a00
0x42AD1000 |
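For anyone wanting to reproduce those conversions, a minimal sketch follows. It assumes 512-byte sectors unless a second argument says otherwise; `sector_to_offset` is a made-up name for illustration:

```shell
#!/bin/sh
# sector_to_offset SECTOR [SECTOR_SIZE]
# Prints the byte offset of a sector in decimal and in hex.
sector_to_offset() {
    bytes=$(( $1 * ${2:-512} ))
    printf '%d 0x%X\n' "$bytes" "$bytes"
}

# Start sectors of the three partitions from the fdisk listing.
for start in 63 208845 2184840; do
    sector_to_offset "$start"
done
```

Each output line gives the decimal offset followed by the hex form, ready to compare against the pos column of the ddrescue mapfile.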
From your ddrescue log we have
Code: | # pos size status
0x00000000 0x2A688000 +
0x2A688000 0x00000200 -
0x2A688200 0x00000C00 /
0x2A688E00 0x00000200 -
0x2A689000 0x000DE000 +
0x2A767000 0x00000200 -
0x2A767200 0x00000C00 /
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E000 +
0xA88AD6000 0x00000200 -
0xA88AD6200 0x00001C00 /
0xA88AD7E00 0x00000200 -
0xA88AD8000 0x54EEF000 +
0xADD9C7000 0x00000200 -
0xADD9C7200 0x00000C00 /
0xADD9C7E00 0x00000200 -
0xADD9C8000 0x2F5B166000 + |
Matching the above starts with your partition table
0x00000000 0x2A688000 + covers all of partition 1 and the start of partition 2
Code: | 0x2A688000 0x00000200 -
0x2A688200 0x00000C00 /
0x2A688E00 0x00000200 -
0x2A689000 0x000DE000 +
0x2A767000 0x00000200 -
0x2A767200 0x00000C00 /
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E000 + | All fall into partition 2 which is swap.
Code: | 0xA88AD6000 0x00000200 -
0xA88AD6200 0x00001C00 /
0xA88AD7E00 0x00000200 -
0xA88AD8000 0x54EEF000 +
0xADD9C7000 0x00000200 -
0xADD9C7200 0x00000C00 /
0xADD9C7E00 0x00000200 -
0xADD9C8000 0x2F5B166000 + | are all in partition 3
Doing some sorting, these regions are recovered.
Code: | 0x00000000 0x2A688000 +
0x2A689000 0x000DE000 +
0x2A768000 0xA5E36E000 +
0xA88AD8000 0x54EEF000 +
0xADD9C8000 0x2F5B166000 + |
These are 'failed'; each is a single disk sector (512 decimal is 200 hex).
Code: | 0x2A688000 0x00000200 -
0x2A688E00 0x00000200 -
0x2A767000 0x00000200 -
0x2A767E00 0x00000200 -
0xA88AD6000 0x00000200 -
0xA88AD7E00 0x00000200 -
0xADD9C7000 0x00000200 -
0xADD9C7E00 0x00000200 - |
These regions are not recovered yet but ddrescue can try harder.
Code: | 0x2A688200 0x00000C00 /
0x2A767200 0x00000C00 /
0xA88AD6200 0x00001C00 /
0xADD9C7200 0x00000C00 / |
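The same sorting can be done mechanically. The sketch below sums the region sizes per status character. It assumes the mapfile layout shown in this thread (comment lines starting with #, then one current-position line, then pos/size/status rows); `mapfile_totals` is a hypothetical helper, and the sample data is the first recovery.log posted above:

```shell
#!/bin/sh
# mapfile_totals FILE
# Sums region sizes per status: '+' rescued, '-' bad sector, '/' non-scraped.
mapfile_totals() {
    good=0; bad=0; nonscraped=0; first=1
    while read -r pos size status; do
        case $pos in '#'*|'') continue ;; esac
        # The first non-comment line is the current-position line, not a region.
        if [ "$first" -eq 1 ]; then first=0; continue; fi
        case $status in
            '+') good=$((good + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            '/') nonscraped=$((nonscraped + $size)) ;;
        esac
    done < "$1"
    echo "rescued=$good bad-sector=$bad non-scraped=$nonscraped"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Mapfile. Created by GNU ddrescue version 1.22
# current_pos current_status current_pass
0xADD9C8000 + 1
# pos size status
0x00000000 0x2A688000 +
0x2A688000 0x00000200 -
0x2A688200 0x00000C00 /
0x2A688E00 0x00000200 -
0x2A689000 0x000DE000 +
0x2A767000 0x00000200 -
0x2A767200 0x00000C00 /
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E000 +
0xA88AD6000 0x00000200 -
0xA88AD6200 0x00001C00 /
0xA88AD7E00 0x00000200 -
0xA88AD8000 0x54EEF000 +
0xADD9C7000 0x00000200 -
0xADD9C7200 0x00000C00 /
0xADD9C7E00 0x00000200 -
0xADD9C8000 0x2F5B166000 +
EOF
summary=$(mapfile_totals "$tmp")
echo "$summary"
rm -f "$tmp"
```

The bad-sector and non-scraped totals should agree with ddrescue's own summary for that run (4096 B and 16384 B).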
Let's tell ddrescue to try harder. A good command is Code: | ddrescue -MAd -r 16 ... | using the same input, output and log files as you already have.
ddrescue will only try to fill in the holes in the already recovered data.
There is more.
As you have a conventional HDD with lots of hours on it, it is likely that the mechanics are worn, so in its normal operating position the alignment is poor.
Run the above command six times, moving the drive each time so that it's face up, face down and operating on all four edges.
This way, gravity may help the alignment to coax one last read, even from the 'failed' sectors.
Be kind to the drive. Let it spin down before you move it.
If you still don't have all your data back, add the -R option Code: | ddrescue -MAdR -r 16 ... | and do all six passes again.
Post the recovery.log when that completes. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue May 24, 2022 10:53 pm Post subject: |
|
|
I didn't know the trick about changing the HDD's position to try to read bad sectors. Nice... at least until we're left with SSDs only.
However, I couldn't change the drive's orientation because the docking station doesn't firmly hold the disks in. I have to keep them vertically -- any other position is risky.
My other option is to connect it via SATA cable directly to the motherboard. In that case I can easily try out a few positions.
Anyway, I ran this:
Code: | # ddrescue -MAdf -r 16 /dev/sdb /dev/sdc ./recovery.log
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0
ipos: 711494 kB, non-trimmed: 0 B, current rate: 0 B/s
opos: 711494 kB, non-scraped: 0 B, average rate: 3 B/s
non-tried: 0 B, bad-sector: 5632 B, error rate: 21 B/s
rescued: 250059 MB, bad areas: 4, run time: 1h 13m 52s
pct rescued: 99.99%, read errors: 190, remaining time: n/a
time since last successful read: 1h 10m 12s
Finished |
and it gave me this log:
Code: | # Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -MAdf -r 16 /dev/sdb /dev/sdc ./recovery.log
# Start time: 2022-05-24 16:29:36
# Current time: 2022-05-24 17:43:28
# Finished
# current_pos current_status current_pass
0x2A688C00 + 16
# pos size status
0x00000000 0x2A688A00 +
0x2A688A00 0x00000200 -
0x2A688C00 0x000DF200 +
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E400 +
0xA88AD6400 0x00001000 -
0xA88AD7400 0x54EEFE00 +
0xADD9C7200 0x00000200 -
0xADD9C7400 0x2F5B166C00 + |
Getting there...
I will try the R option next.
Thanks |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Wed May 25, 2022 10:37 am Post subject: |
|
|
It didn't budge much:
Code: | # ddrescue -MAdRf -r 16 /dev/sdb /dev/sdc ./recovery.log
[sudo] password for hmaninf:
GNU ddrescue 1.22
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 250059 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0
ipos: 46667 MB, non-trimmed: 0 B, current rate: 0 B/s
opos: 46667 MB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 5632 B, error rate: 20 B/s
rescued: 250059 MB, bad areas: 4, run time: 1h 13m 1s
pct rescued: 99.99%, read errors: 188, remaining time: n/a
time since last successful read: n/a
Finished |
Code: | # cat ./recovery.log
# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue -MAdRf -r 16 /dev/sdb /dev/sdc ./recovery.log
# Start time: 2022-05-25 03:31:21
# Current time: 2022-05-25 04:44:22
# Finished
# current_pos current_status current_pass
0xADD9C7200 + 16
# pos size status
0x00000000 0x2A688A00 +
0x2A688A00 0x00000200 -
0x2A688C00 0x000DF200 +
0x2A767E00 0x00000200 -
0x2A768000 0xA5E36E400 +
0xA88AD6400 0x00001000 -
0xA88AD7400 0x54EEFE00 +
0xADD9C7200 0x00000200 -
0xADD9C7400 0x2F5B166C00 + |
I'll give it another try without R, but I think I'll just settle with what I got.
Thanks, everyone. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Wed May 25, 2022 4:21 pm Post subject: |
|
|
Vieri,
If you are ready to settle, there is one more thing to do.
Code: | ddrescue -MAdRf -r 256 ... |
This time, pick up the drive and rotate it at right angles to the spin axis while ddrescue runs.
You will feel the gyroscopic force due to the rotating mass of the platters.
It's normally a very bad thing to do to the drive, but if you are ready to settle, what is there to lose?
If you are really lucky, the regions not yet recovered are unallocated space. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Thu May 26, 2022 4:51 pm Post subject: |
|
|
I've booted the cloned disk, and everything seems to be working fine. Time will tell.
Thank you very much. |
|