Gentoo Forums
Help with failing disk

remix
l33t

Joined: 28 Apr 2004
Posts: 797
Location: hawaii

PostPosted: Sat Apr 19, 2014 1:09 pm    Post subject: Help with failing disk

I have a 4-disk RAID5 array. One disk has completely died, and before I found time to replace it (while running with 3 of 4 disks), one of my other hard drives started to die: first one partition, then another, while the others on that same disk continue to work.

I really need to recover some files from one of the partitions on the newly failing disk, and I just bought a couple of new hard drives and replaced the completely destroyed disk.

My question is: is there any way to recover one of the partitions on that 'fourth' disk so that I can assemble and mount it (using 3 of 4 partitions)?

Code:
mdadm --assemble  --force /dev/md5 /dev/sda6 /dev/sdc6 /dev/sdd6
mdadm: cannot open device /dev/sdd6: No such file or directory
mdadm: /dev/sdd6 has no superblock - assembly aborted


/dev/sdd is the failing drive
/dev/sdb is the completely failed drive that has been replaced


Here are some snippets from smartctl -a /dev/sdd:

Code:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

Error 16129 occurred at disk power-on lifetime: 35980 hours (1499 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 5f bb 7e 00  Error: UNC 1 sectors at LBA = 0x007ebb5f = 8305503

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 02 5e bb 7e e0 0a   3d+16:18:43.635  READ DMA EXT
  25 00 08 76 ba 7e e0 0a   3d+16:18:43.275  READ DMA EXT
  ca 00 08 3f 00 00 e0 0a   3d+16:18:43.141  WRITE DMA
  ca 00 08 6f 30 00 e0 0a   3d+16:18:43.091  WRITE DMA
  ca 00 08 67 30 00 e0 0a   3d+16:18:43.038  WRITE DMA

Error 16128 occurred at disk power-on lifetime: 35980 hours (1499 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 05 73 bb 7e 00  Error: UNC 5 sectors at LBA = 0x007ebb73 = 8305523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 70 bb 7e e0 0a   3d+14:28:37.227  READ DMA EXT
  27 00 00 00 00 00 e0 0a   3d+14:28:37.225  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 0a   3d+14:28:37.102  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a   3d+14:28:36.982  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 0a   3d+14:28:36.981  READ NATIVE MAX ADDRESS EXT

Error 16127 occurred at disk power-on lifetime: 35980 hours (1499 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 77 bb 7e 00  Error: UNC 1 sectors at LBA = 0x007ebb77 = 8305527

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 70 bb 7e e0 0a   3d+14:27:52.480  READ DMA EXT
  27 00 00 00 00 00 e0 0a   3d+14:27:52.479  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 0a   3d+14:27:52.356  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a   3d+14:27:52.236  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 0a   3d+14:27:52.234  READ NATIVE MAX ADDRESS EXT

Error 16126 occurred at disk power-on lifetime: 35980 hours (1499 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 05 73 bb 7e 00  Error: UNC 5 sectors at LBA = 0x007ebb73 = 8305523

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 70 bb 7e e0 0a   3d+14:27:00.958  READ DMA EXT
  27 00 00 00 00 00 e0 0a   3d+14:27:00.956  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 0a   3d+14:27:00.834  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a   3d+14:27:00.713  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 0a   3d+14:27:00.712  READ NATIVE MAX ADDRESS EXT

Error 16125 occurred at disk power-on lifetime: 35980 hours (1499 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 70 bb 7e 00  Error: UNC 8 sectors at LBA = 0x007ebb70 = 8305520

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 70 bb 7e e0 0a   3d+14:26:40.956  READ DMA EXT
  25 00 08 30 02 8a e0 0a   3d+14:26:40.854  READ DMA EXT
  c8 00 40 20 00 00 e0 0a   3d+14:26:40.853  READ DMA
  25 00 08 a8 6d 70 e0 0a   3d+14:26:40.725  READ DMA EXT
  c8 00 20 00 00 00 e0 0a   3d+14:26:40.633  READ DMA


I'm hoping someone who understands this stuff can point me in the right direction for preserving even a little of the unreadable partition.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54211
Location: 56N 3W

PostPosted: Sat Apr 19, 2014 3:47 pm

remix,

Install ddrescue and use that to make an image of the most recently failed drive onto the new drive.
Make sure to put the ddrescue log on a third drive.
Code:
# Rescue Logfile. Created by GNU ddrescue version 1.15
# Command line: ddrescue -b 4096 -r 8 -f /dev/sde3 /dev/null /root/rescue_log.txt
# current_pos  current_status
0x18D786D0000     ?
#      pos        size  status
0x00000000  0x16E4BE9E000  +
0x16E4BE9E000  0x00002000  *
0x16E4BEA0000  0xFD4F9D000  +
0x17E20E3D000  0x00003000  *
0x17E20E40000  0x8FBF8000  +
0x17EB0A38000  0x00008000  *
0x17EB0A40000  0x358CC2000  +
0x18209702000  0x0000E000  *
0x18209710000  0x2DE00000  +
0x18237510000  0x00010000  *
0x18237520000  0x01AC0000  +
0x18238FE0000  0x00010000  *
0x18238FF0000  0x012C1000  +
0x1823A2B1000  0x0000F000  *
0x1823A2C0000  0x2D752000  +
0x18267A12000  0x0000E000  *
0x18267A20000  0x11EDD4000  +
0x183867F4000  0x0000C000  *
0x18386800000  0x9F1ED0000  +
0x18D786D0000  0x4260530000  ?
That is one I did earlier.
DON'T DO THIS YET. Notice the output file here is /dev/null ... all I was trying to do was get the drive to do one last read and relocate the data so I could grab it later.
You need the best image you can get first.

You need the -b 4096 for advanced format drives. There is no point in trying to recover 512 bytes if the drive has a 4k block size.
-r 8 (eight retries) is a good place to start.

The log allows ddrescue to resume recovery, even with a different command. It will only work on areas not yet recovered.
ddrescue can work much harder and you can help it too, but that's for another post.
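
As a rough sketch (device names and paths here are only placeholders), a later invocation that reuses the same log file skips everything already recovered and only retries the areas still marked bad:
Code:
# first pass - image the failing drive onto the new one, log on a third drive
ddrescue -b 4096 -r 8 -f /dev/sdOLD /dev/sdNEW /mnt/third_drive/rescue.log
# later pass with the same log: only the bad areas are read again
ddrescue -b 4096 -r 16 -f /dev/sdOLD /dev/sdNEW /mnt/third_drive/rescue.log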

remix
PostPosted: Sun Apr 20, 2014 4:29 am

I got confused when you said "Don't do this yet".

If I am understanding you correctly, I should not perform a disk-to-disk ddrescue yet;
first, I should just copy to /dev/null and output the errors to rescue_log.txt.


My setup:
/dev/sda good
/dev/sdb newly installed, partitioned to match raid ( -b 4096)
/dev/sdc good
/dev/sdd failing ( -b 512)

and another brand new HD ( -b 4096) that is waiting to replace /dev/sdd once I can recover anything I can from my 5th raid partition.

/dev/sdd1, working
/dev/sdd2, working
/dev/sdd3, starting to fail
/dev/sdd4, extended
/dev/sdd5, failed (most important)
/dev/sdd6, failed (don't really care)


Code:
ddrescue -b 512 -r 8 -f /dev/sdd5 /dev/null /root/rescue_log.txt


Then I'll inspect /root/rescue_log.txt (knowing nothing of what those hex addresses mean),

and then actually perform the copy:

Code:
ddrescue -b 4096 -f -n /dev/sdd5 /dev/sdb5 /root/rescue_copy_log.txt


Or should I be copying over the entire disk?

Code:
ddrescue -b 4096 -f -n /dev/sdd /dev/sdb /root/rescue_copy_log.txt


NeddySeagoon
PostPosted: Sun Apr 20, 2014 8:31 am

remix,

The disk may fail completely at any time.
Do the disk-to-disk rescue first. The disk to /dev/null is a final desperate attempt to get more data recovered.

You may as well copy the entire disk. If you only copy a partition, how will you recover your raid sets?
You will need to get the good partitions onto the new drive at some time.
Of course, if you are still using the raid sets in degraded mode, the data will change and whatever ddrescue copies now from the degraded arrays will be useless.

You should not use ddrescue at all until you know what it's telling you. Read its man and/or info pages.
The hex numbers are block numbers. The symbols at the end of each line tell what the block numbers mean.
A log showing perfect data recovery will have exactly one line of data.
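For illustration only (the size below is hypothetical), such a log reduces to a single '+' line covering the whole device:
Code:
#      pos        size  status
0x00000000  0xE8E0DB6000  +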

Being able to do arithmetic in hex is useful, since you can work out where and how many blocks you have lost, or still have to recover.
With a bit more poking about the filesystem, you can get a rough idea of what's there and determine its importance.
That enables an informed decision about giving up or trying harder.

Only use -b 4096 on drives with 4k physical sectors. Use -b 512 on other drives.
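
If you are not sure which sector size a drive has, the kernel and smartctl will both tell you (a quick check; substitute your own device name):
Code:
cat /sys/block/sdd/queue/physical_block_size
smartctl -i /dev/sdd | grep -i 'sector size'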

remix
PostPosted: Sun Apr 20, 2014 10:47 pm

Good point.

Sounds like it would be safer to boot into a live DVD and not mount any of the degraded raid partitions.

I'll install the new blank disk and perform the ddrescue:

Code:
ddrescue -b 4096 -r 8 -n /dev/sdd /dev/sdb /sshfs_mounted_volume/rescue_copy_log.txt


I've read this guide, http://wiki.gentoo.org/wiki/Ddrescue
I'll check out the man page as well

Thanks!

remix
PostPosted: Sun Apr 20, 2014 11:59 pm

I just read the man page and -b is the block size of the input device, which in my case is 512.
The output device's block size is 4096.

So it is:
Code:
ddrescue -b 512 -r 8 -n /dev/sdd /dev/sdb /sshfs_mounted_volume/rescue_copy_log.txt


NeddySeagoon
PostPosted: Mon Apr 21, 2014 5:46 pm

remix,

Give it a go. Post the log when it stops.
You will find that gravity can assist the data recovery.

Rerun the same command using the same log file a total of six times.
Make a copy of the log each time the command completes, so you can look at the differences later.
Between each invocation of the command, move the drive so you try it with each edge and both faces 'down'.

If you have a bearing failure, gravity and the odd orientations can get you one last read, and that's all you need.
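
A minimal sketch of that routine (paths are just examples; keep the log and its copies off the failing drive):
Code:
ddrescue -b 512 -r 8 -f /dev/sdd /dev/sdb /sshfs_mounted_volume/rescue_copy_log.txt
cp /sshfs_mounted_volume/rescue_copy_log.txt /sshfs_mounted_volume/rescue_copy_log.pass1
# reorient the drive, rerun with the same log, then copy it to .pass2, .pass3, and so on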

remix
PostPosted: Tue Apr 22, 2014 8:34 am

Awesome tip! Should I be flipping it during the retries (without stopping or rebooting)?

I just finished the first go-through; I did set it to retry 8 times.

Code:
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued:     1000 GB,  errsize:    543 kB,  current rate:        0 B/s
   ipos:   717936 MB,   errors:      40,    average rate:    8546 kB/s
   opos:   717936 MB,     time since last successful read:     4.3 m
Retrying bad sectors... Retry 1


NeddySeagoon
PostPosted: Wed Apr 23, 2014 5:57 pm

remix,

That looks fairly good so far.
Code:
rescued:     1000 GB,  errsize:    543 kB,    errors:      40,

You have 543 kB still to recover in 40 regions of the drive.

It's time to tell ddrescue to try harder, now that most of the data has been read.
By using the same input device, output device and log file, ddrescue will try to fill in the holes in your image and ignore data already recovered.

Look back at the copies of the logs and determine which drive orientation produced the best results.
You will still run all four edges and two faces, but treat each drive spin-up as if it were the last, so start with that orientation.
--retries= can be increased. I tend to try 8, 16, 32, 64 and 128
--direct may be useful, it has no effect on some operating systems
--try-again can help when you have a group of contiguous blocks that can't be read.
--retrim will help too.

After you have tried the above on all 6 faces (just with --retries=8), it's time to look at what is still missing.
If it's unallocated space, it doesn't matter.
If it's a file or two, they are gone - you need to decide if you need these files and how much time you want to spend on data recovery.
If it's a directory, then the files in that directory and its child directories cannot be accessed normally, but they may be perfectly recovered.
If it's filesystem metadata, it depends what aspects of the metadata are damaged.

Please post the log - like my sample above - next time and we can begin to take into account what's damaged.

remix
PostPosted: Thu Apr 24, 2014 8:00 am

I'm OK with some of the files being completely inaccessible; well, let's see how many that would be.

Code:
# Rescue Logfile. Created by GNU ddrescue version 1.16
# Command line: ddrescue -b 512 -r 8 -f /dev/sdd /dev/sdb ddrescue.log
# current_pos  current_status
0xA728A5FE00     +
#      pos        size  status
0x00000000  0xA2FBE8F000  +
0xA2FBE8F000  0x00003000  -
0xA2FBE92000  0x008E0000  +
0xA2FC772000  0x00006000  -
0xA2FC778000  0x0001A000  +
0xA2FC792000  0x00001000  -
0xA2FC793000  0x00199000  +
0xA2FC92C000  0x00003000  -
0xA2FC92F000  0x000C6000  +
0xA2FC9F5000  0x00001000  -
0xA2FC9F6000  0x00488000  +
0xA2FCE7E000  0x00001000  -
0xA2FCE7F000  0x00008000  +
0xA2FCE87000  0x00001000  -
0xA2FCE88000  0x000E6000  +
0xA2FCF6E000  0x00005000  -
0xA2FCF73000  0x0060A000  +
0xA2FD57D000  0x00004000  -
0xA2FD581000  0x0000B000  +
0xA2FD58C000  0x00001000  -
0xA2FD58D000  0x00002000  +
0xA2FD58F000  0x0000F000  -
0xA2FD59E000  0x000D0000  +
0xA2FD66E000  0x00002000  -
0xA2FD670000  0x00042000  +
0xA2FD6B2000  0x00006000  -
0xA2FD6B8000  0x000A1000  +
0xA2FD759000  0x00001000  -
0xA2FD75A000  0x00016000  +
0xA2FD770000  0x00002000  -
0xA2FD772000  0x00ADB000  +
0xA2FE24D000  0x00002000  -
0xA2FE24F000  0x00005000  +
0xA2FE254000  0x00001000  -
0xA2FE255000  0x00001000  +
0xA2FE256000  0x0000B000  -
0xA2FE261000  0x00172000  +
0xA2FE3D3000  0x00005000  -
0xA2FE3D8000  0x002C8000  +
0xA2FE6A0000  0x00006000  -
0xA2FE6A6000  0x00006000  +
0xA2FE6AC000  0x00001000  -
0xA2FE6AD000  0x00002000  +
0xA2FE6AF000  0x00001000  -
0xA2FE6B0000  0x1C121000  +
0xA31A7D1000  0x00002000  -
0xA31A7D3000  0x00137000  +
0xA31A90A000  0x00001000  -
0xA31A90B000  0x2AD8C000  +
0xA345697000  0x00002000  -
0xA345699000  0x001BE000  +
0xA345857000  0x00002000  -
0xA345859000  0x3E2D43000  +
0xA72859C000  0x00001000  -
0xA72859D000  0x00005000  +
0xA7285A2000  0x00007000  -
0xA7285A9000  0x003E4000  +
0xA72898D000  0x00010000  -
0xA72899D000  0x000B4000  +
0xA728A51000  0x0000F000  -
0xA728A60000  0x41B8356000  +



Would it be safe then to just replace that 'fourth' drive with this new copied drive? Will it function normally except for those few files or directories?

NeddySeagoon
PostPosted: Thu Apr 24, 2014 9:58 pm

remix,

Code:
#      pos        size  status
0x00000000  0xA2FBE8F000  +


Says that from the start of the drive to block 0xA2FBE8F000, all the data has been recovered.
At
Code:
#      pos        size  status
0xA2FBE8F000  0x00003000  -
is the first bad area. A 512-byte block is 0x200 bytes, so 0x1000 is eight blocks. (It's hex.)
So this area is 24 (decimal) blocks.
Each status + is recovered data. Each status - is data yet to be recovered.
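
If you would rather not do the hex by hand, the shell can do it for you (a quick sketch using that first bad extent):
Code:
# 0x3000 bytes / 0x200 bytes per block = 24 blocks (12288 bytes)
printf '%d blocks (%d bytes)\n' $((0x3000 / 0x200)) $((0x3000))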

You can do better than
Code:
ddrescue -b 512 -r 8 -f /dev/sdd /dev/sdb ddrescue.log


Code:
ddrescue -b 512 -r 16 --direct  --try-again --retrim -f /dev/sdd /dev/sdb ddrescue.log
may get back more data.
Don't forget to do all six faces/edges.

The idea is to restart the raid with this drive in place of the failing drive but not yet.
What metadata version is the raid set? You need to know that the metadata is recovered.
mdadm -E /dev/.... will tell.

What filesystem is on the raid set?

Once you add the recovered drive into the raid set, you have decided that further data recovery is not worthwhile.
The raid can't tell the data is corrupt due to the unrecovered data. It will just operate in degraded mode and assume all is well.
When you add another drive, it will regenerate the redundant data based on whatever is on the other drives at that time.

remix
PostPosted: Mon Apr 28, 2014 5:24 am

Thanks for the info, it makes sense to me.

The filesystem on the raid partitions is reiserfs.

The new log is long, so I pastie'd it here: http://pastie.org/9118746

remix
PostPosted: Mon Apr 28, 2014 6:14 am

I don't think I recovered enough; not sure what I did wrong (other than not having backups).

Code:
OptimusPrime / # mdadm --assemble /dev/md4 --scan --force
mdadm: /dev/sdd5 has no superblock - assembly aborted
OptimusPrime / # mdadm --assemble /dev/md5 --scan --force
mdadm: /dev/sdb6 has no superblock - assembly aborted
OptimusPrime / # mdadm --assemble /dev/md6 --scan --force
mdadm: /dev/sdb7 has no superblock - assembly aborted


I have 2 raid sets that seem to be OK:

Code:
OptimusPrime / # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md4 : inactive sda5[0](S) sdd5[3](S) sdc5[2](S)
      527373312 blocks

md1 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      102558528 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md2 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      776397120 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : inactive sda3[0](S) sdb3[4](S) sdd3[3](S) sdc3[2](S)
      859412736 blocks

unused devices: <none>



Should I just forfeit all the data in those 3 partitions and reformat?

NeddySeagoon
PostPosted: Mon Apr 28, 2014 9:47 pm

remix,

We are not done yet.

Can you remember the parameters to --create that you used when you created md3 and md4?
What do these show:
Code:
mdadm -E /dev/sd[abcd]3

Also
Code:
mdadm -E /dev/sd[abcd]5

How many elements are in each raid set?
How many have at least some data in now?

Read and understand RAID Recovery. In a nutshell, you can recreate the metadata.
You need to do it on a degraded array. It's best if you don't get the raid metadata version wrong, as metadata version 0.9 is at the end of each partition and the filesystem starts in the normal place, as if raid was not in use. With metadata version >=1, the metadata is at the start of the volume, where the filesystem superblock would be.
You can recover from getting it wrong but it's best that you don't need to.

The basic idea is to run a mdadm --create to rewrite the raid metadata in exactly the way you did when you first made the raid set but in degraded mode with the known clean option, so the raid is not resynced. That leaves your original data in place.

Were you able to recover any more data?

remix
PostPosted: Thu May 08, 2014 10:57 pm

I created the md devices using

Code:
mdadm --create --verbose --level=5 --raid-devices=4 /dev/md4 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5
mdadm --create --verbose --level=5 --raid-devices=4 /dev/md5 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6
mdadm --create --verbose --level=5 --raid-devices=4 /dev/md6 /dev/sdb7 /dev/sdc7 /dev/sdd7 /dev/sde7
...


From the output of mdadm -E /dev/sd[abcd]5, it looks like I'll need to restore the superblocks on /dev/sdb.
/dev/sdb is the drive that I restored the old failed drive onto.

When you wrote
Quote:
The basic idea is to run a mdadm --create to rewrite the raid metadata in exactly the way you did when you first made the raid set but in degraded mode with the known clean option, so the raid is not resynced. That leaves your original data in place.


Do you mean 'in degraded mode' by adding only 3 of the 4 disks?
I read the RAID Recovery guide and I didn't get how to perform what you asked.

NeddySeagoon
PostPosted: Fri May 09, 2014 11:44 am

remix,

There is no need to use --create on the raid sets that are now working. There is nothing to do to them if /proc/mdstat shows that they are up to strength.
If it's /dev/md4 that's the problem, I need the output of
Code:
mdadm -E /dev/sd[abcd]5
and I need to know what you think is in each /dev/sd?5.

Your command
Code:
mdadm --create --verbose --level=5 --raid-devices=4 /dev/md4 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5
makes use of a few mdadm defaults.
Like
--chunk= ... it's now 512k, it used to be 64k
--metadata= it's now 1.2, it used to be 0.90

The raid metadata that you need to create is a data structure that points to your data. Getting --chunk= incorrect is harmless; you can have as many goes as you want, but it must be correct to allow the kernel to read the filesystem on the raid. It tells how big the individual data elements are on the drive, so reading 512k at a time when it's actually 64k doesn't work.

The --metadata= is rather more important. It tells where the raid metadata is on the underlying block device. If it's wrong, it will either overwrite the end of your filesystem or the filesystem (not raid) superblock at the start.

I was considering --create in degraded mode possibly with --assume-clean ... depending what
Code:
mdadm -E /dev/sd[abcd]5
shows, and what you believe is on each partition, also with the explicit --chunk= and --metadata= values that
Code:
mdadm -E /dev/sd[abcd]5
will show.
It's also important to choose the 'best' 3 of the four elements from the raid set.


Code:
$ sudo mdadm -E /dev/sda5
Password:
/dev/sda5:
          Magic : a92b4efc
        Version : 0.90.00       <----
           UUID : 5e3cadd4:cfd2665d:96901ac7:6d8f5a5d
  Creation Time : Sat Apr 11 20:30:16 2009
     Raid Level : raid5
  Used Dev Size : 5253120 (5.01 GiB 5.38 GB)
     Array Size : 15759360 (15.03 GiB 16.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 126

    Update Time : Sun Mar 16 11:02:16 2014
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 78b17729 - correct
         Events : 77

         Layout : left-symmetric
     Chunk Size : 64K       <-----

I've highlighted my --chunk= and --metadata= above.

remix
PostPosted: Sun Jun 15, 2014 6:48 am

Chunk Size : 64K
Metadata : 0.90.00

Code:

# mdadm -E /dev/sd[abcd]5
/dev/sda5:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bfb60f39:66b601a2:fbf2ea5a:12bfd232 (local to host OptimusPrime)
  Creation Time : Mon Mar  8 22:09:01 2010
     Raid Level : raid5
  Used Dev Size : 175791104 (167.65 GiB 180.01 GB)
     Array Size : 527373312 (502.94 GiB 540.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4

    Update Time : Fri Apr 18 18:23:26 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 87adfde0 - correct
         Events : 17379

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        5        0      active sync   /dev/sda5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       0        0        1      faulty removed
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       0        0        3      faulty removed
mdadm: No md superblock detected on /dev/sdb5.
/dev/sdc5:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bfb60f39:66b601a2:fbf2ea5a:12bfd232 (local to host OptimusPrime)
  Creation Time : Mon Mar  8 22:09:01 2010
     Raid Level : raid5
  Used Dev Size : 175791104 (167.65 GiB 180.01 GB)
     Array Size : 527373312 (502.94 GiB 540.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4

    Update Time : Fri Apr 18 18:23:26 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 87adfe04 - correct
         Events : 17379

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       37        2      active sync   /dev/sdc5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       0        0        1      faulty removed
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       0        0        3      faulty removed
/dev/sdd5:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bfb60f39:66b601a2:fbf2ea5a:12bfd232 (local to host OptimusPrime)
  Creation Time : Mon Mar  8 22:09:01 2010
     Raid Level : raid5
  Used Dev Size : 175791104 (167.65 GiB 180.01 GB)
     Array Size : 527373312 (502.94 GiB 540.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4

    Update Time : Fri Apr 18 18:22:24 2014
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 87adb9e3 - correct
         Events : 17375

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       53        3      active sync   /dev/sdd5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       0        0        1      faulty removed
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       8       53        3      active sync   /dev/sdd5


NeddySeagoon
PostPosted: Sun Jun 15, 2014 12:02 pm

remix,

First of all, understand that something will be corrupt but we have no idea what.
As it stands, that raid set should assemble and run with the --force option; if not, we need to rewrite the raid metadata, which does nothing to the user data on the raid.
It's just like rewriting a partition table.
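
Something along these lines would be the first thing to try (a sketch only, using the three members that still have superblocks; md4 is sitting inactive in your /proc/mdstat, so stop it first):
Code:
mdadm --stop /dev/md4
mdadm --assemble --force /dev/md4 /dev/sda5 /dev/sdc5 /dev/sdd5
cat /proc/mdstat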

We must be sure to pass the chunk size and metadata version to mdadm --create, as 64k and 0.90 are no longer the defaults, and we want to recreate the raid metadata as it was, so your data reappears.

Code:
/dev/sda5:
    Update Time : Fri Apr 18 18:23:26 2014
         Events : 17379

/dev/sdc5:
    Update Time : Fri Apr 18 18:23:26 2014
          Events : 17379

/dev/sdd5:
      Update Time : Fri Apr 18 18:22:24 2014
         Events : 17375

Notice the update times and event counts: /dev/sdd5 is a few writes behind. They may be anything. Also, /dev/sdb5 is missing.

Before you go any further, understand that assembling the raid and mounting any filesystem it may contain are separate operations.
Getting the raid assembled is a prerequisite to reading the filesystem, but depending on what's damaged, there may be further steps to get at your data.

If all else fails ...
Code:
mdadm --create /dev/md4 --metadata=0.90 --raid-devices=4 --chunk=64 --level=raid5 --assume-clean /dev/sda5 missing  /dev/sdc5 /dev/sdd5

Before you do that, make sure you understand what it is trying to do.

After it completes
Code:
mdadm -E /dev/sd[abcd]5
should show
Code:
      Number   Major   Minor   RaidDevice State
   0     0       8        5        0      active sync   /dev/sda5
   1     1       0        0        1      missing
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       8       53        3      active sync   /dev/sdd5
It's important that the partitions are in the same slots.
The Event counts will all be zero and the raid should be assembled and running. Look in /proc/mdstat.

So far so good. The next step is to try to mount /dev/md4 read only and look around.
Code:
mount -o ro /dev/md4 /mnt/someplace
There are lots of reasons that can fail, and a few things to try to fix it.
Do not be tempted to run fsck. That makes guesses about what to do and often does the wrong thing.
You are not ready to allow any writes to the filesystem yet, even if it mounts.