Gentoo Forums
RAID array broken, can't boot

ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 3:58 pm

Hi Neddy,

from this
Code:
 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb1[1]
      97536 blocks [3/1] [_U_]
     
md124 : active raid1 sdc3[0] sda3[2] sdd3[1]
      9765504 blocks [3/3] [UUU]
     
md126 : inactive sdb4[3](S)
      965920692 blocks super 1.2
       
md127 : inactive sda4[2](S) sdc4[4](S) sdd4[3](S)
      2897762076 blocks super 1.2
       
md125 : active raid1 sdc1[0] sda1[2] sdd1[1]
      97536 blocks [3/3] [UUU]
     
unused devices: <none>
it looks like sdcN has always been set as Active device 0 on the arrays, so I think it's probably the same for what is shown here as md127 (where sdc4 is listed as device 4).

ddrescue has been running now for almost nine hours and has rescued 3 more kB. I think I'll probably call it quits with that in a bit.

What's the command to assemble the array? Do I need to name it? Given what /proc/mdstat says, what command do I use to assemble what's listed here as md127? Also, it's important to note that what's on the array is a bunch of logical partitions which themselves will have to be mounted. How can I do that from the liveCD?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 4:26 pm

ExecutorElassus,

Here's one of my raid sets ... just one drive.

Code:
# mdadm -E /dev/sd[abcd]5
/dev/sda5:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5e3cadd4:cfd2665d:96901ac7:6d8f5a5d
  Creation Time : Sat Apr 11 20:30:16 2009
     Raid Level : raid5
  Used Dev Size : 5253120 (5.01 GiB 5.38 GB)
     Array Size : 15759360 (15.03 GiB 16.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 126

    Update Time : Sat Jun 16 17:20:52 2018
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 80b12c93 - correct
         Events : 108

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        5        0      active sync   /dev/sda5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       8       21        1      active sync   /dev/sdb5
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       8       53        3      active sync   /dev/sdd5


So it appears that the role numbering starts at 0. [url=https://raid.wiki.kernel.org/index.php/Mdadm-faq]This page[/url] supports that.

From the man page.

Code:
ASSEMBLE MODE
       Usage: mdadm --assemble md-device options-and-component-devices...


Slot 0 is missing so
Code:
mdadm --assemble /dev/md1 --readonly missing /dev/sdb4 /dev/sda4
should bring up /dev/sdb4 /dev/sda4 as a degraded raid set on /dev/md1
You may need to tell it to --run if it assembles but does not start.
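For reference, the --run step would be something like this (a sketch only, using the same md name as above):
Code:
mdadm --run /dev/md1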
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 4:35 pm

Hi Neddy,
here's what I tried:
Code:
 % mdadm --assemble /dev/md1 --readonly missing /dev/sdb4 /dev/sda4
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
root@sysresccd /root % mdadm --assemble /dev/md1 --readonly /dev/sdb4 /dev/sda4
mdadm: /dev/sdb4 is busy - skipping
mdadm: /dev/sda4 is busy - skipping
Then I stopped the arrays that were assembled but inactive. Then:
Code:
 % mdadm --assemble /dev/md127 --readonly /dev/sdb4 /dev/sda4
mdadm: /dev/md127 assembled from 1 drive - not enough to start the array.

Now 'cat /proc/mdstat' shows:
Code:
cat /proc/mdstat                                     
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb4[3](S) sda4[2](S)
      1931841384 blocks super 1.2
       
md124 : active raid1 sdc3[0] sda3[2] sdd3[1]
      9765504 blocks [3/3] [UUU]
     
md125 : active raid1 sdc1[0] sda1[2] sdd1[1]
      97536 blocks [3/3] [UUU]
     
unused devices: <none>


What now?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 4:40 pm

ExecutorElassus,

Don't use /dev/md127. It may be in config files and it's certainly in the raid metadata as the preferred minor number, so choose a new md number.
I hadn't thought to mention stopping the already-assembled but inactive arrays. Doing that was correct.
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 4:47 pm

Hi Neddy,

So now:
Code:
 % mdadm --stop /dev/md127
mdadm: stopped /dev/md127
root@sysresccd /root % mdadm --assemble /dev/md2 --readonly /dev/sdb4 /dev/sda4
mdadm: /dev/md2 assembled from 1 drive - not enough to start the array.
root@sysresccd /root % cat /proc/mdstat                                       
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb4[3](S) sda4[2](S)
      1931841384 blocks super 1.2
       
md124 : active raid1 sdc3[0] sda3[2] sdd3[1]
      9765504 blocks [3/3] [UUU]
     
md125 : active raid1 sdc1[0] sda1[2] sdd1[1]
      97536 blocks [3/3] [UUU]
     
unused devices: <none>
it still won't assemble, and mdadm is apparently renaming the array back to its preferred minor (md127).

What should I try next?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 5:02 pm

ExecutorElassus,

Code:
md127 : inactive sdb4[3](S) sda4[2](S)
      1931841384 blocks super 1.2


Try adding in --force
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 5:09 pm

Hi Neddy,

now we're here:

Code:
% mdadm --assemble /dev/md2 --readonly --force /dev/sdb4 /dev/sda4
mdadm: forcing event count in /dev/sdb4(1) from 1343686 upto 1347561
mdadm: /dev/md2 has been started with 2 drives (out of 3).
root@sysresccd /root % cat /proc/mdstat                                               
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active (read-only) raid5 sdb4[3] sda4[2]
      1931840512 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
     
md124 : active raid1 sdc3[0] sda3[2] sdd3[1]
      9765504 blocks [3/3] [UUU]
     
md125 : active raid1 sdc1[0] sda1[2] sdd1[1]
      97536 blocks [3/3] [UUU]
     
unused devices: <none>

So, should I mount it and look around?

EDIT: right, as I said, this raid array is an LVM physical volume. So here's what I get:

Code:
 % mkdir raidtest
root@sysresccd /root % mount /dev/md2 raidtest/
mount: /root/raidtest: unknown filesystem type 'LVM2_member'.

Is there a way to start the LVM on a read-only array?

Thanks for the help,

EE
PS, here's the info on the array:
Code:
% mdadm -D /dev/md2
/dev/md2:
           Version : 1.2
     Creation Time : Wed Apr 11 00:10:50 2012
        Raid Level : raid5
        Array Size : 1931840512 (1842.35 GiB 1978.20 GB)
     Used Dev Size : 965920256 (921.17 GiB 989.10 GB)
      Raid Devices : 3
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sat Feb  9 10:49:30 2019
             State : clean, degraded
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : resync

              Name : domo-kun:carrier
              UUID : d42e5336:b75b0144:a502f2a0:178afc11
            Events : 1347561

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       3       8       20        1      active sync   /dev/sdb4
       2       8        4        2      active sync   /dev/sda4
and /proc/mdstat:
Code:
 % cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active (read-only) raid5 sdb4[3] sda4[2]
      1931840512 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
     
md124 : active raid1 sdc3[0] sda3[2] sdd3[1]
      9765504 blocks [3/3] [UUU]
     
md125 : active raid1 sdc1[0] sda1[2] sdd1[1]
      97536 blocks [3/3] [UUU]
     
unused devices: <none>
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 5:29 pm

ExecutorElassus,

Looks promising, but it's early days. Those 4000 writes bother me.

Code:
vgchange -ay
should try to start all logical volumes. That can take a long time if lvmetad is not running, as it will search every block device in /dev for volume groups to start.
You are breaking new ground here. I've not tried to start a volume group on a read-only raid.
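A minimal check sequence, as a sketch (assuming the volume group comes up under its old name):
Code:
pvscan          # confirm /dev/md2 is seen as an LVM physical volume
vgchange -ay    # activate every volume group that was found
lvs             # list the logical volumes that are now active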
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 5:31 pm

Hi Neddy,

Code:
 % vgchange -ay
  17 logical volume(s) in volume group "vg" now active

it completed immediately.

now?

EDIT: Here's what I see now:

Code:
% vgchange -ay
  17 logical volume(s) in volume group "vg" now active
root@sysresccd /root % ls /mnt
backup  custom  floppy  gentoo  windows
root@sysresccd /root % ls /mnt/gentoo
root@sysresccd /root % cd /dev/vg
root@sysresccd /dev/vg % ls
carrier1  carrier2  carrier3  carrier4  carrier5  carrier6  carrier7  carrier8  carrier9  distfiles  home  opt  portage  tmp  usr  var  vartmp

Those are all links to dm-N files, which I assume to be the logical volumes.

UPDATE:
I've tried mounting a few of these. Here's what I get:
Code:

 % mkdir vgroup
root@sysresccd /root % mount /dev/vg/carrier1 vgroup
 % umount vgroup
root@sysresccd /root % mount /dev/vg/usr vgroup     
mount: /root/vgroup: can't read superblock on /dev/mapper/vg-usr.
root@sysresccd /root % mount /dev/vg/var vgroup
mount: /root/vgroup: can't read superblock on /dev/mapper/vg-var.
root@sysresccd /root % mount /dev/vg/opt vgroup
mount: /root/vgroup: can't read superblock on /dev/mapper/vg-opt.
root@sysresccd /root % mount /dev/vg/portage vgroup
 % umount vgroup
root@sysresccd /root % mount /dev/vg/home vgroup   
mount: /root/vgroup: can't read superblock on /dev/mapper/vg-home.
root@sysresccd /root % mount /dev/vg/distfiles vgroup
root@sysresccd /root % umount vgroup                 
root@sysresccd /root % mount /dev/vg/portage vgroup 
root@sysresccd /root % umount vgroup               
root@sysresccd /root % mount /dev/vg/vartmp vgroup
root@sysresccd /root % umount vgroup             

So some of them it can't mount. Is there a way to fix that? Google says I could try using dumpe2fs to find backup superblocks, then run fsck on the partition, but that would require write access, yes?

What next?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 5:57 pm

ExecutorElassus,

Code:
/dev/vg % ls
carrier1  carrier2  carrier3  carrier4  carrier5  carrier6  carrier7  carrier8  carrier9  distfiles  home  opt  portage  tmp  usr  var  vartmp


Those are all filesystems. You don't care about tmp and vartmp. You may be emotionally attached to distfiles. I am; I have all my distfiles since April 2009, when this box was new. Even so, distfiles can be re-downloaded, so that's expendable.
opt should only be binaries, so that can be recreated too.

System Rescue CD comes with /mnt/gentoo.
Code:
cd /mnt/gentoo
mkdir carrier1  carrier2  carrier3  carrier4  carrier5  carrier6  carrier7  carrier8  carrier9  distfiles  home  opt  portage  tmp  usr  var  vartmp
mount -o ro /dev/vg/carrier1  ./carrier1
...

and look at files. Ignore expendable filesystems if you want.

We know that the LVM metadata is OK.
If all the mounts work, some of the filesystem metadata is OK too.

We can map logical volumes to the array and to the underlying HDDs too and see which filesystems are damaged.
Put the output of
Code:
/sbin/lvdisplay -am
onto a pastebin site.
Put your ddrescue log onto a pastebin site too.
It's not difficult to map the holes in the log to the allocated segments in your logical volumes.

From one of mine ...
Code:
 LV Path                /dev/vg/usr
...
  --- Segments ---
  Logical extents 0 to 5119:
    Type      linear
    Physical volume   /dev/md127
    Physical extents   0 to 5119
   
  Logical extents 5120 to 10239:
    Type      linear
    Physical volume   /dev/md127
    Physical extents   315136 to 320255
A physical extent is a 4 MiB block by default.
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 6:17 pm

Here's the pastebin for the attempted mounts and the lvdisplay: https://pastebin.com/MfzUpiKj
and here's the ddrescue.map file: https://pastebin.com/izG41dN5

Would running fsck on the volumes using a backup superblock allow them to be fixed and then mounted? So far, I don't see any glaring errors (but the stuff I care about is thousands of files in hundreds of subdirectories, so I doubt I'd ever find them all).

How does it look? Are we making progress? Should I, at some point, switch to using /dev/sdd4, as it is the non-broken drive?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 6:52 pm

ExecutorElassus,

Code:
root@sysresccd /mnt/gentoo % mount -o ro /dev/vg/home  ./home   
mount: /mnt/gentoo/home: can't read superblock on /dev/mapper/vg-home


Where we go from there depends on the filesystem. fsck is a last resort. In the face of missing metadata, it guesses.
It tries to make the metadata self-consistent without regard to user data, and often makes a bad situation worse.

What filesystem is home? extX keeps backup superblock copies which mount can use if you tell it to.
Also, how big is home? That will be in your pastebin.
Code:
 LV Path                /dev/vg/home
  Segments               1
  --- Segments ---
  Logical extents 0 to 15359:
    Type        linear
    Physical volume /dev/md2
    Physical extents    9472 to 24831
That's near the front of the raid set.

From the map file
Code:
#      pos        size  status
0x00000000  0xAAA4A18800  +
has been recovered.
That's 732,906,489,856 B, or 732 salesman GB, recovered on one member. A three-disk raid5 holds two members' worth of data, so the first ~1.4TB of the raid should be good.
home ends at physical extent 24831, which is about 104 GB (24832 * 4 MiB) into the raid.
So home is in the recovered region.
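As a sketch of how a map offset translates into a physical extent number (assuming the default 4 MiB extents and ignoring the md data offset):
Code:
offset=$(( 0xAAA4A18800 ))                    # first hole in the ddrescue map, in bytes on one member
echo $(( offset * 2 / (4 * 1024 * 1024) ))    # x2 because two members hold data; prints roughly 349477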

There are two potential issues.
1) those 4000 writes mean sd[ab]4 are out of sync.
2) the old sdb is damaged in that region, but ddrescue has recovered it on sdb-new4.

Look around all the carrierX and see what you can see. Are the files good?
It's progress; we should be able to repeat this with sdd4 in place of sdb4 and maybe get more.

There are two approaches now.
Take the raid down, swap sdb4 for sdd4, and see if it looks better.
Try to mount home with an alternate superblock.

The first backup superblock is at 131072 on home.
Try
Code:
mount -o ro,sb=131072 /dev/vg/home  ./home
and read
Code:
man ext4


That's harmless if I've got the number wrong, so you can try adding in sb=131072 to the other failed mounts too.
There are more backup superblocks too, but that one is in the man page, so it's easy to find.
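To list the other backup superblocks, something along these lines should work (a sketch; dumpe2fs is part of e2fsprogs and the device name is just an example):
Code:
dumpe2fs /dev/vg/home | grep -i superblock
# mount's sb= option counts in 1k units, so a backup at 4k block 32768 becomes sb=131072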
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 7:00 pm

I think at this point I'd like to start trying to work with sdd4, both in case I end up making writes and in case there's more data recovered there.

How do I shut down the active volume groups?
UPDATE: never mind, I figured out how to use 'vgchange -an' to shut it down. Now I've stopped /dev/md2, restarted it with /dev/sdd4 in place of /dev/sdb4, and activated the volume groups. I'll update in a sec when I try your last suggestions.
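For the record, that sequence was roughly the following, reconstructed from the commands used earlier in the thread (a sketch, not a log):
Code:
vgchange -an vg                                                   # deactivate the logical volumes
mdadm --stop /dev/md2                                             # stop the degraded array
mdadm --assemble /dev/md2 --readonly --force /dev/sdd4 /dev/sda4  # reassemble with sdd4 in place of sdb4
vgchange -ay                                                      # reactivate the volume group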

Thanks for the help,

EE
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 7:16 pm

Hi Neddy,

Using backup superblocks, I managed to mount all the remaining partitions except /dev/vg/portage. For that one I found a different backup superblock using dumpe2fs | grep superblock, tried it, and it mounted as well.

So all the partitions mount, and a cursory look inside shows them all having the contents they should. (It's worth noting, btw, that my first signal a couple of weeks ago that a drive was failing was that portage kept failing emerge --sync with permissions and other problems, so I think this was the partition where a lot of the bad blocks accumulated.)

Code:
 % mount                                                           
udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=4098759,mode=755)
/dev/sr0 on /livemnt/boot type iso9660 (ro,relatime,nojoliet,check=s,map=n,blocksize=2048,fmode=644)
/dev/loop0 on /livemnt/squashfs type squashfs (ro,relatime)
tmpfs on /livemnt/memory type tmpfs (rw,relatime)
none on / type aufs (rw,noatime,si=1cd62804c614d2a2)
tmpfs on /livemnt/tftpmem type tmpfs (rw,relatime,size=524288k)
none on /tftpboot type aufs (rw,relatime,si=1cd62804c6818aa2)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run type tmpfs (rw,nodev,relatime,size=3283428k,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /tmp type tmpfs (rw,relatime)
/dev/sdg1 on /root/usb type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
/dev/mapper/vg-carrier1 on /mnt/gentoo/carrier1 type ext3 (ro,relatime,stripe=256,data=ordered)
/dev/mapper/vg-var on /mnt/gentoo/var type ext3 (ro,relatime,sb=131072,stripe=256,data=ordered)
/dev/mapper/vg-usr on /mnt/gentoo/usr type ext3 (ro,relatime,sb=131072,stripe=256,data=ordered)
/dev/mapper/vg-home on /mnt/gentoo/home type ext3 (ro,relatime,sb=131072,stripe=256,data=ordered)
/dev/mapper/vg-opt on /mnt/gentoo/opt type ext3 (ro,relatime,sb=131072,stripe=256,data=ordered)
/dev/mapper/vg-portage on /mnt/gentoo/portage type ext2 (ro,relatime,sb=24577,errors=continue,user_xattr,acl)
So portage is ext2 (not sure why, but this may be part of the problem; might be worth reformatting to ext3), and it looks like all the rest are ext3.

What's the next step?

thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 9:41 pm

ExecutorElassus,

We *know* that sdd4 has holes in it. It's a question of where, and what is affected.
Just because things mount does not mean that the file contents are correct.

When you read a raid5, two out of the three (in your case) drives are used to decode the data.
You need both parts. With sdb4, you would get read errors when you hit a failed block.
With sdd4, it will silently return rubbish.
That rubbish may be file content, directory content, or, now unlikely, key filesystem metadata.

While you are using mdadm --readonly you won't do any damage to your data.

If you put the raid together with sd[ac] you may get a different subset of logical volumes that work.

You said that sdb-new is 2GB?
If that's correct, it will hold all the data from the raid set. Hold that thought.
It may be that different pairings of drives in md2 give you correct access to different LVM filesystems. If that's true, then in-place data recovery may not be possible, but you can copy all the files to sdb-new.

That's a bit simplistic. If sdd4 is made to fill the remaining space on sdd (it may be like that anyway), then it can be used to hold all the data from md2 while its other partitions are members of the other raid sets. There is a big difference between reading the data and reading correct data. The only way you can verify the data is correct is by examining it.
Like I've already said, some filesystems are expendable. Don't bother recovering them.

Looking through your logical volume to HDD map
Code:
 LV Path                /dev/vg/usr
  LV Size                20.00 GiB
  --- Segments ---
  Logical extents 0 to 5119:
    Type        linear
    Physical volume /dev/md2
    Physical extents    0 to 5119

That's correctly recovered to sdb-new, as it's all before the first read error at about 349477 physical extents into the volume group / raid.

Code:
  LV Path                /dev/vg/portage
  LV Size                3.00 GiB
  --- Segments ---
  Logical extents 0 to 511:
    Type        linear
    Physical volume /dev/md2
    Physical extents    5120 to 5631
That follows immediately after /dev/vg/usr in the physical space, so its copy is good too.
Don't spend any time on recovery. It's just a portage snapshot, or even an emerge --sync away.

Code:
  LV Path                /dev/vg/distfiles
  LV Size                15.00 GiB
  --- Segments ---
  Logical extents 0 to 3839:
    Type        linear
    Physical volume /dev/md2
    Physical extents    5632 to 9471

The same rationale as for /dev/vg/portage applies. We are only 38G down the raid, so the ddrescue copy is good.

Code:
  LV Path                /dev/vg/home
  LV Size                60.00 GiB
  --- Segments ---
  Logical extents 0 to 15359:
    Type        linear
    Physical volume /dev/md2
    Physical extents    9472 to 24831

That's 98G down the raid, but it wouldn't mount. There are no holes in the copy, so it must be the different event counts causing the mount issue.

Code:
  LV Path                /dev/vg/opt
  LV Size                4.00 GiB

  --- Segments ---
  Logical extents 0 to 1023:
    Type        linear
    Physical volume /dev/md2
    Physical extents    24832 to 25855

102G from the start. opt should only be binary installs. Don't bother recovering it. Look and see what's there and re-emerge those packages.

Code:
  LV Path                /dev/vg/var
  LV Size                4.00 GiB
  --- Segments ---
  Logical extents 0 to 1023:
    Type        linear
    Physical volume /dev/md2
    Physical extents    25856 to 26879

Only 106G of raid used so far.
You need some files here. It's how portage knows what's installed. /var/lib/portage/world is essential. /var/db/pkg/* is what tells portage exactly what is installed.

Code:
  LV Path                /dev/vg/tmp
  LV Size                4.00 GiB
  --- Segments ---
  Logical extents 0 to 1023:
    Type        linear
    Physical volume /dev/md2
    Physical extents    26880 to 27903

110G used. Throw this filesystem away. It's cleared every boot anyway.

Code:
  LV Path                /dev/vg/vartmp
  LV Size                15.00 GiB
  --- Segments ---
  Logical extents 0 to 2559:
    Type        linear
    Physical volume /dev/md2
    Physical extents    27904 to 30463
   
  Logical extents 2560 to 3839:
    Type        linear
    Physical volume /dev/md2
    Physical extents    465664 to 466943

This is the first logical volume that's been extended. That means we need to do the arithmetic on the actual locations instead of just adding up the sizes.
The first part is OK, as it lies before 349477 physical extents into the volume group. The second part is harder.
If that's portage's build space, throw it away and make a new filesystem.
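A sketch of that check, assuming 4 MiB extents: the second vartmp segment starts at physical extent 465664, well past the ~349477 extents that are known good.
Code:
echo $(( 465664 * 4 * 1024 * 1024 ))   # 1953136377856 B, about 1.95 TB into the raid, beyond the recovered region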

Code:
  LV Path                /dev/vg/carrier1
  LV Size                100.00 GiB
  --- Segments ---
  Logical extents 0 to 25599:
    Type        linear
    Physical volume /dev/md2
    Physical extents    30464 to 56063


The first sdb4 read error falls in here. Then there is a big rash of errors close together.

-- edit --

The data for carrier1 and onwards is damaged on sdb4, and therefore on sdd4 too. You can keep going with ddrescue to try to recover more data, or try to bring up the raid with sd[ac]4 and look around.
There are still the recovery and event-count issues there.

If you are really, really lucky, the sdb4 damage is in unused areas of the drive, so while ddrescue can't copy it, a filesystem-level read might work.
That means no recovery in place, though.
ExecutorElassus (Veteran)
Posted: Mon Feb 11, 2019 9:55 pm

Hi Neddy,

yes, the new HDD is 2TB (not GB, as you typed in your last message). That means, theoretically, that I could copy all of one of the other drives onto the extra space if needed.

So there is a big block of bad data on vg-carrier1, and most everything that comes before it is expendable (I'd like to keep /usr, just because I have a lot of fonts and stuff under /usr/share, and a bunch of custom ebuilds in /usr/local, but those aren't priorities). vg-carrier1 is more concerning: that's where all of my work files are (documents, papers, articles, invoices, etc.), so I need to do a more thorough investigation of what's there.

Using the --force flag seems to reset the event count to the highest number, so I'm not sure if that is going to cause problems.

But what should I do next? I'm not sure how to look more deeply into carrier1 without being able to load the files in some sort of GUI, but is there some other way I can try to recover the data?

Thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Mon Feb 11, 2019 10:55 pm

ExecutorElassus,

Put the raid together with sd[ab]4 still --readonly
mount carrier1 somewhere and try to copy the files out.
cp -a will do.
It will fail at the first read error. If you are lucky, there won't be a read error.

You can try with sd[ac]4 too but sdc4 was in the middle of a rebuild.

As long as you always use --readonly everywhere, the data on the raid will not change and it is what it is.
The metadata may change, but we have the info in this thread to run a --create if needed; that won't sync your raids though.

Warning: Even if the copy works, some files can be corrupt because of the difference in event counts.
The out-of-sync data is not detectable.

Try not to use sdb-new as a destination, unless you make a new partition off the end of sdd4, so the raid image is preserved.
You need 100G, or less, depending on how full carrier1 is.

Someone with some script-fu could write a script to recursively list all the files in carrier1, then copy them one at a time, listing the ones that failed.
That's beyond my bash skills though.
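A rough, untested sketch of such a script, with example paths only:
Code:
#!/bin/bash
# copy every regular file one at a time; log the names of any that fail
src=/mnt/gentoo/carrier1        # example source mount point
dst=/root/recovered/carrier1    # example destination, must already exist
cd "$src" || exit 1
find . -type f -print0 |
while IFS= read -r -d '' f; do
    cp --parents -a "$f" "$dst"/ 2>/dev/null || printf '%s\n' "$f" >> /root/carrier1-failed.txt
done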

-- edit --

It's lots of small blocks not recovered, rather than one big block.

Code:
      pos        size  status
0x00000000  0xAAA4A18800  +  recovered
0xAAA4A18800  0x00000400  -  2 sectors
0xAAA4A18C00  0x000BD200  +  recovered
0xAAA4AD5E00  0x00000200  -  1 sector
0xAAA4AD6000  0x00003400  +  recovered
0xAAA4AD9400  0x00000200  -  1 sector
...

Hu (Moderator)
Posted: Tue Feb 12, 2019 2:56 am

The quick way to copy files out would be rsync. I have had some success syncing files off dying drives before.
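A minimal rsync sketch along those lines, reusing the placeholder names from the commands below (not an exact recipe; check the man page for what -a and -H imply):
Code:
rsync -aH --info=progress2 "$carrier_mount_point"/ "$recovery_directory"/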

If that doesn't work for you, you could try cd "$carrier_mount_point" && find . '(' -type f -o -type d ')' other-restrictions -print0 > "$TMPDIR/files-to-save.txt" to save a list of files. You will likely need to preserve directory structure when copying them, which makes the copy side a bit harder. You could try tar --no-recursion -C "$carrier_mount_point" --null -T "$TMPDIR/files-to-save.txt" -c -f - | tar -C "$recovery_directory" -x -f - -k to copy them out with tar. If that also fails (and it might, if read errors come back and tar aborts on error), fall back to cd "$carrier_mount_point" && while read -d '' filename; do cp --parents -a "$filename" "$recovery_directory"; done < "$TMPDIR/files-to-save.txt".

Note that this last method will preserve the directory hierarchy, but not directory permissions / ownership. If you need those, you could try to pre-create the hierarchy: cd "$carrier_mount_point" && find . -type f other-restrictions -printf '%h\0' | sort -z | uniq -z > "$TMPDIR/dirs-to-save.txt" && tar -C "$carrier_mount_point" --null -T "$TMPDIR/dirs-to-save.txt" --no-recursion -c -f - | tar -C "$recovery_directory" -x -f -. If this step fails, you are unable to read back some of your directory entries. That would be very unfortunate, as it means some files may be unreachable, even if their contents are intact.

Where I wrote other-restrictions, you could plug in any find predicates to exclude files you don't want to save (old enough that you can restore them from backup; derived files you can recreate from other salvaged files; etc.). As much as practical, you want to minimize recovering files that you can get elsewhere.

Regarding /usr, I would say copy /usr/local, but plan on reinstalling the relevant packages to rebuild /usr/share. If you can save it after you've saved all your irreplaceable contents, go ahead and try. Just prioritize the things that are most difficult to replace.
ExecutorElassus (Veteran)
Posted: Tue Feb 12, 2019 8:38 am

Hi Neddy,

conveniently, I have two SSD drives, 250GB each, that I was planning to use as a RAID1 and migrate all my system partitions (everything up to /home, but not the carrier partitions). I never got around to it, so I have some extra space.

But I have a meeting to attend today, so I'll have to get to this when I get back in the evening.

The only thing I did when sdc4 was rebuilding was start the WM. None of the carrier partitions was mounted. So hopefully being out of sync won't affect too much.

What's a good filesystem format for an SSD? My third one is formatted with f2fs, but this liveCD apparently doesn't have that. Hu, how do I use rsync to copy everything?

Also, conveniently, carrier1 is only about 30% full, so it's quite possible the bad sectors don't even have data on them.

It would be nice if there were a way to use the rescue mapfile and have /dev/sd[ac]4 used to rebuild *only those sectors*, on the off chance that those specific sectors might be intact on /dev/sd[ac]4.

Anyway, when I get back home I'll format the SSD and see about copying carrier1 from sd[ab]4.

thanks for the help,

EE
PS it turns out that carrier1 mounts without issue when mounted from sd[ab]4. I'm not sure why that is.
NeddySeagoon (Administrator)
Posted: Tue Feb 12, 2019 10:28 am

ExecutorElassus,

1. You don't know the sync status of sd[ac]4 so attempting to recover individual blocks would be risky.
mdadm reads/writes whole chunks, 512k here, so a chunk is all or nothing. One missing/unreadable sector costs you a whole chunk.

On top of that, your filesystems use 4kB blocks, so 4kB is the least you can lose at the filesystem level, even if the drive has 512B sectors.

You have a filesystem with 4k blocks on top of a raid with 512k chunks on top of a HDD (which you are trying to rescue) with 512B sectors.
The holes in your recovered data are bigger than you think but on the bright side, one raid chunk spans several unreadable sectors.
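The size relationships being described, as a quick sanity check (a sketch):
Code:
echo $(( 512 * 1024 / 512 ))    # 512 B sectors per 512k raid chunk -> 1024
echo $(( 512 * 1024 / 4096 ))   # 4 kB filesystem blocks per 512k chunk -> 128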

sdc4 has lost its active slot. I suspect we will need to run a --create to rewrite that before you can assemble it.

If you really think the raid is clean ... and we know it's not ... it's possible to bring it up with all three drives --readonly, then try file copies.
Reads to sdb4 will fail from time to time, but the data will be fetched from the other drives, so they should succeed.
Again, that does not mean the recovered data is what it's supposed to be.
With all three drives in the raid set, we don't have any control, or knowledge, of which drives are being read.
That will be important: if sd[ab]4 gives you rubbish or fails, we can try the same files from sd[ac]4, which may give a different answer.

That trick of reading the missing sectors from the 'good' drives is what mdadm --replace would have done.
It would have duplicated sdb4 using data from all three drives.
That's what you really wanted to do at the outset, but hindsight gives everyone 20/20 vision.

-- edit --

See your PM too.
ExecutorElassus (Veteran)
Posted: Tue Feb 12, 2019 5:32 pm

Hi Neddy,

all right. I've reassembled sd[ab]4 and activated the VGs inside. Once I have the SSD formatted (I guess with ext3?) and I copy over all of carrier1, what should I do with it? Assuming there are no copy errors, what next? I can't check file integrity without the programs to open the files, but is there something else I should do?

What other steps should I take?

Thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Tue Feb 12, 2019 6:03 pm

ExecutorElassus,

That's it. Copy over the other things too.
Use rsync as Hu suggested rather than cp -a.

You could do a second copy based on sd[ac]4 and compare the copies. The differences are just that.
You still have to look at the files that differ to see which is correct, if any.
Even where the files compare correctly, you only know that they are the same, not correct.

You can defer the checking. The data is what it is, you can go back to ddrescue and attempt to fill more holes and get back more data.

The drawback with using sdd4 is that the missing data will still read. The drive will return whatever happens to be there.
If you thought that was useful, you can do it.
e.g. if sdb4 fails to read a directory, it will all be missing.
However, if sdd4 has part recovered that directory, you may be able to use the part recovered directory to salvage the files that are there.

It's OK to run ddrescue overnight too, if you want. Set up lots of retries and let it run.
Next night, do it again with the drive in a different position.

I wouldn't rule out a --create on all three drives as a last resort, to see what happens then, but I've given up on trying to salvage the data in place.

Hold that for a moment ...

You could copy off
/dev/mapper/vg-var
/dev/mapper/vg-usr
/dev/mapper/vg-home
/dev/mapper/vg-opt
onto the SSD and make new filesystems there for tmp, vartmp, distfiles and portage.

It all fits with over 100G to spare. Fix /etc/fstab to point to the SSD and try to bring the box up normally (with the SSD in place of the raid).
That 100G will give you space for /dev/mapper/vg-carrier1 too.

You would then have a working Gentoo to use to recover the data from the raid.
ExecutorElassus (Veteran)
Posted: Tue Feb 12, 2019 6:36 pm

Hi Neddy,

this last suggestion is actually the process I think I asked about over a year ago, when I first got the SSDs, but chickened out of doing.

So, here's what I think I would do. Please correct me if I'm wrong:

Format the SSD as one single partition. I can't use f2fs, apparently. What's a good format?
rsync all of sd[adc]3 (which holds / and all the rest of the system directories like /etc and is clean). How do I do this?
then, rsync each of the other system partitions that live on sd[ab]4 (that is, /usr, /portage, /distfiles, /opt, and /home, all of which I have reason to believe are clean) to their respective directories on the SSD. How do I do this?
edit /etc/fstab to point to this SSD as /. But I boot from an initramfs (yours, incidentally). How do I edit this so it boots from the SSD and not the RAID array? Is that something I just edit in grub.cfg?

As you say, this should get me a bootable system using only the SSD. Is there a way to turn this into a RAID1 array later while there's still data on it?

Can you please walk me through how to do this? This all looks risky and above my level of skill.

Thanks for the help,

EE
NeddySeagoon (Administrator)
Posted: Tue Feb 12, 2019 8:03 pm

ExecutorElassus,

Use ext4 on the SSD. You may want to feed it options to control the number of i-nodes and/or turn off the journal on expendable partitions.

For the portage tree, one inode per filesystem block is good or you will run out of i-nodes.
Due to having a senior moment, my portage tree is on a filesystem with 1k blocks on an SSD with 4k physical blocks. Don't do that.
Make portage one i-node per block. 4G should be enough.
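As a sketch of what that means at mkfs time (the LV name here is hypothetical; check mke2fs(8) before running anything):
Code:
mkfs.ext4 -b 4096 -i 4096 /dev/vg_ssd/portage   # one inode per 4k block; adjust the LV name to your own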

You are going to mount all these filesystems separately, like you do now.
Either use LVM, so you can move space around (growing is trivial, shrinking is hard), or make separate partitions.
As you have used LVM, that will offer the best use of space.

Make two, possibly three, partitions on the SSD: /boot, root, and everything else.
Combine /boot and root if that's what you do now.
Make everything else LVM, as that's what you have now.

Boot System Rescue CD and attach your old Gentoo to /mnt/gentoo, but use the read-only option when mounting all the old filesystems.
Bring up the sd[ab]4 raid as you have been doing and attach its filesystems in the right places under /mnt/gentoo. Don't forget the read-only option.

Make a new mountpoint /mnt/SSD

Attach all the empty SSD filesystems here. After you mount the SSD_root, you will need to mkdir all the lower level mount points.
Don't forget that /tmp will need its permissions adjusted.

Time for a review before you do something you can't undo.

Your old Gentoo is attached at /mnt/gentoo and it's all read-only. Do check.
Your empty SSD has its filesystem tree attached at /mnt/SSD and it's read/write.
This read-only / read-write split stops you destroying your existing install by messing up the rsync.

I'm not a habitual rsync user, so I can't give you the exact command. I'll refer you to Hu's post.

-- edit --

Once the copy completes, chroot into /mnt/SSD
You will need /proc, /dev and /sys
Fix /etc/fstab
Fix /boot/grub/grub.cfg
Reinstall grub, as it uses space outside any filesystem, so that's not been copied.
Reboot, but go into the BIOS and choose to boot from the SSD.
Boot. It should come up on the SSD only; your good raid sets will be running but not mounted.
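A sketch of those chroot steps in command form, following the usual handbook pattern (the mount points are as above; /dev/sde and the grub2 commands are assumptions, adjust to your setup):
Code:
mount -t proc proc /mnt/SSD/proc
mount --rbind /dev /mnt/SSD/dev
mount --rbind /sys /mnt/SSD/sys
chroot /mnt/SSD /bin/bash
nano /etc/fstab                        # point /, /boot and the LVM mounts at the SSD
grub-install /dev/sde                  # assumed SSD device node
grub-mkconfig -o /boot/grub/grub.cfg   # regenerates /boot/grub/grub.cfg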

-- edit --
You want to recursively rsync from /mnt/gentoo to /mnt/SSD

I use
Code:
rsync -avHtr /source/ /destination/
The trailing slashes are important. My usual command sets up an ssh tunnel to copy over; I think I removed that part.
I don't recall exactly what the options do, but they were arrived at by trial and error and reading the man page.
Don't just use that until you have checked what it will do.
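One way to check first, per the caution above, is a dry run (a sketch; -n only reports what would be copied):
Code:
rsync -avHn /mnt/gentoo/ /mnt/SSD/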
ExecutorElassus (Veteran)
Posted: Tue Feb 12, 2019 8:35 pm

Hi Neddy,

Right now my drives have four partitions: /boot (raid1 on sd[adc]1), <swap> on sd[adc]2, / as raid1 on sd[adc]3, and the LVM holding /usr, /opt, /var, /tmp, /var/tmp, /var/portage, /var/portage/distfiles, and /home, along with the nine /carrierN partitions, as degraded raid5 on sd[ab]4.

So, what I would do is create three partitions on the SSD (which is /dev/sde right now), holding /boot, /, and the LVM for /usr, /var, /tmp, /var/tmp, /var/portage, /var/portage/distfiles, and /home. I would leave the nine carrier partitions on the raid5 array and try to recover them once the rest of the system boots.

Right now, I have /dev/md124 (active with sd[adc]3) mounted at /mnt/gentoo. At this point, once I partition and format the SSD, I should be able to just rsync everything over, yes?

EDIT: I could, theoretically, plug the other SSD back in (I unplugged it to use the cables for sdb), and put RAID1 arrays on all three partitions from the start. Would that be a smart thing to do, since that was my original intent anyway?

Hu, can you walk me through how to do that?

Thanks for the help,

EE