Gentoo Forums
RAID6: emergency help?
RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Fri Dec 07, 2012 8:20 am    Post subject: RAID6: emergency help?

I created a raid6 array from six drives and copied all my old data onto it. I still have the old drives but ... setting them up and copying them over would be a rather large task.

The first time I booted the machine with the new raid6 array it worked perfectly.

Then I powered it off, put the cover back on, and rebooted, and one of the drives became faulty.

Then, while I was trying to re-add it to the array so it would rebuild, another drive went faulty.

Then, after one more mdadm -A, another drive went away, and it looks like all the data is lost.

What am I doing wrong?

Here's what mdadm currently says:

Code:
server ~ # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
     Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 2930133504 (2794.39 GiB 3000.46 GB)
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri Dec  7 00:10:39 2012
          State : clean, FAILED
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : SparePC:soulstorage
           UUID : bfc07787:5075d763:c70b65ac:687f7544
         Events : 61

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       0        0        3      removed
       4       0        0        4      removed
       5       0        0        5      removed

       3       8       97        -      faulty spare   /dev/sdg1
       6       8       81        -      spare   /dev/sdf1


Here's what it said after the previous failure:

Code:
server ~ # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
  Used Dev Size : -1
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri Dec  7 00:04:04 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : SparePC:soulstorage
           UUID : bfc07787:5075d763:c70b65ac:687f7544
         Events : 56

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       97        3      active sync   /dev/sdg1
       4       0        0        4      removed
       5       0        0        5      removed

       6       8       81        -      spare   /dev/sdf1



It is interesting to note that the failed drives are all connected to a RAID card (with its hardware RAID disabled) that had, until now, been working fine.

Can someone help me? I have no idea what's failing or why...

Update: It looks like they are hard errors....

Code:
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 264200
sd 9:0:0:0: [sdh] Unhandled sense code
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 265224
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] CDB:
end_request: I/O error, dev sdg, sector 264200
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdh1
 disk 5, o:1, dev:sdf1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdh1
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 265224
md/raid:md127: read error NOT corrected!! (sector 263176 on sdh1).
md/raid:md127: Disk failure on sdh1, disabling device.
md/raid:md127: read error not correctable (sector 263184 on sdh1).
md/raid:md127: read error not correctable (sector 263192 on sdh1).
md/raid:md127: read error not correctable (sector 263200 on sdh1).
md/raid:md127: read error not correctable (sector 263208 on sdh1).
md/raid:md127: read error not correctable (sector 263216 on sdh1).
md/raid:md127: read error not correctable (sector 263224 on sdh1).
md/raid:md127: read error not correctable (sector 263232 on sdh1).
md/raid:md127: read error not correctable (sector 263240 on sdh1).
md/raid:md127: read error not correctable (sector 263248 on sdh1).
md/raid:md127: read error not correctable (sector 263256 on sdh1).
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:0, dev:sdh1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
nfsd: last server has exited, flushing export cache
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: kicking non-fresh sdh1 from array!
md: unbind<sdh1>
md: export_rdev(sdh1)
md/raid:md127: device sdb1 operational as raid disk 0
md/raid:md127: device sdg1 operational as raid disk 3
md/raid:md127: device sdd1 operational as raid disk 2
md/raid:md127: device sdc1 operational as raid disk 1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: kicking non-fresh sdh1 from array!
md: unbind<sdh1>
md: export_rdev(sdh1)
md/raid:md127: device sdb1 operational as raid disk 0
md/raid:md127: device sdg1 operational as raid disk 3
md/raid:md127: device sdd1 operational as raid disk 2
md/raid:md127: device sdc1 operational as raid disk 1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdf1
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] CDB:
end_request: I/O error, dev sdg, sector 264192
md/raid:md127: read error not correctable (sector 262144 on sdg1).
md/raid:md127: Disk failure on sdg1, disabling device.
md/raid:md127: read error not correctable (sector 262152 on sdg1).
md/raid:md127: read error not correctable (sector 262160 on sdg1).
md/raid:md127: read error not correctable (sector 262168 on sdg1).
md/raid:md127: read error not correctable (sector 262176 on sdg1).
md/raid:md127: read error not correctable (sector 262184 on sdg1).
md/raid:md127: read error not correctable (sector 262192 on sdg1).
md/raid:md127: read error not correctable (sector 262200 on sdg1).
md/raid:md127: read error not correctable (sector 262208 on sdg1).
md/raid:md127: read error not correctable (sector 262216 on sdg1).
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 4, o:1, dev:sdf1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)

_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Fri Dec 07, 2012 10:05 pm

RayDude,

Don't do anything that may involve writes. Post the output of
Code:
mdadm -E /dev/sd[abcdef]1

What you hope to find is four members of the set with the same event count so you can assemble the raid in degraded mode.
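
A quick way to line the event counts up side by side, if it helps (just a sketch; adjust the device list to whatever your member partitions actually are):
Code:
for d in /dev/sd[abcdef]1; do
    echo "== $d"
    mdadm -E "$d" | egrep 'Update Time|Events|Device Role'
done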

I've just been through this with my 4 spindle raid5.

It's also worth installing smartmontools and looking at the drives' internal error logs.
If you saved dmesg with the error reports that showed why the drives were kicked out of the array, that would be good too.
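
For the smartmontools check, something along these lines should do (a sketch; substitute the device that got kicked out):
Code:
emerge --ask sys-apps/smartmontools   # install on Gentoo
smartctl -H /dev/sdg                  # overall health self-assessment
smartctl -l error /dev/sdg            # the drive's internal error log
smartctl -a /dev/sdg                  # full report, SMART attributes included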
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 08, 2012 12:51 am

Thanks Neddy!

I think the RAID controller I used for my external SATA box is not compatible with these drives, because the three drives that failed are all attached to it (two inside, one outside).

It looks like b, c, d, and f are all at the same event count, which means I might be able to recover this. I bought two new dual controllers (DGMS, Fry's sucks) to see if they will work with the drives. I think b, c, and d are okay because they are plugged into the motherboard.

It would be so awesome if I could get the array to rebuild itself. It takes days to copy 8 TB over gigabit.

Here's what mdadm said

Code:
server ~ # mdadm -E /dev/sd[bcdfgh]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b6e4f45:5d52d5ea:c6ff4d3a:ebf8b515

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : ed444042 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K                                                   
                                                                         
   Device Role : Active device 0
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : dc35e656:81f9e617:e9a6eafe:00cf70d6

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : b1442c11 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 59a13e07:c83416e1:4c96063b:6ca6bbb3

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : 443299a5 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 374073a6:b52ff21f:01661c22:cc5f2acb

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : 20c7bc89 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0445633e:5bf226e9:50f860b0:4784965b

    Update Time : Fri Dec  7 00:10:36 2012
       Checksum : a0a13abe - correct
         Events : 57

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA. ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 697e05c1:644d35c8:1cb2b136:97a14f39

    Update Time : Thu Dec  6 23:58:39 2012
       Checksum : 665ba374 - correct
         Events : 48

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA. ('A' == active, '.' == missing)

_________________
Some day there will only be free software.

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Sat Dec 08, 2012 1:37 am

If you have solved the drive failure problem (by hooking the drives up through some other card), and if the drives then work reliably, you should be able to reassemble the RAID. By your output, the first four are good (same timestamp and event count), whereas the latter two are out of date. So assemble (using --force if you must) with only the first four drives, and then, once the array is up and running, re-add the other two. Since RAID6 allows two drive failures, it will resync. As long as the first four drives aren't bad, the sync should succeed and you are back in the game with no additional data loss since the md failure.
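
Roughly, the sequence would look like this (only a sketch; the sdW1/sdX1/sdY1/sdZ1 names are placeholders for whichever four members share the newest event count, so double-check against your -E output before running anything):
Code:
mdadm --stop /dev/md127
# assemble degraded from the four up-to-date members only
mdadm --assemble --force /dev/md127 /dev/sdW1 /dev/sdX1 /dev/sdY1 /dev/sdZ1
cat /proc/mdstat                      # should come up with 4 of 6 devices
# once it is running, re-add the stale members so they resync
mdadm /dev/md127 --add /dev/sdU1 /dev/sdV1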

Good luck.

If it does not work out you may have to resort to your backup after all.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 08, 2012 5:40 am

Thanks!

I removed the old SATA card and added the two new SIL3132 boards. Unfortunately only one of them is recognized, and I don't know why: the one plugged into the x16 slot is not initialized, no BIOS, nothing. So I can only see five drives.

I had a four-port PCI SATA RAID card in my hand, but I couldn't remember if this mobo had a PCI slot, so I got the PCIe cards...

Now I'm stuck either buying a four-port card from Newegg (Fry's didn't have any four-port PCIe cards in stock, and PCI is probably too slow anyway), or biting the bullet, assembling with the four good drives, and hoping there are no write errors until I find a way to hook up the sixth drive.

Man, I wish I'd planned this better. I forgot this mobo only has four SATA ports.

Update: Well, I let it rebuild the drives and it went to active sync.

Then I added the last drive and it's currently rebuilding. So I'll have one redundant drive until I can find a solution that gives me four more SATA devices.

Any suggestions for a good card that's less than a hundred?

Thanks again guys!
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Dec 08, 2012 5:45 pm

RayDude,

Write errors are actually fairly safe: the drive will realise the write failed and reallocate the failed sector.
It's read errors that are the problem. When a drive has problems with a read but it's still successful, the data will be moved to a spare sector.
When a read fails, the data is lost and can't be moved. That's a fairly simplistic explanation anyway.
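
That reallocation activity shows up in the SMART attributes, by the way (a sketch, assuming smartmontools is installed and sdg is the suspect drive):
Code:
# non-zero raw values here mean the drive has remapped sectors or is holding pending ones
smartctl -A /dev/sdg | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'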
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Mad Merlin
Veteran


Joined: 09 May 2005
Posts: 1155

Posted: Mon Dec 10, 2012 3:46 am

RayDude wrote:
Any suggestions for a good card that's less than a hundred?


As far as I know, such a thing doesn't exist. You can grab an LSI 9211-4i for ~$200 or a 9211-8i for ~$250, which are 4-port (1x SFF-8087) and 8-port (2x SFF-8087), respectively. These are barebones HBA cards meant for passing the disks through to the host OS rather than doing any RAID themselves, and they work well.

You can often find rebrands and/or used cards for less; have a look at this list (the 9211 uses the SAS2008 chipset).

I can't really recommend anything less expensive than that; I've tried a couple of cheaper cards and have been burned more than once.
_________________
Game! - Where the stick is mightier than the sword!

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Mon Dec 10, 2012 1:24 pm

I'm not sure how expensive they are in dollars, but the Lian Li IB-01 and the Dawicontrol DC-624e are around 80€, so they shouldn't be over $100.

The Lian Li is just a port multiplier, though, and the Dawicontrol may need some extras ( http://theangryangel.co.uk/blog/marvell-88se9172-sata3-under-linux-as-of-320 ), but it may work out of the box with newer kernels.

Since you mentioned you used internal and external ports on your card, there are lots of cards where the external port is shared, i.e. you can use either the external or the internal connector but not both at the same time.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Thu Dec 13, 2012 12:44 am

mdadm help again. I bought a cheap Marvell-based RAID III card from Newegg.

It worked from the start, and drive six of the array started rebuilding.

At some point, after about 20 hours, the new controller died and took three hard drives with it ... again...

I guess the old adage "you get what you pay for" applies here.

Anywho, I bought a SIL3124-based PCI RAID card, and it's up and running. All six drives read as clean, but three of them have smaller event counts. How do I rebuild this with minimal damage? Is it possible?

Code:
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b6e4f45:5d52d5ea:c6ff4d3a:ebf8b515

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : ed4c7e51 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : dc35e656:81f9e617:e9a6eafe:00cf70d6

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : b14c6a20 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 59a13e07:c83416e1:4c96063b:6ca6bbb3

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : 443ad7b4 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0445633e:5bf226e9:50f860b0:4784965b

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : a0b11323 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 4761467048 sectors
          State : clean
    Device UUID : b74d0c97:082efe8c:f85ebd68:2e5d8734

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : 4a4bffb9 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 45527180:c644c76d:c1b4c0f2:d2f7ed4a

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : 9915d2c6 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing)

_________________
Some day there will only be free software.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Thu Dec 13, 2012 5:31 am

Update: I just typed 'mdadm -A --force /dev/md127' and it assembled and began rebuilding disk 6 again.

Then I ran fsck.ext4 on /dev/md127 and let it delete a few bad inodes.

Unfortunately, as I suspected, the PCI card is too slow and the rebuild of drive 6 hasn't moved a percent in several hours.

I'm still without a solution.
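
For anyone in the same boat, the rebuild progress and md's resync throttles can be checked like this (a sketch; the value below is only an example):
Code:
cat /proc/mdstat                                   # resync progress and estimated finish
cat /proc/sys/dev/raid/speed_limit_min             # current resync floor, in KB/s
echo 50000 > /proc/sys/dev/raid/speed_limit_min    # example: raise the floor if the bus allows it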
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Thu Dec 13, 2012 9:41 pm

RayDude,

If all your RAID6 was doing was rebuilding, i.e. you were not writing anything to it, you might be lucky.
If there were files open for writing when your three drives went offline, you can expect those files to be in a mess.
If directory writes were in progress, the contents of those directories may be lost.

This wiki article works. The advantage of --create in degraded mode over --force is that you can try all the combinations to see whether one degraded combination is better than another.

At first sight there is no reason why it should be, but the drives are not all written concurrently, so a failure such as yours will leave the drives in slightly different states. Degraded mode for you means four out of six drives. You need to think carefully before you mount the filesystem, even read-only, as journal replay and the resulting writes will still happen. I think you can avoid the journal replay if you want, but I don't know how.
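
Edit: for ext3/ext4 the noload mount option is meant to do exactly that, i.e. mount without replaying the journal (a sketch, mount point assumed):
Code:
mount -o ro,noload /dev/md127 /mnt/recovery    # read-only, journal is not replayed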
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Thu Dec 13, 2012 10:17 pm

--create is also dangerous, though: if you get the command wrong, that's bye-bye to your data. But your top priority is the hardware issue. It just won't do to have three drives vanish in one go. If you have this many controllers failing, maybe you should check for short circuits in your PSU/case, or any other cables for that matter. I've used cheap controllers myself and never had a failure, so your problems seem fishy to me somehow.
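
If you do end up experimenting with --create or with different --assemble combinations, one way to keep the real disks untouched is to work on copy-on-write overlays via device-mapper. This is only a rough sketch of the idea; the device list, file sizes and names are assumptions, not something tested against your array:
Code:
# one throwaway overlay per member; all writes land in the sparse files, not the disks
for d in /dev/sd[bcdefg]1; do
    n=$(basename "$d")
    truncate -s 4G "/tmp/ov-$n"                        # sparse copy-on-write file
    loop=$(losetup -f --show "/tmp/ov-$n")
    sz=$(blockdev --getsz "$d")                        # size in 512-byte sectors
    echo "0 $sz snapshot $d $loop N 8" | dmsetup create "ov-$n"
done
# then point mdadm at /dev/mapper/ov-* instead of the real partitions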

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Fri Dec 14, 2012 3:51 am

Thanks guys.

There were only three bad inodes after the fsck, so the array seems fine.

With the PCI card, it's still rebuilding the final drive. It is at 94% and will hopefully be done before the morning...

I have ordered an LSI Logic RAID card from Amazon; it will arrive tomorrow. Hopefully it will be reliable in my motherboard. Since I'll likely be able to boot my SSD off it, I'll connect four drives to the motherboard; that way, if I have problems, the most I will lose is two drives.

This sure has been an experience though, wow.
_________________
Some day there will only be free software.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 15, 2012 7:19 am

Just Venting...

I bought a SAS controller from LSI.

It didn't appear to support 3TB drives, so I upgraded the firmware from a Kubuntu boot USB drive.

It still doesn't appear to support 3TB drives.

What do I have to do to get a working four-port SATA card?

*exasperated*
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Dec 15, 2012 1:15 pm

RayDude,

How do you mean
RayDude wrote:
didn't appear to support 3TB drives
?

What happens when you connect a 3TB drive? The only difference is 48-bit LBA or not, and without 48-bit LBA you max out at 137GB.
A lot of bolt-on goodies claim a 2TB limit so that Windows users with MSDOS partition tables are not surprised when they find a 2TB limit, but it's not hardware related.

Try it, I will be surprised if it doesn't 'just work'.
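
A quick way to see what size the kernel actually gets from the card (a sketch, device name assumed):
Code:
blockdev --getsize64 /dev/sdb    # size in bytes as the kernel sees it; a 3TB drive is roughly 3.0e12
lsblk -b -o NAME,SIZE /dev/sdb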
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sun Dec 16, 2012 4:30 am

Thanks Neddy.

I already have working 3TB GPT partitions on the drives. When I connect them to the LSI card, they are reported as 2048 GB, and the GPT partitions are not present when the machine is booted.

I've found some forums complaining about this problem with LSI, but I haven't found any solutions. I've emailed tech support.

I should point out that I removed the BD-ROM from the PCI RAID card and the RAID performance doubled to 135 MB/second... Pretty interesting.

_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Dec 16, 2012 2:22 pm

RayDude,

GPT uses two copies of the partition table, one at the start of the drive and one at the end. It gets really upset if the two don't match.
With a 2048 GB limit, the copy at the end of the drive can't be read.

dmesg probably has errors about attempting to read beyond the end of the device.
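
gdisk can confirm whether the backup header is reachable (a sketch, device name assumed):
Code:
gdisk -l /dev/sdb          # warns if the backup GPT header is missing or unreadable
sgdisk --verify /dev/sdb   # same check in scriptable form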
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sun Dec 16, 2012 4:00 pm

Thanks, I'm sure that's why the partition table seems empty when I attempt to look at it. Now it's up to LSI.

I guess I'll have to break down and buy another RAID card. The reviews on Promise and HighPoint look bad. I might just buy another Silicon Image board, but make it PCI Express instead of PCI... I don't want to, because the one I used for several years died...
_________________
Some day there will only be free software.

Mad Merlin
Veteran


Joined: 09 May 2005
Posts: 1155

Posted: Wed Dec 19, 2012 12:35 am

You didn't mention which model of LSI card you ended up with. However, LSI has an article on the issue here: http://webcache.googleusercontent.com/search?q=cache:6MN0yCPeVn0J:http://kb.lsi.com/Print16399.aspx%2Blsi+2TB&oe=UTF-8&hl=en&ct=clnk

It looks like the 6Gbit/s cards (such as the 9211 I suggested above) support >= 3TB drives while the older ones do not (unless you have SAS drives, which you likely don't). However, I've personally only used the 9211 with SSDs (which are, sadly, still smaller than 2TB).
_________________
Game! - Where the stick is mightier than the sword!

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Tue Jan 01, 2013 8:45 pm

Final update: my older card would not recognize 3TB drives even with the IT firmware. I sent it back and ordered a Marvell SAS card. It works okay, but the performance is kind of random. For example, the SSD plugged into the MVSAS card is about half as fast as it was on the motherboard. But it wasn't always this slow... I'm having trouble figuring out how to get full performance out of it.
_________________
Some day there will only be free software.