mr-simon Guru


Joined: 22 Nov 2002 Posts: 314 Location: Leamington Spa, Warks, UK
|
Posted: Fri Dec 13, 2002 12:26 am Post subject: software raid-5 recovery... HELP! |
|
|
I'm not having a good day today...
OK, first a quick system overview, so we know where we stand.
/dev/hda1 <-- 40gb, /dev/md0 disk 0 (40g drive)
/dev/hdb1 <-- 40gb, /dev/md0 disk 1 (40g drive)
/dev/hde2 <-- 40gb, /dev/md0 disk 2 (160g drive)
software raid-5
/dev/hde1 <-- 120gb, /dev/md1 disk 1 (rest of 160g drive)
/dev/hdf1 <-- 120gb, /dev/md1 disk 2 (120g drive)
/dev/hdg1 <-- 120gb, /dev/md1 disk 3 (120g drive)
/dev/hdh1 <-- 120gb, /dev/md1 disk 4 ( 120g drive)
software raid-5
All partitions are 'linux raid autodetect', and have persistent superblocks. They were created on a previous install. I forgot to make an entry in /etc/raidtab for them. I've re-created it now (the root partition is not on raid), but I can't remember the chunk-size for /dev/md1 so I left that out.
Drives hda and hdb are connected to the on-motherboard IDE;
the rest are connected to a HighPoint raid controller.
Now... Some weird glitch seemed to happen to drives hde and hdf, just for a moment. Could be a dodgy cable... I've replaced it just to be on the safe side. I got three messages spewed out onto the console saying the partitions were down. I halted the server immediately.
When I brought it back up, /dev/md0 came up in degraded mode - with /dev/hde2 missing. /dev/md1 did not come up at all... although there were some log messages about it. (Something to do with two drives having screwed up I guess.)
I can fdisk /dev/hde and /dev/hdf and see the partitions on there fine. The drives are working as far as I can tell... But when they went down it obviously mangled them enough for Linux not to be able to autodetect and bring them up automagically.
Soo... I checked /proc/mdstat. /dev/md1 is not listed at all, but /dev/md0 says [UU_], the _ drive being /dev/hde2 as I suspected. I did a raidhotadd /dev/md0 /dev/hde2... 2 hours pass... mmm... it seems to like that. We have a happy /dev/md0 again.
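The [UU_] check can be scripted. This is only a rough sketch: `degraded_arrays` is a made-up helper name, and it assumes the usual /proc/mdstat layout, where each array starts with a line like `md0 : ...` and is followed by a status field made of `U` (member up) and `_` (member missing).

```shell
# degraded_arrays [FILE]: print the name of every md array listed in FILE
# (default /proc/mdstat) whose status field, e.g. [UU_], shows a missing member.
# Hypothetical helper; assumes the standard mdstat layout.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember which array this block describes
        match($0, /\[[U_]+\]/) {             # status field such as [UU_]
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays
```

An array that fails to start at all, like /dev/md1 here, is simply absent from /proc/mdstat, so this will not report it.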
Now... Here's the problem.
*what* do I do with /dev/md1
I'm scared of using mkraid because I've never done this before and I'd really prefer not to mangle the contents of the RAID. (Er... Did I mention it's not backed up? Doh. I guess fault-tolerant is only tolerant, not proof!) As I said before, I re-created my /etc/raidtab (minus chunk-size) and tried to manually raidstart /dev/md1. Here is what happens:
Code:
# raidstart /dev/md1
raid5: failed to run raid set md1
Here is what /var/log/messages has to say about that, from the moment I typed the above command. (excuse typos, laboriously copied by hand off other monitor)
Code:
kernel: [events: 0000004b]
kernel: [events: 0000004b]
kernel: [events: 0000004d]
kernel: [events: 0000004d]
kernel: md: autorun ...
kernel: md: considering hdh1 ...
kernel: md: adding hdh1 ...
kernel: md: adding hdg1 ...
kernel: md: adding hdf1 ...
kernel: md: adding hde1 ...
kernel: md: created md1
kernel: md: bind<hde1,1>
kernel: md: bind<hdf1,2>
kernel: md: bind<hdg1,3>
kernel: md: bind<hdh1,4>
kernel: md: running: <hdh1><hdg1><hdf1><hde1>
kernel: md: hdh1's event counter: 0000004d
kernel: md: hdg1's event counter: 0000004d
kernel: md: hdf1's event counter: 0000004b
kernel: md: hde1's event counter: 0000004b
kernel: md: freshest: hdh1
kernel: md: unbind<hdf1,3>
kernel: md: export_rdev(hdf1)
kernel: md: unbind<hde1,2>
kernel: md: export_rdev(hde1)
kernel: md: md1: max total readahead window set to 1536k
kernel: md: md1: 3 data-disks, max readahead per data-disk: 512k
kernel: md: raid5: device hdh1 operational as raid disk 3
kernel: md: raid5: device hdg1 operational as raid disk 2
kernel: md: md1 stopped.
kernel: md: unbind<hdh1,1>
kernel: md: export_rdev(hdh1)
kernel: md: unbind<hdg1,0>
kernel: md: export_rdev(hdg1)
kernel: md: ... autorun DONE.
Did I get that right? Erm, I missed off the beginning of all the lines with the time and my hostname. All the above events happened within the same second.
The thing that stands out to me is that hde1 and hdf1 have an event counter 2 less (0000004b against 0000004d)... I'm not quite sure what an event counter is, but it would seem to fit that those two drives failed and the others carried on for, er, two events without them.
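Picking out the lagging members from a log like the one above can be sketched as follows. `stale_members` is a hypothetical helper, not part of raidtools or mdadm. It assumes the counters are printed as fixed-width lowercase hex, as in the log, so a plain string comparison orders them correctly.

```shell
# stale_members: read md kernel log lines on stdin and print every device
# whose event counter is lower than the highest counter seen.
# Assumes lines of the form "... md: hdh1<apostrophe>s event counter: 0000004d".
stale_members() {
    awk '
        /event counter:/ {
            dev = $(NF - 3)
            dev = substr(dev, 1, length(dev) - 2)    # drop the trailing apostrophe-s
            if (!(dev in count)) order[++n] = dev    # remember first-seen order
            count[dev] = $NF ""                      # force string, not numeric, comparison
            if (count[dev] > max) max = count[dev]
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] < max) print order[i], count[order[i]]
        }
    '
}

# Example: grep "event counter" /var/log/messages | stale_members
```

Because it keys on the last four fields of each line, it should not matter whether the timestamp and hostname prefix is present.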
So... We get round to the question. What, if anything, can I do to bring these drives back? If I had backups I'd be happy to try a mkraid to see what happens... But because I don't, I really want to be sure of what I'm doing before I type anything.
Can anyone tell me what I need to do? It's pretty safe to say I'm in panic mode right now.
_________________
"Pokey, are you drunk on love?"
"Yes. Also whiskey. But mostly love... and whiskey."
mr-simon Guru


Joined: 22 Nov 2002 Posts: 314 Location: Leamington Spa, Warks, UK
|
Posted: Fri Dec 13, 2002 11:43 am Post subject: solved |
|
|
Never mind... I discovered this article that mentioned the existence of mdadm - a much better alternative to raidtools. I could run:
# mdadm --examine /dev/hdh1
And it told me everything I needed to know about it, including the chunk size, which was my biggest worry about trying to use mkraid --force. In the end, I used:
# mdadm --assemble /dev/md1 --uuid=the:uuid:of:the:drive:discovered:above --force /dev/hd*
Ta-daaa... My raid came up. I breathed a heavy sigh of relief. I went back to bed.
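Before forcing an assembly, it can help to line up what each member's superblock says. A rough sketch, assuming `mdadm --examine` prints `UUID`, `Events` and `Chunk Size` fields with ` : ` between label and value, as it does for the old 0.90 superblocks (other metadata versions may label the fields differently). `summarize_members` is a hypothetical helper name.

```shell
# summarize_members DEV...: print one line per device with the UUID, event
# count and chunk size reported by its md superblock. Needs root to read the
# devices. Hypothetical helper; field labels are assumed as described above.
summarize_members() {
    for dev in "$@"; do
        mdadm --examine "$dev" | awk -F' : ' -v dev="$dev" '
            { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }   # trim padding around the label
            key == "UUID"       { uuid = $2 }
            key == "Events"     { events = $2 }
            key == "Chunk Size" { chunk = $2 }
            END { printf "%s uuid=%s events=%s chunk=%s\n", dev, uuid, events, chunk }
        '
    done
}

# Example: summarize_members /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
```

Members whose Events value lags behind the rest are the ones that dropped out first; --force, as used above, tells mdadm to assemble the array even though those superblocks look out of date.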
pjp Administrator


Joined: 16 Apr 2002 Posts: 15989 Location: Colorado
|
Posted: Fri Dec 13, 2002 5:18 pm Post subject: |
|
|
Moved from H&L.
Quote:
software raid-5 recovery
_________________ Safety is my gaol.