mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Mon Oct 29, 2012 1:32 pm Post subject: [SOLVED] drive was removed from my RAID 5 array; is it dead? |
|
|
Hello all!
My /home dir consists of a RAID 5 array with three 1.5TB disks. Yesterday I did an `emerge --update --deep world`. Today upon reboot, /home didn't mount.
So, I did an `mdadm --assemble --scan` and got the message:
Code: | mdadm: /dev/md127 has been started with 2 drives (out of 3) |
`mdadm --detail /dev/md127` now shows one of the drives as 'removed':
Code: | mhelvens-pc mhelvens # mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Thu Oct 20 19:41:06 2011
Raid Level : raid5
Array Size : 2930272256 (2794.53 GiB 3000.60 GB)
Used Dev Size : 1465136128 (1397.26 GiB 1500.30 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Oct 29 14:06:06 2012
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : michiel-pc:0
UUID : 82da8dc5:42efff78:bcce5cab:0baa4591
Events : 24760
Number Major Minor RaidDevice State
3 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 0 0 2 removed |
If I press on, and tell `mdadm` specifically about /dev/sdd1, I will get something like:
Code: | mdadm: /dev/md/michiel-pc:0_0 assembled from 1 drive - not enough to start the array |
Also `mdadm /dev/md127 --re-add /dev/sdd1` doesn't work:
Code: | mdadm: --re-add for /dev/sdd1 to /dev/md127 is not possible |
Of course, it occurred to me that the drive may be dead (that's why I have RAID5, after all). But it seems like too much of a coincidence that this happened after a long overdue world update where I... didn't pay particular attention to the messages afterwards.
How can I be sure?
Thanks in advance!
Last edited by mhelvens on Tue Oct 30, 2012 7:53 pm; edited 1 time in total |
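For readers who want to check this kind of output from a script, here is a minimal sketch, assuming the `Key : value` layout shown in the `mdadm --detail` listing above. The helper names `parse_detail` and `summarize` are made up for illustration, and the sample text is only an excerpt of the fields posted above.

```python
# Excerpt of the `mdadm --detail /dev/md127` fields quoted in the post above.
DETAIL_SAMPLE = """\
Raid Level : raid5
Raid Devices : 3
Total Devices : 2
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
"""


def parse_detail(text):
    """Collect 'Key : value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines without a ' : ' separator, e.g. device rows
            fields[key.strip()] = value.strip()
    return fields


def summarize(text):
    """Report how many members are missing and whether md calls it degraded."""
    fields = parse_detail(text)
    expected = int(fields["Raid Devices"])
    active = int(fields["Active Devices"])
    states = [s.strip() for s in fields["State"].split(",")]
    return {"missing": expected - active, "degraded": "degraded" in states}


if __name__ == "__main__":
    print(summarize(DETAIL_SAMPLE))
```

On the figures above this reports one missing member and a degraded state, while `Failed Devices : 0` shows that md did not mark the drive as failed, only as absent.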
|
DaggyStyle Watchman
Joined: 22 Mar 2006 Posts: 5909
|
Posted: Mon Oct 29, 2012 3:02 pm Post subject: |
|
|
I'm far from being an expert, but can you see anything on that drive? Partition tables? SMART status?
Also, if I'm not mistaken, the redundancy feature of RAID 5 (i.e. lose one drive, data still intact) requires 4 drives; running RAID 5 on three drives is RAID 5 without redundancy. _________________ Only two things are infinite, the universe and human stupidity and I'm not sure about the former - Albert Einstein |
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Mon Oct 29, 2012 3:06 pm Post subject: |
|
|
DaggyStyle wrote: | I'm far from being an expert, but can you see anything on that drive? Partition tables? SMART status? |
Looks like it. In fdisk I can still see this info (looks fine):
Code: | Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes, 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa7a50b94
Device Boot Start End Blocks Id System
/dev/sdd1 2048 2930277167 1465137560 fd Linux raid autodetect |
Is there any other specific info I should look up?
DaggyStyle wrote: | Also, if I'm not mistaken, the redundancy feature of RAID 5 (i.e. lose one drive, data still intact) requires 4 drives; running RAID 5 on three drives is RAID 5 without redundancy. |
No, that's not true. RAID5 uses one drive's worth of capacity for redundancy (parity) whenever you have 3 or more drives in total. Right now the array is running fine on 2 drives, without redundancy. |
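To illustrate the point about three-drive RAID 5, here is a toy sketch of XOR parity, plus a check of the sizes from the `--detail` output in the first post. It assumes plain XOR parity over one stripe and ignores how md rotates parity across members; `xor_blocks` and `raid5_usable` are names invented for this example.

```python
def xor_blocks(a, b):
    """XOR two equal-length byte strings, as RAID 5 parity does per stripe."""
    return bytes(x ^ y for x, y in zip(a, b))


# Toy stripe on a three-member array: two data chunks plus one parity chunk.
chunk_a = b"home"
chunk_b = b"data"
parity = xor_blocks(chunk_a, chunk_b)

# Pretend the member holding chunk_b is gone: rebuild it from the other two.
rebuilt_b = xor_blocks(chunk_a, parity)


def raid5_usable(members, per_member):
    """Usable size of a RAID 5 array: one member's worth goes to parity."""
    return (members - 1) * per_member


if __name__ == "__main__":
    print(rebuilt_b == chunk_b)
    # Used Dev Size and Array Size as posted in the first message.
    print(raid5_usable(3, 1465136128) == 2930272256)
```

Both checks hold: the lost chunk comes back from the two survivors, and the reported Array Size is exactly twice the Used Dev Size, i.e. two members of data and one member's worth of parity.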
|
DaggyStyle Watchman
Joined: 22 Mar 2006 Posts: 5909
|
Posted: Mon Oct 29, 2012 3:15 pm Post subject: |
|
|
OK, it seems that my IT guy is wrong.
I think it may be worthwhile to check the superblock on that drive and see if it matches the others' superblocks. Also, did you upgrade your kernel before it happened?
In addition, try running smartctl on the drive to get some data and see whether the drive is in a pre-fail state. |
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Mon Oct 29, 2012 3:24 pm Post subject: |
|
|
DaggyStyle wrote: | I think it may be worthwhile to check the superblock on that drive and see if it matches the others' superblocks. |
I'm not sure how to do that.
DaggyStyle wrote: | Also, did you upgrade your kernel before it happened? |
Nope.
DaggyStyle wrote: | In addition, try running smartctl on the drive to get some data and see whether the drive is in a pre-fail state. |
Brilliant! Never used that. Anyway, the drive in question PASSED with flying colours. No errors reported, etc.
Seems the drive is not dying. Just don't know how to add it back to the array.
Should I try --add? I didn't want to try that yet, as I assumed this would add it as a new drive, and completely resync. |
|
DaggyStyle Watchman
Joined: 22 Mar 2006 Posts: 5909
|
Posted: Mon Oct 29, 2012 4:05 pm Post subject: |
|
|
There are specific entries to watch in smartctl's output; search the forum.
Why didn't you try to add it again?
As for the superblock, I assume that dd and md5sum is the way. |
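On comparing superblocks: raw checksums of the superblock area would likely differ even between healthy members, since each member's superblock also stores per-device fields. `mdadm --examine <device>` prints the superblock in readable form, including an `Events` counter. Below is a minimal sketch of comparing those counters. The sample texts are invented for illustration (the thread never shows `--examine` output), the sdd1 figure in particular is made up, and `events_of` and `stale_members` are hypothetical helper names.

```python
import re

# Invented --examine excerpts. 24760 is the array's event count from the
# --detail output in the first post; the sdd1 value is a made-up example.
EXAMINE_SAMPLES = {
    "/dev/sdb1": "Array UUID : 82da8dc5:42efff78:bcce5cab:0baa4591\nEvents : 24760\n",
    "/dev/sdc1": "Array UUID : 82da8dc5:42efff78:bcce5cab:0baa4591\nEvents : 24760\n",
    "/dev/sdd1": "Array UUID : 82da8dc5:42efff78:bcce5cab:0baa4591\nEvents : 24000\n",
}


def events_of(examine_text):
    """Extract the integer after 'Events :' from --examine style text."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)", examine_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    return int(match.group(1))


def stale_members(samples):
    """Return the devices whose event counter lags behind the newest one."""
    counts = {dev: events_of(text) for dev, text in samples.items()}
    newest = max(counts.values())
    return sorted(dev for dev, count in counts.items() if count < newest)


if __name__ == "__main__":
    print(stale_members(EXAMINE_SAMPLES))
```

A member whose counter lags behind the others has missed updates to the array. As far as I know, whether `--re-add` can bring such a member back without a full recovery depends on that counter and on whether the array has a write-intent bitmap.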
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Mon Oct 29, 2012 4:24 pm Post subject: |
|
|
DaggyStyle wrote: | There are specific entries to watch in smartctl's output; search the forum. |
I looked at all the info. Everything is looking good. Health status: PASSED. Everything completed without errors. I ran a short self-test: no errors.
DaggyStyle wrote: | Why didn't you try to add it again? |
Because the drive was already in the array before, so I assumed it would be a quick fix. When I --add, it takes quite a while to complete.
I now guess that a 'quick' fix (like --re-add) couldn't work because there were writes while the array was mounted with only two drives. So perhaps the third drive is now out of date? Just guessing.
So I used --add anyway. It's now recovering. 1150 minutes to go. I assume it will go faster once I stop using the array. But right now, I have no choice. Work to complete.
Thanks! I'll report back when it completes. |
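As a rough sanity check on that estimate, here is the arithmetic, assuming the 1150 minutes covered close to the whole member (the `Used Dev Size` of 1465136128 KiB from the first post). The function name is made up for illustration.

```python
def implied_rate_kib_per_s(member_kib, minutes_remaining):
    """Average rebuild rate implied by a size and a time-to-go estimate."""
    return member_kib / (minutes_remaining * 60)


if __name__ == "__main__":
    rate = implied_rate_kib_per_s(1465136128, 1150)
    print(f"{rate:.0f} KiB/s, about {rate / 1024:.1f} MiB/s")
```

That works out to roughly 21 MB/s, which is likely well below what the disks can stream sequentially and would fit the remark that the array was still in use during the rebuild.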
|
energyman76b Advocate
Joined: 26 Mar 2003 Posts: 2048 Location: Germany
|
Posted: Tue Oct 30, 2012 5:40 pm Post subject: |
|
|
OK, I only read the start and not all of the rest.
I hope you didn't do anything stupid in the meantime.
First of all:
Most of the time when a drive is not added to an array, nothing serious has happened. The driver was not done initializing the hardware, or similar stuff. Nothing bad. Just timing.
smartctl is a good call. Please have smartd run. Always. Especially with raid devices.
Check dmesg. No errors?
Then continue:
First things first: log out as user. Unmount /home. The less you do on that FS, the smaller the chance that the resync will run into problems.
Second. Stop the array
mdadm -S /dev/md127 or whatever it is called
Third. Start the array
mdadm -A /dev/md127
and report back.
I almost never experience problems with kernel assembled arrays. But the 'new' superblock 1.2 arrays that have to be assembled by mdraid during boot are a different story....
Edit: read the rest now.
Ok, add/readd might work or not... restarting is easier...
the next time your array is degraded and not mounted, don't mount it. Fix it first. If it is degraded and mounted... well... _________________ Study finds stunning lack of racial, gender, and economic diversity among middle-class white males
I identify as a dirty penismensch. |
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Tue Oct 30, 2012 5:55 pm Post subject: |
|
|
energyman76b wrote: | OK, I only read the start and not all of the rest.
I hope you didn't do anything stupid in the meantime. |
Maybe I did. But if so, it appears to have gone well enough.
Please read on and help me find out if everything is fine now?
energyman76b wrote: | smartctl is a good call. Please have smartd run. Always. Especially with raid devices. |
Thanks for the tip! I'll find out more about smartd. Until now I've been running the mdadm daemon, which is also supposed to warn me if anything goes wrong. Would that be redundant?
energyman76b wrote: | Check dmesg. No errors?
Then continue:
First things first: log out as user. Unmount /home. The less you do on that FS, the smaller the chance that the resync will run into problems.
Second. Stop the array
mdadm -S /dev/md127 or whatever it is called
Third. Start the array
mdadm -A /dev/md127
and report back. |
Did all that (sort of). I reported the outcome in my first post. I just neglected to mention some of the steps (unmount, stop array, etc.). Except that I started right away with --assemble --scan.
Anyway, now that you've read the rest... So I did an --add. It was recovering the array all night, and it seems to have gone ok. Here's `mdadm --detail /dev/md127`:
Code: | mhelvens-pc / # mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Thu Oct 20 19:41:06 2011
Raid Level : raid5
Array Size : 2930272256 (2794.53 GiB 3000.60 GB)
Used Dev Size : 1465136128 (1397.26 GiB 1500.30 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Tue Oct 30 18:48:08 2012
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : michiel-pc:0
UUID : 82da8dc5:42efff78:bcce5cab:0baa4591
Events : 56187
Number Major Minor RaidDevice State
3 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
4 8 49 2 active sync /dev/sdd1 |
As you can see, it looks fine. Only thing is: The 'Number' column now shows '4' for sdd1. As far as mdadm is concerned, I suppose, nr. 2 died, and I put nr. 4 in its place.
Anyway, can you recommend any final tests to make sure everything is OK?
energyman76b wrote: | the next time your array is degraded and not mounted, don't mount it. Fix it first. If it is degraded and mounted... well... |
I'll remember that!
Thanks!
Last edited by mhelvens on Tue Oct 30, 2012 6:00 pm; edited 2 times in total |
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
|
energyman76b Advocate
Joined: 26 Mar 2003 Posts: 2048 Location: Germany
|
Posted: Tue Oct 30, 2012 6:34 pm Post subject: |
|
|
mdadm will only scream when a disk is dead.
smartd can warn you so you might be able to act before the disk is dead.
It also runs self tests - if you configure it that way - which also help to find worrisome developments.
For example this:
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 2
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 2
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 1
happens. If it does not change over weeks or months, there is nothing to worry about. But if it goes up quickly... get a new disk ASAP.
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
or those two. One or two? Might happen. Constant growth? Time for a backup. |
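The "watch whether it grows" advice can be sketched as a comparison of two snapshots of the attribute table. This is a minimal sketch, assuming the ten-column row layout of the lines quoted above with the raw value in the last column. The first snapshot uses the posted rows; the later snapshot is invented purely to show the comparison, and `raw_values` and `grown` are hypothetical helper names.

```python
# Attribute rows as quoted in the post above.
SMART_SAMPLE = """\
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 2
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 2
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 1
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
"""


def raw_values(text):
    """Map attribute name to its integer raw value (tenth column)."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows that start with a numeric ID and have a numeric raw value.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            values[parts[1]] = int(parts[9])
    return values


def grown(before, after):
    """Return how much each attribute's raw value increased, if it did."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


if __name__ == "__main__":
    earlier = raw_values(SMART_SAMPLE)
    # Made-up later snapshot: pending sectors went from 0 to 8.
    later = dict(earlier, Current_Pending_Sector=8)
    print(grown(earlier, later))
```

Stable small counts produce an empty result; anything that shows up here between snapshots is the kind of growth the post says to act on.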
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Tue Oct 30, 2012 7:53 pm Post subject: |
|
|
Ok. Thanks! |
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
|
mhelvens Guru
Joined: 17 Mar 2005 Posts: 337 Location: The Netherlands
|
Posted: Sun Nov 04, 2012 12:41 pm Post subject: |
|
|
Jaglover wrote: | And the solution was? |
I described the 'solution' in my earlier post. I used `mdadm --add` to add the drive back into the array. It had to completely resync, but is working fine now.
This was not the ideal solution. I could have possibly let it 'catch back up' with the array, but I had already mounted it, and I didn't know how.
I marked the topic as [SOLVED] because my problem is now gone. Is this not common practice?
Cheers! |
|