Gentoo Forums
[SOLVED] Booting issues with mdadm
alexbuell
Guru

Joined: 18 Jul 2002
Posts: 484
Location: "Hemp"shire, UK

Posted: Thu Nov 11, 2010 4:05 pm    Post subject: [SOLVED] Booting issues with mdadm

I've been experiencing booting issues with mdadm on one of my boxes.

This machine has a pair of 36GB SCSI disks installed. These disks are organised:

/dev/sda, divided into four partitions, 1) sda1 - bootable partition 2) sda2 - swap partition 3) sda3 - SUN disk label - whole disk 4) sda4 - root partition.
/dev/sdb is the same as above, for sdb1, sdb2, sdb3 and sdb4.

/dev/sda1 and /dev/sdb1 are set up as RAID1 as a bootable /dev/md0 device, tagged as partition type fd
/dev/sda2 and /dev/sdb2 are set up as RAID1 as a swap /dev/md1 device, tagged as partition type fd
/dev/sda3 and /dev/sdb3 are reserved as a Sun disk label, already tagged as partition type 5
/dev/sda4 and /dev/sdb4 are set up as RAID10,layout=f2 as a root /dev/md2 device, tagged as partition type fd

/dev/md0 is an ext3 journalling filesystem set up with kernel and initrd to boot.
/dev/md1 is a swap partition
/dev/md2 is an ext4 partition, with root filesystem on it.

OK, this is all good. I use UUIDs to mount the filesystems where needed, both in /etc/fstab and in the bootloader.

On booting, the kernel loads, detects the arrays:

Code:

..
..

Activating mdev
mdadm: /dev/md0 has been started with 2 drives.
mdadm: WARNING: /dev/sdb4 and /dev/sdb3 appears to have very similar superblocks.
          If they are really different, please --zero the superblock on one
          If they are the same or overlap, please remove one from the
          DEVICE list in mdadm.conf
mdadm: /dev/md1 has been started with 2 drives.
>> Determining root device...
!!! Could not find the root block device in UUID=****-***-**-*****.
    Please specify another value or: Press Enter for the same, type "shell" for a shell, or "q" to skip....
root block device(UUID=****-***-**-**-****) ::


Hmm OK, so I drop down to console to find out why. A quick look at /proc/mdstat reveals md1 and md0 are active, that's good, but no md2? I look at the list of blkids, why is it including /dev/sda3 and /dev/sdb3? These shouldn't be recognizable by mdadm as RAID members as they're of partition type 5. Strange...

A closer look at the blkids reveals that /dev/sda3 and /dev/sdb3 have the same UUIDs as /dev/sda4 and /dev/sdb4. Any ideas why?

I try to reassemble /dev/md2:

Code:

mdadm --assemble /dev/md2 /dev/sda4 /dev/sdb4
mdadm: WARNING: /dev/sdb4 and /dev/sdb3 appears to have very similar superblocks.
          If they are really different, please --zero the superblock on one
          If they are the same or overlap, please remove one from the
          DEVICE list in mdadm.conf


What seems to be the problem? :(
_________________
Cheers,
Alex.

Linux - the best text adventure game ever.


Last edited by alexbuell on Fri Nov 12, 2010 12:15 am; edited 1 time in total
alexbuell
Guru

Joined: 18 Jul 2002
Posts: 484
Location: "Hemp"shire, UK

Posted: Thu Nov 11, 2010 6:56 pm

I just booted off an external disk pack.

Looks like udev starts looking through the disks and builds the arrays.

Here's what I found:
Code:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md124 : active raid10 sda3[0] sdb3[1]
      33365888 blocks 64K chunks 2 far-copies [2/2] [UU]
     
md125 : active (auto-read-only) raid10 sda4[0] sdb4[1]
      33365888 blocks 64K chunks 2 far-copies [2/2] [UU]
     
md126 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      2097088 blocks [2/2] [UU]
     
md127 : active (auto-read-only) raid1 sda1[0] sdb1[1]
      102336 blocks [2/2] [UU]
     
unused devices: <none>


Hmm, md124 shouldn't be there at all, period.

Code:

# fdisk -l /dev/sda

Disk /dev/sda (Sun disk label): 64 heads, 32 sectors, 34732 cylinders
Units = cylinders of 2048 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/sda1             0       100    102400   fd  Linux raid autodetect
/dev/sda2           100      2148   2097152   fd  Linux raid autodetect
/dev/sda3             0     34732  35565568    5  Whole disk
/dev/sda4          2148     34732  33366016   fd  Linux raid autodetect
helium alex # fdisk -l /dev/sdb

Disk /dev/sdb (Sun disk label): 64 heads, 32 sectors, 34732 cylinders
Units = cylinders of 2048 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/sdb1             0       100    102400   fd  Linux raid autodetect
/dev/sdb2           100      2148   2097152   fd  Linux raid autodetect
/dev/sdb3             0     34732  35565568    5  Whole disk
/dev/sdb4          2148     34732  33366016   fd  Linux raid autodetect


Decidedly most odd. Any ideas why it is assembling a RAID for /dev/sda3 and /dev/sdb3?

I also did a blkid:
Code:

/dev/sda2: UUID="b7f78830-8e88-425b-91df-3973bee039dc" TYPE="swap"
/dev/sda4: UUID="dad11fe4-8a6c-4216-9093-b0931e5c9a9e" TYPE="ext4"
/dev/sda1: UUID="5d3df0f3-c46e-9dcf-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sda3: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb1: UUID="5d3df0f3-c46e-9dcf-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb2: UUID="107ae772-7b1c-00fa-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb3: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb4: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/md127: UUID="9c2adc5f-190f-4924-9369-755f601a8742" SEC_TYPE="ext2" TYPE="ext3"
/dev/md127p1: UUID="9c2adc5f-190f-4924-9369-755f601a8742" TYPE="ext2"
/dev/md127p3: UUID="9c2adc5f-190f-4924-9369-755f601a8742" SEC_TYPE="ext2" TYPE="ext3"
/dev/md126: UUID="b7f78830-8e88-425b-91df-3973bee039dc" TYPE="swap"
/dev/md125: UUID="dad11fe4-8a6c-4216-9093-b0931e5c9a9e" TYPE="ext4"
/dev/md124: UUID="9c2adc5f-190f-4924-9369-755f601a8742" TYPE="ext2"
/dev/md124p1: UUID="9c2adc5f-190f-4924-9369-755f601a8742" TYPE="ext2"
/dev/md124p3: UUID="9c2adc5f-190f-4924-9369-755f601a8742" TYPE="ext2"


Something's definitely wrong somewhere.
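
A quick way to spot this kind of clash is to look for two partitions of the same disk that report the same RAID UUID. The following is only a sketch: the function name same_disk_raid_dupes is made up, it assumes sdXN style device names, and it assumes UUID is the second field, as in the blkid listing above. The sample lines are copied from that listing.

```shell
# Print groups of partitions on the same disk that share one RAID UUID.
same_disk_raid_dupes() {
    awk '
        /TYPE="linux_raid_member"/ {
            dev = $1;   sub(/:$/, "", dev)        # "/dev/sdb3:" -> "/dev/sdb3"
            disk = dev; sub(/[0-9]+$/, "", disk)  # "/dev/sdb3"  -> "/dev/sdb"
            uuid = $2;  gsub(/UUID=|"/, "", uuid) # strip UUID= and the quotes
            key = disk " " uuid
            if (count[key]++) members[key] = members[key] " " dev
            else              members[key] = dev
        }
        END {
            for (key in count)
                if (count[key] > 1) print members[key]
        }'
}

# Sample input taken from the blkid output above.
same_disk_raid_dupes <<'EOF'
/dev/sda1: UUID="5d3df0f3-c46e-9dcf-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sda3: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb3: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
/dev/sdb4: UUID="51841d8a-98d4-1838-d706-bb7b0cd30ce9" TYPE="linux_raid_member"
EOF
# prints: /dev/sdb3 /dev/sdb4
```

On a live system the same function could be fed with "blkid | same_disk_raid_dupes".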
_________________
Cheers,
Alex.

Linux - the best text adventure game ever.
alexbuell
Guru

Joined: 18 Jul 2002
Posts: 484
Location: "Hemp"shire, UK

Posted: Fri Nov 12, 2010 12:16 am

Solved.

Turned out to be the 0.9 superblocks that were right at the end of each partition that screwed up /dev/sd[ab]3 partitions. Once I changed over to 1.2 superblocks, the problems went away.

Thank the Ghods for that :-)
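
For anyone wondering why the type 5 "whole disk" slices got dragged in: as far as I understand the 0.90 format, the superblock lives in a 64 KiB-aligned block at least 64 KiB and less than 128 KiB from the end of the member device. A partition that starts on a 64 KiB boundary and runs to the end of the disk therefore shares its superblock location with the whole-disk slice. Below is a rough sketch of the arithmetic using the sizes from the fdisk listing earlier in the thread. The function name sb090_abs_sector is made up for illustration, and the 128-sector constant is my reading of the format, so check it against the md documentation before relying on it.

```shell
# Absolute disk sector where a 0.90 superblock would sit for a partition.
# $1 = partition start, $2 = partition size, both in 512-byte sectors.
sb090_abs_sector() {
    # Round the size down to a multiple of 128 sectors (64 KiB),
    # then step back one more 64 KiB block.
    echo $(( $1 + ($2 & ~127) - 128 ))
}

# /dev/sda3: starts at cylinder 0, 35565568 blocks of 1 KiB
sda3=$(sb090_abs_sector 0 $((35565568 * 2)))
# /dev/sda4: starts at cylinder 2148 (1 cylinder = 2048 sectors), 33366016 blocks
sda4=$(sb090_abs_sector $((2148 * 2048)) $((33366016 * 2)))

echo "sda3: $sda3  sda4: $sda4"
# prints: sda3: 71131008  sda4: 71131008
```

Both slices resolve to the same sector, which would explain why mdadm sees "very similar superblocks" on sdb3 and sdb4. A 1.2 superblock is stored 4 KiB from the start of the device instead, so sda4 no longer shares it with the whole-disk slice.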
_________________
Cheers,
Alex.

Linux - the best text adventure game ever.
ocbMaurice
Tux's lil' helper

Joined: 14 Feb 2003
Posts: 84
Location: Switzerland

Posted: Sat Nov 20, 2010 4:20 pm

Hi there,

I'm not sure if I had the same problem as you guys, but it's at least similar. I was in the process of updating my server to the latest stable world. After the reboot a few things went bad. First mdraid would not want to start since my server has a hardened profile which is still on baselayout 1. Then I discovered that my raid (/dev/md0) went missing and indeed /proc/mdstat gave me some strange and frightening data:
Code:
Personalities : [raid6] [raid5] [raid4]
md126 : active (auto-read-only) raid5 sde[2] sdd[1] sdf[4](S) sdc[0]
      2930287488 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

md127 : inactive sdb2[0](S)
      160576 blocks

md0 : inactive sdb3[0](S)
      976583232 blocks

unused devices: <none>


But it should've looked like this:
Code:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb3[0] sdf3[4] sde3[3] sdd3[2] sdc3[1]
      3906332928 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>


So I then booted the previous kernel version, updated to openrc and baselayout-2, and so on. I was able to read the superblocks of the raid member disks with mdadm, and I was able to assemble my raid manually (after stopping the strange md12[67] devices). Unfortunately, after one of the reboots the 'wrong' md126 raid started to resync itself, which I knew at that point was very, very bad for my real data. Anyway, I could finally isolate the culprit: the update from mdadm-3.0 to mdadm-3.1.4 (mdadm-3.1.1-r1 also fails). I can now reproduce this behaviour at any time by compiling that version of mdadm and rebooting. I also discovered that with mdadm-3.0 I must start mdraid to get an md0 device, while mdadm-3.1.x detects the bad devices on startup (I don't know exactly when or how).

alexbuell wrote:
Turned out to be the 0.9 superblocks that were right at the end of each partition that screwed up /dev/sd[ab]3 partitions. Once I changed over to 1.2 superblocks, the problems went away.

How did you do that? Did you re-create the array completely, or is there an easier way without moving all the data around?
I guess that this could very well be the problem here too since I have "ARRAY /dev/md0 metadata=0.90" in my mdadm.conf.

Since I have now moved all the data from my array, I will try to recreate it with mdadm-3.1.4 and see if it is auto-detected correctly.
I will backup the superblocks of each current member disk in case anyone ever would like to look into this (just give me a pm).

I'll document a few of the steps I did to recover my ext3 filesystem.
Remember to always store the commands you used to create an array.
Do some experiments while the array is empty. I'm so glad I did all that!

Check the status of your raid arrays
Code:
cat /proc/mdstat

If it's not correct, stop the wrong ones
Code:
mdadm --stop /dev/md127

Emerge mdadm-3.0, reboot and check again

Otherwise, try to re-assemble your array manually
Code:
mdadm --assemble --verbose /dev/md0 --chunk=64 /dev/sd[bcdef]3

If there has been some data loss you might need to recover ext3 superblocks
Code:
fsck.ext3 -b 550731776 /dev/md0

You should find enough info on how to calculate the superblock offset on the net.
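
As a rough illustration of that calculation: with the sparse_super feature, ext2/ext3 keep backup superblocks in group 1 and in groups that are powers of 3, 5 and 7. The sketch below assumes a 4 KiB block size (32768 blocks per group, first data block 0), which is an assumption about the filesystem, not something read from it. The function name backup_superblocks is made up. For the real values of a given filesystem, dumpe2fs lists them.

```shell
# List candidate backup superblock locations (in filesystem blocks)
# for groups up to $1, assuming 4 KiB blocks and sparse_super.
backup_superblocks() {
    max_group=$1
    bpg=32768                      # blocks per group for a 4 KiB block size
    {
        echo "$bpg"                # group 1 always carries a backup
        for base in 3 5 7; do
            g=$base
            while [ "$g" -le "$max_group" ]; do
                echo $(( g * bpg ))
                g=$(( g * base ))
            done
        done
    } | sort -n
}

backup_superblocks 20000
```

The value 550731776 used above appears in that list: it is 16807 * 32768, and 16807 is 7 to the power of 5.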

The last day was kind of a rollercoaster ride.
This really is the first "bug" that has hit me that hard.
I'm just happy I could restore nearly everything.

*Note to myself* - "raid 5 is not a backup solution!"

Maurice
korban
n00b

Joined: 04 Feb 2003
Posts: 37
Location: Vienna, Austria

Posted: Wed Nov 24, 2010 9:58 am

Hi!

I had a similar problem:

updated from mdadm-3.0 to mdadm-3.1.4 about a week ago.
today I rebooted my server and /dev/md0 was gone - instead I had /dev/md127, which didn't contain a valid filesystem

unfortunately I started a fsck.ext3 /dev/md127, which messed up things for me :-(
half way I realized there's something wrong, stopped fsck process, stopped the raid device and restarted it by doing mdadm --assemble --scan

just waiting for the final fsck to finish...

Did anyone find some information on updating superblock yet?

update:
I don't think the superblock can be updated. According to https://raid.wiki.kernel.org/index.php/RAID_superblock_formats the main difference between 0.9, 1, 1.1 and 1.2 is the position of the superblock on the device.

update2:
I disabled kernel autodetect of software raid by using kernel parameter "raid=noautodetect" and added mdraid to runlevel "boot". Same problem after reboot. (mdraid says it can only be used with baselayout-2 - I have 1.12.14-r1).
ocbMaurice
Tux's lil' helper

Joined: 14 Feb 2003
Posts: 84
Location: Switzerland

Posted: Thu Dec 02, 2010 9:50 am

korban wrote:
I don't think the superblock can be updated. According to https://raid.wiki.kernel.org/index.php/RAID_superblock_formats the main difference between 0.9, 1, 1.1 and 1.2 is the position of the superblock on the device.

I came to the same conclusion. I copied the whole raid to external hard disks and cleared the raid superblocks on each disk. Then I updated mdadm, re-created the raid, and finally copied all my data back (I will keep the data on the external disks for a while, just to be sure).
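
The sequence described here could be sketched roughly as follows. This only prints the commands instead of running them, because zeroing superblocks and re-creating an array destroys the old array, so the data must be copied off first. The function name recreate_plan is made up, and the level and device list in the example call are taken from the mdstat output earlier in the thread. Options such as the chunk size are left out and would need to be added for a real array.

```shell
# Print (do not run) the commands for re-creating an array with 1.2 metadata.
# Usage: recreate_plan <md device> <raid level> <member device>...
recreate_plan() {
    md=$1
    level=$2
    shift 2
    echo "mdadm --stop $md"
    for dev in "$@"; do
        echo "mdadm --zero-superblock $dev"
    done
    echo "mdadm --create $md --metadata=1.2 --level=$level --raid-devices=$# $*"
}

recreate_plan /dev/md0 5 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
```

Reviewing the printed plan before typing anything into a root shell also gives a record of the create command, which is worth keeping.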

korban wrote:
I disabled kernel autodetect of software raid by using kernel parameter "raid=noautodetect" and added mdraid to runlevel "boot". Same problem after reboot. (mdraid says it can only be used with baselayout-2 - I have 1.12.14-r1).

Been there too ;) I upgraded to baselayout-2 and openrc (which wasn't really hard), but it didn't solve my problem.

I was able to find some more info which I haven't posted yet. With mdadm-3.1.4 I could not read out the superblock of /dev/sdb (all other disks were doing fine). What I find strange is that the drives are identical, bought at the same time and have been initialized at the same time, so why did only /dev/sdb fail?

Hope you got your data back too!
Have a nice day.
korban
n00b

Joined: 04 Feb 2003
Posts: 37
Location: Vienna, Austria

Posted: Thu Dec 02, 2010 11:08 am

Hi!

@ocbMaurice: Thanks for the info, upgrading to baselayout-2 would have been my next step.

I did get my data back, but I'm still looking for another solution rather than re-creating raid.
I have 3 raid arrays (5.4TB, 2.7TB and 1.4TB) and two of them are heavily used production systems - migrating the data would take very long!

Thanks,
korban
miroR
l33t

Joined: 05 Mar 2008
Posts: 826

Posted: Fri Dec 03, 2010 12:11 am

Hi, folks!
Pasting from:
http://www.mail-archive.com/gentoo-user@lists.gentoo.org/msg100212.html
Code:
> "v0.90 can be used with 'in kernel autodetect' (i.e. partition type 0xfd).
> v1 cannot (I consider this an improvement :-)"

And that is Kerin Millar quoting Neil Brown while replying to Mark Knecht.
I thank both of them now, Mark for his resolute and stubborn work with unbootable systems due to mdadm issues, and Kerin for his identifying the fault and solving of the problem.
I would still be installing grub and wondering which drivers were buggy or whatever if it weren't for their correspondence that I stumbled eventually upon...
I naively set out to make an easy transition from raid10 to raid6, as I need a SATA slot and have only 6 slots per machine... and I bet that it would go smoothly, but ended up in a mental quagmire of a dozen vain (non-booting) recompiles, what a nightmare...
That quote above was a rope of salvation.
I got my systems back.
It was mdadm, the near total incompatibility between the versions thereof.
The first part of the whole story is elsewhere, and I will try, for others who might get into similar trouble, to just add, to what I already wrote, in this thread you're reading, about how I got my systems back, and point to what the reasons seem to me to have been of the downtime.
So, pls., you will only fully understand the rest of the story if you read the first posts that I wrote here:
https://forums.gentoo.org/viewtopic-t-854436-highlight-mdadm.html
Hope you did and came back.
I was so confident at first:
miroR wrote:
I bet I'll do the transition (I corrected the language now) from raid10 to raid6 smoothly.
...[snip]...

Then I went on to almost blame myself, as I just couldn't make it...
I hope you read the rest, it's all in that thread. I now feel like posting my /dev/md1 with the superblock version 1.2 created with mdadm 3.1.4 with the same command as my previous /dev/md1 with the superblock version 0.90 created with a much older mdadm (which I was able to use from Sysresccd 1.1.5, I believe, that I was clever enough to keep, and is some two to three years old or even older).
The command for both of the /dev/md1 raid6 devices is:
Code:
mdadm -C /dev/md1 -l6 -n5 -c128 /dev/sd[a-e]2

Prior to that command, I was able to ascertain with
Code:
mdadm -E /dev/sda2
etc, for each of the components,
whether another array was using these devices. When that was the case, I would
Code:
mdadm --zero-superblock /dev/sdX2

six times, where X was a, then b... through e (and f for the spare).
OK. Then the command, as I said:
Code:
mdadm -C /dev/md1 -l6 -n5 -c128 /dev/sd[a-e]2

And then, if there is still a SATA HD and not blu-ray there, add the spare:
Code:
mdadm /dev/md1 -a /dev/sdf2

So, the old one, the working one, thanks to which my systems (two of a few boxes that can be mutually cloned between themselves) are back on and booting, hopefully faultlessly, though I will only be able to tell you that later, created with the years-old mdadm version, is:
Code:
/dev/md1:
        Version : 0.90
  Creation Time : Thu Dec  2 21:52:13 2010
     Raid Level : raid6
     Array Size : 19277568 (18.38 GiB 19.74 GB)
  Used Dev Size : 6425856 (6.13 GiB 6.58 GB)
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec  3 00:17:13 2010
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : e7913fc8:98140c0e:b744d456:635138f8
         Events : 0.8

    Number   Major   Minor   RaidDevice State
       0       8       66        0      active sync   /dev/sde2
       1       8       82        1      active sync   /dev/sdf2
       2       8        2        2      active sync   /dev/sda2
       3       8       18        3      active sync   /dev/sdb2
       4       8       34        4      active sync   /dev/sdc2

       5       8       50        -      spare   /dev/sdd2


And the new one, the incompatible one, which seems not to be autodetected by the kernel, which I couldn't create in any compatible way and which, as I'm sure you understand by now, can't be booted no matter what, is:
Code:
/dev/md1:
        Version : 1.2
  Creation Time : Wed Dec  1 00:16:53 2010
     Raid Level : raid6
     Array Size : 19274880 (18.38 GiB 19.74 GB)
  Used Dev Size : 6424960 (6.13 GiB 6.58 GB)
   Raid Devices : 5
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Thu Dec  2 20:39:46 2010
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

           Name : sysresccd:1  (local to host sysresccd)
           UUID : 3a921635:45f5b2cd:ec738fb7:233635f3
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2
       4       8       82        4      active sync   /dev/sdf2

       5       8       98        -      spare   /dev/sdg2

Surely there's no cloning there without further expertise and more understanding of computing than I can boast of... if there's any cloning possible at all, no matter how clever you are!
These arrays are totally different and totally incompatible, as there was no way to boot into that root partition (I wrote how my systems are set up in the abovementioned thread)!
Until, that is, I reverted to reconstructing of the /dev/md1 with that very old mdadm, that gave me /dev/md1 (and other raid arrays) that worked up until now for all of these years.
That much on what I can tell for people possibly having issues similar to mine.
I do wonder, though, whether support for superblock version 0.90 might be dropped entirely in the future, and I am open to looking into that further if my fears turn out to be real.
I mean, I am not an expert and even only finding this much took me real time.
I wonder how Alex who started the thread (are you still there, Alex?), got his raid to boot, if I understand well, by updating to superblock version 1.2?
Is it possibly because my booting kernel is on a /dev/md0 raid1 device of superblock version 0.90, and if I moved both the boot /dev/md0 and the root /dev/md1 to version 1.2 superblocks, would I solve the issue and get the system to boot?
I doubt it, really, as the quote that I started this post with says otherwise, but...
Ok. I'm done.
It killed lots of my time all this useless mess!
What an incompatibility! I'll be amazed for days to come!
They got possibly dozens of thousands of computers using their old mdadm format, and they decide to just drop the format and rewrite the whole program with a new one, and let the users figure out just what exactly happened once those poor users simply find that their systems are unbootable just because they updated them, and there is no announcement nor any normal way but days on end of fault-finding detective-like work to do, to find out what happened...
And it was an improvement, the no more "in kernel autodetect"... God! I'm sorry for the guys who made this mess, as well as for my own useless plight!
End of my ramblings!
Hope this helps!
miroR
l33t

Joined: 05 Mar 2008
Posts: 826

Posted: Wed Nov 16, 2011 6:49 pm    Post subject: Disappearing documents in history of open source programming

miroR wrote:
Hi, folks!
Pasting from:
http://www.mail-archive.com/gentoo-user@lists.gentoo.org/msg100212.html
Code:
> "v0.90 can be used with 'in kernel autodetect' (i.e. partition type 0xfd).
> v1 cannot (I consider this an improvement :-)"

And that is Kerin Millar quoting Neil Brown while replying to Mark Knecht.
I thank both of them now, Mark for his resolute and stubborn work with unbootable systems due to mdadm issues, and Kerin for his identifying the fault and solving of the problem.
...

That link above refers to another one that would prove the point completely (IIRC), but the other one is now (Wed Nov 16 19:30:25 CET 2011) dead.

I call that:
Documents that make for history of open source programming disappearing

Oh well, things are bigger than us little users.

But never mind, the mdadm guys didn't force us little users to go the initramfs (and associates) way just to have our kernels boot from our RAID-6 or similar devices.

I found the option that accomplishes creation of the old style RAID devices that can be booted by Linux kernels without the help of initramfs.

Here is what I went on and did because of the need that arose in these circumstances:
https://forums.gentoo.org/viewtopic-t-901036-highlight-.html#6873636
but if you want to find it pronto, just skim through to, say, the word raid6 (Ctrl + F raid6).

Code:
myBox # mdadm -D /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Sun Dec  5 14:37:23 2010
     Raid Level : raid6
     Array Size : 7228800 (6.89 GiB 7.40 GB)
  Used Dev Size : 2409600 (2.30 GiB 2.47 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Nov 16 06:35:41 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
         Events : 0.76

    Number   Major   Minor   RaidDevice State
       0       8       69        0      active sync   /dev/sde5
       1       8       53        1      active sync   /dev/sdd5
       2       8        5        2      active sync   /dev/sda5
       3       8       21        3      active sync   /dev/sdb5
       4       8       37        4      active sync   /dev/sdc5
myBox # mdadm -S /dev/md2
mdadm: stopped /dev/md2
myBox # mdadm -C /dev/md2 -l 5 -e 0.90 -c 128 -n 5 /dev/sda5  /dev/sdb5  /dev/sdc5  /dev/sdd5  /dev/sde5
mdadm: /dev/sda5 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sun Dec  5 14:37:23 2010
mdadm: /dev/sdb5 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sun Dec  5 14:37:23 2010
mdadm: /dev/sdc5 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sun Dec  5 14:37:23 2010
mdadm: /dev/sdd5 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sun Dec  5 14:37:23 2010
mdadm: /dev/sde5 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sun Dec  5 14:37:23 2010
Continue creating array? yes
mdadm: array /dev/md2 started.
myBox # mdadm -D /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Nov 16 10:15:48 2011
     Raid Level : raid5
     Array Size : 9638400 (9.19 GiB 9.87 GB)
  Used Dev Size : 2409600 (2.30 GiB 2.47 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Nov 16 10:18:20 2011
          State : clean, degraded, recovering
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

 Rebuild Status : 8% complete

           UUID : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (local to host myBox)
         Events : 0.5

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
       2       8       37        2      active sync   /dev/sdc5
       3       8       53        3      active sync   /dev/sdd5
       5       8       69        4      spare rebuilding   /dev/sde5
myBox #


Pls. bear in mind that 0.90 here is the superblock (metadata) format version, not the mdadm version; the installed mdadm is much newer:

Code:
myBox # mdadm --version
mdadm - v3.1.5 - 23rd March 2011
myBox #

So, the important option (I apologize if bigger guys find this obtrusive, but I always keep in mind how real beginners way back like I was years back, need a lot of easy explanation...) is:
Code:
-e 0.90


I hope they keep this option available in the future as well!
miroR
l33t

Joined: 05 Mar 2008
Posts: 826

Posted: Sat Nov 19, 2011 2:47 pm    Post subject: Re: Disappearing documents in history of open source program

miroR wrote:

So, the important option (I apologize if bigger guys find this obtrusive, but I always keep in mind how real beginners way back like I was years back, need a lot of easy explanation...) is:
Code:
-e 0.90


I hope they keep this option available in the future as well!

God, was I wrong!!?
I can't even boot into this box anymore!
Well I can, but not from /boot
And the /boot is on /dev/md0 RAID1 device. Nothing apparently to do with /dev/md2 RAID5 device. But, and this is a paste from the current manual of mdadm:
Quote:

-e, --metadata=
Declare the style of RAID metadata (superblock) to be used. The
default is 1.2 for --create, and to guess for other operations.
The default can be overridden by setting the metadata value for the
CREATE keyword in mdadm.conf.

Options are:


0, 0.90
Use the original 0.90 format superblock. This format limits
arrays to 28 component devices and limits component devices
of levels 1 and greater to 2 terabytes. It is also possible
for there to be confusion about whether the superblock
applies to a whole device or just the last partition, if
that partition starts on a 64K boundary.


The confusion, I think, in my box, was total!
BTW, on another box, I have
Code:
# mdadm --version
mdadm - v3.1.4 - 31st August 2010

And pls. take notice that the mdadm author himself has obviously discovered and documented this wrong and unexpected behavior, obviously after that version! The following is a paste.
Quote:

...
-e, --metadata=
Declare the style of RAID metadata (superblock) to be used. The
default is 1.2 for --create, and to guess for other operations.
The default can be overridden by setting the metadata value for the
CREATE keyword in mdadm.conf.

Options are:


0, 0.90
Use the original 0.90 format superblock. This format limits
arrays to 28 component devices and limits component devices
of levels 1 and greater to 2 terabytes.


1, 1.0, 1.1, 1.2 default
...
miroR
l33t

Joined: 05 Mar 2008
Posts: 826

Posted: Sat Nov 19, 2011 3:06 pm    Post subject: I think I am now trying to move away from booting from RAID

I'm seeking help also here now:
http://en.gentoo-wiki.com/wiki/Talk:Grub2#Reconstruct_a_completely_new_boot_partition_with_grub2_.3F
miroR
l33t

Joined: 05 Mar 2008
Posts: 826

Posted: Thu Nov 24, 2011 5:15 am    Post subject: Re: I think I am now trying to move away from booting from R

miroR wrote:
I'm seeking help also here now:
http://en.gentoo-wiki.com/wiki/Talk:Grub2#Reconstruct_a_completely_new_boot_partition_with_grub2_.3F

That was uninformed of me to do, to put it mildly.
I studied for a few days, and didn't succeed in building initramfs for my purposes.
I tried with different kernels, rebuilding a few, new and old, and only recently I am able to boot normally into my system.
In short, my best guess is that after the above changes (pls. see esp.
https://forums.gentoo.org/viewtopic-t-852333-highlight-.html#6876910), mdadm, with its untraceable changes to the naming of my raid6 devices and aided by the unpredictable behavior of my main board (a roughly 6 yrs old Abit AT8-32X; I have it in 3 (4) systems that are perfectly cloneable once a single one of them is in good order, otherwise all that compiling time after time wouldn't pay off), kept needing a completely different root argument on the kernel line.
I have five 250GB drives, 5+ yrs old technology, in every system, arranged mostly in raid6 devices: one for the root system, one for swap, and a separate raid6 device solely for data.
I have separate /boot partitions, and the root system holds everything else that pertains to the system.
/boot was on a raid1 software raid (the mirror raid).
That /boot device just couldn't be found anymore, no matter what!
I really think that it was the "-e 0.90" in the command in the link above that made them "disappear"... along with my MBO's magical unpredictability in the order of drives shown to the kernel.
What I did is this:
Code:
# dd if=/dev/md0 | gzip -6c | split -d -b1085m - date-time_system_boot_md0.dd

to get a file, only 1 file in this case:
date-time_system_boot_md0.dd00, because the boot is only 300MB or so. The "split -d -b1085m -" part is redundant in this case, but for backing up my root filesystem (with /usr, /var and /home all in it) it is rather necessary, so I keep it in all the time.
If someone finds this useful, just use your date and time, and your system (host) name and so on in the naming.
I stopped and --zero-superblock'ed the raid1 array's member devices. (I had two more mirror raids there, because you never knew which disk the MBO would decide was the first disk... My MBO made very unpredictable decisions in that respect, so I had them all as clones of each other; one of the three, of course, had a missing device.)
And perfectly functional I got 5 separate devices /dev/sda1 to /dev/sde1 with:
Code:
cat date-time_system_boot_md0.dd00 | gunzip | dd of=/dev/sda1

and so on (just substitute a with b, then c, d and finally e).
Now the boot device, the first disk, actually the first two boot devices, like /dev/sda1 and /dev/sdb1, finally became visible. (The first disk is not necessarily the same one that was /dev/sda1 the last time I booted my system. Believe you me, it's not predictable, well, not easily, and why bother, read on! But I cloned them all equal, so it didn't matter.)
That didn't really mean that I could boot into my system, though!
No!
Now we got the beautiful joyous changes ever since the developers behind mdadm decided it wasn't for common users to manage software raids, but that it should be made difficult for them, ruining days and days of their lives getting their old mdadm-made raid5, raid6 and raid10 systems to work with the unavoidable new mdadm versions, which aren't made for common users anymore because they no longer let kernels autodetect them!
OK, let go of it, let go!
Forget that issue!
I haven't, as I said, managed to build an initramfs that would give the kernels and the raids of mine a proper understanding of each other, and I did spend a few days studying how to do it.
I think I'll have to go back to that issue with plenty more working hours yet, but I am actually booting into my system again at this time.
Somehow, in the end, one of my kernels that I built, started recognizing my raid6 where the root ("/") is under the name:
Code:
/dev/md125

Not that it was created as such, neither is it in my fstab as such, but it seems now to need this kernel boot line:
Code:
root=/dev/md125

and then it boots OK.
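
Since the md number can apparently differ from what the array was created as, one way to find out which mdNNN currently holds a given member partition is to look it up in /proc/mdstat before editing the kernel line. A small sketch, not a fix: the helper name md_for_member is made up, and the sample lines are copied from Alex's mdstat output earlier in this thread.

```shell
# Print the md device that currently contains the given member partition.
# Reads /proc/mdstat-style text on stdin; $1 is a name like "sda4".
md_for_member() {
    awk -v m="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                dev = $i
                sub(/\[.*/, "", dev)      # "sda4[0]" -> "sda4"
                if (dev == m) { print "/dev/" $1; exit }
            }
        }'
}

# On a live system: md_for_member sda4 < /proc/mdstat
md_for_member sda4 <<'EOF'
md125 : active (auto-read-only) raid10 sda4[0] sdb4[1]
      33365888 blocks 64K chunks 2 far-copies [2/2] [UU]
md126 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      2097088 blocks [2/2] [UU]
EOF
# prints: /dev/md125
```

The printed name is what would go into the root= argument for that boot.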
I'm afraid my mdadm woes are really not over.
I might be back to find out if anyone has a good understanding and clear tips on how to compile mdadm in:
Code:
/usr/src/initramfs

with:
Code:
CONFIG_INITRAMFS_SOURCE="/usr/src/initramfs"

in the kernel and alike.
The guides that I read under domain gentoo.org and elsewhere haven't been sufficient for this mdadm problem on my hands.
I might be back to give the links here of the guides that I used, but I am really tired right now.
Thanks in advance if someone with similar problems, who has made the transition to a system where initramfs, kernel and mdadm happily communicate and agree, can give more insight into all this!
All times are GMT