Gentoo Forums :: Kernel & Hardware

mdadm-3.2.4 skips many partitions but still starts...!

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Mon May 14, 2012 11:07 am    Post subject: mdadm-3.2.4 skips many partitions but still starts...!

Hello,

So I just upgraded to mdadm-3.2.4 and gentoo-sources-3.3.5 at the same time, followed by a reboot. (The dreaded udev-182 upgrade happened before the previous reboot and has been fine, so that's ruled out.) The result was that none of my three mdadm/raid5 volumes came up properly. Downgrading to mdadm-3.2.3-r1 (still with the newer kernel) has got things working again, so mdadm definitely seems to be the baddie.
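
(In case anyone else needs the workaround, the downgrade was just the usual mask-and-re-emerge, roughly like this; adjust the version atom to whatever is still in the tree:)
Code:
echo ">sys-fs/mdadm-3.2.3-r1" >> /etc/portage/package.mask
emerge -1v sys-fs/mdadm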

My array layout is best explained with my /etc/mdadm.conf:
Code:
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1
ARRAY /dev/md1 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
ARRAY /dev/md2 devices=/dev/sdb3,/dev/sdc3,/dev/sdd3

Note that I don't have DEVICE lines in this file, and kernel autodetection is still enabled; IIRC these ARRAY lines were intended as a backup in case the kernel/udev changed the raid device names, which has happened before.
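
For reference, if I did add a DEVICE line it would presumably look something like this (just a sketch; DEVICE partitions tells mdadm to consider every device listed in /proc/partitions):
Code:
# hypothetical addition at the top of /etc/mdadm.conf
DEVICE partitions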

The root partition is on /dev/sda1, not part of any array. Nothing "system" is on the arrays, apart from /var/tmp and /home (in fact these are in luks volumes on one of the arrays, but I digress).

Now that I'm back to a working system (that is, including the stuff that depends on the arrays), I need to use it for the next few hours, but I can revert to the broken mdadm later tonight and get any output you request to help with diagnosis.

What I can say (from memory; the output's gone out the top of my console buffer now, sadly) is that /proc/mdstat was showing only one partition for each raid volume, and each was flagged 'inactive'. However, mdadm had not failed, and nagios did not see anything to complain about (that seems very wrong...). One obvious indicator that things were wrong was that the partitions on one of the arrays (md1p1 and md1p2; the other arrays are used as raw devices) did not show up in /dev.

Something else that I noticed while troubleshooting, which just seems odd (although it's the same with all combos of new/old kernel/mdadm), is the contents of /dev/disk/by-partuuid and /dev/disk/by-uuid:
Code:
hazel linux # ls -l /dev/disk/by-{part,}uuid
/dev/disk/by-partuuid:
total 0
lrwxrwxrwx 1 root root 10 May 14 11:05 34debc60-f880-4808-acba-fd5da4d105f4 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 14 11:05 5d1cf95c-0dd3-4d85-974d-6fb0dc33ede8 -> ../../sdc3
lrwxrwxrwx 1 root root 10 May 14 11:05 6377b145-5947-4c11-b953-3b94348e057c -> ../../sdc2
lrwxrwxrwx 1 root root 10 May 14 11:05 6a266299-683b-44a8-b022-d1605c0044f5 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 14 11:05 8f5aeb1b-2351-431d-b2f3-ddbd38ce7042 -> ../../sda2

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 11 May 14 11:05 1a22285c-aa39-49d6-b75e-65a30aa7ae76 -> ../../md1p2
lrwxrwxrwx 1 root root 10 May 14 11:05 233eec75-460b-4505-9d20-b7ce2a5517fc -> ../../dm-0
lrwxrwxrwx 1 root root 10 May 14 11:05 5b8bee5f-7482-4054-b095-23f0dafe9cf0 -> ../../dm-1
lrwxrwxrwx 1 root root  9 May 14 11:05 8177c540-d573-4b6e-be97-a179b177eda8 -> ../../md2
lrwxrwxrwx 1 root root 10 May 14 11:05 96ca2c9f-76cf-469d-b3e4-a1e65ff04b1e -> ../../dm-2
lrwxrwxrwx 1 root root 11 May 14 11:05 9d1182dc-f309-4ff8-9824-dd4ae7ab7fd6 -> ../../md1p1
lrwxrwxrwx 1 root root 10 May 14 11:05 c1297020-b05c-4656-84b1-e91eba898163 -> ../../sda1


Why such a strange cross-section of the devices? These may have always been like this, but I have a funny feeling (I'll check later) that the partitions shown in by-partuuid are the same (and only) ones that showed up in /proc/mdstat.

The drives sd[b,c,d] are identical in hardware and (GPT) partitioning. I achieved this by setting up one drive and copying its partition table to the others. I did give them new GPT labels, but perhaps there's something else I failed to do there that causes confusion? If so, why did it only become a problem now?
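
I don't remember the exact tool I used for the copy, but it would have been something along these lines (a sketch; the -G step is the part I may have skipped, and it's what gives each copy its own disk and partition GUIDs):
Code:
sgdisk -R=/dev/sdc /dev/sdb   # replicate sdb's partition table onto sdc
sgdisk -G /dev/sdc            # randomize the GUIDs on the copy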

TIA for any ideas on this one. Just let me know what output/config you'd like to see and I'll post it later.

LordVan
Developer


Joined: 28 Nov 2002
Posts: 67
Location: Austria

PostPosted: Mon May 14, 2012 11:21 am

You could try specifying the UUIDs (they can be obtained quite nicely with mdadm --examine).
_________________
I don't suffer from insanity. I enjoy every minute of it.

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Mon May 14, 2012 1:44 pm

Hi LordVan, thanks for the reply :D

I'll certainly try this, though won't the missing /dev symlinks be an issue for that?

Just to sort of answer my own query above, I checked all the UUIDs and they seem OK: the array ones are correctly grouped, the device ones are all unique.
Code:
hazel ~ # for d in b1 c1 d1 a2 b2 c2 d2 b3 c3 d3; do echo sd$d:; mdadm -E /dev/sd$d |grep UUID; done
#1st array
sdb1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21
sdc1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : d02b97d2:046c7db7:4f01603d:9d81af26
sdd1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : 00a0cc11:6ce72287:d891130c:2a062177

#2nd array
sda2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : d028712d:d0933028:55d972ff:08c8d432
sdb2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : 23e2de36:26083619:fb16d0d9:67afda4c
sdc2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : 5c348138:d89fa82a:a07a15a3:cebcd0da
sdd2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : bfb91875:2484f14d:262954cd:de178c08

#3rd array
sdb3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 7bb1e742:c1c40936:68d95775:d2d770eb
sdc3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 2cc1655e:6e47f94b:3efcdd57:2e813ca0
sdd3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 79479224:99f98f54:0bde2c88:19ea6721


Also, the manpage isn't clear on this: should my ARRAY lines actually work/do anything without DEVICE lines before them?

LordVan
Developer


Joined: 28 Nov 2002
Posts: 67
Location: Austria

PostPosted: Mon May 14, 2012 2:27 pm

No clue, sorry.

Here are the lines I appended to my mdadm.conf (output from mdadm):

Code:
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4b8679b7:f94e0498:655a214d:2935de26
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=311983c7:5faad243:7720c9cf:fa81c470
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=569e8bb3:43e94fde:36e678bd:2460358c
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=9b3fab54:844b372c:0e61ae67:ef761534


Those work for me, so you can use them as an example.
_________________
I don't suffer from insanity. I enjoy every minute of it.

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Mon May 14, 2012 2:38 pm

Thanks, that does guide me a bit. I thought it was all the device UUIDs that went in, but I take it those are just the array UUIDs.

BTW, do you mean the above is output directly from mdadm? If so, what's the full command if I wanted to do likewise?

LordVan
Developer


Joined: 28 Nov 2002
Posts: 67
Location: Austria

PostPosted: Tue May 15, 2012 5:07 am

I looked it up now (and tried it again since I wanted to make sure):

Code:
mdadm --examine --scan
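
I believe you can also just append the output straight to the config file, e.g. something like:
Code:
mdadm --examine --scan >> /etc/mdadm.conf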

_________________
I don't suffer from insanity. I enjoy every minute of it.

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Tue May 15, 2012 12:05 pm

OK, I changed mdadm.conf to:
Code:
ARRAY /dev/md0 metadata=1.2 UUID=3505e7ec:202fabce:86aee957:c134a8cb name=hazel:0
ARRAY /dev/md1 metadata=1.2 UUID=c988ae5c:f643b427:37db5db0:6627531a name=hazel:1
ARRAY /dev/md2 metadata=1.2 UUID=8e9c0244:726bfd56:30dbfdde:3181043f name=hazel:2


Also, I added "raid=noautodetect" to my kernel boot params, and added mdraid to the boot runlevel per the einfo message on the latest ebuild (I only had mdadm there before).
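
For the record, those two changes were along these lines (the grub.conf kernel line is illustrative, not my exact one):
Code:
# /boot/grub/grub.conf -- append to the kernel line
kernel /boot/vmlinuz-3.3.5-gentoo root=/dev/sda1 raid=noautodetect

# add the mdraid initscript to the boot runlevel
rc-update add mdraid boot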

No improvement :(

Here's an example of /proc/mdstat in broken mode:
Code:
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2
       
md2 : inactive sdd3[3](S)
      261621760 blocks super 1.2
       
md0 : inactive sdc1[1](S)
      3070976 blocks super 1.2
       
unused devices: <none>

I say "example" because the three partitions that appear seem to be different every time.

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Tue May 15, 2012 12:57 pm

Next thing I tried: commenting-out everything in mdadm.conf, kernel autodetection still turned off. Result:
Code:
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md125 : active raid5 sdb1[0] sdc1[1]
      6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
     
md126 : active raid5 sdb3[0] sdd3[3]
      523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
     
md127 : inactive sda2[0](S)
      223689728 blocks super 1.2
       
md0 : inactive sdd1[3](S)
      3070976 blocks super 1.2
       
md2 : inactive sdc3[1](S)
      261621760 blocks super 1.2
       
md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2
       
unused devices: <none>

So, a whole different muddle. Is any of this helping?

EDIT: And here it is with autodetect turned back on:
Code:
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md125 : active raid5 sdb1[0] sdd1[3]
      6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
     
md126 : active raid5 sdb3[0] sdd3[3]
      523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
     
md127 : inactive sda2[0](S)
      223689728 blocks super 1.2
       
md2 : inactive sdc3[1](S)
      261621760 blocks super 1.2
       
md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2
       
md0 : inactive sdc1[1](S)
      3070976 blocks super 1.2
       
unused devices: <none>


Note that in both cases the drives that are assigned are different, and some are missing.

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Tue May 15, 2012 2:09 pm

Right, I think I've now tried every combo of settings I could think of. Finally, I did:

mdadm-3.2.3-r1
kernel version: 3.3.5 (newer)
kernel autodetect: on
mdraid in boot runlevel: yes
mdadm.conf: empty

Works perfectly, device names as before.

I don't think I'm doing anything wrong here, so I'm gonna bugreport it.

djdunn
l33t


Joined: 26 Dec 2004
Posts: 810

PostPosted: Wed May 16, 2012 7:36 am

This is most likely all down to superblocks, especially version 1.2, which doesn't work with kernel autodetection. There's also the business of mdadm naming arrays md126, md127 and so on; I just gave in and let mdadm win that fight, since I didn't care that much, and changed my system accordingly. If you wipe and rewrite your superblocks you can probably get it going with the newer versions.
_________________
“Music is a moral law. It gives a soul to the Universe, wings to the mind, flight to the imagination, a charm to sadness, gaiety and life to everything. It is the essence of order, and leads to all that is good and just and beautiful.”

― Plato

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Wed May 16, 2012 10:14 am

From my findings before, I assumed that the kernel had been doing the assembly (since I didn't have mdraid in init, only mdadm). But then I read this in the kernel docs:
/usr/src/linux/Documentation/md.txt wrote:
Boot time autodetection of RAID arrays
--------------------------------------

When md is compiled into the kernel (not as module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.

The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable.


No idea what "type 0xfd" or "type 0 superblock" mean, though; anyone?

FWIW, here's the full info from one partition and its array:
Code:
hazel ~ # mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
           Name : hazel:0  (local to host hazel)
  Creation Time : Tue Feb 14 18:58:30 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 6141952 (2.93 GiB 3.14 GB)
     Array Size : 12283392 (5.86 GiB 6.29 GB)
  Used Dev Size : 6141696 (2.93 GiB 3.14 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21

    Update Time : Wed May 16 10:55:10 2012
       Checksum : 73f249d8 - correct
         Events : 18

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing)

hazel ~ # mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Feb 14 18:58:30 2012
     Raid Level : raid5
     Array Size : 6141696 (5.86 GiB 6.29 GB)
  Used Dev Size : 3070848 (2.93 GiB 3.14 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed May 16 10:55:10 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : hazel:0  (local to host hazel)
           UUID : 3505e7ec:202fabce:86aee957:c134a8cb
         Events : 18

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Wed May 16, 2012 10:55 am

Also, I should mention that even with the new/bad mdadm, if I stop all the malformed arrays and then issue mdadm -As (as the mdraid initscript does), everything assembles perfectly, so whatever goes wrong is specific to the boot/init process.
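
In other words, roughly this (the exact md device names vary from boot to boot):
Code:
mdadm --stop --scan    # stop every half-assembled array
mdadm -As              # assemble from scratch: works fine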

Curiouser and curiouser...

djdunn
l33t


Joined: 26 Dec 2004
Posts: 810

PostPosted: Thu May 17, 2012 10:00 am

Type 0xfd is the "Linux raid autodetect" partition type.
Kernel autodetection does not work with superblock version 1.2.

With superblock version 1.2 you can only boot with / on raid by running mdadm from an initramfs, NOT dmraid.
_________________
“Music is a moral law. It gives a soul to the Universe, wings to the mind, flight to the imagination, a charm to sadness, gaiety and life to everything. It is the essence of order, and leads to all that is good and just and beautiful.”

― Plato

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Thu May 17, 2012 10:29 am

djdunn wrote:
Type 0xfd is the "Linux raid autodetect" partition type.
Kernel autodetection does not work with superblock version 1.2.

With superblock version 1.2 you can only boot with / on raid by running mdadm from an initramfs, NOT dmraid.


OK, so with my previous settings (or now) the kernel was not doing autodetection, but since at that point mdadm was not doing the assembly either (mdraid was not in the boot runlevel, and the mdadm initscript doesn't assemble arrays, just monitors them), how did/does it work? I take it udev must be involved, but it does seem conclusive that mdadm is the package to blame...

Huh?

What exactly does udev do as part of this process?
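
My (possibly wrong) understanding is that mdadm ships a udev rule that incrementally assembles arrays as each member device appears, i.e. effectively running something like this per partition:
Code:
mdadm --incremental /dev/sdb1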

PS: Not sure if you were being specific or illustrative, but just to reiterate, my / isn't on raid. Also, isn't dmraid a different package?

esperto
Apprentice


Joined: 27 Dec 2004
Posts: 158
Location: Brazil

PostPosted: Sat May 19, 2012 8:25 pm

I've just had the same problem: I upgraded mdadm to version 3.2.4, and when I rebooted, md0 was inactive and showed only my sdd1 partition as part of it. I immediately rolled back to version 3.2.3-r1 and everything was back to normal. Definitely something fishy here.

Below is what I have in mdadm.conf:
Code:
ARRAY /dev/md0 metadata=1.2 name=htpc:0 UUID=b1aa480d:af00e6fd:c35876b8:00ae9e55

I'm currently running kernel 3.2.12 from gentoo-sources and I don't have mdraid in the boot runlevel.

What should I do? Just keep the 3.2.3-r1 version for now and wait for an update, or clean out mdadm.conf and add mdraid to the boot sequence? I'm afraid removing the ARRAY line from mdadm.conf will screw up my raid.
_________________
nasci pelado, careca e sem dente, o que vier é lucro

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Sun May 20, 2012 8:19 am

Hi esperto, nice to hear I'm not alone :D

If your issue is the same, I don't think it'll matter what you do if you upgrade again: I've flipped every variable I could think of and still no joy. If you're willing to try it again, though, please test the following and add your findings to my bug report. Useful info would be:

* Does it work using only mdadm (i.e. if you add raid=noautodetect to your GRUB boot command line and add mdraid to the boot runlevel)? In your case, does that work even with mdadm-3.2.3-r1?

* Any change if you comment-out the ARRAY line in mdadm.conf? (For me, with 3.2.3-r1 it still works anyway.)

* With mdadm-3.2.4, if you do mdadm -S <each array device mentioned in mdstat>, then mdadm -As, does the array assemble correctly?

* Are your RAID partitions GPT or MS-DOS? (A quick way to check is shown below.) How many drives in the array, what RAID level, etc.
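
For the GPT/MS-DOS question, something like this should tell you; the "Partition Table:" line for each disk will say either gpt or msdos:
Code:
parted -l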

rcb1974
n00b


Joined: 12 Mar 2003
Posts: 56
Location: Ithaca, NY, USA

PostPosted: Mon May 21, 2012 7:10 pm    Post subject: I have the same problem

I have the same problem after upgrading mdadm.
I'm now using mdadm 3.2.4 and vanilla sources 3.3.6.
All my v0.9 superblock software RAID1 arrays (except the /dev/md0 root volume) no longer get autoassembled.
In order to mount the arrays, I first have to stop them, and then assemble them.

Example:

Code:
mdadm --stop /dev/md1
mdadm --assemble /dev/md1
mount /dev/md1

Havin_it
Veteran


Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

PostPosted: Mon May 21, 2012 9:44 pm

Interesting - since md0 is your root, I guess you have an initramfs with mdadm in it. Can you think of any other differences between md1 (and the others) and md0?

Also, does md1 (and others) come up correctly if you just do "mdadm -As"?