Havin_it Veteran
Joined: 17 Jul 2005 Posts: 1246 Location: Edinburgh, UK
Posted: Mon May 14, 2012 11:07 am Post subject: mdadm-3.2.4 skips many partitions but still starts...! |
Hello,
So I just upgraded to mdadm-3.2.4 and gentoo-sources-3.3.5 at the same time, followed by a reboot. (The dreaded udev-182 happened before the previous reboot, and has been fine, just to eliminate that.) The result was that none of my 3 mdadm/raid5 volumes came up properly. Downgrading to mdadm-3.2.3-r1 (still with the newer kernel) has got things working again, so mdadm definitely seems to be the baddie.
My array layout is best explained with my /etc/mdadm.conf:
Code: | ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1
ARRAY /dev/md1 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
ARRAY /dev/md2 devices=/dev/sdb3,/dev/sdc3,/dev/sdd3 |
Note I don't have DEVICE lines in this file, and kernel autodetection is still enabled; IIRC the explicit device= lists were intended as a backup in case the kernel/udev changed the drive names, which has happened before.
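(If I'm reading the mdadm.conf manpage right — and I'm not certain — leaving DEVICE out should be equivalent to scanning every block device listed in /proc/partitions, i.e. the same as writing:

```
# What I *think* the implicit default is when no DEVICE line exists:
DEVICE partitions
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1
```

so the ARRAY lines ought to be honoured either way.)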
The root partition is on /dev/sda1, not part of any array. Nothing "system" is on the arrays, apart from /var/tmp and /home (in fact these are in luks volumes on one of the arrays, but I digress).
Now that I'm back to a working system (that is, including the stuff that depends on the arrays), I need to use it for the next few hours, but I can revert to the broken mdadm later tonight and get any output you request to help with diagnosis.
What I can say (from memory; the output's gone out the top of my console buffer now, sadly) is that /proc/mdstat was showing only one partition for each raid volume, and each was flagged 'inactive'. However mdadm had not failed, and nagios did not see anything to complain about (that seems very wrong...). One obvious indicator that things were wrong was that the partitions on one of the arrays (md1p1 and md1p2; the other arrays are used as raw devices) did not show up in /dev.
Something else that I noticed while troubleshooting, and just seems odd (although it's the same with all combos of new/old kernel/mdadm) is the contents of /dev/disk/by-{part}uuid:
Code: | hazel linux # ls -l /dev/disk/by-{part,}uuid
/dev/disk/by-partuuid:
total 0
lrwxrwxrwx 1 root root 10 May 14 11:05 34debc60-f880-4808-acba-fd5da4d105f4 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 14 11:05 5d1cf95c-0dd3-4d85-974d-6fb0dc33ede8 -> ../../sdc3
lrwxrwxrwx 1 root root 10 May 14 11:05 6377b145-5947-4c11-b953-3b94348e057c -> ../../sdc2
lrwxrwxrwx 1 root root 10 May 14 11:05 6a266299-683b-44a8-b022-d1605c0044f5 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 14 11:05 8f5aeb1b-2351-431d-b2f3-ddbd38ce7042 -> ../../sda2
/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 11 May 14 11:05 1a22285c-aa39-49d6-b75e-65a30aa7ae76 -> ../../md1p2
lrwxrwxrwx 1 root root 10 May 14 11:05 233eec75-460b-4505-9d20-b7ce2a5517fc -> ../../dm-0
lrwxrwxrwx 1 root root 10 May 14 11:05 5b8bee5f-7482-4054-b095-23f0dafe9cf0 -> ../../dm-1
lrwxrwxrwx 1 root root 9 May 14 11:05 8177c540-d573-4b6e-be97-a179b177eda8 -> ../../md2
lrwxrwxrwx 1 root root 10 May 14 11:05 96ca2c9f-76cf-469d-b3e4-a1e65ff04b1e -> ../../dm-2
lrwxrwxrwx 1 root root 11 May 14 11:05 9d1182dc-f309-4ff8-9824-dd4ae7ab7fd6 -> ../../md1p1
lrwxrwxrwx 1 root root 10 May 14 11:05 c1297020-b05c-4656-84b1-e91eba898163 -> ../../sda1 |
Why such a strange cross-section of the devices? These may have always been like this, but I have a funny feeling (I'll check later) that the partitions shown in by-partuuid are the same (and only) ones that showed up in /proc/mdstat.
The drives sd[bcd] are identical in hardware and (GPT) partitioning. I achieved this by setting up one drive and copying its partition table to the others. I did give them new GPT labels, but perhaps there's something else I failed to do there that causes confusion? If so, why did it only become a problem now?
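One candidate I should check: if the raw table copy also duplicated the partition GUIDs (which /dev/disk/by-partuuid keys on), udev would collapse several partitions onto a single symlink, which would look a lot like the sparse by-partuuid listing above. Assuming gptfdisk is installed, something like this (device names illustrative) would give a cloned table fresh GUIDs:

```
# Back up the source table, copy it to the clone, then give the
# clone fresh disk and partition GUIDs so by-partuuid links stay unique.
sgdisk --backup=/tmp/table.gpt /dev/sdb       # save sdb's GPT
sgdisk --load-backup=/tmp/table.gpt /dev/sdc  # replicate onto sdc
sgdisk -G /dev/sdc                            # randomize sdc's GUIDs
```

Whether stale duplicate GUIDs could confuse the new mdadm is pure speculation on my part, though.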
TIA for any ideas on this one. Just let me know what output/config you'd like to see and I'll post it later.
LordVan Developer
Joined: 28 Nov 2002 Posts: 67 Location: Austria
Posted: Mon May 14, 2012 11:21 am Post subject: |
You could try specifying the UUIDs (they can be generated quite easily with mdadm --examine).
Havin_it Veteran
Posted: Mon May 14, 2012 1:44 pm Post subject: |
Hi LordVan, thanks for the reply.
I'll certainly try this, though won't the missing /dev symlinks be an issue for that?
Just to sort of answer my own query above, I checked all the UUIDs and they seem OK: the array ones are correctly grouped, the device ones are all unique.
Code: | hazel ~ # for d in b1 c1 d1 a2 b2 c2 d2 b3 c3 d3; do echo sd$d:; mdadm -E /dev/sd$d |grep UUID; done
#1st array
sdb1:
Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21
sdc1:
Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
Device UUID : d02b97d2:046c7db7:4f01603d:9d81af26
sdd1:
Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
Device UUID : 00a0cc11:6ce72287:d891130c:2a062177
#2nd array
sda2:
Array UUID : c988ae5c:f643b427:37db5db0:6627531a
Device UUID : d028712d:d0933028:55d972ff:08c8d432
sdb2:
Array UUID : c988ae5c:f643b427:37db5db0:6627531a
Device UUID : 23e2de36:26083619:fb16d0d9:67afda4c
sdc2:
Array UUID : c988ae5c:f643b427:37db5db0:6627531a
Device UUID : 5c348138:d89fa82a:a07a15a3:cebcd0da
sdd2:
Array UUID : c988ae5c:f643b427:37db5db0:6627531a
Device UUID : bfb91875:2484f14d:262954cd:de178c08
#3rd array
sdb3:
Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
Device UUID : 7bb1e742:c1c40936:68d95775:d2d770eb
sdc3:
Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
Device UUID : 2cc1655e:6e47f94b:3efcdd57:2e813ca0
sdd3:
Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
Device UUID : 79479224:99f98f54:0bde2c88:19ea6721 |
Also, the manpage isn't clear on this: should my ARRAY lines actually work/do anything without DEVICE lines before them?
LordVan Developer
Posted: Mon May 14, 2012 2:27 pm Post subject: |
No clue, sorry.
Here are the lines I appended to my mdadm.conf (output from mdadm):
Code: | ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4b8679b7:f94e0498:655a214d:2935de26
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=311983c7:5faad243:7720c9cf:fa81c470
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=569e8bb3:43e94fde:36e678bd:2460358c
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=9b3fab54:844b372c:0e61ae67:ef761534
|
Those work for me, so you can use them as an example.
Havin_it Veteran
Posted: Mon May 14, 2012 2:38 pm Post subject: |
Thanks, that does guide me a bit. I thought it was all the device UUIDs that went in, but I take it those are just the array UUIDs.
BTW, do you mean the above is output directly from mdadm? If so, what's the full command if I wanted to do likewise?
LordVan Developer
Posted: Tue May 15, 2012 5:07 am Post subject: |
I looked it up now (and tried it again, since I wanted to make sure):
Code: | mdadm --examine --scan |
Havin_it Veteran
Posted: Tue May 15, 2012 12:05 pm Post subject: |
OK, I changed mdadm.conf to:
Code: | ARRAY /dev/md0 metadata=1.2 UUID=3505e7ec:202fabce:86aee957:c134a8cb name=hazel:0
ARRAY /dev/md1 metadata=1.2 UUID=c988ae5c:f643b427:37db5db0:6627531a name=hazel:1
ARRAY /dev/md2 metadata=1.2 UUID=8e9c0244:726bfd56:30dbfdde:3181043f name=hazel:2 |
Also, I added "raid=noautodetect" to my kernel boot params, and added mdraid to the boot runlevel per the einfo message on the latest ebuild (I only had mdadm there before).
No improvement.
Here's an example of /proc/mdstat in broken mode:
Code: | Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md1 : inactive sdb2[1](S)
223689728 blocks super 1.2
md2 : inactive sdd3[3](S)
261621760 blocks super 1.2
md0 : inactive sdc1[1](S)
3070976 blocks super 1.2
unused devices: <none>
|
I say "example" because the three partitions that appear seem to be different every time.
Havin_it Veteran
Posted: Tue May 15, 2012 12:57 pm Post subject: |
Next thing I tried: commenting-out everything in mdadm.conf, kernel autodetection still turned off. Result:
Code: | Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md125 : active raid5 sdb1[0] sdc1[1]
6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md126 : active raid5 sdb3[0] sdd3[3]
523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
md127 : inactive sda2[0](S)
223689728 blocks super 1.2
md0 : inactive sdd1[3](S)
3070976 blocks super 1.2
md2 : inactive sdc3[1](S)
261621760 blocks super 1.2
md1 : inactive sdb2[1](S)
223689728 blocks super 1.2
unused devices: <none> |
So, a whole different muddle. Is any of this helping?
EDIT: And here it is with autodetect turned back on:
Code: | Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty]
md125 : active raid5 sdb1[0] sdd1[3]
6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
md126 : active raid5 sdb3[0] sdd3[3]
523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]
md127 : inactive sda2[0](S)
223689728 blocks super 1.2
md2 : inactive sdc3[1](S)
261621760 blocks super 1.2
md1 : inactive sdb2[1](S)
223689728 blocks super 1.2
md0 : inactive sdc1[1](S)
3070976 blocks super 1.2
unused devices: <none> |
Note that in both cases the drives that are assigned are different, and some are missing.
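To compare these shuffling memberships across reboots more easily, here's a rough sketch (untested against every mdstat variant) that pulls the inactive arrays and their members out of a /proc/mdstat dump:

```python
import re

def inactive_members(mdstat_text):
    """Map each inactive md array in /proc/mdstat output to the
    member devices it grabbed (e.g. {'md1': ['sdb2']})."""
    result = {}
    for line in mdstat_text.splitlines():
        m = re.match(r'(md\d+) : inactive (.+)', line.strip())
        if m:
            name, rest = m.groups()
            # member entries look like "sdb2[1](S)"; keep only the device name
            result[name] = [re.sub(r'\[.*$', '', tok) for tok in rest.split()]
    return result

# The second mdstat dump from this post, as a quick demo:
sample = """\
md125 : active raid5 sdb1[0] sdd1[3]
md126 : active raid5 sdb3[0] sdd3[3]
md127 : inactive sda2[0](S)
md2 : inactive sdc3[1](S)
md1 : inactive sdb2[1](S)
md0 : inactive sdc1[1](S)"""
print(inactive_members(sample))
# prints {'md127': ['sda2'], 'md2': ['sdc3'], 'md1': ['sdb2'], 'md0': ['sdc1']}
```

Run against each boot's dump, it makes the "different every time" pattern easy to diff.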
Havin_it Veteran
Posted: Tue May 15, 2012 2:09 pm Post subject: |
Right, I think I've now tried every combo of settings I could think of. Finally, I did:
mdadm-3.2.3-r1
kernel version: 3.3.5 (newer)
kernel autodetect: on
mdraid in boot runlevel: yes
mdadm.conf: empty
Works perfectly, device names as before.
I don't think I'm doing anything wrong here, so I'm going to file a bug report.
djdunn l33t
Joined: 26 Dec 2004 Posts: 810
Posted: Wed May 16, 2012 7:36 am Post subject: |
This is most likely all down to superblocks, especially version 1.2, which doesn't work with kernel autodetection. There's also the thing where mdadm names arrays md126, md127 and so on; I just gave in and let mdadm win that fight (I didn't care that much) and changed my system accordingly. If you wipe and rewrite your superblocks, you can probably get it working with the newer versions.
Havin_it Veteran
Posted: Wed May 16, 2012 10:14 am Post subject: |
From my findings before, I assumed that the kernel had been doing the assembly (since I didn't have mdraid in init, only mdadm). But then I read this in the kernel docs:
/usr/src/linux/Documentation/md.txt wrote: | Boot time autodetection of RAID arrays
--------------------------------------
When md is compiled into the kernel (not as module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.
The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable. |
No idea what "type 0xfd" or "type 0 superblock" mean though, anyone?
FWIW, here's the full info from one partition and its array:
Code: | hazel ~ # mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
Name : hazel:0 (local to host hazel)
Creation Time : Tue Feb 14 18:58:30 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 6141952 (2.93 GiB 3.14 GB)
Array Size : 12283392 (5.86 GiB 6.29 GB)
Used Dev Size : 6141696 (2.93 GiB 3.14 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21
Update Time : Wed May 16 10:55:10 2012
Checksum : 73f249d8 - correct
Events : 18
Layout : left-symmetric
Chunk Size : 128K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing)
hazel ~ # mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Tue Feb 14 18:58:30 2012
Raid Level : raid5
Array Size : 6141696 (5.86 GiB 6.29 GB)
Used Dev Size : 3070848 (2.93 GiB 3.14 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Wed May 16 10:55:10 2012
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : hazel:0 (local to host hazel)
UUID : 3505e7ec:202fabce:86aee957:c134a8cb
Events : 18
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 49 2 active sync /dev/sdd1 |
Havin_it Veteran
Posted: Wed May 16, 2012 10:55 am Post subject: |
Also, I should mention that even with the new/bad mdadm, if I stop all the malformed arrays and then issue mdadm -As (as the mdraid initscript does), everything assembles perfectly, so whatever goes wrong is specific to the boot/init process.
Curiouser and curiouser...
djdunn l33t
Posted: Thu May 17, 2012 10:00 am Post subject: |
Type 0xfd is the "Linux raid autodetect" partition type.
Kernel autodetection does not work with superblock version 1.2.
With a version 1.2 superblock you can only boot with / on RAID by running mdadm from an initramfs, NOT dmraid.
Havin_it Veteran
Posted: Thu May 17, 2012 10:29 am Post subject: |
djdunn wrote: | type 0xfd is partition type linux raid
autodetection via kernel does not work with superblock version 1.2
superblock version 1.2 you can only boot / on raid by running mdadm with an initramfs NOT dmraid |
OK, so in my previous setup (and now) the kernel was not doing autodetection, but since at that point mdadm was not doing assembly either (mdraid was not in the boot runlevel, and the mdadm initscript doesn't assemble arrays, just monitors them), how did/does it work? I take it udev must be involved, but it does seem conclusive that mdadm is the package that's to blame...
Huh?
What exactly does udev do as part of this process?
PS: not sure if you were being specific or illustrative, but just to reiterate, my / isn't on RAID. Also, isn't dmraid a different package?
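My working theory on the udev question (an assumption on my part, not verified against the 3.2.4 sources): mdadm installs a udev rules file that fires incremental assembly as each member device appears, something like:

```
# Sketch of the sort of rule mdadm ships (e.g. 64-md-raid.rules);
# exact match keys vary between versions.
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```

If 3.2.4 changed how or when that incremental path runs at boot, it would fit the symptoms: arrays come up half-built during init, yet a later mdadm -As assembles them fine.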
esperto Apprentice
Joined: 27 Dec 2004 Posts: 158 Location: Brazil
Posted: Sat May 19, 2012 8:25 pm Post subject: |
I've just had the same problem: I upgraded mdadm to version 3.2.4 and when I rebooted, md0 was inactive and showed only my sdd1 partition as part of it. I immediately rolled back to version 3.2.3-r1 and everything was back to normal; definitely something fishy here.
Below is what I have in mdadm.conf:
Code: |
ARRAY /dev/md0 metadata=1.2 name=htpc:0 UUID=b1aa480d:af00e6fd:c35876b8:00ae9e55
|
I'm currently running kernel 3.2.12 from gentoo-sources and I don't have mdraid in the boot runlevel.
What should I do? Just keep the 3.2.3-r1 version for now and wait for an update, or clean out mdadm.conf and add mdraid to the boot sequence? I'm afraid removing the ARRAY line from mdadm.conf will screw up my RAID.
Havin_it Veteran
Posted: Sun May 20, 2012 8:19 am Post subject: |
Hi esperto, nice to hear I'm not alone.
If your issue is the same, I don't think it'll matter what you do if you upgrade again: I've flipped every variable I could think of and still no joy. If you're willing to try it again though, please run through these yourself and add your findings to my bug report. Useful info would be:
* does it work using only mdadm (ie if you add raid=noautodetect to your GRUB boot command-line and add mdraid to boot runlevel)? In your case, does this work with mdadm-3.2.3-r1 either?
* Any change if you comment-out the ARRAY line in mdadm.conf? (For me, with 3.2.3-r1 it still works anyway.)
* With mdadm-3.2.4, if you do mdadm -S <each array device mentioned in mdstat>, then mdadm -As, does the array assemble correctly?
* Are your RAID partitions GPT or MS-DOS? How many in the array, what RAID level, etc.
rcb1974 n00b
Joined: 12 Mar 2003 Posts: 56 Location: Ithaca, NY, USA
Posted: Mon May 21, 2012 7:10 pm Post subject: I have the same problem |
I have the same problem after upgrading mdadm.
I'm now using mdadm 3.2.4 and vanilla sources 3.3.6.
All my v0.9 superblock software RAID1 arrays (except the /dev/md0 root volume) no longer get autoassembled.
In order to mount the arrays, I first have to stop them, and then assemble them.
Example:
Code: | mdadm --stop /dev/md1
mdadm --assemble /dev/md1
mount /dev/md1 |
Havin_it Veteran
Posted: Mon May 21, 2012 9:44 pm Post subject: |
Interesting. I guess since md0 is your root, you have an initramfs with mdadm in it; can you think of any other differences between md1 (and any others) and md0?
Also, does md1 (and the others) come up correctly if you just do "mdadm -As"?