Gentoo Forums :: Kernel & Hardware

md devices being assembled before multipath [SOLVED]
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Jul 22, 2011 4:50 pm    Post subject: md devices being assembled before multipath [SOLVED]

I've recently installed a SAS2 disk array, and I'm having some issues bringing it up (correctly) on boot.

The root filesystem is on a RAID1 (motherboard SATA). The new storage array is in a JBOD attached to two LSI HBAs. multipath is used to map the two paths to one device, those multipath devices are assembled into four 9-disk RAID6 volumes which are part of an LVM2 volume group. A single logical volume exists in this volume group and is formatted with XFS.

This works fine when assembled manually but doesn't come up correctly when rebooting. The problem is that the md devices start before the multipath devices are created; AFAICT multipath fails because the devices are already in use by md.

I've turned off md autodetect in the kernel, and I edited the "before" line in /etc/init.d/multipath to include mdraid. The root device is assembled with a "md=127,/dev/sda1,/dev/sdb1" kernel parameter in grub.conf.
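
For reference, the kernel line in grub.conf looks something like this (the image name and the rest of the entry are illustrative, not copied exactly from my config):
Code:
# /boot/grub/grub.conf excerpt
title Gentoo Linux
root (hd0,0)
kernel /boot/vmlinuz root=/dev/md127 md=127,/dev/sda1,/dev/sdb1 raid=noautodetect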

You can see that the md devices are already assembled when /etc/init.d/mdraid is run. Here's an excerpt from /var/log/rc.log:
Code:
rc boot logging started at Fri Jul 22 19:50:01 2011
 * Setting system clock using the hardware clock [UTC] ... [ ok ]
 * Loading module dm_multipath ... [ ok ]
 * Autoloaded 1 module(s)
 * Activating Multipath devices ... [ ok ]
 * Starting up RAID devices ...
mdadm: /dev/md10 is already in use.
mdadm: /dev/md11 is already in use.
mdadm: /dev/md12 is already in use.
mdadm: /dev/md/126 is already in use.
 [ !! ]
 * Setting up the Logical Volume Manager ... [ ok ]
 * Checking local filesystems  ...
/sbin/fsck.xfs: XFS file system.
/sbin/fsck.xfs: UUID=a5c7abf9-d2bc-4d30-bf08-df08215c48c1 does not exist
/sbin/fsck.xfs: XFS file system.
 * Operational error
 [ !! ]
(...continues)


In dmesg I can see "md: bind<sd*>" lines interspersed between the SCSI discoveries. Excerpt:
Code:
[   17.230502] scsi 6:0:26:0: Direct-Access     SEAGATE  ST32000444SS     0006 PQ: 0 ANSI: 5
[   17.230508] scsi 6:0:26:0: SSP: handle(0x0025), sas_addr(0x5000c50033f6f6ce), phy(14), device_name(0x00c50050cef6f633)
[   17.230511] scsi 6:0:26:0: SSP: enclosure_logical_id(0x500304800000007f), slot(6)
[   17.230515] scsi 6:0:26:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[   17.231283] sd 6:0:23:0: [sdax] Attached SCSI disk
[   17.231454] sd 6:0:25:0: [sdaz] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   17.232243]  sday: unknown partition table
[   17.232835] sd 6:0:26:0: [sdba] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   17.233159] sd 6:0:26:0: Attached scsi generic sg54 type 0
[   17.234782] sd 6:0:26:0: [sdba] Write Protect is off
[   17.234785] sd 6:0:26:0: [sdba] Mode Sense: d7 00 10 08
[   17.235817] md: bind<sdi>


Later in the dmesg output md assembles its devices despite having autoassemble turned off both in the kernel and on the kernel boot line. What gives?


Last edited by wildbug on Fri Mar 01, 2013 5:39 pm; edited 1 time in total
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Jul 22, 2011 5:28 pm

udev's doing this, isn't it?
Code:
# rc-update show sysinit
                devfs | sysinit
                dmesg | sysinit
                 udev | sysinit
# find /lib64/udev -name "*md*"
/lib64/udev/rules.d/64-md-raid.rules
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54097
Location: 56N 3W

PostPosted: Fri Jul 22, 2011 5:34 pm

wildbug,

Kernel auto assembly works with raid superblock version 0.90 only. That's not been the default for about 6 months now.
That change has caused issues for a lot of people who were expecting raid auto assembly to just work.

What version raid superblocks do you have?
Try
Code:
$ sudo /sbin/mdadm -E /dev/sda1
Password:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9392926d:64086e7a:86638283:4138a597
  Creation Time : Sat Apr 11 16:34:40 2009
     Raid Level : raid1
[snip]
on one of the volumes that you donate to raid. I would expect your version to be 1.2, in which case your raid sets must be being assembled by mdadm somewhere.

Kernel auto assembly looks like this in dmesg
Code:
[    2.380529] md: Waiting for all devices to be available before autodetect
[    2.380874] md: If you don't use raid, use raid=noautodetect
[    2.381485] md: Autodetecting RAID arrays.
[    2.503208] md: Scanned 12 and added 12 devices.
[    2.504567] md: autorun ...
[snip]

_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Jul 22, 2011 6:03 pm

Neddy, thanks for the reply, but despite my erroneous reply (and subsequent retraction) in the other thread, I realize that autoassemble only works for 0.90 superblocks. In fact it was my experience described above that made me think that autoassemble was working despite non-0.90 superblocks. However, the root device DOES have v0.90 superblocks and has been autoassembling correctly for months now. But with the recent addition of devices that require multipath to be active before md, I've intentionally turned off autodetect and just added a kernel parameter to assemble the root device (as detailed in my OP).

I'm not trying to get arrays to autoassemble; I'm trying to STOP them from being assembled before multipath is activated. I turned off raid autodetect in the kernel (and passed a redundant "raid=noautodetect" on the kernel line, just in case); non-root arrays are being assembled between kernel boot and the boot runlevel, which is why I'm now wondering if udev is responsible.
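
(The kernel side of that is this config option, for anyone checking their own setup; shown as it appears in a .config, assuming a kernel recent enough to have it:)
Code:
# grep MD_AUTODETECT /usr/src/linux/.config
# CONFIG_MD_AUTODETECT is not set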

(For the record, my root device is 0.90 and the other md devices are a mixture of 1.1 and 1.2. Not that it's relevant.)

Here's the root device being correctly assembled (from dmesg):
Code:
[    6.797329] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[    6.797975] md: Loading md127: /dev/sda1
[    6.798894] md: bind<sda1>
[    6.799383] md: bind<sdb1>
[    6.800027] bio: create slab <bio-1> at 1
[    6.800490] md/raid1:md127: active with 2 out of 2 mirrors
[    6.800847] md127: detected capacity change from 0 to 224063848448
[    6.801501]  md127: unknown partition table
[    6.803088]  md127: unknown partition table
[    6.829729] XFS (md127): Mounting Filesystem
[    6.845067] usb 3-1: new low speed USB device number 2 using ohci_hcd
[    6.859377] XFS (md127): Ending clean mount
[    6.859741] VFS: Mounted root (xfs filesystem) readonly on device 9:127.
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54097
Location: 56N 3W

PostPosted: Fri Jul 22, 2011 7:15 pm

wildbug,

We have established that it's not the kernel doing auto assembly of your 1.1/1.2 raid sets, which is a step in the right direction.

What does
Code:
rc-update show
produce?
mdadm should not be listed, or it will be started in its sequence by the startup scripts.

Just because a service is not listed in rc-update show does not mean it is not running.
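You can check what is actually running, regardless of runlevel membership, with something like
Code:
# list all services OpenRC knows about, with their current state
rc-status --all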
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Jul 22, 2011 7:43 pm

Code:
# rc-update show
             bootmisc | boot                                         
          consolefont | boot                                         
           consolekit |      default                                 
                cupsd |      default                                 
                 dbus |      default                                 
                devfs |                                        sysinit
        device-mapper | boot                                         
                dmesg |                                        sysinit
                 fsck | boot                                         
             hostname | boot                                         
              hwclock | boot                                         
              keymaps | boot                                         
            killprocs |                        shutdown               
                local |      default nonetwork                       
           localmount | boot                                         
                  lvm | boot                                         
                mdadm |      default                                 
               mdraid | boot                                         
              modules | boot                                         
             mount-ro |                        shutdown               
                 mtab | boot                                         
            multipath | boot                                         
           multipathd |      default                                 
             net.eth0 |      default                                 
             net.eth1 |      default                                 
               net.lo | boot                                         
             netmount |      default                                 
                  nfs |      default                                 
             nfsmount |      default                                 
                 ntpd | boot                                         
              pbs_mom |      default                                 
            pbs_sched |      default                                 
           pbs_server |      default                                 
               procfs | boot                                         
                 root | boot                                         
            savecache |                        shutdown               
                 sshd |      default                                 
                 swap | boot                                         
               sysctl | boot                                         
         termencoding | boot                                         
                 udev |                                        sysinit
       udev-postmount |      default                                 
                 upsd |      default                                 
               upsdrv |      default                                 
               upsmon |      default                                 
              urandom | boot                                         
                  vgl |      default                                 
                  xdm |      default                                 
               xinetd |      default


FYI, /etc/init.d/mdadm is the monitoring daemon; it does no assembly. /etc/init.d/mdraid is the one containing "mdadm -As".
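(A quick way to verify that split, for anyone checking their own scripts; the patterns just look for the assemble and monitor invocations:)
Code:
# which script assembles (-As) and which one only monitors (--monitor)?
grep -n -e '-As' -e '--monitor' /etc/init.d/mdadm /etc/init.d/mdraid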

But look at my OP again, specifically the first code block. That's output from the rc_logger for the boot runlevel. You can see the order of services being executed -- hwclock, modules, multipath, mdraid, etc. When it hits mdraid, it declares that the md devices are already assembled. That means that something is putting them together AFTER kernel autodetect and BEFORE mdraid.

When I first posted this, I didn't realize there was a sysinit runlevel before boot. There are three services in sysinit -- devfs, dmesg, and udev. It's possible that udev or devfs is triggering the md devices (which is what I was getting at in my second post). I'm not intimately familiar with either of those.
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Jul 22, 2011 8:19 pm

Here's my complete dmesg: http://pastebin.com/raw.php?i=QGq7wit4

Here's an excerpt of the array assembly timeline:
Code:
[    6.778542] md/raid1:md127: active with 2 out of 2 mirrors
[    7.618775] udev[2822]: starting version 164
[    7.772809] md/raid1:md126: active with 2 out of 2 mirrors
[    8.282715] md/raid:md10: raid level 6 active with 9 out of 9 devices, algorithm 2
[    8.401377] md/raid:md11: raid level 6 active with 9 out of 9 devices, algorithm 2
[   17.420690] md/raid:md125: raid level 5 active with 6 out of 6 devices, algorithm 2
[   17.845425] md/raid:md12: raid level 6 active with 9 out of 9 devices, algorithm 2
[   18.108825] md/raid:md13: raid level 6 active with 9 out of 9 devices, algorithm 2
[   20.816985] device-mapper: multipath: version 1.3.0 loaded
[   21.021549] device-mapper: table: 253:0: multipath: error getting device
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Mon Jul 25, 2011 8:05 pm

Yep, udev is the culprit. The rule /lib/udev/rules.d/64-md-raid.rules (supplied by sys-fs/mdadm) calls "mdadm --incremental" on the device. If I remove that file, md arrays are not automatically assembled.

Now I have to figure out how to fix this in an upgrade-friendly way. I'd like to make that rule ignore disks attached via the HBAs. Would it be possible to do this by creating a custom rule in /etc/udev/rules.d, without deleting or editing the "official" /lib/udev/rules.d/64-md-raid.rules?
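
(One blunt fallback I can think of, if per-device filtering turns out to be too fiddly: as I understand udev's rule precedence, a same-named file in /etc/udev/rules.d shadows the stock one in /lib/udev/rules.d, so an empty file should disable the rule entirely, at the cost of disabling incremental assembly for every device, not just the HBAs.)
Code:
# shadow the stock rule with an empty same-named file
# (disables "mdadm --incremental" for ALL devices, not just the HBAs)
touch /etc/udev/rules.d/64-md-raid.rules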
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54097
Location: 56N 3W

PostPosted: Mon Jul 25, 2011 8:25 pm

wildbug,

You could create a rule in a file with a lower number than /lib/udev/rules.d/64-md-raid.rules, in /etc/udev/rules.d (not /lib).
Say 03-md-raid.rules: a rule that does nothing. It will be run before 64-md-raid.rules and will not be affected by updates either.
It must match the same thing(s) as 64-md-raid.rules matches.

udev will trigger, execute your rule that does nothing, and then you have full manual control.
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Mon Jul 25, 2011 10:26 pm

I think I've solved this. I can't reboot to test right now as I currently have users running some long simulations, but udevadm test seems to produce correct results. I'll mark the thread as solved once I can reboot and confirm.

This is what I did:

The server in question has two LSI 9200-8e HBAs and an onboard SAS controller with the same chipset (LSI2008). As I only want to include devices connected to the HBAs, I used "udevadm info" to find differences between the device trees of drives attached to the motherboard and the HBAs. At one level I found differences -- ATTRS{subsystem_vendor} and ATTRS{subsystem_device}. I could now identify the correct devices.
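
The comparison was along these lines (device names are examples; pick one motherboard-attached disk and one HBA-attached disk):
Code:
# walk the sysfs parent devices and compare the ATTRS output by eye
udevadm info --attribute-walk --name=/dev/sda
udevadm info --attribute-walk --name=/dev/sdba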

The next part was overriding the array assembly. I finally realized that there was only one line in 64-md-raid.rules that I had to circumvent:
Code:
ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"


If either ENV{ID_FS_TYPE} or ACTION fails to match, mdadm isn't executed. ENV variables are settable, so I created my own rule to unset ENV{ID_FS_TYPE} just before 64-md-raid.rules is consulted. It also has to run after /lib/udev/rules.d/60-persistent-storage.rules, as that is where the variable is originally set (as identified from udevadm test output).

Code:

# /etc/udev/rules.d/63-lsi-9200-8e.rules
KERNEL=="sd*", ACTION=="add", DRIVERS=="mpt2sas", ATTRS{subsystem_vendor}=="0x1000", ATTRS{subsystem_device}=="0x3080", ENV{ID_FS_TYPE}=""


Using udevadm test on devices not attached to the HBAs, I can see the "mdadm --incremental" line; HBA-attached devices no longer include it. That should mean success. :)
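
For reference, that check was along these lines (the sysfs path is an example; the grep just isolates the RUN line):
Code:
# simulate udev rule processing for one disk and look for the mdadm call
udevadm test /sys/block/sdba 2>&1 | grep mdadm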
dmitryilyin
n00b

Joined: 08 Apr 2008
Posts: 27
Location: Netherlands

PostPosted: Tue Jul 26, 2011 7:09 pm

You'll have more luck with a server-oriented distribution (assuming you are working with a server), something Debian- or RedHat-like.
They use an advanced initramfs and are much better suited for production servers.
Gentoo is good for learning Linux, development, and experimenting. :)
wildbug
n00b

Joined: 07 Oct 2007
Posts: 73

PostPosted: Fri Mar 01, 2013 5:38 pm

So I finally got around to rebooting... :)

I think I have this sorted out. Custom udev rules are not necessary.

What happens is that udev runs in the sysinit runlevel; /lib/udev/rules.d/64-md-raid.rules is part of this process, and it uses mdadm in incremental mode to attempt to assemble RAID devices automatically as components are discovered. However, it does respect /etc/mdadm.conf, so that file can be used to control behavior during this step. Whitelisting devices with a DEVICE line wasn't sufficient; it seems only ARRAY lines are honored here. What works is an AUTO line that blacklists all arrays, combined with selectively whitelisting arrays via ARRAY lines with "auto=yes". Setting "devices=/dev/mapper/*" in the ARRAY lines was also necessary.

/etc/mdadm.conf
Code:
AUTO -all
DEVICE /dev/sd[ab]*
DEVICE /dev/mapper/*

ARRAY /dev/md10 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx devices=/dev/mapper/* spares=1 spare-group=mp_spares
ARRAY /dev/md11 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx devices=/dev/mapper/* spares=1 spare-group=mp_spares

ARRAY /dev/md/root UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx auto=yes


The dm-multipath kernel module must be loaded (if you've built it as a module) or /etc/init.d/multipath will fail to start.

/etc/conf.d/modules
Code:
modules="dm_multipath"


And mdraid, which will be used to bring up the arrays listed in mdadm.conf, needs to be started after multipath.

/etc/conf.d/mdraid
Code:
rc_need="multipath"