NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sat Apr 14, 2012 9:23 pm Post subject: |
|
|
ExecutorElassus,
I cannot think of any reason for a drive to drop out of a raid set and not leave a sign in your log.
If the system shut down correctly and something happened to make the next assemble fail, you might miss it as it may not be possible to log, particularly if the log was on the raid that did not start.
dmesg keeps a ring buffer in RAM, so the dmesg command works for the content of the buffer, even if the log location is not mounted. Of course, you lose that on power down.
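Since the ring buffer lasts as long as the box stays up, it can be dumped to any filesystem that did mount before the next reboot. A minimal sketch of that idea; the helper name `raid_lines` and the output paths are only illustrative, not something from this thread:

```shell
# Keep only the lines of a kernel log that mention md/raid.
# Reads the log on stdin, so it works on live dmesg output or a saved copy.
raid_lines() {
    grep -i -E '(^|[^a-z])(md[0-9]*:|mdadm)|raid'
}

# Usage (not run here; the destination paths are assumptions):
#   dmesg > /root/dmesg-boot.txt
#   dmesg | raid_lines > /root/dmesg-raid.txt
```

Writing to a file on / works here because / sits on its own RAID1 and mounts even when the RAID5 does not.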
I've not been around much today - my system has been mostly failing to boot while I did the udev-182 upgrade.
It only boots now if I skip the fsck in the initrd. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sat Apr 14, 2012 9:30 pm Post subject: |
|
|
Hi Neddy,
well, then you can tell me how to get udev-182 to play nice with my RAIDed /usr once I have it working.
For the assembly problems, would it be useful to make use of the mdadm.conf file? Like, actually specify the arrays manually? Right now, it seems to be building them based on their own superblocks (or whatever else mdadm uses when there's no config file), and maybe stating explicitly which partitions go into which array might make things work better.
In any case, once the recovery is done, I'll run dd as per your instructions to check for read errors. If dmesg says nothing, I'll try rebooting later tonight (tomorrow morning) and report back.
Grr... my failures are always the ones that make no sense.
Thanks again for the help, and good luck with udev. Have you tried using the earlymount script posted here?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sat Apr 14, 2012 9:56 pm Post subject: |
|
|
ExecutorElassus,
raid is OK, I'm having problems with separate /usr and /var which are in lvm2 on raid5.
I'm going with the wiki.gentoo.org page but with additions for raid and lvm.
The raid bits work fine - the lvm ones don't ... not yet.
It looks like I don't get any /dev nodes for the logical volumes.
Do you use raid autodetect in the kernel?
If so, to use an mdadm.conf file you will need to either move to an initrd or not have root on raid, since root on raid needs the raid assembled before root is mounted. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sat Apr 14, 2012 10:09 pm Post subject: |
|
|
Hi Neddy,
I … think I have an initrd? But I do for certain have / on a RAID1 (mirrored, yes?) array. So is /boot, for that matter.
That forum seems to have a script to run pre-mount and checking on RAID/LVM at the sysinit level. I don't know for certain if that would work for my setup (which is the one specified in the gentoo RAID/LVM2 quick-install handbook: /boot on RAID1, x swap partitions, / on RAID1, and then a big RAID5 for everything else [in my case, /usr, /opt, /var, /var/tmp, /usr/portage, /usr/portage/distfiles, and /home all get an lv, along with 1.7TB of storage partitions]). Since the kernel source directories are under /usr/src/linux, which is on the big RAID5, is that going to cause problems with the kernel loading (with <udev-182)?
The earlymount script seemed to work okay, but then my drives shut down, and I rebooted into a broken system.
Since I have a couple hours to go before I can test the md127 that's rebuilding, lemme plug you with questions. Is it possible to change the role numbers of an active array? I'm still bothered by sdc4 being [3] instead of [1] like all the other sdcX partitions are in their respective arrays, and wonder if that might be causing problems.
So far, all dmesg says is the mdadm message I posted before, and | Code: | | scsi_verify_blk_ioctl: 16 callbacks suppressed | .
Any advice?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sat Apr 14, 2012 10:37 pm Post subject: |
|
|
ExecutorElassus,
The role numbers don't matter. They do not need to map to the partitions in any particular order.
What happens in /usr/src has no impact on booting. The kernel you boot is a binary file, normally in /boot.
That you have /boot on raid1 tells me that your /boot raid has a version 0.9 superblock, so it could be kernel auto-assembled. Grub is not interested in that, since raid assembly, however it's done, happens after grub has done its stuff and exited.
You may have an initrd to assemble the rest and start your logical volumes. You may not. Look in grub.conf. Do you have an initrd entry under your kernel line? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sat Apr 14, 2012 10:44 pm Post subject: |
|
|
Hi Neddy,
nope, no initrd line. So, I guess that means no initrd.
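The same check can be made from a shell. This is only a sketch: it assumes legacy GRUB with its config at /boot/grub/grub.conf, and `has_initrd` is a made-up helper name:

```shell
# Return 0 if a legacy-GRUB config file contains an "initrd" line.
has_initrd() {
    grep -q -E '^[[:space:]]*initrd[[:space:]]' "$1"
}

# Usage (the path assumes legacy GRUB as set up by the handbook):
#   has_initrd /boot/grub/grub.conf && echo "initrd in use" || echo "no initrd"
```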
Both /boot and / are indeed on 0.9 superblocks, as instructed by the install guide. The RAID5 is 1.2.
What I've heard about initrd or initramfs is that both of them add to boot time, and are thus undesirable (or so I gathered from that thread about the earlymounts script). But I'll worry about that part of setup once I've sorted why my RAID5 keeps barfing.
90 minutes to go from this message. I imagine you'll be off to bed by then. Should I then proceed with 'dd', and then try rebooting?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sat Apr 14, 2012 10:52 pm Post subject: |
|
|
ExecutorElassus,
Yes. The dd is harmless. As you have no initrd, you must use kernel auto assembly, so you can't use an mdadm.conf.
For you, your initrd would only assemble your raid and start your lvm, which has to happen anyway.
It would not be a huge, bloated initrd full of kernel modules that you rebuild with every kernel. It would just be a space for some userspace tools. It need not increase the boot time. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sat Apr 14, 2012 10:58 pm Post subject: |
|
|
Hrm,
and as I understand it, this is what's causing problems with udev-182, yes? So, an initrd would mount the MDs, and start the RAID/LVM, earlier in the boot process, yes?
Maybe I should just suck it up and use one. Is there documentation on it?
But first things first. I'll see if I can get this RAID5 array to survive a reboot, and then proceed with the rest.
I'll report back as soon as dd is finished (unless there are other things I should do in the meantime?)
Thanks again,
EE |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sun Apr 15, 2012 9:44 am Post subject: |
|
|
Okay, how about this:
dd finished with no messages in dmesg. So I reboot. As before, sda4 gets moved into md125 (a bogus array), and that array fails to start. BUT. I reboot, and on the next bootup, everything is in its correct (active) array, partitions are mounted, and fsck checks all the partitions.
This has been happening for some time, now that I think of it: the first reboot always results in a disabled RAID5, but rebooting results in a functional one.
Any idea why?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sun Apr 15, 2012 12:14 pm Post subject: |
|
|
ExecutorElassus,
That sounds like a race condition. Maybe one drive is taking longer than the others to come ready, so the kernel gives up waiting for it.
If it were spin up time related, at the reboot, the drives would already be spun up ... so it would just work.
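Whatever the cause of the delay, an initrd that does the assembly could guard against a late drive by waiting for the device nodes first. The following is a sketch only: `wait_for_devices` is a hypothetical helper, not part of any script in this thread, and it does not apply to kernel auto assembly, where there is no place to hook it in.

```shell
# Wait up to TIMEOUT seconds for every listed path to exist.
# Returns 0 as soon as all are present, 1 if the timeout expires.
wait_for_devices() {
    timeout=$1
    shift
    while :; do
        missing=0
        for dev in "$@"; do
            [ -e "$dev" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
}

# Usage inside an initrd, before mdadm --assemble (device names are examples):
#   wait_for_devices 10 /dev/sda4 /dev/sdb4 /dev/sdc4 || echo "a drive is late"
```

The test is `-e` rather than `-b` to keep the sketch simple; in a real initrd checking for a block device would be stricter.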
Spin up time is a wearout parameter reported in smartctl. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sun Apr 15, 2012 12:28 pm Post subject: |
|
|
Hi Neddy,
but I'm not booting from a full shutdown. I'm using 'init 6' both times. This morning, after dd finished, I hit 'init 6'. The first boot, I got one drive dropped out of the array. The next boot, it's back in place without my doing anything.
smartctl shows 0 for "Spin Up Time" for all three drives. sdc (the suspect one) does show:
| Code: | 183 Runtime_Bad_Block 0x0000 001 001 000 Old_age Offline - 1683
188 Command_Timeout 0x0032 100 001 000 Old_age Always - 1632
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1627
| where the other two show zero. Is that anything?
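To compare one attribute across all three drives without reading the whole table each time, something like the following could be used. It is a sketch that assumes the usual ten-column layout of `smartctl -A` output, with the raw value in the tenth column as in the lines above; `smart_raw` is an illustrative name:

```shell
# Print the RAW_VALUE column (field 10) of one attribute from
# "smartctl -A" output supplied on stdin.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (needs root; the drive names are the three from this thread):
#   for d in sda sdb sdc; do
#       echo "$d: $(smartctl -A /dev/$d | smart_raw UDMA_CRC_Error_Count)"
#   done
```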
On the other side, I cannot emerge xorg-server due to these errors:
| Code: | /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'FreeDeviceClass':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:732:21: warning: declaration of 'type' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'FreeFeedbackClass':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:798:23: warning: declaration of 'type' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'BadDeviceMap':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:1637:30: warning: declaration of 'length' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'GetMaster':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:2610:33: warning: declaration of 'which' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'AllocDevicePair':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:2653:18: warning: declaration of 'pointer' shadows a global declaration
make[2]: *** [devices.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
| This is just a snippet. Any guess what that is?
Let me know what you think.
Cheers,
EE
PS- now that I can emerge world again, some problems I had (like qt not emerging due to a block) aren't a problem any more. But xorg-server still won't emerge, due to this error. I'll let you know if any other of the 214 packages I'm emerging conks out. |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Sun Apr 15, 2012 1:30 pm Post subject: |
|
|
ExecutorElassus,
It's difficult for me working through a keyhole ... make friends with wgetpaste and post the whole build log.
The failure message at the end of every failed build tells you where the log is.
Warnings are just that - warnings, and that's all that's in your log snippet.
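To pick the real failures out of a long log before posting it, a filter along these lines may help. It is a sketch: it assumes gcc-style "error:" markers and make's "*** ... Error N" lines, and the log path in the usage note is the usual Portage location for this package rather than one quoted in the thread:

```shell
# List the lines of a build log that are real failures:
# compiler errors ("error:") and make's "*** ... Error N" lines.
show_errors() {
    grep -n -E '(^|[ :])error: |\*\*\* .*Error [0-9]+' "$1"
}

# Usage (Portage prints the actual log path when a build fails):
#   show_errors /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/temp/build.log
#   wgetpaste /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/temp/build.log
```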
You need to look on your drive vendor's web site to understand what the raw values mean. They are often several bit fields in a 32 bit value.
The normalised values would be a pass but | Code: | | 183 Runtime_Bad_Block 0x0000 001 001 000 Old_age Offline | is only updated with an offline test, so that's not really telling us anything. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sun Apr 15, 2012 6:18 pm Post subject: |
|
|
pastebin is my new imaginary internet boyfriend (sorry): build.log
I'll do some digging with the drive vendor, and let you know in a sec which ebuild failed this time.
Cheers,
EE
UPDATE: looking at that log, the only lines which return explicit errors are from a file that belongs to kbproto. I've remerged that, and I'll get back to you about xorg-server once the @world update is done. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Sun Apr 15, 2012 8:30 pm Post subject: |
|
|
I got a little further in building X. The new build.log (which still fails) is here.
I'm also failing my @world emerge on pango. The log for that is here.
Any ideas what's going wrong with those?
Cheers,
EE
UPDATE: pango is fixed. It was libXft, not Xutil, that was choking. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Mon Apr 16, 2012 4:44 pm Post subject: |
|
|
Okay, I've gotten a bit further. Now I'm on kdepimlibs, which fails on the following:
| Code: | [ 0%] Built target kcal_automoc
Generating contactsearchjob.moc
Generating transactionjobs.moc
Scanning dependencies of target kimap_automoc
Scanning dependencies of target kio_sieve_automoc
Generating session.moc
Generating messagethreaderproxymodel.moc
Generating deletejob.moc
Generating preprocessorbase_p.moc
[ 0%] Built target kio_sieve_automoc
Generating standardmailactionmanager.moc
Scanning dependencies of target kio_imap4_automoc
/var/tmp/portage/kde-base/kdepimlibs-4.8.2/work/kdepimlibs-4.8.2/akonadi/contact/contactsearchjob.h:81: Error: Template classes not supported by Q_OBJECT
automoc4: process for /var/tmp/portage/kde-base/kdepimlibs-4.8.2/work/kdepimlibs-4.8.2_build/akonadi/contact/contactsearchjob.moc failed: Unknown error
pid to wait for: 0
| This is preventing akonadi from emerging as well. Also, redlands seems to be broken due to a missing tab space in the makefile (though I managed to build nepomuk headers anyway).
Anyway, do you know what's up with kdepimlibs? Is that something wrong with the ebuild?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Mon Apr 16, 2012 6:06 pm Post subject: |
|
|
ExecutorElassus,
I've never built KDE, so I'm not a lot of help here. Try searching on bugs.gentoo.org to see if it's a known issue.
If so, there may be a fix for it there too. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Mon Apr 16, 2012 8:50 pm Post subject: |
|
|
In fact, KDE bugs aren't even handled by the gentoo tracker: they have to be filed upstream. I did so, and we'll see what happens.
I have, however, managed to get xorg back up, so I'm back into my wm (hurrah!). I now have just four packages that won't update for various reasons, and then I imagine I'll be finding stray misnamed files for months.
But at least the system is up, and (sorta) stable, so now let's get back to the original issue: mdadm seems to be randomly dropping one of the drives out of the RAID array on bootup, with no errors in the log from shutdown (and, since logging doesn't work on the non-RAID system, nothing but dmesg to tell me what might have gone wrong at boot [but I can't scroll through that, so it isn't helpful]).
So, you're suggesting it's just spin-up times? Is there anything else that might cause it?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Mon Apr 16, 2012 8:55 pm Post subject: |
|
|
ExecutorElassus,
or pastebin your dmesg _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Mon Apr 16, 2012 8:59 pm Post subject: |
|
|
when the array is inactive, I do not have access to less. Can I pipe it to a regular file? This presumes I can reboot into a working array, since nfs and ssh are also inaccessible without the RAID5. *sadpanda*
ALSO: I just noticed, that 'ld' will, in the middle of some compiles - in this case firefox - chew up close to 40% of my RAM (about 1.6GB). Is that normal? Or is this an unfortunate consequence of my file system being riddled with files identified as directories, and actual files emerged into backups, etc. etc.?
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Mon Apr 16, 2012 9:58 pm Post subject: |
|
|
ExecutorElassus,
You should be a little more adventurous.
When your raid drops a drive, you can still start it. Try | Code: | | mdadm --run /dev/mdX |
This will run the raid in degraded mode. It will probably be at the expense of a resync when you re-add the dropped drive.
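To see which array needs that treatment, /proc/mdstat can be filtered for inactive arrays. A sketch, with `inactive_arrays` as an illustrative helper name; the mdadm lines in the usage note use placeholders for the array and partition:

```shell
# Print the names of md arrays that /proc/mdstat reports as inactive.
# Reads mdstat-format text on stdin.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage (as root; mdX and sdXN are placeholders):
#   inactive_arrays < /proc/mdstat
#   mdadm --run /dev/mdX
#   mdadm /dev/mdX --re-add /dev/sdXN
```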
Build your busybox with the static USE flag. That has a pager and it lives in /bin
busybox --help will tell you about it, or man busybox.
You will be building busybox, mdadm and lvm with the static use flag for your initrd, which you need to get past udev-182.
Learn about /etc/portage/package.use for per package USE flags. Do not set static in make.conf. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Mon Apr 16, 2012 10:06 pm Post subject: |
|
|
Okay.
For the time being, I'm only going to do that if 1) the array boots inactive and separated, and 2) won't boot back normal the next (one or more) reboot. But thank you for the tips. Okay. Now that I have a system I can (mostly) work with, let's talk about that initrd. So, first step is to build mdadm, lvm, and busybox with +static? I'm fine with using package.use. What next?
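For reference, that first step might look like the following. It is a sketch: the category names are the usual Gentoo ones for these three packages, and `add_static_use` is just an illustrative helper that avoids adding the same line twice.

```shell
# Append "static" USE entries for the three initrd tools to a package.use file.
# If package.use is a directory on your system, point this at a file inside it.
add_static_use() {
    use_file=$1
    for pkg in sys-apps/busybox sys-fs/mdadm sys-fs/lvm2; do
        grep -q "^$pkg .*static" "$use_file" 2>/dev/null ||
            echo "$pkg static" >> "$use_file"
    done
}

# Usage (as root), followed by rebuilding the three packages:
#   add_static_use /etc/portage/package.use
#   emerge --oneshot busybox mdadm lvm2
```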
Cheers,
EE |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Tue Apr 17, 2012 6:20 pm Post subject: |
|
|
I used a combination of this wiki post, which covers the (relatively) straightforward case of root on lvm on raid, and this
wiki thread covering the new bit for mounting the separate partitions.
To actually build the initramfs, I used the kernel-provided script.
The second wiki article is written for a 32 bit install. That caused me some grief as I'm 64bit no-multilib.
I like to build my initrd by hand in /root/initrd/ with everything I need there. That way I know it won't change with a | Code: | | emerge --sync && emerge world -uDN | so I'm not a big fan of the kernel-provided script that sucks random files off your live filesystem. It's a real PITA when you make a useless initrd. I still have the work to do to make my initrd independent of my live filesystem but my box boots cleanly now, so it's not urgent. I really don't care if my initrd is full of security holes - it runs once at boot before networking is up, so it can't be exploited. I do get paranoid when it doesn't work.
My /root/initrd/initramfs_list contains | Code: | # directory structure
dir /proc 755 0 0
dir /usr 755 0 0
dir /bin 755 0 0
dir /sys 755 0 0
dir /var 755 0 0
#dir /lib 755 0 0
dir /lib64 755 0 0
dir /sbin 755 0 0
dir /mnt 755 0 0
dir /mnt/root 755 0 0
dir /etc 755 0 0
dir /root 700 0 0
dir /dev 755 0 0
# busybox
file /bin/busybox /bin/busybox 755 0 0
# for lvm on raid
file /sbin/mdadm /sbin/mdadm 755 0 0
file /sbin/lvm.static /sbin/lvm.static 755 0 0
# libraries required by /sbin/fsck.ext4 and /sbin/fsck
slink /lib /lib64 777 0 0
file /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2 755 0 0
file /lib64/libext2fs.so.2 /lib64/libext2fs.so.2 755 0 0
file /lib64/libcom_err.so.2 /lib64/libcom_err.so.2 755 0 0
file /lib64/libpthread.so.0 /lib64/libpthread.so.0 755 0 0
file /lib64/libblkid.so.1 /lib64/libblkid.so.1 755 0 0
file /lib64/libuuid.so.1 /lib64/libuuid.so.1 755 0 0
file /lib64/libe2p.so.2 /lib64/libe2p.so.2 755 0 0
file /lib64/libc.so.6 /lib64/libc.so.6 755 0 0
file /sbin/fsck /sbin/fsck 755 0 0
file /sbin/fsck.ext4 /sbin/fsck.ext4 755 0 0
# our init script
file /init /root/initrd/init 755 0 0 | If you don't use ext4, you need to run ldd on your fsck helper and include it and its libraries in place of fsck.ext4.
If your /usr and /var are different filesystems, you need both fsck helpers and their libraries.
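Turning ldd output into initramfs_list entries can be scripted. This is a sketch that assumes the usual ldd output format; `ldd_to_list` is an illustrative name and `fsck.YOURFS` stands for whichever helper you need:

```shell
# Convert ldd output (on stdin) into "file" lines for initramfs_list.
# Entries without a real path, such as linux-vdso, are skipped.
ldd_to_list() {
    awk '
        $2 == "=>" && $3 ~ /^\// { print "file " $3 " " $3 " 755 0 0" }
        $1 ~ /^\//               { print "file " $1 " " $1 " 755 0 0" }
    '
}

# Usage (review the result before appending it, to avoid duplicate entries):
#   ldd /sbin/fsck.YOURFS | ldd_to_list
```

The helper binary itself still needs its own `file` line, as fsck.ext4 has in the list above.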
My initscript ended up as | Code: | #!/bin/busybox sh
rescue_shell() {
echo "$@"
echo "Something went wrong. Dropping you to a shell."
/bin/busybox --install -s
exec /bin/sh
}
# allow the use of UUIDs or filesystem labels
uuidlabel_root() {
for cmd in $(cat /proc/cmdline) ; do
case $cmd in
root=*)
type=$(echo $cmd | cut -d= -f2)
echo "Mounting rootfs"
if [ "$type" = "LABEL" ] || [ "$type" = "UUID" ] ; then
uuid=$(echo $cmd | cut -d= -f3)
mount -o ro $(findfs "$type"="$uuid") /mnt/root
else
mount -o ro $(echo $cmd | cut -d= -f2) /mnt/root
fi
;;
esac
done
}
check_filesystem() {
# most of code coming from /etc/init.d/fsck
local fsck_opts= check_extra= RC_UNAME=$(uname -s)
# FIXME : get_bootparam forcefsck
if [ -e /forcefsck ]; then
fsck_opts="$fsck_opts -f"
check_extra="(check forced)"
fi
echo "Checking local filesystem $check_extra : $1"
if [ "$RC_UNAME" = Linux ]; then
fsck_opts="$fsck_opts -C0 -T"
fi
trap : INT QUIT
# using our own fsck, not the builtin one from busybox
/sbin/fsck -p $fsck_opts $1
ret_val=$?
case $ret_val in
0) return 0;;
1) echo "Filesystem repaired"; return 0;;
2|3) if [ "$RC_UNAME" = Linux ]; then
echo "Filesystem repaired, but reboot needed"
reboot -f
else
rescue_shell "Filesystem still has errors; manual fsck required"
fi;;
4) if [ "$RC_UNAME" = Linux ]; then
rescue_shell "Filesystem errors left uncorrected, aborting"
else
echo "Filesystem repaired, but reboot needed"
reboot
fi;;
8) echo "Operational error"; return 0;;
16) echo "Use or Syntax Error"; return 16;;
32) echo "fsck interrupted";;
127) echo "Shared Library Error"; sleep 20; return 0;;
*) echo $ret_val; echo "Some random fsck error - continuing anyway"; sleep 20; return 0;;
esac
# rescue_shell can't find tty so its broken
rescue_shell
}
# start for real here
# temporarily mount proc and sys
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
# disable kernel messages from popping onto the screen
###echo 0 > /proc/sys/kernel/printk
# clear the screen
###clear
# assemble the raid set(s) - they got renumbered from md1, md5 and md6
# /boot
/sbin/mdadm --assemble /dev/md125 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# don't care if /boot fails to assemble
# / (root) I wimped out of root on lvm for this box
/sbin/mdadm --assemble /dev/md126 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 || rescue_shell
# if root won't assemble, we are stuck
# LVM for everything else
/sbin/mdadm --assemble /dev/md127 /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 || rescue_shell
# and if the LVM space won't assemble there is no /usr or /var so we are really in a mess
# TODO could auto cope with degraded raid operation
# lvm runs as whatever its called as
ln -s /sbin/lvm.static /sbin/vgchange
# start the vg volume group - we only have one volume group
/sbin/vgchange -ay vg || rescue_shell
# if this failed we have no /usr or /var
# get here with raid sets assembled and logical volumes available
# mounting rootfs on /mnt/root
uuidlabel_root || rescue_shell "Error with uuidlabel_root"
# space separated list of mountpoints that ...
mountpoints="/usr /var"
# ... we want to find in /etc/fstab ...
ln -s /mnt/root/etc/fstab /etc/fstab
# ... to check filesystems and mount our devices.
for m in $mountpoints ; do
#echo $m
check_filesystem $m
echo "Mounting $m"
# mount the device and ...
mount $m || rescue_shell "Error while mounting $m"
# ... move the tree to its final location
mount --move $m "/mnt/root"$m || rescue_shell "Error while moving $m"
done
echo "All done. Switching to real root."
# clean up. The init process will remount proc sys and dev later
umount /proc
umount /sys
umount /dev
# switch to the real root and execute init
exec switch_root /mnt/root /sbin/init |
A few gotchas not listed in those wiki pages. The filesystems checked and mounted by the initrd need to be set to noauto in /etc/fstab or you will be told that some mounts failed. That's expected. /usr and maybe /var will already be mounted.
When DEVTMPFS makes your logical volume nodes, /dev/mapper/vg-user and friends are made, as are /dev/dm-0 and friends but the symbolic links in /dev/vg/ are not created.
This means you can use the first two in /etc/fstab but not the latter. I found that out the hard way.
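If your fstab currently uses the /dev/vg/ style names, they can be rewritten to the /dev/mapper/ form. A sketch, with `lv_to_mapper` as an illustrative helper; note that device-mapper doubles any dash inside a volume group or logical volume name:

```shell
# Convert /dev/VG/LV into the matching /dev/mapper/VG-LV name.
# Dashes inside the VG or LV name are doubled by device-mapper.
lv_to_mapper() {
    vg=$(echo "$1" | cut -d/ -f3 | sed 's/-/--/g')
    lv=$(echo "$1" | cut -d/ -f4 | sed 's/-/--/g')
    echo "/dev/mapper/$vg-$lv"
}

# Usage (the volume names are examples):
#   lv_to_mapper /dev/vg/usr
#   lv_to_mapper /dev/vg/var-tmp
```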
As the initrd contains the code for mounting everything by UUID, this is probably a good time to switch to UUID mounts. Don't do it all in one go though.
My raid assembly is explicit on the mdadm command line because it's easy to follow. You could put /etc/mdadm.conf in the initrd and use that.
mdadm also understands how to assemble a raid set given its UUID. That's still a TODO.
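A rough idea of what the UUID route could look like, untested on this box. It assumes the "ARRAY ..." line format that `mdadm --detail --brief` prints; `array_uuid` is an illustrative helper and the UUID in the usage note is a placeholder. Check mdadm(8) for the exact assemble form your version wants.

```shell
# Pull the UUID out of an "ARRAY ..." line as printed by
# "mdadm --detail --brief /dev/mdX" (read from stdin).
array_uuid() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p'
}

# Usage: record the UUID once on the running system ...
#   mdadm --detail --brief /dev/md127 | array_uuid
# ... then assemble by UUID in the initrd (placeholder value shown):
#   /sbin/mdadm --assemble /dev/md127 --uuid=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
```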
I don't have root on lvm on this system - that's the next one to convert. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
ExecutorElassus l33t


Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Wed Apr 25, 2012 10:58 pm Post subject: |
|
|
Hi Neddy!
so, after 10 days of basically working okay, I rebooted today. Now, when I reboot, the large RAID5 array - the one holding /usr, /var, /home, etc - is active as "auto-read-only" and none of its partitions are mounted.
So, I'm back to where I started. Sorta.
As far as I can tell, the array is fine, and all its drives are active; they just … aren't being mounted by mdadm. Is there a way to re-initialize the system, so that mdadm and lvm re-do mounting and checking everything, and re-load all the stuff that lives on that array?
On a second question, can you think of any reason why that array would always start up auto-read-only?
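For anyone hitting the same state: auto-read-only is what md reports for an array that has been started but not yet written to, and it normally flips to read-write on the first write, so on its own it need not mean a fault. A sketch of bringing things up by hand, assuming the array really is complete and that md127 is the RAID5 as earlier in this thread; `auto_ro_arrays` is an illustrative helper:

```shell
# Print the names of md arrays shown as (auto-read-only) in mdstat-format
# text read from stdin.
auto_ro_arrays() {
    awk '$2 == ":" && /\(auto-read-only\)/ { print $1 }'
}

# Usage (as root):
#   auto_ro_arrays < /proc/mdstat
#   mdadm --readwrite /dev/md127   # switch the array to read-write
#   vgchange -ay                   # activate the logical volumes
#   mount -a                       # mount what /etc/fstab lists
```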
Thanks,
EE
(and once again, I don't have access to a pager, because I haven't rebuilt busybox, mdadm, or lvm static. Do I need to do anything besides rebuild them with USE="static" to have access to them?) |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 30016 Location: 56N 3W
|
Posted: Thu Apr 26, 2012 5:41 pm Post subject: |
|
|
ExecutorElassus,
If busybox, mdadm and lvm are not built with USE=static, you need to remake them and rebuild your initrd.
Without the static USE, they will never have worked in your initrd, never mind being ok for 10 days.
What versions of openrc and udev do you have ?
There is an alternative to USE=static. You can add the libraries these applications need to your initrd.
I prefer static. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |