Gentoo Forums
LVM børked; how do I rebuild? [SOLVED]
Gentoo Forums Forum Index > Kernel & Hardware
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 29988
Location: 56N 3W

Posted: Sat Apr 14, 2012 9:23 pm

ExecutorElassus,

I cannot think of any reason for a drive to drop out of a raid set and not leave a sign in your log.

If the system shut down correctly and something then made the next assembly fail, you might miss it, as it may not be possible to log anything, particularly if the log was on the raid that did not start.
dmesg keeps a ring buffer in RAM, so the dmesg command shows the contents of that buffer even if the log location is not mounted. Of course, you lose it at power down.

I've not been around much today - my system has been mostly failing to boot while I did the udev-182 upgrade.
It only boots now if I skip the fsck in the initrd.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
ExecutorElassus
l33t

Joined: 11 Mar 2004
Posts: 697
Location: Stuttgart, Germany

Posted: Sat Apr 14, 2012 9:30 pm

Hi Neddy,

Well, then you can tell me how to get udev-182 to play nice with my RAIDed /usr once I have it working. ;)

For the assembly problems, would it be useful to make use of the mdadm.conf file? Like, actually specify the arrays manually? Right now, it seems to be building them based on their own superblocks (or whatever else mdadm uses when there's no config file), and maybe stating explicitly which partitions go into which array might make things work better.

In any case, once the recovery is done, I'll run dd as per your instructions to check for read errors. If dmesg says nothing, I'll try rebooting later tonight (tomorrow morning) and report back.

Grr... my failures are always the ones that make no sense.

Thanks again for the help, and good luck with udev. Have you tried using the earlymount script posted here?

Cheers,

EE
NeddySeagoon
Posted: Sat Apr 14, 2012 9:56 pm

ExecutorElassus,

The raid is OK; I'm having problems with separate /usr and /var, which are in lvm2 on raid5.
I'm going with the wiki.gentoo.org page but with additions for raid and lvm.

The raid bits work fine - the lvm ones don't ... not yet.
It looks like I don't get any /dev nodes for the logical volumes.

Do you use raid autodetect in the kernel?
If so, to use an mdadm.conf file you will need to either move to an initrd or not have root on raid, since root on raid needs the raid assembled before root is mounted.
ExecutorElassus
Posted: Sat Apr 14, 2012 10:09 pm

Hi Neddy,

I … think I have an initrd? But I do for certain have / on a RAID1 (mirrored, yes?) array. So is /boot, for that matter.

That forum seems to have a script to run pre-mount and check on RAID/LVM at the sysinit level. I don't know for certain if that would work for my setup (which is the one specified in the Gentoo RAID/LVM2 quick-install handbook: /boot on RAID1, x swap partitions, / on RAID1, and then a big RAID5 for everything else [in my case, /usr, /opt, /var, /var/tmp, /usr/portage, /usr/portage/distfiles, and /home all get an lv, along with 1.7TB of storage partitions]). Since the kernel source directories are under /usr/src/linux, which is on the big RAID5, is that going to cause problems with the kernel loading (with <udev-182)?

The earlymount script seemed to work okay, but then my drives shut down, and I rebooted into a broken system.

Since I have a couple hours to go before I can test the md127 that's rebuilding, lemme plug you with questions. Is it possible to change the role numbers of an active array? I'm still bothered by sdc4 being [3] instead of [1] like all the other sdcX partitions are in their respective arrays, and wonder if that might be causing problems.

So far, all dmesg says is the mdadm message I posted before, and
Code:
scsi_verify_blk_ioctl: 16 callbacks suppressed
.

Any advice?

Cheers,

EE
NeddySeagoon
Posted: Sat Apr 14, 2012 10:37 pm

ExecutorElassus,

The role numbers don't matter. They do not need to map to the partitions in any particular order.
What happens in /usr/src has no impact on booting. The kernel you boot is a binary file, normally in /boot.
That you have /boot on raid1 tells me that your /boot raid has a version 0.9 superblock, so it could be kernel auto assembled. Grub is not interested in that, since raid assembly, however it's done, happens after grub has done its stuff and exited.

You may have an initrd to assemble the rest and start your logical volume. You may not, too. Look in grub.conf. Do you have an initrd entry under your kernel line?
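For comparison, a GRUB legacy entry that loads a separate initrd has an `initrd` line directly below its `kernel` line, roughly as in the sketch below. The file names, the `root=` device and the assumption that (hd0,0) is the /boot partition are placeholders, not taken from either system in this thread. Note that an initramfs compiled into the kernel image (`CONFIG_INITRAMFS_SOURCE`) needs no `initrd` line, so a missing line only rules out a separate initrd file.

```
# /boot/grub/grub.conf fragment (GRUB legacy); all names are placeholders.
# Paths are relative to the partition given by "root", assumed to be /boot.
title Gentoo Linux
root (hd0,0)
kernel /kernel-placeholder root=/dev/md126
initrd /initramfs-placeholder.cpio.gz
```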
ExecutorElassus
Posted: Sat Apr 14, 2012 10:44 pm

Hi Neddy,

nope, no initrd line. So, I guess that means no initrd.

Both /boot and / are indeed on 0.9 superblocks, as instructed by the install guide. The RAID5 is 1.2.

What I've heard about initrd or initramfs is that both of them add to boot time, and are thus undesirable (or so I gathered from that thread about the earlymounts script). But I'll worry about that part of setup once I've sorted why my RAID5 keeps barfing.

90 minutes to go from this message. I imagine you'll be off to bed by then. Should I then proceed with 'dd', and then try rebooting?

Cheers,

EE
NeddySeagoon
Posted: Sat Apr 14, 2012 10:52 pm

ExecutorElassus,

Yes. The dd is harmless. As you have no initrd, you must use kernel auto assembly, so you can't use an mdadm.conf.

For you, your initrd would only assemble your raid and start your lvm, which has to happen anyway.
It would not be a huge, bloated initrd full of kernel modules that you rebuild with every kernel. It would just be a space for some userspace tools. It need not increase the boot time.
ExecutorElassus
Posted: Sat Apr 14, 2012 10:58 pm

Hrm,

and as I understand it, this is what's causing problems with udev-182, yes? So, an initrd would mount the MDs, and start the RAID/LVM, earlier in the boot process, yes?

Maybe I should just suck it up and use one. Is there documentation on it?

But first things first. I'll see if I can get this RAID5 array to survive a reboot, and then proceed with the rest.

I'll report back as soon as dd is finished (unless there are other things I should do in the meantime?)

Thanks again,

EE
ExecutorElassus
Posted: Sun Apr 15, 2012 9:44 am

Okay, how about this:

dd finished with no messages in dmesg. So I reboot. As before, sda4 gets moved into md125 (a bogus array), and that array fails to start. BUT. I reboot, and on the next bootup, everything is in its correct (active) array, partitions are mounted, and fsck checks all the partitions.

This has been happening for some time, now that I think of it: the first reboot always results in a disabled RAID5, but rebooting results in a functional one.

Any idea why?

Cheers,

EE
NeddySeagoon
Posted: Sun Apr 15, 2012 12:14 pm

ExecutorElassus,

That sounds like a race condition. Maybe one drive is taking longer than the others to come ready, so the kernel gives up waiting for it.
If it were spin up time related, at the reboot, the drives would already be spun up ... so it would just work.

Spin up time is a wearout parameter reported by smartctl.
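A minimal sketch for pulling single attributes out of `smartctl -A` output, so the drives can be compared side by side. It assumes the usual ten-column attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the function name `smart_attr` is hypothetical, and a raw value that contains spaces is cut at the first one.

```shell
#!/bin/sh
# smart_attr ID: read `smartctl -A` output on stdin and print the attribute
# name, its normalised VALUE and the first word of its RAW_VALUE.
# Hypothetical helper; assumes the standard ten-column attribute table.
smart_attr() {
    awk -v id="$1" '$1 == id { print $2, "value=" $4, "raw=" $10 }'
}
```

Attribute 3 is Spin_Up_Time, so something like `for d in /dev/sda /dev/sdb /dev/sdc; do smartctl -A "$d" | smart_attr 3; done`, run as root, shows it for all three drives.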
ExecutorElassus
Posted: Sun Apr 15, 2012 12:28 pm

Hi Neddy,

but I'm not booting from a full shutdown. I'm using 'init 6' both times. This morning, after dd finished, I hit 'init 6'. The first boot, I got one drive dropped out of the array. The next boot, it's back in place without my doing anything.

smartctl shows 0 for "Spin Up Time" for all three drives. sdc (the suspect one) does show:
Code:
183 Runtime_Bad_Block       0x0000   001   001   000    Old_age   Offline      -       1683
188 Command_Timeout         0x0032   100   001   000    Old_age   Always       -       1632
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1627
where the other two show zero. Is that anything?

On the other side, I cannot emerge xorg-server due to these errors:
Code:
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'FreeDeviceClass':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:732:21: warning: declaration of 'type' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'FreeFeedbackClass':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:798:23: warning: declaration of 'type' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'BadDeviceMap':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:1637:30: warning: declaration of 'length' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'GetMaster':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:2610:33: warning: declaration of 'which' shadows a global declaration
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c: In function 'AllocDevicePair':
/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0/dix/devices.c:2653:18: warning: declaration of 'pointer' shadows a global declaration
make[2]: *** [devices.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
This is just a snippet. Any guess what that is?

Let me know what you think.

Cheers,

EE
PS: now that I can emerge world again, some problems I had (like qt not emerging due to a block) aren't a problem any more. But xorg-server still won't emerge, due to this error. I'll let you know if any others of the 214 packages I'm emerging conk out.
NeddySeagoon
Posted: Sun Apr 15, 2012 1:30 pm

ExecutorElassus,

It's difficult for me working through a keyhole ... make friends with wgetpaste and post the whole build log.
The failure message at the end of every failed build tells where the log is.

Warnings are just that, warnings, and that's all that's in your log snippet.

You need to look on your drive vendor's web site to understand what the raw values mean. They are often several bit fields packed into one raw value.
The normalised values would be a pass but
Code:
  183 Runtime_Bad_Block       0x0000   001   001   000    Old_age   Offline
is only updated by an offline test, so that's not really telling us anything.
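The SMART raw field is six bytes (48 bits) wide, and some vendors pack more than one counter into it. The sketch below splits a raw value into three 16-bit words; whether those words mean anything separately for a given drive, and in which order, is vendor-specific and has to be checked against the vendor's documentation. The helper name `split_raw` is hypothetical.

```shell
#!/bin/sh
# split_raw N: print the three 16-bit words of a 48-bit SMART raw value,
# most significant word first. Assumes the shell does 64-bit arithmetic.
split_raw() {
    raw=$1
    hi=$(( (raw >> 32) & 65535 ))
    mid=$(( (raw >> 16) & 65535 ))
    lo=$(( raw & 65535 ))
    echo "$hi $mid $lo"
}
```

For example, `split_raw 1683` prints `0 0 1683`, so the Runtime_Bad_Block raw value above fits entirely in the lowest word.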
ExecutorElassus
Posted: Sun Apr 15, 2012 6:18 pm

pastebin is my new imaginary internet boyfriend (sorry): build.log

I'll do some digging with the drive vendor, and let you know in a sec which ebuild failed this time.

Cheers,

EE

UPDATE: looking at that log, the only lines which return explicit errors are from a file that belongs to kbproto. I've remerged that, and I'll get back to you about xorg-server once the @world update is done.
ExecutorElassus
Posted: Sun Apr 15, 2012 8:30 pm

I got a little further in building X. The new build.log (which still fails) is here.

I'm also failing my @world emerge on pango. The log for that is here.

Any ideas what's going wrong with those?

Cheers,

EE
UPDATE: pango is fixed. It was libXft, not Xutil, that was choking.
ExecutorElassus
Posted: Mon Apr 16, 2012 4:44 pm

Okay, I've gotten a bit further. Now I'm on kdepimlibs, which fails on the following:

Code:
[  0%] Built target kcal_automoc
Generating contactsearchjob.moc
Generating transactionjobs.moc
Scanning dependencies of target kimap_automoc
Scanning dependencies of target kio_sieve_automoc
Generating session.moc
Generating messagethreaderproxymodel.moc
Generating deletejob.moc
Generating preprocessorbase_p.moc
[  0%] Built target kio_sieve_automoc
Generating standardmailactionmanager.moc
Scanning dependencies of target kio_imap4_automoc
/var/tmp/portage/kde-base/kdepimlibs-4.8.2/work/kdepimlibs-4.8.2/akonadi/contact/contactsearchjob.h:81: Error: Template classes not supported by Q_OBJECT
automoc4: process for /var/tmp/portage/kde-base/kdepimlibs-4.8.2/work/kdepimlibs-4.8.2_build/akonadi/contact/contactsearchjob.moc failed: Unknown error
pid to wait for: 0
This is preventing akonadi from emerging as well. Also, redland seems to be broken due to a missing tab in the makefile (though I managed to build nepomuk headers anyway).

Anyway, do you know what's up with kdepimlibs? Is that something wrong with the ebuild?

Cheers,

EE
NeddySeagoon
Posted: Mon Apr 16, 2012 6:06 pm

ExecutorElassus,

I've never built KDE, so I'm not a lot of help here. Try searching on bugs.gentoo.org to see if it's a known issue.
If so, there may be a fix for it there too.
ExecutorElassus
Posted: Mon Apr 16, 2012 8:50 pm

In fact, KDE bugs aren't even handled by the gentoo tracker: they have to be filed upstream. I did so, and we'll see what happens.

I have, however, managed to get xorg back up, so I'm back into my wm (hurrah!). I now have just four packages that won't update for various reasons, and then I imagine I'll be finding stray misnamed files for months.

But at least the system is up, and (sorta) stable, so now let's get back to the original issue: mdadm seems to be randomly dropping one of the drives out of the RAID array on bootup, with no errors in the log from shutdown (and, since logging doesn't work on the non-RAID system, nothing but dmesg to tell me what might have gone wrong at boot [but I can't scroll through that, so it isn't helpful]).

So, you're suggesting it's just spin-up times? Is there anything else that might cause it?

Cheers,

EE
NeddySeagoon
Posted: Mon Apr 16, 2012 8:55 pm

ExecutorElassus,

Code:
dmesg | less
or pastebin your dmesg
ExecutorElassus
Posted: Mon Apr 16, 2012 8:59 pm

When the array is inactive, I do not have access to less. Can I pipe it to a regular file? This presumes I can reboot into a working array, since NFS and ssh are also inaccessible without the RAID5. *sadpanda*

ALSO: I just noticed that 'ld' will, in the middle of some compiles - in this case firefox - chew up close to 40% of my RAM (about 1.6GB). Is that normal? Or is this an unfortunate consequence of my file system being riddled with files identified as directories, and actual files emerged into backups, etc. etc.?

Cheers,

EE
NeddySeagoon
Posted: Mon Apr 16, 2012 9:58 pm

ExecutorElassus,

You should be a little more adventurous. Try
Code:
 dmesg > dmesg.txt
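Once dmesg is saved to a file like that, it can be narrowed down to the md messages. A rough sketch, assuming a POSIX shell with `grep -E`; the name `md_events` and the match pattern (lines mentioning `md:`, `md127:`, `md/raid:` or `raid`) are illustrative heuristics, not part of mdadm.

```shell
#!/bin/sh
# md_events [FILE]: print the md / RAID related lines of a saved dmesg dump
# (or of stdin when no file is given). The pattern is only a heuristic:
# "md:", "md127:", "md/raid:..." and anything containing "raid".
md_events() {
    grep -iE '(^|[^[:alnum:]])md[0-9]*[:/ ]|raid' "${1:--}"
}
```

For example `md_events dmesg.txt`, or straight from the ring buffer with `dmesg | md_events`.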


When your raid drops a drive, you can still start it.
Code:
mdadm --run /dev/mdX

This will run the raid in degraded mode. It will probably be at the expense of a resync when you re-add the dropped drive.
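To see at a glance which arrays need that treatment, /proc/mdstat can be scanned for arrays that are inactive, auto-read-only, or missing a member (an underscore in the `[UU_]` status field). A minimal sketch, assuming the usual /proc/mdstat layout of one header line per array followed by its status line; `md_trouble` is a hypothetical name.

```shell
#!/bin/sh
# md_trouble [FILE]: list md arrays (read from FILE, default /proc/mdstat)
# that are inactive, auto-read-only, or degraded. Hypothetical helper that
# assumes the usual mdstat layout: header line, then a status line.
md_trouble() {
    awk '
        # Header line of each array, e.g. "md127 : active raid5 sda4[0] ..."
        /^md[0-9]+ :/ {
            name = $1
            if ($0 ~ /inactive/)       print name, "inactive"
            if ($0 ~ /auto-read-only/) print name, "auto-read-only"
            next
        }
        # Status line of the current array: an underscore marks a missing member
        name != "" && /\[[U_]*_[U_]*\]/ { print name, "degraded" }
    ' "${1:-/proc/mdstat}"
}
```

Run without arguments it reads /proc/mdstat; an array reported as inactive is the case where `mdadm --run` applies.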

Build your busybox with the static USE flag. That has a pager, and it lives in /bin.
busybox --help will tell you about it, or man busybox.

You will be building busybox, mdadm and lvm with the static USE flag for your initrd, which you need to get past udev-182.
Learn about /etc/portage/package.use for per-package USE flags. Do not set static in make.conf.
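As a sketch, the per-package entries could look like this. The file name under /etc/portage/package.use/ is arbitrary (a single /etc/portage/package.use file works too); the atoms are the usual Gentoo ones for busybox, mdadm and lvm2.

```
# /etc/portage/package.use/initrd  (file name is arbitrary)
sys-apps/busybox static
sys-fs/mdadm static
sys-fs/lvm2 static
```

After that, `emerge --oneshot sys-apps/busybox sys-fs/mdadm sys-fs/lvm2` rebuilds the three packages with the flag set.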
ExecutorElassus
Posted: Mon Apr 16, 2012 10:06 pm

Okay.

For the time being, I'm only going to do that if 1) the array boots inactive and separated, and 2) won't boot back normal the next (one or more) reboot. But thank you for the tips. Okay. Now that I have a system I can (mostly) work with, let's talk about that initrd. So, first step is to build mdadm, lvm, and busybox with +static? I'm fine with using package.use. What next?

Cheers,

EE
NeddySeagoon
Posted: Tue Apr 17, 2012 6:20 pm

I used a combination of this wiki post, which covers the (relatively) straightforward case of root on lvm on raid, and this wiki thread covering the new bit for mounting the separate partitions.
To actually build the initramfs, I used the kernel provided script.
The second wiki article is written for a 32 bit install. That caused me some grief, as I'm 64bit no-multilib.
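For the kernel-script route, the kernel options involved look roughly like the sketch below. With `CONFIG_INITRAMFS_SOURCE` pointing at a gen_init_cpio style list file (like the initramfs_list further down in this post), the kernel build packs that initramfs into the kernel image, so grub.conf needs no separate `initrd` line. The path is the one used in this post; `CONFIG_DEVTMPFS` is included because the init script mounts devtmpfs on /dev itself.

```
# Kernel .config fragment (sketch)
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="/root/initrd/initramfs_list"
# the init script mounts devtmpfs on /dev itself
CONFIG_DEVTMPFS=y
```

The same list should also work with the `usr/gen_init_cpio` program built in the kernel tree, e.g. `usr/gen_init_cpio /root/initrd/initramfs_list | gzip > /boot/initramfs.cpio.gz`, which then has to be loaded with an `initrd` line.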

I like to build my initrd by hand in /root/initrd/ with everything I need there. That way I know it won't change with a
Code:
 emerge --sync && emerge world -uDN
so I'm not a big fan of the kernel provided script that sucks random files off your live filesystem. It's a real PITA when you make a useless initrd. I still have the work to do to make my initrd independent of my live filesystem, but my box boots cleanly now, so it's non-urgent. I really don't care if my initrd is full of security holes - it runs once at boot before networking is up, so it can't be exploited. I do get paranoid when it doesn't work.

My /root/initrd/initramfs_list contains
Code:
# directory structure
dir /proc       755 0 0
dir /usr        755 0 0
dir /bin        755 0 0
dir /sys        755 0 0
dir /var        755 0 0
#dir /lib        755 0 0
dir /lib64      755 0 0
dir /sbin       755 0 0
dir /mnt        755 0 0
dir /mnt/root   755 0 0
dir /etc        755 0 0
dir /root       700 0 0
dir /dev        755 0 0

# busybox
file /bin/busybox /bin/busybox  755 0 0

# for lvm on raid
file /sbin/mdadm                /sbin/mdadm              755 0 0
file /sbin/lvm.static           /sbin/lvm.static         755 0 0

# libraries required by /sbin/fsck.ext4 and /sbin/fsck

slink   /lib                            /lib64                          777 0 0
file    /lib64/ld-linux-x86-64.so.2     /lib64/ld-linux-x86-64.so.2     755 0 0
file    /lib64/libext2fs.so.2           /lib64/libext2fs.so.2           755 0 0
file    /lib64/libcom_err.so.2          /lib64/libcom_err.so.2          755 0 0
file    /lib64/libpthread.so.0          /lib64/libpthread.so.0          755 0 0
file    /lib64/libblkid.so.1            /lib64/libblkid.so.1            755 0 0
file    /lib64/libuuid.so.1             /lib64/libuuid.so.1             755 0 0
file    /lib64/libe2p.so.2              /lib64/libe2p.so.2              755 0 0
file    /lib64/libc.so.6                /lib64/libc.so.6                755 0 0

file    /sbin/fsck              /sbin/fsck                      755 0 0
file    /sbin/fsck.ext4         /sbin/fsck.ext4                 755 0 0

# our init script
file    /init                   /root/initrd/init               755 0 0
If you don't use ext4, you need to run ldd on your fsck helper and include it and its libraries in place of fsck.ext4.
If your /usr and /var use different filesystem types, you need both fsck helpers and their libraries.
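The ldd step can be scripted. A minimal sketch that turns `ldd` output into `file` lines in the gen_init_cpio format used by the initramfs_list above (mode 755, owner 0:0); the name `ldd_to_list` is hypothetical, and entries without a path, such as linux-vdso.so.1, are skipped.

```shell
#!/bin/sh
# ldd_to_list: read `ldd` output on stdin and print one gen_init_cpio
# "file <name> <location> 755 0 0" line per library that has a real path.
# Hypothetical helper; virtual entries like linux-vdso.so.1 are skipped.
ldd_to_list() {
    awk '
        $2 == "=>" && $3 ~ /^\// { path = $3 }   # "libfoo.so.1 => /lib64/libfoo.so.1 (0x...)"
        $2 != "=>" && $1 ~ /^\// { path = $1 }   # "/lib64/ld-linux-x86-64.so.2 (0x...)"
        path != "" { print "file", path, path, "755", "0", "0"; path = "" }
    '
}
```

For example `ldd /sbin/fsck.ext4 | ldd_to_list`; merge the output into initramfs_list by hand, dropping libraries that are already listed.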

My init script ended up as
Code:
#!/bin/busybox sh

rescue_shell() {
    echo "$@"
    echo "Something went wrong. Dropping you to a shell."
    /bin/busybox --install -s
    exec /bin/sh
}

# allow the use of UUIDs or filesystem labels
uuidlabel_root() {
    for cmd in $(cat /proc/cmdline) ; do
        case $cmd in
        root=*)
            type=$(echo $cmd | cut -d= -f2)
            echo "Mounting rootfs"
            if [ "$type" = "LABEL" ] || [ "$type" = "UUID" ] ; then
                uuid=$(echo $cmd | cut -d= -f3)
                mount -o ro $(findfs "$type"="$uuid") /mnt/root
            else
                mount -o ro $(echo $cmd | cut -d= -f2) /mnt/root
            fi
            ;;
        esac
    done
}

check_filesystem() {
    # most of code coming from /etc/init.d/fsck

    local fsck_opts= check_extra= RC_UNAME=$(uname -s)

    # FIXME : get_bootparam forcefsck
    if [ -e /forcefsck ]; then
        fsck_opts="$fsck_opts -f"
        check_extra="(check forced)"
    fi

    echo "Checking local filesystem $check_extra : $1"

    if [ "$RC_UNAME" = Linux ]; then
        fsck_opts="$fsck_opts -C0 -T"
    fi

    trap : INT QUIT

    # using our own fsck, not the builtin one from busybox
    /sbin/fsck -p $fsck_opts $1

    ret_val=$?
    case $ret_val in
        0)      return 0;;
        1)      echo "Filesystem repaired"; return 0;;
        2|3)    if [ "$RC_UNAME" = Linux ]; then
                        echo "Filesystem repaired, but reboot needed"
                        reboot -f
                else
                        rescue_shell "Filesystem still has errors; manual fsck required"
                fi;;
        4)      if [ "$RC_UNAME" = Linux ]; then
                        rescue_shell "Filesystem errors left uncorrected, aborting"
                else
                        echo "Filesystem repaired, but reboot needed"
                        reboot
                fi;;
        8)      echo "Operational error"; return 0;;
        16)     echo "Use or Syntax Error"; return 16;;
        32)     echo "fsck interrupted";;
        127)    echo "Shared Library Error"; sleep 20; return 0;;
        *)      echo $ret_val; echo "Some random fsck error - continuing anyway"; sleep 20; return 0;;
    esac

# rescue_shell can't find a tty, so it's broken
    rescue_shell
}

# start for real here

# temporarily mount proc, sys and dev
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

# disable kernel messages from popping onto the screen
###echo 0 > /proc/sys/kernel/printk
# clear the screen
###clear

# assemble the raid set(s) - they got renumbered from md1, md5 and md6
# /boot
/sbin/mdadm --assemble /dev/md125 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# don't care if /boot fails to assemble

# /  (root)  I wimped out of root on lvm for this box
/sbin/mdadm --assemble /dev/md126 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 || rescue_shell
# if root won't assemble, we are stuck

# LVM for everything else
/sbin/mdadm --assemble /dev/md127 /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 || rescue_shell
# and if the LVM space won't assemble there is no /usr or /var so we are really in a mess
# TODO could auto cope with degraded raid operation

# lvm runs as whatever its called as
ln -s /sbin/lvm.static /sbin/vgchange

# start the vg volume group - we only have one volume group
/sbin/vgchange -ay vg || rescue_shell
# if this failed we have no /usr or /var

# get here with raid sets assembled and logical volumes available

# mounting rootfs on /mnt/root
uuidlabel_root || rescue_shell "Error with uuidlabel_root"

# space separated list of mountpoints that ...
mountpoints="/usr /var"

# ... we want to find in /etc/fstab ...
ln -s /mnt/root/etc/fstab /etc/fstab

# ... to check filesystems and mount our devices.
for m in $mountpoints ; do

#echo $m

    check_filesystem $m

    echo "Mounting $m"
    # mount the device and ...
    mount $m || rescue_shell "Error while mounting $m"

    # ... move the tree to its final location
    mount --move $m "/mnt/root"$m || rescue_shell "Error while moving $m"
done

echo "All done. Switching to real root."

# clean up. The init process will remount proc sys and dev later
umount /proc
umount /sys
umount /dev

# switch to the real root and execute init
exec switch_root /mnt/root /sbin/init


A few gotchas not listed in those wiki pages. The filesystems checked and mounted by the initrd need to be set to noauto in /etc/fstab, or you will be told that some mounts failed. That's expected, as /usr and maybe /var will already be mounted.

When DEVTMPFS makes your logical volume nodes, /dev/mapper/vg-user and friends are made, as are /dev/dm-0 and friends, but the symbolic links in /dev/vg/ are not created.
This means you can use the first two in /etc/fstab but not the latter. I found that out the hard way.
As the initrd contains the code for mounting everything by UUID, this is probably a good time to switch to UUID mounts. Don't do it all in one go though.
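A sketch of matching /etc/fstab entries, assuming ext4 logical volumes with the hypothetical names usr and var in the vg volume group. `noauto` keeps the boot scripts from trying to mount them a second time; the dump and pass fields are left at 0 on the assumption that the initrd, not the boot scripts, does the checking.

```
# /etc/fstab fragment (sketch); LV names usr and var are hypothetical
/dev/mapper/vg-usr   /usr   ext4   noauto   0 0
/dev/mapper/vg-var   /var   ext4   noauto   0 0
# the same by UUID (take the value from blkid; placeholder shown)
#UUID=<uuid-of-usr-lv>   /usr   ext4   noauto   0 0
```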

My raid assembly is explicit on the mdadm command line because it's easy to follow. You could put /etc/mdadm.conf in the initrd and call that.
mdadm also understands how to assemble a raid set given its UUID. That's still a TODO.
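A sketch of the mdadm.conf variant. `mdadm --detail --scan` (or `mdadm --examine --scan`) prints ARRAY lines with the real UUIDs, which replace the placeholders below. The file then goes into initramfs_list with a line such as `file /etc/mdadm.conf /root/initrd/mdadm.conf 644 0 0` (source path hypothetical), and the three explicit assemble commands in the init script could become a single `/sbin/mdadm --assemble --scan || rescue_shell`, at the cost of no longer treating a /boot failure as harmless.

```
# /etc/mdadm.conf for the initrd (sketch); UUIDs are placeholders
DEVICE /dev/sd[abcd][156]
ARRAY /dev/md125 UUID=<uuid-of-boot-array>
ARRAY /dev/md126 UUID=<uuid-of-root-array>
ARRAY /dev/md127 UUID=<uuid-of-lvm-array>
```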

I don't have root on lvm on this system - thats the next one to convert.
ExecutorElassus
Posted: Wed Apr 25, 2012 10:58 pm

Hi Neddy!

so, after 10 days of basically working okay, I rebooted today. Now, when I reboot, the large RAID5 array - the one holding /usr, /var, /home, etc - is active as "auto-read-only" and none of its partitions are mounted.

So, I'm back to where I started. Sorta.

As far as I can tell, the array is fine, and all its drives are active; they just … aren't being mounted by mdadm. Is there a way to re-initialize the system, so that mdadm and lvm re-do mounting and checking everything, and re-load all the stuff that lives on that array?

On a second question, can you think of any reason why that array would always start up auto-read-only?

Thanks,

EE
(and once again, I don't have access to a pager, because I haven't rebuilt busybox, mdadm, or lvm static. Do I need to do anything besides rebuild them with USE="static" to have access to them?)
NeddySeagoon
Posted: Thu Apr 26, 2012 5:41 pm

ExecutorElassus,

If busybox, mdadm and lvm are not built with USE=static, you need to remake them and rebuild your initrd.
Without the static USE flag, they would never have worked in your initrd, never mind being OK for 5 days.

What versions of openrc and udev do you have ?

There is an alternative to USE=static. You can add the libraries these applications need to your initrd.
I prefer static.
All times are GMT
Page 3 of 4