LVM børked; how do I rebuild? [SOLVED]

ExecutorElassus · Posted: Fri Apr 13, 2012 6:59 pm Post subject:

Hi Neddy,

both lines 5 and 197 still show 0 for the raw value, so I'll assume there were no write errors (I added the suspect drive to the array last, so it was written). (Incidentally, resync sped up considerably when I stopped boinc).

Okay, so I'll try tinderboxing (?) portage, then python, and see what that does. So, just unpack them from root?

If that fails, is "reinstall" something less drastic than "chroot in from a liveCD and start over from scratch"?

Cheers,

EE
UPDATE: for some reason, trying to reply boots me to the main index, so I'll reply here. Uh, progress! On a lark, I guessed that 'install' - as part of coreutils - might be broken. I used the tinderbox version, and now I've emerged portage to its latest version. I'll try to sync and see what happens.

NeddySeagoon · Posted: Fri Apr 13, 2012 7:03 pm Post subject:

ExecutorElassus,

Less haste. Get the right portage for you and unpack it to the root of your filesystem.
I posted the details earlier. You must use the p option to tar, or it still won't work: without it, the files are unpacked with the x bit removed from their permissions. That means they won't eXecute, not even for root.

Then test - see what has changed if anything.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
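A minimal sketch of the tar point above, assuming GNU tar. The temp directory and the `hello` file are made up for illustration. It packs an executable file, unpacks it with the p option under a restrictive umask, and lists the result so the restored mode is visible. On the real system the unpack target would be / and a .tbz2 package would also need the j option.

```shell
#!/bin/sh
# Demonstrates that tar's p option restores the stored permissions on unpack.
# Everything happens in a throwaway temp dir, not on the real root filesystem.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/usr/bin" "$work/dest"

# A stand-in for a packaged binary (hypothetical file).
printf '#!/bin/sh\necho hello\n' > "$work/src/usr/bin/hello"
chmod 755 "$work/src/usr/bin/hello"

# Pack it with paths relative to the package root.
tar -cf "$work/pkg.tar" -C "$work/src" .

# Unpack with p so the stored mode is used instead of being filtered by umask.
umask 077
tar -xpf "$work/pkg.tar" -C "$work/dest"

ls -l "$work/dest/usr/bin/hello"
```

If the listing shows the x bits intact, the unpack behaved as Neddy describes.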

ExecutorElassus · Posted: Fri Apr 13, 2012 7:37 pm Post subject:

okay. I managed to emerge portage successfully, and then tried to re-emerge coreutils. That failed on a broken /usr/include/mntent.h, so now I'm emerging glibc. After that I'll try coreutils again.

It seems the re-syncing (or rather, several iterations of it, along with bad journal/fsck management on my part - "sure, just auto-fix everything!") has left some files corrupted. But if I can run emerge, and then rebuild the toolchain, I can start getting things put back together.

I'll keep you posted.

Thanks again,

EE
UPDATE: okay, glibc won't install due to a broken /usr/include/mntent.h, which belongs to linux-headers. I can't emerge linux-headers due to a broken file belonging to glibc. Using tinderbox files for both of those results in the following error:

NeddySeagoon · Posted: Fri Apr 13, 2012 8:25 pm Post subject:

ExecutorElassus

Can you do

ExecutorElassus · Posted: Fri Apr 13, 2012 8:28 pm Post subject:

Hrm: Apparently not:

NeddySeagoon · Posted: Fri Apr 13, 2012 8:29 pm Post subject:

ExecutorElassus,

What version of glibc did you have?
and what version do you have now?

glibc must not be downgraded. I guess you used to have glibc-2.14 and have a lower version now?
_________________
Regards,

NeddySeagoon


ExecutorElassus · Posted: Fri Apr 13, 2012 8:34 pm Post subject:

I had glibc-2.14.1-r2, but the tinderbox version was glibc-2.13-r4 (and is thus my current version).

Sigh. So, if glibc can't be downgraded, what's my next step?

Cheers,

EE
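The downgrade that caused the trouble here can be caught before unpacking a binary package. The sketch below is only an approximation: it assumes GNU `sort -V` orders these version strings the same way Portage does, which holds for the two versions quoted in this thread but is not guaranteed for every Gentoo version suffix. `is_downgrade` is a hypothetical helper name.

```shell
#!/bin/sh
# is_downgrade OLD NEW: succeeds when NEW sorts before OLD.
# Relies on GNU sort -V as a stand-in for Portage's own version comparison.
is_downgrade() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="2.14.1-r2"   # glibc version that was on the system
candidate="2.13-r4"     # glibc version offered by the tinderbox

if is_downgrade "$installed" "$candidate"; then
    echo "refusing: glibc $candidate is older than $installed"
else
    echo "ok: glibc $candidate is not a downgrade"
fi
```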

NeddySeagoon · Posted: Fri Apr 13, 2012 8:36 pm Post subject:

ExecutorElassus,

I have sys-libs/glibc-2.14.1-r2 so I can post a tarball. It will be similar to what you would get from the tinderbox, except it's optimised for an AMD Phenom II 1090.
I will have it for other 64 bit AMD arches too, like an E350 and whatever is in the HP Microserver.
If you prefer, I can tell you how to make your own packages. You will need 10G or so of space for this.
_________________
Regards,

NeddySeagoon


ExecutorElassus · Posted: Fri Apr 13, 2012 8:42 pm Post subject:

Hi Neddy,

I'm going to go with the tarball option, because 1) I'm not sure I have 10GB free space to build without moving things around, and 2) I'm not certain of my toolchain's integrity.

would you mind posting your version? I might very well have a similar CPU, but right now, I'm mainly just worried about getting to a point where I can build my own toolchain (which apparently will need working python, linux-headers, coreutils, glibc, gcc, and probably rebuilding the kernel for good measure).

Thanks for the help.

EE

NeddySeagoon · Posted: Fri Apr 13, 2012 8:54 pm Post subject:

ExecutorElassus,

Here's my glibc-2.14.1-r2.

You don't use your toolchain to make your own packages.
Long story short ... make an ext2 fs in a file ... about 10G
Loopback mount the file on /mnt/gentoo. Put a stage3 and portage snapshot in there
Chroot into the new install in a file. Set FEATURES to include buildpkg, emerge --sync, emerge whatever you need.
From outside the chroot in a file, copy the packages you want out of /mnt/gentoo/usr/portage/packages/...
Install them as anything else you fetch from the tinderbox.
rm the install in a file when you are done.
_________________
Regards,

NeddySeagoon
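The "install in a file" steps above can be outlined roughly as follows. This is a dry-run sketch, not Neddy's exact commands: the image path, the tarball file names, the /var/tmp destination and the `run` helper are all assumptions for illustration. By default each step is only printed; setting APPLY=1 as root would execute them.

```shell
#!/bin/sh
# Dry-run outline of building packages in a loopback-mounted install.
# IMG, STAGE3, SNAPSHOT and the copy destination are placeholders.
set -eu

IMG=${IMG:-/var/tmp/buildroot.img}
MNT=${MNT:-/mnt/gentoo}
STAGE3=${STAGE3:-stage3-amd64.tar.bz2}
SNAPSHOT=${SNAPSHOT:-portage-latest.tar.bz2}

# Print each command; execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

run dd if=/dev/zero of="$IMG" bs=1M count=10240   # about 10G backing file
run mkfs.ext2 -F "$IMG"                           # -F: target is a regular file
run mkdir -p "$MNT"
run mount -o loop "$IMG" "$MNT"
run tar -xjpf "$STAGE3" -C "$MNT"
run tar -xjf "$SNAPSHOT" -C "$MNT/usr"
run cp -L /etc/resolv.conf "$MNT/etc/"
run mount -t proc proc "$MNT/proc"
run mount --bind /dev "$MNT/dev"
# Inside the chroot: add buildpkg to FEATURES, emerge --sync,
# then emerge whatever packages are needed.
run chroot "$MNT" /bin/bash
# Back outside: copy the built packages out, then clean up.
run cp -r "$MNT/usr/portage/packages" /var/tmp/built-packages
run umount "$MNT/dev" "$MNT/proc" "$MNT"
run rm "$IMG"
```

Printing first and executing only on request is a deliberate choice here, since several of these commands are destructive if pointed at the wrong path.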


ExecutorElassus · Posted: Fri Apr 13, 2012 8:59 pm Post subject:

So, now:

NeddySeagoon · Posted: Fri Apr 13, 2012 9:14 pm Post subject:

ExecutorElassus,

Yes. liveCD. This is why you build busybox with the static USE flag. So you still have something when glibc gets trashed.
You can no longer chroot into your install as bash won't run.

It's cd /mnt/gentoo tar ...
_________________
Regards,

NeddySeagoon


ExecutorElassus · Posted: Fri Apr 13, 2012 9:18 pm Post subject:

Okay, just to be clear: since I have partitions for /usr, /var, etc, I should mount them before I start untarring things, yes? Am I going to have to recreate all the device nodes and VGs first? How close to "from scratch" do I have to get?

Cheers,

EE

NeddySeagoon · Posted: Fri Apr 13, 2012 9:27 pm Post subject:

ExecutorElassus,

That's correct - you want all the component parts of the tarball to go into the right places, so your filesystem needs to be assembled.
You will be safer with the command

ExecutorElassus · Posted: Fri Apr 13, 2012 9:32 pm Post subject:

Hi Neddy,

okay, so, I do this:

boot the LiveCD (in my case, SystemRescueCD 2.4.1), set up networking so I can ssh over from my laptop, and then … will the VGs already be mountable? Will I need to recreate all the device nodes and LVs? Or can I simply mount things that the CD will auto-detect?

ugh. I hate this. I'm sorry for all the trouble, and am really thankful you're walking me through this. I'll start the boot up with the liveCD, and get back to you.

Cheers,

EE

NeddySeagoon · Posted: Fri Apr 13, 2012 9:44 pm Post subject:

ExecutorElassus,

SystemRescueCd will start your raid sets and activate your logical volumes. You should just need to do the mounts.
_________________
Regards,

NeddySeagoon
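A rough sketch of "just do the mounts" from the live CD. The volume group name, the root device and the `run` helper are placeholders, since the thread does not give the real names; only /usr and /var are mentioned as separate filesystems. As written it only prints the commands, and APPLY=1 would run them.

```shell
#!/bin/sh
# Dry-run sketch: assemble the installed system under /mnt/gentoo.
# VG and ROOT_DEV are guesses; substitute what `lvs` and /proc/mdstat show.
set -eu

VG=${VG:-vg0}
ROOT_DEV=${ROOT_DEV:-/dev/md0}
MNT=${MNT:-/mnt/gentoo}

# Print each command; execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

run cat /proc/mdstat            # confirm the raid sets are running
run vgchange -ay                # activate LVs if the CD has not done so
run lvs                         # list the logical volumes that were found
run mount "$ROOT_DEV" "$MNT"    # root filesystem first
for lv in usr var; do
    run mount "/dev/$VG/$lv" "$MNT/$lv"
done
```

Root has to be mounted before /usr and /var so that their mount points exist under /mnt/gentoo.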


ExecutorElassus · Posted: Fri Apr 13, 2012 10:02 pm Post subject:

Okay, I got all the RAID sets mounted okay. Now, this:

NeddySeagoon · Posted: Fri Apr 13, 2012 10:12 pm Post subject:

ExecutorElassus,

Nope. You can either attempt the chroot, or reboot to test. Success with either means fixing glibc worked, since almost nothing works without glibc.
Then you test one file at a time and only replace what you need.

At this time of night, I would reboot or chroot, then run the bootstrap.sh script and see what happens.
Can you leave it building while you sleep?

"Unexpected EOF"??? "Extra garbage at end ignored" is safe to ignore.
_________________
Regards,

NeddySeagoon


ExecutorElassus · Posted: Fri Apr 13, 2012 10:16 pm Post subject:

I can chroot into the system. I'll try running the bootstrap.sh, and report back tomorrow.

Thanks again for all the help. People like you are why gentoo is awesome.

Cheers,

EE
UPDATE: trying to run the bootstrap script from chroot results in

ExecutorElassus · Posted: Sat Apr 14, 2012 3:27 pm Post subject:

Hi Neddy,

okay, I guess we're now past the "how do I get the drives working again?" stage, and on to the "how do I remerge everything?" stage. Today's error I can't figure out is from e2fsprogs:

ExecutorElassus · Posted: Sat Apr 14, 2012 4:55 pm Post subject:

Okay, maybe I still have drive problems.

I just rebooted, and - yet again - one of the members of md127 was put into its own array, and md127 itself was set inactive, with both of its member drives marked as spares. What's going on with that? I don't find any errors with dmesg, or with 'mdadm -E /dev/sdX4', so I'm not really sure why mdadm keeps dropping the drives out. Can you give any advice?

Thanks,

EE

NeddySeagoon · Posted: Sat Apr 14, 2012 5:49 pm Post subject:

ExecutorElassus,

Look at dmesg and the event count on each member of your raid set.
If your raid set assembled in degraded mode (only n-1 drives), it would not run unless you forced it to run.
You would remember doing

ExecutorElassus · Posted: Sat Apr 14, 2012 6:01 pm Post subject:

Hi Neddy,

the two drives that were in an inactive array - and marked as spares - had the same event count. The one that got dropped out had six fewer.

So, I'll try running dd on the array, once it's built in about six hours. Is it possible that the wonky role numbers for the drives (sda4[0] sdc4[3] sdb4[2], whereas the other two arrays are respectively [0] [1] [2]) are causing mdadm to assume that a drive in between is missing, and that sda4 (which was not in the array at startup) did not belong (as sdc4 had a role number of [3], already beyond the drive count)? Is there any way to fix that on the fly?

Cheers,

EE
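The event-count comparison discussed above can be sketched like this. It assumes the `Events : N` line layout that `mdadm --examine` prints for version 1.x superblocks. The sample text and its numbers are invented for illustration, apart from the six-event gap reported in the post. `event_count` is a hypothetical helper name.

```shell
#!/bin/sh
# event_count: read `mdadm --examine` output on stdin, print the Events value.
event_count() {
    awk -F' : ' '/^ *Events/ { gsub(/ /, "", $2); print $2; exit }'
}

# Illustrative excerpts only; real output has many more fields.
sample_current='          State : clean
         Events : 1042'
sample_dropped='          State : clean
         Events : 1036'

a=$(printf '%s\n' "$sample_current" | event_count)
b=$(printf '%s\n' "$sample_dropped" | event_count)
echo "member A: $a, member B: $b, difference: $((a - b))"
```

On the real machine the input would come from something like `mdadm --examine /dev/sda4 | event_count`, repeated for each member partition. A member with a lower count than the others is the one mdadm considers stale.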

NeddySeagoon · Posted: Sat Apr 14, 2012 6:02 pm Post subject:

ExecutorElassus,

Put

ExecutorElassus · Posted: Sat Apr 14, 2012 6:30 pm Post subject:

Hi Neddy,

I'll add in those packages to package.mask once I can boot up with RAID working (right now, I don't even have access to nano, much less the files I need to edit). The only thing I can see at the end of dmesg is "mdadm: sending ioctl 1261 to a partition!" which another forum told me was a kernel error I can ignore.

If there's nothing in dmesg, can you think of any reason my RAID would be dropping drives out of the array? It's a different drive from the one that was suspect last time; I'd find it really hard to believe that two drives out of three failed.

Cheers,

EE