ExecutorElassus l33t
Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
Posted: Tue Apr 10, 2012 10:47 pm Post subject: LVM børked; how do I rebuild? [SOLVED]
Okay, several problems:
One of the drives in a RAID5 array choked during some heavy writing, and I had to shut down dirty (i.e., no unmounting; I just had to switch off).
As a result, that array, which holds the LVM for all of my extended partitions, including /usr, /var, and all my userland storage, will not start.
# cat /proc/mdstat:
| Code: | Personalities [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
md127 : inactive sda4[0] sdc4[3]
1931841384 blocks super 1.2
md1 : active raid1 sdc1[2] sda1[0] sdb1[1]
97536 blocks [3/3] [UUU]
md3 : active raid1 sdc3[2] sda3[0] sdb3[1] |
You will notice several problems with md127:
sdc4 has a role number higher than the array number, so it won't act as active. How do I change that?
I cannot fail or remove sdc4 from that array. I get errors about "device or resource busy".
Also, 'md127' kinda sucks for an array name.
I cannot add sdb4 to md127. Doing so returns:
| Code: | | mdadm: add new device failed for /dev/sdb4 as 4: invalid argument |
In short: how do I rebuild the array superblock for both /dev/sdb4 and /dev/sdc4, and then rebuild the array? (As a bonus, it'd be nice to rename it to something other than 'livecd:4' and give it a more sensible array number while I'm at it.) Also, I do not have access to nano or less on this machine. Is it possible to do all this with mdadm?
Thanks for the help.
EE
UPDATE: 'cat /proc/mdstat' now returns:
| Code: | md127 : inactive sdb4[4](S) sda4[0] sdc4[4](S)
2897762076 blocks super 1.2 | suggesting that all drives are now in the array; they just have the wrong role numbers. 'mdadm -E /dev/sdc4' gives as the last line, while the other two are now both A.A. Trying to start the array now results in 'I/O Error' as the failure.
UPDATE 2 - Message to the Future
As per XKCD:
Dear people from the future, here's what we've learned:
1) If one drive in a RAID array goes down, start the array degraded, back up your data, and then proceed. Do NOT force the raid controller to assume everything is correct unless you are sure it is. I made this mistake, and then the fsck on the next startup relocated files and nodes all over the place. I still have directories re-designated as files and vice-versa.
2) glibc breaking - in my case, by using tinderbox to overwrite my glibc emerge with an older version - is one of the worst ways to break a system. DON'T DO IT.
3) The udev-182 upgrade will render your system nonfunctional on the next boot if you don't take steps to pre-load /usr and /var if they are on separate partitions (or RAID or LVM).
4) NeddySeagoon - the one guy who replied to my query, and walked me through everything to repair my system - is a gentleman and a scholar. Buy him a beer for me if you run into him. He also wrote up a nice guide to help with the migration to udev-182. The migration is tricky! YMMV, etc etc.
5) Build mdadm, lvm[2], and busybox with the package-specific "static" USE flag, because you may need to use them before udev starts and mounts the libraries they would otherwise link.
6) Always have a SystemRescueCD (or some other liveCD) on hand, and a drive you can use to boot it.
7) Make sure you understand the difference between UUIDs of a block device and of a filesystem. Different parts of your startup sequence (say, your /etc/fstab entries vs. your root= entry in grub.conf vs. Neddy's initrd script) require one or the other.
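For point 5, a minimal sketch of what the package-specific entries might look like. It only prints the lines; the category/package names (sys-fs/mdadm, sys-fs/lvm2, sys-apps/busybox) are the usual Gentoo atoms but are an assumption here, and `static_use_lines` is a made-up helper name.

```shell
#!/bin/sh
# static_use_lines: print one package.use entry per package, enabling the
# "static" USE flag for that package only (hypothetical helper).
static_use_lines() {
    for pkg in "$@"; do
        echo "$pkg static"
    done
}

# Package atoms are assumptions; check them against your own tree.
static_use_lines sys-fs/mdadm sys-fs/lvm2 sys-apps/busybox

# To apply on a real system (not run here), append the output to
# /etc/portage/package.use and re-emerge those packages.
```

Setting the flag per package rather than globally in make.conf keeps the rest of the system dynamically linked.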
Last edited by ExecutorElassus on Sun May 06, 2012 11:04 pm; edited 1 time in total

NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
Posted: Wed Apr 11, 2012 6:13 pm
ExecutorElassus,
I know your pain. I had to replace a drive in my raid5 that houses LVM for my KVMs and the bare metal hardware system.
During the resync one of the 'good' drives got kicked out of the array with hard read errors, resulting in I/O errors and interface resets. This left me with a five spindle raid5 with only 3 active drives.
Anyway, enough of my woes.
Please post the full mdadm -E output for each partition in the raid set giving you issues.
You should be able to stop the partially assembled raid and force assembly with the partitions you give mdadm. How successful that's likely to be depends on the event count on each contributing partition.
The downside is that the raid will assemble but there is no way to detect and correct damaged data caused by the raid elements being out of sync.
You only get one go at this unless you have enough space to make images of the partitions involved so you have an undo.
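A rough sketch of the event-count comparison mentioned above, assuming `mdadm -E` prints an `Events :` line in the format seen elsewhere in this thread. The helper name `events_of` and the sample text are invented for illustration; nothing here touches a real array.

```shell
#!/bin/sh
# events_of: read `mdadm -E` output on stdin and print the value of the
# "Events" line (hypothetical helper, not part of mdadm).
events_of() {
    awk -F: '/^[ \t]*Events[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sample text standing in for `mdadm -E /dev/sda4` (illustrative values).
sample='          State : clean
    Update Time : Wed Apr 11 21:34:23 2012
         Events : 16936
    Device Role : Active device 0'

printf '%s\n' "$sample" | events_of

# On a real system (not run here), something like:
#   for p in /dev/sda4 /dev/sdb4 /dev/sdc4; do
#       printf '%s ' "$p"; mdadm -E "$p" | events_of
#   done
```

Members whose counts match, or nearly match, are the more plausible candidates for a forced assembly; a member that lags well behind probably holds stale data.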
Right now the preferred minor number of the raid set is the least of your problems. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 7:39 pm
Sigh. So, since last night, I deleted the array, re-made it with --assume-clean, rebooted, dropped and re-added partitions, re-synced, rebooted again and ran fsck (which cleared a whole bunch of "illegal block"s from the inodes), plus a few more repetitions of the same steps. Anyway, here are my current outputs (thank heavens I can at least ssh into the box now!):
| Code: | # mdadm -E /dev/sda4
/dev/sda4:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d42e5336:b75b0144:a502f2a0:178afc11
Name : domo-kun:carrier (local to host domo-kun)
Creation Time : Wed Apr 11 02:10:50 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 1931841384 (921.17 GiB 989.10 GB)
Array Size : 3863681024 (1842.35 GiB 1978.20 GB)
Used Dev Size : 1931840512 (921.17 GiB 989.10 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f7f1d49b:a0272bc3:c46251a2:e0502319
Update Time : Wed Apr 11 21:34:23 2012
Checksum : 3823edf3 - correct
Events : 16936
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing)
|
| Code: | # mdadm -E /dev/sdb4
/dev/sdb4:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d42e5336:b75b0144:a502f2a0:178afc11
Name : domo-kun:carrier (local to host domo-kun)
Creation Time : Wed Apr 11 02:10:50 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 1931841384 (921.17 GiB 989.10 GB)
Array Size : 3863681024 (1842.35 GiB 1978.20 GB)
Used Dev Size : 1931840512 (921.17 GiB 989.10 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b4009981:9f2fb7a3:0627bfaa:066872ba
Update Time : Wed Apr 11 21:34:48 2012
Checksum : a830e033 - correct
Events : 16936
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA ('A' == active, '.' == missing)
|
| Code: | # mdadm -E /dev/sdc4
/dev/sdc4:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : d42e5336:b75b0144:a502f2a0:178afc11
Name : domo-kun:carrier (local to host domo-kun)
Creation Time : Wed Apr 11 02:10:50 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 1931841384 (921.17 GiB 989.10 GB)
Array Size : 3863681024 (1842.35 GiB 1978.20 GB)
Used Dev Size : 1931840512 (921.17 GiB 989.10 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4a8d21e3:15026b07:bfacaedc:b5160599
Update Time : Wed Apr 11 21:35:09 2012
Checksum : 7def73bb - correct
Events : 16936
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA ('A' == active, '.' == missing)
|
| Code: | # mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Wed Apr 11 02:10:50 2012
Raid Level : raid5
Array Size : 1931840512 (1842.35 GiB 1978.20 GB)
Used Dev Size : 965920256 (921.17 GiB 989.10 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Wed Apr 11 21:35:33 2012
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : domo-kun:carrier (local to host domo-kun)
UUID : d42e5336:b75b0144:a502f2a0:178afc11
Events : 16936
Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
3 8 20 1 active sync /dev/sdb4
2 8 36 2 active sync /dev/sdc4
| For a while, sdb4 and sdc4 were both marked as "spare" and the array wasn't starting. I ran 'mdadm --zero-superblock' and added one, failed, then deleted the array and rebuilt it. It re-synced the drives after a reboot (which dropped sdb4 into "md126" as a sole member).
Ugh. It's kinda messy. Honestly, there's not much on those drives I really care about (most of it is replaceable, and I have backups), but there are a few things I'd hate to lose completely.
How hosed am I?
Thanks,
EE

NeddySeagoon Administrator
Posted: Wed Apr 11, 2012 7:55 pm
ExecutorElassus,
The array is assembled and assumed clean - the information I was after has gone.
Find the stuff you want - copy it off then remake the filesystem and restore from backups. There is no way to detect any data integrity issues any longer.
With a 3 spindle raid set, you have three ways to bring it up in degraded mode to look around. That's mostly harmless, as with a drive missing it won't resync.
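The three ways can be listed mechanically. This sketch only prints candidate commands, leaving out one member each time. It assumes the array has been stopped first (`mdadm --stop`) and that the device names match the ones posted earlier in the thread; `degraded_candidates` is a made-up helper name. `--run` and `--readonly` are standard mdadm assemble options, but whether a given pair actually assembles depends on the superblocks.

```shell
#!/bin/sh
# degraded_candidates MD DEV...: print (do not run) one assemble command per
# way of leaving out a single member. Hypothetical helper for illustration.
degraded_candidates() {
    md=$1; shift
    for skip in "$@"; do
        keep=
        for dev in "$@"; do
            # Collect every member except the one being left out.
            [ "$dev" = "$skip" ] || keep="$keep $dev"
        done
        echo "mdadm --assemble --run --readonly $md$keep"
    done
}

degraded_candidates /dev/md127 /dev/sda4 /dev/sdb4 /dev/sdc4
```

Printing rather than executing keeps the sketch harmless; each line would still need to be reviewed and run by hand, one at a time, stopping the array between attempts.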
Your best bet would have been the two drives with the closest count. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 8:16 pm
sigh. I had a feeling.
Okay. On that array are the following:
/usr
/usr/portage
/usr/portage/distfiles
/opt
/tmp
/var
/var/tmp
/home
and nine partitions, totalling about 2TB of storage space
What is the best way to recover from that? Am I going to have to boot into a liveCD, re-do the partition table, and re-install? I do not seem able to re-emerge anything, as emerge fails to create /var/tmp/portage/[package]/work
double sigh. Is there a short list of essential files that I need to back up to enable a recovery? /etc/make.conf, the world file, /etc/fstab, grub.conf all come to mind. Any others?
Should I just start drinking and crying now?

NeddySeagoon Administrator
Posted: Wed Apr 11, 2012 9:05 pm
ExecutorElassus,
| Code: | /usr/portage
/usr/portage/distfiles | are expendable - you can get them from the web.
/tmp is wiped at reboot, so it doesn't matter either. Consider putting /tmp in RAM
/usr and /opt will be rebuilt on reinstall, unless you have some manually installed packages there.
/home is all of your user files - which I hope you have backed up.
Salvage your /var/lib/portage/world file. Salvage /etc. Then reinstall.
You cannot reuse /etc directly but you can reuse make.conf, and peek at /etc/conf.d/ for settings
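One way the salvage step might look, as a sketch only. The `salvage` function name and the idea of passing a root directory are assumptions added here (handy if the damaged system is mounted under a liveCD); the two paths come from the advice above. The demo runs against a throwaway directory, not a real system.

```shell
#!/bin/sh
# salvage ROOT OUT: archive ROOT/etc and ROOT/var/lib/portage/world into OUT.
# Hypothetical helper for illustration only.
salvage() {
    root=$1; out=$2
    tar -cf "$out" -C "$root" etc var/lib/portage/world
}

# Demo against a scratch tree with stand-in files.
demo=$(mktemp -d)
mkdir -p "$demo/root/etc/conf.d" "$demo/root/var/lib/portage"
echo 'MAKEOPTS="-j5"' > "$demo/root/etc/make.conf"
echo 'app-editors/nano' > "$demo/root/var/lib/portage/world"

salvage "$demo/root" "$demo/salvage.tar"
tar -tf "$demo/salvage.tar"
```

Writing the archive to a different disk (or over ssh/nfs to another machine) matters more than the exact command, since the point is to get the files off the suspect array.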
After you have untarred the stage3, reinstate /var/lib/portage/world then do and all your apps will be rebuilt.
If you have some way of validating your raid and fixing the damage, this can be avoided. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 9:20 pm
Indeed, I seem to have been incorrect about the partition tables: they still exist. I should probably copy over /etc/mdadm.conf as well, yes?
I'm able to mount and copy over all my /home sub-partitions (and since they're mostly just music files I already backed up, or small documents, that isn't hard).
Okay, so here's the thing: what is the process for reinstall? Just download the tarball and unpack it? Remember, I can't emerge things on the current system: I get errors about being unable to create the working directories in /var/tmp.
world, make.conf, package.use, package.mask, and other similar files are all backed up.
So, I guess my basic question is this: is there a way to re-install without booting a liveCD and chrooting in? Can I just wipe /var/tmp and reformat it to start emerging things?
thanks again for your help,
EE

NeddySeagoon Administrator
Posted: Wed Apr 11, 2012 9:29 pm
ExecutorElassus,
I would remake all of the filesystems, which means chrooting in.
I run lvm2 over raid5 and I don't have a mdadm.conf file. It can't be used anyway as I need the raid assembled to mount root, so it's explicit mdadm -A calls in the initrd.
Preserve it if it gives you a warm fuzzy feeling.
You say you have managed to salvage things - that may mean that there is little or no damage - there is just no way to tell without comparing against a backup.
Do you feel lucky?
If so, poke about and see what's broken and what's fixable. It's just possible that everything is ok.
What are the permissions on /var /var/tmp and /var/tmp/portage?
Is /var mounted rw ?
I suspect so or you would have got errors during boot.
Is /var full? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 9:48 pm
okay, lemme back up a bit, because maybe I'm not as badly off as I thought.
First, the "børked" of the subject was from the array freezing when I ran too many simultaneous rw-intensive operations on it (i.e., shredding a big file while copying dozens over from another, while watching a large video file, and, uh, maybe a couple of other things). So the drive didn't so much fail as get booted out of the array when I did a cold restart. I doubt I lost much (if any) data.
Honestly, I'm (mostly) confident that the data itself is okay (and the stuff that might not be is statistically going to be program files that I replace when I reinstall them, maybe?). So, uh, I feel lucky?
/var and /var/tmp are both mounted rw, and both about 20% full. The build.log reads:
| Code: | * Package: x11-base/xorg-server-1.12.0-r1
* Repository: gentoo
* Maintainer: x11@gentoo.org
* USE: amd64 elibc_glibc ipv6 kernel_linux multilib udev userland_GNU xorg
* FEATURES: ccache sandbox
install: invalid option -- 'm'
Try `install --help' for more information.
* ERROR: x11-base/xorg-server-1.12.0-r1 failed (unpack phase):
* Failed to create dir '/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work'
*
* Call stack:
* ebuild.sh, line 701: Called ebuild_main 'unpack'
* phase-functions.sh, line 955: Called dyn_unpack
* phase-functions.sh, line 243: Called die
* The specific snippet of code:
* install -m${PORTAGE_WORKDIR_MODE:-0700} -d "${WORKDIR}" || die "Failed to create dir '${WORKDIR}'"
*
* If you need support, post the output of 'emerge --info =x11-base/xorg-server-1.12.0-r1',
* the complete build log and the output of 'emerge -pqv =x11-base/xorg-server-1.12.0-r1'.
* The complete build log is located at '/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/temp/build.log'.
* The ebuild environment file is located at '/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/temp/environment'.
* S: '/var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0'
| I get similar errors if I try to emerge nvidia-drivers (with the correct directory, of course). I cannot run 'startx' because I get an error about a missing /root/.serverauth.XXXXXXX file.
So, it seems the drives themselves are maybe okay (the services all start up), just no ability to install things. How should I proceed?
Cheers,
EE

NeddySeagoon Administrator
Posted: Wed Apr 11, 2012 10:12 pm
ExecutorElassus,
Ok, so let's see if we can fix it.
Does /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/temp/build.log exist?
If so please put it on a pastebin.
Sight of your emerge --info would be good too.
If you do | Code: | | mount /dev/shm /var/tmp/portage | does emerge work?
With over 2G RAM you can emerge most things like this. It just puts the portage build space in RAM. If you don't have a lot of RAM builds will fail because /var/tmp/portage gets full.
However that will get you a different error.
Can you remove the content of /var/tmp/portage/ ?
Its only needed while an emerge is in progress. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 10:25 pm
what I posted in the previous post was the entirety of that build.log.
emerge --info gives:
| Code: | # emerge --info
Portage 2.1.10.51 (default/linux/amd64/10.0, gcc-4.5.3, glibc-2.14.1-r2, 3.3.0-gentoo x86_64)
=================================================================
System uname: Linux-3.3.0-gentoo-x86_64-AMD_Phenom-tm-_9950_Quad-Core_Processor-with-gentoo-2.1
Timestamp of tree: Thu, 05 Apr 2012 20:45:01 +0000
ccache version 3.1.7 [enabled]
app-shells/bash: 4.2_p24
dev-java/java-config: 2.1.11-r3
dev-lang/python: 2.7.2-r3, 3.1.4-r3, 3.2.2-r1
dev-util/ccache: 3.1.7
dev-util/cmake: 2.8.7-r5
dev-util/pkgconfig: 0.26
sys-apps/baselayout: 2.1
sys-apps/openrc: 0.9.9.3
sys-apps/sandbox: 2.5
sys-devel/autoconf: 2.13, 2.68
sys-devel/automake: 1.9.6-r3, 1.10.3, 1.11.3
sys-devel/binutils: 2.22-r1
sys-devel/gcc: 4.4.6-r1, 4.5.3-r2
sys-devel/gcc-config: 1.6
sys-devel/libtool: 2.4.2
sys-devel/make: 3.82-r3
sys-kernel/linux-headers: 3.3 (virtual/os-headers)
sys-libs/glibc: 2.14.1-r2
Repositories: gentoo pd-overlay x-portage
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA dlj-1.1 Mendeley-EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon64 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=athlon64 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--autounmask=n"
FEATURES="assume-digests binpkg-logs ccache distlocks ebuild-locks fixlafiles news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS=""
GENTOO_MIRRORS="ftp://gentoo.lagis.at/ http://gentoo.mirror.dkm.cz/pub/gentoo/ http://ftp.fi.muni.cz/pub/linux/gentoo/ http://gentoo.mirror.web4u.cz/ ftp://gentoo.mirror.web4u.cz/ ftp://ftp.klid.dk/gentoo/ http://mirror.uni-c.dk/pub/gentoo/ ftp://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/ http://mirror.netcologne.de/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://ftp.join.uni-muenster.de/pub/linux/distributions/gentoo http://gentoo.mneisen.org/ http://de-mirror.org/distro/gentoo/ ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ http://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/ ftp://ftp-stud.hs-esslingen.de/pub/Mirrors/gentoo/ ftp://de-mirror.org/distro/gentoo/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ http://ftp6.uni-erlangen.de/pub/mirrors/gentoo ftp://linux.rz.ruhr-uni-bochum.de/gentoo-mirror/ ftp://mirror.netcologne.de/gentoo/ ftp://ftp6.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp6.uni-muenster.de/pub/linux/distributions/gentoo ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo http://ftp.uni-erlangen.de/pub/mirrors/gentoo http://ftp-stud.hs-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp.ipv6.uni-muenster.de/pub/linux/distributions/gentoo ftp://gentoo.inf.elte.hu/ http://gentoo.inf.elte.hu/ http://ftp.heanet.ie/pub/gentoo/ ftp://ftp.heanet.ie/pub/gentoo/ ftp://ftp.df.lth.se/pub/gentoo/ http://mirror.switch.ch/ftp/mirror/gentoo/ ftp://mirror.switch.ch/mirror/gentoo/ http://gentoo.kiev.ua/ftp/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en en_US.utf8 de de_DE.utf8"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/pd-overlay /usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X Xaw3d a52 aac acl acpi aim alsa amd64 apm audiofile bash-completion berkdb bzip2 cairo cddb cdinstall cdparanoia cdr clamav cli consolekit cracklib crypt css cups curl curlwrappers cxx dbus directfb dri dvd dvdr dvdread encode ffmpeg fftw firefox flac fortran ftp gdbm geoip gif gimp glut gpm graphite gstreamer gtk hddtemp iconv icq ieee1394 imagemagick imap imlib ipv6 jack java java6 javascript jikes joystick jpeg kde kde4 lame latex ldap libsamplerate libwww lm_sensors mad matroska mmx modules motif mp3 mpeg mplayer mudflap multilib ncurses nls nptl nptlonly nsplugin offensive ogg openal opengl openmp openssl oscar pam pcre pdf perl png policykit posix pppd python qt3support qt4 quicktime raw readline rss scanner session sndfile sockets speex spell sse sse2 ssl suid svg symlink sysfs syslog tcl tcpd tetex theora threads tidy tiff tk translucency truetype udev unicode usb videos vorbis wmf wxwindows x264 xcomposite xetex xine xml xorg xpm xscreensaver xulrunner xv xvid zlib" ALSA_CARDS="hda-intel usb-audio" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf 
superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en en_US.utf8 de de_DE.utf8" PHP_TARGETS="php5-3" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
| (sorry for the wall-o).
| Code: | #mount /dev/shm /var/tmp/portage
mount: /dev/shm is not a block device |
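A note on that error, offered as a sketch rather than a diagnosis: when `mount` is given two plain paths with neither `-t` nor `--bind`, it most likely treats the first as a block device, which /dev/shm is not. The snippet below checks text in /proc/mounts format for a tmpfs at a given mount point and prints a command that would put a fresh tmpfs on the build directory. The helper name `has_tmpfs`, the sample lines, and the `size=2G` figure are assumptions for illustration.

```shell
#!/bin/sh
# has_tmpfs MOUNTPOINT: succeed if stdin (in /proc/mounts format) lists a
# tmpfs mounted at MOUNTPOINT. Hypothetical helper for illustration.
has_tmpfs() {
    awk -v mp="$1" '$2 == mp && $3 == "tmpfs" { found = 1 } END { exit !found }'
}

# Sample lines in /proc/mounts format (illustrative, not from the thread).
sample='/dev/md3 / ext3 rw 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0'

if printf '%s\n' "$sample" | has_tmpfs /dev/shm; then
    echo "tmpfs is available; a fresh one could be mounted with:"
    echo "  mount -t tmpfs -o size=2G tmpfs /var/tmp/portage"
fi

# On a real system the check would read the live table instead:
#   has_tmpfs /dev/shm < /proc/mounts
```

A bind mount (`mount --bind /dev/shm /var/tmp/portage`) would be the other common way to get the same effect, provided /dev/shm is itself a mounted tmpfs.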
I emptied out /var/tmp/portage. I haven't re-emerged anything yet (I'm waiting for the backup copy operation to finish using up all my rw cycles).
Thanks again,
EE

NeddySeagoon Administrator
Posted: Wed Apr 11, 2012 10:34 pm
ExecutorElassus,
I'll post some more tomorrow night - about 19 hours from now. Meanwhile it's late in Scotland. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Wed Apr 11, 2012 10:36 pm
I was gonna say. I'm at UTC+2, so it's bedtime for me as well.
Thanks a ton for the help. You're my new imaginary internet boyfriend or whatever.

NeddySeagoon Administrator
Posted: Thu Apr 12, 2012 6:57 pm
ExecutorElassus,
I admit to being male ... I'm also rumored to be the oldest Gentoo dev.
Your log should have read ...
| Code: | * Package: x11-base/xorg-server-1.12.0-r1
* Repository: gentoo
* Maintainer: x11@gentoo.org
* USE: amd64 elibc_glibc ipv6 kernel_linux nptl udev userland_GNU xorg
* FEATURES: preserve-libs sandbox userpriv
>>> Unpacking source...
>>> Unpacking xorg-server-1.12.0.tar.bz2 to /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work
>>> Source unpacked in /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work
>>> Preparing source in /var/tmp/portage/x11-base/xorg-server-1.12.0-r1/work/xorg-server-1.12.0 ...
* Applying xorg-server-1.12-disable-acpi.patch .. | .
It should have gone on with the unpacking source ...
then it's supposed to prepare and build before it gets to install. Let's assume your portage tree is broken as that's the easiest to fix. will fix that.
I think that's a long shot as everything is protected by several digests. If something was wrong with your tree, you would get digest errors before the ebuild was even used.
Do you get the same error with every package? | Code: | install: invalid option -- 'm'
Try `install --help' for more information.
* ERROR: <package> failed (unpack phase): |
If so, it's time to introduce you to the tinderbox. Either your portage or python is in a mess.
The tinderbox contains individual binary packages that can be used one at a time for fixing installs that cannot be fixed any other way.
Think of them as stage3 tarballs that contain a single package. That link is to the default/linux/amd64 branch, which from your emerge --info, is right for you.
There are two ways to use these packages. The portage friendly way is to put them into /usr/portage/packages/<category>/<package>/tinderbox_file so that you can do
| Code: | | emerge -K =<category>/<package>-<ver> | This installs properly without leaving any shrapnel around your box but it does need a working portage.
If portage is broken this won't work. The emergency fallback is to fetch the file from the tinderbox and put it into / (your root). Now untar it there, | Code: | | tar xpf tarball_name | The p there is important.
Ignore the warning about extra garbage at end of file ignored.
If tar is broken use but you will need the tarball of tar from the tinderbox too.
Whichever method you use, as soon as portage is working, rebuild the packages you fetched from the tinderbox so they are built with your USE flags and your CFLAGS.
sys-apps/portage-2.1.10.51 is a good place to start as that's your installed portage version. If your exact version is not there, choose the nearest version.
| Code: | | mount: /dev/shm is not a block device | was unexpected. Maybe you don't have Shared Memory Filesystem in your kernel. Anyway, that's a side issue for now. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ExecutorElassus l33t
Posted: Thu Apr 12, 2012 7:42 pm
| Code: | # emerge --sync
>>> Starting rsync with rsync://212.24.172.37/gentoo-portage...
>>> Checking server timestamp ...
Unknown option: recursive
Unknown option: links
Unknown option: safe-links
Unknown option: perms
Unknown option: times
Unknown option: compress
Unknown option: force
Unknown option: whole-file
Unknown option: delete
Unknown option: stats
Unknown option: human-readable
Unknown option: timeout
Unknown option: exclude
Unknown option: exclude
Unknown option: exclude
Unknown option: verbose
Type shasum -h for help
* Rsync has reported that there is a syntax error. Please ensure
* that your SYNC statement is proper.
* SYNC=rsync://rsync.europe.gentoo.org/gentoo-portage
| Hrm. Whoops. Would it be valid to attempt to download and unpack a portage tree snapshot first? Or should I assume python is broken, and go with tinderbox?
I'm going to assume /usr is probably hosed at this point, but I could be wrong. I can use ssh and nfs, thank heavens, so I can copy things as needed from my working laptop.
Thanks again for the help.
EE

ExecutorElassus l33t
Posted: Thu Apr 12, 2012 8:22 pm
also, 'mdadm --detail /dev/md127' now shows this at the bottom:
| Code: | # mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Wed Apr 11 02:10:50 2012
Raid Level : raid5
Array Size : 1931840512 (1842.35 GiB 1978.20 GB)
Used Dev Size : 965920256 (921.17 GiB 989.10 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Thu Apr 12 22:17:25 2012
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : domo-kun:carrier (local to host domo-kun)
UUID : d42e5336:b75b0144:a502f2a0:178afc11
Events : 19457
Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
1 0 0 1 removed
2 8 36 2 active sync /dev/sdc4
3 8 20 - faulty spare /dev/sdb4
| But smartctl reports no errors for the drive. But then this:
| Code: | # mdadm --manage /dev/md127 --remove /dev/sdb4
mdadm: hot removed /dev/sdb4 from /dev/md127
domo-kun ~ # mdadm --manage /dev/md127 --add /dev/sdb4
mdadm: /dev/sdb4 reports being an active member for /dev/md127, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdb4 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdb4" first.
domo-kun ~ # mdadm --zero-superblock /dev/sdb4
mdadm: Unrecognised md component device - /dev/sdb4
|
Any idea why /dev/sdb4 keeps getting set faulty and dropped out of the array? PS, here's the result of smartctl:
| Code: | # smartctl --all /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.3.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.12
Device Model: ST31000528AS
Serial Number: 5VP6R8T8
LU WWN Device Id: 5 000c50 030a9c7e9
Firmware Version: CC44
User Capacity: 1.000.204.886.016 bytes [1,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Apr 12 22:23:06 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 40) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 196) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 155751471
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 14
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 91093547
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 8105
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 14
183 Runtime_Bad_Block 0x0000 001 001 000 Old_age Offline - 1683
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 001 001 000 Old_age Always - 1632
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 065 055 045 Old_age Always - 35 (Min/Max 33/38)
194 Temperature_Celsius 0x0022 035 045 000 Old_age Always - 35 (0 13 0 0 0)
195 Hardware_ECC_Recovered 0x001a 025 023 000 Old_age Always - 155751471
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1627
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 59047210393549
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2949500869
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 2131520475
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 80% 8079 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
|
enjoy?
Thanks,
EE
PPS- it seems sdb3 has also been failed out of its array. Faulty drive, perhaps? Or should I check the cables? |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
|
Posted: Thu Apr 12, 2012 8:49 pm Post subject: |
|
|
ExecutorElassus,
I guess you have a bad sector on /dev/sdb4 that causes controller resets and eventually, mdadm gives up on it.
dmesg may show something useful. Anything like | Code: | [415840.462727] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[415840.462734] ata1.00: irq_stat 0x40000008
[415840.462746] ata1.00: cmd 60/d8:08:00:91:0c/03:00:be:00:00/40 tag 1 ncq 503808 in
[415840.462748] res 41/40:00:f0:92:0c/00:00:be:00:00/40 Emask 0x409 (media error) <F>
[415840.472641] ata1.00: configured for UDMA/133
[415840.472688] ata1: EH complete | is a very bad sign. The drive that was doing that didn't show any SMART errors either.
| Code: | [417885.092354] sd 0:0:0:0: [sda] Unhandled sense code
[417885.092358] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[417885.092363] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
[417885.092369] Descriptor sense data with sense descriptors (in hex):
[417885.092373] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[417885.092384] bf 46 92 30
[417885.092390] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
[417885.092394] sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 bf 46 92 30 00 00 f8 00
[417885.092406] end_request: I/O error, dev sda, sector 3209073200
[417885.092412] md/raid:md2: read error NOT corrected!! (sector 3193072176 on sda3).
[417885.092418] md/raid:md2: Disk failure on sda3, disabling device. | That was game over.
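To pick lines like the ones above out of a long dmesg, a small filter along these lines may help. It is only a sketch: `scan_kernel_log` is a made-up helper name, and the pattern list is taken from the two excerpts above, so it is certainly not a complete catalogue of kernel error messages.

```shell
#!/bin/sh
# Filter a kernel log (read from stdin) down to the kinds of lines quoted above.
# Hypothetical helper; the patterns are assumptions drawn only from those
# excerpts, not an exhaustive list of libata or md messages.
scan_kernel_log() {
    grep -E 'media error|I/O error, dev|read error NOT corrected|Disk failure on'
}
```

Typical use would be `dmesg | scan_kernel_log`. No output does not prove the drive is healthy; it only means none of these particular messages were logged.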
emerge --sync does little more than call rsync. Your emerge --sync output suggests that make.conf, your profile, or rsync itself is broken.
Untarring a portage snapshot is worth a try ... but I tend to agree that user is damaged. The snapshot would fix the profile.
Does tar work? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
ExecutorElassus l33t
Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Thu Apr 12, 2012 9:02 pm Post subject: |
|
|
This is just the last part of dmesg:
| Code: | ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x680900 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/08:00:c8:6c:70/00:00:74:00:00/40 tag 0 ncq 4096 in
res 40/00:00:c8:6c:70/00:00:74:00:00/40 Emask 0x50 (ATA bus error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x680900 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/08:00:c8:6c:70/00:00:74:00:00/40 tag 0 ncq 4096 in
res 40/00:00:c8:6c:70/00:00:74:00:00/40 Emask 0x50 (ATA bus error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x680900 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/08:00:50:d6:4a/00:00:01:00:00/40 tag 0 ncq 4096 in
res 40/00:00:50:d6:4a/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
ata2.00: revalidation failed (errno=-5)
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x680900 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/08:00:50:d6:4a/00:00:01:00:00/40 tag 0 ncq 4096 in
res 40/00:00:50:d6:4a/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
mdadm: sending ioctl 800c0910 to a partition!
mdadm: sending ioctl 800c0910 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x680900 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk }
ata2.00: failed command: SMART
ata2.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
res 50/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: softreset failed (device not ready)
ata2: applying PMP SRST workaround and retrying
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: SB600 AHCI: limiting to 255 sectors per cmd
ata2.00: configured for UDMA/33
ata2: EH complete
scsi_verify_blk_ioctl: 16 callbacks suppressed
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
| So, should I try jiggling the cables, or just suck it up and take the drive in for repair?
Cheers,
EE |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
|
Posted: Thu Apr 12, 2012 9:26 pm Post subject: |
|
|
ExecutorElassus,
The first step is to check the drive warranty status on the vendor's website.
You fill in the part number and serial number, which smartctl provides, and the website will tell you whether you are covered by warranty.
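For the warranty form, the two values can be pulled out of the smartctl information section rather than copied by hand. A rough sketch follows; `drive_identity` is a hypothetical helper, and it assumes the "Device Model:" and "Serial Number:" labels look the way they do in the smartctl output posted earlier.

```shell
#!/bin/sh
# Read `smartctl -i` style output on stdin and print model and serial number.
# Hypothetical helper; the field labels are assumed to match the smartctl
# output shown earlier in this thread.
drive_identity() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        END { printf "model=%s serial=%s\n", model, serial }
    '
}
```

Usage would be something like `smartctl -i /dev/sdb | drive_identity`. The serial number is also printed on the drive label, which helps when deciding which physical drive to pull.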
| Code: | ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData HostInt 10B8B BadCRC Handshk } | together with the lack of nasty messages from smartctl, suggests it may be an interface error.
That means the cable at either end, some of the electronics on the drive, or some of the electronics on the motherboard.
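The interface-versus-media distinction can be roughly tallied from dmesg. The sketch below counts log lines, not incidents (one bus error produces several lines), and the strings it looks for are only the ones quoted in this thread. `classify_ata_errors` is a made-up name.

```shell
#!/bin/sh
# Count kernel log lines that point at the link (cable, port, electronics)
# versus lines that point at the platters. Hypothetical helper; the patterns
# are assumptions based on the dmesg excerpts in this thread.
classify_ata_errors() {
    awk '
        /ATA bus error|interface fatal error|BadCRC/ { link++ }
        /media error/                                  { media++ }
        END { printf "link=%d media=%d\n", link, media }
    '
}
```

Run as `dmesg | classify_ata_errors`. Many link lines and no media lines fits an interface fault; the reverse fits a failing disk surface. Treat it as a hint, not a diagnosis.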
It's worth trying a replacement cable. Don't move the cables around; that will just cause another drive to drop out if the cable is faulty.
If a replacement cable doesn't help, try another SATA connector on the motherboard with the new cable.
After that, it's the drive vendor's test software. Be careful with that - some tests will destroy your data.
You don't take a drive in for repair. Dead drives can only be repaired by the vendor. You either get a warranty replacement or you buy a new drive.
The | Code: | | ata2.00: configured for UDMA/33 | is also a bad sign. The system is running the interface very slowly (it should be UDMA/133) in an attempt to get valid data across it.
This too may indicate an interface issue. _________________ Regards,
NeddySeagoon
ExecutorElassus l33t
Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Thu Apr 12, 2012 9:38 pm Post subject: |
|
|
Hi Neddy,
all right, I'll swap some cables out, and let you know what happens. Adventure!
The drive has another two years of warranty on it, so I'm covered if it's faulty.
If I plug the drive into a different SATA socket, with a different cable, will the OS give it a different identifier? Or will sda and sdc retain those labels?
Thanks,
EE
PS- holy crap, I haven't bought a drive in a year, and everything looks like it costs double or more. Were the floods in Thailand really that bad? |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
|
Posted: Thu Apr 12, 2012 10:10 pm Post subject: |
|
|
ExecutorElassus,
It's warranty - you only pay return postage.
The OS may rearrange all your drives if you move one around. mdadm and the raid set won't mind; the raid superblock on each drive tells what goes where in the raid set, so it will be good.
If you use UUIDs or filesystem labels in /etc/fstab, that much will just work.
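To see which fstab entries would be exposed to a drive reshuffle, something like this can list the ones that still use raw /dev/sdX names. It is a sketch with a made-up function name, and it only looks at the first field of each line.

```shell
#!/bin/sh
# Print device and mount point for fstab entries whose first field is a raw
# /dev/sdX or /dev/sdXN name. Entries using UUID=, LABEL= or /dev/mdN are
# skipped, as are comment lines. Hypothetical helper for illustration.
fstab_sd_entries() {
    awk '$1 ~ /^\/dev\/sd[a-z]+[0-9]*$/ { print $1, $2 }' "$1"
}
```

Usage: `fstab_sd_entries /etc/fstab`. Anything it prints is a candidate for switching to a UUID or label, which `blkid` can report for each partition.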
Hmm, you mentioned an mdadm.conf. If that is actually used, references to sdX may change.
Grub may get confused, as BIOS discovery order may change. Any partitions you have identified as /dev/sd ... may change drives.
If root is not on raid, your root=/dev/sd... may change drive too. _________________ Regards,
NeddySeagoon
ExecutorElassus l33t
Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Thu Apr 12, 2012 10:22 pm Post subject: |
|
|
Hi Neddy,
well, I shouldn't even have to pay postage: I bought it locally. Now that I know I have the S/N, I can be sure I'm yanking the correct drive (I may not have last time).
/etc/mdadm.conf is completely commented out, so I'm guessing it's not being used.
So, step one is to swap the cable and the slot on the offending drive. If it still has issues, then take it in for replacement tomorrow.
Thanks again for the help. I'll report back tomorrow (unless there's something else I should know about beforehand?)
Cheers,
EE |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
|
Posted: Thu Apr 12, 2012 10:32 pm Post subject: |
|
|
ExecutorElassus,
Only change one thing at a time.
First the cable, as it's easy and less trouble. If it's still faulty, swap the SATA port on the motherboard for an unused one.
That may give you boot issues.
If you change two (or more) things at a time, you will never know what fixed it. _________________ Regards,
NeddySeagoon
ExecutorElassus l33t
Joined: 11 Mar 2004 Posts: 697 Location: Stuttgart, Germany
|
Posted: Thu Apr 12, 2012 10:50 pm Post subject: |
|
|
sigh. oops. Too late!
rebooted once. sdb4 and sdc4 (the latter of which still incorrectly identified itself as a member of a fully active array, so that's the wonky one) both got dumped as spares into md126. I deleted that array, added sdb4 back into md127 (as well as deleting md125, which contained an orphaned sdc3 from md3), rebooted, and now only sdc4 was not a member of any array. I added it back to md127, and now it's re-syncing. I have … 500 minutes to go, so just under nine hours (is that a long time to re-sync a raid5 array of about 1.8TB?).
I'll see what's up tomorrow morning, when it's done with the re-sync, and report back. Hopefully it's just the cable, but I wouldn't sweat replacing the drive too much, either.
Stay tuned.
Cheers,
EE
UPDATE: So, recovery finished, and now I'm back at 'install -m' being unable to create directories (not just for xorg-server), and rsync being unable to sync (because it also doesn't understand options). So, uh … tinderbox?
UPDATE 2: It's been sitting stable now for a good six hours, with nothing in dmesg about failed writes or I/O errors, so I'm cautiously optimistic. However, every operation I attempt - emerge --sync, emerge [any package] - spits back errors about unknown options. Any guesses what would cause that? Or, more usefully, what are the best steps to repair that? Is there a tarball I can unpack to cover over my portage toolchain, so I can at least start re-emerging programs? Also, I should mention that, as I have /usr on a separate partition, I cannot use >udev-182 until I can figure out how to premount /usr and /var. Since a lot of programs now seem to depend on newer udev, I can't run 'emerge -uD world' automatically. I'm following the forum thread about it, so hopefully that'll be sorted soon.
Let's talk more about getting emerge to work. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 29972 Location: 56N 3W
|
Posted: Fri Apr 13, 2012 6:39 pm Post subject: |
|
|
ExecutorElassus,
Resync speeds are not deterministic. You are supposed to be able to use the raid while it resyncs; the more you use it, the slower the sync goes.
That the resync completed properly is a good sign. It read two drives and either verified or rewrote the data on the third drive.
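As a rough sanity check on the 500 minute estimate, the average rate can be worked out from the member size and the time. The member size used below, 965920692 KiB, is an assumption: it is the block count /proc/mdstat showed for md127 at the start of the thread divided by the number of members listed. `resync_rate_kib_s` is a made-up helper.

```shell
#!/bin/sh
# Average resync rate in KiB/s, given the size of one raid member in KiB and
# the estimated duration in minutes. Integer arithmetic only, so the result
# is truncated. Hypothetical helper for illustration.
resync_rate_kib_s() {
    member_kib=$1
    minutes=$2
    echo $(( member_kib / (minutes * 60) ))
}
```

`resync_rate_kib_s 965920692 500` gives 32197, so a little over 31 MiB/s on average. While a resync is running, /proc/mdstat reports its own finish= and speed= figures, which are the better source.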
Do you know if the writing happened on the suspect drive, or was that drive read?
That's key. If the suspect drive was written, any write failures will have caused sector remapping.
So | Code: | | 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 | would have changed.
| Code: | | 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 | shows the number of sectors the drive is thinking about remapping.
If the suspect drive was read, then you know there were no read errors, in which case I would conclude your hardware problem is fixed but you don't know if it was the cable or the SATA port.
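To compare those counters before and after, the raw values can be extracted from the attribute table. This is a sketch with a made-up function name. Attribute 199 (UDMA_CRC_Error_Count) is included on my own assumption that the interface CRC counter is relevant to the cable question; it was not one of the two attributes quoted above.

```shell
#!/bin/sh
# Read a smartctl attribute table on stdin and print NAME=RAW_VALUE for
# attributes 5, 197 and 199. The $3 check (flag column starts with 0x) keeps
# other numbered lines, such as the selective self-test spans, out of the
# result. Hypothetical helper; 199 is an addition not taken from the post.
smart_raw_values() {
    awk '($1 == 5 || $1 == 197 || $1 == 199) && $3 ~ /^0x/ { print $2 "=" $10 }'
}
```

Run `smartctl -A /dev/sdb | smart_raw_values` now, save the output, and run it again after the next heavy write or resync. Values that climb are worth more attention than the absolute numbers.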
Throw the cable out anyway. It's not worth experimenting with further.
If you want to do a read test you can dd the entire content of the drive to /dev/null and keep an eye on dmesg for errors.
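The read test could look something like this. It is a minimal sketch: `read_test` is a made-up wrapper, the block size is an arbitrary choice, and the device name in the usage line is only an example.

```shell
#!/bin/sh
# Read a whole device (or file) and throw the data away. Nothing is written
# to the source. dd exits non-zero if a read fails. Hypothetical wrapper;
# bs=1048576 (1 MiB) is an arbitrary choice.
read_test() {
    dev=$1
    dd if="$dev" of=/dev/null bs=1048576
}
```

As root, `read_test /dev/sdb` reads the entire drive, which takes a few hours on a 1 TB disk. Watch `dmesg | tail` in another terminal while it runs; link resets or I/O errors appearing there matter even if dd itself finishes.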
I have the same issue with udev and I've run into another nasty. openrc-0.9.9.3 and a 1.7.x udev don't play nicely together. My root works as it is kernel autoassembled, but lvm isn't started until udev has tried to mount everything, so it's just me and the root filesystem. I have another system with root in lvm on top of raid5. The raid is assembled by an initrd, which also starts the lvm, since root is in there. Still udev doesn't play nicely with openrc-0.9.9.3. Nothing gets mounted.
emerge needs lots of things to work. portage, python, rsync to fetch files, assorted decompression utilities to unpack them, the gcc toolchain ...
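A quick way to narrow down which of those pieces is damaged is to ask each one for its version and see which fail. This is only a sketch: `broken_tools` is a made-up name, and it assumes every tool you pass it accepts `--version`, which is not true of every program.

```shell
#!/bin/sh
# Print the name of every given tool that is missing or exits non-zero when
# run with --version. Hypothetical helper; that each tool understands
# --version is an assumption. stdin is redirected so nothing waits for input.
broken_tools() {
    for tool in "$@"; do
        "$tool" --version </dev/null >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example `broken_tools emerge python rsync tar gcc`. A tool that passes can still be broken in other ways, so an empty result only rules out the most obvious damage.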
Try replacing your portage with one from the tinderbox and see what happens. After you have used the tinderbox for 5 or 6 packages, it's time to give up and reinstall. _________________ Regards,
NeddySeagoon