mno Guru
Joined: 29 Dec 2003 Posts: 454 Location: Toronto, Canada
|
Posted: Fri Jan 25, 2008 1:24 am Post subject: anatomy of a crash |
|
|
Hello everyone,
A few days ago, my server running Gentoo suddenly disappeared. I had been working on a website on it, and when I went back to continue about an hour later, I noticed that everything was down: no SSH, no FTP, not even ping. After a remote power cycle, the server came back up just fine, so I went in to dig around and try to find the cause.
I've spent the past two days digging through log files and running checks (disk check, etc.), but everything comes back without errors. After the remote reboot there were a significant number of blocks with bad addresses, but those disappeared after a second, proper reboot. I assume those were just FOL and occurred because I cut power while the machine was in the middle of something.
In any case, I can't work out what went wrong. I have a hunch that the server itself didn't fail and that it was a networking error, either on my end or on my colo's end, with their switch suddenly refusing to communicate. But I can't put my finger on anything. So my question is: how does one go about dissecting what went wrong when the standard places don't show anything wrong? Just in case, here's my emerge --info.
Thanks in advance,
Max
Code: | lastochka ~ # emerge --info
Portage 2.1.3.19 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.22-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r2 x86_64 AMD Opteron(tm) Processor 244
Timestamp of tree: Fri, 25 Jan 2008 00:00:04 +0000
app-shells/bash: 3.2_p17-r1
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python: 2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.10-r5
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.7.9-r1, 1.10
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.24
virtual/os-headers: 2.6.23-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/mnt/backup/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans"
GENTOO_MIRRORS="http://gentoo.mirrors.tds.net/gentoo http://mirror.datapipe.net/gentoo http://gentoo.chem.wisc.edu/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/mnt/backup/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/mnt/backup/portagetmp"
PORTDIR="/mnt/backup/portage"
PORTDIR_OVERLAY="/mnt/backup/portage/local/layman/java-overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 apache2 berkdb bitmap-fonts cli cracklib crypt cups dri fortran gdbm gpm iconv ipv6 isdnlog midi mmx mudflap mysql ncurses nls nptl nptlonly openmp pam pcre perl php pppd python readline reflection session sockets spl sqlite sse sse2 ssl tcpd threads truetype-fonts type1-fonts unicode vhosts xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" APACHE2_MPMS="worker" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS |
_________________ "Hello and goodbye. As always." | You can't use here?? | Unanswered |
|
|
|
schachti Advocate
Joined: 28 Jul 2003 Posts: 3765 Location: Gifhorn, Germany
|
Posted: Fri Jan 25, 2008 6:37 am Post subject: |
|
|
You can use smartmontools to check your hard disk; the disk may be the cause of the problem. You could also stress-test your CPU with cpuburn (the computer might crash during this test). _________________ Never argue with an idiot. He brings you down to his level, then beats you with experience.
How-To: Storing data encrypted on DVD. |
|
|
|
mno Guru
Joined: 29 Dec 2003 Posts: 454 Location: Toronto, Canada
|
Posted: Fri Jan 25, 2008 6:38 am Post subject: |
|
|
The hard disks have already been checked. They're connected to a 3ware RAID controller. I verified the array several times, and it was always perfectly stable. |
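A side note on combining the two posts above: an array verify checks the array as the controller sees it, whereas smartmontools reads each disk's own SMART data. For disks behind a 3ware controller, smartctl has a `-d 3ware,N` device type that addresses the disk on port N. The sketch below only builds and runs that command; the helper names are made up for illustration, and the controller device node (for example /dev/twe0 or /dev/twa0) depends on the controller and driver, so treat it as an assumption to check on the actual box.

```python
import subprocess


def smartctl_command(device, port=None):
    """Build a smartctl argv list (hypothetical helper, not part of smartmontools).

    device: the device node to query, e.g. /dev/sda, or the controller
            node for a 3ware card (assumed here, verify on your system).
    port:   3ware port number of the disk, or None for a plain disk.
    """
    cmd = ["smartctl", "-a"]  # -a prints all SMART information
    if port is not None:
        cmd += ["-d", f"3ware,{port}"]  # address a disk behind the controller
    cmd.append(device)
    return cmd


def run_smartctl(device, port=None):
    """Run smartctl and return (exit status, output text)."""
    result = subprocess.run(
        smartctl_command(device, port), capture_output=True, text=True
    )
    return result.returncode, result.stdout
```

Calling `run_smartctl("/dev/twe0", 0)` would query the disk on port 0, assuming that node exists and smartmontools is installed.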
|
|
|
pelkeyj n00b
Joined: 04 Nov 2006 Posts: 17
|
Posted: Fri Jan 25, 2008 8:54 am Post subject: |
|
|
Not for this crash, but for future ones:
If the kernel has the netconsole module, you can have it send console messages to another computer via UDP packets. This usually includes messages printed during a crash, after syslog or the hard disks have stopped functioning.
Documentation is in /usr/src/linux/Documentation/networking/netconsole.txt, and changing the kernel console loglevel is done by modifying /proc/sys/kernel/printk (details in proc(5) man page). |
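On the receiving side, anything that listens on a UDP port and writes down what arrives should do. As a rough sketch only, here is a small Python listener. The function names are made up for illustration, and port 6666 is assumed to be netconsole's default target port; check netconsole.txt for the port your kernel is actually configured to send to.

```python
import socket


def make_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming console messages.

    Port 6666 is an assumption; use whatever port the sending
    machine's netconsole configuration names.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Read `count` datagrams and return (sender address, text) pairs."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        data, (sender, _port) = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        messages.append((sender, text))
    return messages


def run_forever(logfile="netconsole.log"):
    """Append every received message to a file until interrupted."""
    sock = make_listener()
    with open(logfile, "a") as out:
        while True:
            data, (sender, _port) = sock.recvfrom(4096)
            out.write(f"{sender} {data.decode('utf-8', errors='replace')}")
            out.flush()  # keep the file current in case this side dies too
```

Running `run_forever()` on the second machine would leave a file with whatever the crashing box managed to send. UDP is unreliable, so missing lines are possible.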
|
|
|
mno Guru
Joined: 29 Dec 2003 Posts: 454 Location: Toronto, Canada
|
Posted: Fri Jan 25, 2008 9:30 am Post subject: |
|
|
Interesting! What program would the other computer need to run to receive the packets, and on what port? I assume it's the same thing (netconsole)? I guess it's all in the docs. Thanks! |
|
|
|
StarDragon Guru
Joined: 19 Jun 2005 Posts: 390 Location: tEXas
|
Posted: Fri Jan 25, 2008 8:22 pm Post subject: |
|
|
Did the box ever bounce? If not, then I suspect it was a switch issue. Checking the logs on the switch might help. |
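One way to tell a hung machine from a network outage is to look for a hole in the machine's own logs. If the log kept receiving entries during the outage window, the box was probably alive and the network is the likelier suspect. If the entries stop and resume only with boot messages, the machine itself went down. The sketch below finds the largest gap between consecutive entries. It assumes traditional syslog lines that begin with a stamp like "Jan 22 14:03:11" (no year, so one has to be supplied), and the function names are made up for illustration.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 15-character syslog stamp, e.g. 'Jan 22 14:03:11'."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year):
    """Return (earlier, later) for the biggest silence in the log, or None.

    Lines without a parsable stamp are skipped.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(parse_syslog_time(line, year))
        except ValueError:
            continue  # not a syslog-style line
    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        if best is None or later - earlier > best[1] - best[0]:
            best = (earlier, later)
    return best
```

Feeding it the lines of the system log (often /var/log/messages, though the path depends on the logger configuration) gives the two timestamps to compare against the time the server stopped answering.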
|
|
|
mno Guru
Joined: 29 Dec 2003 Posts: 454 Location: Toronto, Canada
|
Posted: Fri Jan 25, 2008 10:13 pm Post subject: |
|
|
Bounce? What do you mean? |
|