Gentoo Forums
anatomy of a crash
Gentoo Forums Forum Index Kernel & Hardware
mno
Guru


Joined: 29 Dec 2003
Posts: 454
Location: Toronto, Canada

Posted: Fri Jan 25, 2008 1:24 am    Post subject: anatomy of a crash

Hello everyone,

A few days ago, my server running Gentoo suddenly disappeared. I had been working on a website on it, and when I went back to continue about an hour later, I noticed that everything was down: no SSH, no FTP, not even ping. After a remote power reboot, the server came back up just fine, so I went in to dig around and try to discover the cause.

I've spent the past two days or so digging through log files and running some checks (such as a disk check), but everything comes back without errors. After the remote reboot there was a significant number of blocks with bad addresses, but those disappeared after a second, proper reboot. I assume those were just FOL and occurred because I cut power while the system was in the middle of doing something.

In any case, I can't work out what went wrong. I have a hunch that the server itself didn't fail, but that it was a networking error, either on my end or on my colo's end, with their switch suddenly refusing to communicate. But I can't put my finger on anything. So, my question is: how does one go about dissecting what went wrong when the standard places don't show anything wrong? Just in case, here's my emerge --info.

Thanks in advance,
Max

Code:
lastochka ~ # emerge --info
Portage 2.1.3.19 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.22-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r2 x86_64 AMD Opteron(tm) Processor 244
Timestamp of tree: Fri, 25 Jan 2008 00:00:04 +0000
app-shells/bash:     3.2_p17-r1
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.10-r5
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.7.9-r1, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.23-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/mnt/backup/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans"
GENTOO_MIRRORS="http://gentoo.mirrors.tds.net/gentoo http://mirror.datapipe.net/gentoo http://gentoo.chem.wisc.edu/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/mnt/backup/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/mnt/backup/portagetmp"
PORTDIR="/mnt/backup/portage"
PORTDIR_OVERLAY="/mnt/backup/portage/local/layman/java-overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 apache2 berkdb bitmap-fonts cli cracklib crypt cups dri fortran gdbm gpm iconv ipv6 isdnlog midi mmx mudflap mysql ncurses nls nptl nptlonly openmp pam pcre perl php pppd python readline reflection session sockets spl sqlite sse sse2 ssl tcpd threads truetype-fonts type1-fonts unicode vhosts xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" APACHE2_MPMS="worker" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

_________________
"Hello and goodbye. As always." | You can't use   here?? | Unanswered
schachti
Advocate


Joined: 28 Jul 2003
Posts: 3765
Location: Gifhorn, Germany

Posted: Fri Jan 25, 2008 6:37 am    Post subject:

You can use smartmontools to check your hard disk; maybe the hard disk is the cause of the problem. You could also stress-test your CPU using cpuburn (the computer might crash during this test).
_________________
Never argue with an idiot. He brings you down to his level, then beats you with experience.

How-To: Store encrypted data on DVD.
mno
Guru


Joined: 29 Dec 2003
Posts: 454
Location: Toronto, Canada

Posted: Fri Jan 25, 2008 6:38 am    Post subject:

The hard disks have already been checked. They're connected to a 3ware RAID controller. I verified the array several times, and it was always perfectly stable.
pelkeyj
n00b


Joined: 04 Nov 2006
Posts: 17

Posted: Fri Jan 25, 2008 8:54 am    Post subject:

Not for this crash, but for future ones:
If the kernel has the netconsole module, you can have it send console messages (usually including ones during a crash after syslog or hard disks are no longer functional) to another computer via UDP packets.
Documentation is in /usr/src/linux/Documentation/networking/netconsole.txt, and changing the kernel console loglevel is done by modifying /proc/sys/kernel/printk (details in proc(5) man page).
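To make the loglevel part a bit more concrete, here is a minimal Python sketch of reading /proc/sys/kernel/printk. It assumes the file holds the four integers described in proc(5) (console_loglevel, default_message_loglevel, minimum_console_loglevel, default_console_loglevel). The names parse_printk, read_printk and would_reach_console are made up for this illustration and are not part of any tool.

```python
from typing import NamedTuple

# Path named in the post above; the four-field layout is taken from proc(5).
PRINTK_PATH = "/proc/sys/kernel/printk"


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the whitespace-separated contents of the printk file."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return PrintkLevels(*(int(field) for field in fields))


def read_printk(path: str = PRINTK_PATH) -> PrintkLevels:
    """Read and parse the current loglevel settings (Linux only)."""
    with open(path) as handle:
        return parse_printk(handle.read())


def would_reach_console(levels: PrintkLevels, message_priority: int) -> bool:
    """A message is printed to the console when its priority number is
    lower (i.e. more urgent) than console_loglevel."""
    return message_priority < levels.console_loglevel
```

Calling read_printk() on the server shows the current settings. Raising the first value (as root, by writing to the same file) should let more message levels through to the console, and so to netconsole, but check proc(5) on your own system before relying on that.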
mno
Guru


Joined: 29 Dec 2003
Posts: 454
Location: Toronto, Canada

Posted: Fri Jan 25, 2008 9:30 am    Post subject:

Interesting! What program would the other computer need to have running to receive the packets, and on what port? Though I assume it's the same (netconsole)? I guess it's all in the docs then. Thanks!
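For what it's worth, the receiving side does not need a program called netconsole; anything that listens on a UDP port can collect the messages (the kernel documentation mentions netcat or a syslog daemon). Below is a rough Python sketch of such a listener. It assumes the sender targets UDP port 6666, which is the default target port in the netconsole documentation unless you configure another one. The function names are invented for this example.

```python
import socket

# Default netconsole target port per the kernel docs; an assumption if the
# sending machine was configured with a different port.
NETCONSOLE_DEFAULT_PORT = 6666


def decode_message(payload: bytes) -> str:
    """Turn one datagram into a printable line, tolerating odd bytes."""
    return payload.decode("utf-8", errors="replace").rstrip("\r\n")


def open_listener(host: str = "0.0.0.0",
                  port: int = NETCONSOLE_DEFAULT_PORT) -> socket.socket:
    """Create a UDP socket bound to the given address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock: socket.socket, bufsize: int = 4096):
    """Block until one datagram arrives; return (sender_ip, text)."""
    payload, (sender_ip, _sender_port) = sock.recvfrom(bufsize)
    return sender_ip, decode_message(payload)


def main() -> None:
    """Print every received message, prefixed with the sender address."""
    sock = open_listener()
    try:
        while True:
            sender_ip, line = receive_one(sock)
            print(f"[{sender_ip}] {line}", flush=True)
    finally:
        sock.close()
```

Run main() on the machine that should collect the logs and redirect the output to a file. UDP is unreliable, so a message can still be lost if the network itself is what failed.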
StarDragon
Guru


Joined: 19 Jun 2005
Posts: 390
Location: tEXas

Posted: Fri Jan 25, 2008 8:22 pm    Post subject:

Did the box ever bounce? If not, then I suspect it was a switch issue. Maybe checking your logs on the switch box might help.
mno
Guru


Joined: 29 Dec 2003
Posts: 454
Location: Toronto, Canada

Posted: Fri Jan 25, 2008 10:13 pm    Post subject:

Bounce, what do you mean?
All times are GMT