ba l33t
Joined: 25 May 2003 Posts: 804
Posted: Wed Mar 30, 2005 11:48 pm    Post subject: Apache trouble
Some time ago Apache started dying on about half of all HTTPS requests with a bus error. I'm not sure what caused it (there was a system update a week before; maybe that's the reason...).
In error_log:
Code: |
[Thu Mar 31 02:26:45 2005] [notice] Apache/2.0.52 (Gentoo/Linux) mod_ssl/2.0.52 OpenSSL/0.9.7e configured -- resuming normal operations
[Thu Mar 31 02:27:09 2005] [notice] child pid 1441 exit signal Bus error (10)
[Thu Mar 31 02:27:09 2005] [error] cgid daemon process died, restarting
[Thu Mar 31 02:27:11 2005] [notice] child pid 1640 exit signal Bus error (10)
[Thu Mar 31 02:27:22 2005] [notice] child pid 1903 exit signal Bus error (10)
[Thu Mar 31 02:30:08 2005] [notice] child pid 1646 exit signal Bus error (10)
[Thu Mar 31 02:30:09 2005] [notice] child pid 1905 exit signal Bus error (10)
|
strace output:
Code: |
Process 2339 attached - interrupt to quit
fcntl64(168, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}], 2, -1) = 1
accept(3, {sa_family=AF_INET, sin_port=htons(59132), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16]) = 170
fcntl64(168, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
getsockname(170, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16]) = 0
time(NULL) = 1112224626
brk(0) = 0x5dc000
brk(0x5fe000) = 0x5fe000
fcntl64(170, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(170, F_SETFL, O_RDWR|O_NONBLOCK) = 0
time(NULL) = 1112224626
read(170, "\200g\1\3\0\0N\0\0\0\20\1\0\200\3\0\200\7\0\300\6\0@\2"..., 8000) = 105
time(NULL) = 1112224626
time(NULL) = 1112224626
getpid() = 2339
time(NULL) = 1112224626
getpid() = 2339
time([1112224626]) = 1112224626
getpid() = 2339
--- SIGBUS (Bus error) @ 0 (0) ---
chdir("/usr/lib/apache2") = 0
rt_sigaction(SIGBUS, {SIG_DFL}, {SIG_DFL}, 0x74c5bb78, 0) = 0
getpid() = 2339
getpid() = 2339
kill(2339, SIGBUS) = 0
sigreturn() = ? (mask now [QUIT ABRT KILL SYS PIPE TSTP CONT TTIN IO XCPU PROF LOST USR1 USR2])
--- SIGBUS (Bus error) @ 0 (0) ---
Process 2339 detached
|
Backtrace from gdb:
Code: |
(gdb) attach 1940
Attaching to process 1940
...
(gdb) cont
Continuing.
Program received signal SIGBUS, Bus error.
0x74c9c1e8 in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x74c9c1e8 in mallopt () from /lib/libc.so.6
#1 0x74d6192c in __after_morecore_hook () from /lib/libc.so.6
#2 0x74d6192c in __after_morecore_hook () from /lib/libc.so.6
Previous frame identical to this frame (corrupt stack?)
|
I tried emerge -e apache and downgrading gcc, glibc, apache and openssl, with no success... Any ideas?
And sorry for my English.
labrador Guru
Joined: 04 Oct 2003 Posts: 316
Posted: Thu Mar 31, 2005 5:22 pm    Post subject: compare/contrast
Compare the details in the various crashes. Is it always exiting with
the same consistent error? If so, there is something at the application
level to debug. Is it associated with certain pages/traffic on the
web server, or can it happen when there are no hits on it?
If the problem happens differently every time, it is likely a hardware issue.
Check the CPU heatsink for dust. I've seen overheating cause what seemed like a consistent
error while emerging a certain ebuild, but on closer inspection it was a slightly
different error each time, and the hardware was the point of failure (overheating).
A can of compressed air can sometimes fix hardware problems.
Another thing that can cause random weirdness is a busted file system.
I had reiserfs with the 2.6 sparc kernel, and after a power outage
everything seemed OK, but certain things could not build. I've since
learned that reiserfs with the 2.6 kernel on sparc is not stable.
Check the file system by booting a Live CD and running fsck
against your / partition. Type fsck[double-tab] to see all of the flavours
of fsck available, and use the one that matches your file system type.
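A minimal Python sketch of the "compare the crashes" step, assuming the Apache 2.0 error_log line format quoted in the first post. It tallies the `child pid ... exit signal ...` lines by signal name and number. The script name `tally_signals.py` and the function `tally_exit_signals` are hypothetical, made up for illustration, and the regex only covers the `exit signal` lines seen in this thread. If every crash reports the same signal, that points more toward something at the application level than toward flaky hardware.

```python
import re
import sys
from collections import Counter

# Assumed format (from the error_log excerpt in this thread), e.g.:
# [Thu Mar 31 02:27:09 2005] [notice] child pid 1441 exit signal Bus error (10)
CHILD_EXIT = re.compile(r"child pid (\d+) exit signal (.+?) \((\d+)\)")


def tally_exit_signals(lines):
    """Count child crashes per (signal name, signal number); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = CHILD_EXIT.search(line)
        if m:
            counts[(m.group(2), int(m.group(3)))] += 1
    return counts


def main(argv):
    # Hypothetical CLI: the error_log path must be given explicitly.
    if len(argv) < 2:
        print("usage: tally_signals.py /path/to/error_log")
        return
    with open(argv[1], errors="replace") as f:
        counts = tally_exit_signals(f)
    for (name, num), n in counts.most_common():
        print(f"{n:6d}  {name} ({num})")
    if len(counts) == 1:
        print("All child crashes report the same signal.")


if __name__ == "__main__":
    main(sys.argv)
```

Usage: `python tally_signals.py /path/to/error_log` (the log location depends on your setup). Matching signals are only part of the picture; the faulting function in the gdb backtrace (here `mallopt` in libc) is worth comparing across several crashes too.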
ba l33t
Joined: 25 May 2003 Posts: 804
Posted: Thu Mar 31, 2005 8:32 pm    Post subject: Re: compare/contrast
labrador wrote:
"Compare the details in the various crashes. Is it always exiting with the same consistent error?"
Yes.
labrador wrote:
"Is it associated with certain pages/traffic on the web server, or can it happen when there are no hits on it?"
It is associated with HTTPS requests.