Gentoo Forums
Random kernel panics
Heissi
n00b
Joined: 22 Feb 2008
Posts: 5

Posted: Fri Feb 22, 2008 11:31 am    Post subject: Random kernel panics

I have recently been having problems with random kernel panics on my server/router.

Some background information:
It is an EPIA PD with a VIA C3 CPU.
I have only installed a few daemons on it; there is nothing special.

At first, the server just froze and I noticed that the CPU fan was no longer spinning. I replaced the fan and the hardware seemed to be OK, but then these kernel panics started occurring at random times (10-200 minutes after boot).

The BIOS battery was also low on voltage (BIOS settings and system clock had been reset), so I replaced it, but the kernel panics were still there.
Then I tested the CPU with cpuburn and a few emerge runs, but no luck: the panic happened after the emerge process had finished, when there was no CPU load.
The RAM seems to be OK too (memtest86+).
Finally I replaced the hard disk (I cloned the system), but that didn't resolve anything either.

I don't think the kernel is broken, because the system ran for 90 days without any problems.

What should I do now?

I'm really inexperienced with kernel panics.
Is there a way to trace back the source of the problem (maybe the mainboard)?

I don't know which information about the system is relevant, so if you need anything, just ask.

Thanks.
pathfinder
l33t
Joined: 19 Jan 2006
Posts: 731
Location: Barcelona, Spain

Posted: Fri Feb 22, 2008 1:15 pm

Try recompiling the kernel from your config file.

Was your config file changed lately?
Back it up, then cd /usr/src/linux and run make menuconfig.

You can't boot the computer, can you?

Maybe it is due to the clock: I think I remember a warning in the handbook, in the part where you compile your kernel for the first time, saying that you should make sure the date is correct before proceeding. Maybe the wrong date made a huge mess.
Now that the battery has been replaced, I would definitely set the correct date and then recompile your kernel as it is now, just to see what happens.
Heissi
n00b
Joined: 22 Feb 2008
Posts: 5

Posted: Fri Feb 22, 2008 2:42 pm

I upgraded from hardened-sources-2.6.23 to hardened-sources-2.6.23-r7.

While I was testing something I got this message:
Code:
invalid opcode: 0000 [#1]
Modules linked in: thermal button processor
CPU:    0
EIP:    0060:[<c04e0739>]    Not tainted VLI
EFLAGS: 00010002   (2.6.23-hardened-r7 #1)
EIP is at elv_rb_add+0x1/0x51
eax: ddf9bd64   ebx: ddf9bd4c   ecx: ddf9bd4c   edx: d231be84
esi: ddf8f9c0   edi: d231be84   ebp: 00000000   esp: c6887b3c
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process mconf (pid: 29677, ti=c6886000 task=c4bf8ab0 task.ti=c6886000)
Stack: ddf8f9c0 c04e8772 d231be84 ddf9bd4c ddf8f9c0 c04e9808 d231be84 ddf92ad0
       00000008 c04e0b22 ddf92b30 0005ffbe 00000086 d231be84 ddf92ad0 00000008
       00000000 c04e3c8e 00000000 00000000 d231be84 c14512a0 c14512a0 00000008
Call Trace:
 [<c04e8772>] cfq_add_rq_rb+0x3c/0x74
 [<c04e9808>] cfq_insert_request+0x1c/0x3a
 [<c04e0b22>] elv_insert+0xa4/0x141
 [<c04e3c8e>] __make_request+0x28c/0x2b6
 [<c04e3eb0>] generic_make_request+0x17e/0x1ab
 [<c046af47>] bio_add_page+0x31/0x37
 [<c046dad8>] mpage_end_io_read+0x0/0x5e
 [<c04e3f82>] submit_bio+0xa5/0xac
 [<c046dad8>] mpage_end_io_read+0x0/0x5e
 [<c046dbaf>] mpage_bio_submit+0x19/0x1d
 [<c046e104>] mpage_readpages+0x10f/0x11c
 [<c04852d4>] ext3_get_block+0x0/0xbe
 [<c05f82fd>] io_schedule+0xe/0x16
 [<c05f8421>] __wait_on_bit+0x4a/0x51
 [<c05f8496>] out_of_line_wait_on_bit+0x6e/0x76
 [<c0467d3f>] sync_buffer+0x0/0x2e
 [<c0439ffd>] buffered_rmqueue+0xbf/0xd7
 [<c043bc31>] read_pages+0x28/0xd3
 [<c04852d4>] ext3_get_block+0x0/0xbe
 [<c043a1b6>] __alloc_pages+0x51/0x2a4
 [<c043bde5>] __do_page_cache_readahead+0x109/0x123
 [<c043bef4>] ra_submit+0x20/0x25
 [<c043c054>] page_cache_sync_readahead+0x2a/0x2f
 [<c043703d>] do_generic_mapping_read+0xda/0x3ff
 [<c04375c7>] generic_file_aio_read+0x11f/0x14a
 [<c0437362>] file_read_actor+0x0/0xda
 [<c044ea5b>] do_sync_read+0xbe/0xfb
 [<c0423a42>] autoremove_wake_function+0x0/0x33
 [<c04106f4>] do_page_fault+0x2a7/0x5c7
 [<c044e282>] nameidata_to_filp+0x23/0x32
 [<c044eb21>] vfs_read+0x89/0x104
 [<c044edde>] sys_read+0x41/0x67
 [<c0403c9d>] sysenter_past_esp+0x66/0x99
 [<c0403cb6>] sysenter_past_esp+0x7f/0x99
 =======================
Code: 48 04 c7 42 3c 00 00 00 00 c7 43 04 00 00 00 00 eb 0e 8b 42 24 03 42 1c 39 f0 75 04 89 d0 eb 06 89 f8 eb a9 31 c0 5b 5e 5f c3 56 <89> c1 89 c6 53 31 db 83 38 00 74 22 8b 19 8d 4b bc 8b 41 1c 39
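
When reading a trace like the one above, it can help to pull out the faulting symbol and the chain of callers before guessing at a cause. The following is only a minimal sketch in Python, assuming the 2.6-era i386 oops layout shown here; `summarize_oops` and the shortened `sample` text are illustrative names made up for this example, not part of any kernel tooling.

```python
import re

# Matches call-trace entries such as " [<c04e8772>] cfq_add_rq_rb+0x3c/0x74".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
# Matches the faulting location, e.g. "EIP is at elv_rb_add+0x1/0x51".
EIP_RE = re.compile(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def summarize_oops(text):
    """Return (faulting_symbol, offset, [call-trace symbols in order])."""
    fault = EIP_RE.search(text)
    symbol = fault.group(1) if fault else None
    offset = int(fault.group(2), 16) if fault else None
    frames = [m.group(2) for m in FRAME_RE.finditer(text)]
    return symbol, offset, frames


# Shortened excerpt of the trace posted above (assumption: same line layout).
sample = """\
EIP is at elv_rb_add+0x1/0x51
Call Trace:
 [<c04e8772>] cfq_add_rq_rb+0x3c/0x74
 [<c04e9808>] cfq_insert_request+0x1c/0x3a
 [<c04e0b22>] elv_insert+0xa4/0x141
"""

symbol, offset, frames = summarize_oops(sample)
print(symbol, hex(offset), frames)
```

For this excerpt the sketch reports `elv_rb_add` at offset `0x1`, reached through the CFQ request-insertion functions, which matches what the trace says when read by hand.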


Then I had to reboot because the system was screwed up (about 10 defunct processes).
Looks like a memory or CPU error, doesn't it?
But I trust memtest86+, and the CPU heatsink wasn't really hot (why isn't there a sensor on the CPU?), so I removed the thermal paste and applied fresh paste, just to be sure.

Unfortunately, recompiling the kernel didn't solve the problem.
pathfinder
l33t
Joined: 19 Jan 2006
Posts: 731
Location: Barcelona, Spain

Posted: Fri Feb 22, 2008 2:50 pm

Well, have you tried another distro? Or Windows?
Just to find out whether it is a hardware problem or software related.

Does cat /proc/cpuinfo give you anything?

Try looking at cat /proc/whatever just to get some extra info.
dmesg might also say something, and so might /var/log/messages.
I can't really tell you anything else right now.
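
To make the log check a bit more systematic, something like the following could count the lines that look like kernel trouble. It is a rough sketch in Python under stated assumptions: the marker patterns are just a guess at useful strings, `count_markers` and `scan_file` are names made up for this example, the log path is the one mentioned above and may differ on other systems, and the sample lines use an invented hostname and timestamps.

```python
import re
from collections import Counter

# Marker patterns are assumptions chosen for illustration; extend as needed.
MARKERS = {
    "oops": re.compile(r"\bOops\b"),
    "panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "invalid opcode": re.compile(r"invalid opcode"),
    "bug": re.compile(r"\bBUG\b"),
}


def count_markers(lines):
    """Count how many lines contain each marker."""
    counts = Counter()
    for line in lines:
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path="/var/log/messages"):
    """Scan a log file line by line; the default path may differ per system."""
    with open(path, errors="replace") as handle:
        return count_markers(handle)


# Hypothetical sample lines (hostname and timestamps are invented).
sample = [
    "Feb 22 14:30:01 router kernel: invalid opcode: 0000 [#1]",
    "Feb 22 14:30:01 router kernel: EIP is at elv_rb_add+0x1/0x51",
    "Feb 22 14:31:10 router kernel: Kernel panic - not syncing: Fatal exception in interrupt",
]
print(count_markers(sample))
```

Calling `scan_file()` on the real log usually needs root. If the counts are spread over many unrelated functions rather than one driver, that would point more towards hardware than towards a single software bug.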
Heissi
n00b
Joined: 22 Feb 2008
Posts: 5

Posted: Sat Feb 23, 2008 3:21 pm

pathfinder wrote:
Well, have you tried another distro? Or Windows?
Just to find out whether it is a hardware problem or software related.

Does cat /proc/cpuinfo give you anything?

Try looking at cat /proc/whatever just to get some extra info.
dmesg might also say something, and so might /var/log/messages.
I can't really tell you anything else right now.


I tried to install Windows (I had installed it on this machine before, so it should work) and I got a bluescreen: some interrupt error (IRQL_NOT_LESS_...).
The kernel panic message was similar to this (interrupt exception).

So I can't do anything but buy a new Mini-ITX mainboard, right?
pathfinder
l33t
Joined: 19 Jan 2006
Posts: 731
Location: Barcelona, Spain

Posted: Sun Feb 24, 2008 2:16 pm

Well, that looks like a serious hardware failure... :S
I can't really tell you which part.
Is your mobo still under warranty? That could be useful here...
gundelgauk
n00b
Joined: 01 Oct 2007
Posts: 40

Posted: Sun Feb 24, 2008 5:26 pm

Yes, sounds like faulty hardware. Since you already ruled out RAM and hard drive, it could be the CPU or mainboard. You said yourself that the first time your system froze was when the CPU fan died. Maybe the processor took some damage when that happened.

Apart from that: memtest showing no errors cannot guarantee that your RAM is 100% OK. If it does show errors, your RAM is faulty. But it doesn't work the other way round. It might be that your RAM only produces errors when a very specific pattern is written to (or read from) a very specific address. If memtest does not test exactly this pattern, no error will show up, but you still have faulty RAM.
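
A toy simulation shows why a clean pattern test proves less than it seems. This is only a sketch in Python and not a model of how memtest86+ actually works: `FlakyMemory`, the bad address 777, and the value 0x5B are all invented for illustration.

```python
class FlakyMemory:
    """Simulated RAM with one hypothetical pattern-sensitive fault."""

    def __init__(self, size, bad_address, bad_value):
        self.cells = [0] * size
        self.bad_address = bad_address
        self.bad_value = bad_value

    def write(self, address, value):
        # The fault only triggers for one exact value at one exact address:
        # the lowest bit is flipped on the way in.
        if address == self.bad_address and value == self.bad_value:
            value ^= 0x01
        self.cells[address] = value

    def read(self, address):
        return self.cells[address]


def pattern_test(memory, size, patterns):
    """Write each pattern to every cell, read it back, return failing (address, pattern) pairs."""
    failures = []
    for pattern in patterns:
        for address in range(size):
            memory.write(address, pattern)
        for address in range(size):
            if memory.read(address) != pattern:
                failures.append((address, pattern))
    return failures


SIZE = 1024
ram = FlakyMemory(SIZE, bad_address=777, bad_value=0x5B)

common = [0x00, 0xFF, 0xAA, 0x55]
print(pattern_test(ram, SIZE, common))           # → []
print(pattern_test(ram, SIZE, common + [0x5B]))  # → [(777, 91)]
```

The first run passes even though the simulated module is faulty, because none of the tested patterns happens to hit the bad combination. Only the second run, which includes the triggering value, exposes the fault.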
All times are GMT