Gentoo Forums
crc error -- SYSTEM HALTED
Author Message
McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 8:47 pm    Post subject: crc error -- SYSTEM HALTED

Sup everybody. I've been getting this error ever since I got my new computer (dually AMD MP system on a Tyan Tiger MPX board). Does ANYBODY have any idea how to fix it or what is causing it? Basically, it happens about 50% of the time right after the kernel gets loaded, and the system immediately halts. The other 50% of the time I boot into Gentoo and everything works 100% fine (great job on the audigy drivers, btw).

I've been researching this on LNO for a while, but have come up with no answers yet. At first I thought it was a hardware issue, but my drives are all fine, I checked and tested them thoroughly. I also thought it was a cable issue (switched them out to normal ATA66/100 cables), and it still didn't fix it. I've also played around with the kernel about a million times, and searched Google extensively but found nothing (except possibly hardware issues, which I don't believe is so).

So basically... have any of you guys experienced this problem? I am thinking it is either a motherboard BIOS issue (I am waiting for the new BIOS to come out, it's still in beta-stage), or a kernel driver issue with the motherboard chipset or with the Promise ATA133 controller (I tried installing it 2x, once as onboard IDE and once on the Promise ATA133 card, same thing with both installs). Please help, as it is rather annoying to only be able to boot into Linux 50% of the time (as opposed to windows' 99% [heh, notice it's not 100%...] ), and I could really use some Gentoo-specific help here from y'all Gentoo-experts :D.

One last thing, I have tried it with both vanilla and gentoo-source kernels.
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Mon Jun 17, 2002 8:50 pm

If you're getting a CRC error on boot from the kernel -- and it's sporadic -- it is almost certainly a hardware issue. Your hard drive might be going bad, you might have a bad cable, your mobo could be having issues, a lower chunk of your RAM might be flaky, etc. If the same data comes up with two different CRCs, then something is wrong.

fghellar
Bodhisattva

Joined: 10 Apr 2002
Posts: 856
Location: Porto Alegre, BR

Posted: Mon Jun 17, 2002 8:51 pm

Have you tried to test your memory? This is a good tool for it: http://www.teresaudio.com/memtest86/.
_________________
| www.gentoo.org | www.tldp.org | www.google.com |

fghellar
Bodhisattva

Joined: 10 Apr 2002
Posts: 856
Location: Porto Alegre, BR

Posted: Mon Jun 17, 2002 8:55 pm

delta407 wrote:
If you're getting a CRC error on boot from the kernel -- and it's sporadic -- it is almost certainly a hardware issue.

I agree. I also had some bad experiences when I upgraded my system: http://forums.viaarena.com/messageview.cfm?catid=18&threadid=11805
_________________
| www.gentoo.org | www.tldp.org | www.google.com |

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 9:06 pm

fghellar wrote:
Have you tried to test your memory? This is a good tool for it: http://www.teresaudio.com/memtest86/.


Thank you, I will test it tonight.
_________________
McManus
----
Linux user #267375 - http://counter.li.org

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 9:07 pm

delta407 wrote:
If you're getting a CRC error on boot from the kernel -- and it's sporadic -- it is almost certainly a hardware issue. Your hard drive might be going bad, you might have a bad cable, your mobo could be having issues, a lower chunk of your RAM might be flaky, etc. If the same data comes up with two different CRCs, then something is wrong.


Hrm, what do you mean by the same data coming up with 2 different CRCs? Is there a way to check? (it doesn't end up in any of my logs when I get the crc error)
_________________
McManus
----
Linux user #267375 - http://counter.li.org

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 9:10 pm

fghellar wrote:
delta407 wrote:
If you're getting a CRC error on boot from the kernel -- and it's sporadic -- it is almost certainly a hardware issue.

I agree. I also had some bad experiences when I upgraded my system: http://forums.viaarena.com/messageview.cfm?catid=18&threadid=11805


Hrm, I am thinking it's a memory issue. However, I should note that windows never has any problems. Well, then again... it pauses every once in a while for the ECC, but it almost always recovers from it (even in the middle of a game)
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Mon Jun 17, 2002 9:10 pm

Your error indicated that the CRC computed from the read-from-disk data was different than the CRC that was stored in it, meaning that something didn't compute right somewhere along the line. Plus, it changes, because it sometimes comes up correctly (matching the stored CRC). The kernel, as stored on disk, is being read or acted upon in two different ways -- randomly -- which just screams to me of a hardware issue. Something is running too hot/too fast/too hard/too long or is simply not working. I would say to check your hard drive, RAM, and CPU; definitely stop overclocking if you are.
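
To make "same data, two different CRCs" concrete, here is a rough bash sketch that reads one file several times and compares the CRC from each read. It assumes the standard `cksum` utility; `crc_of` and `same_crc` are made-up names for illustration, and this is not how the kernel itself performs its check.

```shell
# crc_of FILE: print only the CRC that cksum computes for FILE.
crc_of() {
    # Reading from stdin makes cksum print "<crc> <byte count>" with no name.
    cksum < "$1" | cut -d' ' -f1
}

# same_crc FILE [PASSES]: read FILE repeatedly (default 5 passes).
# Prints the CRC and returns 0 if every pass agreed, 1 if one differed,
# 2 if FILE could not be read.
same_crc() {
    local file=$1 passes=${2:-5} first current i
    [ -r "$file" ] || return 2
    first=$(crc_of "$file")
    for ((i = 2; i <= passes; i++)); do
        current=$(crc_of "$file")
        if [ "$current" != "$first" ]; then
            echo "pass $i: CRC $current differs from $first" >&2
            return 1
        fi
    done
    echo "$first"
}
```

Something like `same_crc /boot/bzImage 20` would be one way to call it (the path is only an example). Repeated reads may well be served from memory rather than from the disk, so a clean result here does not clear the drive.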

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 9:21 pm

delta407 wrote:
Your error indicated that the CRC computed from the read-from-disk data was different than the CRC that was stored in it, meaning that something didn't compute right somewhere along the line. Plus, it changes, because it sometimes comes up correctly (matching the stored CRC). The kernel, as stored on disk, is being read or acted upon in two different ways -- randomly -- which just screams to me of a hardware issue. Something is running too hot/too fast/too hard/too long or is simply not working. I would say to check your hard drive, RAM, and CPU; definitely stop overclocking if you are.


I am definitely not overclocking, as it tends to break things, and I don't really need that much more speed. I am testing the RAM tonight, and have also tested the hard drives (and cables) extensively. If the RAM turns out to be okay, the only thing left will be the CPU. How on earth would I test it? The system almost NEVER crashes, and is very stable, WHEN I actually get into linux, that is. Maybe it's an SMP issue? No clue...
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Mon Jun 17, 2002 9:57 pm

As far as the CPU goes, you could try each CPU separately in another computer (one that doesn't have any issues) and stress it (like doing sixteen concurrent kernel compiles from a RAM disk... or something). Or (and this is less reliable) you could try each CPU individually in your own board.

Do you have problems booting, say, the Gentoo install CD?
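
A much lighter stand-in for the concurrent-compile idea, as a bash sketch: hash the same file in many processes at once and count how many distinct answers come back. It assumes GNU `md5sum`; `parallel_digest` is a hypothetical helper name, and a workload this small is far gentler than sixteen kernel compiles, so treat agreement as weak evidence at best.

```shell
# parallel_digest FILE [JOBS]: hash FILE in JOBS concurrent processes
# (default 16) and print the number of distinct digests that came back.
# 1 means every job agreed; anything higher means the same bytes
# produced different results.
parallel_digest() {
    local file=$1 jobs=${2:-16} dir i
    dir=$(mktemp -d) || return 2
    for ((i = 1; i <= jobs; i++)); do
        md5sum --binary < "$file" > "$dir/$i" &
    done
    wait    # block until every background job has finished
    cat "$dir"/* | sort -u | wc -l | tr -d ' '
    rm -rf "$dir"
}
```

For example, `parallel_digest /usr/src/linux.tar.bz2 16` (the file name is just a placeholder) should print `1` on healthy hardware.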

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Mon Jun 17, 2002 10:14 pm

delta407 wrote:
As far as the CPU goes, you could try each CPU separately in another computer (one that doesn't have any issues) and stress it (like doing sixteen concurrent kernel compiles from a RAM disk... or something). Or (and this is less reliable) you could try each CPU individually in your own board.

Do you have problems booting, say, the Gentoo install CD?


The install CD? Never any problems. And I have done like sixteen concurrent kernel compiles (tho not from a RAM disk, so er, nevermind, hehe). I also wish I had the $$$ to test it in another system, too.

Oh, one thing I should mention. I "feel" that the system boots up more often in non-fb mode (versus me using the vga=791 line in lilo.conf) and that I get into linux more often. Though it is only a very rough generalization. It's not like I sat there with pen & paper & counted (usually I am just very happy to be in linux, so I stay, hehe). I am using a GF4 Ti4600, so who knows, maybe it's that nVidia & AMD incompatibility thingy.
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Mon Jun 17, 2002 11:11 pm

If you haven't had any issues with the install CD (why not do a half dozen reboots or so just to make sure :D), then it's most likely hard-drive related. Could be the cable, but that's doubtful; your disk could be failing.

Try getting md5sums of your /boot directory; i.e. mount -o ro /dev/BOOT /boot; md5sum --binary `find /boot -type f` and comparing them across reboots, both from the local disk and from the CD. You should be able to see then if the hard drive is returning the data correctly.
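
Comparing across reboots is easier if the sums are saved to a file and checked later. A minimal bash sketch of that, assuming GNU coreutils and findutils (`md5sum --check`, `sort -z`, `xargs -0 -r`); `save_sums` and `verify_sums` are illustrative names, and the manifest should live outside the directory being checked:

```shell
# save_sums DIR MANIFEST: record one md5 line per file under DIR.
# Paths are stored relative to DIR and sorted, so the manifest is stable.
save_sums() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum --binary) > "$2"
}

# verify_sums DIR MANIFEST: re-read every listed file and compare it with
# the saved sum. Silent and returns 0 when everything still matches.
verify_sums() {
    local manifest
    # Resolve the manifest to an absolute path before changing directory.
    manifest=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    (cd "$1" && md5sum --check --quiet "$manifest")
}
```

One possible use: run `save_sums /boot /root/boot.md5` once, then `verify_sums /boot /root/boot.md5` after each reboot and from the install CD (with the manifest copied somewhere reachable). Any line reported as FAILED is a file that read back differently.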

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Mon Jun 17, 2002 11:32 pm

Oh, also, to get a number that represents your entire /boot partition over multiple passes, run the following at a bash prompt:

Code:
for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... "; mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done


It will print something like the following:

Code:
# for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... " ;mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done
Pass 1... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 2... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 3... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 4... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 5... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 6... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 7... 7588bfd7fc9a378012368ed4252f54f5  -
...


It'll really thrash your hard drive about but will a) give you a nice stress test and b) give you a quick and easy visual check that your boot partition is in order. My advice would be to run it once within Gentoo and once off the install CD and compare sums. If the numbers turn up different in sequential passes, you know you have a problem.
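
A variant of the same loop as a bash sketch that tolerates odd file names, hashes files in a fixed order, and stops at the first pass that disagrees. It assumes GNU coreutils and findutils; `tree_digest` and `repeat_digest` are made-up names, and the mount/umount step is left out so it can run without root.

```shell
# tree_digest DIR: one md5 over the sorted per-file md5 lines of DIR.
tree_digest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum --binary) \
        | md5sum | cut -d' ' -f1
}

# repeat_digest DIR [PASSES]: print the digest of each pass (default 100)
# and return 1 as soon as a pass disagrees with the first one.
repeat_digest() {
    local dir=$1 passes=${2:-100} first="" current i
    [ -d "$dir" ] || return 2
    for ((i = 1; i <= passes; i++)); do
        # Mounting read-only before each pass and unmounting after it, as the
        # one-liner does, forces fresh reads from disk; that needs root and an
        # fstab entry, so it is omitted from this sketch.
        current=$(tree_digest "$dir")
        echo "Pass $i... $current"
        if [ -z "$first" ]; then
            first=$current
        elif [ "$current" != "$first" ]; then
            echo "Pass $i differs from pass 1" >&2
            return 1
        fi
    done
}
```

Calling `repeat_digest /boot 100` prints one line per pass in the same "Pass N..." shape as above; the digest itself will not match the one-liner's, because the file list is built differently.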

arkane
l33t

Joined: 30 Apr 2002
Posts: 918
Location: Phoenix, AZ

Posted: Tue Jun 18, 2002 1:09 am

delta407 wrote:
Oh, also, to get a number that represents your entire /boot partition over multiple passes, run the following at a bash prompt:

Code:
for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... "; mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done


It will print something like the following:

Code:
# for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... " ;mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done
Pass 1... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 2... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 3... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 4... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 5... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 6... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 7... 7588bfd7fc9a378012368ed4252f54f5  -
...


It'll really thrash your hard drive about but will a) give you a nice stress test and b) give you a quick and easy visual check that your boot partition is in order. My advice would be to run it once within Gentoo and once off the install CD and compare sums. If the numbers turn up different in sequential passes, you know you have a problem.


That's a cool little test, you should put that into the tips and tricks section.

I just ran it on my drive to make sure, and it definitely works.

--
Dan

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 18, 2002 1:15 am

arkane wrote:
That's a cool little test, you should put that into the tips and tricks section.

I just ran it on my drive to make sure, and it definitely works.


Well, that's actually the hard way. It checks each of the individual files separately, and tends to stress things more (which is probably what we want in this case). You can md5 the whole partition like this:

Code:
dd if=/dev/hda1 2>/dev/null | md5sum


But, that's just one sequential read of the whole partition (which is less error-prone, but we want to find errors), so if it's got a lot of free space you'll run into problems.

Also, make sure it's unmounted, or bad things might happen...
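
A small bash sketch of the whole-partition sum with the "is it mounted?" check built in. It assumes a Linux `/proc/mounts` and GNU `dd`; `is_mounted` and `device_digest` are hypothetical helper names.

```shell
# is_mounted DEVICE: succeed if DEVICE is listed as a source in /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null
}

# device_digest DEVICE: md5 of every byte on DEVICE. Refuses to run on a
# mounted device, since the sum could change while it is being read.
device_digest() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    dd if="$1" bs=1M 2>/dev/null | md5sum | cut -d' ' -f1
}
```

`device_digest /dev/hda1` would be the equivalent of the command above. Reading the raw device usually needs root.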

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Tue Jun 18, 2002 5:14 am

delta407 wrote:
If you haven't had any issues with the install CD (why not do a half dozen reboots or so just to make sure :D), then it's most likely hard-drive related. Could be the cable, but that's doubtful; your disk could be failing.

Try getting md5sums of your /boot directory; i.e. mount -o ro /dev/BOOT /boot; md5sum --binary `find /boot -type f` and comparing them across reboots, both from the local disk and from the CD. You should be able to see then if the hard drive is returning the data correctly.


Okay.. I will try your "little" test :) But one quick question. Why is it that if I can boot the install CD's kernel fine but not my own, then it's a HD issue? Why wouldn't that be a "misconfigured" kernel issue?
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 18, 2002 5:16 am

If it was misconfigured, it would either work 100% of the time or fail 100% of the time. Sometimes working and sometimes not, without changing the input data, is a very good sign of a hardware issue.

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Tue Jun 18, 2002 5:30 am

delta407 wrote:
Oh, also, to get a number that represents your entire /boot partition over multiple passes, run the following at a bash prompt:

Code:
for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... "; mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done


It will print something like the following:

Code:
# for ((i=1; i <= 100 ; i++)); do echo -n "Pass $i... " ;mount /boot; md5sum --binary `find /boot -type f` | md5sum; umount /boot; done
Pass 1... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 2... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 3... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 4... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 5... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 6... 7588bfd7fc9a378012368ed4252f54f5  -
Pass 7... 7588bfd7fc9a378012368ed4252f54f5  -
...


It'll really thrash your hard drive about but will a) give you a nice stress test and b) give you a quick and easy visual check that your boot partition is in order. My advice would be to run it once within Gentoo and once off the install CD and compare sums. If the numbers turn up different in sequential passes, you know you have a problem.


ahhh..... that's NOT an apostrophe.. it's the thing next to the 1... :)

btw, I get different results with the for-loop (for blah blah blah) than with the:

dd if=/dev/hde1 2> /dev/null | md5sum

but.. both are consistent across the board with themselves. It seems like, tho, that every time I do it I get different results... weird
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 18, 2002 5:35 am

The find-related loop should give a different sum than the dd | md5sum.

Quote:
It seems like, tho, that every time I do it I get different results... weird


You mean that the command loops are internally consistent (i.e. between passes) but not consistent between runs (i.e. rebooting or whatever and running the loop again)?

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 18, 2002 5:45 am

Well, I'm going to bed, but here: trust the big long for loop thing, because if you md5 the whole partition, you'll also sum the journal and mount count and whatnot which can and probably will change when you run the big loop.

So, if the big loop at any time ever produces a different number -- across passes, retries, or reboots -- then something is definitely wrong, and it's likely your hard drive. You should get the same number booted from Gentoo as you would booted from the rescue disk, as long as the partition is always mounted read-only.

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Tue Jun 18, 2002 5:48 am

delta407 wrote:
The find-related loop should give a different sum than the dd | md5sum.

Quote:
It seems like, tho, that every time I do it I get different results... weird


You mean that the command loops are internally consistent (i.e. between passes) but not consistent between runs (i.e. rebooting or whatever and running the loop again)?


Wow, it's great being able to get help @ 1 am :)

What I mean is that when I ran the dd if= and then the big for-loop, and then I ran the dd if= again, I would get diff. results for the dd if= test. The for-loop test always stayed the same, though.

I am running memtest86 right now, seeing if I can find something wrong with my RAM. Overall, I bet it's just some archaic incompatibility with my mobo's BIOS and the kernel.
_________________
McManus
----
Linux user #267375 - http://counter.li.org

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Tue Jun 18, 2002 5:55 am

delta407 wrote:
Well, I'm going to bed, but here: trust the big long for loop thing, because if you md5 the whole partition, you'll also sum the journal and mount count and whatnot which can and probably will change when you run the big loop.

So, if the big loop at any time ever produces a different number -- across passes, retries, or reboots -- then something is definitely wrong, and it's likely your hard drive. You should get the same number booted from Gentoo as you would booted from the rescue disk, as long as the partition is always mounted read-only.


Yeah, I'm going to bed too. Thanks a lot for the help.

Thanks for clearing up the diff. between the two tests. Tomorrow evening I will test out the md5 for loop test whilst booting off of the install CD. I think the HD & cables are fine, tho.

BTW, how are you supposed to mount /boot? I am using defaults 1 1, or should it be defaults 1 2 or defaults 0 0 ?? Prolly doesn't even make a difference, oh well. I bet it's that weird AMD incompatibility thing with NVidia cards or something like that, 'cuz I don't get crc errors as OFTEN when I'm not using fb. (because the only way I can tell it's a crc error is when I'm not in fb mode, 'cuz when I am all I get is a blank screen)
_________________
McManus
----
Linux user #267375 - http://counter.li.org

delta407
Bodhisattva

Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 18, 2002 6:02 am

I have /boot set to noauto (don't mount on boot) and 1 1.

As far as framebuffer support goes, if the kernel is giving you CRC errors, that (AFAIK) happens before pretty much anything else, including initializing framebuffer stuff.
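
For anyone unsure which numbers those are: in an fstab line the fourth field holds the mount options, the fifth is the dump flag, and the sixth is the fsck pass order. A quick bash sketch for reading them back, where `fstab_entry` is a made-up helper name and the device names in the usage note are only examples:

```shell
# fstab_entry MOUNTPOINT [FILE]: print the options, dump flag and fsck pass
# recorded for MOUNTPOINT, skipping comment lines.
# FILE defaults to /etc/fstab.
fstab_entry() {
    awk -v mp="$1" '
        /^[[:space:]]*#/ { next }
        $2 == mp { print "options=" $4, "dump=" $5, "pass=" $6 }
    ' "${2:-/etc/fstab}"
}
```

With a line such as `/dev/hde1 /boot ext3 noauto 1 1` in the file, `fstab_entry /boot` prints `options=noauto dump=1 pass=1`.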

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Tue Jun 18, 2002 4:49 pm

delta407 wrote:
I have /boot set to noauto (don't mount on boot) and 1 1.

As far as framebuffer support goes, if the kernel is giving you CRC errors, that (AFAIK) happens before pretty much anything else, including initializing framebuffer stuff.


er, well, I meant that I can't see what kind of error it is when I use framebuffer mode because it never pops up. Screen changes for fb mode, but nothing comes on the screen :) I only found out (a week ago?) that they were crc errors after I turned off fb mode.

Update: I ran memtest86 just fine last night when I went to bed, but that was with ECC off. After I turned it on, I started getting a few ECC errors (which were corrected, according to the test). But get this.. this is an excerpt from Crucial about my motherboard:

"The Tiger MPX supports non-registered DDR SDRAM in the first 2 memory sockets only (DIMM1 and DIMM2, as labeled on the motherboard). Registered DIMMs are supported in all sockets. "

So.... maybe it just doesn't like that configuration. I will try memtest86 on a different DIMM slot tonight and see how things work. But either way, either the mobo has some BIOS issues/quirks or I need new RAM (and I think that is where my troubles lie) :)
_________________
McManus
----
Linux user #267375 - http://counter.li.org

McManus
Apprentice

Joined: 10 Apr 2002
Posts: 176
Location: Austin, TX

Posted: Wed Jun 19, 2002 4:04 am

okay, I really think my memory is bad. When I run memtest86 I get errors at failing address 00000000000 (I guess the very first location). *sigh* I have bought new RAM, and am going to get the old one RMA'd
_________________
McManus
----
Linux user #267375 - http://counter.li.org

All times are GMT