Author |
Message |
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 1:55 pm Post subject: I got a serious problem here |
|
|
I've been running this box as a headless server for months now. It's been serving me well except for one thing: it keeps crashing under stress.
cpu: AMD Duron 1.3GHz
memory: 128MB SDRAM
drive(s): 30 GB Maxtor drive
network adapter: 10/100 SMC EZ Card PCI
os: Gentoo Linux (2.4.20-gentoo-r8)
The mainboard is some MSI product and I've had a bunch of problems with MSI mobos in the past.
At first I thought it was overheating, so I installed lm-sensors and found that the temperature peaks at around 57 C, which should not cause any problems. I took it to the shop and they applied thermal paste to the CPU to help it cool better. That seemed to help, but it's still unstable sometimes. It does not seem to be bothered much by long compile sessions, but it will crash when I run bittorrent with many downloads for a long time. It will also crash when transferring large amounts of data across the network (I use it as a fileserver).
So I don't know what it is. Disk failing? Network adapter? I'd like to run some sort of test suite on it, but I have no experience with that. The primary symptom of a crash is recurring segfault messages, so I'd like to run some kind of memtest on it. I'm usually logged in with ssh, and that fails in these situations, even though the node stays connected to the network and still responds to ping. Apache and everything else refuse connections too.
I don't know what to do. I don't see any error messages because the server is headless, and I'm not proficient with log monitoring. Any suggestions? _________________ undvd - ripping dvds should be as simple as unzip |
Snake007uk Apprentice
Joined: 12 Jan 2003 Posts: 198 Location: London, UK
|
Posted: Wed Mar 31, 2004 2:30 pm Post subject: |
|
|
Hi,
Why don't you test what happens when you run it on the new 2.6.3 kernel? See if it could be kernel related. _________________ Snake
Dual AMD MP 2800+, Asus A7M266-D, 1GB Ram, 18.1GB u160 HD, ATI Radeon 9600 Pro, Creative Audigy ZS, Intel SRCU31A, Linksys NIC, iiyama 18.1 4637bk lcd |
dol-sen Retired Dev
Joined: 30 Jun 2002 Posts: 2805 Location: Richmond, BC, Canada
|
Posted: Wed Mar 31, 2004 2:33 pm Post subject: |
|
|
What chipset does it use? My daughter's computer has one of the first-generation VIA chipsets and it gave me problems a few times. It turned out to be memory related: files were corrupt, etc. The first-generation VIA chipset only works properly with Samsung memory.
It's been OK ever since. _________________ Brian
Porthole, the Portage GUI frontend irc@freenode: #gentoo-guis, #porthole, Blog
layman, gentoolkit, CoreBuilder, esearch... |
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 2:41 pm Post subject: |
|
|
Snake007uk, yeah, I've thought about upgrading the kernel, but I haven't had time for it yet as I have to test it properly. So far it has only run on 2.4.20x kernels.
dol-sen, how do I check the chipset again?
I don't believe I've had file corruption though. _________________ undvd - ripping dvds should be as simple as unzip |
gwion Apprentice
Joined: 15 May 2003 Posts: 212 Location: Helsinki
|
Posted: Wed Mar 31, 2004 2:49 pm Post subject: |
|
|
I had trouble with MSI mainboards. 4 out of 5 broke within 2 years.
In my case I had to replace the mainboard (I exchanged all parts down to the CPU and tried the computer...) _________________ But the best thing about being an older goth? The fact that no one tries to tell you "It's a phase!" anymore.
--
gwion@jabber.org |
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 2:52 pm Post subject: |
|
|
Yeah I've had that happen to me too. The reason I have this one is that I bought the parts as a package on discount. _________________ undvd - ripping dvds should be as simple as unzip |
Lews_Therin l33t
Joined: 03 Oct 2003 Posts: 657 Location: Banned
|
Posted: Wed Mar 31, 2004 4:41 pm Post subject: |
|
|
Is there anything you can do that makes it crash consistently? If there is, do it, and monitor the computer with top right up until it dies. If it, say, runs out of memory and then goes down, we'll know what to do. |
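Since the ssh session dies along with the machine, it may be easier to log the numbers than to watch top interactively. The following is only a minimal sketch of that idea, not something posted in the thread: it assumes Python 3 is available on the server, reads /proc/meminfo and /proc/loadavg, and appends one line every few seconds to a hypothetical file called crashwatch.log, forcing each line to disk so the last sample has a chance of surviving a hard crash.

```python
import os
import time


def read_meminfo(path="/proc/meminfo"):
    """Return a few fields from /proc/meminfo as a dict of kB values."""
    wanted = {"MemTotal", "MemFree", "SwapTotal", "SwapFree"}
    values = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in wanted:
                values[key] = int(rest.split()[0])
    return values


def format_sample(meminfo, load1, timestamp):
    """Build one log line from a meminfo dict and the 1-minute load average."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    fields = " ".join(f"{k}={v}kB" for k, v in sorted(meminfo.items()))
    return f"{stamp} load1={load1} {fields}\n"


def log_until_crash(logfile="crashwatch.log", interval=5):
    """Append a sample every `interval` seconds until the process (or box) dies.

    The file name and interval are arbitrary choices for this example.
    """
    with open(logfile, "a") as out:
        while True:
            with open("/proc/loadavg") as f:
                load1 = f.read().split()[0]
            out.write(format_sample(read_meminfo(), load1, time.time()))
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before sleeping
            time.sleep(interval)
```

Calling log_until_crash() before starting a transfer, then reading the end of the log after the reboot, should show whether free memory and swap were running out before the box went down.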
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 4:48 pm Post subject: |
|
|
dol-sen wrote: | What chipset does it use? My daughter's computer has one of the first-generation VIA chipsets and it gave me problems a few times. It turned out to be memory related: files were corrupt, etc. The first-generation VIA chipset only works properly with Samsung memory.
It's been OK ever since. |
I get this from dmesg:
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
Safe to assume it's a Via 82cxxx mobo then? Is this the one you meant?
Update: I'm now trying the newest ck-sources: 2.6.4-ck1.
Lews_Therin, that's a good idea, I should try that... of course, that sounds a lot like running the kind of stress test I suggested earlier. No takers on that, huh? _________________ undvd - ripping dvds should be as simple as unzip |
mastergoon Apprentice
Joined: 27 Jul 2003 Posts: 161 Location: Portland, OR
|
Posted: Wed Mar 31, 2004 4:56 pm Post subject: |
|
|
After it crashes, look at /var/log/messages to see the last thing it logged. |
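For anyone not used to reading logs, the last few dozen lines written before the crash are usually the interesting part. A small sketch, assuming Python 3 and assuming the path passed in is whatever file the installed system logger actually writes to (/var/log/messages is the name given above and may not exist on every setup):

```python
from collections import deque


def last_lines(path, count=50):
    """Return the final `count` lines of a text file without loading it all."""
    # errors="replace" keeps going if a crash left garbage bytes in the log.
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=count))


# Example use after a reboot:
#     for line in last_lines("/var/log/messages"):
#         print(line, end="")
```

The default of 50 lines is an arbitrary choice; the point is just to see what was logged right before the box stopped responding.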
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 5:01 pm Post subject: |
|
|
No such thing, I have vcron. Would that by any chance be the equivalent to /var/log/everything/current? _________________ undvd - ripping dvds should be as simple as unzip |
Budro Tux's lil' helper
Joined: 17 Aug 2002 Posts: 91 Location: MD, USA
|
Posted: Wed Mar 31, 2004 5:21 pm Post subject: Change the NIC card ?? |
|
|
numer.....
Maybe try swapping out the NIC. You stated that it crashes while using bittorrent, so maybe all those huge downloads are stressing the NIC and causing problems.
Also, maybe try moving the NIC to a different slot on the mobo; check the manual to see which PCI slots share an IRQ with another device/slot.
Just some other options to think about .....
For a good network testing program I use "nuttcp" ..... check out .. http://sd.wareonearth.com/~phil/net/ttcp/
Good luck, |
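The manual says which slots are wired to the same interrupt line, but the running kernel also reports the actual assignment in /proc/interrupts, where devices sharing an IRQ appear comma-separated on one line. The sketch below assumes Python 3 and the usual layout of that file (a header row of CPU columns, then one row per IRQ); the device names in the comment are made up for illustration.

```python
def shared_irqs(text):
    """Map IRQ number -> list of device names for IRQs used by 2+ devices.

    `text` is the content of /proc/interrupts.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, ERR and other non-numeric rows
        fields = rest.split(None, cpu_count)
        if len(fields) <= cpu_count:
            continue  # counters only, no device description
        parts = [p.strip() for p in fields[cpu_count].split(",")]
        if len(parts) < 2:
            continue  # only one device on this IRQ
        # The first entry still carries the controller type (e.g. "XT-PIC").
        parts[0] = parts[0].split()[-1]
        shared[int(label)] = parts
    return shared


# Example use:
#     with open("/proc/interrupts") as f:
#         print(shared_irqs(f.read()))
# A hypothetical result such as {10: ["eth0", "usb-uhci"]} would mean the
# NIC shares IRQ 10 with a USB controller.
```

Sharing an IRQ is not a fault in itself, so a hit here is only a hint about which slot to try next.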
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
|
Posted: Wed Mar 31, 2004 5:44 pm Post subject: |
|
|
Well, this is an interesting development. I'm in the middle of a file transfer, one of those that would guarantee a meltdown within 30 minutes, and we're still rolling. I won't conclude anything from this yet, but it does seem that the kernel had some effect at least. I know that the cfq I/O scheduler is being used here, but I don't know what that really means for my problem of network connections crashing the system.
How ironic, the system went down just as I was writing this. There seems to have been some improvement nonetheless.
Budro, one doesn't want to rule out any possible cause, but I have 3 of these cards and I've used them interchangeably in various computers without ever being able to pin any problems on them.
UPDATE: I did what was suggested and checked the last output in the logs before the crash. It suggested that some of the services I had running might not have been happy about something. I tried turning them off (rc boot) and I was able to transfer 14GB in one go, which I had never managed before. So it seems the services certainly had something to do with the problem, which is a little worrying, as a hardware problem would probably be easier to solve. I did have a lot of stuff running, though: xinetd, xdm, webmin, samba, postfix, courier-imapd, nfs, distccd, apache and mysql. _________________ undvd - ripping dvds should be as simple as unzip |
Lews_Therin l33t
Joined: 03 Oct 2003 Posts: 657 Location: Banned
|
Posted: Wed Mar 31, 2004 7:39 pm Post subject: |
|
|
Xdm on a headless box? That's definitely not needed, unless you're using X's network transparency. With the others, add them back in one at a time until it stops working... you'll know the culprit then, and we can hunt down what's wrong (it might just need a re-emerge). |
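The procedure of adding services back one at a time can be written down as a short loop. This is only an illustrative sketch in Python 3: the service names are the ones listed earlier in the thread (minus xdm), and survives_stress_test is a hypothetical callback standing in for whatever check is used, most likely a person starting the services, running a large transfer, and reporting whether the box stayed up.

```python
# Names as listed in the thread; the actual init script names may differ.
SERVICES = ["xinetd", "webmin", "samba", "postfix", "courier-imapd",
            "nfs", "distccd", "apache", "mysql"]


def find_culprit(services, survives_stress_test):
    """Enable services cumulatively; return the first one that breaks things.

    `survives_stress_test` receives the list of currently enabled services
    and returns True if the machine stayed up. Returns None if nothing failed.
    """
    enabled = []
    for service in services:
        enabled.append(service)
        if not survives_stress_test(list(enabled)):
            return service
    return None


def ask_user(enabled):
    """Hypothetical check: a person runs the transfer and answers y or n."""
    answer = input(f"Running {', '.join(enabled)}. Did the transfer finish? [y/n] ")
    return answer.strip().lower().startswith("y")


# Example use:
#     print(find_culprit(SERVICES, ask_user))
```

Because the services accumulate, a failure only shows that the newest service is involved, possibly in combination with the ones already running, so it is worth retesting the suspect on its own.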
mastergoon Apprentice
Joined: 27 Jul 2003 Posts: 161 Location: Portland, OR
|
Posted: Wed Mar 31, 2004 9:17 pm Post subject: |
|
|
For future reference, vcron doesn't have anything to do with your log files. You have a system logger of some sort installed, obviously not sysklogd; I'm not sure which one you have. |
qarce n00b
Joined: 28 Mar 2003 Posts: 18 Location: us - california
|
Posted: Wed May 12, 2004 6:01 pm Post subject: Check your memory!!!! |
|
|
I've seen this before.....
Drop in the Gentoo CD.
At the boot prompt, type memtest.
Then give the system a day or so to run.
I have had a few systems now where the memory slots just start going.... one at a time....
In the end it's motherboard replacement time. |
qarce n00b
Joined: 28 Mar 2003 Posts: 18 Location: us - california
|
Posted: Wed May 12, 2004 6:04 pm Post subject: One other ? |
|
|
Motherboard....
How old is the motherboard?
I had an AMD motherboard a while back that had an IDE problem...
With all drives on one IDE channel, everything was cool. But any write between drives on the two channels would cause filesystem corruption. In the end the box would crash with a lot of filesystem errors.
There is a kernel option now to solve this problem. I think it was in the VIA chipset... |
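Silent corruption of this kind can be checked for by copying a large file between the two drives and comparing checksums. A minimal sketch, assuming Python 3; the paths in the comment are hypothetical. One caveat: right after a copy the destination may be read back from the page cache rather than from the disk, so comparing again after a reboot (or using a file larger than RAM) is the more convincing test.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then report whether source and copy have the same digest."""
    shutil.copyfile(source, destination)
    return file_digest(source) == file_digest(destination)


# Example use with made-up mount points, one per IDE channel:
#     ok = copy_and_verify("/mnt/hda/big.iso", "/mnt/hdc/big.iso")
#     print("copy intact" if ok else "copy differs from the original")
```

A mismatch does not say which component is at fault (drive, cable, controller or memory), only that data changed somewhere along the way.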
carambola5 Apprentice
Joined: 10 Jul 2002 Posts: 214
|
Posted: Thu May 13, 2004 3:24 am Post subject: |
|
|
There seems to be a standard flow in checking for (apparent) hardware problems:
-Make sure it's not software (try a new kernel)
-overheating (i see you checked that already)
-busted ram
-busted hdd
-busted psu
-busted mobo
-busted processor
Beyond that, it's a crapshoot. I suggest you check off everything on that list (if you're so inclined... time-wise and financially). |
Sphynxx n00b
Joined: 12 Jun 2003 Posts: 23 Location: Abilene/Dallas, TX
|
Posted: Thu May 13, 2004 6:13 am Post subject: |
|
|
I've never ever had issues with MSI boards. I insist on using them, as a matter of fact. However, I have read articles on a certain board that they made a few years back. I believe it had either the VIA 133 or 133A chipset; then again, it might even have been the 266 or 266A. Is your board green or red? Please post which board it is. |