Gentoo Forums
trouble-shooting critical system hang
darkphader
Veteran
Joined: 09 May 2002
Posts: 1217
Location: Motown

Posted: Tue Jul 08, 2008 8:32 pm    Post subject: trouble-shooting critical system hang

I need some assistance tracking down the cause of a system hang.
The server (no X installed) handles DNS, DHCP, and NTP, plus file and print sharing via Samba. The client mainly notices that the shares are offline and nobody can get any work done, but in fact all services are suspended. The monitor is blank and access via ssh is refused. The system does, however, respond to an arping and usually to a ping as well. The logs indicate nothing.
There is no magic with magic SysRq: the elephants are very boring and the system does not reboot; it requires a power-on reset.
No specific set of circumstances has been identified with the issue. It may run OK for only half an hour, but usually it lasts several days to a week. EDIT: should also mention it has only happened during production, never after hours.
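
One thing worth ruling out before writing off magic SysRq: the key combination only does anything if the kernel was built with CONFIG_MAGIC_SYSRQ and the feature is not switched off in /proc/sys/kernel/sysrq. Below is a minimal sketch of a check. The SYSRQ_FILE variable and the sysrq_status function are names made up for this example, and the override exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Sketch: report whether the magic SysRq key is enabled on this kernel.
# SYSRQ_FILE and sysrq_status are illustrative names, not from the thread.
SYSRQ_FILE="${SYSRQ_FILE:-/proc/sys/kernel/sysrq}"

sysrq_status() {
    # A missing file usually means the kernel lacks CONFIG_MAGIC_SYSRQ.
    if [ ! -r "$SYSRQ_FILE" ]; then
        echo "unknown (cannot read $SYSRQ_FILE)"
        return 1
    fi
    value=$(cat "$SYSRQ_FILE")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled (all functions)" ;;
        *) echo "restricted (bitmask $value)" ;;   # only some functions allowed
    esac
}

sysrq_status || true

# To allow all SysRq functions until the next reboot (as root):
#   echo 1 > /proc/sys/kernel/sysrq
```

If this reports "enabled" and the keys still do nothing during a hang, that suggests the kernel is no longer servicing the keyboard at all, which is a useful data point in itself.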

Thanks for any clues/ideas.

Chris
_________________
WYSIWYG - What You See Is What You Grep
Baly
n00b
Joined: 04 Oct 2006
Posts: 10

Posted: Tue Jul 08, 2008 11:52 pm

It almost sounds like it may be going OOM (out of memory). I would suggest you fire off a top job in batch mode, dumping to a file at an interval of anywhere from 30 seconds to 5 minutes, depending on your preference. That will at least let you look at what was happening just before the system became unresponsive and may give you some additional insight. Something like:

top -b -d 30 > top.out &
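
Since the box needs a power-on reset, the last few samples may still be sitting in a buffer when it dies. A possible variation on the one-liner above is sketched here: it timestamps each sample, adds the memory and swap figures from /proc/meminfo, and calls sync after every write. The log path, the 30-second default, and the names LOG, INTERVAL, RUN and snapshot are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: append a timestamped memory/load snapshot to a log and flush it.
# LOG, INTERVAL, RUN and snapshot are illustrative names, not from the thread.
LOG="${LOG:-/var/log/hang-watch.log}"
INTERVAL="${INTERVAL:-30}"

snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        cat /proc/loadavg
        grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
        # One batch-mode iteration of top, trimmed to the summary and top tasks
        top -b -n 1 | head -n 20
    } >> "$LOG"
    sync   # push the data to disk so it has a chance of surviving a hard reset
}

# Loop only when explicitly asked to, e.g.:
#   RUN=1 nohup sh hang-watch.sh &
if [ "${RUN:-0}" = "1" ]; then
    while :; do
        snapshot
        sleep "$INTERVAL"
    done
fi
```

After a hang, the tail of the log shows the last state that reached the disk. Steadily shrinking MemFree and SwapFree in the final entries would support the OOM theory.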
pappy_mcfae
Watchman
Joined: 27 Dec 2007
Posts: 5999
Location: Pomona, California.

Posted: Wed Jul 09, 2008 5:28 am    Post subject: Re: trouble-shooting critical system hang

darkphader wrote:
EDIT: should also mention it has only happened during production, never after hours.

Thanks for any clues/ideas.

Chris


Herein lies the biggest clue. When was the last time you opened said machine and checked it for dust accumulation, restricted air flow, or the like? Did it start acting this way suddenly, or progressively over time? Have you allowed the server to just sit doing nothing (or being "locked") until it comes back around of its own accord? Does the HDD light show heavy activity during the lockup, or does it stay off?

Since it is ping-able, that tells me it's not dying completely. However, it is clear that something is putting it under a lot of strain.
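
If opening the case is not convenient during production hours, the heat theory can be tested from software first. The sketch below prints whatever temperatures the kernel exposes under /sys/class/hwmon. It assumes a hardware-monitoring driver for the board is loaded; HWMON_ROOT and print_temps are names invented for this example, and the override is only there so the function can be tried on sample data.

```shell
#!/bin/sh
# Sketch: print every temperature the kernel exposes through sysfs hwmon.
# HWMON_ROOT and print_temps are illustrative names, not from the thread.
HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"

print_temps() {
    found=0
    # Older kernels keep the files under hwmonN/device/, newer ones under hwmonN/
    for f in "$HWMON_ROOT"/hwmon*/temp*_input "$HWMON_ROOT"/hwmon*/device/temp*_input; do
        [ -r "$f" ] || continue
        milli=$(cat "$f" 2>/dev/null) || continue
        found=1
        # sysfs reports millidegrees Celsius
        echo "$f: $((milli / 1000)) C"
    done
    [ "$found" -eq 1 ] || echo "no hwmon temperature inputs found (is a sensor driver loaded?)"
}

print_temps
```

Logging this output alongside the top samples would show whether temperatures climb during production hours ahead of a hang.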

Good luck on that.

Blessed be!
Pappy
_________________
This space left intentionally blank, except for these ASCII symbols.
darkphader
Veteran
Joined: 09 May 2002
Posts: 1217
Location: Motown

Posted: Wed Jul 09, 2008 2:12 pm

Thanks for the ideas. I am now running top to see whether it captures anything useful before the next failure. I will also look into the possibility of overheating. The system is in a rack in a very clean room, so I had discounted it entirely, but I agree it should be examined.
_________________
WYSIWYG - What You See Is What You Grep
pappy_mcfae
Watchman
Joined: 27 Dec 2007
Posts: 5999
Location: Pomona, California.

Posted: Wed Jul 09, 2008 6:34 pm

"Clean" is a relative term. No matter where computers are operated, there will be dust. Computers attract dust, especially on the cooling fins of the CPU heatsink.

Blessed be!
Pappy
_________________
This space left intentionally blank, except for these ASCII symbols.
All times are GMT