moult Retired Dev
Joined: 31 Mar 2008 Posts: 146 Location: Australia
|
Posted: Thu Sep 24, 2015 1:31 pm Post subject: Computer freezing, no messages in syslog |
|
|
This has started happening rather regularly on my ThinkPad T420i, and it seems to be triggered by intense CPU and/or disk-write activity. The machine freezes, the caps lock light starts blinking, and if sound was playing, it loops. There are no messages in syslog and no mouse or keyboard response; SysRq+REISUB doesn't do anything. I checked /dev/sda with smartmontools and it reports no issues. The laptop is about 3.5-4 years old.
What else can I do to debug/assess the issue? _________________ thinkMoult - I write articles online. You might like some of them.
Planet Larry - do you write a blog and use Gentoo? Get your blog added to the Planet Larry Gentoo user blog aggregator! |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
|
Posted: Thu Sep 24, 2015 2:23 pm Post subject: |
|
|
The blinking caps lock light means that the kernel panicked. It looks like magic SysRq doesn't work once it has panicked, so that's at least consistent... I'd make sure the RAM is good; otherwise you'd probably have to get the oops info somehow, perhaps via a serial console or something similar. Newer hardware like this may well require special debug hardware... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
|
Posted: Thu Sep 24, 2015 3:22 pm Post subject: |
|
|
Check your cooling and clean out the insides, and run memtest as suggested. |
|
moult Retired Dev
Joined: 31 Mar 2008 Posts: 146 Location: Australia
|
Posted: Fri Sep 25, 2015 9:45 pm Post subject: |
|
|
Cooling checked (if it overheats, it'll shut down, not kernel panic). I also cleaned out the insides just to be sure. memtest86+ found nothing, although I only had time for one pass; I'll leave it running for more passes to be sure.
I'm rethinking my original suspicion of CPU activity: I've just compiled 330 packages (now at 75 °C) without issue. Maybe cleaning out the insides helped (it didn't seem too dirty, though; a bit of fluff but not much). |
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Sat Sep 26, 2015 12:58 am Post subject: |
|
|
Moult wrote: | Cooling checked, (if it overheats, it'll shutdown, not kernel panic). |
It will not necessarily shut down; it "may" or "should".
Intel CPUs (since the Pentium 3 era) are designed to lower their clock speed to protect themselves from heat, and they never issue a shutdown themselves.
The shutdown is a BIOS feature the motherboard may have, or a software feature (I don't know which part of the kernel handles that). It was primarily made for AMD CPUs, which lacked the throttling feature (though I suppose they have one now too).
I once owned an Asus motherboard for Intel with the feature, but in that case it's useless except as a selling point (it's even worse, since it is better to save your work at a shitty clock speed than to get an emergency shutdown without saving it).
Most AMD CPUs go mad from heat, and even with the shutdown-when-too-hot feature they produce errors when overheating, so most of the time the OS freezes/crashes/reboots before the shutdown can actually happen. Heat also affects Intel systems indirectly: the temperature goes up, the case gets hot, and memory/video/SCSI cards crash the OS...
And even when it is configured, if your shutdown trigger is set too high it is never reached: the CPU downclocks itself -> temperature goes back to normal -> CPU clock goes back to normal -> too hot again -> downclock... and your software/BIOS part keeps waiting for a temperature that never arrives.
As you see, it's really optimistic to assume a thermal shutdown will happen; even with heat problems you will most likely never see one.
Intel CPUs report errors through MCE events (I think AMD CPUs do too, because I remember an AMD MCE option in the kernel config, but I'm less sure whether they report heat errors through MCE).
If you want to be sure, check your MCE status (enable the kernel feature + install app-admin/mcelog).
But honestly, you can just guess it:
- working hard->crash/reboot/freeze = too hot cpu
- laptop -> small space = heat trouble
So when a user reports any trouble with a laptop working hard, you can bet on heat 99.99% of the time. |
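For anyone wanting numbers rather than a guess, here is a minimal Python sketch that reads temperatures and thermal throttle counters from sysfs. It assumes a Linux kernel that exposes `/sys/class/thermal/thermal_zone*/temp` (millidegrees Celsius) and, on Intel CPUs, `/sys/devices/system/cpu/cpu*/thermal_throttle/core_throttle_count`; either may be missing on a given machine, in which case that entry is simply skipped. The function names `read_int` and `thermal_snapshot` are made up for this example.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def thermal_snapshot(sysfs_root="/sys"):
    """Collect zone temperatures (in degrees C) and per-CPU throttle counts.

    sysfs_root is a parameter only so the function can be pointed at a
    copy of the tree; on a live system the default is what you want.
    """
    root = Path(sysfs_root)

    temps = {}
    for zone in sorted(root.glob("class/thermal/thermal_zone*")):
        milli = read_int(zone / "temp")
        if milli is not None:
            temps[zone.name] = milli / 1000.0

    throttles = {}
    # cpu[0-9]* skips sibling directories such as cpufreq and cpuidle.
    for cpu in sorted(root.glob("devices/system/cpu/cpu[0-9]*")):
        count = read_int(cpu / "thermal_throttle" / "core_throttle_count")
        if count is not None:
            throttles[cpu.name] = count

    return temps, throttles


if __name__ == "__main__":
    temps, throttles = thermal_snapshot()
    for name, value in temps.items():
        print(f"{name}: {value:.1f} C")
    for name, value in throttles.items():
        print(f"{name}: throttled {value} times since boot")
```

Running it before and after a heavy compile and comparing the throttle counts should show whether the CPU has been throttling, which is the downclock loop described above.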
|
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
|
Posted: Sat Sep 26, 2015 7:51 am Post subject: |
|
|
krinn wrote: | So a user reporting any trouble with a laptop working hard, you can put bet 99.99% time on heat. |
Yup; especially as people don't usually clean them out, for fear of voiding a warranty, or that they won't be able to put it back together. |
|
moult Retired Dev
Joined: 31 Mar 2008 Posts: 146 Location: Australia
|
Posted: Sat Sep 26, 2015 10:49 pm Post subject: |
|
|
You're right, it is a temperature issue. But the plot thickens!
By default (BIOS feature or CPU feature, not sure) it'll auto-shutdown at 83 °C. This happened way too often for comfort, so I wrote my own script to adjust the CPU governor and fan speeds depending on the temperature. Recently I noticed that my script had hung: the command `cpupower frequency-set -g powersave` (or whatever governor) would turn into a zombie process (kill -9 doesn't work) instead of properly setting the governor. I suspect this means there was an issue with the CPU, which wasn't responding to the governor request, and so it worked itself too hard and the kernel panicked. |
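A rough Python sketch of the two ideas in the post above, offered as an illustration and not as the poster's actual script. `pick_governor` shows temperature-based governor selection with a gap between the two thresholds so it doesn't flip back and forth; the 75/65 °C thresholds and the choice of `performance` as the "cool" governor are assumptions. `parse_proc_state` reads the state letter from a `/proc/<pid>/stat` line, which helps tell a true zombie (`Z`) from a process stuck in uninterruptible sleep (`D`); the latter is the usual reason `kill -9` has no effect.

```python
def pick_governor(temp_c, current, hot=75.0, cool=65.0):
    """Choose a cpufreq governor from the temperature.

    Thresholds and governor names are illustrative assumptions.
    Between `cool` and `hot` the current governor is kept (hysteresis).
    """
    if temp_c >= hot:
        return "powersave"
    if temp_c <= cool:
        return "performance"
    return current


def parse_proc_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The second field (the command name) is wrapped in parentheses and may
    itself contain spaces or ')', so split on the LAST ')' in the line.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting on I/O or a driver)",
    "Z": "zombie (already exited, waiting to be reaped by its parent)",
    "T": "stopped",
}


def describe_pid(pid, proc_root="/proc"):
    """Return (state_letter, description) for a running process."""
    with open(f"{proc_root}/{pid}/stat") as f:
        state = parse_proc_state(f.read())
    return state, STATE_NAMES.get(state, "other")
```

If `describe_pid` reports `D` for the hung `cpupower` process, it is blocked inside the kernel and not a zombie, which would point at the driver or firmware path behind the governor change.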
|
moult Retired Dev
Joined: 31 Mar 2008 Posts: 146 Location: Australia
|
Posted: Mon Sep 28, 2015 9:35 pm Post subject: |
|
|
Hmm - perhaps not just a temperature issue. I can trigger it without any temperature problem if I write a lot of data (>500 MB) to an externally mounted SD card (vfat) or external hard drive (NTFS). What does that imply? |
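One way to make the write-triggered freeze repeatable is a small Python sketch like the one below. It writes zero-filled chunks to a target file and, after every chunk, appends a progress line to a log file and forces both to disk with `os.fsync`, so the last line in the log shows roughly how far the write got before the freeze. The function name `write_test`, the 600 MB total and the 8 MB chunk size are arbitrary choices for this example; the log should live on the internal disk, not on the device under test.

```python
import os


def write_test(target_path, log_path, total_mb=600, chunk_mb=8):
    """Write at least total_mb of zeros to target_path in chunk_mb pieces.

    Progress is appended to log_path and fsync'd after each chunk so it
    survives a hard freeze. Returns the number of MB written.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    written = 0
    with open(target_path, "wb") as out, open(log_path, "a") as log:
        while written < total_mb:
            out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # push the data to the device now
            written += chunk_mb
            log.write(f"{written} MB written\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the progress line is on disk
    return written
```

Running the same test against the internal disk, the SD card and the external drive in turn should show whether the freeze depends on the device or bus being written to or only on the amount of data.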