WvR Apprentice
Joined: 03 Mar 2011 Posts: 200 Location: Tsuruga, Japan
Posted: Fri Feb 01, 2013 12:05 am Post subject: i915: GPU hung, declared wedged.... tips?
I use a Lenovo ThinkPad X201i running Gentoo and have been fully satisfied with it. Recently, however, I have been experiencing some issues:
Problem: when working in GNOME (v3.6.x), the interface freezes at some point. I can still move the mouse and the cursor moves over the screen, but the clock, for instance, is frozen. After several seconds the active window is blacked out, and then I get the black screen that says "Sorry, something has gone wrong and the system cannot recover. Call a system administrator".
The keyboard remains responsive. I use CTRL-ALT-F1 to get to a tty, log in as root, and restart XDM. X then restarts, but before the GDM login screen appears I get the same error message: "Sorry, something has gone wrong and the system cannot recover. Call a system administrator".
After trying several things, I suspect an issue with the Intel i915 driver. A snippet from /var/log/messages:
Code: |
Jan 31 16:04:17 rine50 kernel: [ 5887.075138] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:17 rine50 kernel: [ 5887.075145] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jan 31 16:04:19 rine50 kernel: [ 5888.691015] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:19 rine50 kernel: [ 5888.691326] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan 31 16:04:19 rine50 kernel: [ 5888.691334] [drm:i915_reset] *ERROR* Failed to reset chip.
|
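For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that scans syslog text for the i915 messages quoted above and summarises them. The message strings are copied from the snippet; the names `summarize_i915_hangs`, `HangSummary` and `summarize_file` are hypothetical, and the default path /var/log/messages is only where this particular log came from, so adjust it for your setup.

```python
import re
from dataclasses import dataclass, field

# Message fragments copied from the kernel log snippet above.
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\] \*ERROR\* "
    r"Hangcheck timer elapsed\.\.\. GPU hung"
)
WEDGED = "GPU hanging too fast, declaring wedged!"
RESET_FAILED = "Failed to reset chip."


@dataclass
class HangSummary:
    hang_times: list = field(default_factory=list)  # kernel uptime (s) of each hangcheck event
    wedged: bool = False                            # driver gave up on the GPU
    reset_failed: bool = False                      # chip reset reported as failed


def summarize_i915_hangs(lines):
    """Collect i915 hangcheck / reset messages from an iterable of log lines."""
    summary = HangSummary()
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            summary.hang_times.append(float(m.group(1)))
        if WEDGED in line:
            summary.wedged = True
        if RESET_FAILED in line:
            summary.reset_failed = True
    return summary


def summarize_file(path="/var/log/messages"):
    """Convenience wrapper; the default path is an assumption, pass your own."""
    with open(path, errors="replace") as f:
        return summarize_i915_hangs(f)
```

Running `summarize_file()` as root after a freeze should tell you quickly whether the driver only reported a hang or actually declared the GPU wedged, which is the case where only a reboot seems to help.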
Searching Google, I found this error, but in every case with Linux kernel 2.6.3x, whereas I use 3.7.4. A more recent message pointed to a buggy combination of BIOS and hardware on a particular type of Intel motherboard, but in my case the laptop was fine until now. Am I looking at broken hardware? If so, how can I find out? Or can I somehow switch off DRM and use the "old-style" GNOME instead of the "new-style" GNOME? I have also used the laptop with an Xsession (twm); in that case the error does not occur, but I have not used the system long enough with twm to draw a definitive conclusion.
Any tips are welcome. Is there a way to "stress test" the GPU?
|
|
BillWho Veteran
Joined: 03 Mar 2012 Posts: 1600 Location: US
Posted: Fri Feb 01, 2013 1:15 am Post subject:
WvR,
Did you check /sys/kernel/debug/dri/0/i915_error_state? Maybe there are some better clues there.
_________________
Good luck
Since installing gentoo, my life has become one long emerge
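A minimal sketch for grabbing that file right after a hang, so it survives the reboot. Assumptions: debugfs is mounted (at /sys/kernel/debug, or at /debug as the kernel message in the first post suggests), the script runs as root, and the helper names and timestamped file naming are hypothetical.

```python
import time
from pathlib import Path

# Candidate locations: the usual debugfs mount point, plus /debug, which is
# the path printed in the kernel message quoted in the first post.
CANDIDATES = [
    Path("/sys/kernel/debug/dri/0/i915_error_state"),
    Path("/debug/dri/0/i915_error_state"),
]


def find_error_state(candidates=CANDIDATES):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if path.exists():
            return path
    return None


def save_error_state(dest_dir, candidates=CANDIDATES):
    """Copy the error state into dest_dir under a timestamped name.

    Returns the path of the copy, or None if no error-state file was found.
    """
    src = find_error_state(candidates)
    if src is None:
        return None
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / time.strftime("i915_error_state-%Y%m%d-%H%M%S.txt")
    # debugfs files may report a size of 0, so read the contents until EOF
    # rather than relying on the reported file size.
    dest.write_bytes(src.read_bytes())
    return dest
```

For example, `save_error_state("/root/gpu-hangs")` from a tty after the freeze. As a non-root user, the existence check will typically fail with a permission error, since /sys/kernel/debug is usually readable by root only.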
|
|
WvR Apprentice
Joined: 03 Mar 2011 Posts: 200 Location: Tsuruga, Japan
Posted: Fri Feb 01, 2013 6:41 am Post subject:
It happened again... This time I copied the i915_error_state, but it is not much help: it is a very long list of register contents in hexadecimal form.
Just today the Intel driver (xf86-video-intel) was updated, but apparently that does not help...
|
|
WvR Apprentice
Joined: 03 Mar 2011 Posts: 200 Location: Tsuruga, Japan
Posted: Sat Feb 02, 2013 11:24 pm Post subject:
Downgrading the kernel to 3.6.11 did not help. Yesterday evening I had two "crashes" within 10 minutes. The most irritating part is that you have to reboot the computer to recover: simply restarting X does not help, because somehow the GPU cannot be "reset". Next try: downgrade the Intel driver from 2.20.19-r1 to 2.20.13. Wish me luck...
|
|
BillWho Veteran
Joined: 03 Mar 2012 Posts: 1600 Location: US
Posted: Sun Feb 03, 2013 12:59 am Post subject:
WvR,
Did you add or change any settings in /etc/X11/xorg.conf.d/20-intel.conf?
Is DRM_I915 built in or a module?
Have a look at x11-apps/intel-gpu-tools. Maybe some of its tests can provide a clue.
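One way to answer the DRM_I915 question without opening menuconfig: a sketch that reads the running kernel's configuration from /proc/config.gz. This only works if the kernel was built with CONFIG_IKCONFIG_PROC; otherwise point `read_kernel_config` at the .config in your kernel source tree instead. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def config_option(config_text, option):
    """Return the value ('y', 'm', ...) of a kernel config option, or None if unset."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
        if line == f"# {option} is not set":
            return None
    return None


def read_kernel_config(path="/proc/config.gz"):
    """Read a kernel config, transparently handling the gzipped /proc/config.gz."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as f:
            return f.read()
    return p.read_text()


def describe(value):
    """Translate a config value into plain words."""
    return {"y": "built in", "m": "module"}.get(value, "not enabled")
```

Usage: `describe(config_option(read_kernel_config(), "CONFIG_DRM_I915"))`. The exact-prefix match (`CONFIG_DRM=`) keeps `CONFIG_DRM` from accidentally matching `CONFIG_DRM_I915`.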
|
|
WvR Apprentice
Joined: 03 Mar 2011 Posts: 200 Location: Tsuruga, Japan
Posted: Mon Feb 04, 2013 9:28 am Post subject:
No changes to anything. These problems seem to have started without any clearly identifiable cause, which is one of the reasons I suspect a hardware problem.
I downgraded xf86-video-intel to the stable version. Let's see if this brings any improvement.
|
|
toralf Developer
Joined: 01 Feb 2004 Posts: 3922 Location: Hamburg
Posted: Mon Feb 04, 2013 10:56 am Post subject:
WvR wrote: |
It happened again.... This time I copied the i915_error_state. It does not give much help.
|
Well, that content is not intended to be readable by an ordinary user. Just file a bug at https://bugzilla.kernel.org and attach the contents of that file.
|
|
WvR Apprentice
Joined: 03 Mar 2011 Posts: 200 Location: Tsuruga, Japan
Posted: Thu Feb 07, 2013 5:40 am Post subject: [solved] i915: GPU hung, declared wedged.... tips?
Since I downgraded to x11-drivers/xf86-video-intel v2.20.13, the problem has not returned, so I am declaring it "solved" for the time being.
|
|
thens n00b
Joined: 07 Apr 2012 Posts: 12
Posted: Thu Jun 06, 2013 7:46 pm Post subject:
Just recently I had this problem as well (while watching YouTube in fullscreen mode in Chromium, kernel 3.8.13-gentoo), and X crashed.
Code: |
Jun 6 21:31:15 think kernel: [108626.012334] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 6 21:31:15 think kernel: [108626.012343] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jun 6 21:31:17 think kernel: [108628.011221] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 6 21:31:17 think kernel: [108628.011472] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jun 6 21:31:17 think kernel: [108628.011475] [drm:i915_reset] *ERROR* Failed to reset chip.
|
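Both reports show the same shape: a first hang, then a second one roughly two seconds later, after which the driver declares the GPU wedged. A small sketch that pulls the kernel uptime stamps out of such lines and computes the gaps between consecutive hangs, which may help when trying to correlate a hang with what you were doing. The 5-second threshold used to flag "fast" repeats is an illustrative assumption, not a value taken from this thread, and the function names are hypothetical.

```python
import re

# Kernel uptime stamp followed by the hangcheck message, as in the logs above.
HANG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\]")


def hang_timestamps(lines):
    """Kernel uptime (seconds) of every i915 hangcheck line."""
    return [float(m.group(1)) for line in lines if (m := HANG_RE.search(line))]


def hang_gaps(timestamps):
    """Seconds between consecutive hangcheck events."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def fast_repeats(gaps, threshold=5.0):
    """Gaps shorter than an assumed, illustrative threshold (seconds)."""
    return [g for g in gaps if g < threshold]
```

Applied to the log above, the gap is just under two seconds; in the first post's log it is about 1.6 seconds.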
I'm currently trying to reproduce this problem, but have not succeeded so far.
If anyone has an idea, please let me know.
|
|
mhex Apprentice
Joined: 18 Feb 2005 Posts: 160 Location: Germany/Berlin
Posted: Sun Jun 09, 2013 3:26 pm Post subject:
Today I experienced it too:
Code: |
[236610.909321] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[236610.909328] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring
|
This happened while watching a downloaded MP4 video:
http://www.dlr.de/dlr/desktopdefault.aspx/tabid-10081/151_read-7278//year-all/#gallery/11092
mplayer, vlc and avidemux all show only a black screen.
x11-drivers/xf86-video-intel 2.20.13
Linux xx 3.8.13-gentoo #1 SMP Thu Jun 6 08:10:20 CEST 2013 x86_64 Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz GenuineIntel GNU/Linux
gcc version 4.6.3 (Gentoo 4.6.3 p1.11, pie-0.5.2)
|
|
mhex Apprentice
Joined: 18 Feb 2005 Posts: 160 Location: Germany/Berlin
Posted: Mon Jun 10, 2013 5:58 am Post subject:
Today in dmesg:
Code: |
[20190.412669] hda-intel 0000:00:1b.0: Unstable LPIB (65408 >= 8192); disabling LPIB delay counting
|
|
|
mhex Apprentice
Joined: 18 Feb 2005 Posts: 160 Location: Germany/Berlin
Posted: Mon Jun 10, 2013 7:08 am Post subject:
More info from Xorg.log:
Code: |
(EE) [mi] EQ overflow continuing. 400 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x595be4]
(EE) 1: /usr/bin/X (0x400000+0x4fb44) [0x44fb44]
(EE) 2: /usr/bin/X (xf86PostButtonEvent+0xdd) [0x48adcd]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f66e7e4f000+0x63b8) [0x7f66e7e553b8]
(EE) 4: /usr/bin/X (0x400000+0x7a2a7) [0x47a2a7]
(EE) 5: /usr/bin/X (0x400000+0xa5187) [0x4a5187]
(EE) 6: /lib64/libpthread.so.0 (0x3806a00000+0x10bf0) [0x3806a10bf0]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x3805ee3327]
(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x31878040e8]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x19d10) [0x7f66e8c52d10]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1b537) [0x7f66e8c54537]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1c134) [0x7f66e8c55134]
(EE) 12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1cbb9) [0x7f66e8c55bb9]
(EE) 13: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1f2e4) [0x7f66e8c582e4]
(EE) 14: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x218ab) [0x7f66e8c5a8ab]
(EE) 15: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x3b298) [0x7f66e8c74298]
(EE) 16: /usr/bin/X (0x400000+0x12063c) [0x52063c]
(EE) 17: /usr/bin/X (0x400000+0x37bbe) [0x437bbe]
(EE) 18: /usr/bin/X (0x400000+0x3af91) [0x43af91]
(EE) 19: /usr/bin/X (0x400000+0x29b54) [0x429b54]
(EE) 20: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3805e2460d]
(EE) 21: /usr/bin/X (0x400000+0x29e9d) [0x429e9d]
(EE)
[ 33422.706] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 33422.706] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[ 33422.706] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33422.706] [mi] EQ processing has resumed after 473 dropped events.
[ 33422.706] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
|
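The useful lines are buried under the backtrace. A sketch that keeps only the intel(0) errors and the EQ overflow notices from an Xorg log, so they can be pasted into a bug report next to i915_error_state and the full dmesg, which is what the driver message asks for. The default path /var/log/Xorg.0.log is the usual location but an assumption here, and the marker list and function names are hypothetical.

```python
# Substrings that mark the lines of interest in the Xorg log above.
MARKERS = ("(EE) intel(", "EQ overflow", "Detected a hung GPU")


def interesting_xorg_lines(lines, markers=MARKERS):
    """Keep only log lines containing one of the given markers, without newlines."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in markers)]


def read_xorg_log(path="/var/log/Xorg.0.log"):
    """Filter an Xorg log file; the default path is an assumption."""
    with open(path, errors="replace") as f:
        return interesting_xorg_lines(f)
```

Matching on plain substrings rather than parsing the log format keeps it tolerant of both the timestamped and the untimestamped lines seen above.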