Gentoo Forums
i915: GPU hung, declared wedged.... tips?
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Fri Feb 01, 2013 12:05 am    Post subject: i915: GPU hung, declared wedged.... tips?

I have been running Gentoo on a Lenovo ThinkPad X201i with full satisfaction. Recently, however, I have been experiencing some issues:

Problem: when working with Gnome (v3.6.x), at some point the interface freezes. I can move the mouse, and the cursor will move over the screen, but for instance the clock is frozen. After several seconds, the active window is blacked out. Then, I get the black screen that says "Sorry, something has gone wrong and the system cannot recover. Call a system administrator".

The keyboard is responsive. I use CTRL-ALT-F1 to get to a tty, log in as root, and restart XDM. Then, X will restart, but before the GDM login screen appears, I get the same error message: "Sorry, something has gone wrong and the system cannot recover. Call a system administrator"

After trying several things, I have a feeling that it is an issue with the Intel i915 driver. Here is a snippet from /var/log/messages:

Code:

Jan 31 16:04:17 rine50 kernel: [ 5887.075138] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:17 rine50 kernel: [ 5887.075145] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jan 31 16:04:19 rine50 kernel: [ 5888.691015] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:19 rine50 kernel: [ 5888.691326] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan 31 16:04:19 rine50 kernel: [ 5888.691334] [drm:i915_reset] *ERROR* Failed to reset chip.


Searching Google, I found reports of this error, but all of them involved Linux kernel 2.6.3x; I use 3.7.4. A more recent message pointed to a buggy combination of BIOS and hardware on a particular type of Intel motherboard, but in my case the laptop was fine until now. Am I looking at broken hardware? If so, how can I find out? Or can I somehow switch off DRM and use the "old-style" Gnome instead of the "new-style" Gnome? I have also used my laptop with Xsession (twm). In that case the error does not occur, but I have not used the system long enough with twm to draw a definitive conclusion.

Any tips are welcome. Is there a way to "stress test" the GPU?
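
For anyone who wants to pull these messages out of a long /var/log/messages or dmesg dump, here is a minimal Python sketch. It only assumes the line layout shown in the snippet above (an uptime in square brackets, then a `[drm:function]` tag and `*ERROR*`). The function names `find_drm_errors` and `is_wedged` are made up for this example and are not part of any tool.

```python
import re

# Matches lines shaped like the excerpt above, e.g.
# "[ 5887.075138] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
LINE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+\*ERROR\*\s+(?P<msg>.*)"
)


def find_drm_errors(lines):
    """Return (uptime_seconds, function, message) for each DRM *ERROR* line."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            events.append(
                (
                    float(match.group("uptime")),
                    match.group("func"),
                    match.group("msg").strip(),
                )
            )
    return events


def is_wedged(events):
    """True if any event says the driver declared the GPU wedged."""
    return any("declaring wedged" in msg for _, _, msg in events)
```

Feeding it the five lines quoted above should yield four events (the "capturing error event" line has no `*ERROR*` tag and is skipped), and the uptimes make it easy to see how closely the two hangs followed each other.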
BillWho
Veteran


Joined: 03 Mar 2012
Posts: 1600
Location: US

Posted: Fri Feb 01, 2013 1:15 am    Post subject:

WvR,

Did you check /sys/kernel/debug/dri/0/i915_error_state? Maybe there are some better clues there.
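
Since a reboot may wipe the captured state, it can help to save a copy right after the hang. Below is a minimal Python sketch that does this. The path is the one mentioned above; it assumes debugfs is mounted there and the file is readable (typically root only). The helper name `save_error_state` and the output file naming are invented for this example.

```python
import shutil
import time
from pathlib import Path

# Path from the post above; assumes debugfs is mounted at /sys/kernel/debug.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir=".", stamp=None):
    """Copy the error state into dest_dir.

    Returns the path of the copy, or None if src does not exist.
    """
    src = Path(src)
    if not src.is_file():
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    # Stream the content instead of relying on the reported file size.
    with src.open("rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling `save_error_state(dest_dir="/root")` from a tty after the hang would leave a timestamped copy that can later be attached to a bug report.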
_________________
Good luck :wink:

Since installing gentoo, my life has become one long emerge :)
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Fri Feb 01, 2013 6:41 am    Post subject:

It happened again.... This time I copied the i915_error_state. It does not give much help. It is a very long list of register contents in hexadecimal form.

Just today the intel driver (xf86-video-intel) was updated, but apparently that does not help.....
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Fri Feb 01, 2013 7:35 am    Post subject:

I found this thread

http://www.gossamer-threads.com/lists/linux/kernel/1617936

It seems that I am not the only one. I guess I will downgrade to 3.6.11 on the laptop (there is no real reason to use the ~amd64 kernel anyway)
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Sat Feb 02, 2013 11:24 pm    Post subject:

Downgrading the kernel to 3.6.11 did not help. Yesterday evening I had two "crashes" in 10 minutes. The most irritating part is that you have to restart the computer to recover. Simply restarting X does not help because somehow the GPU cannot be "reset". Next try: downgrade the intel driver from 2.20.19-r1 to 2.20.13. Wish me luck...
BillWho
Veteran


Joined: 03 Mar 2012
Posts: 1600
Location: US

Posted: Sun Feb 03, 2013 12:59 am    Post subject:

WvR,

Did you add or change any settings in /etc/X11/xorg.conf.d/20-intel.conf?

Is DRM_I915 built-in or a module?

Have a look at x11-apps/intel-gpu-tools. Maybe some tests can provide a clue.
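
To answer the built-in versus module question without opening menuconfig, one option is to look for `CONFIG_DRM_I915` in the kernel's .config. The Python sketch below classifies it from the config text. The function name `drm_i915_mode` is invented for this example, and which config file matches the running kernel is something you have to check yourself.

```python
def drm_i915_mode(config_text):
    """Classify CONFIG_DRM_I915 from kernel .config text.

    Returns 'built-in' for =y, 'module' for =m, and 'disabled' otherwise
    (including the usual '# CONFIG_DRM_I915 is not set' comment line).
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_DRM_I915=y":
            return "built-in"
        if line == "CONFIG_DRM_I915=m":
            return "module"
    return "disabled"
```

For example, `drm_i915_mode(open("/usr/src/linux/.config").read())` would work if that symlink points at the sources of the kernel you are actually running, which is an assumption here.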
_________________
Good luck :wink:

Since installing gentoo, my life has become one long emerge :)
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Mon Feb 04, 2013 9:28 am    Post subject:

No changes to anything. These problems seem to have started without a clearly identifiable cause. That is one of the reasons why I suspect hardware problems.

I downgraded xf86-video-intel to the stable version. Let's see if this brings any improvement.
toralf
Advocate


Joined: 01 Feb 2004
Posts: 2671
Location: Hamburg/Germany

Posted: Mon Feb 04, 2013 10:56 am    Post subject:

WvR wrote:
It happened again.... This time I copied the i915_error_state. It does not give much help.
Well, that content is not intended to be readable by a common user. Just file a bug here https://bugzilla.kernel.org and attach the content of that file.
WvR
Tux's lil' helper


Joined: 03 Mar 2011
Posts: 139
Location: Tsuruga, Japan

Posted: Thu Feb 07, 2013 5:40 am    Post subject: [solved] i915: GPU hung, declared wedged.... tips?

Since I downgraded to x11-drivers/xf86-video-intel v2.20.13 the problem has not returned, so I am declaring it "solved" for the time being.
thens
n00b


Joined: 07 Apr 2012
Posts: 12

Posted: Thu Jun 06, 2013 7:46 pm    Post subject:

Just recently I had this problem as well (while I was watching youtube in fullscreen mode, chromium, 3.8.13-gentoo) => X crashed.

Code:
Jun  6 21:31:15 think kernel: [108626.012334] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun  6 21:31:15 think kernel: [108626.012343] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jun  6 21:31:17 think kernel: [108628.011221] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun  6 21:31:17 think kernel: [108628.011472] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jun  6 21:31:17 think kernel: [108628.011475] [drm:i915_reset] *ERROR* Failed to reset chip.


I'm currently trying to "reproduce" this problem but have been unsuccessful so far :-(
If anyone has an idea, please let me know.
mhex
Tux's lil' helper


Joined: 18 Feb 2005
Posts: 98
Location: Germany/Berlin

Posted: Sun Jun 09, 2013 3:26 pm    Post subject:

Today I experienced that too:

Code:

[236610.909321] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[236610.909328] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring


It happened while watching a downloaded MP4 video:

http://www.dlr.de/dlr/desktopdefault.aspx/tabid-10081/151_read-7278//year-all/#gallery/11092

mplayer, VLC, and avidemux all show only a black screen.

x11-drivers/xf86-video-intel 2.20.13

Linux xx 3.8.13-gentoo #1 SMP Thu Jun 6 08:10:20 CEST 2013 x86_64 Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz GenuineIntel GNU/Linux

gcc version 4.6.3 (Gentoo 4.6.3 p1.11, pie-0.5.2)
mhex
Tux's lil' helper


Joined: 18 Feb 2005
Posts: 98
Location: Germany/Berlin

Posted: Mon Jun 10, 2013 5:58 am    Post subject:

Today in dmesg:

Code:

[20190.412669] hda-intel 0000:00:1b.0: Unstable LPIB (65408 >= 8192); disabling LPIB delay counting
mhex
Tux's lil' helper


Joined: 18 Feb 2005
Posts: 98
Location: Germany/Berlin

Posted: Mon Jun 10, 2013 7:08 am    Post subject:

more info from Xorg.log:

Code:

(EE) [mi] EQ overflow continuing.  400 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x595be4]
(EE) 1: /usr/bin/X (0x400000+0x4fb44) [0x44fb44]
(EE) 2: /usr/bin/X (xf86PostButtonEvent+0xdd) [0x48adcd]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f66e7e4f000+0x63b8) [0x7f66e7e553b8]
(EE) 4: /usr/bin/X (0x400000+0x7a2a7) [0x47a2a7]
(EE) 5: /usr/bin/X (0x400000+0xa5187) [0x4a5187]
(EE) 6: /lib64/libpthread.so.0 (0x3806a00000+0x10bf0) [0x3806a10bf0]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x3805ee3327]
(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x31878040e8]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x19d10) [0x7f66e8c52d10]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1b537) [0x7f66e8c54537]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1c134) [0x7f66e8c55134]
(EE) 12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1cbb9) [0x7f66e8c55bb9]
(EE) 13: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1f2e4) [0x7f66e8c582e4]
(EE) 14: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x218ab) [0x7f66e8c5a8ab]
(EE) 15: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x3b298) [0x7f66e8c74298]
(EE) 16: /usr/bin/X (0x400000+0x12063c) [0x52063c]
(EE) 17: /usr/bin/X (0x400000+0x37bbe) [0x437bbe]
(EE) 18: /usr/bin/X (0x400000+0x3af91) [0x43af91]
(EE) 19: /usr/bin/X (0x400000+0x29b54) [0x429b54]
(EE) 20: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3805e2460d]
(EE) 21: /usr/bin/X (0x400000+0x29e9d) [0x429e9d]
(EE)
[ 33422.706] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 33422.706] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[ 33422.706] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33422.706] [mi] EQ processing has resumed after 473 dropped events.
[ 33422.706] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
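
A backtrace like this is easier to read once the frames are grouped by the binary or library they belong to. The Python sketch below does that for lines in the `(EE) N: /path (symbol+offset) [address]` layout shown above. The name `frames_per_module` is invented for this example, and the regular expression only assumes that layout.

```python
import re
from collections import Counter

# Matches numbered frames such as
# "(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x31878040e8]"
FRAME_RE = re.compile(r"\(EE\)\s+(\d+):\s+(\S+)\s+\(")


def frames_per_module(log_lines):
    """Count backtrace frames per binary or shared object."""
    counts = Counter()
    for line in log_lines:
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Applied to the backtrace above, it should report 10 frames in /usr/bin/X and 7 in intel_drv.so, with the ioctl into libdrm in between. That fits the later "Detected a hung GPU" message: the server was waiting inside the intel driver while input events piled up.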