Gentoo Forums
i9-13900k GPU hang with stable mesa
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 4396
Location: Bavaria

Posted: Thu May 18, 2023 10:35 am    Post subject: i9-13900k GPU hang with stable mesa

This is a new installation (OpenRC, Plasma), all stable, using the default modesetting driver for X11. I am using the integrated GPU of my CPU (i915) with stable mesa 23.0.3-r1.

I had reproducible GPU hangs when watching a 4K YouTube video ( https://www.youtube.com/watch?v=aujOb50T8Pc ) in my falkon browser, after a few seconds or at the latest after one minute:
Code:
May 18 10:56:44 sun kernel: Asynchronous wait on fence 0000:00:02.0:kwin_x11[2726]:36d6 timed out (hint:0xffffffff970495e0)
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [4644]
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
May 18 10:56:48 sun last message buffered 1 times
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] Chrome_InProcGp[4644] context reset due to GPU hang
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] falkon[4597] context reset due to GPU hang
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC submission enabled
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
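
For anyone collecting such hangs over time, the relevant fields can be pulled out of the log with a small script. This is only a sketch: the function name `find_gpu_hangs` is made up for illustration, and the regular expression is derived solely from the "GPU HANG" line format shown in the log above, so other kernel versions may print it differently.

```python
import re

# Pattern derived from the log line above, e.g.
# "... [drm] GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [4644]"
# Assumption: other kernels may format this line differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-F:]+), in (?P<comm>.+?) \[(?P<pid>\d+)\]"
)

def find_gpu_hangs(log_text):
    """Return a list of (ecode, process name, pid) for each GPU HANG line."""
    hangs = []
    for line in log_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append((m.group("ecode"), m.group("comm"), int(m.group("pid"))))
    return hangs

sample = (
    "May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] "
    "GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [4644]"
)
print(find_gpu_hangs(sample))
```

Feeding it the output of `dmesg` or the syslog file gives one tuple per hang, which makes it easier to see whether the same ecode and process keep showing up.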


After switching to unstable mesa 23.1.0 the problem is gone! (This was the only change, so it is mesa.)


Edit: I got another GPU hang even with the newest mesa ... :cry: ... but far less often than before (now it takes 20 or 30 minutes until it happens) ... :?
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 4396
Location: Bavaria

Posted: Sun Jul 16, 2023 8:55 pm

In my first report I forgot to mention the kernel version. It was 6.1.? ... Now I am on 6.1.38 (stable)

mesa-23.1.3 has been stable since yesterday ... and I had a GPU hang again (with the latest stable falkon and QtWebEngine) ... :evil:

So I decided to try disabling GuC submission (and PSR) with kernel command line parameters:
Quote:
i915.enable_guc=2 i915.enable_psr=0

Before:
Code:
Jul 16 16:51:58 sun kernel: Loading firmware: i915/adls_dmc_ver2_01.bin
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
Jul 16 16:51:58 sun kernel: Loading firmware: i915/tgl_guc_70.bin
Jul 16 16:51:58 sun kernel: Loading firmware: i915/tgl_huc.bin
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC RC: enabled
Jul 16 16:51:58 sun kernel: [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0

After:
Code:
Jul 16 21:40:58 sun kernel: Loading firmware: i915/adls_dmc_ver2_01.bin
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
Jul 16 21:40:58 sun kernel: Loading firmware: i915/tgl_guc_70.bin
Jul 16 21:40:58 sun kernel: Loading firmware: i915/tgl_huc.bin
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC submission disabled
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC disabled
Jul 16 21:40:58 sun kernel: [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
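
The difference between the two logs fits the way `i915.enable_guc` is documented as a bitmask: -1 means auto, 0 disables everything, bit 0 (value 1) enables GuC submission and bit 1 (value 2) enables HuC loading. A value of 2 therefore keeps the HuC authenticated but turns GuC submission off, which is what the "After" log shows. A minimal sketch of that decoding follows; the helper name `decode_enable_guc` is hypothetical, and the bit meanings are taken from the module parameter description, which may change between kernel versions.

```python
def decode_enable_guc(value):
    """Decode an i915.enable_guc value into its two feature flags.

    Assumed bit layout (from the i915 module parameter description):
    -1 = auto, 0 = disabled, bit 0 = GuC submission, bit 1 = HuC load.
    """
    if value < 0:
        # Auto: the driver picks the platform default, so the flags are unknown here.
        return {"auto": True, "guc_submission": None, "huc_load": None}
    return {
        "auto": False,
        "guc_submission": bool(value & 1),
        "huc_load": bool(value & 2),
    }

# The value used in this thread:
print(decode_enable_guc(2))
```

On a running system the active value can usually be read from /sys/module/i915/parameters/enable_guc and passed to the helper as an integer.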

I have now tried all the "bad" YT videos (in fullscreen) ... NO GPU hang ... knocking on wood ... and waiting for a new i915 module in newer kernels ... :roll:
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 4396
Location: Bavaria

Posted: Fri Feb 23, 2024 7:36 pm

With yesterday's new linux-firmware package I got a new tgl_guc_70.bin (before it was 70.13.1):
Code:
[    8.868020] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0

So I tried again to remove both i915 parameters (which taint the kernel) ... Kernel is 6.7.6 with stable mesa 23.3.5

Guess what? Now it crashes immediately when starting a (fullscreen) 4K YT video in my falkon browser ... :lol: ... but the error message is better:
Code:
[  845.555841] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[  845.599984] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [3310]
[  845.599986] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  845.599987] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[  845.599987] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[  845.599987] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  845.599987] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[  845.599988] GPU crash dump saved to /sys/class/drm/card0/error
[  845.600283] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[  845.702436] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[  845.703158] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[  845.703276] i915 0000:00:02.0: [drm] Chrome_InProcGp[3310] context reset due to GPU hang
[  845.703349] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[  845.703354] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[  845.706392] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[  845.707062] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[  845.707065] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled

.... only ... /sys/class/drm/card0/error is empty .... :lol:
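
Since the dump can turn out to be empty, it may be worth checking it right after a hang and before filing a report. The following is only a sketch: the helper name `read_crash_dump` is invented for illustration, and the default path is simply the one printed in the kernel message above (the card number may differ on other systems, and reading the file may require root).

```python
from pathlib import Path

def read_crash_dump(path="/sys/class/drm/card0/error"):
    """Return the crash dump text, or None if the file is missing, unreadable or empty.

    The default path is the one named in the kernel log above; adjust the
    card number if your GPU is not card0.
    """
    try:
        text = Path(path).read_text(errors="replace")
    except OSError:
        # File does not exist or cannot be read (e.g. insufficient permissions).
        return None
    # Treat a file containing only whitespace as empty.
    return text if text.strip() else None

dump = read_crash_dump()
if dump is None:
    print("no crash dump available")
else:
    print(f"crash dump has {len(dump)} characters")
```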

Okay, back to both parameters and waiting for 6.8 (because of the new Xe driver) :evil:
_________________
https://wiki.gentoo.org/wiki/User:Pietinger