pietinger Moderator
Joined: 17 Oct 2006 Posts: 4396 Location: Bavaria
Posted: Thu May 18, 2023 10:35 am Post subject: i9-13900k GPU hang with stable mesa |
|
|
This is a new installation (OpenRC, Plasma): all stable packages, using the default modesetting driver for X11. I am using the integrated GPU of my CPU (i915). Mesa is also stable: 23.0.3-r1.
I had reproducible GPU hangs when watching a 4K YouTube video ( https://www.youtube.com/watch?v=aujOb50T8Pc ) in my Falkon browser, after a few seconds or at the latest after one minute:
Code: | May 18 10:56:44 sun kernel: Asynchronous wait on fence 0000:00:02.0:kwin_x11[2726]:36d6 timed out (hint:0xffffffff970495e0)
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [4644]
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
May 18 10:56:48 sun last message buffered 1 times
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] Chrome_InProcGp[4644] context reset due to GPU hang
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] falkon[4597] context reset due to GPU hang
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC submission enabled
May 18 10:56:48 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled |
After switching to unstable mesa 23.1.0 the problem was gone! (This was the only change, so it is mesa.)
Edit: I had another GPU hang with the newest mesa as well, but far less often than before (now it takes 20 or 30 minutes until it happens). |
|
|
pietinger Moderator
Posted: Sun Jul 16, 2023 8:55 pm Post subject: |
|
|
In my first report I forgot to mention the kernel version: it was 6.1.? Now I am on 6.1.38 (stable).
mesa-23.1.3 has been stable since yesterday, and I had another GPU hang (with the latest stable Falkon and QtWebEngine).
So I decided to try disabling GuC submission with these kernel command line parameters:
Quote: | i915.enable_guc=2 i915.enable_psr=0 |
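For reference: the i915 module parameter description treats enable_guc as a bitmask (-1 = auto/default, 0 = disable, 1 = GuC submission, 2 = HuC load). So enable_guc=2 keeps HuC loading but turns off GuC submission; the GuC firmware itself is still loaded, as the After log shows. enable_psr=0 disables Panel Self Refresh. A minimal sketch that decodes those bits from a command line string, assuming only the bit meanings above (the helper names are hypothetical, not part of any kernel tool):

```python
# Minimal sketch (hypothetical helpers): decode the i915.enable_guc bitmask
# from a kernel command line. Bit meanings follow the i915 module parameter
# description: -1 = auto, 0 = disable, 1 = GuC submission, 2 = HuC load.
GUC_SUBMISSION = 1 << 0  # bit 0
HUC_LOAD = 1 << 1        # bit 1


def i915_params(cmdline: str) -> dict:
    """Return the i915.* parameters of a command line as {name: value string}."""
    params = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            key, _, value = token.partition("=")
            params[key[len("i915."):]] = value
    return params


def decode_enable_guc(value: int) -> dict:
    """Map an enable_guc value to the features it requests."""
    if value < 0:
        # -1: let the driver choose the platform default
        return {"auto": True, "guc_submission": None, "huc_load": None}
    return {
        "auto": False,
        "guc_submission": bool(value & GUC_SUBMISSION),
        "huc_load": bool(value & HUC_LOAD),
    }


if __name__ == "__main__":
    params = i915_params("i915.enable_guc=2 i915.enable_psr=0")
    print(params)  # {'enable_guc': '2', 'enable_psr': '0'}
    print(decode_enable_guc(int(params["enable_guc"])))
    # {'auto': False, 'guc_submission': False, 'huc_load': True}
```

On a running system the string could be read from /proc/cmdline instead of being hard-coded.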
Before:
Code: | Jul 16 16:51:58 sun kernel: Loading firmware: i915/adls_dmc_ver2_01.bin
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
Jul 16 16:51:58 sun kernel: Loading firmware: i915/tgl_guc_70.bin
Jul 16 16:51:58 sun kernel: Loading firmware: i915/tgl_huc.bin
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
Jul 16 16:51:58 sun kernel: i915 0000:00:02.0: [drm] GuC RC: enabled
Jul 16 16:51:58 sun kernel: [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0 |
After:
Code: | Jul 16 21:40:58 sun kernel: Loading firmware: i915/adls_dmc_ver2_01.bin
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
Jul 16 21:40:58 sun kernel: Loading firmware: i915/tgl_guc_70.bin
Jul 16 21:40:58 sun kernel: Loading firmware: i915/tgl_huc.bin
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC submission disabled
Jul 16 21:40:58 sun kernel: i915 0000:00:02.0: [drm] GuC SLPC disabled
Jul 16 21:40:58 sun kernel: [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0 |
I have now tried all the "bad" YT videos (in fullscreen): NO GPU hang. Knocking on wood, and waiting for a new i915 module in newer kernels. |
|
|
pietinger Moderator
Posted: Fri Feb 23, 2024 7:36 pm Post subject: |
|
|
With yesterday's new linux-firmware package I got a new tgl_guc_70.bin (previously it was 70.13.1):
Code: | [ 8.868020] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0 |
So I tried again removing both i915 parameters (which taint the kernel). Kernel is 6.7.6 with stable mesa 23.3.5.
Guess what? Now it crashes immediately when starting a (fullscreen) 4K YT video in my Falkon browser, but the error message is better:
Code: | [ 845.555841] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
[ 845.599984] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Chrome_InProcGp [3310]
[ 845.599986] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 845.599987] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[ 845.599987] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[ 845.599987] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 845.599987] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 845.599988] GPU crash dump saved to /sys/class/drm/card0/error
[ 845.600283] i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
[ 845.702436] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 845.703158] i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 845.703276] i915 0000:00:02.0: [drm] Chrome_InProcGp[3310] context reset due to GPU hang
[ 845.703349] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[ 845.703354] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[ 845.706392] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[ 845.707062] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[ 845.707065] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled |
But /sys/class/drm/card0/error is empty.
Okay, back to both parameters, and waiting for 6.8 (because of the new Xe driver).
_________________ https://wiki.gentoo.org/wiki/User:Pietinger
|
|
|