Gentoo Forums
amdgpu gpu reset - freeze/crash
ahhzee
n00b


Joined: 16 Jul 2021
Posts: 8

Posted: Fri Nov 26, 2021 1:08 am    Post subject: amdgpu gpu reset - freeze/crash

When playing games I've been having a consistent issue where my GPU randomly resets itself. This happens in a few games, with both native and non-native builds (the latter played through Steam Proton).
It results in a frozen/black screen with randomly colored pixels dotting a few places, very buggy.
I have no idea what causes it, and it seems to be semi-common and still present for others (https://bugzilla.kernel.org/show_bug.cgi?id=201957).

Below is from my /var/log/messages during/after the crash while playing Risk of Rain 2.
Code:

Nov 25 18:20:15 zmaj kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 25 18:20:15 zmaj kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 25 18:20:15 zmaj kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1348774, emitted seq=1348776
Nov 25 18:20:15 zmaj kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Risk of Rain 2. pid 16876 thread dxvk-submit pid 16923
Nov 25 18:20:15 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset begin!
Nov 25 18:20:17 zmaj kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:956
Nov 25 18:20:17 zmaj kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:956
Nov 25 18:20:17 zmaj kernel: amdgpu 0000:28:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 25 18:20:17 zmaj kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 25 18:20:17 zmaj kernel: amdgpu 0000:28:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 25 18:20:17 zmaj kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Nov 25 18:20:18 zmaj kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 25 18:20:18 zmaj kernel: [drm] free PSP TMR buffer
Nov 25 18:20:18 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: BACO reset
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 25 18:20:20 zmaj kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Nov 25 18:20:20 zmaj kernel: [drm] VRAM is lost due to GPU reset!
Nov 25 18:20:20 zmaj kernel: [drm] PSP is resuming...
Nov 25 18:20:20 zmaj kernel: [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: SMU is resuming...
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a4000 (42.64.0)
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: SMU driver if version not matched
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: SMU is resumed successfully!
Nov 25 18:20:20 zmaj kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 25 18:20:20 zmaj kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Nov 25 18:20:20 zmaj kernel: [drm] JPEG decode initialized successfully.
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: recover vram bo from shadow start
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: recover vram bo from shadow done
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset(2) succeeded!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!

<repeat of above for nearly 200 lines>

Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm] Skip scheduling IBs!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:20 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:20:35 zmaj kernel: amdgpu_cs_ioctl: 6 callbacks suppressed
Nov 25 18:20:35 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:21:03 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 25 18:21:03 zmaj kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

<repeat until I use my keybind to reboot>


My WM is still responsive, and if I reboot or exit the WM I am able to startx again without trouble (sans some audio glitches). The screen won't update until X is restarted.

If there is a known cause or fix to this I'd be very happy to know, even if it requires patching or old firmware versions.
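
For anyone comparing their own logs against the one above, here is a minimal sketch that filters a syslog file down to the lines marking the hang and the reset. The function name `gpu_reset_events` is made up for illustration. The match strings are copied from the log in this post and may be worded differently on other kernel versions.

```shell
# Sketch: print only the lines that mark a GPU job timeout and the reset
# that follows. gpu_reset_events is a hypothetical helper name.
# The default path is the log file named in the post; pass another file as $1.
gpu_reset_events() {
    log="${1:-/var/log/messages}"
    # Patterns are taken verbatim from the log above (assumption: your
    # kernel uses the same message wording).
    grep -E 'amdgpu_job_timedout|GPU reset|VRAM is lost' "$log"
}

# Usage (reading /var/log/messages usually needs root):
#   gpu_reset_events
#   gpu_reset_events /path/to/saved.log
```

The `amdgpu_job_timedout` lines are the useful ones for a bug report, since they name the ring that hung and the process that submitted the work.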
alamahant
Advocate


Joined: 23 Mar 2019
Posts: 3879

Posted: Fri Nov 26, 2021 10:53 am

Please try with
Code:

<=sys-kernel/linux-firmware-20210315

Please see this:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866
Are you using KDE?
Some people mention also that sddm might be to blame.
_________________
:)
ahhzee
n00b


Joined: 16 Jul 2021
Posts: 8

Posted: Fri Nov 26, 2021 3:47 pm

Thank you for the reply.
I tried with version 20210208 and it still didn't work. Same crash/errors as before.

I use neither KDE nor SDDM, but rather dwm and startx.

The kernel parameters I have are
Code:
root=/dev/sda2 ro amdgpu.noretry=0


Manually setting '/sys/class/drm/card0/device/power_dpm_force_performance_level' to high, as suggested on the Manjaro forums, doesn't fix the issue.
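
For reference, a minimal sketch of how that sysfs setting can be applied. The helper name `set_perf_level` is made up for illustration. The path is the one quoted above, and `card0` is an assumption that may not hold on systems with more than one GPU. The write needs root and does not persist across reboots.

```shell
# Sketch: force the amdgpu DPM performance level through sysfs.
# set_perf_level is a hypothetical helper; the node path is the one
# quoted in the post, and card0 is an assumption.
set_perf_level() {
    level="$1"
    node="${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"
    if [ ! -w "$node" ]; then
        echo "cannot write $node (not root, or no such card?)" >&2
        return 1
    fi
    printf '%s\n' "$level" > "$node"
    # Read the value back so the caller can see what the driver accepted.
    cat "$node"
}

# Usage (as root):
#   set_perf_level high
#   set_perf_level auto    # hand control back to the driver
```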
All times are GMT