MorgothSauron n00b


Joined: 24 Sep 2020 Posts: 69
|
Posted: Mon Nov 13, 2023 6:38 pm Post subject: AMD GPU RX 6800 Random display freeze |
|
|
Hello,
Since early September I've been experiencing random display freezes with my AMD GPU. I say it is related to the GPU because /var/log/messages contains kernel messages related to amdgpu. It has happened at least 5 times since I started troubleshooting. It might be pure coincidence, but the issue started around the time I started using kernel 6.5.
I was able to identify a pattern for this issue, but I'm not able to trigger the problem on purpose. I have to wait for it to happen before I can collect any data for troubleshooting.
The freeze follows this pattern:
- Firefox (~amd64) is playing a YouTube video
- The video freezes like it is buffering but the audio is still working
- The display is not refreshing anymore. I can't Alt+Tab and the mouse cursor is not moving.
- I cannot switch to a different console (e.g. Ctrl+Alt+F1)
- The audio stops after about 5 minutes and the screen goes black with a non-blinking cursor at the top left. No text at all.
At this point I have no other option than a power reset.
I was not sure whether the system was completely frozen, so I enabled SSH to give myself a chance to attempt a recovery (e.g. a clean reboot).
I was able to connect over SSH the next time the issue happened, so the system was still working to some extent. I tried a reboot but it didn't work: my SSH session terminated, yet my PC was still responding to ping 5 minutes later. I had no way to know what was happening and had to force a power reset. I know the ping responses were not coming from a system that was booting, because I have LUKS enabled and I have to enter a passphrase.
It never happened while playing a game on Linux. I do get a driver timeout from time to time when I start a specific game on Windows, but this could be a problem with the game itself and not the GPU.
I tried to search on different forums and I couldn't find much information using some keywords from the log.
I did find this https://bugzilla.kernel.org/show_bug.cgi?id=201957 but it didn't help. With kernel 6.5 the default for amdgpu.mcbp is indeed -1 compared to 6.4 where the default is 0. I tried to set the value to 0 but I still encountered the same issue. I know this post is for a different issue, but I decided to give it a try anyway.
I created /etc/modprobe/amdgpu.conf to configure mcbp=0
Code: | #
options amdgpu mcbp=0
# |
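A quick way to confirm that the option actually took effect after a reboot is to read the live value back from sysfs. This is only a minimal sketch: it assumes the parameter is exposed at /sys/module/amdgpu/parameters/mcbp, and the helper name `read_module_param` is made up for illustration. modprobe normally reads its options from files under /etc/modprobe.d/, so if the value read back is still -1 the file location is worth double-checking.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the live value of a kernel module parameter, or None if absent."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, or the parameter is not exported via sysfs.
        return None


if __name__ == "__main__":
    value = read_module_param("amdgpu", "mcbp")
    if value is None:
        print("amdgpu.mcbp is not exposed (module not loaded?)")
    else:
        print(f"amdgpu.mcbp = {value}")
```

If amdgpu is loaded from an initramfs, the options file has to be included in the initramfs as well. If the driver is built into the kernel, the setting would instead go on the kernel command line as amdgpu.mcbp=0.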
I searched the AMD GPU GitLab (https://gitlab.freedesktop.org/drm/amd/) without luck. I'm checking here before opening an issue there.
The PC itself is located in a well-ventilated space. I clean the inside of the case with a dust blower every month, and I take care not to let any fan spin while I use the dust blower. I checked that the GPU is properly seated in the PCIe slot. The GPU fans are working and speed up under load. I didn't notice any temperature issues using nvtop. This is a brand new GPU purchased from a reputable store in April 2023.
No overclocking (CPU and GPU).
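Since nvtop only shows the temperature while someone is watching, it may help to log it so the last reading before a freeze survives. A rough sketch, assuming the amdgpu driver exposes its sensors through the standard hwmon sysfs interface (a directory under /sys/class/hwmon whose name file contains "amdgpu") and that there is a single AMD GPU; the function name `read_amdgpu_temps` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def read_amdgpu_temps(hwmon_root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {label: degrees Celsius} for every amdgpu temperature sensor."""
    temps: Dict[str, float] = {}
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        try:
            if (hwmon / "name").read_text().strip() != "amdgpu":
                continue
        except OSError:
            continue
        for sensor in sorted(hwmon.glob("temp*_input")):
            label_file = hwmon / sensor.name.replace("_input", "_label")
            try:
                label = label_file.read_text().strip()
            except OSError:
                label = sensor.name  # fall back to the file name
            try:
                # hwmon reports temperatures in millidegrees Celsius
                temps[label] = int(sensor.read_text()) / 1000.0
            except (OSError, ValueError):
                continue
    return temps


if __name__ == "__main__":
    import time
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(stamp, read_amdgpu_temps())
```

Run from a cron job or a small loop with the output appended to a file, this would give a temperature history to compare against the time of the next freeze.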
When the freeze happens, /var/log/messages contains the following message:
Code: | kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 4758 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8242 amdgpu_dm_atomic_commit_tail+0x3884/0x3930 [amdgpu]
kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables bpfilter bridge stp llc vfat fat joydev snd_hda_codec_realtek snd_hda_codec_generic amdgpu snd_sof_pci_intel_cnl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils wireguard snd_soc_skl libchacha20poly1305 snd_soc_sst_ipc chacha_x86_64 snd_soc_sst_dsp poly1305_x86_64 snd_hda_ext_core ip6_udp_tunnel snd_soc_acpi_intel_match udp_tunnel snd_soc_acpi ledtrig_audio ipv6 snd_soc_core snd_hda_codec_hdmi snd_compress snd_pcm_dmaengine ac97_bus crc_ccitt drm_suballoc_helper intel_rapl_msr amdxcp snd_hda_intel intel_rapl_common mfd_core x86_pkg_temp_thermal snd_intel_dspcfg drm_buddy curve25519_x86_64 intel_powerclamp gpu_sched libcurve25519_generic snd_hda_codec libchacha crct10dif_pclmul
kernel: drm_display_helper snd_hda_core ghash_clmulni_intel it87 cec snd_hwdep sha512_ssse3 drm_ttm_helper hwmon_vid snd_pcm ee1004 ttm rapl intel_cstate drm_kms_helper mei_hdcp snd_timer wmi_bmof intel_wmi_thunderbolt coretemp i2c_i801 intel_uncore pcspkr efi_pstore drm i2c_smbus snd mei_me hid_logitech_hidpp soundcore mei video backlight acpi_pad wmi intel_pch_thermal efivarfs dm_crypt trusted asn1_encoder dm_mod hid_logitech_dj sr_mod sd_mod cdrom crc32_pclmul xhci_pci crc32c_intel e1000e ahci xhci_hcd libahci
kernel: CPU: 2 PID: 4758 Comm: X Not tainted 6.5.11-gentoo-x86_64 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO/Z390 AORUS PRO-CF, BIOS F12 11/05/2021
kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3884/0x3930 [amdgpu]
kernel: Code: 40 fd ff ff 48 8d 95 94 fd ff ff 48 8b 85 50 fd ff ff 48 8b b6 50 01 00 00 48 8b b8 78 f4 03 00 e8 11 88 20 00 e9 87 f9 ff ff <0f> 0b e9 44 f0 ff ff 49 8b 4d 28 49 39 4b 28 0f 95 85 a0 fc ff ff
kernel: RSP: 0018:ffff986ec25ab8c8 EFLAGS: 00010002
kernel: RAX: 0000000000000286 RBX: 0000000000000286 RCX: 0000000000000019
kernel: RDX: 0000000000000001 RSI: 0000000000000297 RDI: 0000000000000002
kernel: RBP: ffff986ec25abc60 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffff8b1f40795118 R11: ffff986ec25ab82c R12: ffff8b1f40795000
kernel: R13: ffff8b1f07d80010 R14: ffff8b218a2c3400 R15: 0000000000000000
kernel: FS: 00007fea15738900(0000) GS:ffff8b269dc80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fc645976b6c CR3: 00000001068ca003 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: <TASK>
kernel: ? amdgpu_dm_atomic_commit_tail+0x3884/0x3930 [amdgpu]
kernel: ? __warn+0x7d/0x130
kernel: ? amdgpu_dm_atomic_commit_tail+0x3884/0x3930 [amdgpu]
kernel: ? report_bug+0x16d/0x1a0
kernel: ? handle_bug+0x3a/0x70
kernel: ? exc_invalid_op+0x13/0x60
kernel: ? asm_exc_invalid_op+0x16/0x20
kernel: ? amdgpu_dm_atomic_commit_tail+0x3884/0x3930 [amdgpu]
kernel: ? amdgpu_dm_atomic_commit_tail+0x28bc/0x3930 [amdgpu]
kernel: ? __wake_up_klogd.part.0+0x3c/0x60
kernel: ? vprintk_emit+0x17f/0x200
kernel: commit_tail+0x91/0x130 [drm_kms_helper]
kernel: drm_atomic_helper_commit+0x116/0x140 [drm_kms_helper]
kernel: drm_atomic_commit+0x93/0xc0 [drm]
kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
kernel: drm_mode_obj_set_property_ioctl+0x146/0x3a0 [drm]
kernel: ? __pfx_drm_mode_obj_set_property_ioctl+0x10/0x10 [drm]
kernel: drm_ioctl_kernel+0xbe/0x160 [drm]
kernel: drm_ioctl+0x258/0x4d0 [drm]
kernel: ? __pfx_drm_mode_obj_set_property_ioctl+0x10/0x10 [drm]
kernel: amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
kernel: __x64_sys_ioctl+0x90/0xd0
kernel: do_syscall_64+0x38/0x90
kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7fea15cbe3fb
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
kernel: RSP: 002b:00007fff02c67fd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fea15cbe3fb
kernel: RDX: 00007fff02c68060 RSI: 00000000c01864ba RDI: 000000000000000c
kernel: RBP: 00007fff02c68060 R08: 0000000000000093 R09: 0000000000001000
kernel: R10: 000000000ffaf041 R11: 0000000000000246 R12: 00000000c01864ba
kernel: R13: 000000000000000c R14: 000055c1aad58460 R15: 0000000000000fff
kernel: </TASK>
kernel: ---[ end trace 0000000000000000 ]--- |
That block repeats multiple times with little or no difference. It appeared 20 times the last time the issue happened. I can provide a full copy of the log if necessary.
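Counting the repeated blocks by hand gets tedious, so here is a small sketch that tallies the WARNING headers in a saved log copy. It assumes the header lines look like the one quoted above ("WARNING: CPU: ... PID: ... at file:line function+0x..."); the function name `count_warnings` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "WARNING: CPU: 2 PID: 4758 at path/file.c:8242 func+0x3884/0x3930 [amdgpu]"
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+?)\+0x")


def count_warnings(lines: Iterable[str]) -> Counter:
    """Count kernel WARNING headers, keyed by (source location, function)."""
    counts: Counter = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for (location, func), n in count_warnings(log).most_common():
            print(f"{n:4d}  {func}  {location}")
```

Grouping by source location also shows whether every repeat comes from the same line of amdgpu_dm.c or whether other warnings are mixed in.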
System details (inxi -F)
Code: | System:
Host: morgoth Kernel: 6.5.11-gentoo-x86_64 arch: x86_64 bits: 64
Desktop: KDE Plasma v: 5.27.8 Distro: Gentoo Base System release 2.14
Machine:
Type: Desktop System: Gigabyte product: Z390 AORUS PRO v: N/A
serial: <superuser required>
Mobo: Gigabyte model: Z390 AORUS PRO-CF serial: <superuser required>
UEFI: American Megatrends v: F12 date: 11/05/2021
CPU:
Info: 8-core model: Intel Core i7-9700K bits: 64 type: MCP cache: L2: 2 MiB
Speed (MHz): avg: 800 min/max: 800/4900 cores: 1: 800 2: 800 3: 800 4: 800
5: 800 6: 800 7: 800 8: 800
Graphics:
Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] driver: amdgpu
v: kernel
Display: x11 server: X.org v: 1.21.1.9 with: Xwayland v: 23.2.2 driver: X:
loaded: amdgpu unloaded: modesetting,radeon dri: radeonsi gpu: amdgpu
resolution: 2560x1440~144Hz
API: OpenGL v: 4.6 Mesa 23.1.8 renderer: AMD Radeon RX 6800 (navi21 LLVM
16.0.6 DRM 3.54 6.5.11-gentoo-x86_64)
Audio:
Device-1: Intel Cannon Lake PCH cAVS driver: snd_hda_intel
Device-2: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel
API: ALSA v: k6.5.11-gentoo-x86_64 status: kernel-api
Server-1: PulseAudio v: 16.1 status: active
Network:
Device-1: Intel Ethernet I219-V driver: e1000e
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: 18:c0:4d:2d:b3:7e
IF-ID-1: virbr0 state: down mac: 52:54:00:0a:95:c4
Drives:
Local Storage: total: 4.99 TiB used: 3.55 TiB (71.1%)
ID-1: /dev/nvme0n1 vendor: LDLC model: F8+M.2 480 size: 447.13 GiB
ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO Plus 1TB
size: 931.51 GiB
ID-3: /dev/sda vendor: Western Digital model: WD40EZRZ-22GXCB0
size: 3.64 TiB
Partition:
ID-1: / size: 844.04 GiB used: 467.86 GiB (55.4%) fs: btrfs dev: /dev/dm-0
ID-2: /boot size: 487.2 MiB used: 142.7 MiB (29.3%) fs: ext4
dev: /dev/nvme1n1p5
ID-3: /home size: 844.04 GiB used: 467.86 GiB (55.4%) fs: btrfs
dev: /dev/dm-0
ID-4: /var size: 844.04 GiB used: 467.86 GiB (55.4%) fs: btrfs
dev: /dev/dm-0
Swap:
ID-1: swap-1 type: file size: 7.98 GiB used: 0 KiB (0.0%)
file: /var/swapfile
Sensors:
System Temperatures: cpu: 30.0 C pch: 46.0 C mobo: N/A gpu: amdgpu
temp: 44.0 C
Fan Speeds (RPM): cpu: 811 fan-2: 0 fan-3: 0 gpu: amdgpu fan: 0
Info:
Processes: 359 Uptime: 21m Memory: available: 31.27 GiB
used: 5.54 GiB (17.7%) Shell: Zsh inxi: 3.3.27 |
System information (neofetch --off)
Code: |
OS: Gentoo Linux x86_64
Host: Z390 AORUS PRO
Kernel: 6.5.11-gentoo-x86_64
Uptime: 42 mins
Packages: 1408 (emerge)
Shell: zsh 5.9
Resolution: 2560x1440
DE: Plasma 5.27.8
WM: KWin
Theme: Breeze Light [Plasma], Breeze [GTK2/3]
Icons: [Plasma], breeze [GTK2/3]
Terminal: kitty
CPU: Intel i7-9700K (8) @ 4.900GHz
GPU: AMD ATI Radeon RX 6800/6800 XT / 6900 XT
Memory: 4659MiB / 32024MiB |
I have the following firmware configured for the AMD GPU in /etc/portage/savedconfig/sys-kernel/linux-firmware-20231030. The sienna_cichlid files are for my current GPU; I kept the navi14 files for my old GPU (AMD RX 5500 XT).
Code: | amdgpu/sienna_cichlid_vcn.bin
amdgpu/sienna_cichlid_ta.bin
amdgpu/sienna_cichlid_sos.bin
amdgpu/sienna_cichlid_smc.bin
amdgpu/sienna_cichlid_sdma.bin
amdgpu/sienna_cichlid_rlc.bin
amdgpu/sienna_cichlid_pfp.bin
amdgpu/sienna_cichlid_mec2.bin
amdgpu/sienna_cichlid_mec.bin
amdgpu/sienna_cichlid_me.bin
amdgpu/sienna_cichlid_dmcub.bin
amdgpu/sienna_cichlid_ce.bin
amdgpu/navi14_ta.bin
amdgpu/navi14_vcn.bin
amdgpu/navi14_sos.bin
amdgpu/navi14_smc.bin
amdgpu/navi14_sdma1.bin
amdgpu/navi14_sdma.bin
amdgpu/navi14_rlc.bin
amdgpu/navi14_pfp_wks.bin
amdgpu/navi14_pfp.bin
amdgpu/navi14_mec2_wks.bin
amdgpu/navi14_mec2.bin
amdgpu/navi14_mec_wks.bin
amdgpu/navi14_mec.bin
amdgpu/navi14_me_wks.bin
amdgpu/navi14_me.bin
amdgpu/navi14_gpu_info.bin
amdgpu/navi14_ce_wks.bin
amdgpu/navi14_ce.bin
amdgpu/navi14_asd.bin |
Any suggestions? |
jpsollie Apprentice

Joined: 17 Aug 2013 Posts: 279
|
Posted: Wed Nov 15, 2023 8:18 pm Post subject: |
|
|
MorgothSauron,
Let's try to isolate the issue first:
Firefox may be using a software renderer and OpenGL / Vulkan to render the image, or it may be using hardware video decoding. I think the former is true.
Can you use a YouTube downloader and play the video with e.g. VLC or MPV to see whether it works in a hardware-accelerated environment?
_________________
The power of Gentoo optimization (not overclocked): https://www.passmark.com/baselines/V10/images/503714802842.png |
MorgothSauron n00b


Joined: 24 Sep 2020 Posts: 69
|
Posted: Thu Nov 16, 2023 5:27 pm Post subject: |
|
|
jpsollie wrote: | MorgothSauron,
Let's try to isolate the issue first:
Firefox may be using a software renderer and OpenGL / Vulkan to render the image, or it may be using hardware video decoding. I think the former is true.
Can you use a YouTube downloader and play the video with e.g. VLC or MPV to see whether it works in a hardware-accelerated environment? |
Is there a way to check what Firefox is using for rendering?
One thing I remembered after your post is that I added the hwaccel USE flag to Firefox back in May. That's still a few months before the issue first appeared.
I will try your suggestion and download a YouTube video for local playback with VLC or MPV. However, this approach assumes that a given video would trigger the problem every time.
To be honest, I have never tried playing the same video a second time to see what happens. I have nothing to lose by trying your suggestion; it can only provide more information for troubleshooting.
Right now the issue is still unpredictable. I watch YouTube for a few hours every day, and the issue can take days or even weeks to happen again. I know because I write down when it happens and make a copy of /var/log/messages. |
CooSee Veteran


Joined: 20 Nov 2004 Posts: 1356 Location: Earth
|
Posted: Thu Nov 16, 2023 8:53 pm Post subject: |
|
|
Quote: | Is there a way to check what Firefox is using for rendering ? |
e.g. in about:support:
Code: | Window Protocol wayland |
That's what I get on my only Hyprland system (Xwayland disabled).
Quote: | One thing I remembered after your post is that I added the hwaccel USE flag to Firefox |
I don't use the hwaccel USE flag: no glitches, no freezes. But I use a very old RX 590.
Have you tried another desktop environment, e.g. GNOME or maybe Hyprland?
 _________________ "Reality is an illusion created by a lack of honest communication" |
MorgothSauron n00b


Joined: 24 Sep 2020 Posts: 69
|
Posted: Mon Nov 20, 2023 4:51 pm Post subject: |
|
|
I've been using X11 (Window Protocol = x11) since I built this system 2 years ago. I'll check whether I can transition to Wayland.
Quote: | I don't use the hwaccel USE flag: no glitches, no freezes |
I only experience rare display freezes. I know I'm playing with words, but what is displayed is glitch-free; it just stops refreshing. No screen tearing, no visual artifacts.
Quote: | Have you tried another desktop environment, e.g. GNOME or maybe Hyprland? |
I haven't tried another desktop environment. I have only had KDE Plasma installed from the beginning and I'd like to keep it that way. I only install what I really need, and I usually do a test installation in a VM first (yes, I have a Gentoo VM that I maintain separately).
I'll try removing the hwaccel USE flag from Firefox and see how it goes in the long run. I'll post back when I have new information to share. |
logrusx l33t

Joined: 22 Feb 2018 Posts: 989
|
Posted: Mon Nov 20, 2023 6:18 pm Post subject: Re: AMD GPU RX 6800 Random display freeze |
|
|
MorgothSauron wrote: |
- I cannot switch to a different console (e.g. Ctrl+Alt+F1)
|
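Try pressing Alt+SysRq+R (SysRq is usually the PrtSc key) before attempting to switch to a different VT. This takes the keyboard out of raw mode, so the key presses no longer go only to X.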
Best Regards,
Georgi |
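Whether Alt+SysRq+R does anything depends on the kernel.sysrq setting, so it may be worth checking that first. A minimal sketch that decodes the value in /proc/sys/kernel/sysrq using the bit meanings from the kernel's SysRq documentation; it assumes the kernel was built with CONFIG_MAGIC_SYSRQ, and the helper name `enabled_sysrq_functions` is hypothetical.

```python
from typing import List

# Bit values documented for the kernel.sysrq sysctl.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def enabled_sysrq_functions(value: int) -> List[str]:
    """Decode a kernel.sysrq value into the list of allowed function groups."""
    if value == 0:
        return []  # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())  # 1 means "everything enabled"
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    with open("/proc/sys/kernel/sysrq") as f:
        current = int(f.read())
    print(f"kernel.sysrq = {current}")
    for name in enabled_sysrq_functions(current):
        print("  allowed:", name)
```

The R key (unraw) falls under the "keyboard control" group. The mask only restricts key combinations typed on the keyboard; root can still trigger any SysRq function by writing to /proc/sysrq-trigger, which could be an option from the SSH session when a normal reboot hangs.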
CooSee Veteran


Joined: 20 Nov 2004 Posts: 1356 Location: Earth
|
Posted: Thu Nov 23, 2023 7:56 pm Post subject: |
|
|
@MorgothSauron
If it's not too much to ask, can you try the Gentoo LiveGUI image to make sure that this is not a hardware issue? |