Gentoo Forums
[Solved] AMDGPU fatal error during GPU init
seism
n00b


Joined: 28 Dec 2023
Posts: 3

Posted: Thu Dec 28, 2023 6:17 pm    Post subject: [Solved] AMDGPU fatal error during GPU init

Recently I've tried to set up the amdgpu driver, but I run into the same error no matter what I do:

Code:

[    0.299520] [drm] amdgpu kernel modesetting enabled.
[    0.317799] amdgpu 0000:08:00.0: No more image in the PCI ROM
[    0.317815] amdgpu 0000:08:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    0.317817] amdgpu: ATOM BIOS: 113-EXT90646-001
[    0.317829] amdgpu 0000:08:00.0: [drm:jpeg_v4_0_early_init] JPEG decode is enabled in VM mode
[    0.317837] Loading firmware: amdgpu/gc_11_0_3_mes_2.bin
[    0.317846] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
[    0.317849] [drm] try to fall back to amdgpu/gc_11_0_3_mes.bin
[    0.317850] Loading firmware: amdgpu/gc_11_0_3_mes.bin
[    0.317854] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2
[    0.317856] [drm:amdgpu_device_init.cold] *ERROR* early_init of IP block <mes_v11_0> failed -19
[    0.317860] amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
[    0.317862] amdgpu 0000:08:00.0: amdgpu: amdgpu: finishing device.
[    0.317973] amdgpu-reset-de (125) used greatest stack depth: 15768 bytes left


I looked for firmware blobs with the same gc_11_0_3* prefix and found:

Code:

/lib/firmware/amdgpu/gc_11_0_3_imu.bin
/lib/firmware/amdgpu/gc_11_0_3_me.bin
/lib/firmware/amdgpu/gc_11_0_3_mec.bin
/lib/firmware/amdgpu/gc_11_0_3_mes1.bin
/lib/firmware/amdgpu/gc_11_0_3_mes_2.bin
/lib/firmware/amdgpu/gc_11_0_3_pfp.bin
/lib/firmware/amdgpu/gc_11_0_3_rlc.bin


where, apparently, gc_11_0_3_mes.bin is missing. I tried building the firmware blobs into the kernel image, but the same error appears, and searching for the missing firmware online got me nowhere. I also tried building DRM_AMDGPU as a module, but that solved nothing either. I think I might be missing something else in the kernel build, but I'm out of ideas.
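
For anyone cross-checking the same thing, here is a minimal Python sketch that pulls every failed firmware request out of a kernel log and reports whether each blob is actually on disk. The function names are illustrative, and it assumes the blobs are stored uncompressed under /lib/firmware. Error -2 is ENOENT, so this helps tell a blob that is really absent from one the kernel could not reach at load time (note that the log above also complains about gc_11_0_3_mes_2.bin, which is present).

```python
import re
from pathlib import Path

# Matches kernel log lines such as:
#   amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2
FAILED_LOAD = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware(log_text):
    """Return (relative_path, error_code) pairs for every failed firmware request."""
    return [(m.group(1), int(m.group(2))) for m in FAILED_LOAD.finditer(log_text)]


def report(log_text, firmware_root="/lib/firmware"):
    """Describe each failed request and whether the file exists under firmware_root.

    Assumption: blobs are uncompressed; a compressed copy with an extra
    suffix would be reported as absent.
    """
    lines = []
    for name, err in failed_firmware(log_text):
        present = (Path(firmware_root) / name).is_file()
        lines.append(f"{name}: error {err}, {'present' if present else 'absent'} on disk")
    return lines
```

Feed `report()` the text of the kernel log (for example the saved output of dmesg) and it prints one line per failed request.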

Here's the lspci output:

Code:

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse Root Complex [1043:87c0]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse IOMMU [1043:87c0]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
   Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1234]
   Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
   DeviceName:  Onboard IGD
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
   Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1234]
   Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1043:87c0]
   Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1043:87c0]
   Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
   Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller [1043:87c0]
   Kernel driver in use: piix4_smbus
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
   Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge [1043:87c0]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0 [1022:1440]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1 [1022:1441]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2 [1022:1442]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3 [1022:1443]
   Kernel driver in use: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4 [1022:1444]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5 [1022:1445]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6 [1022:1446]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7 [1022:1447]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1022:43d5] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1b21:1142]
   Kernel driver in use: xhci_hcd
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
   Kernel driver in use: ahci
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Bridge [1b21:0201]
   Kernel driver in use: pcieport
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
   Kernel driver in use: pcieport
02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
   Kernel driver in use: pcieport
02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
   Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
   Kernel driver in use: pcieport
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
   Subsystem: ASUSTeK Computer Inc. PRIME B450M-A Motherboard [1043:8677]
   Kernel driver in use: r8169
06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 11)
   Kernel driver in use: pcieport
07:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 11)
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
   Kernel driver in use: pcieport
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev ff)
   Subsystem: Gigabyte Technology Co., Ltd Navi 32 [Radeon RX 7700 XT / 7800 XT] [1458:2414]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
   Kernel driver in use: snd_hda_intel
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse PCIe Dummy Function [1043:87c0]
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse Reserved SPP [1043:87c0]
0a:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse Cryptographic Coprocessor PSPCPP [1043:87c0]
   Kernel driver in use: ccp
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
   Subsystem: ASUSTeK Computer Inc. Matisse USB 3.0 Host Controller [1043:87c0]
   Kernel driver in use: xhci_hcd
0a:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
   Subsystem: ASUSTeK Computer Inc. Starship/Matisse HD Audio Controller [1043:8760]
   Kernel driver in use: snd_hda_intel


The AMDGPU wiki page doesn't mention anything about Navi 32, so I'm at a loss for what to do.


Last edited by seism on Thu Dec 28, 2023 8:37 pm; edited 1 time in total
grknight
Retired Dev


Joined: 20 Feb 2015
Posts: 2118

Posted: Thu Dec 28, 2023 6:49 pm

Make the driver a module for easy firmware loading: set CONFIG_DRM_AMDGPU=m in the kernel config.
This will make upgrading easier in the future as well when AMD wants new files for the card.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 55180
Location: 56N 3W

Posted: Thu Dec 28, 2023 6:58 pm

seism,

Code:

08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev ff)
   Subsystem: Gigabyte Technology Co., Ltd Navi 32 [Radeon RX 7700 XT / 7800 XT] [1458:2414]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
   Kernel driver in use: snd_hda_intel


There is no kernel driver loaded for your video card, or it would have a line reading Kernel driver in use: ...
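
The same check can be automated. Below is a rough Python sketch, assuming lspci output in the format shown above (no PCI domain prefix on the bus address); the function names are made up for illustration. It splits the output into per-device blocks and returns the devices of a given class that have no "Kernel driver in use:" line.

```python
import re

# A device header starts at column 0 with a bus address such as "08:00.0".
DEVICE = re.compile(r"^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] ")


def split_devices(lspci_text):
    """Group lspci output into blocks: one header line plus its indented detail lines."""
    blocks = []
    for line in lspci_text.splitlines():
        if DEVICE.match(line):
            blocks.append([line])
        elif blocks and line.strip():
            blocks[-1].append(line)
    return blocks


def devices_without_driver(lspci_text, class_filter="VGA compatible controller"):
    """Return header lines of matching devices that have no bound kernel driver."""
    return [
        block[0]
        for block in split_devices(lspci_text)
        if class_filter in block[0]
        and not any("Kernel driver in use:" in detail for detail in block[1:])
    ]
```

The class filter matters because many entries, such as the dummy host bridges, legitimately have no driver; only a driverless VGA controller is a problem here.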

Looking at the LKDDb, your chipset [1002:747e] is not listed. There are no Navi 32 cards there at all, and it's a live-ish page.

Switch amdgpu back to a module. If your kernel is older than 6.6, update it, as Navi 32 is still newish and a new kernel can make all the difference.
Use testing linux-firmware too.

Firmware loading errors like
Code:
[     0.317837] Loading firmware: amdgpu/gc_11_0_3_mes_2.bin
[    0.317846] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
show only the first missing file, then the driver gives up.

Some kernels will list all the firmware they successfully load. My Radeon card needs about 20 files, so discovering the name by trial and error is suboptimal.

After the reboot into a new kernel, check the build time in
Code:
uname -a
It's good to know that you are running the kernel you think you are.
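
As a rough illustration of that check, here is a small Python sketch that parses a kernel release string and compares it with the 6.6 minimum suggested above. The function names are illustrative, and the release strings in the comments are examples of the format only.

```python
import platform
import re


def kernel_version(release):
    """Parse a release such as '6.6.8-gentoo' into (6, 6, 8); a missing patch level becomes 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in m.groups())


def kernel_at_least(release, minimum=(6, 6)):
    """True when the release is at or above the given (major, minor) pair."""
    return kernel_version(release)[: len(minimum)] >= tuple(minimum)


def running_kernel_ok(minimum=(6, 6)):
    """Apply the check to the kernel that is running right now."""
    # platform.release() returns the same string as `uname -r`.
    return kernel_at_least(platform.release(), minimum)
```

`platform.version()` returns the build string that uname -a also shows, which is where the build time appears.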
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
seism
n00b


Joined: 28 Dec 2023
Posts: 3

Posted: Thu Dec 28, 2023 7:44 pm

Thanks for the answers,

I've set DRM_AMDGPU as a module, but now the screen freezes during RC boot. I'm not sure what's causing it, since I can't log in to check for errors. Do I need to build the firmware into the kernel even if the AMD driver is built as a module?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 55180
Location: 56N 3W

Posted: Thu Dec 28, 2023 8:26 pm

seism,

What kernel are you running?
Share the output of
Code:
uname -a


With AMDGPU=Y, the firmware must be built into the kernel. It cannot be read from /lib/firmware, as root will not be mounted when the firmware is loaded.
With AMDGPU=M the module is loaded after root is mounted and /lib/firmware will be used for the firmware.
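
The two cases can be read straight off the kernel .config. The Python sketch below is only an illustration: the function names are invented, it assumes no initramfs loads the module before root is mounted, and it treats any "amdgpu/" entry in CONFIG_EXTRA_FIRMWARE as evidence that blobs are built in without checking that the list is complete.

```python
import re


def config_value(config_text, symbol):
    """Return the value of CONFIG_<symbol> from .config text, or None if it is not set."""
    m = re.search(rf"^CONFIG_{re.escape(symbol)}=(.*)$", config_text, re.MULTILINE)
    return m.group(1).strip().strip('"') if m else None


def firmware_source(config_text):
    """Describe where amdgpu firmware has to come from for this configuration."""
    mode = config_value(config_text, "DRM_AMDGPU")
    if mode == "m":
        return "module: firmware is read from /lib/firmware after root is mounted"
    if mode == "y":
        # CONFIG_EXTRA_FIRMWARE is a space-separated list of blobs built into the image.
        extra = config_value(config_text, "EXTRA_FIRMWARE") or ""
        if "amdgpu/" in extra:
            return "built-in: firmware listed in CONFIG_EXTRA_FIRMWARE"
        return "built-in: firmware missing from CONFIG_EXTRA_FIRMWARE"
    return "amdgpu is not enabled"
```

Pass it the contents of the .config in the kernel source tree that was actually built.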

With a broken console driver, you will be unable to use the console, but if you set up ssh, that should work.
seism
n00b


Joined: 28 Dec 2023
Posts: 3

Posted: Thu Dec 28, 2023 8:36 pm

NeddySeagoon wrote:

What kernel are you running?
Share the output of
Code:
uname -a



This was actually the problem! I hadn't changed the kernel symlink, so I was still working with a 6.1.* version. After upgrading to 6.6.8 (and actually linking it) the drivers finally work, thanks for all the help!
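
For anyone who hits the same thing, here is a quick Python sketch of the check that would have caught it: compare the source tree that /usr/src/linux points at with the kernel that is running. The function names are illustrative, and the prefix match is only a heuristic, since a local version suffix or a revision suffix can blur the comparison.

```python
import os
import platform


def source_tree_release(symlink_target):
    """Turn a tree path such as '/usr/src/linux-6.6.8-gentoo' into '6.6.8-gentoo'."""
    name = os.path.basename(os.path.normpath(symlink_target))
    return name[len("linux-"):] if name.startswith("linux-") else name


def symlink_matches_running(symlink_target, running_release):
    """Heuristic: the running release should start with the tree's release string."""
    return running_release.startswith(source_tree_release(symlink_target))


def check(symlink="/usr/src/linux"):
    """Resolve the symlink and compare it with the running kernel."""
    return symlink_matches_running(os.path.realpath(symlink), platform.release())
```

If `check()` returns False, `eselect kernel list` shows the available trees and `eselect kernel set` repoints the symlink before rebuilding.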
All times are GMT