seism n00b

Joined: 28 Dec 2023 Posts: 3
Posted: Thu Dec 28, 2023 6:17 pm Post subject: [Solved] AMDGPU fatal error during GPU init
Recently I tried to set up the amdgpu driver, but I run into the same error no matter what I do:
Code: |
[ 0.299520] [drm] amdgpu kernel modesetting enabled.
[ 0.317799] amdgpu 0000:08:00.0: No more image in the PCI ROM
[ 0.317815] amdgpu 0000:08:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 0.317817] amdgpu: ATOM BIOS: 113-EXT90646-001
[ 0.317829] amdgpu 0000:08:00.0: [drm:jpeg_v4_0_early_init] JPEG decode is enabled in VM mode
[ 0.317837] Loading firmware: amdgpu/gc_11_0_3_mes_2.bin
[ 0.317846] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
[ 0.317849] [drm] try to fall back to amdgpu/gc_11_0_3_mes.bin
[ 0.317850] Loading firmware: amdgpu/gc_11_0_3_mes.bin
[ 0.317854] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2
[ 0.317856] [drm:amdgpu_device_init.cold] *ERROR* early_init of IP block <mes_v11_0> failed -19
[ 0.317860] amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
[ 0.317862] amdgpu 0000:08:00.0: amdgpu: amdgpu: finishing device.
[ 0.317973] amdgpu-reset-de (125) used greatest stack depth: 15768 bytes left
|
I looked at the firmware blobs with the same gc_11_0_3* prefix and found:
Code: |
/lib/firmware/amdgpu/gc_11_0_3_imu.bin
/lib/firmware/amdgpu/gc_11_0_3_me.bin
/lib/firmware/amdgpu/gc_11_0_3_mec.bin
/lib/firmware/amdgpu/gc_11_0_3_mes1.bin
/lib/firmware/amdgpu/gc_11_0_3_mes_2.bin
/lib/firmware/amdgpu/gc_11_0_3_pfp.bin
/lib/firmware/amdgpu/gc_11_0_3_rlc.bin
|
where, apparently, gc_11_0_3_mes.bin is missing. I tried building the firmware blobs into the kernel image, but the same error pops up, and searching online for the missing firmware got me nowhere. I also tried loading drm_amdgpu as a module, but that didn't help either. I think I might be missing something else in the kernel build, but I'm out of ideas.
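A quick way to confirm which of the requested files are really on disk is a small check like the one below. It is only a sketch, assuming the firmware root is /lib/firmware and that the names are the relative paths printed after `Loading firmware:` in the log above; `check_firmware` is a hypothetical helper, not part of any existing tool. Note that gc_11_0_3_mes_2.bin appears in the directory listing yet still failed to load with error -2, which is worth keeping in mind when reading the later replies about built-in drivers and /lib/firmware.

```python
from pathlib import Path

# Relative firmware names taken from the dmesg output quoted above.
REQUESTED = [
    "amdgpu/gc_11_0_3_mes_2.bin",
    "amdgpu/gc_11_0_3_mes.bin",
]


def check_firmware(requested, firmware_root="/lib/firmware"):
    """Map each relative firmware name to True if it exists as a file under firmware_root."""
    root = Path(firmware_root)
    return {name: (root / name).is_file() for name in requested}


if __name__ == "__main__":
    for name, present in check_firmware(REQUESTED).items():
        print(("present " if present else "MISSING ") + name)
```

A file that is present here but still fails with -2 at boot points at *when* the driver looks for it rather than *whether* it exists.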
Here's the lspci output:
Code: |
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse Root Complex [1043:87c0]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse IOMMU [1043:87c0]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1234]
Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
DeviceName: Onboard IGD
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1234]
Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1043:87c0]
Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1043:87c0]
Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller [1043:87c0]
Kernel driver in use: piix4_smbus
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge [1043:87c0]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0 [1022:1440]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1 [1022:1441]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2 [1022:1442]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3 [1022:1443]
Kernel driver in use: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4 [1022:1444]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5 [1022:1445]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6 [1022:1446]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7 [1022:1447]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1022:43d5] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1b21:1142]
Kernel driver in use: xhci_hcd
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
Kernel driver in use: ahci
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Bridge [1b21:0201]
Kernel driver in use: pcieport
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
Kernel driver in use: pcieport
02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
Kernel driver in use: pcieport
02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
Kernel driver in use: pcieport
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
Subsystem: ASUSTeK Computer Inc. PRIME B450M-A Motherboard [1043:8677]
Kernel driver in use: r8169
06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 11)
Kernel driver in use: pcieport
07:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 11)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
Kernel driver in use: pcieport
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev ff)
Subsystem: Gigabyte Technology Co., Ltd Navi 32 [Radeon RX 7700 XT / 7800 XT] [1458:2414]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
Kernel driver in use: snd_hda_intel
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse PCIe Dummy Function [1043:87c0]
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse Reserved SPP [1043:87c0]
0a:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse Cryptographic Coprocessor PSPCPP [1043:87c0]
Kernel driver in use: ccp
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Subsystem: ASUSTeK Computer Inc. Matisse USB 3.0 Host Controller [1043:87c0]
Kernel driver in use: xhci_hcd
0a:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
Subsystem: ASUSTeK Computer Inc. Starship/Matisse HD Audio Controller [1043:8760]
Kernel driver in use: snd_hda_intel
|
The AMDGPU wiki page doesn't mention Navi 32 at all, so I'm at a loss for what to do.
Last edited by seism on Thu Dec 28, 2023 8:37 pm; edited 1 time in total
grknight Retired Dev

Joined: 20 Feb 2015 Posts: 2118
Posted: Thu Dec 28, 2023 6:49 pm
Make the driver a module for easy firmware loading: CONFIG_DRM_AMDGPU=m in the kernel config.
This will also make future upgrades easier when AMD wants new firmware files for the card.
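To double-check what the kernel config actually says, a sketch like this reads the text of a .config file and reports the value of one Kconfig symbol. It assumes the standard .config format (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`); the helper name `kconfig_value` and the /usr/src/linux/.config path in the usage comment are assumptions.

```python
import re


def kconfig_value(config_text, symbol):
    """Return the value of CONFIG_<symbol> ('y', 'm', a string, ...) or None if unset.

    Kconfig writes enabled options as 'CONFIG_FOO=y' or 'CONFIG_FOO=m' and disabled
    ones as '# CONFIG_FOO is not set', which this regex deliberately does not match.
    """
    pattern = re.compile(rf"^CONFIG_{re.escape(symbol)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Example usage (the path is an assumption; adjust to your kernel source tree):
# with open("/usr/src/linux/.config") as f:
#     print(kconfig_value(f.read(), "DRM_AMDGPU"))  # 'm' means built as a module
```

The `^...=` anchoring keeps `DRM` from accidentally matching `CONFIG_DRM_AMDGPU=...`.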
|
Back to top |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55180 Location: 56N 3W
Posted: Thu Dec 28, 2023 6:58 pm
seism,
Code: |
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev ff)
Subsystem: Gigabyte Technology Co., Ltd Navi 32 [Radeon RX 7700 XT / 7800 XT] [1458:2414]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
Kernel driver in use: snd_hda_intel |
There is no kernel driver loaded for your video card, or it would have a line Kernel driver in use: ...
Looking at the ldddb, your chipset [1002:747e] is not listed. There are no Navi 32 cards there at all, and it's a live-ish page.
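The "Kernel driver in use" check can be automated with a small parser. This is a sketch that assumes output in the format quoted above (as produced by `lspci -nnk`); it identifies device lines by their leading PCI slot rather than by indentation, because the paste in this thread lost the indentation. `vga_devices` is a hypothetical helper name.

```python
import re

# A device line starts with a PCI slot such as "08:00.0" (optionally "0000:08:00.0").
SLOT_RE = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s")
# Vendor:device IDs appear in square brackets, e.g. "[1002:747e]".
ID_RE = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def vga_devices(lspci_text):
    """Return slot, PCI ID and bound driver (or None) for each VGA controller."""
    devices = []
    current = None
    for line in lspci_text.splitlines():
        if SLOT_RE.match(line):
            current = None  # a new device starts here
            if "VGA compatible controller" in line:
                ids = ID_RE.findall(line)
                current = {"slot": line.split()[0],
                           "pci_id": ids[-1] if ids else None,
                           "driver": None}
                devices.append(current)
        elif current is not None and "Kernel driver in use:" in line:
            current["driver"] = line.split(":", 1)[1].strip()
    return devices


SAMPLE = """08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev ff)
Subsystem: Gigabyte Technology Co., Ltd Navi 32 [Radeon RX 7700 XT / 7800 XT] [1458:2414]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
Kernel driver in use: snd_hda_intel"""

if __name__ == "__main__":
    for dev in vga_devices(SAMPLE):
        print(dev)  # driver is None: nothing is bound to the GPU
```

For the card in this thread the result has `driver` set to None; the `snd_hda_intel` line belongs to the audio function 08:00.1, not the GPU.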
Switch amdgpu back to a module. If your kernel is older than 6.6, update it; Navi 32 is still newish, and a newer kernel can make all the difference.
Use the testing linux-firmware too.
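The "older than 6.6" test is easy to get wrong when comparing version strings by eye or as text. A minimal sketch, assuming a release string in the usual `major.minor...` form (as printed by `uname -r`); the 6.6 threshold is the advice in this reply, not an official minimum, and `kernel_at_least` is a hypothetical helper.

```python
import platform
import re


def kernel_at_least(release, minimum=(6, 6)):
    """True if the major.minor prefix of a release like '6.1.67-gentoo' is >= minimum.

    Versions are compared as integer tuples, because as plain strings '6.10' < '6.6'.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def running_kernel_ok(minimum=(6, 6)):
    """Check the running kernel; platform.release() gives the same string as `uname -r`."""
    return kernel_at_least(platform.release(), minimum)
```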
Firmware loading errors like
Code: |
[ 0.317837] Loading firmware: amdgpu/gc_11_0_3_mes_2.bin
[ 0.317846] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
|
show only the first missing file; then the driver gives up.
Some kernels will list all the firmware files they successfully load. My Radeon card needs about 20 files, so discovering the names by trial and error is suboptimal.
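When a kernel does print the `Loading firmware:` lines, the requested and failed names can be pulled out of dmesg mechanically. A sketch, assuming the exact message wording seen in the log at the top of this thread (other kernels may word these lines differently); `firmware_report` is a hypothetical helper. The negative numbers are errno values: -2 is ENOENT ("No such file or directory"), and the -19 on the `early_init of IP block` line is ENODEV.

```python
import errno
import re

LOAD_RE = re.compile(r"Loading firmware: (\S+)")
FAIL_RE = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def firmware_report(dmesg_text):
    """Return (requested, failed): names the driver asked for, and name -> errno symbol."""
    requested, failed = [], {}
    for line in dmesg_text.splitlines():
        load = LOAD_RE.search(line)
        fail = FAIL_RE.search(line)
        if load:
            requested.append(load.group(1))
        elif fail:
            code = abs(int(fail.group(2)))
            # e.g. error -2 -> 'ENOENT' (No such file or directory)
            failed[fail.group(1)] = errno.errorcode.get(code, str(code))
    return requested, failed


SAMPLE = """[ 0.317837] Loading firmware: amdgpu/gc_11_0_3_mes_2.bin
[ 0.317846] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
[ 0.317849] [drm] try to fall back to amdgpu/gc_11_0_3_mes.bin
[ 0.317850] Loading firmware: amdgpu/gc_11_0_3_mes.bin
[ 0.317854] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2"""

if __name__ == "__main__":
    requested, failed = firmware_report(SAMPLE)
    print("requested:", requested)
    print("failed:", failed)
```

Since the driver stops at the first fatal failure, this only ever shows the files requested up to that point, not the full list the card needs.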
After rebooting into the new kernel, check the build time in
It's good to know that you are running the kernel you think you are.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
seism n00b

Joined: 28 Dec 2023 Posts: 3
Posted: Thu Dec 28, 2023 7:44 pm
Thanks for the answers.
I've set DRM_AMDGPU as a module, but now the screen freezes during the RC boot. I'm not sure what's causing it, since I can't log in to check for errors. Do I need to build the firmware into the kernel, even if the AMD driver is built as a module?
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55180 Location: 56N 3W
Posted: Thu Dec 28, 2023 8:26 pm
seism,
What kernel are you running?
Share the output of
With AMDGPU=y, the firmware must be built into the kernel. It cannot be read from /lib/firmware, because root will not be mounted yet when the firmware is loaded.
With AMDGPU=m, the module is loaded after root is mounted, and the firmware will be read from /lib/firmware.
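If you do keep amdgpu built in (=y), the kernel's CONFIG_EXTRA_FIRMWARE option is the mechanism for embedding firmware: a space-separated list of paths relative to CONFIG_EXTRA_FIRMWARE_DIR (default /lib/firmware). A sketch for generating that line, assuming uncompressed `.bin` files; `extra_firmware_line` is a hypothetical helper, and the prefixes a Navi 32 card needs beyond gc_11_0_3 are not given in this thread (a Radeon card can need around 20 files), so any prefix list you pass is your own assumption.

```python
from pathlib import Path


def extra_firmware_line(prefixes, firmware_dir="/lib/firmware", subdir="amdgpu"):
    """Build a CONFIG_EXTRA_FIRMWARE="..." line for built-in (=y) drivers.

    Lists every uncompressed *.bin file in firmware_dir/subdir whose name starts with
    one of `prefixes`; paths are relative to firmware_dir, which is what
    CONFIG_EXTRA_FIRMWARE_DIR points at (default /lib/firmware).
    """
    folder = Path(firmware_dir) / subdir
    names = sorted(
        f"{subdir}/{p.name}"
        for p in folder.iterdir()
        if p.is_file() and p.name.endswith(".bin") and p.name.startswith(tuple(prefixes))
    )
    return 'CONFIG_EXTRA_FIRMWARE="' + " ".join(names) + '"'


# Example (the prefix list is illustrative and almost certainly incomplete):
# print(extra_firmware_line(["gc_11_0_3_"]))
```

Building the driver as a module avoids maintaining this list by hand, which is the point of the advice above.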
With a broken console driver you will be unable to use the console, but if you set up ssh, that should still work.
_________________
Regards,
NeddySeagoon
seism n00b

Joined: 28 Dec 2023 Posts: 3
Posted: Thu Dec 28, 2023 8:36 pm
NeddySeagoon wrote: |
What kernel are you running?
Share the output of
|
This was actually the problem! I hadn't changed the kernel symlink, so I was still working with a 6.1.* kernel. After upgrading to 6.6.8 (and actually switching the link), the drivers finally work. Thanks for all the help!
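A stale source symlink is easy to miss, so a quick comparison between where the link points and the running kernel can help. A sketch, assuming the Gentoo convention that /usr/src/linux links to a directory named `linux-<release>` (e.g. linux-6.6.8-gentoo for 6.6.8-gentoo); on Gentoo the link is normally switched with `eselect kernel set`. `check_kernel_symlink` is a hypothetical helper, and a custom CONFIG_LOCALVERSION would break the naming assumption.

```python
import os
import platform


def check_kernel_symlink(link="/usr/src/linux", release=None):
    """Compare the directory a kernel source symlink points at with a kernel release.

    Returns (matches, target_dir_name, release). Assumes Gentoo-style source
    directories named 'linux-<release>'; a custom CONFIG_LOCALVERSION would
    break that naming assumption.
    """
    release = release or platform.release()  # same string as `uname -r`
    target = os.path.basename(os.path.realpath(link))
    return target == f"linux-{release}", target, release
```

After rebooting into a freshly built kernel, a mismatch suggests that either the link or the boot entry still points at the old version.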