Gentoo Forums
Startup issues with amdgpu and Radeon VII [Solved]
Sqeaky
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

Posted: Thu May 06, 2021 9:01 pm    Post subject: Startup issues with amdgpu and Radeon VII [Solved]

I am trying to set up 3D acceleration in a basic desktop environment, but the system appears to hang on boot.

I had a desktop working without 3D acceleration. I had compiled amdgpu as a module, and it wasn't loading automatically. When I manually modprobed amdgpu, the system appeared to hang.

I have since compiled amdgpu into the kernel and built the firmware into the kernel as well. Now the system appears to hang on boot, but I can ssh into it, so it has booted but is not outputting video.

In make.conf I have set VIDEO_CARDS to "amdgpu radeonsi radeon". Without "radeon" I get dependency conflicts. I am not sure this is correct, or whether I even need it.

Per the AMDGPU wiki page, I selected all the kernel options, and I think I correctly included all the firmware. Here is the relevant section of the kernel `.config`:

Code:

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE="amdgpu/vega20_asd.bin amdgpu/vega20_ce.bin amdgpu/vega20_me.bin amdgpu/vega20_mec2.bin amdgpu/vega20_mec.bin amdgpu/vega20_pfp.bin amdgpu/vega20_rlc.bin amdgpu/vega20_sdma1.bin amdgpu/vega20_sdma.bin amdgpu/vega20_smc.bin amdgpu/vega20_sos.bin amdgpu/vega20_ta.bin amdgpu/vega20_uvd.bin amdgpu/vega20_vce.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

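As a sanity check on the list above, something like the following could confirm that every file named in CONFIG_EXTRA_FIRMWARE actually exists under CONFIG_EXTRA_FIRMWARE_DIR. This is only a sketch: `config_value` and `missing_firmware` are made-up helper names, the `.config` path in the usage comment is an assumption, and it assumes CONFIG_EXTRA_FIRMWARE_DIR is set explicitly, as it is here.

```shell
#!/bin/sh
# Sketch: list firmware files named in CONFIG_EXTRA_FIRMWARE that are missing
# from CONFIG_EXTRA_FIRMWARE_DIR. Both function names are hypothetical helpers.

# Print the quoted value of a CONFIG_ option from a kernel .config file.
config_value() {
    # $1 = option name, $2 = path to .config
    sed -n "s/^$1=\"\(.*\)\"\$/\1/p" "$2"
}

# Print every listed firmware file that does not exist.
# Returns 1 if at least one file is missing, 0 otherwise.
missing_firmware() {
    # $1 = path to .config
    dir=$(config_value CONFIG_EXTRA_FIRMWARE_DIR "$1")
    status=0
    for fw in $(config_value CONFIG_EXTRA_FIRMWARE "$1"); do
        if [ ! -f "$dir/$fw" ]; then
            echo "$dir/$fw"
            status=1
        fi
    done
    return "$status"
}

# Example (the path is an assumption about where the kernel sources live):
#   missing_firmware /usr/src/linux/.config
```

No output and a zero exit status would mean every listed blob was found. It says nothing about whether the blobs are the right versions for the card.
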

I have set the amdgpu options as that page indicates, and the Xorg ones as well:

Code:

# CONFIG_DRM_RADEON is not set
CONFIG_DRM_AMDGPU=y
CONFIG_DRM_AMDGPU_SI=y
CONFIG_DRM_AMDGPU_CIK=y
CONFIG_DRM_AMDGPU_USERPTR=y
# CONFIG_DRM_AMDGPU_GART_DEBUGFS is not set

#
# ACP (Audio CoProcessor) Configuration
#
CONFIG_DRM_AMD_ACP=y
# end of ACP (Audio CoProcessor) Configuration

#
# Display Engine Configuration
#
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
CONFIG_DRM_AMD_DC_DCN3_0=y
CONFIG_DRM_AMD_DC_HDCP=y
CONFIG_DRM_AMD_DC_SI=y
# end of Display Engine Configuration

CONFIG_HSA_AMD=y


I ssh'ed in and got this output from dmesg; the Xorg.0.log is below.

Selections from dmesg:
Code:

SNIPPED
[    0.482496] smpboot: CPU0: AMD Ryzen 9 3950X 16-Core Processor (family: 0x17, model: 0x71, stepping: 0x0)
[    0.482540] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
[    0.482544] ... version:                0
[    0.482545] ... bit width:              48
[    0.482545] ... generic registers:      6
[    0.482546] ... value mask:             0000ffffffffffff
[    0.482547] ... max period:             00007fffffffffff
[    0.482548] ... fixed-purpose events:   0
[    0.482549] ... event mask:             000000000000003f
[    0.482588] rcu: Hierarchical SRCU implementation.
[    0.482807] smp: Bringing up secondary CPUs ...
SNIPPED
[    0.550047] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) *0
[    0.550076] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) *0
[    0.550100] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) *0
[    0.550130] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) *0
[    0.550158] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) *0
[    0.550180] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) *0
[    0.550203] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) *0
[    0.550226] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) *0
[    0.550467] iommu: Default domain type: Translated
[    0.550474] pci 0000:0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.550474] pci 0000:0c:00.0: vgaarb: bridge control possible
[    0.550474] pci 0000:0c:00.0: vgaarb: setting as boot device
[    0.550474] vgaarb: loaded
SNIPPED
[    0.574696] amd_uncore: 4  amd_df counters detected
[    0.574704] amd_uncore: 6  amd_l3 counters detected
[    0.575018] LVT offset 0 assigned for vector 0x400
[    0.575177] perf: AMD IBS detected (0x000003ff)
[    0.576526] check: Scanning for low memory corruption every 60 seconds
[    0.576752] Initialise system trusted keyrings
[    0.576777] workingset: timestamp_bits=56 max_order=25 bucket_order=0
[    0.577269] SGI XFS with ACLs, security attributes, no debug enabled
[    0.579886] Key type asymmetric registered
[    0.579887] Asymmetric key parser 'x509' registered
[    0.579892] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[    0.580104] pcieport 0000:00:01.1: PME: Signaling with IRQ 26
[    0.580188] pcieport 0000:00:01.2: PME: Signaling with IRQ 27
[    0.580272] pcieport 0000:00:03.1: PME: Signaling with IRQ 28
[    0.580377] pcieport 0000:00:07.1: PME: Signaling with IRQ 30
[    0.580447] pcieport 0000:00:08.1: PME: Signaling with IRQ 31
[    0.581863] efifb: probing for efifb
[    0.581873] efifb: framebuffer at 0xd0000000, using 32400k, total 32400k
[    0.581875] efifb: mode is 3840x2160x32, linelength=15360, pages=1
[    0.581876] efifb: scrolling: redraw
[    0.581877] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    0.592393] Console: switching to colour frame buffer device 480x135
[    0.602713] fb0: EFI VGA frame buffer device
[    0.602802] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[    0.602832] ACPI: Power Button [PWRB]
SNIPPED
[    0.605621] Non-volatile memory driver v1.3
[    0.605654] [drm] amdgpu kernel modesetting enabled.
[    0.605672] CRAT table disabled by module option
[    0.605683] Virtual CRAT table created for CPU
[    0.605697] amdgpu: Topology: Add CPU node
[    0.605740] checking generic (d0000000 1fa4000) vs hw (d0000000 10000000)
[    0.605741] fb0: switching to amdgpudrmfb from EFI VGA
[    0.605773] Console: switching to colour dummy device 80x25
[    0.605785] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    0.605803] amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
[    0.605828] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66AF 0x1002:0x081E 0xC1).
[    0.605830] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    0.605835] [drm] register mmio base: 0xFCD00000
[    0.605836] [drm] register mmio size: 524288
[    0.605849] [drm] add ip block number 0 <soc15_common>
[    0.605850] [drm] add ip block number 1 <gmc_v9_0>
[    0.605851] [drm] add ip block number 2 <vega10_ih>
[    0.605853] [drm] add ip block number 3 <psp>
[    0.605854] [drm] add ip block number 4 <gfx_v9_0>
[    0.605856] [drm] add ip block number 5 <sdma_v4_0>
[    0.605857] [drm] add ip block number 6 <powerplay>
[    0.605858] [drm] add ip block number 7 <dm>
[    0.605859] [drm] add ip block number 8 <uvd_v7_0>
[    0.605860] [drm] add ip block number 9 <vce_v4_0>
[    0.605872] amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from VFCT
[    0.605874] amdgpu: ATOM BIOS: 113-D3600200-106
[    0.605884] [drm] UVD(0) is enabled in VM mode
[    0.605885] [drm] UVD(1) is enabled in VM mode
[    0.605886] [drm] UVD(0) ENC is enabled in VM mode
[    0.605887] [drm] UVD(1) ENC is enabled in VM mode
[    0.605888] [drm] VCE enabled in VM mode
[    0.605900] amdgpu 0000:0c:00.0: amdgpu: HBM ECC is not presented.
[    0.605902] amdgpu 0000:0c:00.0: amdgpu: SRAM ECC is not presented.
[    0.605905] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    0.605913] amdgpu 0000:0c:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    0.605915] amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    0.605917] amdgpu 0000:0c:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    0.605922] [drm] Detected VRAM RAM=16368M, BAR=256M
[    0.605923] [drm] RAM width 4096bits HBM
[    0.605949] [TTM] Zone  kernel: Available graphics memory: 65918454 KiB
[    0.605950] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    0.605951] [TTM] Initializing pool allocator
[    0.605953] [TTM] Initializing DMA pool allocator
[    0.605974] [drm] amdgpu: 16368M of VRAM memory ready
[    0.605975] [drm] amdgpu: 16368M of GTT memory ready.
[    0.605977] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    0.606100] [drm] PCIE GART of 512M enabled (table at 0x0000008001FA4000).
[    0.607080] amdgpu: hwmgr_sw_init smu backed is vega20_smu
[    0.607099] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
[    0.607102] [drm] PSP loading UVD firmware
[    0.607659] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
[    0.607662] [drm] PSP loading VCE firmware
[    3.495203] [drm:psp_hw_start] *ERROR* PSP load sysdrv failed!
[    3.495205] [drm:psp_hw_init] *ERROR* PSP firmware loading failed
[    3.495207] [drm:amdgpu_device_fw_loading] *ERROR* hw_init of IP block <psp> failed -22
[    3.495209] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    3.495217] tsc: Refined TSC clocksource calibration: 3493.437 MHz
[    3.495218] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    3.495230] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x325b1901053, max_idle_ns: 440795306184 ns
[    3.495235] amdgpu: probe of 0000:0c:00.0 failed with error -22
SNIPPED

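The failure is easy to lose in a full dmesg, so a filter along these lines could pull out just the DRM/amdgpu error lines. It is a rough sketch: `drm_errors` is a made-up helper name, and the pattern is only tuned to the messages shown above, so other failures may need a different one.

```shell
#!/bin/sh
# Sketch: keep only the DRM/amdgpu failure lines from a kernel log.
# drm_errors is a hypothetical helper name; it reads the log on stdin.
drm_errors() {
    # Match the kernel's "*ERROR*" marker, or amdgpu lines that mention
    # "failed" or "Fatal".
    grep -E '\*ERROR\*|amdgpu.*(failed|Fatal)'
}

# Typical use over ssh on the affected machine:
#   dmesg | drm_errors
```
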

Xorg.0.log:
Code:

[    14.155] (--) Log file renamed from "/var/log/Xorg.pid-2157.log" to "/var/log/Xorg.0.log"
[    14.155]
X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
[    14.155] Build Operating System: Linux 5.10.27-gentoo-x86_64 x86_64 Gentoo
[    14.155] Current Operating System: Linux NostaligiaForInfinity 5.10.27-gentoo-Sqeaky #45 SMP Thu May 6 15:03:54 CDT 2021 x86_64
[    14.155] Kernel command line: domdadm rootfstype=xfs root=/dev/md127
[    14.155] Build Date: 02 May 2021  03:58:57AM
[    14.155] 
[    14.155] Current version of pixman: 0.40.0
[    14.155]    Before reporting problems, check http://wiki.x.org
   to make sure that you have the latest version.
[    14.155] Markers: (--) probed, (**) from config file, (==) default setting,
   (++) from command line, (!!) notice, (II) informational,
   (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    14.155] (==) Log file: "/var/log/Xorg.0.log", Time: Thu May  6 15:05:03 2021
[    14.156] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[    14.156] (==) No Layout section.  Using the first Screen section.
[    14.156] (==) No screen section available. Using defaults.
[    14.156] (**) |-->Screen "Default Screen Section" (0)
[    14.156] (**) |   |-->Monitor "<default monitor>"
[    14.157] (==) No monitor specified for screen "Default Screen Section".
   Using a default monitor configuration.
[    14.157] (==) Automatically adding devices
[    14.157] (==) Automatically enabling devices
[    14.157] (==) Automatically adding GPU devices
[    14.157] (==) Max clients allowed: 256, resource mask: 0x1fffff
[    14.157] (WW) The directory "/usr/share/fonts/misc/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (WW) The directory "/usr/share/fonts/TTF/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (WW) The directory "/usr/share/fonts/OTF/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (WW) The directory "/usr/share/fonts/Type1/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (WW) The directory "/usr/share/fonts/100dpi/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (WW) The directory "/usr/share/fonts/75dpi/" does not exist.
[    14.157]    Entry deleted from font path.
[    14.157] (==) FontPath set to:
   
[    14.157] (==) ModulePath set to "/usr/lib64/xorg/modules"
[    14.157] (II) The server relies on udev to provide the list of input devices.
   If no devices become available, reconfigure udev or disable AutoAddDevices.
[    14.157] (II) Loader magic: 0x556faba35d00
[    14.157] (II) Module ABI versions:
[    14.157]    X.Org ANSI C Emulation: 0.4
[    14.157]    X.Org Video Driver: 24.1
[    14.157]    X.Org XInput driver : 24.1
[    14.157]    X.Org Server Extension : 10.0
[    14.157] (EE) dbus-core: error connecting to system bus: org.freedesktop.DBus.Error.FileNotFound (Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory)
[    14.157] (++) using VT number 7

[    14.157] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[    14.159] (--) PCI:*(12@0:0:0) 1002:66af:1002:081e rev 193, Mem @ 0xd0000000/268435456, 0xe0000000/2097152, 0xfcd00000/524288, I/O @ 0x0000e000/256, BIOS @ 0x????????/131072
[    14.159] (II) LoadModule: "glx"
[    14.160] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[    14.163] (II) Module glx: vendor="X.Org Foundation"
[    14.163]    compiled for 1.20.11, module version = 1.0.0
[    14.163]    ABI class: X.Org Server Extension, version 10.0
[    14.163] (==) Matched ati as autoconfigured driver 0
[    14.163] (==) Matched modesetting as autoconfigured driver 1
[    14.163] (==) Matched fbdev as autoconfigured driver 2
[    14.163] (==) Matched vesa as autoconfigured driver 3
[    14.163] (==) Assigned the driver to the xf86ConfigLayout
[    14.163] (II) LoadModule: "ati"
[    14.163] (II) Loading /usr/lib64/xorg/modules/drivers/ati_drv.so
[    14.164] (II) Module ati: vendor="X.Org Foundation"
[    14.164]    compiled for 1.20.11, module version = 19.1.0
[    14.164]    Module class: X.Org Video Driver
[    14.164]    ABI class: X.Org Video Driver, version 24.1
[    14.164] (II) LoadModule: "radeon"
[    14.164] (II) Loading /usr/lib64/xorg/modules/drivers/radeon_drv.so
[    14.168] (II) Module radeon: vendor="X.Org Foundation"
[    14.168]    compiled for 1.20.11, module version = 19.1.0
[    14.168]    Module class: X.Org Video Driver
[    14.168]    ABI class: X.Org Video Driver, version 24.1
[    14.168] (II) LoadModule: "modesetting"
[    14.168] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[    14.168] (II) Module modesetting: vendor="X.Org Foundation"
[    14.168]    compiled for 1.20.11, module version = 1.20.11
[    14.168]    Module class: X.Org Video Driver
[    14.168]    ABI class: X.Org Video Driver, version 24.1
[    14.168] (II) LoadModule: "fbdev"
[    14.168] (WW) Warning, couldn't open module fbdev
[    14.168] (EE) Failed to load module "fbdev" (module does not exist, 0)
[    14.168] (II) LoadModule: "vesa"
[    14.168] (WW) Warning, couldn't open module vesa
[    14.168] (EE) Failed to load module "vesa" (module does not exist, 0)
[    14.168] (II) RADEON: Driver for ATI/AMD Radeon chipsets:
   ATI Radeon Mobility X600 (M24), ATI FireMV 2400,
   ATI Radeon Mobility X300 (M24), ATI FireGL M24 GL,
   ATI Radeon X600 (RV380), ATI FireGL V3200 (RV380),
   ATI Radeon IGP320 (A3), ATI Radeon IGP330/340/350 (A4),
   ATI Radeon 9500, ATI Radeon 9600TX, ATI FireGL Z1, ATI Radeon 9800SE,
   ATI Radeon 9800, ATI FireGL X2, ATI Radeon 9600, ATI Radeon 9600SE,
   ATI Radeon 9600XT, ATI FireGL T2, ATI Radeon 9650, ATI FireGL RV360,
   ATI Radeon 7000 IGP (A4+), ATI Radeon 8500 AIW,
   ATI Radeon IGP320M (U1), ATI Radeon IGP330M/340M/350M (U2),
   ATI Radeon Mobility 7000 IGP, ATI Radeon 9000/PRO, ATI Radeon 9000,
   ATI Radeon X800 (R420), ATI Radeon X800PRO (R420),
   ATI Radeon X800SE (R420), ATI FireGL X3 (R420),
   ATI Radeon Mobility 9800 (M18), ATI Radeon X800 SE (R420),
   ATI Radeon X800XT (R420), ATI Radeon X800 VE (R420),
   ATI Radeon X850 (R480), ATI Radeon X850 XT (R480),
   ATI Radeon X850 SE (R480), ATI Radeon X850 PRO (R480),
   ATI Radeon X850 XT PE (R480), ATI Radeon Mobility M7,
   ATI Mobility FireGL 7800 M7, ATI Radeon Mobility M6,
   ATI FireGL Mobility 9000 (M9), ATI Radeon Mobility 9000 (M9),
   ATI Radeon 9700 Pro, ATI Radeon 9700/9500Pro, ATI FireGL X1,
   ATI Radeon 9800PRO, ATI Radeon 9800XT,
   ATI Radeon Mobility 9600/9700 (M10/M11),
   ATI Radeon Mobility 9600 (M10), ATI Radeon Mobility 9600 (M11),
   ATI FireGL Mobility T2 (M10), ATI FireGL Mobility T2e (M11),
   ATI Radeon, ATI FireGL 8700/8800, ATI Radeon 8500, ATI Radeon 9100,
   ATI Radeon 7500, ATI Radeon VE/7000, ATI ES1000,
   ATI Radeon Mobility X300 (M22), ATI Radeon Mobility X600 SE (M24C),
   ATI FireGL M22 GL, ATI Radeon X800 (R423), ATI Radeon X800PRO (R423),
   ATI Radeon X800LE (R423), ATI Radeon X800SE (R423),
   ATI Radeon X800 XTP (R430), ATI Radeon X800 XL (R430),
   ATI Radeon X800 SE (R430), ATI Radeon X800 (R430),
   ATI FireGL V7100 (R423), ATI FireGL V5100 (R423),
   ATI FireGL unknown (R423), ATI Mobility FireGL V5000 (M26),
   ATI Mobility Radeon X700 XL (M26), ATI Mobility Radeon X700 (M26),
   ATI Radeon X550XTX, ATI Radeon 9100 IGP (A5),
   ATI Radeon Mobility 9100 IGP (U3), ATI Radeon XPRESS 200,
   ATI Radeon XPRESS 200M, ATI Radeon 9250, ATI Radeon 9200,
   ATI Radeon 9200SE, ATI FireMV 2200, ATI Radeon X300 (RV370),
   ATI Radeon X600 (RV370), ATI Radeon X550 (RV370),
   ATI FireGL V3100 (RV370), ATI FireMV 2200 PCIE (RV370),
   ATI Radeon Mobility 9200 (M9+), ATI Mobility Radeon X800 XT (M28),
   ATI Mobility FireGL V5100 (M28), ATI Mobility Radeon X800 (M28),
   ATI Radeon X850, ATI unknown Radeon / FireGL (R480),
   ATI Radeon X800XT (R423), ATI FireGL V5000 (RV410),
   ATI Radeon X700 XT (RV410), ATI Radeon X700 PRO (RV410),
   ATI Radeon X700 SE (RV410), ATI Radeon X700 (RV410),
   ATI Radeon X1800, ATI Mobility Radeon X1800 XT,
   ATI Mobility Radeon X1800, ATI Mobility FireGL V7200,
   ATI FireGL V7200, ATI FireGL V5300, ATI Mobility FireGL V7100,
   ATI FireGL V7300, ATI FireGL V7350, ATI Radeon X1600, ATI RV505,
   ATI Radeon X1300/X1550, ATI Radeon X1550, ATI M54-GL,
   ATI Mobility Radeon X1400, ATI Radeon X1550 64-bit,
   ATI Mobility Radeon X1300, ATI Radeon X1300, ATI FireGL V3300,
   ATI FireGL V3350, ATI Mobility Radeon X1450,
   ATI Mobility Radeon X2300, ATI Mobility Radeon X1350,
   ATI FireMV 2250, ATI Radeon X1650, ATI Mobility FireGL V5200,
   ATI Mobility Radeon X1600, ATI Radeon X1300 XT/X1600 Pro,
   ATI FireGL V3400, ATI Mobility FireGL V5250,
   ATI Mobility Radeon X1700, ATI Mobility Radeon X1700 XT,
   ATI FireGL V5200, ATI Radeon X2300HD, ATI Mobility Radeon HD 2300,
   ATI Radeon X1950, ATI Radeon X1900, ATI AMD Stream Processor,
   ATI RV560, ATI Mobility Radeon X1900, ATI Radeon X1950 GT, ATI RV570,
   ATI FireGL V7400, ATI Radeon 9100 PRO IGP,
   ATI Radeon Mobility 9200 IGP, ATI Radeon X1200, ATI RS740,
   ATI RS740M, ATI Radeon HD 2900 XT, ATI Radeon HD 2900 Pro,
   ATI Radeon HD 2900 GT, ATI FireGL V8650, ATI FireGL V8600,
   ATI FireGL V7600, ATI Radeon 4800 Series, ATI Radeon HD 4870 x2,
   ATI Radeon HD 4850 x2, ATI FirePro V8750 (FireGL),
   ATI FirePro V7760 (FireGL), ATI Mobility RADEON HD 4850,
   ATI Mobility RADEON HD 4850 X2, ATI FirePro RV770,
   AMD FireStream 9270, AMD FireStream 9250, ATI FirePro V8700 (FireGL),
   ATI Mobility RADEON HD 4870, ATI Mobility RADEON M98,
   ATI FirePro M7750, ATI M98, ATI Mobility Radeon HD 4650,
   ATI Radeon RV730 (AGP), ATI Mobility Radeon HD 4670,
   ATI FirePro M5750, ATI RV730XT [Radeon HD 4670], ATI RADEON E4600,
   ATI Radeon HD 4600 Series, ATI RV730 PRO [Radeon HD 4650],
   ATI FirePro V7750 (FireGL), ATI FirePro V5700 (FireGL),
   ATI FirePro V3750 (FireGL), ATI Mobility Radeon HD 4830,
   ATI Mobility Radeon HD 4850, ATI FirePro M7740, ATI RV740,
   ATI Radeon HD 4770, ATI Radeon HD 4700 Series, ATI RV610,
   ATI Radeon HD 2400 XT, ATI Radeon HD 2400 Pro,
   ATI Radeon HD 2400 PRO AGP, ATI FireGL V4000, ATI Radeon HD 2350,
   ATI Mobility Radeon HD 2400 XT, ATI Mobility Radeon HD 2400,
   ATI RADEON E2400, ATI FireMV 2260, ATI RV670, ATI Radeon HD3870,
   ATI Mobility Radeon HD 3850, ATI Radeon HD3850,
   ATI Mobility Radeon HD 3850 X2, ATI Mobility Radeon HD 3870,
   ATI Mobility Radeon HD 3870 X2, ATI Radeon HD3870 X2,
   ATI FireGL V7700, ATI Radeon HD3690, AMD Firestream 9170,
   ATI Radeon HD 4550, ATI Radeon RV710, ATI Radeon HD 4350,
   ATI Mobility Radeon 4300 Series, ATI Mobility Radeon 4500 Series,
   ATI FirePro RG220, ATI Mobility Radeon 4330, ATI RV630,
   ATI Mobility Radeon HD 2600, ATI Mobility Radeon HD 2600 XT,
   ATI Radeon HD 2600 XT AGP, ATI Radeon HD 2600 Pro AGP,
   ATI Radeon HD 2600 XT, ATI Radeon HD 2600 Pro, ATI Gemini RV630,
   ATI Gemini Mobility Radeon HD 2600 XT, ATI FireGL V5600,
   ATI FireGL V3600, ATI Radeon HD 2600 LE,
   ATI Mobility FireGL Graphics Processor, ATI Radeon HD 3470,
   ATI Mobility Radeon HD 3430, ATI Mobility Radeon HD 3400 Series,
   ATI Radeon HD 3450, ATI Radeon HD 3430, ATI FirePro V3700,
   ATI FireMV 2450, ATI Radeon HD 3600 Series, ATI Radeon HD 3650 AGP,
   ATI Radeon HD 3600 PRO, ATI Radeon HD 3600 XT,
   ATI Mobility Radeon HD 3650, ATI Mobility Radeon HD 3670,
   ATI Mobility FireGL V5700, ATI Mobility FireGL V5725,
   ATI Radeon HD 3200 Graphics, ATI Radeon 3100 Graphics,
   ATI Radeon HD 3300 Graphics, ATI Radeon 3000 Graphics, SUMO, SUMO2,
   ATI Radeon HD 4200, ATI Radeon 4100, ATI Mobility Radeon HD 4200,
   ATI Mobility Radeon 4100, ATI Radeon HD 4290, ATI Radeon HD 4250,
   AMD Radeon HD 6310 Graphics, AMD Radeon HD 6250 Graphics,
   AMD Radeon HD 6300 Series Graphics,
   AMD Radeon HD 6200 Series Graphics, PALM, CYPRESS,
   ATI FirePro (FireGL) Graphics Adapter, AMD Firestream 9370,
   AMD Firestream 9350, ATI Radeon HD 5800 Series,
   ATI Radeon HD 5900 Series, ATI Mobility Radeon HD 5800 Series,
   ATI Radeon HD 5700 Series, ATI Radeon HD 6700 Series,
   ATI Mobility Radeon HD 5000 Series, ATI Mobility Radeon HD 5570,
   ATI Radeon HD 5670, ATI Radeon HD 5570, ATI Radeon HD 5500 Series,
   REDWOOD, ATI Mobility Radeon Graphics, CEDAR, ATI FirePro 2270,
   ATI Radeon HD 5450, CAYMAN, AMD Radeon HD 6900 Series,
   AMD Radeon HD 6900M Series, Mobility Radeon HD 6000 Series, BARTS,
   AMD Radeon HD 6800 Series, AMD Radeon HD 6700 Series, TURKS, CAICOS,
   ARUBA, TAHITI, PITCAIRN, VERDE, OLAND, HAINAN, BONAIRE, KABINI,
   MULLINS, KAVERI, HAWAII
[    14.169] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[    14.169] (EE) open /dev/dri/card0: No such file or directory
[    14.169] (WW) Falling back to old probe method for modesetting
[    14.169] (EE) open /dev/dri/card0: No such file or directory
[    14.169] (EE) Screen 0 deleted because of no matching config section.
[    14.169] (II) UnloadModule: "modesetting"
[    14.169] (EE) Device(s) detected, but none match those in the config file.
[    14.169] (EE)
Fatal server error:
[    14.169] (EE) no screens found(EE)
[    14.169] (EE)
Please consult the The X.Org Foundation support
    at http://wiki.x.org
 for help.
[    14.169] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    14.169] (EE)
[    14.169] (EE) Server terminated with error (1). Closing log file.


I have tried a variety of USE flags and kernel options, and I am open to suggestions.


Last edited by Sqeaky on Thu May 20, 2021 8:20 pm; edited 1 time in total
alamahant
Advocate


Joined: 23 Mar 2019
Posts: 3878

Posted: Thu May 06, 2021 10:06 pm

Two things come to mind:
1. Please use a dracut-made initramfs. Create it by invoking dracut with --early-microcode.
2. emerge linux-firmware
Code:

USE="initramfs" emerge -av linux-firmware

3. If you are uncertain about your kernel config, use a prebuilt, fully sweet and nice kernel like gentoo-kernel-bin.
4. Maybe temporarily remove any Xorg config from /etc/X11.
_________________
:)
Sqeaky
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

Posted: Thu May 06, 2021 11:41 pm

You say two things then give me four. Are the two: try a binary kernel and remove xorg configs?

1) I have no initramfs. Are you suggesting I need one? Nothing I had seen suggested I need one. Several sources said I could either have the amdgpu driver as a module with the firmware on the normal filesystem, or have them both built in. As near as I can tell, this hasn't made any difference for most people with similar issues. I am curious what your line of thinking is.

Why dracut? The Dracut wiki doesn't mention "amd" or firmware. So this seems like extra steps unless you intended this just for the binary kernel you suggested.

2) I have emerged linux-firmware and the vega20 firmware appears to be present:

Code:
$ ls /lib/firmware/amdgpu/vega20* -1
/lib/firmware/amdgpu/vega20_asd.bin
/lib/firmware/amdgpu/vega20_ce.bin
/lib/firmware/amdgpu/vega20_me.bin
/lib/firmware/amdgpu/vega20_mec2.bin
/lib/firmware/amdgpu/vega20_mec.bin
/lib/firmware/amdgpu/vega20_pfp.bin
/lib/firmware/amdgpu/vega20_rlc.bin
/lib/firmware/amdgpu/vega20_sdma1.bin
/lib/firmware/amdgpu/vega20_sdma.bin
/lib/firmware/amdgpu/vega20_smc.bin
/lib/firmware/amdgpu/vega20_sos.bin
/lib/firmware/amdgpu/vega20_ta.bin
/lib/firmware/amdgpu/vega20_uvd.bin
/lib/firmware/amdgpu/vega20_vce.bin


It is my understanding that this is what is included in the kernel by the build options I listed.

3) I will look into prebuilt kernels, but I am not looking to run prebuilt binaries. As a troubleshooting step it could rule out kernel issues. It seems like I almost have this working. A lot of people have similar issues and then, one parameter or setting later, have a working system.

4) I have added no Xorg configs; the docs said they were optional and none of the options looked helpful. I will check and see if anything autogenerated is present.
alamahant
Advocate


Joined: 23 Mar 2019
Posts: 3878

Posted: Fri May 07, 2021 12:01 am

Quote:

I have no initramfs. Are you suggesting I need one?

Since it is your setup that is problematic, you should take all the help you can get in booting your machine, even as a temporary solution. It will not pollute your system or make it slow when it is clearly super. If it boots you in, then you will be able to localize your problem.
Quote:

Why dracut?

Because it is awesome. It boots when others fail :)

Quote:

I will look into prebuilt kernels

Please do that. In the endless wheel of causes and conditions, maybe one innocuous little .config option missing here can create mayhem over there.

Some AMD config I got from my bloated kernel:
Code:

CONFIG_X86_AMD_PLATFORM_DEVICE=y
# CONFIG_MNATIVE_AMD is not set
CONFIG_CPU_SUP_AMD=y
CONFIG_X86_MCE_AMD=y
CONFIG_PERF_EVENTS_AMD_POWER=m
CONFIG_MICROCODE_AMD=y
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set
CONFIG_AMD_NUMA=y
CONFIG_X86_AMD_FREQ_SENSITIVITY=m
CONFIG_AMD_NB=y
CONFIG_KVM_AMD=m
CONFIG_KVM_AMD_SEV=y
CONFIG_PATA_AMD=m
CONFIG_NET_VENDOR_AMD=y
CONFIG_AMD8111_ETH=m
CONFIG_AMD_XGBE=m
CONFIG_AMD_XGBE_DCB=y
CONFIG_AMD_XGBE_HAVE_ECC=y
CONFIG_AMD_PHY=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
CONFIG_I2C_AMD_MP2=m
CONFIG_SPI_AMD=m
CONFIG_PINCTRL_AMD=m
CONFIG_GPIO_AMDPT=m
CONFIG_GPIO_AMD_FCH=m
CONFIG_GPIO_AMD8111=m
CONFIG_SENSORS_AMD_ENERGY=m
CONFIG_AGP_AMD64=m
CONFIG_DRM_AMDGPU=m
CONFIG_DRM_AMDGPU_SI=y
CONFIG_DRM_AMDGPU_CIK=y
CONFIG_DRM_AMDGPU_USERPTR=y
# CONFIG_DRM_AMDGPU_GART_DEBUGFS is not set
CONFIG_DRM_AMD_ACP=y
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
CONFIG_DRM_AMD_DC_HDCP=y
CONFIG_DRM_AMD_DC_SI=y
CONFIG_HSA_AMD=y
CONFIG_SND_SOC_AMD_ACP=m
CONFIG_SND_SOC_AMD_CZ_DA7219MX98357_MACH=m
CONFIG_SND_SOC_AMD_CZ_RT5645_MACH=m
CONFIG_SND_SOC_AMD_ACP3x=m
CONFIG_SND_SOC_AMD_RV_RT5682_MACH=m
CONFIG_SND_SOC_AMD_RENOIR=m
CONFIG_SND_SOC_AMD_RENOIR_MACH=m
# AMD SFH HID Support
CONFIG_AMD_SFH_HID=m
# end of AMD SFH HID Support
CONFIG_USB_AMD5536UDC=m
CONFIG_EDAC_AMD64=m
CONFIG_AMD_PMC=m
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=y
CONFIG_NTB_AMD=m
CONFIG_AMDTEE=m

Maybe this
Code:

CONFIG_MICROCODE_AMD=y

?
Code:

You say two things then give me four.

Oh, I am sorry for the miscalculation... good luck with your booting

:)
_________________
:)


Last edited by alamahant on Fri May 07, 2021 12:34 am; edited 2 times in total
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 4123
Location: Bavaria

Posted: Fri May 07, 2021 12:11 am

I am not an AMD expert, but IMHO you should be able to fix it without an initramfs as well. Your problem starts here (in dmesg):
Code:
[    3.495203] [drm:psp_hw_start] *ERROR* PSP load sysdrv failed!

When I googled this I found:
https://forum.level1techs.com/t/radeon-vii-not-initialising-psp-fails/143654
which says:
Quote:
I noticed you have the AMD IOMMU driver loaded. There are some serious issues with the AMD IOMMU driver combined with AMD cards running at 1x. Try running your kernel with iommu=soft to see if that solves the issue. This is one thing that will be different in Linux vs Windows.

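If you try that parameter, it may be worth confirming after the reboot that it really reached the kernel. A minimal sketch, assuming the parameter was added through whatever bootloader is in use; `has_kernel_param` is a made-up helper name:

```shell
#!/bin/sh
# Sketch: test whether a kernel command line contains a given parameter.
# has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    # $1 = parameter to look for (e.g. iommu=soft), $2 = command-line string
    # Padding with spaces makes the match whole-word, so "amd_iommu=soft"
    # does not count as "iommu=soft".
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# After rebooting with the parameter added in the bootloader:
#   has_kernel_param iommu=soft "$(cat /proc/cmdline)" && echo "iommu=soft is active"
```
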
Also:
https://forums.gentoo.org/viewtopic-t-1106980-start-0.html
which says you need the newest possible firmware, so maybe try: ACCEPT_KEYWORDS="~amd64" emerge -uv sys-kernel/linux-firmware


P.S.: Don't forget to "make" your kernel again after that, because if you don't work with modules, all the firmware files are built into the kernel as blobs (emerge alone doesn't help you).
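
One rough way to tell whether a rebuild is due is to compare timestamps: if any firmware file is newer than the kernel image, the image still embeds the old blobs. The sketch below is only a heuristic based on file modification times. `needs_rebuild` is a made-up helper name, and both paths in the usage comment are assumptions about the local layout.

```shell
#!/bin/sh
# Sketch: report whether any firmware file is newer than the kernel image.
# needs_rebuild is a hypothetical helper name.
# Note: the -nt test is not strictly POSIX, but bash, dash and busybox sh
# all support it.
needs_rebuild() {
    # $1 = kernel image, remaining arguments = firmware files
    image=$1
    shift
    for fw in "$@"; do
        if [ "$fw" -nt "$image" ]; then
            return 0
        fi
    done
    return 1
}

# Example (both paths are assumptions about the local layout):
#   needs_rebuild /usr/src/linux/arch/x86/boot/bzImage /lib/firmware/amdgpu/vega20_*.bin \
#       && echo "firmware changed: rebuild and reinstall the kernel"
```
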
Sqeaky
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

Posted: Fri May 07, 2021 2:01 am

I switched back to Gentoo after years of not understanding what was going on. The little details kept breaking things. I have zero interest in a solution that I do not understand, because I will be right back to troubleshooting when it breaks. When I try dracut, it will be after reading everything about it and understanding what it is doing.

For now I am focusing on understanding what I have done wrong.

Here are some excerpts from the dmesg of recent attempts:

Here it is with amdgpu built into the kernel:

Code:

[    3.492450] [drm:psp_hw_start] *ERROR* PSP load sysdrv failed!
[    3.492452] [drm:psp_hw_init] *ERROR* PSP firmware loading failed
[    3.492455] [drm:amdgpu_device_fw_loading] *ERROR* hw_init of IP block <psp> failed -22
[    3.492456] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    3.492464] tsc: Refined TSC clocksource calibration: 3493.437 MHz
[    3.492465] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    3.492477] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x325b1901053, max_idle_ns: 440795306184 ns
[    3.492481] amdgpu: probe of 0000:0c:00.0 failed with error -22


Here it is with no initramfs and amdgpu configured as a module:

Code:

[    1.350133] [drm] amdgpu kernel modesetting enabled.
[    1.350173] CRAT table disabled by module option
[    1.350174] Virtual CRAT table created for CPU
[    1.350181] amdgpu: Topology: Add CPU node
[    1.350239] checking generic (d0000000 1fa4000) vs hw (d0000000 10000000)
[    1.350239] fb0: switching to amdgpudrmfb from EFI VGA
[    1.350284] Console: switching to colour dummy device 80x25
[    1.350304] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    1.350333] amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
[    1.350372] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66AF 0x1002:0x081E 0xC1).
[    1.350373] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    1.350378] [drm] register mmio base: 0xFCD00000
[    1.350378] [drm] register mmio size: 524288
[    1.350395] [drm] add ip block number 0 <soc15_common>
[    1.350395] [drm] add ip block number 1 <gmc_v9_0>
[    1.350396] [drm] add ip block number 2 <vega10_ih>
[    1.350396] [drm] add ip block number 3 <psp>
[    1.350397] [drm] add ip block number 4 <gfx_v9_0>
[    1.350397] [drm] add ip block number 5 <sdma_v4_0>
[    1.350397] [drm] add ip block number 6 <powerplay>
[    1.350398] [drm] add ip block number 7 <dm>
[    1.350398] [drm] add ip block number 8 <uvd_v7_0>
[    1.350398] [drm] add ip block number 9 <vce_v4_0>
[    1.350409] amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from VFCT
[    1.350410] amdgpu: ATOM BIOS: 113-D3600200-106
[    1.350418] [drm] UVD(0) is enabled in VM mode
[    1.350418] [drm] UVD(1) is enabled in VM mode
[    1.350418] [drm] UVD(0) ENC is enabled in VM mode
[    1.350419] [drm] UVD(1) ENC is enabled in VM mode
[    1.350419] [drm] VCE enabled in VM mode
[    1.350432] amdgpu 0000:0c:00.0: amdgpu: HBM ECC is not presented.
[    1.350433] amdgpu 0000:0c:00.0: amdgpu: SRAM ECC is not presented.
[    1.350435] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    1.350441] amdgpu 0000:0c:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    1.350442] amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    1.350442] amdgpu 0000:0c:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    1.350447] [drm] Detected VRAM RAM=16368M, BAR=256M
[    1.350447] [drm] RAM width 4096bits HBM
[    1.350478] [TTM] Zone  kernel: Available graphics memory: 65923916 KiB
[    1.350478] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    1.350479] [TTM] Initializing pool allocator
[    1.350481] [TTM] Initializing DMA pool allocator
[    1.350501] [drm] amdgpu: 16368M of VRAM memory ready
[    1.350502] [drm] amdgpu: 16368M of GTT memory ready.
[    1.350504] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    1.350654] [drm] PCIE GART of 512M enabled (table at 0x0000008001FA4000).
[    1.351663] amdgpu: hwmgr_sw_init smu backed is vega20_smu
[    1.351689] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
[    1.351691] [drm] PSP loading UVD firmware
[    1.352236] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
[    1.352239] [drm] PSP loading VCE firmware
SNIPPED
 [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    4.320191] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    4.320236] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[    4.320237] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    4.320250] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    4.320271] amdgpu: probe of 0000:0c:00.0 failed with error -22


Both look really similar.
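
To compare attempts quickly, the failing IP block and its error code can be pulled out of a saved dmesg. This is only a rough sketch: the function name failed_ip_blocks is made up for illustration, and it assumes nothing beyond the "hw_init of IP block <name> failed <code>" wording seen in the logs above.

```shell
# Print "<block> <error code>" for every amdgpu IP block whose hw_init failed.
# Reads kernel log text on stdin.
failed_ip_blocks() {
    sed -n 's/.*hw_init of IP block <\([^>]*\)> failed \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}

# Usage (hypothetical): dmesg | failed_ip_blocks
```

On Linux, error 22 is EINVAL, so the output here only confirms which block gave up, not why.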

EDIT - Alamahant, I am checking that config you provided, I will see if it has something I don't.


Last edited by Sqeaky on Fri May 07, 2021 2:08 am; edited 1 time in total
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Fri May 07, 2021 2:03 am    Post subject: Reply with quote

and I just saw Pietinger. Hmmm....

I saw a similar message on IOMMU issues. I tried enabling and disabling a few IOMMU options, and I tried adding "IOMMU=soft" as a kernel param. None of that worked.

I will try the newer firmware. I had not considered that it might be stale.

EDIT - Actually none of the firmware is masked: https://packages.gentoo.org/packages/sys-kernel/linux-firmware
Ralphred
Guru
Guru


Joined: 31 Dec 2013
Posts: 493

PostPosted: Fri May 07, 2021 7:22 am    Post subject: Reply with quote

Firstly
Code:
CONFIG_DRM_AMDGPU_SI=y
CONFIG_DRM_AMDGPU_CIK=y
are for Southern Islands and Sea Islands cards respectively, so you can lose these.
Second
Code:
CONFIG_MICROCODE_AMD=y
is for your CPU, so you should have
Code:
CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd_fam17h.bin [more stuff]"

Whilst messing around with firmware, set amdgpu to module, it'll save you having to recompile every time you change/update the firmware. Once you get a clean fw load, recompile with =y and add the firmware to CONFIG_EXTRA_FIRMWARE.
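
Before rebuilding with =y it can be worth checking that every file named in CONFIG_EXTRA_FIRMWARE really exists under the firmware directory. A minimal sketch, assuming the list sits on a single CONFIG_EXTRA_FIRMWARE="..." line; the function name and the paths in the usage comment are illustrative, not anything standard.

```shell
# List every file named in CONFIG_EXTRA_FIRMWARE that is missing from the
# firmware directory. Arguments: path to .config, firmware directory.
# The body runs in a subshell "( )" so its variables stay local.
missing_extra_firmware() (
    config="$1"
    fw_dir="$2"
    # Extract the quoted, space-separated list from the .config line.
    list=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$config")
    for fw in $list; do
        [ -e "$fw_dir/$fw" ] || printf '%s\n' "$fw"
    done
)

# Usage (paths are assumptions):
#   missing_extra_firmware /usr/src/linux/.config /lib/firmware
```

No output means every listed file was found. It says nothing about whether the files are the right versions for the card.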
After that you're going to want to lose the VIDEO_CARDS="radeon" and solve the dependency conflict, or have an xorg.conf that explicitly uses the amdgpu driver, because most of the errors were xorg looking for cards you don't have.
A small working vega10 xorg.conf:
Code:
Section "ServerLayout"
        Identifier     "X.org Configured"
        Screen      0  "Screen0" 0 0
        InputDevice    "Mouse0" "CorePointer"
        InputDevice    "Keyboard0" "CoreKeyboard"
        Option         "DisableVidModeExtension"  "True" #for freesync
EndSection

Section "Files"
        ModulePath   "/usr/lib64/xorg/modules"
        FontPath     "/usr/share/fonts/misc/"
        FontPath     "/usr/share/fonts/TTF/"
        FontPath     "/usr/share/fonts/OTF/"
        FontPath     "/usr/share/fonts/Type1/"
        FontPath     "/usr/share/fonts/100dpi/"
        FontPath     "/usr/share/fonts/75dpi/"
EndSection

Section "Module"
        Load  "glx"
EndSection

Section "InputClass"
  Identifier "keyboard"
  MatchIsKeyboard "yes"
  Option            "XkbModel" "pc105"
  Option            "XkbLayout" "gb"
  Option            "XkbOptions" "ctrl:nocaps, terminate:ctrl_alt_bksp, nodeadkeys"
EndSection

Section "InputDevice"
        Identifier  "Keyboard0"
        Driver      "kbd"
EndSection

Section "InputDevice"
        Identifier  "Mouse0"
        Driver      "mouse"
        Option      "Protocol" "auto"
        Option      "Device" "/dev/input/mice"
        Option      "ZAxisMapping" "4 5 6 7"
EndSection

Section "Monitor"
        Identifier   "DisplayPort-0"
        #Identifier   "HDMI-A-0"
        #Identifier   "DisplayPort-1"
        #Identifier   "HDMI-A-1"
        VendorName   "Vendor"
        ModelName    "Model"
        Modeline     "1080pSeventyFive"  170.00  1920 1928 1960 2026  1080 1105 1113 1119 +hsync -vsync
        Modeline     "1080pSixty"  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync
        Option "Primary"
EndSection

Section "Device"
        Identifier  "Card0"
        Driver      "amdgpu"
        BusID       "PCI:11:0:0"
        Option      "VariableRefresh" "True" #for freesync
EndSection

Section "Screen"
        Identifier "Screen0"
        Device     "Card0"
        Monitor    "DisplayPort-0"
        SubSection "Display"
                Viewport   0 0
                Depth      24
                Modes      "1080pSeventyFive"
                #Modes      "1080pSixty"
        EndSubSection
EndSection
The important part is the BusID in the Device section, make it point to your card as shown in lspci (converted to decimal, not hex).
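
The hex-to-decimal conversion can be sketched like this. The function name lspci_to_busid is made up for illustration, and it assumes the usual lspci address form bus:device.function, optionally prefixed with a PCI domain such as 0000:.

```shell
# Convert an lspci address (hex, e.g. "0c:00.0" or "0000:0c:00.0") into the
# decimal "PCI:bus:device:function" form that xorg.conf expects for BusID.
lspci_to_busid() (
    addr="$1"
    # Drop a leading PCI domain if the address has one (two colons).
    case "$addr" in
        (*:*:*) addr="${addr#*:}" ;;
    esac
    bus="${addr%%:*}"
    rest="${addr#*:}"
    dev="${rest%%.*}"
    fn="${rest#*.}"
    # printf reads the 0x-prefixed values as hex and prints them as decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
)

# Usage: lspci_to_busid 0c:00.0
```

For a card that lspci lists at 0c:00.0 this gives PCI:12:0:0.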
Also switch your modeline to something supported by your monitor, or take it out completely and let xorg do the work with EDID, and set the Identifier in the Monitor section to match the port you use.

If you still don't get anywhere, pastebin your .config so we can check for driver conflicts.
Goverp
Veteran
Veteran


Joined: 07 Mar 2007
Posts: 1992

PostPosted: Fri May 07, 2021 9:01 am    Post subject: Reply with quote

Sqeaky, I note in your Xorg log that it says:
Code:
[    14.164] (II) LoadModule: "radeon"
[    14.164] (II) Loading /usr/lib64/xorg/modules/drivers/radeon_drv.so
[    14.168] (II) Module radeon: vendor="X.Org Foundation"
[    14.168]    compiled for 1.20.11, module version = 19.1.0
[    14.168]    Module class: X.Org Video Driver
[    14.168]    ABI class: X.Org Video Driver, version 24.1
[    14.168] (II) LoadModule: "modesetting"
[    14.168] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[    14.168] (II) Module modesetting: vendor="X.Org Foundation"
[    14.168]    compiled for 1.20.11, module version = 1.20.11
[    14.168]    Module class: X.Org Video Driver
[    14.168]    ABI class: X.Org Video Driver, version 24.1
[    14.168] (II) LoadModule: "fbdev"
[    14.168] (WW) Warning, couldn't open module fbdev
[    14.168] (EE) Failed to load module "fbdev" (module does not exist, 0)
[    14.168] (II) LoadModule: "vesa"
[    14.168] (WW) Warning, couldn't open module vesa
[    14.168] (EE) Failed to load module "vesa" (module does not exist, 0)
[    14.168] (II) RADEON: Driver for ATI/AMD Radeon chipsets:
   ATI Radeon Mobility X600 (M24), ATI FireMV 2400,
...

I think that's where it's going wrong. AFAIR it should only be loading amdgpu, which now just says it works for all supported chipsets, rather than spamming the log with everything AMD has ever made!
_________________
Greybeard
Goverp
Veteran
Veteran


Joined: 07 Mar 2007
Posts: 1992

PostPosted: Fri May 07, 2021 5:03 pm    Post subject: Reply with quote

Here's a similar bit from the Xorg log for my RX470 using amdgpu:
Code:
[     5.588] (II) LoadModule: "amdgpu"
[     5.588] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
[     5.590] (II) Module amdgpu: vendor="X.Org Foundation"
[     5.590]    compiled for 1.20.11, module version = 19.1.0
[     5.590]    Module class: X.Org Video Driver
[     5.590]    ABI class: X.Org Video Driver, version 24.1
[     5.590] (II) AMDGPU: Driver for AMD Radeon:
        All GPUs supported by the amdgpu kernel driver
[     5.600] (II) AMDGPU(0): [KMS] Kernel modesetting enabled.

and here are radeon and amd bits from my settings:
Code:
CONFIG_X86_AMD_PLATFORM_DEVICE=y
CONFIG_CPU_SUP_AMD=y
CONFIG_X86_MCE_AMD=y
CONFIG_PERF_EVENTS_AMD_POWER=m
CONFIG_MICROCODE_AMD=y
# CONFIG_AMD_MEM_ENCRYPT is not set
CONFIG_X86_AMD_FREQ_SENSITIVITY=y
CONFIG_AMD_NB=y
CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd_fam17h.bin amdgpu/polaris10_ce.bin amdgpu/polaris10_ce_2.bin amdgpu/polaris10_k2_smc.bin amdgpu/polaris10_k_mc.bin amdgpu/polaris10_k_smc.bin amdgpu/polaris10_mc.bin amdgpu/polaris10_me.bin amdgpu/polaris10_me_2.bin amdgpu/polaris10_mec.bin amdgpu/polaris10_mec2.bin amdgpu/polaris10_mec2_2.bin amdgpu/polaris10_mec_2.bin amdgpu/polaris10_pfp.bin amdgpu/polaris10_pfp_2.bin amdgpu/polaris10_rlc.bin amdgpu/polaris10_sdma.bin amdgpu/polaris10_sdma1.bin amdgpu/polaris10_smc.bin amdgpu/polaris10_smc_sk.bin amdgpu/polaris10_uvd.bin amdgpu/polaris10_vce.bin"
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_AMD_PHY is not set
CONFIG_HW_RANDOM_AMD=y
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_AMD_MP2 is not set
# CONFIG_PINCTRL_AMD is not set
CONFIG_SENSORS_AMD_ENERGY=m
# CONFIG_DRM_RADEON is not set
CONFIG_DRM_AMDGPU=y
# CONFIG_DRM_AMDGPU_SI is not set
# CONFIG_DRM_AMDGPU_CIK is not set
CONFIG_DRM_AMDGPU_USERPTR=y
CONFIG_DRM_AMD_ACP=y
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
# CONFIG_DRM_AMD_DC_HDCP is not set
# CONFIG_DRM_AMD_DC_SI is not set
CONFIG_HSA_AMD=y
# CONFIG_FB_RADEON is not set
# AMD SFH HID Support
CONFIG_AMD_SFH_HID=y
# end of AMD SFH HID Support
CONFIG_EDAC_AMD64=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=y

and finally
Code:
/etc/portage/make.conf/01acer.make:VIDEO_CARDS="amdgpu radeonsi"

/etc/portage/package.accept_keywords/01acer.keywords:x11-drivers/xf86-video-amdgpu
/etc/portage/package.use/01acer.use:x11-libs/libdrm video_cards_radeon

I have no Xorg config
_________________
Greybeard
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Mon May 10, 2021 8:23 pm    Post subject: Reply with quote

I have removed the radeon SI and CIK kernel options. I added the CPU microcode bins to my kernel. I figured it wasn't important now and could wait, but why not fix a thing while it is convenient?

I removed "radeon" from VIDEO_CARDS in my make.conf and added "x11-libs/libdrm video_cards_radeon" to my package.use, and that seems to have resolved my dependency conflicts.

I rebuilt with the added CPU microcode.

It booted to a terminal login, no X desktop. I will try to troubleshoot for a bit, then get back here when I next get stuck.

EDIT -

Going through dmesg, the microcode and spectre mitigation stuff looks the same. I do like the idea of having it in the kernel anyway, so I will leave it in until there is a compelling reason to remove it.

The load errors appear to be missing but I am still checking that.

EDIT 2 -

This thing related to IOMMUs changed, I have no clue what this means:
Code:
Old: iommu: Default domain type: Translated
New: iommu: Default domain type: Passthrough


EDIT 3 -

Reading up: https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit

Mellanox makes high perf networking devices, and suggests passthrough: https://community.mellanox.com/s/article/understanding-the-iommu-linux-grub-file-configuration

Lenovo docs on IOMMU virtualization, which allows arbitrary devices to be used performantly in guest VMs: https://community.mellanox.com/s/article/understanding-the-iommu-linux-grub-file-configuration

OK, I think an IOMMU is a way/system to make the physical VRAM in my GPU addressable via virtual addresses in the same address space as the rest of normal memory and devices. This has some overhead compared to plain DMA, but only something like a single table lookup, and it allows isolation of main RAM from VRAM so a malicious device or firmware couldn't silently read/write memory it hasn't been allowed to.

So passthrough is some way to streamline this translation by passing it through to dedicated hardware, and translation is a normal table lookup, presumably in software. This seems like the hardware acceleration in normal address virtualization that every modern CPU has.

EDIT 4 -

I also removed AMD IOMMUv2 support from my kernel and this disappeared:

Code:
AMD-Vi: AMD IOMMUv2 functionality not available on this system


So that is good, but not likely the source of any real issues.

EDIT 5 -

Nothing about amdgpu is present in the new dmesg at all and I don't know why?! I will focus on this.
pietinger
Moderator
Moderator


Joined: 17 Oct 2006
Posts: 4123
Location: Bavaria

PostPosted: Mon May 10, 2021 10:22 pm    Post subject: Reply with quote

Sqeaky wrote:
I also removed AMD IOMMUv2 support from my kernel [...]

You shouldn't do this. It is important for security (I have an Intel CPU and of course I have IOMMU (for Intel) built into the kernel).
Ralphred
Guru
Guru


Joined: 31 Dec 2013
Posts: 493

PostPosted: Mon May 10, 2021 10:53 pm    Post subject: Reply with quote

Sqeaky wrote:
Nothing about amdgpu is present in the new dmesg at all and I don't know why?! I will focus on this.

That's another reason to build it as a module: if it doesn't get auto-loaded, it's time to start trawling through kernel source code with your device ID and see which module it really wants to load.
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Mon May 10, 2021 11:34 pm    Post subject: Reply with quote

I still have the v1 IOMMU enabled, and I somehow didn't have the amdgpu driver selected. I included it and I am trying a few things.
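
A quick way to catch a deselected driver before rebooting is to ask the .config directly. A minimal sketch; the helper name kconfig_state is invented here, and it only understands bool/tristate options (anything not set to y or m is reported as n).

```shell
# Print y, m, or n for a kernel option. Arguments: path to .config, option
# name without the CONFIG_ prefix. Options that are absent or "is not set"
# count as n. The body runs in a subshell "( )" so its variables stay local.
kconfig_state() (
    config="$1"
    opt="CONFIG_$2"
    value=$(sed -n "s/^$opt=\([ym]\)\$/\1/p" "$config")
    printf '%s\n' "${value:-n}"
)

# Usage (path is an assumption): kconfig_state /usr/src/linux/.config DRM_AMDGPU
```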
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Tue May 11, 2021 1:13 am    Post subject: Reply with quote

I had another minor related issue. When "AMD GPU" -> "Always enable userptr write support" is enabled I get the following, so now it is disabled:

Code:
udevd[808]: starting version 3.2.10
[    4.033449] random: udevd: uninitialized urandom read (16 bytes read)
[    4.047659] udevd[808]: starting eudev-3.2.10
[    4.056814] BUG: kernel NULL pointer dereference, address: 0000000000000008
[    4.056819] #PF: supervisor read access in kernel mode
[    4.056821] #PF: error_code(0x0000) - not-present page
[    4.056823] PGD 0 P4D 0
[    4.056827] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    4.056829] CPU: 6 PID: 842 Comm: udevadm Not tainted 5.10.27-gentoo-Sqeaky #61
[    4.056832] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4 WiFi ax, BIOS P3.90 01/26/2021
[    4.056838] RIP: 0010:sysfs_kf_write+0x24/0x40
[    4.056841] Code: 00 00 00 0f 1f 00 48 89 d1 48 8b 17 48 8b 42 08 48 8b 78 60 48 8b 47 28 48 85 c0 74 04 48 8b 40 08 48 85 c9 74 13 4c 8b 42 60 <48> 8b 40 08 48 89 f2 4c 89 c6 e9 5d 21 d6 00 31 c0 c3 66 2e 0f 1f
[    4.056845] RSP: 0018:ffffbf5ec3097e00 EFLAGS: 00010206
[    4.056848] RAX: 0000000000000000 RBX: ffffbf5ec3097e78 RCX: 0000000000000003
[    4.056851] RDX: ffffa28803494100 RSI: ffffa28800f992c0 RDI: ffffa28803368558
[    4.056853] RBP: ffffa28800f992c0 R08: ffffffffb47aa260 R09: 0000000000000000
[    4.056854] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[    4.056857] R13: fffffffffffffff2 R14: ffffa28800ffd9a0 R15: ffffa28800ffd980
[    4.056859] FS:  00007fc35f93d780(0000) GS:ffffa2a6beb80000(0000) knlGS:0000000000000000
[    4.056862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.056863] CR2: 0000000000000008 CR3: 0000000106c22000 CR4: 0000000000350ee0
[    4.056866] Call Trace:
[    4.056869]  kernfs_fop_write_iter+0x144/0x1d0
[    4.056873]  new_sync_write+0x117/0x1b0
[    4.056876]  vfs_write+0x246/0x2e0
[    4.056878]  ksys_write+0x6b/0x100
[    4.056882]  ? fpregs_assert_state_consistent+0x19/0x40
[    4.056886]  do_syscall_64+0x33/0x40
[    4.056890]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    4.056893] RIP: 0033:0x7fc35fb701a3
[    4.056895] Code: 64 89 02 48 c7 c0 ff ff ff ff eb af 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[    4.056899] RSP: 002b:00007ffd73fd7b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[    4.056902] RAX: ffffffffffffffda RBX: 00007ffd73fd7b90 RCX: 00007fc35fb701a3
[    4.056904] RDX: 0000000000000003 RSI: 00007ffd73fd8928 RDI: 0000000000000003
[    4.056906] RBP: 0000564cfb31f710 R08: 0000000000000000 R09: 0000000000000001
[    4.056908] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[    4.056910] R13: 00007ffd73fd8928 R14: 0000564cfb3152a0 R15: 00007ffd73fd7ff0
[    4.056912] Modules linked in:
[    4.056916] CR2: 0000000000000008
[    4.056919] ---[ end trace e534f3b799a7dd83 ]---
[    4.056922] RIP: 0010:sysfs_kf_write+0x24/0x40
[    4.056924] Code: 00 00 00 0f 1f 00 48 89 d1 48 8b 17 48 8b 42 08 48 8b 78 60 48 8b 47 28 48 85 c0 74 04 48 8b 40 08 48 85 c9 74 13 4c 8b 42 60 <48> 8b 40 08 48 89 f2 4c 89 c6 e9 5d 21 d6 00 31 c0 c3 66 2e 0f 1f
[    4.056928] RSP: 0018:ffffbf5ec3097e00 EFLAGS: 00010206
[    4.056931] RAX: 0000000000000000 RBX: ffffbf5ec3097e78 RCX: 0000000000000003
[    4.056933] RDX: ffffa28803494100 RSI: ffffa28800f992c0 RDI: ffffa28803368558
[    4.056935] RBP: ffffa28800f992c0 R08: ffffffffb47aa260 R09: 0000000000000000
[    4.056937] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[    4.056939] R13: fffffffffffffff2 R14: ffffa28800ffd9a0 R15: ffffa28800ffd980
[    4.056941] FS:  00007fc35f93d780(0000) GS:ffffa2a6beb80000(0000) knlGS:0000000000000000
[    4.056944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.056946] CR2: 0000000000000008 CR3: 0000000106c22000 CR4: 0000000000350ee0
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Tue May 11, 2021 7:03 pm    Post subject: Reply with quote

I have done just about everything I can think of.

The last thing I tried was getting the latest firmware from upstream:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

I downloaded the snapshot from earlier today and put the contents in /lib/firmware (moving the previous contents to a backup).

Making the amdgpu option a module or not didn't significantly change its behavior. The messages moved around in dmesg, but they were always similar.

I tried creating a /etc/X11/xorg.conf.d/radeon.conf. I just changed the identifier from "radeon" to "RadeonVII" to make future logs more clear:
Code:
Section "Device"
   Identifier "RadeonVII"
   Driver "amdgpu"
EndSection


I don't think it is an X issue, it really seems like a kernel issue to me because of this error in my dmesg:
Code:
[    4.360268] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    4.360307] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    4.360343] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[    4.360344] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    4.360359] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    4.360384] amdgpu: probe of 0000:0c:00.0 failed with error -22


Here are the current dmesg, kernel config, and Xorg.0.log from my 11th trial since yesterday morning.
https://gist.github.com/Sqeaky/61cc2b9c76ef41a07efd8af8beea6b6b

EDIT - On my 12th try I re-added Southern Islands and Sea Islands support. No significant change.
https://gist.github.com/Sqeaky/526d713256fed0277e406a51e9b78f2e


Last edited by Sqeaky on Tue May 11, 2021 8:27 pm; edited 1 time in total
Zucca
Moderator
Moderator


Joined: 14 Jun 2007
Posts: 3331
Location: Rasi, Finland

PostPosted: Tue May 11, 2021 7:38 pm    Post subject: Reply with quote

Have you tried a 5.4-series kernel?
I've had some strange problems with the 5.10 series. Not the same problems, but still, problems.
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
Ralphred
Guru
Guru


Joined: 31 Dec 2013
Posts: 493

PostPosted: Tue May 11, 2021 8:11 pm    Post subject: Reply with quote

I've been over your kernel config (I'm lazy and skipped over 5.10.27, so I just dumped the .config in the directory and used make menuconfig). The only difference I can see is that I have:
Code:
CONFIG_FW_LOADER_PAGED_BUF=y
# CONFIG_FW_LOADER_USER_HELPER is not set
CONFIG_FW_LOADER_COMPRESS=y

Zucca's comment about trying 5.4 kernels makes sense, I double checked some of my older kernel sources and they did work (namely 5.4.48 is the highest 5.4 I have). As I said before, the only 5.10 source I have is the same as yours, and I never used it until now, certainly never built or run it. Zucca's comment does ring bells about me "leaving it alone for now as everything's working", then I forgot why I skipped it until a 5.11 hit the tree (I was running 5.9.x as ~ for a while IIRC too, don't remember why though, or why I went back to 5.4)

One last thing, put BusID "PCI:12:0:0" in the Device section of your /etc/X11/xorg.conf.d/radeon.conf file, just to be sure.
Sqeaky
Tux's lil' helper
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Tue May 11, 2021 8:31 pm    Post subject: Reply with quote

I have not tried a 5.4 kernel. This is a fresh install. I will try that.

It is probably well documented how to get a specific kernel version, right? I am sure I can figure it out.

EDIT - Wasn't hard to figure out:
Code:
emerge =sys-kernel/gentoo-sources-5.4.109


Now I will get to building it

EDIT 2 - Zucca, Thanks for Never giving me up and never letting me down :(

EDIT 3 -

Using the 5.4 kernel I get largely the same results: a black boot screen that looks like an EFI framebuffer, which fails to hand over to the amdgpu framebuffer.

I haven't read all the logs yet, but the amdgpu driver still has this error. Maybe it doesn't matter, but I will check the rest:

Code:
[    2.354699] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    2.354740] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    2.354776] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[    2.354778] amdgpu 0000:0c:00.0: amdgpu_device_ip_init failed
[    2.354780] amdgpu 0000:0c:00.0: Fatal error during GPU init
[    2.354787] [drm] amdgpu: finishing device.


here are the logs from trial 13 with the Gentoo 5.4.109 kernel sources: https://gist.github.com/Sqeaky/3d5ee8155f11f2bf9b68a517b263042d

Edit 4 - The system is still not starting correctly, but I can reboot/halt it from the command line (via ssh) and that didn't work before.

If that failure to initialize doesn't matter then maybe the Xorg config is the issue:

Code:
[    10.971] (II) AMDGPU: Driver for AMD Radeon:
   All GPUs supported by the amdgpu kernel driver
[    10.971] (II) AMDGPU(0): [KMS] drm report modesetting isn't supported.


here are the logs from trial 14 and I haven't tried yet: https://gist.github.com/Sqeaky/7665300b90730786b205b3a74d11d7d3

EDIT 5-

I have 4 of these kernel stack traces in dmesg

Code:
[    2.508549] ------------[ cut here ]------------
[    2.508549] Memory manager not clean during takedown.
[    2.508554] WARNING: CPU: 22 PID: 848 at drivers/gpu/drm/drm_mm.c:939 drm_mm_takedown+0x1b/0x20
[    2.508554] Modules linked in: snd_usb_audio(+) snd_usbmidi_lib snd_rawmidi snd_seq_device usbhid kvm_amd kvm irqbypass crct10dif_pclmul wmi_bmof crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel amdgpu(+) crypto_simd iwlmvm cryptd glue_helper mfd_core gpu_sched pcspkr ttm backlight igb ahci dca iwlwifi libahci libata wmi
[    2.508561] CPU: 22 PID: 848 Comm: udevd Tainted: G        W         5.4.109-gentoo-Sqeaky #1
[    2.508561] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4 WiFi ax, BIOS P3.90 01/26/2021
[    2.508562] RIP: 0010:drm_mm_takedown+0x1b/0x20
[    2.508562] Code: b4 24 d0 00 00 00 eb a6 0f 1f 80 00 00 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 02 f3 c3 48 c7 c7 28 b4 1d a6 e8 23 81 47 00 <0f> 0b c3 66 90 41 57 49 89 f7 41 56 49 89 fe 41 55 41 54 55 53 4c
[    2.508563] RSP: 0018:ffffb41344c1faa8 EFLAGS: 00010286
[    2.508563] RAX: 0000000000000000 RBX: ffff8af275ee50e0 RCX: 000000000000040b
[    2.508563] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa697202c
[    2.508564] RBP: ffff8af275ee4f50 R08: 0000000000000001 R09: 000000000000040b
[    2.508564] R10: 0000000000015ca4 R11: 0000000000000001 R12: ffff8af2762f4d00
[    2.508566] hid-generic 0003:24F0:0140.0002: input,hidraw1: USB HID v1.10 Keyboard [Metadot - Das Keyboard Das Keyboard] on usb-0000:0e:00.3-1.4/input0
[    2.508567] R13: ffff8af275ee50c0 R14: 0000000000000170 R15: 0000000000000000
[    2.508568] FS:  00007fd921551740(0000) GS:ffff8af27ed80000(0000) knlGS:0000000000000000
[    2.508569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.508569] CR2: 00007fd9218cf3a9 CR3: 0000001fb2b76000 CR4: 0000000000340ee0
[    2.508569] Call Trace:
[    2.508615]  amdgpu_vram_mgr_fini+0x31/0xb0 [amdgpu]
[    2.508617]  ttm_bo_clean_mm+0xc7/0xe0 [ttm]
[    2.508653]  amdgpu_ttm_fini+0x78/0xe0 [amdgpu]
[    2.508655] hid-generic 0003:1BCF:0005.0003: input,hidraw2: USB HID v1.10 Mouse [USB Optical Mouse] on usb-0000:07:00.1-2.3/input0
[    2.508689]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[    2.508729]  gmc_v9_0_sw_fini+0x118/0x180 [amdgpu]
[    2.508765]  ? amdgpu_sa_bo_manager_fini+0x7a/0x90 [amdgpu]
[    2.508811]  amdgpu_device_fini+0x23b/0x42e [amdgpu]
[    2.508846]  amdgpu_driver_unload_kms+0x50/0xb0 [amdgpu]
[    2.508891]  amdgpu_driver_load_kms.cold+0x39/0x5b [amdgpu]
[    2.508893]  drm_dev_register+0x13b/0x180
[    2.508927]  amdgpu_pci_probe+0x105/0x160 [amdgpu]
[    2.508928]  ? __pm_runtime_resume+0x63/0x90
[    2.508930]  local_pci_probe+0x4b/0x90
[    2.508931]  ? _cond_resched+0x11/0x40
[    2.508932]  pci_device_probe+0xd5/0x170
[    2.508933]  really_probe+0x101/0x410
[    2.508934]  driver_probe_device+0x59/0xd0
[    2.508935]  device_driver_attach+0xb6/0xc0
[    2.508936]  __driver_attach+0x80/0x120
[    2.508936]  ? device_driver_attach+0xc0/0xc0
[    2.508937]  bus_for_each_dev+0x75/0xc0
[    2.508938]  bus_add_driver+0x12c/0x1e0
[    2.508939]  driver_register+0x86/0xd0
[    2.508939]  ? 0xffffffffc059d000
[    2.508940]  do_one_initcall+0x41/0x1e0
[    2.508941]  ? _cond_resched+0x11/0x40
[    2.508943]  ? kmem_cache_alloc_trace+0x4a/0x1d0
[    2.508944]  do_init_module+0x56/0x240
[    2.508944]  __do_sys_finit_module+0xfd/0x120
[    2.508945]  do_syscall_64+0x3e/0x80
[    2.508947]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    2.508947] RIP: 0033:0x7fd92178e089
[    2.508948] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d af 2d 0c 00 f7 d8 64 89 01 48
[    2.508949] RSP: 002b:00007ffcc8387a88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    2.508949] RAX: ffffffffffffffda RBX: 000055e46a5e75b0 RCX: 00007fd92178e089
[    2.508949] RDX: 0000000000000000 RSI: 00007fd92186bab5 RDI: 000000000000000e
[    2.508950] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055e46a5df100
[    2.508950] R10: 000000000000000e R11: 0000000000000246 R12: 00007fd92186bab5
[    2.508950] R13: 0000000000000000 R14: 000055e46a5eae50 R15: 000055e46a5e75b0
[    2.508951] ---[ end trace e6fbe5986f4d60b5 ]---
[    2.508954] ------------[ cut here ]------------


EDIT 6 -

I have switched back to the firmware from the emerged linux-firmware package. This is trial 15, and I haven't checked the results at all. https://gist.github.com/Sqeaky/d17718bc7feee41b3d5421b0f14ba8f5

EDIT 7 -

I am convinced this GPU is good; it worked the last day I used it on Ubuntu. But now I am starting to doubt everything, so I set the PCIe generation to 3 (it should be defaulting to 4).

Here is trial 16 dmesg, xorg log and kernel config: https://gist.github.com/Sqeaky/f35633cd532aa0b22d09b8f7368b8e6a

EDIT 8 -
I am disabling ACP, the AMDGPU Audio CoProcessor. I don't want the sound coming out of HDMI anyway, and I am enabling DC 2.1, which wasn't enabled.

https://gist.github.com/Sqeaky/eac1d3f5cada0b0f302a542f01c5b898

no apparent effect.

EDIT 9 -

Adding MTRR cleanup and removing all but vega DC, no apparent effect: https://gist.github.com/Sqeaky/233a923aef6883dcee2e5c830f8ad187

EDIT 10 -

Trying kernel 5.12.2, unmasked as ~amd64 in /etc/portage/package.accept_keywords. I enabled some more DC options and re-enabled IOMMUv2.

A lot of reading from others indicates that these amdgpu stack traces are sometimes race conditions. I can do little to affect this, but I can switch the pre-emption model from desktop (voluntary pre-emption) to server (no pre-emption).

Stack traces are gone now, but I am still getting a black boot screen and still have the PSP load error:

Code:
[    3.599896] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    3.600037] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    3.600156] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[    3.600272] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    3.600288] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    3.600318] amdgpu: probe of 0000:0c:00.0 failed with error -22


dmesg, xorg log and kernel config for trial 19: https://gist.github.com/Sqeaky/03303a93f705970419891ce9d3efc639

EDIT 12 -

I am trying the new firmware again - trial 20 since Monday morning - no significant change, skipping logs.

I am trying adding "amdgpu.dpm=0 amdgpu.dc=0 iommu=soft pci=noats" to the kernel command line - trial 21 - logs: https://gist.github.com/Sqeaky/ba5adb2cfd33e3baef2fefdf403985da

Still no graphical boot, and I still get PSP firmware load errors:
Code:
$ dmesg |grep amdgpu
[    0.000000] Kernel command line: domdadm rootfstype=xfs root=/dev/md127 amdgpu.dpm=0 amdgpu.dc=0 iommu=soft pci=noats
[    0.955544] [drm] amdgpu kernel modesetting enabled.
[    0.955576] amdgpu: Ignoring ACPI CRAT on non-APU system
[    0.955586] amdgpu: Topology: Add CPU node
[    0.955645] fb0: switching to amdgpudrmfb from EFI VGA
[    0.955708] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    0.955737] amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
[    0.955775] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    0.955813] amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from VFCT
[    0.955814] amdgpu: ATOM BIOS: 113-D3600200-106
[    0.956584] amdgpu 0000:0c:00.0: amdgpu: HBM ECC is not presented.
[    0.956585] amdgpu 0000:0c:00.0: amdgpu: SRAM ECC is not presented.
[    0.956595] amdgpu 0000:0c:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    0.956597] amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    0.956598] amdgpu 0000:0c:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    0.956649] [drm] amdgpu: 16368M of VRAM memory ready
[    0.956650] [drm] amdgpu: 16368M of GTT memory ready.
[    0.959755] amdgpu: hwmgr_sw_init smu backed is vega20_smu
[    3.192294] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    3.192437] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    3.192556] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[    3.192671] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    3.192689] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    3.192718] amdgpu: probe of 0000:0c:00.0 failed with error -22


EDIT 13

I added the radeon.conf back in, now with the bus ID as requested, and it made no apparent difference. Moving to the next trial.

I am removing the extra boot options and enabling some options; the newer kernel includes extra debugging around DMA fences, which appeared in previous stack traces. I haven't reviewed it in depth yet. I didn't even see the open [ OK ] messages, but I can still ssh in.

Dmesg, kernel config, and Xorg logs: https://gist.github.com/Sqeaky/0e23f561c950455f8a2908d1f63c31e6

EDIT 14 - I tried a prebuilt binary kernel and it didn't even boot.
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3331
Location: Rasi, Finland

PostPosted: Fri May 14, 2021 6:35 am    Post subject: Reply with quote

You could add loglevel=7 on the kernel command line.
Maybe it'll spit more info about why the firmware couldn't be loaded.
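
A small sketch of how to check that the log level took effect and to pull the relevant lines out of the kernel log. The function names are made up for illustration, and the grep patterns are just a guess at what is interesting based on the errors posted so far.

```shell
#!/bin/sh
# console_loglevel [FILE]: print the current console log level, which is the
# first field of /proc/sys/kernel/printk. FILE defaults to that path.
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# amdgpu_errors: read kernel log text on stdin and keep only amdgpu lines
# that look like errors or failures.
amdgpu_errors() {
    grep 'amdgpu' | grep -E '\*ERROR\*|failed|Fatal'
}
```

After booting with loglevel=7, `console_loglevel` should print 7, and `dmesg | amdgpu_errors` narrows the log down to the failing steps.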
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
theotherjoe
Guru


Joined: 22 Nov 2003
Posts: 393

PostPosted: Fri May 14, 2021 12:15 pm    Post subject: Reply with quote

Sqeaky,
I am completely unfamiliar with X570 mainboards, so I may be
totally on the wrong track with my guesswork.
Since your kernel is nagging about not being able to do what it needs to do
in (or with) the PSP, I wonder if there is a PSP enable/disable switch in your
BIOS?
I have read some remarks about such possibilities on X570 mainboards.

Code:
[    3.192294] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sysdrv failed!
[    3.192437] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    3.192556] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
Ralphred
Guru


Joined: 31 Dec 2013
Posts: 493

PostPosted: Fri May 14, 2021 2:21 pm    Post subject: Reply with quote

Sqeaky wrote:
I re-added the radeon.conf back in but now with the bus id as requested and it made no apparent difference. Moving to next trial.

Leave it in there mate, you've not reached the point at which it would make a difference yet.

There is a "firmware persistence" issue I don't fully understand with some newer AMD GPUs. I have it (vega10) if I get a hard system crash caused by driver instability, and my son gets it (I think he's on polaris) if he does a lot of switching between Linux and Windows. It is related to DC, and I have to hard power off the machine (and wait for the LEDs to go out on the gfx card) to fix it.
I just feel like I should mention this with the idea that maybe hard powering down between kernel and firmware changes could be a good move on occasion.
The other thing I'd be doing in your position is completely removing the kernel's source tree, re-emerging it, then configuring it from scratch from an install boot disk (so I have access to a fully modular kernel's lspci when running make menuconfig), following the Gentoo Handbook on a tablet or laptop. It's tedious, I know, just something I'd be doing if I'd invested the amount of time you have.
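
To make the lspci part concrete, here is a rough sketch that reduces `lspci -k` output from the boot disk to the list of drivers and modules worth enabling in your own config. The function name drivers_in_use is made up for illustration.

```shell
#!/bin/sh
# drivers_in_use: read `lspci -k` output on stdin and print the unique kernel
# drivers and modules it names, one per line.
drivers_in_use() {
    awk -F': ' '/Kernel driver in use|Kernel modules/ { print $2 }' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//' |
        sort -u
}
```

Run it as `lspci -k | drivers_in_use` on the boot disk and keep the list at hand while going through the kernel configuration.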

Also, keep the edits to about 3 per post, then we see the new message notification to catch up on your progress in your new posts. Don't want you to get the impression that those who have tried to help you have exhausted all their ideas and aren't still trying to help.
Sqeaky
Tux's lil' helper


Joined: 31 Dec 2003
Posts: 149

PostPosted: Sun May 16, 2021 9:30 am    Post subject: Reply with quote

I have actually blown away that Gentoo install to test whether this was bad hardware, and Kubuntu cannot boot this card either.

So I removed the riser cable (huge PiTA because of the water cooling loop). That didn't fix it. I got another GPU for testing, an RX 370, and after a confluence of confusing hardware issues I am down to the 3D/accelerated functionality on this card being bad, or at least something about the GPU that isn't simple framebuffers. I can make the RX 370 do just about everything it is supposed to.

@Theotherjoe - I do not appear to have any PSP (or any HDCP) options in the EFI settings.

@Zucca - The last few dmesg captures had a high level of logging, nearly twice the size. Nothing clarifying as far as I could tell. I am not sure it matters now; I demonstrated that another kernel worked with a fresh GPU but not this one.

@Ralphred - I hard powered down between nearly every reboot. I also tried with DC disabled. I reconfigured the kernel cleanly twice so far. I will keep the edit suggestions in mind for the future; this time it didn't matter much, as few of them mattered.

I think we can mark this as solved. I need a new GPU, this one is busted.

I will install a fresh Gentoo setup with the radeon driver to go with this temporary GPU.
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3331
Location: Rasi, Finland

PostPosted: Sun May 16, 2021 11:17 am    Post subject: Reply with quote

Sqeaky wrote:
So I removed the riser cable (huge PiTA because of water cooling loop). That didn't fix it. I have gotten another GPU for testing, an rx370, and after a confluence of confusing hardware issues I am down to 3d/accelerated functionality on this card being bad, or at least something about the GPU that isn't simple framebuffers.
As a last resort I'd try to drop your PCIe speeds to 3.0 level.

I have had, not PCIe, but SATA speed problems. I had all the SATA ports on my MB in use (mdraid on all of them). I wasn't able to fix the problem with hard drives dropping out until I forced the speed to SATA2 levels. I think the SATA bus got somehow over saturated (hardware bug maybe) and started to drop hard drives.

So as a WILD guess, maybe something similar might be happening with your setup but on PCIe bus.
At least it wouldn't hurt to drop the bus speed and test?
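
To see what the link is currently negotiating before and after such a change, something like the sketch below could work. It parses the LnkSta line of `lspci -vv`; the function names are made up for illustration, and the 0c:00.0 address in the usage note is the one from the dmesg output earlier in the thread.

```shell
#!/bin/sh
# link_speed: read `lspci -vv` output for one device on stdin and print the
# currently negotiated link speed from the LnkSta line, e.g. "8GT/s".
link_speed() {
    sed -n 's/.*LnkSta:[[:space:]]*Speed \([0-9.]*GT\/s\).*/\1/p'
}

# pcie_gen SPEED: map a per-lane transfer rate to the PCIe generation.
pcie_gen() {
    case "$1" in
        2.5GT/s) echo 1 ;;
        5GT/s)   echo 2 ;;
        8GT/s)   echo 3 ;;
        16GT/s)  echo 4 ;;
        *)       echo unknown ;;
    esac
}
```

Usage would be `pcie_gen "$(lspci -vv -s 0c:00.0 | link_speed)"`, run as root so that lspci can read the link capabilities. Lowering the speed itself is usually done in the firmware setup, which this does not cover.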
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
Ralphred
Guru


Joined: 31 Dec 2013
Posts: 493

PostPosted: Sun May 16, 2021 1:43 pm    Post subject: Reply with quote

Sqeaky wrote:
I think we can mark this as solved. I need a new GPU, this one is busted.
That's sad news, sorry to hear that Sqeaky.
Zucca wrote:
I think the SATA bus got somehow over saturated (hardware bug maybe) and started to drop hard drives.
There is a lady that I interact with on a professional basis. She's hard to describe, but if you could get a doctorate in "connecting things together" she'd be the one reviewing the theses. I'm gonna recall something she b*tches about at regular intervals, the best I can:
Helen the connect-it Guru wrote:
The problem with SATA at the hardware level is that it was designed to be used with cables longer than 1m; when you consider the speeds involved, these 30cm cables are like a bloke standing next to you with a bullhorn wondering why you can't understand him!

Maybe useful, maybe not, but interesting nonetheless.