Gentoo Forums
Old AMD switchable graphics system (Southern Islands) help
drachir
n00b
Joined: 25 Apr 2024
Posts: 2
Location: Hungary

Posted: Tue May 14, 2024 1:00 pm    Post subject: Old AMD switchable graphics system (Southern Islands) help

Hi!

My main computer is an old laptop (Lenovo IdeaPad G510) with a Haswell Core i7 CPU (Intel HD 4600 iGPU) and an AMD R7 M265 (Southern Islands - OLAND) dGPU.

I have some problems with the switchable graphics on this system. I've searched the Gentoo Forums and Google for solutions, but everything I found was quite old and offered no clear fixes (and nothing matched my specific kind of "failure"). First, I'll list some assumptions I've made, so you can correct me if I'm wrong:
  • The old radeon driver has "stable" support for these cards, but it has no Vulkan support, so no luck if I want to run games on the laptop.
  • The new AMDGPU driver only has experimental support for my card, but in theory it should work correctly most of the time, and Vulkan support is also available, so this is the way for gaming.
  • I don't need the AMDGPU-PRO driver / it would even be detrimental for my use case (running games mainly through Proton and Wine.)

So I made a custom kernel config (dist-kernel with config.d snippets) that disables radeon and builds the i915 and amdgpu drivers into the kernel (i.e. not as modules for now). I also made sure to build the needed firmware files into the kernel. This seemingly worked as intended, but two issues persist and I can't find a solution:
  • The AMD dGPU becomes unresponsive after a while (not a fixed time: sometimes it's 5 minutes, sometimes a few hours). By this I mean that it doesn't "wake up" when requested: games won't launch,
    Code:
    DRI_PRIME=1 glxgears
    simply segfaults, and 'amdgpu_top' doesn't see the card anymore. Only a full reboot fixes it.
  • The core and memory clocks of the GPU sometimes get stuck at 300/300 MHz and don't scale with load (they should go up to 750/900 MHz). A restart doesn't fix this, but the clocks randomly get "unstuck" on their own, so I just need to restart my games a few times until it's okay.

I'd like to resolve these issues because they're very annoying, especially the first one: it needs a reboot, and whenever I want to launch a game it's pure chance whether I still have my dGPU or not.
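When the dGPU "goes missing", it can help to record what the kernel itself reports for the card before rebooting. A minimal POSIX sh sketch, assuming the dGPU sits at PCI address 0000:01:00.0 (taken from the dmesg output in this post); `SYSFS_PCI` is a hypothetical override added only so the function can be exercised against a fake directory:

```shell
# Print which kernel driver is bound to the dGPU and its runtime PM state.
# ASSUMPTION: dGPU at PCI address 0000:01:00.0 (from the dmesg output).
# SYSFS_PCI is a hypothetical override, only used for testing.
dgpu_status() {
    dev="${SYSFS_PCI:-/sys/bus/pci/devices}/${1:-0000:01:00.0}"
    if [ ! -d "$dev" ]; then
        # The device has disappeared from the PCI bus entirely.
        echo "missing"
        return 1
    fi
    # The 'driver' symlink only exists while a driver is bound to the device.
    if [ -L "$dev/driver" ]; then
        drv=$(basename "$(readlink "$dev/driver")")
    else
        drv="none"
    fi
    # One of: active, suspended, suspending, resuming, error, unsupported.
    state=$(cat "$dev/power/runtime_status" 2>/dev/null || echo "unknown")
    echo "driver=$drv runtime_status=$state"
}
```

Running `dgpu_status` right after `DRI_PRIME=1 glxgears` segfaults would show whether amdgpu is still bound and whether the device is stuck in `suspended` or `error`.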

Also, Wine and Steam like to bombard me with the following warnings:
Code:
MESA-INTEL: warning: Haswell Vulkan support is incomplete
WARNING: radv is not a conformant vulkan implementation, testing use only.

I don't know how much these warnings matter, since everything runs fine when the dGPU doesn't go missing or get stuck at base clock. Performance is roughly what I get on Windows (maybe a little lower sometimes, and there's some intermittent stuttering in e.g. MechWarrior Online that I didn't experience on Windows the last time I played), and the games run as expected. Hardware-accelerated video (on either GPU) also doesn't work quite right, but that's an entirely different can of worms that I haven't touched yet.
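The two warnings come from the Vulkan drivers being loaded for each GPU (the Intel driver for the HD 4600, RADV for the R7 M265). One way to keep a program on the dGPU, and to keep the Intel Vulkan driver from being loaded at all, is to set a few environment variables per program. This is a sketch, assuming Mesa's RADV manifest is installed as `/usr/share/vulkan/icd.d/radeon_icd.x86_64.json` (check the actual file name on your system); `RADV_ICD` is a hypothetical override. It only affects driver selection and warnings, not the dGPU drop-outs:

```shell
# Run a program on the AMD dGPU with only the RADV Vulkan driver visible.
# ASSUMPTION: the RADV manifest path below; RADV_ICD is a hypothetical
# override. The body runs in a subshell so the caller's environment is untouched.
run_on_dgpu() (
    icd="${RADV_ICD:-/usr/share/vulkan/icd.d/radeon_icd.x86_64.json}"
    # Ask Mesa to render on the secondary (PRIME) GPU.
    DRI_PRIME=1
    export DRI_PRIME
    if [ -f "$icd" ]; then
        # Restrict the Vulkan loader to RADV so the Intel driver is never loaded.
        # VK_DRIVER_FILES is the newer name; VK_ICD_FILENAMES is for older loaders.
        VK_DRIVER_FILES="$icd"
        VK_ICD_FILENAMES="$icd"
        export VK_DRIVER_FILES VK_ICD_FILENAMES
    fi
    exec "$@"
)
```

For Steam, the usual launch-option form is `DRI_PRIME=1 %command%`; the function is meant for starting things from a shell, e.g. `run_on_dgpu wine game.exe` (hypothetical program name).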

My make.conf:
Code:
VIDEO_CARDS="amdgpu radeonsi intel fbdev"

Dmesg outputs:
Code:
# dmesg | grep firmware
[    0.152568] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    3.829829] Loading firmware: amdgpu/oland_mc.bin
[    3.831458] Loading firmware: amdgpu/oland_pfp.bin
[    3.831540] Loading firmware: amdgpu/oland_me.bin
[    3.831617] Loading firmware: amdgpu/oland_ce.bin
[    3.831686] Loading firmware: amdgpu/oland_rlc.bin
[    3.831823] Loading firmware: amdgpu/oland_smc.bin
[    3.832467] Loading firmware: amdgpu/oland_uvd.bin
[    3.834168] [drm] Found UVD firmware Version: 64.0 Family ID: 13

# dmesg | grep amdgpu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.7-euterpe root=UUID=223c3b7c-6f92-4f51-b581-d75562bde9bf ro amdgpu.si_support=1
[    0.052309] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.7-euterpe root=UUID=223c3b7c-6f92-4f51-b581-d75562bde9bf ro amdgpu.si_support=1
[    3.820290] [drm] amdgpu kernel modesetting enabled.
[    3.820306] amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
[    3.820393] amdgpu: ATPX version 1, functions 0x00000033
[    3.820592] amdgpu: Virtual CRAT table created for CPU
[    3.820613] amdgpu: Topology: Add CPU node
[    3.820725] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    3.829463] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ATRM
[    3.829469] amdgpu: ATOM BIOS: BR45236.001
[    3.829486] kfd kfd: amdgpu: OLAND  not supported in kfd
[    3.829490] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    3.829829] Loading firmware: amdgpu/oland_mc.bin
[    3.830098] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[    3.830103] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    3.830270] [drm] amdgpu: 2048M of VRAM memory ready
[    3.830274] [drm] amdgpu: 7923M of GTT memory ready.
[    3.831087] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[    3.831458] Loading firmware: amdgpu/oland_pfp.bin
[    3.831540] Loading firmware: amdgpu/oland_me.bin
[    3.831617] Loading firmware: amdgpu/oland_ce.bin
[    3.831686] Loading firmware: amdgpu/oland_rlc.bin
[    3.831823] Loading firmware: amdgpu/oland_smc.bin
[    3.832346] [drm] amdgpu: dpm initialized
[    3.832467] Loading firmware: amdgpu/oland_uvd.bin
[    4.451867] amdgpu 0000:01:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 6, active_cu_number 6
[    4.845180] amdgpu 0000:01:00.0: amdgpu: Using ATPX for runtime pm
[    4.845324] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:01:00.0 on minor 0
[   13.633443] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[   52.598872] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[  272.299186] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 4631.940788] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 4633.336082] [drm:amdgpu_device_ip_late_init] *ERROR* late_init of IP block <si_dpm> failed -22

# dmesg | grep drm
[    3.820254] ACPI: bus type drm_connector registered
[    3.820290] [drm] amdgpu kernel modesetting enabled.
[    3.820858] [drm] initializing kernel modesetting (OLAND 0x1002:0x6604 0x17AA:0x380B 0x00).
[    3.820910] [drm] register mmio base: 0xB8000000
[    3.820913] [drm] register mmio size: 262144
[    3.820973] [drm] add ip block number 0 <si_common>
[    3.820980] [drm] add ip block number 1 <gmc_v6_0>
[    3.820983] [drm] add ip block number 2 <si_ih>
[    3.820986] [drm] add ip block number 3 <gfx_v6_0>
[    3.820988] [drm] add ip block number 4 <si_dma>
[    3.820990] [drm] add ip block number 5 <si_dpm>
[    3.820993] [drm] add ip block number 6 <dce_v6_0>
[    3.820995] [drm] add ip block number 7 <uvd_v3_1>
[    3.829816] [drm] PCIE gen 3 link speeds already enabled
[    3.829822] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    3.830113] [drm] Detected VRAM RAM=2048M, BAR=128M
[    3.830115] [drm] RAM width 128bits DDR3
[    3.830270] [drm] amdgpu: 2048M of VRAM memory ready
[    3.830274] [drm] amdgpu: 7923M of GTT memory ready.
[    3.830296] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.832335] [drm] Internal thermal controller without fan control
[    3.832346] [drm] amdgpu: dpm initialized
[    3.832368] [drm] AMDGPU Display Connectors
[    3.834168] [drm] Found UVD firmware Version: 64.0 Family ID: 13
[    4.451810] [drm] UVD initialized successfully.
[    4.845324] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:01:00.0 on minor 0
[    4.906294] [drm] Initialized i915 1.6.0 20230929 for 0000:00:02.0 on minor 1
[    4.909659] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 2
[    4.962498] fbcon: i915drmfb (fb0) is primary device
[    5.347354] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[   13.632782] [drm] PCIE gen 3 link speeds already enabled
[   14.242492] [drm] UVD initialized successfully.
[   52.598236] [drm] PCIE gen 3 link speeds already enabled
[   53.204390] [drm] UVD initialized successfully.
[  272.298540] [drm] PCIE gen 3 link speeds already enabled
[  272.911552] [drm] UVD initialized successfully.
[ 4631.940177] [drm] PCIE gen 3 link speeds already enabled
[ 4632.160692] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
[ 4632.405203] [drm:si_dpm_set_power_state] *ERROR* si_set_sw_state failed
[ 4632.834588] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
[ 4633.088837] [drm] UVD initialized successfully.
[ 4633.336082] [drm:amdgpu_device_ip_late_init] *ERROR* late_init of IP block <si_dpm> failed -22
[ 4633.570661] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
[52162.608878] i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe B (start=946248 end=946249) time 121 us, min 1016, max 1023, scanline start 1014, end 1024
[86408.832746] i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe B (start=3515528 end=3515529) time 115 us, min 1016, max 1023, scanline start 1014, end 1022
[141223.398623] i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe B (start=7627910 end=7627911) time 128 us, min 1016, max 1023, scanline start 1015, end 1025

I'd be really grateful if someone with experience in these matters could help me resolve these issues. I'm quite sure that these lines are behind the "loss" of my dGPU but I don't know what to do with it:
Code:
[ 4632.160692] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
[ 4632.405203] [drm:si_dpm_set_power_state] *ERROR* si_set_sw_state failed
[ 4632.834588] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
[ 4633.088837] [drm] UVD initialized successfully.
[ 4633.336082] [drm:amdgpu_device_ip_late_init] *ERROR* late_init of IP block <si_dpm> failed -22
[ 4633.570661] [drm:si_dpm_set_power_state] *ERROR* si_restrict_performance_levels_before_switch failed
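Because these lines are mixed in with normal boot chatter, a small filter makes it easier to spot when the SI power-management errors start. A sketch that keeps only `[drm:si_...]` and `amdgpu_device_ip_late_init` errors, based purely on the message format shown above:

```shell
# Keep only amdgpu SI power-management errors from kernel log text on stdin,
# e.g. "[drm:si_dpm_set_power_state] *ERROR* ..." and
# "[drm:amdgpu_device_ip_late_init] *ERROR* ...".
dpm_errors() {
    grep -E '\[drm:(si_[a-z_]+|amdgpu_device_ip_late_init)\] \*ERROR\*'
}
```

`dmesg -w | dpm_errors` follows new messages live. Since the problem clears after a reboot, `journalctl -k -b -1 | dpm_errors` would show the previous boot's errors, on systems that run systemd-journald.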

The config.d snippet for the kernel GPU settings:
Code:
CONFIG_ACPI_VIDEO=y
CONFIG_EXTRA_FIRMWARE="amdgpu/oland_uvd.bin amdgpu/oland_smc.bin amdgpu/oland_rlc.bin amdgpu/oland_pfp.bin amdgpu/oland_me.bin amdgpu/oland_mc.bin amdgpu/oland_ce.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
CONFIG_SYSFB_SIMPLEFB=y
CONFIG_I2C_ALGOBIT=y
# CONFIG_AUXDISPLAY is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
CONFIG_DRM_DISPLAY_HELPER=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=y
CONFIG_DRM_EXEC=y
CONFIG_DRM_BUDDY=y
CONFIG_DRM_TTM_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
CONFIG_DRM_SUBALLOC_HELPER=y
CONFIG_DRM_SCHED=y
# CONFIG_DRM_I2C_CH7006 is not set
# CONFIG_DRM_I2C_SIL164 is not set
# CONFIG_DRM_RADEON is not set
CONFIG_DRM_AMDGPU=y
# CONFIG_DRM_AMDGPU_CIK is not set
# CONFIG_DRM_AMD_ACP is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=y
# CONFIG_DRM_I915_GVT_KVMGT is not set
# CONFIG_DRM_I915_PXP is not set
CONFIG_DRM_VGEM=y
# CONFIG_DRM_VKMS is not set
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_UDL is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_QXL is not set
# CONFIG_DRM_PANEL_WIDECHIPS_WS2401 is not set
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# CONFIG_DRM_BOCHS is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_DRM_GM12U320 is not set
# CONFIG_DRM_PANEL_MIPI_DBI is not set
CONFIG_DRM_SIMPLEDRM=y
# CONFIG_TINYDRM_ILI9163 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_DRM_GUD is not set
# CONFIG_DRM_SSD130X is not set
CONFIG_FB_VGA16=y
CONFIG_FIRMWARE_EDID=y
# CONFIG_BACKLIGHT_KTD253 is not set
# CONFIG_BACKLIGHT_KTZ8866 is not set
# CONFIG_BACKLIGHT_APPLE is not set
CONFIG_VGASTATE=y
# CONFIG_LOGO is not set
CONFIG_ACPI_WMI=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
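Since the firmware is built in through CONFIG_EXTRA_FIRMWARE, it can be worth confirming that every listed file actually exists before starting a long kernel rebuild. A sketch that reads a config snippet like the one above and reports any listed file missing from CONFIG_EXTRA_FIRMWARE_DIR (falling back to /lib/firmware if that line is absent):

```shell
# Report files named in CONFIG_EXTRA_FIRMWARE that are missing from
# CONFIG_EXTRA_FIRMWARE_DIR. Returns 1 if anything is missing.
check_extra_firmware() {
    cfg="$1"
    # The first pattern requires '="' right after the name, so it does not
    # match the CONFIG_EXTRA_FIRMWARE_DIR line.
    fw_list=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$cfg")
    fw_dir=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE_DIR="\(.*\)"$/\1/p' "$cfg")
    fw_dir="${fw_dir:-/lib/firmware}"
    status=0
    # Word splitting on the space-separated list is intended here.
    for fw in $fw_list; do
        if [ ! -f "$fw_dir/$fw" ]; then
            echo "missing: $fw_dir/$fw"
            status=1
        fi
    done
    return "$status"
}
```

Usage would be something like `check_extra_firmware /etc/kernel/config.d/gpu.config`, where `gpu.config` is a hypothetical file name for the snippet.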

I'm able to investigate further and try out different solutions with some guidance. Also, going back to my assumptions: am I right that if I want gaming (specifically with Vulkan) to work, I can't simply switch back to the radeon driver?

Also, there may be some delay before I get back to you after a suggested fix, since right now I'm compiling Chromium (about 7.5 more hours) and then the kernel (about an hour).

Thank you in advance.
_________________
Lenovo G510 - Intel Core i7-4702MQ (+ Intel HD Graphics 4600 IGPU), AMD Radeon R7 M265 DGPU, 16 GB DDR3 RAM, 960 GB Kingston A400 SSD
Ralphred
Guru
Joined: 31 Dec 2013
Posts: 512

Posted: Wed May 15, 2024 11:05 am

I know that DRM_AMDGPU_SI and amdgpu.si_support=1 are usually mentioned in the context of "make sure the right driver is loaded", but I'd include them (in the kernel .config and on the kernel command line, respectively) and see if anything changes. amdgpu.dc=1 can be set for GCN 1.1+ too, but the database I use doesn't list 1.1 as a thing, so maybe that should read 2.0+? EDIT: It's in the kernel, though:
Quote:
CONFIG_DRM_AMD_DC_SI: Choose this option to enable new AMD DC support for SI asics by default. This includes Tahiti, Pitcairn, Cape Verde, Oland.
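The config.d snippet in the first post doesn't list CONFIG_DRM_AMDGPU_SI or CONFIG_DRM_AMD_DC_SI, so it may be worth confirming what actually ended up in the built kernel. A sketch that reads /proc/config.gz by default (only present with CONFIG_IKCONFIG_PROC=y) or a plain .config path given as the first argument:

```shell
# Report whether the SI-related amdgpu options are built in (=y).
check_si_kconfig() {
    cfg="${1:-/proc/config.gz}"
    # Decompress .gz configs, read plain files as-is.
    case "$cfg" in
        *.gz) text=$(gzip -dc "$cfg") ;;
        *)    text=$(cat "$cfg") ;;
    esac || return 1
    for opt in CONFIG_DRM_AMDGPU_SI CONFIG_DRM_AMD_DC CONFIG_DRM_AMD_DC_SI; do
        if printf '%s\n' "$text" | grep -q "^$opt=y$"; then
            echo "$opt=y"
        else
            echo "$opt is not set to y"
        fi
    done
}
```

Without /proc/config.gz, something like `check_si_kconfig /boot/config-$(uname -r)` should work, assuming the kernel config was installed under /boot.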


Quote:
The processor and memory clocks of the GPU sometimes freeze
You might be able to sidestep this with settings in /sys/class/drm/card[x]/device/, but it's best to first check dmesg for any telltale output after enabling AMDGPU_SI. The Arch Wiki has quite a comprehensive troubleshooting section, as does our own wiki.
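For clocks stuck at 300/300 MHz, the most direct knob in that directory is `power_dpm_force_performance_level`. A sketch, assuming the dGPU is card0 (amdgpu registered on DRM minor 0 in the dmesg output) and that SI's DPM accepts `auto`, `low` and `high` there; `DRM_ROOT` is a hypothetical override for testing. Writing the real file needs root:

```shell
# Force (or release) the dGPU's DPM performance level via sysfs.
# ASSUMPTION: dGPU is card0; DRM_ROOT is a hypothetical test override.
set_dpm_level() {
    level="$1"
    card="${2:-card0}"
    f="${DRM_ROOT:-/sys/class/drm}/$card/device/power_dpm_force_performance_level"
    case "$level" in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 2 ;;
    esac
    # Write the level, then read it back to confirm what the driver accepted.
    printf '%s\n' "$level" > "$f" && cat "$f"
}
```

The idea would be `set_dpm_level high` before starting a game and `set_dpm_level auto` afterwards. Whether the forced level survives the dGPU being runtime-suspended and resumed is not confirmed here.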

As far as
Code:
WARNING: radv is not a conformant vulkan implementation, testing use only.
goes, yeah, it's an annoyance but expected behaviour; I think only Navi and newer have "conformant" support.
drachir
n00b
Joined: 25 Apr 2024
Posts: 2
Location: Hungary

Posted: Fri May 17, 2024 1:25 pm

Hey, sorry for the late reply, I only had time to test this today. I have (and had) both DRM_AMDGPU_SI and CONFIG_DRM_AMD_DC_SI enabled in the kernel, and I already had amdgpu.si_support=1 on the kernel command line, but not amdgpu.dc=1. So I added that and started testing this morning. On the first boot my dGPU didn't come online at all (sadly I forgot to check dmesg that time), but on the next boot it was OK. However, my original problems seem to persist: for about half an hour the clocks were stuck at 300/300 MHz, then they unfroze like before. I don't know yet whether the dGPU will still disappear after a while, since it's holding up so far, but I don't think this solved it.

However, thanks for the Arch Wiki link! I think my solution will be there (I'll test it as soon as my GPU once again decides to disappear just so I can be sure that the problem still exists):
Quote:
If you encounter issues where the kernel driver is loaded, but the discrete graphics card still is not available for games or becomes disabled during use (similar to [10]), you can workaround the issue by setting the kernel parameter amdgpu.runpm=0, which prevents the dGPU from being powered down dynamically at runtime.
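After adding amdgpu.runpm=0 to the kernel command line (through whichever bootloader is in use, e.g. GRUB_CMDLINE_LINUX in /etc/default/grub followed by regenerating grub.cfg), it's worth checking that the driver actually picked it up. Parameters of built-in drivers are still exposed under /sys/module, so a sketch like this prints their current values (`MODULE_ROOT` is a hypothetical test override):

```shell
# Print the amdgpu parameters relevant to this thread, '?' if one is absent.
show_amdgpu_params() {
    root="${MODULE_ROOT:-/sys/module}/amdgpu/parameters"
    for p in si_support dc runpm; do
        printf '%s=%s\n' "$p" "$(cat "$root/$p" 2>/dev/null || echo '?')"
    done
}
```

After the change, `runpm=0` is expected (the default is -1, automatic). As far as I can tell from the driver source, `si_support` only exists when the kernel is built with SI support, so a `?` there would point at the kernel config.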

As far as I can tell from dmesg, power management is the cause of the problem. I think the kernel/driver tries to put the card into a lower power state, and when it succeeds, it then can't wake the card up from that state for some reason.

Current dmesg warnings that make me think this:
Code:
[    3.820806] Linux agpgart interface v0.103
[    3.820914] ACPI: bus type drm_connector registered
[    3.820942] [drm] amdgpu kernel modesetting enabled.
[    3.820965] amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
[    3.821050] amdgpu: ATPX version 1, functions 0x00000033
[    3.821249] amdgpu: Virtual CRAT table created for CPU
[    3.821263] amdgpu: Topology: Add CPU node
[    3.821382] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    3.821424] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.PEG0._PRT.AR02], AE_NOT_FOUND (20230628/psargs-330)
[    3.821433] ACPI Error: Aborting method \_SB.PCI0.PEG0._PRT due to previous error (AE_NOT_FOUND) (20230628/psparse-529)
[    3.821504] [drm] initializing kernel modesetting (OLAND 0x1002:0x6604 0x17AA:0x380B 0x00).
...
[11233.322698] [drm:si_dpm_set_power_state] *ERROR* si_halt_smc failed
...
[14472.042227] [drm:si_dpm_set_power_state] *ERROR* si_power_control_set_level failed

I think the ACPI errors are related to the dGPU (they appear right after the dGPU starts initializing, and the PCI slot matches). I also suspect that on the occasions when the commands that currently fail actually go through, that's when the dGPU becomes unresponsive.
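To test the theory that the card breaks on a power-state transition, one could log every runtime PM change with a wall-clock timestamp and line it up against `dmesg -T` (which prints human-readable times). A polling sketch, assuming the dGPU is at 0000:01:00.0; the sample-count and interval arguments are only there so it can be stopped or tested, and `SYSFS_PCI` is a hypothetical override:

```shell
# Log changes of the dGPU's runtime PM state with a timestamp.
# Args: [pci-address] [samples, 0 = forever] [interval in seconds]
watch_runtime_pm() {
    dev="${SYSFS_PCI:-/sys/bus/pci/devices}/${1:-0000:01:00.0}/power"
    count="${2:-0}"
    interval="${3:-5}"
    last=""
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        state=$(cat "$dev/runtime_status" 2>/dev/null || echo gone)
        # Only print when the state differs from the previous sample.
        if [ "$state" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $state"
            last="$state"
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

Left running in a terminal (`watch_runtime_pm >> ~/dgpu-pm.log`, hypothetical log path), the last `suspended`/`active` entry before the dGPU vanishes could then be compared with the `si_dpm` errors.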

I don't think there's a separate ACPI-related kernel setting I could play with to resolve this, because, if I'm right, the power management and ACPI calls for the GPU are handled by the amdgpu driver, not by a separate ACPI driver in the kernel. So for now I'm content with preventing the kernel from powering down the dGPU. Since I don't use my system as a laptop, I don't really care about the lost power efficiency.
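For testing without a reboot, runtime PM can also be disabled for just this PCI device through its sysfs `power/control` file, which should have an effect similar to amdgpu.runpm=0 until the next boot. A sketch, assuming address 0000:01:00.0 (`SYSFS_PCI` is a hypothetical test override); whether this fully covers the ATPX/vga_switcheroo power-off path on this laptop is untested:

```shell
# Disable ("on") or re-enable ("auto") runtime PM for the dGPU only.
# ASSUMPTION: dGPU at 0000:01:00.0. Writing the real file needs root.
set_runtime_pm() {
    mode="$1"
    f="${SYSFS_PCI:-/sys/bus/pci/devices}/${2:-0000:01:00.0}/power/control"
    case "$mode" in
        # "on" keeps the device powered; "auto" lets the kernel suspend it.
        on|auto) printf '%s\n' "$mode" > "$f" && cat "$f" ;;
        *) echo "usage: set_runtime_pm on|auto [pci-address]" >&2; return 2 ;;
    esac
}
```

Unlike the kernel parameter, this has to be reapplied after every boot, so amdgpu.runpm=0 remains the more permanent option.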

I don't know whether reporting this "bug" upstream would do any good, since it's a really old GPU, amdgpu's support for it is marked as experimental, and it might be a Lenovo G5xx-series problem if Lenovo shipped a non-conformant ACPI BIOS... Also, I sincerely doubt there's anyone besides me degenerate enough to try gaming on this old thing AND use Linux to do so :lol:

All times are GMT