Author |
Message |
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Wed Apr 30, 2025 6:02 pm Post subject: Can't read AMD GPU temps |
|
|
I'm having issues with reading AMD GPU temps.
I followed the amdgpu kernel config guide (https://wiki.gentoo.org/wiki/AMDGPU). Everything works fine except for reading temps or basically any other data about the GPU.
I ran sensors-detect, and here is the sensors output afterwards:
Code: |
amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
ERROR: Can't get value of subfeature fan1_min: Can't read
ERROR: Can't get value of subfeature fan1_max: Can't read
fan1: N/A (min = 0 RPM, max = 0 RPM)
edge: N/A (crit = +100.0°C, hyst = -273.1°C)
(emerg = +105.0°C)
junction: N/A (crit = +110.0°C, hyst = -273.1°C)
(emerg = +115.0°C)
mem: N/A (crit = +100.0°C, hyst = -273.1°C)
(emerg = +105.0°C)
ERROR: Can't get value of subfeature power1_average: Can't read
ERROR: Can't get value of subfeature power1_cap: Can't read
PPT: N/A (avg = 0.00 W, cap = 0.00 W)
pwm1: N/A
sclk: N/A
mclk: N/A
|
I tried using this kernel parameter as well, but the sensors output didn't change:
Quote: | acpi_enforce_resources=lax |
I also tried enabling PowerPlay via this kernel parameter but that didn't change anything either:
|
|
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5641 Location: Bavaria
|
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 7:53 am Post subject: Re: Can't read AMD GPU temps |
|
|
I updated my kernel config according to the links, but still nothing sadly.
Kernel config: http://0x0.st/84oR.txt
dmesg output: http://0x0.st/84o7.txt |
|
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55200 Location: 56N 3W
|
Posted: Thu May 01, 2025 10:02 am Post subject: |
|
|
0x1000000,
Code: | # ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ASUS_WMI is not set
# CONFIG_SENSORS_ASUS_EC is not set |
I have both CONFIG_SENSORS_ATK0110=m and CONFIG_SENSORS_ASUS_EC=y
Your kernel command line is Code: | [ 0.025492] Kernel command line: root=/dev/nvme0n1p2 initrd=\EFI\Gentoo\amd-uc.img |
I have Code: | [ 0.158881] Kernel command line: dm-mod.create="nvmestatic-root,,0,rw,0 4194304 linear /dev/nvme0n1p3 2048" root=/dev/dm-0 ro net.ifnames=0 BOOT_IMAGE=6.13.0-gentoo root=UUID=85916fec-4a94-40b7-8522-44ff698b157e ro net.ifnames=0 acpi_enforce_resources=no amd_pstate=active amd_prefcore=enable initrd=initramfs |
Most of that you should ignore.
Code: | acpi_enforce_resources=no | lets kernel drivers that conflict with ACPI drivers load.
Code: | amd_pstate=active | is for power management. The default can be set at kernel build time.
I had trouble with that some time ago. My kernel has Code: | $ grep PSTATE /usr/src/linux/.config
# CONFIG_X86_INTEL_PSTATE is not set
CONFIG_X86_AMD_PSTATE=y
CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3 | It's power management rather than sensor reporting. While it may be interesting, it's not the issue here. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
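Whether a parameter such as acpi_enforce_resources actually reached the running kernel can be checked against /proc/cmdline instead of the boot loader configuration. A minimal sketch, assuming a POSIX shell with tr and awk; the function name has_cmdline_param and its optional second argument (a file to read in place of /proc/cmdline) are made up for illustration:

```shell
#!/bin/sh
# Return 0 if the parameter named $1 appears on the kernel command line.
# $2 is the file to read; it defaults to /proc/cmdline. The second
# argument exists only so the function can be tried on a sample file.
has_cmdline_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each word on its own line, then compare the part before "=".
    tr -s ' \t' '\n\n' < "$file" |
        awk -F= -v p="$param" '$1 == p { found = 1 } END { exit !found }'
}

# Example (hypothetical usage):
#   has_cmdline_param acpi_enforce_resources && echo "parameter is active"
```

This only shows that the kernel received the parameter, not that a driver acted on it; dmesg is still the place to look for that.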
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 10:25 am Post subject: |
|
|
I tried the things you listed. I actually ran into an issue with acpi_enforce_resources=no: dmesg said that this kernel parameter conflicted with the ATK0110 driver, so I removed it, tried again, and ran sensors-detect again, but there is still no data about my GPU. |
|
Anon-E-moose Watchman


Joined: 23 May 2008 Posts: 6272 Location: Dallas area
|
Posted: Thu May 01, 2025 10:33 am Post subject: |
|
|
What does Code: | cat /sys/class/hwmon/*/name | return? _________________ UM780, 6.12 zen kernel, gcc 13, openrc, wayland |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 10:35 am Post subject: |
|
|
Anon-E-moose wrote: | What does Code: | cat /sys/class/hwmon/*/name | return? |
Code: | nvme
k10temp
mt7921_phy0
amdgpu
|
|
|
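The names alone do not show which hwmonN directory belongs to which driver, and that numbering is not guaranteed to stay the same from one boot or kernel to the next. A small sketch that prints both side by side, assuming a POSIX shell; list_hwmon is a hypothetical helper name and the optional argument (an alternative base directory) is there only for trying it out:

```shell
#!/bin/sh
# Print "hwmonN<TAB>driver name" for every hwmon device.
# $1 is the base directory; it defaults to /sys/class/hwmon.
list_hwmon() {
    base=${1:-/sys/class/hwmon}
    for dir in "$base"/hwmon*; do
        # Skip entries without a readable name file (or an unmatched glob).
        [ -r "$dir/name" ] || continue
        printf '%s\t%s\n' "${dir##*/}" "$(cat "$dir/name")"
    done
}

# Example (hypothetical usage):
#   list_hwmon
```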
Anon-E-moose Watchman


Joined: 23 May 2008 Posts: 6272 Location: Dallas area
|
Posted: Thu May 01, 2025 10:48 am Post subject: |
|
|
You don't have something configured right, I would look at the smbus, i2c and soc areas.
And make sure that if you have them set as modules that they are loaded.
Edit to add: Have you tried running sensors-detect (as root) to see what it says? |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 11:09 am Post subject: |
|
|
Anon-E-moose wrote: | You don't have something configured right, I would look at the smbus, i2c and soc areas.
And make sure that if you have them set as modules that they are loaded.
Edit to add: Have you tried running sensors-detect (as root) to see what it says? |
I already enabled PIIX4 and the AMD SPI controller, all of them built-in; I'm not really sure what else to look for.
I ran sensors-detect multiple times as root; it just says no sensors were detected.
Edit: I also enabled nct6775, since I have seen other people having issues with sensors on motherboards similar to mine (https://bbs.archlinux.org/viewtopic.php?id=296949). It made lm-sensors list more things, but still nothing for the GPU. |
|
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5641 Location: Bavaria
|
Posted: Thu May 01, 2025 11:42 am Post subject: |
|
|
0x1000000,
after reviewing your two files I have to admit I don't know why sensors-detect has a problem. You did (almost) everything right: i2c, pinctrl and spi are correctly enabled (even DEBUG_FS is still enabled; normally I don't like it because of the security problem, but some modules output data through it); AMDGPU is enabled as a <M> module; CONFIG_X86_AMD_PLATFORM_DEVICE is enabled. The only thing that could be problematic (the i915 driver from Intel also uses it) is this:
Code: | # CONFIG_TRANSPARENT_HUGEPAGE is not set |
However, I am afraid that the problem is not caused by this missing option (you should activate it anyway). The following also has nothing to do with the problem, but you should change it as well:
Activate it:
Code: | # CONFIG_LRU_GEN is not set |
Deactivate it:
Code: | CONFIG_FB_VGA16=y
CONFIG_FB_UVESA=y |
because:
Code: | [ 0.578643] uvesafb: failed to execute /sbin/v86d
[ 0.578646] uvesafb: make sure that the v86d helper is installed and executable
[ 0.578650] uvesafb: Getting VBE info block failed (eax=0x4f00, err=-2)
[ 0.578653] uvesafb: vbe_init() failed with -22
[ 0.578655] uvesafb uvesafb.0: probe with driver uvesafb failed with error -22 |
Have you booted our Gentoo LiveCD? Does sensors-detect get its information there? If not, then it will probably not be possible. If yes, have you checked all loaded modules with "lsmod"? _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 12:24 pm Post subject: |
|
|
I applied the kernel config changes, but nothing changed.
I booted into the livecd, mounted my root partition, used arch-chroot on it, and ran sensors-detect, then sensors.
This time the GPU section was missing completely. I did notice a new sensor though, spd5118; after googling it I figured out that it's my RAM. So I guess there is no way for me to see the GPU temps. Is this a motherboard issue, or? |
|
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55200 Location: 56N 3W
|
Posted: Thu May 01, 2025 12:33 pm Post subject: |
|
|
0x1000000,
Now you have just rebuilt your kernel, what does Code: | uname -a | have to say?
It's always good to check that you are running the kernel you think you are. |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 12:39 pm Post subject: |
|
|
NeddySeagoon wrote: | 0x1000000,
Now you have just rebuilt your kernel, what does Code: | uname -a | have to say?
It's always good to check that you are running the kernel you think you are. |
uname -a shows a timestamp about 8 minutes before the current time; I assume that's the build date of the kernel.
I know I'm running the kernel with the updated config, since I just enabled the spd5118 driver that I mentioned for my RAM and it shows up in sensors. |
|
Anon-E-moose Watchman


Joined: 23 May 2008 Posts: 6272 Location: Dallas area
|
Posted: Thu May 01, 2025 12:45 pm Post subject: |
|
|
Since you're using an initramfs, make sure it's rebuilt after the kernel. |
|
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55200 Location: 56N 3W
|
Posted: Thu May 01, 2025 12:46 pm Post subject: |
|
|
0x1000000,
Yes, the date/time is the running kernel build date and time.
It's fairly common for users to fix a problem in the kernel but not know it because they are not booting the kernel they thought they were.
You are not one of those users, this time, anyway. |
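The build date and time discussed above is part of the version string that uname -v prints. A rough sketch for pulling out just that timestamp so it can be compared with the time the kernel was compiled; it assumes the common layout in which the last six fields of the string are the date, and build_stamp is a hypothetical helper name:

```shell
#!/bin/sh
# Print the timestamp portion of a `uname -v` style string.
# Assumption: the string ends in six date fields, for example
#   "#1 SMP PREEMPT_DYNAMIC Thu May  1 12:31:00 UTC 2025"
# With no argument, the running kernel's own string is used.
build_stamp() {
    printf '%s\n' "${1:-$(uname -v)}" |
        awk '{ print $(NF-5), $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF }'
}

# Example (hypothetical usage):
#   echo "running kernel was built: $(build_stamp)"
#   date
```

If the printed time is much older than the last build, the boot loader is most likely still starting an earlier image.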
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Thu May 01, 2025 12:47 pm Post subject: |
|
|
Anon-E-moose wrote: | since you're using an initram make sure it's rebuilt after kernel. |
I have initramfs/initrd disabled. |
|
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55200 Location: 56N 3W
|
Posted: Thu May 01, 2025 12:54 pm Post subject: |
|
|
0x1000000,
With initramfs/initrd disabled, how does Code: | initrd=\EFI\Gentoo\amd-uc.img | to load microcode updates work?
You can build that into your kernel, here Code: | CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd.bin amd-ucode/microcode_amd_fam15h.bin amd-ucode/microcode_amd_fam16h.bin amd-ucode/microcode_amd_fam17h.bin amd/amd_sev_fam17h_model0xh.sbin amd/amd_sev_fam17h_model3xh.sbin amd-ucode/microcode_amd_fam19h.bin amd/amd_sev_fam19h_model0xh.sbin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware" |
Ah ... you did.
The initrd on the kernel command line is just a leftover :) |
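When firmware is built into the kernel this way, every file named in CONFIG_EXTRA_FIRMWARE has to exist under CONFIG_EXTRA_FIRMWARE_DIR at build time. A minimal sketch that checks this from the .config, assuming a POSIX shell with sed; check_extra_firmware is a hypothetical helper, and its second argument (overriding the firmware directory) is only there for experimenting:

```shell
#!/bin/sh
# Report whether each file listed in CONFIG_EXTRA_FIRMWARE exists.
# $1: kernel config (default /usr/src/linux/.config)
# $2: firmware directory override (default: CONFIG_EXTRA_FIRMWARE_DIR)
# Returns 1 if at least one file is missing.
check_extra_firmware() {
    config=${1:-/usr/src/linux/.config}
    # Extract the quoted values of the two options.
    files=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$config")
    fwdir=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE_DIR="\(.*\)"$/\1/p' "$config")
    fwdir=${2:-$fwdir}
    status=0
    # The option holds a space-separated list, so word splitting is intended.
    for f in $files; do
        if [ -f "$fwdir/$f" ]; then
            printf 'ok %s\n' "$f"
        else
            printf 'MISSING %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Example (hypothetical usage):
#   check_extra_firmware /usr/src/linux/.config
```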
|
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5641 Location: Bavaria
|
Posted: Thu May 01, 2025 5:43 pm Post subject: |
|
|
0x1000000 wrote: | [...] so I guess there is no way for me to see the GPU temps. Is this a motherboard issue, or? |
I don't think it is a motherboard issue (if it works under Windows); more likely the amdgpu driver is not able to do it. So I suggest waiting for new kernel versions (AMD changes something in EVERY minor kernel version; it seems to be a never-ending story). Yes, everything else is okay (you could harden your kernel a little more). And yes, your grub passes the "initrd=" parameter because you surely have the USE flag "initramfs" enabled globally: "make install" then creates an initramfs, and grub-mkconfig finds it and adjusts the options. But don't worry, this is only a cosmetic flaw and does no harm (because you have disabled initramfs in .config, like me). |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Fri May 02, 2025 10:14 am Post subject: |
|
|
I booted into the GUI Gentoo livecd this time, instead of the minimal install one and the sensors command there did show my GPU temps.
After comparing the lsmod output, I enabled some of the modules that seemed important (asus_wmi, gpio_amdpt, eeepc_wmi) but sensors still shows no info about my GPU.
I tried to cat the temps:
Code: | cat /sys/class/hwmon/hwmon7/temp1_input |
And it says cat: temp1_input: Invalid argument
Running it as root returns the same.
No clue what I'm missing here. |
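Since the hwmonN number can move around, the same check can be done by driver name, trying every temperature input and reporting which ones fail to read. A sketch under those assumptions (POSIX shell; read_hwmon_temps is a hypothetical helper, and the optional second argument is an alternative base directory for experimenting). hwmon reports temperatures in millidegrees Celsius, so the value is divided by 1000:

```shell
#!/bin/sh
# For every hwmon device whose name is $1, try to read each temp*_input
# and print either the temperature or a note that the read failed.
# $2 is the base directory; it defaults to /sys/class/hwmon.
read_hwmon_temps() {
    want=$1
    base=${2:-/sys/class/hwmon}
    for dir in "$base"/hwmon*; do
        [ "$(cat "$dir/name" 2>/dev/null)" = "$want" ] || continue
        for input in "$dir"/temp*_input; do
            [ -e "$input" ] || continue
            # Prefer the label (edge, junction, mem); fall back to the file name.
            label=$(cat "${input%_input}_label" 2>/dev/null) || label=${input##*/}
            if value=$(cat "$input" 2>/dev/null); then
                # Millidegrees Celsius, printed with one decimal place.
                printf '%s: %s.%s C\n' "$label" "$((value / 1000))" "$((value % 1000 / 100))"
            else
                printf '%s: read failed\n' "$label"
            fi
        done
    done
}

# Example (hypothetical usage):
#   read_hwmon_temps amdgpu
```

If every input fails, as in the post above, the hwmon device exists but the driver cannot fetch the values, which points at the driver setup rather than at lm-sensors.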
|
Anon-E-moose Watchman


Joined: 23 May 2008 Posts: 6272 Location: Dallas area
|
Posted: Fri May 02, 2025 10:40 am Post subject: |
|
|
You need to check the .config file, there are options that aren't module related.
for example (on my system)
Code: | CONFIG_PINCTRL_AMD=y
# CONFIG_GPIO_AMDPT is not set |
As I said earlier, I'm pretty sure it's a configuration problem.
It's just that hardware differs so much, not only between processors but also between motherboard and video card manufacturers, that it's hard to give a concrete answer.
I usually fix things with trial and error, make a change to config, and see what happens. Good luck. |
|
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5641 Location: Bavaria
|
Posted: Fri May 02, 2025 11:25 am Post subject: |
|
|
Anon-E-moose wrote: | As I said earlier, I'm pretty sure it's a configuration problem. |
If it works with our GentooLiveCD THEN it is a configuration problem.
Anon-E-moose wrote: | I usually fix things with trial and error, make a change to config, and see what happens. Good luck. |
There is an easier way: after checking with "lsmod" and enabling the same modules in your manually configured kernel .config, you only need to look at the options that are built statically into our dist-kernel. There are not many of them; a quick search showed me this difference:
Code: | CONFIG_THERMAL_STATISTICS=y |
|
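The lsmod comparison described above can be done mechanically by saving the output on both systems and listing the modules that are loaded on the live CD but not on the installed system. A minimal sketch, assuming a POSIX shell with awk, sort and comm; the file names and the helper names module_names and missing_modules are made up for illustration:

```shell
#!/bin/sh
# Print the sorted module names from a saved `lsmod` output file,
# skipping the "Module Size Used by" header line.
module_names() {
    awk 'NR > 1 { print $1 }' "$1" | sort -u
}

# Print modules that appear in the first lsmod file but not in the second.
missing_modules() {
    live=$(mktemp)
    mine=$(mktemp)
    module_names "$1" > "$live"
    module_names "$2" > "$mine"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$live" "$mine"
    rm -f "$live" "$mine"
}

# Example (hypothetical usage):
#   lsmod > lsmod.livecd      # run on the live CD
#   lsmod > lsmod.mine        # run on the installed system
#   missing_modules lsmod.livecd lsmod.mine
```

A module reported here may still be present as a built-in driver, since lsmod only lists loadable modules; /lib/modules/$(uname -r)/modules.builtin shows what was built in.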
|
Anon-E-moose Watchman


Joined: 23 May 2008 Posts: 6272 Location: Dallas area
|
Posted: Fri May 02, 2025 1:01 pm Post subject: |
|
|
It's not a new card (6900xt) so should be well supported.
I would be tempted to take the config from the live cd and use it as the base. Modify as needed (remove options you don't have hardware for)
and add any hardware you have that isn't in the live cd.
Edit to add: The card itself uses soc's, as many newer amd cards do.
It's possible that the sensor reporting is tied into the snd_soc stuff. |
|
0x1000000 n00b

Joined: 22 Apr 2024 Posts: 18
|
Posted: Sat May 03, 2025 2:35 pm Post subject: |
|
|
I tried enabling all the AMD-related snd_soc options, but it didn't help. (I tried both as modules and built-in.)
Code: |
grep -i soc_amd .config
CONFIG_SND_SOC_AMD_ACP=y
CONFIG_SND_SOC_AMD_CZ_DA7219MX98357_MACH=y
CONFIG_SND_SOC_AMD_CZ_RT5645_MACH=y
CONFIG_SND_SOC_AMD_ST_ES8336_MACH=y
CONFIG_SND_SOC_AMD_ACP3x=y
CONFIG_SND_SOC_AMD_RENOIR=y
CONFIG_SND_SOC_AMD_RENOIR_MACH=y
CONFIG_SND_SOC_AMD_ACP5x=y
CONFIG_SND_SOC_AMD_VANGOGH_MACH=y
CONFIG_SND_SOC_AMD_ACP6x=y
CONFIG_SND_SOC_AMD_YC_MACH=y
CONFIG_SND_SOC_AMD_ACP_COMMON=y
CONFIG_SND_SOC_AMD_ACP_PDM=y
CONFIG_SND_SOC_AMD_ACP_LEGACY_COMMON=y
CONFIG_SND_SOC_AMD_ACP_I2S=y
CONFIG_SND_SOC_AMD_ACP_PCM=y
CONFIG_SND_SOC_AMD_ACP_PCI=y
CONFIG_SND_SOC_AMD_MACH_COMMON=y
CONFIG_SND_SOC_AMD_LEGACY_MACH=y
CONFIG_SND_SOC_AMD_SOF_MACH=y
CONFIG_SND_SOC_AMD_RPL_ACP6x=y
CONFIG_SND_SOC_AMD_ACP63_TOPLEVEL=y
CONFIG_SND_SOC_AMD_SOUNDWIRE_LINK_BASELINE=y
CONFIG_SND_SOC_AMD_PS=y
CONFIG_SND_SOC_AMD_PS_MACH=y |
What would be the best way to get the config from the livecd? I tried to look online but couldn't find much about it. |
|
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5641 Location: Bavaria
|
Posted: Sat May 03, 2025 3:28 pm Post subject: |
|
|
0x1000000 wrote: | What would be the best way to get the config from the livecd? I tried to look online but couldn't find much about it. |
Boot with it and then extract the kernel config with:
Code: | zcat /proc/config.gz > config.dist-kernel |
Back to your problem: I don't think it is a missing SoC (System on a Chip) module. IF you have verified that every LOADED module ("lsmod") is enabled in your kernel .config, THEN it can only be an option that is built statically into the dist-kernel config. There are not that many of those, because you don't need to check networking or crypto options. With this approach we found one statically built setting in the dist-kernel that some machines need (although for sound): https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#Serial_Bus_Multi_Instantiate
(I am sorry I am not an AMD graphics expert)
Maybe also consider these two kernel command line parameters:
Code: | acpi_enforce_resources=no ... amd_prefcore=enable |
(They are not necessary for our dist-kernel ...)
If you find it (I am sure you will), please tell us too (I want to add it to my wiki article). |
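The search for statically built options can be narrowed by listing everything that is =y in the extracted dist-kernel config but not =y in your own .config. A minimal sketch, assuming a POSIX shell with grep, sort and comm; builtin_only_in_first is a hypothetical helper name, and config.dist-kernel is the file produced by the zcat command above:

```shell
#!/bin/sh
# Print the options that are built in (=y) in the first config file
# but are not =y in the second (there they may be =m or not set).
builtin_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    grep '^CONFIG_[A-Za-z0-9_]*=y$' "$1" | sort > "$a"
    grep '^CONFIG_[A-Za-z0-9_]*=y$' "$2" | sort > "$b"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (hypothetical usage):
#   builtin_only_in_first config.dist-kernel /usr/src/linux/.config
```

The list will still be long, so filtering it (for example with grep -i for thermal, hwmon or amd) keeps it manageable. The kernel source tree also ships scripts/diffconfig, which prints every difference between two configs.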
|
|