Issue with amdgpu card and powerplay since kernel update

Post by ZenoOfElea » Wed Apr 03, 2019 11:35 am

I have noticed a recent issue with my amdgpu-based R9 380 (Volcanic Islands series) graphics card. I am not sure whether a mistaken kernel configuration causes it or what exactly triggers the problem, but during boot, when DRM KMS takes over from the legacy 80x24 framebuffer, the system freezes for 15 seconds or so and the following is printed to the kernel message buffer.

Code: Select all

[   27.574874] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[   27.574875] amdgpu: [powerplay] Error in phm_get_clock_info 
[   27.575091] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575103] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575114] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575442] [drm] Display Core initialized with v3.1.59!
[   27.628472] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   27.628473] [drm] Driver supports precise vblank timestamp query.
[   27.677800] [drm] UVD initialized successfully.
[   27.888890] [drm] VCE initialized successfully.
[   27.890434] [drm] fb mappable at 0xE0E25000
[   27.890435] [drm] vram apper at 0xE0000000
[   27.890436] [drm] size 8294400
[   27.890436] [drm] fb depth is 24
[   27.890437] [drm]    pitch is 7680
[   27.890574] fbcon: amdgpudrmfb (fb0) is primary device
[   28.031451] Console: switching to colour frame buffer device 240x67
[   28.053547] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
[   28.410220] amdgpu: [powerplay] 
                failed to send message 5d ret is 0 
[   28.760175] amdgpu: [powerplay] 
                last message was failed ret is 0
[   29.110123] amdgpu: [powerplay] 
                failed to send message 148 ret is 0 
[   29.809993] amdgpu: [powerplay] 
                last message was failed ret is 0
[   30.159941] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[   30.859815] amdgpu: [powerplay] 
                last message was failed ret is 0
[   31.209777] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[   31.568264] amdgpu: [powerplay] 
                last message was failed ret is 0
[   31.914555] amdgpu: [powerplay] 
                last message was failed ret is 0
[   31.920737] amdgpu: [powerplay] 
                failed to send message 155 ret is 0 
[   32.267039] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   32.273256] amdgpu: [powerplay] 
                last message was failed ret is 0
[   32.625681] amdgpu: [powerplay] 
                failed to send message 15b ret is 0 
[   32.969465] amdgpu: [powerplay] 
                last message was failed ret is 0
[   33.319269] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   34.018608] amdgpu: [powerplay] 
                last message was failed ret is 0
[   34.368287] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   35.067659] amdgpu: [powerplay] 
                last message was failed ret is 0
[   35.417341] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   36.116695] amdgpu: [powerplay] 
                last message was failed ret is 0
[   36.466369] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   37.165719] amdgpu: [powerplay] 
                last message was failed ret is 0
[   37.515390] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   38.214738] amdgpu: [powerplay] 
                last message was failed ret is 0
[   38.564429] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   39.263779] amdgpu: [powerplay] 
                last message was failed ret is 0
[   39.613451] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   40.312770] amdgpu: [powerplay] 
                last message was failed ret is 0
[   40.662444] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   41.361799] amdgpu: [powerplay] 
                last message was failed ret is 0
[   41.711474] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   42.410827] amdgpu: [powerplay] 
                last message was failed ret is 0
[   42.760505] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   43.459853] amdgpu: [powerplay] 
                last message was failed ret is 0
[   43.809519] amdgpu: [powerplay] 
                failed to send message 260 ret is 0 
[   43.809611] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:02:00.0 on minor 0
[   43.809974] [drm] Initialized i915 1.6.0 20180719 for 0000:00:02.0 on minor 1
[   43.825122] [drm] Cannot find any crtc or sizes
[   43.830205] [drm] Cannot find any crtc or sizes
[   43.835234] [drm] Cannot find any crtc or sizes
[   46.272965] amdgpu: [powerplay] 
                last message was failed ret is 0
[   46.653008] amdgpu: [powerplay] 
                failed to send message 154 ret is 0 
[   47.658171] [drm:amdgpu_uvd_ring_test_ib [amdgpu]] *ERROR* amdgpu: (0)IB test timed out.
[   47.658205] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 12 (-110).
[   48.209963] amdgpu: [powerplay] 
                last message was failed ret is 0
[   48.561668] amdgpu: [powerplay] 
                failed to send message 15a ret is 0 
[   48.561885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[   49.942961] amdgpu: [powerplay] 
                last message was failed ret is 0
[   50.292627] amdgpu: [powerplay] 
                failed to send message 15b ret is 0 
[   50.957807] amdgpu: [powerplay] 
                last message was failed ret is 0
[   51.307466] amdgpu: [powerplay] 
                failed to send message 155 ret is 0 
This only becomes an issue after the system boots, when I run the sensors program from the lm_sensors package. The sensors program works, but it causes a temporary freeze and spams the message buffer with:

Code: Select all


[13835.186423] amdgpu: [powerplay] 
                last message was failed ret is 0
[13835.537600] amdgpu: [powerplay] 
                failed to send message 282 ret is 0 
[13835.888783] amdgpu: [powerplay] 
                last message was failed ret is 0
[13836.239905] amdgpu: [powerplay] 
                failed to send message 170 ret is 0 
[13836.591997] amdgpu: [powerplay] 
                last message was failed ret is 0
[13836.943289] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13837.295225] amdgpu: [powerplay] 
                last message was failed ret is 0
[13837.646192] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13837.998352] amdgpu: [powerplay] 
                last message was failed ret is 0
[13838.349518] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13838.701678] amdgpu: [powerplay] 
                last message was failed ret is 0
[13839.052848] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13839.404880] amdgpu: [powerplay] 
                last message was failed ret is 0
[13839.755661] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13840.107747] amdgpu: [powerplay] 
                last message was failed ret is 0
[13840.459014] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13840.811228] amdgpu: [powerplay] 
                last message was failed ret is 0
[13841.162384] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13841.514520] amdgpu: [powerplay] 
                last message was failed ret is 0
[13841.865626] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13842.217726] amdgpu: [powerplay] 
                last message was failed ret is 0
[13842.569209] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
[13842.921494] amdgpu: [powerplay] 
                last message was failed ret is 0
[13843.272727] amdgpu: [powerplay] 
                failed to send message 171 ret is 0 
I am at a loss as to what I should do to tackle this problem; any suggestion or information would be greatly appreciated.
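
For anyone comparing logs like the two above, it may help to see which message IDs fail most often. This is a minimal Python sketch, not something from the original post. It assumes the two-line record layout shown above, and the function name tally_failed_messages is made up for illustration. The IDs look hexadecimal (5d, 15b), so they are kept as strings.

```python
import re
from collections import Counter

# Matches the continuation line of a powerplay record, e.g.
# "                failed to send message 260 ret is 0"
# Assumption: the layout is the one shown in the logs above.
FAILED_SEND = re.compile(r"failed to send message ([0-9a-fA-F]+) ret is \d+")


def tally_failed_messages(log_text):
    """Count how often each message ID appears in 'failed to send' lines."""
    ids = (match.group(1).lower() for match in FAILED_SEND.finditer(log_text))
    return Counter(ids)
```

Feeding it saved dmesg output and printing tally_failed_messages(text).most_common() lists the IDs from most to least frequent. The "last message was failed" lines carry no ID and are ignored.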

Post by davee » Sat May 25, 2019 9:38 pm

Hey,

I have had a similar issue with AMDGPU on my R9 290 (Sea Islands). While I don't have the same 15-second freeze on boot, I do get intermittent freezes during normal usage of my system. I have also narrowed this down to the lm_sensors package; specifically, the issue occurs after a failure to read the fan1 state.

Code: Select all

# sensors -u
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:
  in0_input: 1.000
fan1:
ERROR: Can't get value of subfeature fan1_input: Can't read
temp1:
  temp1_input: 65.000
  temp1_crit: 104000.000
  temp1_crit_hyst: -273.150
power1:
  power1_average: 66.165
  power1_cap: 225.000
When this happens, I also get a similar powerplay error message in the amdgpu driver:

Code: Select all

amdgpu: [powerplay] 
 failed to send message 282 ret is 254
While this error has always been displayed for me, the freezing issue only appeared after I updated the kernel from 4.19.27 to 4.19.44. I am looking for a more precise cause, but so far I have not found anything. Did you manage to get any further with your issue?
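
To see which attribute fails without going through sensors, one option is to read the hwmon files directly. The following is a rough Python sketch under assumptions: probe_hwmon is a hypothetical helper, the directory to pass in depends on your system, and only files ending in _input or _average are tried. Reading them may itself trigger the powerplay message and the stall described above.

```python
import os


def probe_hwmon(hwmon_dir):
    """Try to read each *_input / *_average file; return (values, errors)."""
    values, errors = {}, {}
    for name in sorted(os.listdir(hwmon_dir)):
        if not name.endswith(("_input", "_average")):
            continue  # skip labels, limits and other attributes
        path = os.path.join(hwmon_dir, name)
        try:
            with open(path) as handle:
                values[name] = handle.read().strip()
        except OSError as exc:
            # A failed read is recorded here rather than aborting the scan.
            errors[name] = exc.strerror or str(exc)
    return values, errors
```

Any attribute that cannot be read ends up in the second dictionary together with the error text, so one failing sensor does not hide the others.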

Post by Goverp » Sun May 26, 2019 9:17 pm

FWIW I too get a very annoying 15 sec freeze on booting. Mine is a Radeon RX570. Thanks for the hints about lm-sensors - I'll dig a little. AFAIR I got rid of that package because my old AMD Phenom motherboard tells lies, rendering lm-sensors useless.
Greybeard

Post by miiichael » Wed Jun 12, 2019 8:41 am

Hi,

For the benefit of posters here, and googlers in general, here are my discoveries. R9 290 on Debian (shhh, don't tell anyone!). 4.19.0 AMD64 kernel.

Anyway, I've just noticed that anything touching /sys/class/hwmon/hwmon3/power1_average triggers the kernel error messages I get:

Code: Select all

root@joyola:/home/michael# time cat "/sys/class/hwmon/hwmon3/power1_average";tail /var/log/kern.log|grep $(date +%T)
32140000

real    0m0.498s
user    0m0.000s
sys     0m0.497s
Jun 12 16:26:00 joyola kernel: [399556.316094] amdgpu: [powerplay]
Jun 12 16:26:00 joyola kernel: [399556.316094]  failed to send message 282 ret is 254
I found this out by strace'ing /usr/bin/sensors, which on my system is invoked half a dozen times every five minutes via munin-node.

This does confirm suspicions that this is a kernel issue (as opposed to the xorg driver, or other ancillary libraries, etc).

I can't comment on boot delays, as I don't really reboot often enough to be sure (plus I think my boot delay problems relate mostly to both eth0 and my ethernet over power waiting for the other to wake up before waking themselves up...).

Edited to add: BTW I have "radeon.cik_support=0 amdgpu.cik_support=1 radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc_log=1 amdgpu.dc=0" set, if that matters.
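
The hwmon3 number above is specific to that machine, so a script repeating the test probably should not hard-code it. Here is a small Python sketch of one way to do it, assuming the card's hwmon directory has a name file containing "amdgpu". The helper names find_hwmon_dirs and timed_read are invented for this example.

```python
import os
import time


def find_hwmon_dirs(driver_name, root="/sys/class/hwmon"):
    """Return hwmon directories whose 'name' file equals driver_name."""
    matches = []
    for entry in sorted(os.listdir(root)):
        name_file = os.path.join(root, entry, "name")
        try:
            with open(name_file) as handle:
                if handle.read().strip() == driver_name:
                    matches.append(os.path.join(root, entry))
        except OSError:
            continue  # no readable name file here, skip this entry
    return matches


def timed_read(path):
    """Read one sysfs attribute and return (contents, seconds elapsed)."""
    start = time.monotonic()
    with open(path) as handle:
        data = handle.read().strip()
    return data, time.monotonic() - start
```

Calling timed_read on power1_average inside each directory returned by find_hwmon_dirs("amdgpu") measures the same thing as the time cat run above, only as wall-clock time rather than the real/user/sys split.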

Post by TigerJr » Mon Jun 24, 2019 2:25 pm

I don't think grep $(date +%T) is a reliable way to tie kernel messages to their cause, but I'm sure the error is in the amdgpu kernel driver.

Just try a recent 5.0.x kernel, and if the error still occurs, post here.

cat /sys/class/hwmon/hwmon3/in0_input

Do you get the same message when you read the current voltage?
Do not use gentoo, it die
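
On the grep $(date +%T) point: matching on the wall-clock second can pick up unrelated messages or miss ones logged a second later. One alternative is to capture the kernel log before and after the read and keep only lines with a later kernel timestamp. This is a hedged Python sketch of that comparison only. How the two snapshots are captured (dmesg output, kern.log) is left open, and the names kernel_stamp and lines_after are hypothetical.

```python
import re

# Kernel timestamp such as "[399556.316094]" or "[   27.574874]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_stamp(line):
    """Return the kernel timestamp of a log line in seconds, or None."""
    match = STAMP.search(line)
    return float(match.group(1)) if match else None


def lines_after(before, after):
    """Lines of 'after' stamped later than the newest stamp in 'before'.

    Unstamped continuation lines follow the line that precedes them.
    """
    seen = [s for s in map(kernel_stamp, before) if s is not None]
    cutoff = max(seen) if seen else float("-inf")
    result, keeping = [], False
    for line in after:
        stamp = kernel_stamp(line)
        if stamp is not None:
            keeping = stamp > cutoff
        if keeping:
            result.append(line)
    return result
```

Anything returned was logged after the first snapshot, whatever the wall-clock second was. It still cannot prove that the read caused the message, only that the message arrived in that window.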

Post by Goverp » Wed Jun 26, 2019 3:05 pm

You may find my partial solution of interest.
Greybeard

Post by MasterCATZ » Thu Oct 31, 2019 10:51 pm

"old cards and kernels dpm=1 enabled the new dpm; then when AMD power play came out, they swapped its definition and dpm=0 would select power play and dpm=1 would still select the old power management"



I wonder if this is part of the reason I randomly just lose fan control on my R9 290s.

I cannot manually adjust anything, and neither can the BIOS.

Because everything started roasting when something randomly forced the fans to 20%, I modded the BIOS with higher fan speeds. I still see this situation now, even after I am blocked from accessing the fans manually. Nothing is wrong with the fans, as they do go to 100% when the GPU hits 96 deg.

Post by MasterCATZ » Tue Nov 26, 2019 4:59 am

Has anyone found a solution for this yet? It's cluttering up my log files, and this spam, created every second, is a huge waste of space. I'm on kernel 5.3.11-050311-generic.

I need PSensor so I can keep track of when AMDGPU fan control has taken over and disabled my manual control, so I can reboot before it becomes an inferno: AMD keeps forcing the fan below 20% when it needs 60%+ to stay under 80 deg.


Nov 26 14:58:35 aio psensor.desktop[894]: [2019-11-26T04:58:34] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:35 aio kernel: [106737.063836] amdgpu: [powerplay]
Nov 26 14:58:35 aio kernel: [106737.063836] failed to send message 282 ret is 254
Nov 26 14:58:36 aio kernel: [106738.062432] amdgpu: [powerplay]
Nov 26 14:58:36 aio kernel: [106738.062432] failed to send message 282 ret is 254
Nov 26 14:58:37 aio psensor.desktop[894]: [2019-11-26T04:58:36] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:37 aio kernel: [106739.061712] amdgpu: [powerplay]
Nov 26 14:58:37 aio kernel: [106739.061712] failed to send message 282 ret is 254
Nov 26 14:58:38 aio kernel: [106740.061579] amdgpu: [powerplay]
Nov 26 14:58:38 aio kernel: [106740.061579] failed to send message 282 ret is 254
Nov 26 14:58:39 aio psensor.desktop[894]: [2019-11-26T04:58:38] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:39 aio kernel: [106741.061242] amdgpu: [powerplay]
Nov 26 14:58:39 aio kernel: [106741.061242] failed to send message 282 ret is 254
Nov 26 14:58:40 aio kernel: [106742.062397] amdgpu: [powerplay]
Nov 26 14:58:40 aio kernel: [106742.062397] failed to send message 282 ret is 254
Nov 26 14:58:41 aio psensor.desktop[894]: [2019-11-26T04:58:41] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:41 aio kernel: [106743.061627] amdgpu: [powerplay]
Nov 26 14:58:41 aio kernel: [106743.061627] failed to send message 282 ret is 254
Nov 26 14:58:42 aio kernel: [106744.061807] amdgpu: [powerplay]
Nov 26 14:58:42 aio kernel: [106744.061807] failed to send message 282 ret is 254
Nov 26 14:58:43 aio psensor.desktop[894]: [2019-11-26T04:58:43] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:43 aio kernel: [106745.061298] amdgpu: [powerplay]
Nov 26 14:58:43 aio kernel: [106745.061298] failed to send message 282 ret is 254
Nov 26 14:58:44 aio kernel: [106746.061593] amdgpu: [powerplay]
Nov 26 14:58:44 aio kernel: [106746.061593] failed to send message 282 ret is 254
Nov 26 14:58:45 aio psensor.desktop[894]: [2019-11-26T04:58:45] [ERR] lmsensor: Cannot get value of subfeature fan1_input: Can't read.
Nov 26 14:58:45 aio kernel: [106747.064343] amdgpu: [powerplay]
Nov 26 14:58:45 aio kernel: [106747.064343] failed to send message 282 ret is 254
Nov 26 14:58:46 aio kernel: [106748.061517] amdgpu: [powerplay]
Nov 26 14:58:46 aio kernel: [106748.061517] failed to send message 282 ret is 254
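
If the only goal is to notice when manual fan control has been taken away, it might be enough to check the fan mode rather than poll every sensor. Below is a minimal Python sketch under assumptions: the card's hwmon directory exposes pwm1_enable with 0 = no fan speed control, 1 = manual, 2 = automatic, and the helper names fan_mode and manual_control_lost are invented here. It has not been tested on the cards in this thread, and reading the file may still go through the same driver path.

```python
import os

# Assumed meaning of pwm1_enable values for amdgpu's hwmon interface.
PWM_MODES = {0: "no fan speed control", 1: "manual", 2: "automatic"}


def fan_mode(hwmon_dir):
    """Return (raw value, description) of pwm1_enable, or (None, error text)."""
    path = os.path.join(hwmon_dir, "pwm1_enable")
    try:
        with open(path) as handle:
            raw = int(handle.read().strip())
    except (OSError, ValueError) as exc:
        return None, str(exc)
    return raw, PWM_MODES.get(raw, "unknown")


def manual_control_lost(hwmon_dir):
    """True when pwm1_enable is readable and no longer set to manual (1)."""
    raw, _ = fan_mode(hwmon_dir)
    return raw is not None and raw != 1
```

A cron job or small loop could call manual_control_lost at a modest interval and raise an alert. An unreadable file is reported as "not lost" here, which is a design choice you may want to reverse.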

Post by azp » Sat May 30, 2020 6:28 am

Have you reported the issue as a bug to the devs?

EDIT: There seems to be a bug report on something similar: https://bugzilla.kernel.org/show_bug.cgi?id=204609
Weeks of coding can save you hours of planning.

Post by MasterCATZ » Sat May 30, 2020 7:05 am

I had to turn off systemd logging.

Every few days a 2 TB log file was created that needed nuking.

All times are UTC.