Gentoo Forums
AMDGPU Difficulties [SOLVED]
ISHAIM
Apprentice

Joined: 08 Oct 2006
Posts: 161
Location: Chicago, IL

Posted: Mon Jun 28, 2021 10:25 pm    Post subject: AMDGPU Difficulties [SOLVED]

Hi, please bear with me as this problem is difficult to diagnose and report on. I'm more than certain I've a working system underneath and I've installed Gentoo on this exact same hardware before; literally nothing has changed.

Reproducing the problem causes me to have to chroot back into my system to revert it back to a functioning state, a tedious task requiring specific kernel .config revision.

Effectively, attempting to set up the Graphics stack is giving me trouble. I've followed the AMDGPU Wiki before to great success. I'm not quite sure what I'm missing this time around.

In the past, I've gotten other AMD APUs working fine on an Acer netbook, including loading firmware from /lib/firmware. Not my first rodeo.

The screen effectively goes blank/black while the system is booting. I'm fairly certain this is happening as OpenRC is already actively loaded/running the remainder of my system. In fact, the reason I *know* the rest of my system is working while the Graphics stack is not is that I can still SSH into the system remotely/locally through my Raspberry Pi.

Anticipating that something like this might happen, I enabled SSH from the get-go for debugging. As far as I can tell, the only thing broken at this point is my graphics setup; everything else is functional underneath.

Beyond this point, it gets very foggy/unclear to me whether or not /etc/portage/make.conf requires "radeonsi" or "radeon" if at all, or where to go from here.

Graphics card: Asus Arez RX-560

Output of `lspci`: 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev e5)

Family: Arctic Islands

Chipset Name: POLARIS11

*** Before staging (this works):
- /usr/src/linux/.config:
CONFIG_DRM=y
CONFIG_DRM_AMDGPU=n
- dmesg output:

*** After staging (this does not):
- /usr/src/linux/.config:
CONFIG_DRM=y
CONFIG_DRM_AMDGPU=y
CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin"

Output of `echo` used to configure CONFIG_EXTRA_FIRMWARE: echo amdgpu/polaris11_{ce,k_smc,k2_smc,k_mc,mc,me,mec2,mec,pfp,rlc,sdma1,sdma,smc,smc_sk,uvd,vce}.bin

Value of CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin"

This setup assumes the driver and its firmware are compiled into the kernel. Even when I've tried CONFIG_DRM_AMDGPU=m, OpenRC only gets a little further before producing the same blank/black screen.

Thanks for reading.
_________________
http://isaiassifuentes.net


Last edited by ISHAIM on Sat Jul 03, 2021 2:16 am; edited 1 time in total

alamahant
Advocate

Joined: 23 Mar 2019
Posts: 3879

Posted: Mon Jun 28, 2021 10:44 pm

Hi
Please boot a live DVD such as Ubuntu or Calculate, run lsmod to identify the active GPU modules, and try to translate them into your kernel .config, or use a binary kernel.
Why do you feel the need to have the modules built in rather than =m?
Is it faster this way?
Does it eliminate the need of an initrd?
:)
_________________
:)


Last edited by alamahant on Mon Jun 28, 2021 10:49 pm; edited 2 times in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Mon Jun 28, 2021 10:47 pm

ISHAIM,

Code:
CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin"


You load 16 amdgpu/polaris11* files but
Code:
$ ls /lib/firmware/amdgpu/polaris11* | wc -l
21
21 are provided. It might matter.
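
The comparison above can be automated. The following is a minimal sketch, not part of the original advice: `missing_firmware` is a hypothetical helper name, and it assumes the blobs live uncompressed under the firmware root and that `CONFIG_EXTRA_FIRMWARE` sits on a single line of the .config.

```shell
#!/bin/sh
# Print firmware files that exist on disk but are not listed in
# CONFIG_EXTRA_FIRMWARE (hypothetical helper, for illustration only).
# Usage: missing_firmware <kernel .config> <firmware root> <glob relative to root>
missing_firmware() {
    config=$1
    fwroot=$2
    pattern=$3
    # Extract the space-separated list between the double quotes.
    listed=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$config")
    # $pattern is deliberately unquoted so the shell expands the glob.
    for path in "$fwroot"/$pattern; do
        [ -e "$path" ] || continue          # the glob matched nothing
        rel=${path#"$fwroot"/}              # e.g. amdgpu/polaris11_ce.bin
        case " $listed " in
            *" $rel "*) ;;                  # already embedded in the kernel
            *) printf '%s\n' "$rel" ;;      # on disk, missing from the config
        esac
    done
}
```

On the system in question it would be called as `missing_firmware /usr/src/linux/.config /lib/firmware 'amdgpu/polaris11*'`; empty output means every matching file is already listed.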

Since you can ssh in, look at dmesg. It will tell the first file that it couldn't find for each device.
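
A small filter for that dmesg check, as a sketch only: it assumes the kernel reports a missing blob with the wording "Direct firmware load for <file> failed", which may vary between kernel versions, and `firmware_failures` is a made-up helper name.

```shell
#!/bin/sh
# Reduce kernel log text on stdin to the lines reporting failed firmware loads.
# The message wording matched here is an assumption about the running kernel.
firmware_failures() {
    grep -E 'Direct firmware load for .+ failed'
}
```

Over SSH this would be run as `dmesg | firmware_failures`. No output suggests the kernel found every file it asked for.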

You need VIDEO_CARDS="amdgpu radeonsi" to get the Xorg drivers and mesa to support your card.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

ISHAIM
Apprentice

Joined: 08 Oct 2006
Posts: 161
Location: Chicago, IL

Posted: Tue Jun 29, 2021 4:15 am

NeddySeagoon,

I've corrected CONFIG_EXTRA_FIRMWARE to include all of the files in /lib/firmware/amdgpu. This allows the system to at least boot without crashing or screen black/blank; however `startx` still will not work (my Xorg.0.log):

https://bpa.st/6PCQ

wwdev16
n00b

Joined: 29 Aug 2018
Posts: 52

Posted: Tue Jun 29, 2021 7:09 am

Does `dmesg | grep -E 'kms|gpu|drm|fb'` show anything interesting?
Since you seem to be using systemd, has the elogind USE flag been disabled for any package?

I have had black screen and then weird permission errors like you, but when using a new
kernel version or with an upgrade of xorg packages. AFAICT:
  • The amdgpu driver only works as a module
  • As a module amdgpu loads from /lib/firmware so I don't embed firmware in the kernel
    (requires having /lib/firmware in an initramfs though)
  • kernel changes to drm or video drivers often affect libdrm and mesa
What I do that makes things work:
  • Set /usr/src/linux to point to the kernel to be booted
  • VIDEO_CARDS="amdgpu radeon radeonsi fbdev vesa" (you might not need fbdev/vesa)
  • emerge --oneshot libdrm mesa
  • Reboot
  • If display-manager has issues, brute force:
    emerge --ask --verbose --oneshot $(qlist -IC x11- media-libs dev-libs)
    (Note I use xorg-server/drivers since xfce isn't wayland compatible, so I also
    include xorg-server xorg-drivers xfce-)
  • Reboot
  • If there are still problems, <expletive> :evil: and then emerge --emptytree.
  • Reboot
So far, emerge --emptytree has always worked. Have never found what extra packages needed to
be emerged to avoid needing to do emerge --emptytree.

Maybe there is something in that which will work for you.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Tue Jun 29, 2021 9:37 am

ISHAIM,

Code:
Fatal server error:
[    26.470] (EE) parse_vt_settings: Cannot open /dev/tty0 (Permission denied)


Either add elogind to USE in make.conf and rebuild everything to get elogind support, or rebuild xorg-server with USE=suid to get the old behaviour.

Also you are missing
Code:
[    87.220] (II) LoadModule: "amdgpu"
[    87.220] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so

Your log shows the radeon driver in use.

ISHAIM
Apprentice

Joined: 08 Oct 2006
Posts: 161
Location: Chicago, IL

Posted: Tue Jun 29, 2021 11:42 am

NeddySeagoon,

I have `elogind` in the make.conf USE variable and have already run `emerge -avtNDu @world`; still no dice.
There's something on the AMDGPU wiki about making sure radeon is blacklisted, although I explicitly am unsetting
Code:
CONFIG_DRM_RADEON=N
CONFIG_FB_RADEON=N

in the kernel .config.
Not quite sure where/why/how that's entering the picture.
Nor am I using systemd; explicitly stated before by mention of me using OpenRC, I'm unsure where/why/how that enters the picture either.

I've gotten some feedback about perhaps Polaris11 may not be the correct version of firmware for this card (RX-560 Arctic Islands class). I have no idea about that.

I've included *all* the *.bin's in /lib/firmware/amdgpu/polaris11* now.

I realize there is this concept of whether or not these "firmware modules" are compiled into the kernel binary; however the AMDGPU wiki is throwing me off quite a bit. In the past, I've just used =Y, not =M, and didn't have to bother with /lib/firmware files, although I remember having to manage this manually in the past on an AMD APU with =M.

I got by for a while with CONFIG_DRM_AMDGPU=Y on this card, not having to bother with /lib/firmware. Now it seems this approach will not work while CONFIG_DRM_AMDGPU=M produces "results".

My make.conf USE flags:
Code:
USE="X gtk dvd alsa cdr udev elogind dbus"


Thanks for all the help so far. I've stopped getting screen blank/black, however `startx` is still giving problems.

I'm unsure whether to mark the thread as solved. I've solved part of the problem while introducing a new one.

Edit: Additionally, here's what Xorg.0.log does produce. No more `fatal server error`, but there is no trace of `amdgpu` either, only `radeon`:
https://bpa.st/CY3A

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Tue Jun 29, 2021 2:31 pm

ISHAIM,

The Gentoo Wiki says
Quote:
POLARIS11 RX 460, RX 550 640SP, RX 560 amdgpu/polaris11_{ce,k_smc,k2_smc,k_mc,mc,me,mec2,mec,pfp,rlc,sdma1,sdma,smc,smc_sk,uvd,vce}.bin

That's you.

If the kernel driver is configured as a module, it will read /lib/firmware directly and help itself to what it needs.

Your Xorg log at /var/log/Xorg.0.log may be an old one.
With Xorg running as a user, it can't write to /var. The log will be in /home/<username> somewhere. Probably in a hidden file/directory (name starting with a dot.)
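
A sketch for finding the log that is actually current. It assumes a rootless Xorg writes to ~/.local/share/xorg, which is the usual location but not guaranteed on every setup; `newest_xorg_log` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the most recently modified Xorg log among the usual candidates.
# Usage: newest_xorg_log [home directory] [system log directory]
newest_xorg_log() {
    home=${1:-$HOME}
    sysdir=${2:-/var/log}
    newest=
    for f in "$home"/.local/share/xorg/Xorg.*.log "$sysdir"/Xorg.*.log; do
        [ -f "$f" ] || continue             # skip globs that matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f                       # keep the file with the latest mtime
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Running `grep 'Current Operating System' "$(newest_xorg_log)"` then shows which kernel wrote that log, which is a quick way to spot a stale file.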

elogind is a piece of systemd. References to systemd are not a problem.


Xorg.0.log:
[    91.110] (==) Matched ati as autoconfigured driver 0
[    91.110] (==) Matched modesetting as autoconfigured driver 1
[    91.110] (==) Matched fbdev as autoconfigured driver 2
[    91.110] (==) Matched vesa as autoconfigured driver 3


That's wrong. amdgpu should be the first listed driver. Even if it's not, it should be in the list.

Hmm ...
Code:
[    91.014] Build Operating System: Linux 4.15.0-130-generic x86_64 Ubuntu
[    91.014] Current Operating System: Linux xubuntu 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64
[    91.014] Kernel command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/xubuntu.seed quiet splash nomodeset ---
[    91.014] Build Date: 17 January 2021  09:13:31AM

That's a very old kernel and it's Ubuntu's.

I suspect that what you are looking at is an out of date log file.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Back to top
View user's profile Send private message
SlashBeast
Retired Dev
Retired Dev


Joined: 23 May 2006
Posts: 2922

PostPosted: Tue Jun 29, 2021 7:55 pm    Post subject: Reply with quote

Take a look at https://wiki.gentoo.org/wiki/Non_root_Xorg and confirm that elogind is operational.
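
One rough way to check this from the console you run `startx` on. This is a sketch under the assumption that the elogind PAM module exports `XDG_SESSION_ID` into the login shell; the helper name is made up for illustration.

```shell
#!/bin/sh
# Succeed when the current shell carries a logind session id.
# Assumes the PAM hook for elogind sets XDG_SESSION_ID at login.
in_logind_session() {
    [ -n "${XDG_SESSION_ID:-}" ]
}
```

If `in_logind_session` succeeds, `loginctl show-session "$XDG_SESSION_ID"` should describe the session. If it fails, the elogind service or its PAM integration is probably not active for that login.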

ISHAIM
Apprentice

Joined: 08 Oct 2006
Posts: 161
Location: Chicago, IL

Posted: Sat Jul 03, 2021 1:58 am

Hi. I'm marking this [SOLVED] because I've managed to reproduce/solve this across 3 different Linux distributions (Xubuntu, Tails, Gentoo). I suppose this is, for the most part, "solved".

The documentation that helped me debug this is located at:
https://tails.boum.org/support/known_issues/graphics/#amd-radeon-rx-400
https://wiki.gentoo.org/wiki/AMDGPU#Kernel

If you have an AMD Radeon RX 460 (Baffin), this apparently applies to you. You could, in theory, add to your kernel command line
Code:
amdgpu.dc=0
and this appears to be a solution. However, this leads me to the question of "What if we just disabled this in-kernel? Isn't this just something that's ENABLED in-kernel that we're DISABLING?"

Sure enough: disabling it in the boot menus of first Tails, then Xubuntu, and finally disabling it in-kernel on Gentoo all worked where things had been broken before.

Here's the breakdown:

Xubuntu/Tails/Gentoo(?) only play nice with this graphics card when given the additional boot parameter:
Code:
amdgpu.dc=0

When simply setting
Code:
CONFIG_DRM_AMD_DC=n
in the kernel's .config,
Code:
amdgpu.dc=0
is unnecessary as far as Gentoo is concerned, because it seems to be some sort of functional, kernel-level equivalent.
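
To tell the two approaches apart on a given machine, something like the following could be used. It is a sketch with a hypothetical helper name (`dc_status`); note that a generated .config records a disabled option as `# CONFIG_DRM_AMD_DC is not set` rather than `=n`, so the check looks for the absence of `=y`.

```shell
#!/bin/sh
# Report how Display Core is (or is not) disabled.
# Usage: dc_status <kernel .config> <file holding the kernel command line>
dc_status() {
    config=$1
    cmdline=$2
    if ! grep -q '^CONFIG_DRM_AMD_DC=y' "$config"; then
        echo "compiled-out"                 # option absent or "is not set"
    elif grep -qw 'amdgpu\.dc=0' "$cmdline"; then
        echo "disabled-by-parameter"        # code present, switched off at boot
    else
        echo "enabled-or-auto"              # driver decides (dc=-1) or dc=1
    fi
}
```

For the running system the call would be `dc_status /usr/src/linux/.config /proc/cmdline`, assuming /usr/src/linux points at the kernel that was booted.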

Here's that scrot of twm xterm et cetera for good measure: https://ibb.co/CmwfmfY

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Sat Jul 03, 2021 9:04 am

ISHAIM,

amdgpu.dc=0 passes the module parameter dc=0 to the amdgpu kernel code.

Code:
$ modinfo amdgpu
...
parm:           dc:Display Core driver (1 = enable, 0 = disable, -1 = auto (default)) (int)
...


So auto, which is the default, gets it wrong.

I am unsure if CONFIG_DRM_AMD_DC=n removes the code from the kernel or overrides the setting.
Either way it achieves the same effect.
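
To see which value the running driver ended up with, the parameter can usually be read back from sysfs. This is only a sketch: it assumes the parameter is exported read-only under /sys/module/amdgpu/parameters, and `amdgpu_dc_param` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the current value of the amdgpu "dc" module parameter, or
# "unavailable" if the driver is not loaded or does not export it.
# Usage: amdgpu_dc_param [sysfs root]   (defaults to /sys)
amdgpu_dc_param() {
    sysfs=${1:-/sys}
    f=$sysfs/module/amdgpu/parameters/dc
    if [ -r "$f" ]; then
        cat "$f"                            # 1 = enable, 0 = disable, -1 = auto
    else
        echo "unavailable"
    fi
}
```

A value of `0` after booting with `amdgpu.dc=0` would confirm that the boot parameter reached the driver.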

Goverp
Veteran

Joined: 07 Mar 2007
Posts: 1997

Posted: Sat Jul 03, 2021 6:05 pm

It removes the code, which is quite a saving (compilation time and kernel size) given the coding diarrhoea of the AMDGPU developers...
If you remove it, of course, amdgpu.dc=1 doesn't work, but you probably wouldn't have wanted it anyway.
_________________
Greybeard

All times are GMT