ISHAIM Apprentice
Joined: 08 Oct 2006 Posts: 161 Location: Chicago, IL
|
Posted: Mon Jun 28, 2021 10:25 pm Post subject: AMDGPU Difficulties [SOLVED] |
|
|
Hi, please bear with me as this problem is difficult to diagnose and report on. I'm more than certain I've a working system underneath and I've installed Gentoo on this exact same hardware before; literally nothing has changed.
Each time I reproduce the problem I have to chroot back into my system and revert the kernel .config to get back to a functioning state, which is tedious.
Effectively, attempting to set up the Graphics stack is giving me trouble. I've followed the AMDGPU Wiki before to great success. I'm not quite sure what I'm missing this time around.
In the past, I've gotten other AMD APUs working fine on an Acer Netbook, including loading firmware from /lib/firmware. Not my first rodeo.
The screen goes blank/black while the system is booting. I'm fairly certain this happens while OpenRC is already starting the rest of my system. In fact, the reason I *know* the rest of my system is working while the graphics stack is not is that I can still SSH into the system from my Raspberry Pi.
Anticipating that something like this might happen, I enabled SSH from the get-go for debugging, in case the graphics setup turned out to be the only broken thing while everything else works underneath.
Beyond this point, it is unclear to me whether /etc/portage/make.conf requires "radeonsi" or "radeon", if either, or where to go from here.
Graphics card: Asus Arez RX-560
Output of `lspci`: 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev e5)
Family: Arctic Islands
Chipset Name: POLARIS11
*** Before staging (this works):
- /usr/src/linux/.config:
CONFIG_DRM=y
CONFIG_DRM_AMDGPU=n
- dmesg output:
*** After staging (this does not):
- /usr/src/linux/.config:
CONFIG_DRM=y
CONFIG_DRM_AMDGPU=y
CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin"
`echo` command used to generate the CONFIG_EXTRA_FIRMWARE value: echo amdgpu/polaris11_{ce,k_smc,k2_smc,k_mc,mc,me,mec2,mec,pfp,rlc,sdma1,sdma,smc,smc_sk,uvd,vce}.bin
Resulting value: CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin"
This setup assumes I've compiled these `modules` into the kernel. Even when I've tried AMDGPU=M, OpenRC only gets a little further before producing the same blank/black screen.
Thanks for reading. _________________ http://isaiassifuentes.net
Last edited by ISHAIM on Sat Jul 03, 2021 2:16 am; edited 1 time in total |
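A portable way to build the same kind of CONFIG_EXTRA_FIRMWARE string, as a rough sketch. The `echo` line in the post relies on brace expansion, which bash performs but a strict POSIX sh does not. `fw_list` is a hypothetical helper name introduced here for illustration; it is not part of any Gentoo or kernel tool.

```shell
#!/bin/sh
# fw_list (hypothetical helper): print a space-separated firmware list
# for one chip, in the form CONFIG_EXTRA_FIRMWARE expects.
#   $1      chip name, e.g. polaris11
#   $2...   firmware parts, e.g. ce mc me
fw_list() {
    chip=$1
    shift
    list=
    for part in "$@"; do
        # Append with a single separating space, no leading space.
        list="${list:+$list }amdgpu/${chip}_${part}.bin"
    done
    printf '%s\n' "$list"
}
```

Calling `fw_list polaris11 ce mc me` prints three paths on one line; the output can then be pasted between the quotes of CONFIG_EXTRA_FIRMWARE.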
|
|
|
alamahant Advocate
Joined: 23 Mar 2019 Posts: 3879
|
Posted: Mon Jun 28, 2021 10:44 pm Post subject: |
|
|
Hi
Please boot into a live DVD such as Ubuntu or Calculate, run lsmod, identify the active GPU modules, and try to translate them into your kernel .config, or use a binary kernel.
Why do you feel the need to have the modules built in rather than =m?
Is it faster this way?
Does it eliminate the need of an initrd?
_________________
Last edited by alamahant on Mon Jun 28, 2021 10:49 pm; edited 2 times in total |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Mon Jun 28, 2021 10:47 pm Post subject: |
|
|
ISHAIM,
Code: | CONFIG_EXTRA_FIRMWARE="amdgpu/polaris11_ce.bin amdgpu/polaris11_k_smc.bin amdgpu/polaris11_k2_smc.bin amdgpu/polaris11_k_mc.bin amdgpu/polaris11_mc.bin amdgpu/polaris11_me.bin amdgpu/polaris11_mec2.bin amdgpu/polaris11_mec.bin amdgpu/polaris11_pfp.bin amdgpu/polaris11_rlc.bin amdgpu/polaris11_sdma1.bin amdgpu/polaris11_sdma.bin amdgpu/polaris11_smc.bin amdgpu/polaris11_smc_sk.bin amdgpu/polaris11_uvd.bin amdgpu/polaris11_vce.bin" |
You load 16 amdgpu/polaris11* files but
Code: | $ ls /lib/firmware/amdgpu/polaris11* | wc -l
21 |
21 are provided. It might matter.
Since you can ssh in, look at dmesg. It will name the first file that it couldn't find for each device.
You need VIDEO_CARDS="amdgpu radeonsi" to get the Xorg drivers and mesa to support your card. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
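The 16-versus-21 mismatch can be checked mechanically. The following is a minimal sketch, assuming the firmware lives under a directory such as /lib/firmware and the embedded list is the space-separated value of CONFIG_EXTRA_FIRMWARE; `missing_firmware` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# missing_firmware (hypothetical helper): list firmware files that exist
# on disk for a chip but are absent from the embedded list.
#   $1  firmware root, e.g. /lib/firmware
#   $2  path prefix under the root, e.g. amdgpu/polaris11
#   $3  space-separated value of CONFIG_EXTRA_FIRMWARE
missing_firmware() {
    fw_root=$1
    prefix=$2
    embedded=$3
    for path in "$fw_root"/"$prefix"*; do
        [ -e "$path" ] || continue        # the glob matched nothing
        rel=${path#"$fw_root"/}           # path relative to the root
        case " $embedded " in
            *" $rel "*) ;;                # already embedded
            *) printf '%s\n' "$rel" ;;    # on disk but not embedded
        esac
    done
}
```

A possible invocation, assuming the kernel sources are at /usr/src/linux: `missing_firmware /lib/firmware amdgpu/polaris11 "$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' /usr/src/linux/.config)"`. An empty result means every matching file on disk is in the list.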
|
|
|
ISHAIM Apprentice
Joined: 08 Oct 2006 Posts: 161 Location: Chicago, IL
|
Posted: Tue Jun 29, 2021 4:15 am Post subject: |
|
|
NeddySeagoon,
I've corrected CONFIG_EXTRA_FIRMWARE to include all of the files in /lib/firmware/amdgpu. This allows the system to at least boot without crashing or screen black/blank; however `startx` still will not work (my Xorg.0.log):
https://bpa.st/6PCQ _________________ http://isaiassifuentes.net |
|
|
|
wwdev16 n00b
Joined: 29 Aug 2018 Posts: 52
|
Posted: Tue Jun 29, 2021 7:09 am Post subject: |
|
|
Does dmesg | egrep 'kms|gpu|drm|fb' have anything interesting?
Since you seem to be using systemd, has the elogind use flag been disabled
for any package?
I have had a black screen and then weird permission errors like you, but when using a new
kernel version or with an upgrade of xorg packages.
AFAICT:
- The amdgpu driver only works as a module
- As a module amdgpu loads from /lib/firmware so I don't embed firmware in the kernel
(requires having /lib/firmware in an initramfs though)
- Kernel changes to drm or video drivers often affect libdrm and mesa
What I do that makes things work:
- Set /usr/src/linux to point to the kernel to be booted
- VIDEO_CARDS="amdgpu radeon radeonsi fbdev vesa" (you might not need fbdev/vesa)
- emerge --oneshot libdrm mesa
- Reboot
- If display-manager has issues, brute force:
emerge --ask --verbose --oneshot $(qlist -IC x11- media-libs dev-libs)
(Note I use xorg-server/drivers since xfce isn't wayland compatible, so I also
include xorg-server xorg-drivers xfce-)
- Reboot
- If there are still problems, <expletive> and then emerge --emptytree.
- Reboot
So far, emerge --emptytree has always worked. Have never found what extra packages needed to
be emerged to avoid needing to do emerge --emptytree.
Maybe there is something in that which will work for you. |
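The dmesg filter suggested at the top of this post can be wrapped in a small function. This is only a sketch: it uses `grep -E` because newer GNU grep releases warn that `egrep` is obsolescent, and `gpu_dmesg` is a hypothetical name chosen here.

```shell
#!/bin/sh
# gpu_dmesg (hypothetical helper): keep only kernel log lines that
# mention KMS, the GPU, DRM, or the framebuffer. Reads from stdin so it
# works on live output (dmesg | gpu_dmesg) or on a saved log file.
gpu_dmesg() {
    grep -E 'kms|gpu|drm|fb'
}
```

Over SSH, something like `dmesg | gpu_dmesg | less` narrows the log to the lines relevant to the graphics stack, including any firmware load failures reported by the driver.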
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Tue Jun 29, 2021 9:37 am Post subject: |
|
|
ISHAIM,
Code: | Fatal server error:
[ 26.470] (EE) parse_vt_settings: Cannot open /dev/tty0 (Permission denied) |
Either add elogind to USE in make.conf and rebuild everything to get elogind support, or rebuild Xorg with USE=suid to get the old behaviour.
Also you are missing Code: | [ 87.220] (II) LoadModule: "amdgpu"
[ 87.220] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so |
Your log shows the radeon driver in use. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
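To see which driver modules a given Xorg log actually records, the LoadModule lines can be pulled out directly. A minimal sketch, assuming the log uses the `LoadModule: "name"` format quoted in the post; `xorg_modules` is a hypothetical helper name.

```shell
#!/bin/sh
# xorg_modules (hypothetical helper): print the name of every module
# that appears in a LoadModule line of the given Xorg log, one per line,
# in the order they occur in the file.
#   $1  path to an Xorg log file
xorg_modules() {
    sed -n 's/.*LoadModule: "\([^"]*\)".*/\1/p' "$1"
}
```

For example, `xorg_modules /var/log/Xorg.0.log | grep -x amdgpu` prints `amdgpu` only if that module was requested; no output suggests the amdgpu Xorg driver was never loaded.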
|
|
|
ISHAIM Apprentice
Joined: 08 Oct 2006 Posts: 161 Location: Chicago, IL
|
Posted: Tue Jun 29, 2021 11:42 am Post subject: |
|
|
NeddySeagoon,
I have `elogind` in the make.conf USE variable and have already run `emerge -avtNDu @world`; still no dice.
There's something on the AMDGPU wiki about making sure radeon is blacklisted, although I explicitly am unsetting Code: | CONFIG_DRM_RADEON=N
CONFIG_FB_RADEON=N |
in the kernel .config.
Not quite sure where/why/how that's entering the picture.
Nor am I using systemd; I mentioned OpenRC earlier, so I'm unsure where/why/how systemd enters the picture either.
I've gotten some feedback about perhaps Polaris11 may not be the correct version of firmware for this card (RX-560 Arctic Islands class). I have no idea about that.
I've included *all* the *.bin's in /lib/firmware/amdgpu/polaris11* now.
I realize there is this concept of whether or not these "firmware modules" are compiled into the kernel binary; however the AMDGPU wiki is throwing me off quite a bit. In the past, I've just used =Y, not =M, and didn't have to bother with /lib/firmware files, although I remember having to manage this manually in the past on an AMD APU with =M.
I got by for a while with CONFIG_DRM_AMDGPU=Y on this card, not having to bother with /lib/firmware. Now it seems this approach will not work while CONFIG_DRM_AMDGPU=M produces "results".
My make.conf USE flags:
Code: | USE="X gtk dvd alsa cdr udev elogind dbus" |
Thanks for all the help so far. I've stopped getting screen blank/black, however `startx` is still giving problems.
I'm unsure whether to mark the thread as solved. I've solved part of the problem while introducing a new one.
Edit: Additionally, here's what Xorg.0.log produces now. No more `fatal server error`, but no trace of `amdgpu` either; it shows `radeon` instead:
https://bpa.st/CY3A _________________ http://isaiassifuentes.net |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Tue Jun 29, 2021 2:31 pm Post subject: |
|
|
ISHAIM,
The Gentoo Wiki says
Quote: | POLARIS11 RX 460, RX 550 640SP, RX 560 amdgpu/polaris11_{ce,k_smc,k2_smc,k_mc,mc,me,mec2,mec,pfp,rlc,sdma1,sdma,smc,smc_sk,uvd,vce}.bin |
That's you.
If the kernel driver is configured as a module, it will read /lib/firmware directly and help itself to what it needs.
Your Xorg log at /var/log/Xorg.0.log may be an old one.
With Xorg running as a user, it can't write to /var. The log will be in /home/<username> somewhere. Probably in a hidden file/directory (name starting with a dot.)
elogind is a piece of systemd. References to systemd are not a problem.
Xorg.0.log: | [ 91.110] (==) Matched ati as autoconfigured driver 0
[ 91.110] (==) Matched modesetting as autoconfigured driver 1
[ 91.110] (==) Matched fbdev as autoconfigured driver 2
[ 91.110] (==) Matched vesa as autoconfigured driver 3 |
That's wrong. amdgpu should be the first listed driver. Even if it's not, it should be in the list.
Hmm ...
Code: | [ 91.014] Build Operating System: Linux 4.15.0-130-generic x86_64 Ubuntu
[ 91.014] Current Operating System: Linux xubuntu 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64
[ 91.014] Kernel command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/xubuntu.seed quiet splash nomodeset ---
[ 91.014] Build Date: 17 January 2021 09:13:31AM |
That's a very old kernel and it's Ubuntu's.
I suspect that what you are looking at is an out of date log file. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
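The stale-log check done by eye in this post can be approximated in a few lines. This is a sketch under stated assumptions: it relies on the `Current Operating System:` line that Xorg writes near the top of its log, and `log_is_stale` is a hypothetical helper name. When Xorg runs as a regular user, the log is commonly found at ~/.local/share/xorg/Xorg.0.log rather than under /var/log.

```shell
#!/bin/sh
# log_is_stale (hypothetical helper): compare the kernel release recorded
# in an Xorg log with the running kernel.
#   $1  path to an Xorg log file
#   $2  kernel release to compare against (defaults to `uname -r`)
# Returns 0 if the log names a different kernel (stale or from another
# system), 1 if it names the given kernel, 2 if no such line is found.
log_is_stale() {
    logfile=$1
    running=${2:-$(uname -r)}
    line=$(sed -n '/Current Operating System:/{p;q;}' "$logfile")
    [ -n "$line" ] || return 2
    case $line in
        *"$running"*) return 1 ;;
        *) return 0 ;;
    esac
}
```

Running `log_is_stale /var/log/Xorg.0.log && echo "log is from another kernel"` on the affected machine would have flagged the Xubuntu log quoted above.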
|
|
|
SlashBeast Retired Dev
Joined: 23 May 2006 Posts: 2922
|
|
|
|
ISHAIM Apprentice
Joined: 08 Oct 2006 Posts: 161 Location: Chicago, IL
|
Posted: Sat Jul 03, 2021 1:58 am Post subject: |
|
|
Hi. I'm marking this [SOLVED] because I've managed to reproduce/solve this across 3 different Linux distributions (Xubuntu, Tails, Gentoo). I suppose this is, for the most part, "solved".
The documentation that helped me debug this is located at:
https://tails.boum.org/support/known_issues/graphics/#amd-radeon-rx-400
https://wiki.gentoo.org/wiki/AMDGPU#Kernel
If you have an AMD Radeon RX 460 Baffin, this applies to you, apparently. You could, in theory, add the workaround to your boot commands, and this appears to be a solution. However, this led me to the question: "What if we just disabled this in-kernel? Isn't this just something that's ENABLED in-kernel that we're DISABLING?"
Sure enough, disabling it in the boot menus of first Tails, then Xubuntu, and finally disabling it in-kernel on Gentoo all worked where things had been broken before.
Here's the breakdown:
Xubuntu/Tails/Gentoo(?) only plays nice with this graphics card when you give the additional "boot command":
On Gentoo, that boot command is unnecessary when you simply set Code: | CONFIG_DRM_AMD_DC=n | in the kernel's .config, because that seems to be a functional, kernel-level equivalent.
Here's that scrot of twm xterm et cetera for good measure: https://ibb.co/CmwfmfY _________________ http://isaiassifuentes.net |
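A quick way to confirm what a kernel .config really says about an option such as CONFIG_DRM_AMD_DC, as a sketch. Note that a disabled option is normally stored as `# CONFIG_FOO is not set` rather than `CONFIG_FOO=n`, so the helper treats both "not set" and "absent" as `n`. `kconfig_state` is a hypothetical helper name, and the default path assumes the sources are at /usr/src/linux.

```shell
#!/bin/sh
# kconfig_state (hypothetical helper): print the value of a kernel config
# symbol (y, m, or a string), or "n" if it is not set or not present.
#   $1  symbol name, e.g. CONFIG_DRM_AMD_DC
#   $2  path to the .config (defaults to /usr/src/linux/.config)
kconfig_state() {
    symbol=$1
    config=${2:-/usr/src/linux/.config}
    # Match only "SYMBOL=..." at the start of a line, so CONFIG_DRM does
    # not also match CONFIG_DRM_AMDGPU.
    value=$(sed -n "s/^${symbol}=//p" "$config")
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'n\n'
    fi
}
```

After the change described in the post, `kconfig_state CONFIG_DRM_AMD_DC` would be expected to print `n` and `kconfig_state CONFIG_DRM_AMDGPU` to print `y` or `m`.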
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Sat Jul 03, 2021 9:04 am Post subject: |
|
|
ISHAIM,
amdgpu.dc=0 passes the module parameter dc=0 to the amdgpu kernel code.
Code: | $ modinfo amdgpu
...
parm: dc:Display Core driver (1 = enable, 0 = disable, -1 = auto (default)) (int)
... |
So auto, which is the default, gets it wrong.
I am unsure if CONFIG_DRM_AMD_DC=n removes the code from the kernel or overrides the setting.
Either way it achieves the same effect. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
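Whether a parameter such as amdgpu.dc was actually passed at boot can be read back from the kernel command line. A minimal sketch, assuming the usual space-separated `key=value` layout of /proc/cmdline; `cmdline_value` is a hypothetical helper name.

```shell
#!/bin/sh
# cmdline_value (hypothetical helper): print the value of the first
# key=value word on the kernel command line that matches the given key.
#   $1  key, e.g. amdgpu.dc
#   $2  file to read (defaults to /proc/cmdline)
# Returns 1 if the key is not present.
cmdline_value() {
    key=$1
    file=${2:-/proc/cmdline}
    for word in $(cat "$file"); do
        case $word in
            "$key"=*)
                printf '%s\n' "${word#"$key"=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

For instance, `cmdline_value amdgpu.dc || echo "not set on the command line"`. If the driver is loaded and exposes the parameter through sysfs, the value in effect may also be readable from /sys/module/amdgpu/parameters/dc; treat that path as an assumption to verify on the running system.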
|
|
|
Goverp Veteran
Joined: 07 Mar 2007 Posts: 1997
|
Posted: Sat Jul 03, 2021 6:05 pm Post subject: |
|
|
It removes the code, which is quite a saving (compilation time and kernel size) given the coding diarrhoea of the AMDGPU developers...
If you remove it, of course, amdgpu.dc=1 doesn't work, but you probably wouldn't have wanted it anyway. _________________ Greybeard |
|