Gentoo Forums
[solved] amdgpu freezes, linux starts
Gentoo Forums Forum Index » Kernel & Hardware
Child_of_Sun_24
Guru


Joined: 28 Jul 2004
Posts: 431

Posted: Tue Feb 06, 2018 10:17 am    Post subject: [solved] amdgpu freezes, linux starts

Hello @all

Since I upgraded from an FX-8350 to a Ryzen 5 1600, I have a problem with the amdgpu driver (the radeon driver doesn't load either, but with it I can't retrieve log files). When amdgpu loads, the screen freezes with a colorful streak across it. Linux still boots, just without any screen output, and I can retrieve the log files through sysresccd.

Here is my PC config:
MB: Gigabyte AX370-Gaming K3
CPU: AMD Ryzen 5 1600
Graphics card: 1x Sapphire Radeon R9 280X
RAM: 2x8 GB 2133 MHz (Corsair)
SSD: 2x256 GB SanDisk (Windows 10 x64, games)
SSD: 64 GB SanDisk (Gentoo, Hyper-V/native)
HDD: 3 TB Toshiba DT01ACA3 (data)
External HDD: 250 GB Samsung (system backups)
Optical: 1x Optiarc DVD burner, 1x LG Blu-ray burner
PSU: Thermaltake TR2 S 700W
Cooling: boxed cooler
Kernel: gentoo-sources-4.15.1 (it happens with 4.14.15, too)

Here is part of the dmesg output:
Code:
[    1.776150] [drm] amdgpu kernel modesetting enabled.
[    1.776163] checking generic (f0000000 7e9000) vs hw (1ff0000000 10000000)
[    1.776265] [drm] initializing kernel modesetting (TAHITI 0x1002:0x6798 0x174B:0x3001 0x00).
[    1.776269] [drm] register mmio base: 0xFE900000
[    1.776269] [drm] register mmio size: 262144
[    1.776284] ATOM BIOS: 113-C3865101-SU2
[    1.776290] [drm] GPU post is not needed
[    1.776291] [drm] Changing default dispclk from 500Mhz to 600Mhz
[    1.776358] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
[    1.776360] amdgpu 0000:07:00.0: SME is active, device will require DMA bounce buffers
[    1.776361] amdgpu 0000:07:00.0: SME is active, device will require DMA bounce buffers
[    1.776390] amdgpu 0000:07:00.0: VRAM: 3072M 0x000000F400000000 - 0x000000F4BFFFFFFF (3072M used)
[    1.776391] amdgpu 0000:07:00.0: GTT: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[    1.776393] [drm] Detected VRAM RAM=3072M, BAR=256M
[    1.776393] [drm] RAM width 384bits GDDR5
[    1.776442] [TTM] Zone  kernel: Available graphics memory: 8170008 kiB
[    1.776443] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    1.776443] [TTM] Initializing pool allocator
[    1.776445] [TTM] Initializing DMA pool allocator
[    1.776459] [drm] amdgpu: 3072M of VRAM memory ready
[    1.776460] [drm] amdgpu: 3072M of GTT memory ready.
[    1.776462] SME is active and system is using DMA bounce buffers
[    1.776463] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    1.778102] amdgpu 0000:07:00.0: PCIE GART of 1024M enabled (table at 0x000000F400040000).
[    1.778153] amdgpu 0000:07:00.0: amdgpu: using MSI.
[    1.778154] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.778154] [drm] Driver supports precise vblank timestamp query.
[    1.778165] [drm] amdgpu: irq initialized.
[    1.778185] [drm] probing gen 2 caps for device 1022:1453 = 737903/e
[    1.778208] [drm] Internal thermal controller with fan control
[    1.778215] [drm] amdgpu: dpm initialized
[    1.778348] [drm] AMDGPU Display Connectors
[    1.778349] [drm] Connector 0:
[    1.778349] [drm]   DP-1
[    1.778349] [drm]   HPD5
[    1.778350] [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[    1.778350] [drm]   Encoders:
[    1.778351] [drm]     DFP1: INTERNAL_UNIPHY2
[    1.778351] [drm] Connector 1:
[    1.778352] [drm]   DP-2
[    1.778352] [drm]   HPD4
[    1.778353] [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[    1.778353] [drm]   Encoders:
[    1.778353] [drm]     DFP2: INTERNAL_UNIPHY2
[    1.778353] [drm] Connector 2:
[    1.778354] [drm]   HDMI-A-1
[    1.778354] [drm]   HPD1
[    1.778355] [drm]   DDC: 0x1958 0x1958 0x1959 0x1959 0x195a 0x195a 0x195b 0x195b
[    1.778355] [drm]   Encoders:
[    1.778355] [drm]     DFP3: INTERNAL_UNIPHY1
[    1.778355] [drm] Connector 3:
[    1.778356] [drm]   DVI-I-1
[    1.778356] [drm]   HPD3
[    1.778357] [drm]   DDC: 0x1960 0x1960 0x1961 0x1961 0x1962 0x1962 0x1963 0x1963
[    1.778357] [drm]   Encoders:
[    1.778357] [drm]     DFP4: INTERNAL_UNIPHY
[    1.778358] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    1.778358] [drm] Connector 4:
[    1.778358] [drm]   DVI-D-1
[    1.778358] [drm]   HPD6
[    1.778359] [drm]   DDC: 0x1954 0x1954 0x1955 0x1955 0x1956 0x1956 0x1957 0x1957
[    1.778359] [drm]   Encoders:
[    1.778360] [drm]     DFP5: INTERNAL_UNIPHY1
[    1.778451] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0x0000000033e7dcf3
[    1.778504] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0x000000008f4c5af0
[    1.778541] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0x000000002d1b6325
[    1.778584] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0x000000006b2e8215
[    1.778624] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0x000000001b3677d1
[    1.778695] [drm] probing gen 2 caps for device 1022:1453 = 737903/e
[    1.778697] [drm] PCIE gen 3 link speeds already enabled
[    1.830344] random: crng init done
[    2.208022] tsc: Refined TSC clocksource calibration: 3200.036 MHz
[    2.208071] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e206bfb320, max_idle_ns: 440795254891 ns
[    2.208810] [drm:gfx_v6_0_ring_test_ring] *ERROR* amdgpu: ring 0 test failed (scratch(0x2140)=0xCAFEDEAD)
[    2.209468] [drm:amdgpu_device_init] *ERROR* hw_init of IP block <gfx_v6_0> failed -22
[    2.209982] amdgpu 0000:07:00.0: amdgpu_init failed


Here is the output that comes a little later:
Code:
[    2.211882] BUG: scheduling while atomic: swapper/0/1/0x00000000
[    2.212259] Modules linked in:
[    2.212261] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.1-gentoo-Cracked #7
[    2.212262] Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming K3/AX370-Gaming K3, BIOS F20 01/31/2018
[    2.212263] Call Trace:
[    2.212268]  dump_stack+0x46/0x65
[    2.212271]  __schedule_bug+0x46/0x60
[    2.212273]  __schedule+0x451/0x580
[    2.212274]  schedule+0x2a/0x80
[    2.212276]  schedule_timeout+0x18a/0x280
[    2.212277]  wait_for_common+0xa6/0x150
[    2.212279]  ? wake_up_q+0x70/0x70
[    2.212280]  kthread_stop+0x38/0x60
[    2.212282]  destroy_workqueue+0x107/0x170
[    2.212284]  ttm_mem_global_release+0x21/0x80
[    2.212286]  drm_global_item_unref+0x44/0x60
[    2.212288]  amdgpu_ttm_fini+0x160/0x1c0
[    2.212289]  amdgpu_bo_fini+0x9/0x30
[    2.212291]  gmc_v6_0_sw_fini+0x29/0x50
[    2.212293]  amdgpu_fini+0x201/0x310
[    2.212294]  amdgpu_device_init+0xdfd/0x13e0
[    2.212297]  ? kernfs_add_one+0xdf/0x130
[    2.212298]  amdgpu_driver_load_kms+0x73/0x1f0
[    2.212300]  drm_dev_register+0x12d/0x1c0
[    2.212301]  amdgpu_pci_probe+0x103/0x140
[    2.212304]  pci_device_probe+0xa1/0x130
[    2.212306]  driver_probe_device+0x24a/0x340
[    2.212308]  ? set_debug_rodata+0xc/0xc
[    2.212309]  __driver_attach+0x88/0x90
[    2.212310]  ? driver_probe_device+0x340/0x340
[    2.212311]  bus_for_each_dev+0x57/0x90
[    2.212312]  bus_add_driver+0x18c/0x210
[    2.212314]  ? ttm_init+0x5b/0x5b
[    2.212315]  driver_register+0x52/0xc0
[    2.212316]  ? ttm_init+0x5b/0x5b
[    2.212318]  do_one_initcall+0x49/0x180
[    2.212320]  kernel_init_freeable+0x157/0x1d9
[    2.212321]  ? rest_init+0xc0/0xc0
[    2.212322]  kernel_init+0x5/0xf0
[    2.212324]  ret_from_fork+0x22/0x40
[    2.212330] [TTM] Zone  kernel: Used memory at exit: 0 kiB
[    2.212331] [TTM] Zone   dma32: Used memory at exit: 0 kiB
[    2.212332] [drm] amdgpu: ttm finalized
[    2.212335] amdgpu 0000:07:00.0: Fatal error during GPU init
[    2.212672] [drm] amdgpu: finishing device.
[    2.212672] [TTM] Memory type 2 has not been initialized
[    2.213166] amdgpu: probe of 0000:07:00.0 failed with error -22


I hope someone can help me with this; if other data is needed, I will provide it.

*EDIT* German thread:
https://forums.gentoo.org/viewtopic-p-8179520.html#8179520


Last edited by Child_of_Sun_24 on Mon Feb 12, 2018 2:11 pm; edited 1 time in total
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1373
Location: KUUSANKOSKI, Finland

Posted: Tue Feb 06, 2018 6:12 pm

Before the hardware switch, did you run emerge -e @world (to be precise, you'd need to emerge binutils, gcc, etc. first, then @world) with the new compiler flags for your new CPU?
If not, you may need to boot your PC from the sysrescuecd, chroot into your Gentoo install and change make.conf accordingly. Finally, recompile all the packages.
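The rebuild order described above can be sketched as a small shell function. This is a minimal sketch, assuming it runs inside the chroot after make.conf has been updated; the package list covers only binutils and gcc, the two packages named in the post (the "etc." is left open), and the helper names `run` and `rebuild_world` are hypothetical. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Hypothetical helpers: rebuild the toolchain first, then everything else.
# Run inside the Gentoo chroot after updating CFLAGS in make.conf.

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_world() {
    # Step 1: binutils and gcc, so the toolchain is rebuilt before anything else.
    run emerge --oneshot sys-devel/binutils sys-devel/gcc || return 1
    # Step 2: every package in @world (-e is short for --emptytree).
    run emerge --emptytree @world
}
```

`DRY_RUN=1 rebuild_world` prints the two emerge commands; plain `rebuild_world` runs them, which takes hours on a full desktop install.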
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...
Child_of_Sun_24
Guru


Joined: 28 Jul 2004
Posts: 431

Posted: Wed Feb 07, 2018 8:32 am

Yes, I ran emerge -e world as the first step after unpacking the stage3, with "-march=native -O2 -pipe -mtune=znver1 -fomit-frame-pointer".
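For reference, the flags from this post as a make.conf fragment (make.conf uses shell-style assignments). A sketch, assuming GCC on x86_64: `-march=native` already implies tuning for the detected CPU, which on a Ryzen 5 1600 with a Zen-aware GCC is znver1, and GCC enables `-fomit-frame-pointer` by default at -O2 on x86_64, so the two extra flags are redundant but harmless.

```shell
# /etc/portage/make.conf fragment (sketch).
# -march=native implies the matching -mtune, and -O2 already omits the
# frame pointer on x86_64, so "-mtune=znver1 -fomit-frame-pointer" is dropped.
CFLAGS="-march=native -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```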

When I compile amdgpu as a module, blacklist it and use simplefb as the framebuffer, the system works just fine, and in the VM (Hyper-V, Windows 10) it works, too. Only amdgpu isn't working (the radeon driver gives me errors too, but for it I can't provide logs).
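The workaround described here (amdgpu built as a module but kept from loading, simplefb driving the firmware framebuffer) might look like this. A sketch only: option names are those of 4.15-era kernels, written as shell-style assignments like the .config itself. The post does not say how the module was blacklisted, so the `modprobe.blacklist=` kernel parameter (read by kmod's modprobe) and the GRUB 2 file are assumptions; an `/etc/modprobe.d` file containing `blacklist amdgpu` is the other common way.

```shell
# Kernel .config fragment (4.15-era option names).
# amdgpu as a module, so it can be kept from loading:
CONFIG_DRM_AMDGPU=m
# Mark the firmware (VESA/EFI) framebuffer as a generic system framebuffer...
CONFIG_X86_SYSFB=y
# ...and let the simplefb driver take it over:
CONFIG_FB_SIMPLE=y

# /etc/default/grub fragment (GRUB 2 assumed): tell modprobe to skip amdgpu.
GRUB_CMDLINE_LINUX="modprobe.blacklist=amdgpu"
```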

*EDIT*
I forgot to mention that I did a fresh installation from a stage3 after the processor/mainboard/RAM were changed.
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1373
Location: KUUSANKOSKI, Finland

Posted: Wed Feb 07, 2018 10:57 am

Although your system should not shut off the display output just because firmware is missing... make sure you have the right firmware for your GPU,
... and also the correct microcode for your CPU.
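One way to act on this is to compare the firmware files the driver asks for with what is actually present. A minimal sketch; the helper name `missing_firmware` is hypothetical. It reads firmware paths on stdin, for example from `modinfo -F firmware amdgpu` (which only works when amdgpu is built as a module), and prints those missing under a directory, `/lib/firmware` by default. When the firmware comes from an initrd, as later in this thread, it is worth pointing it at an unpacked copy of the initrd's firmware directory as well.

```shell
# Print each firmware path read from stdin that does not exist under the
# directory given as $1 (default /lib/firmware). Returns 1 if any is missing.
missing_firmware() {
    fw_dir=${1:-/lib/firmware}
    status=0
    while IFS= read -r fw; do
        # Skip empty lines.
        [ -n "$fw" ] || continue
        if [ ! -e "$fw_dir/$fw" ]; then
            echo "$fw"
            status=1
        fi
    done
    return "$status"
}
```

Usage on the installed system: `modinfo -F firmware amdgpu | grep tahiti | missing_firmware`. No output and exit status 0 means every listed file is present.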
PrSo
Tux's lil' helper


Joined: 01 Jun 2017
Posts: 124

Posted: Wed Feb 07, 2018 1:10 pm

Your graphics card, if I understood correctly, is an R9 280X (TAHITI), which belongs to the Southern Islands family (SI in kernel nomenclature), and support for Southern Islands and Sea Islands (CIK) is _still_ experimental in the amdgpu kernel driver.

Four questions arise:

1. Does this problem also occur with a kernel compiled with _only_ radeon (or, respectively, _only_ amdgpu)? From what you are saying, you have two drivers compiled into the kernel, radeon and amdgpu.

2. Is the driver compiled as a module or built into vmlinuz, and have you included the necessary firmware in the kernel image as described in the wiki?

3. Which amdgpu-related options are enabled in your kernel config?

4. Did you try any kernel version earlier than 4.14 (I mean: did the amdgpu or radeon driver ever work for you)?
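For question 3, these are the options that matter for a Tahiti card under amdgpu, as a sketch using 4.15-era option names and written as shell-style assignments like the .config itself. Without `CONFIG_DRM_AMDGPU_SI`, amdgpu does not claim Southern Islands cards at all; leaving radeon out, as the thread's author does, keeps the two drivers from competing for the card.

```shell
# Kernel .config fragment for an R9 280X (Tahiti, Southern Islands) on amdgpu.
CONFIG_DRM_AMDGPU=m
# Needed for amdgpu to recognize Southern Islands (SI) GPUs at all.
CONFIG_DRM_AMDGPU_SI=y
# radeon left out, so only one driver can claim the card.
# CONFIG_DRM_RADEON is not set
```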

You could use "git bisect" to find the commit that causes the problem and submit a bug report on Bugzilla.
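A bisection of the kind suggested here could start like this. A sketch only: it assumes a clone of the mainline kernel tree in the current directory, and the tags v4.9 (radeon still worked with gentoo-sources-4.9.80, as reported later in the thread) and v4.14 (failing) are illustrative endpoints; `bisect_kernel` and `run` are hypothetical helpers. Each step still needs a manual build, install and boot before the commit is marked good or bad.

```shell
# Print (DRY_RUN=1) or execute a command.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a bisection between a known-good ($1) and a known-bad ($2) revision.
bisect_kernel() {
    good=$1
    bad=$2
    run git bisect start || return 1
    run git bisect bad "$bad" || return 1
    run git bisect good "$good"
    # From here: build, install and boot the checked-out commit, then run
    # "git bisect good" or "git bisect bad"; "git bisect reset" when done.
}
```

Usage: `bisect_kernel v4.9 v4.14`, or `DRY_RUN=1 bisect_kernel v4.9 v4.14` to see the commands first.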

Maybe you could also try a kernel from the AMD devs, which is their testing ground with all the latest patches for SI and CIK included (fetch it with "git clone"):
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip

You could also try removing the graphics card from the slot and reseating it (I know this sounds silly, but I have seen strange behavior caused by an improperly seated GPU in my old rig).
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1373
Location: KUUSANKOSKI, Finland

Posted: Wed Feb 07, 2018 5:02 pm

Also, if you REALLY need to have both radeon and amdgpu, you need to make sure they don't conflict.
I had a multiseat setup where I needed both. I managed to pull it off by writing some custom udev rules to map each driver to its GPU.
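The udev rules mentioned here are not shown. For the simpler case of a single Southern Islands card with both drivers built, a different mechanism exists: the `si_support` module parameters of radeon and amdgpu decide which driver takes SI cards. A sketch, assuming GRUB 2 and a kernel built with `CONFIG_DRM_AMDGPU_SI=y`:

```shell
# /etc/default/grub fragment: hand Southern Islands cards to amdgpu and tell
# radeon to leave them alone.
GRUB_CMDLINE_LINUX="radeon.si_support=0 amdgpu.si_support=1"
```

After editing, the GRUB config has to be regenerated (for example `grub-mkconfig -o /boot/grub/grub.cfg`).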
Child_of_Sun_24
Guru


Joined: 28 Jul 2004
Posts: 431

Posted: Thu Feb 08, 2018 9:00 am

I don't have both radeon and amdgpu in the kernel; when I switch between them, I recompile the kernel without the other driver.

The correct firmware is provided by an initrd; the CPU microcode is compiled into the kernel.

With my old mainboard everything worked, so I think it's a problem with the new mainboard.

With sysresccd-5.1.2 the radeon driver works, but with my own kernel it doesn't.

I will try an older kernel and see if it works.

In the log I provided, the amdgpu driver was compiled into the kernel; at the moment I have it compiled as a module, which gives the same error.

*EDIT*
Kernel .config for amdgpu
https://pastebin.com/8wVAv6Dx

*EDIT2*
With gentoo-sources-4.9.80 the radeon driver works :-)

*EDIT3*
Found the error ( https://wiki.gentoo.org/wiki/AMDGPU#Kernel_2 ): AMD Secure Memory Encryption (SME) was enabled by default, and deactivating it solves the problem :-) To be safe, I also set the kernel command line option mem_encrypt=off :-)
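The fix described here, as a sketch (option and parameter names as in 4.15-era x86 kernels; GRUB 2 is an assumption, the post does not name the bootloader). Either part works on its own: with the "active by default" option off, SME stays off unless `mem_encrypt=on` is passed, and `mem_encrypt=off` disables it regardless of the built-in default.

```shell
# Kernel .config fragment: keep SME support built in, but off by default.
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set

# /etc/default/grub fragment: force SME off regardless of the default.
GRUB_CMDLINE_LINUX="mem_encrypt=off"
```

After regenerating the GRUB config (for example `grub-mkconfig -o /boot/grub/grub.cfg`) and rebooting, `dmesg | grep 'SME is active'` should print nothing; in the first log of this thread that message shows up during amdgpu initialization, before the ring 0 test fails.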
PrSo
Tux's lil' helper


Joined: 01 Jun 2017
Posts: 124

Posted: Thu Feb 08, 2018 9:55 am

Child_of_Sun_24 wrote:
I don't have both radeon and amdgpu in the kernel; when I switch between them, I recompile the kernel without the other driver.

OK.

Quote:
The correct firmware is provided by an initrd; the CPU microcode is compiled into the kernel.

Good.

Quote:
With my old mainboard everything worked, so I think it's a problem with the new mainboard.

With sysresccd-5.1.2 the radeon driver works, but with my own kernel it doesn't.

I don't think this is a hardware issue, since the radeon driver from SysRescueCD works on the new setup, but I could be wrong.

You could check whether you have everything needed enabled in your kernel .config, starting from:

Code:
lspci -nnk


IMHO the best way to check whether you have enabled everything the kernel drivers support for your hardware is https://cateee.net/lkddb/.

You could search the latest database device by device, using the 8 hex digits in brackets from the lspci output, e.g. [1234:5678].
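Pulling those IDs out of `lspci -nn` output can be done with a short filter. A minimal sketch; `pci_ids` is a hypothetical name, and it assumes lspci prints each vendor:device pair as a bracketed pair of four-digit lowercase hex numbers, which is what `-nn` adds.

```shell
# Read `lspci -nn` output on stdin and print each vendor:device ID
# (without brackets) once, ready to paste into the lkddb search.
pci_ids() {
    grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | sed -e 's/^\[//' -e 's/\]$//' \
        | sort -u
}
```

Usage: `lspci -nn | pci_ids`. For the card in this thread, the dmesg output above shows 0x1002:0x6798, so 1002:6798 should be among the printed IDs.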

I know that changing hardware with Gentoo can be a PITA.

EDIT:
I'm glad to hear that.
All times are GMT