Gentoo Forums
amdgpu errors
Forum: Kernel & Hardware
FilthyPitDog
Apprentice
Joined: 12 Jan 2021
Posts: 186
Location: South Pacific

Posted: Mon Feb 26, 2024 7:51 pm    Post subject: amdgpu errors

After a random amount of time, my screen suddenly turns off and I cannot wake it with the keyboard or mouse. This happens both on the 6.6.16 stable binary kernel and on a custom 6.7.6 gentoo-sources kernel.

GPU: AMD Radeon RX 7900 XTX
dmesg: https://bpa.st/WKLJU

Code:
[12668.779682] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x50ffe0 flags=0x0000]
[12668.779691] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x40 flags=0x0000]
[12668.779726] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779733] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12668.779735] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12668.779737] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12668.779738] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12668.779740] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12668.779741] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12668.779742] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12668.779743] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12668.779748] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779750] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x00000000003ef000 from client 10
[12668.779752] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12668.779753] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12668.779755] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12668.779756] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12668.779757] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12668.779758] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12668.779759] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12668.779763] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779765] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12668.779767] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12668.779768] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12668.779769] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12668.779770] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12668.779771] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12668.779772] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12668.779774] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12668.779777] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779779] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x00000000003ef000 from client 10
[12668.779781] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12668.779782] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12668.779783] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12668.779785] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12668.779786] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12668.779787] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12668.779788] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12668.779792] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779794] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12668.779796] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12668.779797] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12668.779798] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12668.779800] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12668.779801] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12668.779802] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12668.779803] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12668.779887] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12668.779889] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12668.779890] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0000093B
[12668.779892] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CPF (0x4)
[12668.779893] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x1
[12668.779894] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x5
[12668.779895] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[12668.779896] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x1
[12668.779897] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12678.822066] [drm:amdgpu_cgs_destroy_device [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5925407, emitted seq=5925409
[12678.822113] [drm:amdgpu_cgs_destroy_device [amdgpu]] *ERROR* Process information: process gnome-shell pid 3592 thread gnome-shel:cs0 pid 3625
[12678.822139] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[12679.825408] amdgpu 0000:03:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[12680.838750] [drm:amdgpu_sdma_ras_sw_init [amdgpu]] *ERROR* amdgpu: IB test timed out
[12680.838812] amdgpu 0000:03:00.0: amdgpu: IP block:sdma_v6_0 is hung!
[12681.035407] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12681.035410] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000012d00000 from client 10
[12681.035411] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000D3B
[12681.035412] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CPG (0x6)
[12681.035413] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x1
[12681.035414] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x5
[12681.035415] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[12681.035415] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x1
[12681.035416] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12681.035420] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12681.035421] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12681.035422] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12681.035423] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12681.035424] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12681.035424] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12681.035425] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12681.035426] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12681.035426] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12681.035430] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12681.035431] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000012d00000 from client 10
[12681.035432] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12681.035433] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12681.035434] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12681.035434] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12681.035435] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12681.035436] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12681.035436] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12681.035440] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[12681.035441] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[12681.035442] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[12681.035443] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[12681.035444] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x0
[12681.035444] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[12681.035445] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[12681.035446] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[12681.035446] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[12681.232423] Failed to wait CP_VMID_RESET to 0
[12681.232431] amdgpu 0000:03:00.0: amdgpu: soft reset failed, will fallback to full reset!
[12684.600599] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
[12684.600601] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[12688.164262] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
[12688.164264] amdgpu 0000:03:00.0: amdgpu: [SetDfCstate] failed!
[12688.164265] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate
[12704.641406] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12704.641449] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12704.754581] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12704.754607] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12704.867743] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12704.867767] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12704.980748] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12704.980772] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.093763] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12705.093787] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.206766] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12705.206790] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.319740] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12705.319764] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.432716] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12705.432740] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.545686] [drm:amdgpu_mes_init_microcode [amdgpu]] *ERROR* MES failed to response msg=3
[12705.545710] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[12705.742290] [drm:gfx_v9_4_2_set_power_brake_sequence [amdgpu]] *ERROR* failed to halt cp gfx
[12707.782554] [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x0)
[12707.782572] [drm:psp_xgmi_get_topology_info [amdgpu]] *ERROR* Failed to terminate tmr
[12707.782610] [drm:amdgpu_file_to_fpriv [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
[12707.783767] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
[12707.783769] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
[12707.783818] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
[12711.134368] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
[12711.134373] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
[12711.134374] amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
[12715.348521] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[12715.348761] [drm] PCIE GART of 512M enabled (table at 0x0000008000F00000).
[12715.348837] [drm] VRAM is lost due to GPU reset!
[12715.348839] [drm] PSP is resuming...
[12715.541315] [drm:psp_ras_initialize [amdgpu]] *ERROR* PSP create ring failed!
[12715.541356] [drm:psp_ras_initialize [amdgpu]] *ERROR* PSP resume failed
[12715.541382] [drm:amdgpu_file_to_fpriv [amdgpu]] *ERROR* resume of IP block <psp> failed -62
[12715.541429] [drm] Skip scheduling IBs!


Lots of amdgpu errors; the only thing I can do is SSH in and shut down...
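The GCVM_L2_PROTECTION_FAULT_STATUS value above packs the fields amdgpu prints underneath it. A minimal Python sketch to decode it, assuming a bit layout inferred from the two non-zero values in this log (0x0000093B and 0x00000D3B) rather than taken from AMD documentation: MORE_FAULTS in bit 0, WALKER_ERROR in bits 1-3, PERMISSION_FAULTS in bits 4-7, MAPPING_ERROR in bit 8 and the UTCL2 client ID starting at bit 9 (its 9-bit width is an assumption). RW and any higher fields are left out because their positions cannot be checked against this log, and the hypothetical `UTCL2_CLIENTS` table only lists the three client IDs that appear here.

```python
# Hypothetical helper: decode the GCVM_L2_PROTECTION_FAULT_STATUS value that
# amdgpu prints. Bit positions are ASSUMED, inferred from the two non-zero
# values in this dmesg (0x93B and 0xD3B), not from AMD documentation.

# Client IDs seen in this log only; the driver's real table is larger.
UTCL2_CLIENTS = {0x0: "CB/DB", 0x4: "CPF", 0x6: "CPG"}


def decode_fault_status(status: int) -> dict:
    """Split the status value into the fields amdgpu logs below it."""
    cid = (status >> 9) & 0x1FF  # assumed: client ID starts at bit 9, 9 bits wide
    return {
        "MORE_FAULTS": status & 0x1,               # bit 0
        "WALKER_ERROR": (status >> 1) & 0x7,       # bits 1-3
        "PERMISSION_FAULTS": (status >> 4) & 0xF,  # bits 4-7
        "MAPPING_ERROR": (status >> 8) & 0x1,      # bit 8
        "CID": cid,
        "CLIENT": UTCL2_CLIENTS.get(cid, "unknown"),
    }


if __name__ == "__main__":
    # The two non-zero status values from the log above.
    for value in (0x0000093B, 0x00000D3B):
        fields = decode_fault_status(value)
        print(f"0x{value:08X}: " + ", ".join(f"{k}={v}" for k, v in fields.items()))
```

Both values decode to the same MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS and MAPPING_ERROR values that the log prints; only the faulting client differs (CPF vs. CPG).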
_________________
Gentoo is a way of life...
FilthyPitDog
Posted: Tue Feb 27, 2024 3:26 am

It doesn't happen on other distributions: I left my screen on for 14+ days with no issues. On Gentoo I'm lucky if it lasts 24 hours before the screen turns off and amdgpu fails spectacularly...

Is this a bug?
FilthyPitDog
Posted: Tue Feb 27, 2024 10:31 am

It seems to be an issue with Wayland: https://gitlab.freedesktop.org/drm/amd/-/issues/2496
CooSee
Veteran
Joined: 20 Nov 2004
Posts: 1450
Location: Earth

Posted: Tue Feb 27, 2024 6:35 pm

Code:
BIOS 1809 09/28/2023

You should update your BIOS; please follow the manual regarding BIOS upgrades!

https://www.asus.com/motherboards-components/motherboards/tuf-gaming/tuf-gaming-x670e-plus-wifi/helpdesk_bios?model2Name=TUF-GAMING-X670E-PLUS-WIFI

Quote:
It doesn't happen on other distributions

Many other distros apply lots of patches from here and there, so it's hard to tell what the difference is.

Good luck.

8)
_________________
" Reality is an illusion that arises from a lack of honest communication "
---
" Man is curious by nature; what remains in the end is greed "
logrusx
Veteran
Joined: 22 Feb 2018
Posts: 1585

Posted: Tue Feb 27, 2024 7:01 pm

CooSee wrote:
Code:
BIOS 1809 09/28/2023

You should update your BIOS; please follow the manual regarding BIOS upgrades!

https://www.asus.com/motherboards-components/motherboards/tuf-gaming/tuf-gaming-x670e-plus-wifi/helpdesk_bios?model2Name=TUF-GAMING-X670E-PLUS-WIFI

Quote:
It doesn't happen on other distributions

Many other distros apply lots of patches from here and there, so it's hard to tell what the difference is.

Good luck.

8)


It's highly unlikely the BIOS has anything to do with that.

Best Regards,
Georgi
CooSee
Posted: Tue Feb 27, 2024 7:24 pm

Quote:
It's highly unlikely the BIOS has anything to do with that.

You may be right, but not every board of the same model behaves the same.

Instead of searching for other solutions and wasting time, I would start with the BIOS.

8)
pietinger
Moderator
Joined: 17 Oct 2006
Posts: 4253
Location: Bavaria

Posted: Tue Feb 27, 2024 7:26 pm

logrusx wrote:
It's highly unlikely the BIOS has anything to do with that.

Are you sure?

There have been four BIOS updates since the OP's version; two of them list "system stability" as the reason.

A BIOS update is usually useful and worth trying. Even if it doesn't fix this bug, the latest version (2413) disables STAPM on AM5 ... and the OP has an AMD Ryzen 9 7900X3D (see his dmesg output) ... ;-)

( https://www.anandtech.com/show/21252/amd-set-to-fix-ryzen-8000g-apu-stapm-issue-sustained-loads-affected )
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
logrusx
Posted: Tue Feb 27, 2024 7:35 pm

pietinger wrote:
logrusx wrote:
It's highly unlikely the BIOS has anything to do with that.

Are you sure?


Yes. The OP said it worked for days with another distribution. As an example of why one should be wary of BIOS updates: any BIOS update for my laptop ruins S3 sleep.

My advice: do not update the BIOS (actually the EFI firmware) without an apparent reason.

P.S. Another, more remotely related example: anyone who updated their Samsung S20 to Android 13 can no longer install anything other than whatever Samsung decides. I think Samsung has already released what will most probably be the last firmware update for that model, although you could use it for years to come.

Best Regards,
Georgi
pietinger
Posted: Tue Feb 27, 2024 8:37 pm

logrusx wrote:
Yes. The OP said it worked for days with another distribution. [...]

... but we don't know whether he does the same things there as on Gentoo. For example, here with the Gentoo kernel he loads an out-of-tree module for Xbox devices:
Code:
[    5.093960] xone_gip: loading out-of-tree module taints kernel.
[    5.093963] xone_gip: module verification failed: signature and/or required key missing - tainting kernel

Maybe he doesn't use it on other distributions, and maybe this module has specific problems with some BIOS bugs. Unlikely? Yes, maybe. But a BIOS update is worth a try.
logrusx
Posted: Tue Feb 27, 2024 9:14 pm

This is shooting in the dark and may just as well turn out to be shooting yourself in the foot.

That's all I have to say about that and I'm going to end it there.

Best Regards,
Georgi
mpagano
Developer
Joined: 27 Apr 2004
Posts: 197
Location: USA

Posted: Tue Feb 27, 2024 11:40 pm

FilthyPitDog wrote:
It doesn't happen on other distributions: I left my screen on for 14+ days with no issues. On Gentoo I'm lucky if it lasts 24 hours before the screen turns off and amdgpu fails spectacularly...

Is this a bug?


You could take the kernel config from your working distribution and use it to build a new gentoo-sources kernel.
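Before building, it can help to see exactly which options differ between the working distribution's config and the Gentoo one. The kernel source tree already ships `scripts/diffconfig` for this; the Python sketch below only illustrates the same idea under simple assumptions: it understands `CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments, ignores everything else, and the function names are hypothetical.

```python
import re
import sys

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := SET_RE.match(line):
            options[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            options[m.group(1)] = "n"
    return options


def diff_configs(old: dict, new: dict) -> list:
    """Return (symbol, old_value, new_value) for every symbol that differs.
    A symbol missing from one file is reported as None on that side."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes.append((name, old.get(name), new.get(name)))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diffconfig_sketch.py <working-distro-config> <gentoo-config>
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old_opts, new_opts = parse_config(f_old.read()), parse_config(f_new.read())
    for name, a, b in diff_configs(old_opts, new_opts):
        print(f"{name}: {a} -> {b}")
```

A distribution's running config is usually available as /proc/config.gz (when CONFIG_IKCONFIG_PROC is enabled) or as /boot/config-<version>. After copying it into the gentoo-sources tree as .config, `make olddefconfig` sets any options the newer kernel added to their defaults.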
FilthyPitDog
Posted: Wed Feb 28, 2024 12:37 am

I have been using X11 instead of Wayland, but today I got the following, at this uptime:

Code:
➜ uptime -p
up 14 hours, 8 minutes


dmesg:

Code:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered


It forced me to log out, but at least it recovered and my screen didn't turn off. I may try a motherboard BIOS update.
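The two ring timeouts in this thread ended differently: the first one escalated into a failed GPU reset, while this one was soft-recovered. A minimal Python sketch that scans saved dmesg text for amdgpu ring timeouts and reports whether each was soft-recovered and, when logged, how many fences were still outstanding (emitted seq minus signaled seq). It assumes only the two message shapes quoted in this thread; other kernel versions may word them differently, and the names used are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import Optional

# Matches the two shapes of amdgpu ring-timeout messages quoted in this thread:
#   "... ring gfx_0.0.0 timeout, signaled seq=5925407, emitted seq=5925409"
#   "... ring gfx_0.0.0 timeout, but soft recovered"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout"
    r"(?:, (?P<soft>but soft recovered)"
    r"|, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+))?"
)


@dataclass
class RingTimeout:
    ring: str
    soft_recovered: bool
    pending: Optional[int]  # fences emitted but not yet signaled, if logged


def find_timeouts(dmesg_text: str) -> list:
    """Return one RingTimeout per matching dmesg line, in order."""
    results = []
    for line in dmesg_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if not m:
            continue
        pending = None
        if m.group("signaled") is not None:
            pending = int(m.group("emitted")) - int(m.group("signaled"))
        results.append(RingTimeout(m.group("ring"), m.group("soft") is not None, pending))
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: dmesg > dmesg.txt; python ring_timeouts.py dmesg.txt
    with open(sys.argv[1]) as f:
        for t in find_timeouts(f.read()):
            state = "soft recovered" if t.soft_recovered else "not soft recovered"
            print(f"{t.ring}: {state}, pending fences: {t.pending}")
```

For the first post's log, the gfx_0.0.0 timeout had two fences outstanding (5925409 - 5925407) and was followed by "GPU reset begin!".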
Anon-E-moose
Watchman
Joined: 23 May 2008
Posts: 6103
Location: Dallas area

Posted: Wed Feb 28, 2024 1:01 am

Is this running in a VM?

The reason I ask is that the log starts with "AMD-Vi: Event logged [IO_PAGE_FAULT".
Do you have any devices you've marked for vfio-pci?

Anyway, looking at the dmesg output, I see a lot of "can't claim BAR" messages, which are suspect.

Edit to add: I don't think your problems are BIOS-related; it might be a kernel misconfiguration, since other distros work.
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland
FilthyPitDog
Posted: Wed Feb 28, 2024 1:58 am

Anon-E-moose wrote:
Is this running in a VM?

The reason I ask is that the log starts with "AMD-Vi: Event logged [IO_PAGE_FAULT".
Do you have any devices you've marked for vfio-pci?

Anyway, looking at the dmesg output, I see a lot of "can't claim BAR" messages, which are suspect.

Edit to add: I don't think your problems are BIOS-related; it might be a kernel misconfiguration, since other distros work.


No, this is not a VM; it's on bare metal.

This is what I have in my kernel config for vfio-pci:

Code:
# VFIO support for PCI devices
CONFIG_VFIO_PCI_CORE=m
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI=m
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_IGD=y
CONFIG_PDS_VFIO_PCI=m
# end of VFIO support for PCI devices
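Having the VFIO options enabled in the kernel config does not by itself give the GPU to vfio-pci; the driver typically claims a device only when its vendor:device ID is passed through the vfio-pci `ids` parameter (on the kernel command line as `vfio-pci.ids=`, or via modprobe options). A minimal Python sketch that checks a kernel command line for that parameter. It assumes whitespace-separated parameters, treats `-` and `_` in the module name as equivalent (as the kernel does for parameter names), does not look at modprobe.d, and `vfio_pci_ids` is a hypothetical name.

```python
import os


def vfio_pci_ids(cmdline: str) -> list:
    """Return the vendor:device IDs handed to vfio-pci on a kernel command line."""
    ids = []
    # Assumes no quoted values containing spaces.
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # vfio-pci.ids and vfio_pci.ids are accepted interchangeably.
        if sep and key.replace("_", "-") == "vfio-pci.ids":
            ids.extend(v for v in value.split(",") if v)
    return ids


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        found = vfio_pci_ids(f.read())
    print("vfio-pci.ids:", ", ".join(found) if found else "none")
```

To see which driver actually owns the card, the `driver` symlink under /sys/bus/pci/devices/0000:03:00.0/ can be inspected (for example with `readlink`); the dmesg in the first post already shows amdgpu handling 0000:03:00.0.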

Anon-E-moose
Posted: Wed Feb 28, 2024 10:10 am

On those systems where the card works, what's the kernel version and what's the mesa version?
FilthyPitDog
Posted: Wed Feb 28, 2024 9:21 pm

Anon-E-moose wrote:
On those systems where the card works, what's the kernel version and what's the mesa version?


I'm not sure. Out of frustration I went to Manjaro, but then came back to Gentoo, and here I am now. I don't know what versions they were...
All times are GMT