Gentoo Forums
AMDGPU driver failing on at least one system
RayDude
Advocate


Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Thu Jun 08, 2023 3:40 pm    Post subject: AMDGPU driver failing on at least one system

I'm running the AMDGPU driver on my laptop (APU: 6800H, RDNA2), my server (GPU: RX 6650 XT), and another old laptop that serves as a web-browsing box for the old Samsung TV, whose apps are no longer supported.

I play Hogwarts Legacy on the media server, and once in a while it crashes and drops out of X. I figured that was a game bug, but this morning I noticed that my laptop, which had been up for nine days, had crashed out to the login screen. I looked in /var/log/messages and found this:

Code:
Jun  8 07:09:40 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=444930, emitted seq=444932
Jun  8 07:09:40 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun  8 07:09:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: MODE2 reset
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun  8 07:09:41 lenny kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Jun  8 07:09:41 lenny kernel: [drm] PSP is resuming...
Jun  8 07:09:41 lenny kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
Jun  8 07:09:41 lenny kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jun  8 07:09:41 lenny kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun  8 07:09:41 lenny kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jun  8 07:09:41 lenny kernel: [drm] JPEG decode initialized successfully.
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset(1) succeeded!
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103002000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103003000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103033000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103034000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103031000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103032000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103065000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103001000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103000000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103062000 from client 0x1b (UTCL2)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:53 lenny kernel: gmc_v10_0_process_interrupt: 1930 callbacks suppressed
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103009000 from client 0x1b (UTCL2)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: TCP (0x8)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x1
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103008000 from client 0x1b (UTCL2)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103009000 from client 0x1b (UTCL2)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:6 pasid:32769, for process Xorg pid 2131 thread Xorg:cs0 pid 2488)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800103008000 from client 0x1b (UTCL2)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Jun  8 07:09:53 lenny kernel: amdgpu 0000:35:00.0: amdgpu: \x09 RW: 0x0
Jun  8 07:09:53 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
Jun  8 07:10:03 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=23929877, emitted seq=23929880
Jun  8 07:10:03 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2131 thread Xorg:cs0 pid 2488
Jun  8 07:10:03 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
Jun  8 07:10:03 lenny kernel: amdgpu 0000:35:00.0: amdgpu: MODE2 reset
Jun  8 07:10:03 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun  8 07:10:03 lenny kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Jun  8 07:10:03 lenny kernel: [drm] PSP is resuming...
Jun  8 07:10:03 lenny kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
Jun  8 07:10:04 lenny kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jun  8 07:10:04 lenny kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun  8 07:10:04 lenny kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jun  8 07:10:04 lenny kernel: [drm] JPEG decode initialized successfully.
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
Jun  8 07:10:04 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset(4) succeeded!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm] Skip scheduling IBs!
Jun  8 07:10:04 lenny kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!


Also, on both systems, even though the video reinitializes, audio is dead with no devices found. Restarting ALSA doesn't help. I'm running PipeWire and I wonder if there's a way to restart its server to bring things back.

The video crash feels like a driver bug. The next time the desktop PC crashes, I'll check its /var/log/messages and see if the same thing is reported there.
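
For comparing the two machines, a small script can condense a log like the one above into a few counts. The following is a minimal Python sketch, not a tool from the amdgpu project: the regular expressions are written against the message formats shown in this post, and the bit layout in `decode_fault_status` is an assumption inferred from the field values the driver prints next to each status word (for example, 0x00640051 is printed with PERMISSION_FAULTS 0x5 and RW 0x1), so it may not hold for other GPU generations.

```python
import re
from collections import Counter

# Patterns written against the message formats quoted in this thread.
TIMEOUT_RE = re.compile(r"\*ERROR\* ring (\S+) timeout")
RESET_RE = re.compile(r"GPU reset\(\d+\) succeeded")
STATUS_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:(0x[0-9A-Fa-f]+)")


def decode_fault_status(value):
    """Split a fault status word into named fields.

    The bit positions are an assumption inferred from the values printed
    in the log above, not taken from hardware documentation.
    """
    return {
        "MORE_FAULTS": value & 0x1,
        "WALKER_ERROR": (value >> 1) & 0x7,
        "PERMISSION_FAULTS": (value >> 4) & 0xF,
        "MAPPING_ERROR": (value >> 8) & 0x1,
        "CID": (value >> 9) & 0x1FF,
        "RW": (value >> 18) & 0x1,
    }


def summarize(lines):
    """Count ring timeouts per ring, completed resets, and status words."""
    timeouts = Counter()
    statuses = Counter()
    resets = 0
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
        if RESET_RE.search(line):
            resets += 1
        match = STATUS_RE.search(line)
        if match:
            statuses[int(match.group(1), 16)] += 1
    return {
        "timeouts": dict(timeouts),
        "resets": resets,
        "statuses": dict(statuses),
    }


if __name__ == "__main__":
    # A few lines copied from the log above, used as sample input.
    sample = [
        "Jun  8 07:09:40 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] "
        "*ERROR* ring sdma0 timeout, signaled seq=444930, emitted seq=444932",
        "Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: "
        "GPU reset(1) succeeded!",
        "Jun  8 07:09:41 lenny kernel: amdgpu 0000:35:00.0: amdgpu: "
        "GCVM_L2_PROTECTION_FAULT_STATUS:0x00640051",
    ]
    print(summarize(sample))
    print(decode_fault_status(0x00640051))
```

On a real system the lines could come from `open("/var/log/messages", errors="replace")`, which yields one line at a time. Running the same script on both machines would show whether the same rings time out on each.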

Can someone confirm this issue?

Can someone suggest how I report this back to the AMDGPU devs?

Thanks in advance.
_________________
Some day there will only be free software.
Goverp
Advocate


Joined: 07 Mar 2007
Posts: 2004

Posted: Thu Jun 08, 2023 5:01 pm    Post subject: Re: AMDGPU driver failing on at least one system

RayDude wrote:
...
Can someone suggest how I report this back to the AMDGPU devs?
...

Look at the MAINTAINERS file in /usr/src/linux for the AMDGPU entry... AFAIR it also contains a pointer to the appropriate bugzilla, which might contain some entries for this problem. What's likely to be a problem is finding a way to reliably duplicate it, as it's sporadic. They'd prefer a nicely bisected bad kernel git commit!
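
Pulling that entry out of MAINTAINERS can be scripted. What follows is a minimal Python sketch, assuming the file's usual layout: an unindented title line followed by tagged lines such as `L:` for the mailing list and `B:` for where to file bugs, with blank lines between entries. The function name and the sample entry are hypothetical, and the addresses in the sample are placeholders, not the real AMDGPU contacts.

```python
def find_maintainer_entries(text, keyword):
    """Return {title: {tag: [values]}} for entries whose title has keyword."""
    entries = {}
    title = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            title = None
        elif len(line) > 2 and line[0].isupper() and line[1] == ":":
            # Tagged line such as "L:" or "B:"; ignored outside an entry.
            if title is not None:
                entries[title].setdefault(line[0], []).append(line[2:].strip())
        elif not line[0].isspace():
            # Unindented, untagged line: treat it as an entry title.
            title = line.strip()
            entries[title] = {}
    wanted = keyword.lower()
    return {
        name: tags
        for name, tags in entries.items()
        if tags and wanted in name.lower()
    }


# Made-up excerpt in the MAINTAINERS layout; all values are placeholders.
SAMPLE = """\
HYPOTHETICAL AMDGPU ENTRY
M:\tExample Maintainer <maintainer@example.org>
L:\tgfx-list@example.org
B:\thttps://bugs.example.org/gpu
F:\tdrivers/gpu/drm/amd/

SOME OTHER DRIVER
M:\tSomeone Else <someone@example.org>
"""

if __name__ == "__main__":
    for name, tags in find_maintainer_entries(SAMPLE, "amdgpu").items():
        print(name)
        print("  list:", tags.get("L", []))
        print("  bugs:", tags.get("B", []))
```

Against a real tree, the text would come from reading /usr/src/linux/MAINTAINERS instead of `SAMPLE`.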
_________________
Greybeard
RayDude
Advocate


Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Thu Jun 08, 2023 9:24 pm    Post subject: Re: AMDGPU driver failing on at least one system

Goverp wrote:
RayDude wrote:
...
Can someone suggest how I report this back to the AMDGPU devs?
...

Look at the MAINTAINERS file in /usr/src/linux for the AMDGPU entry... AFAIR it also contains a pointer to the appropriate bugzilla, which might contain some entries for this problem. What's likely to be a problem is finding a way to reliably duplicate it, as it's sporadic. They'd prefer a nicely bisected bad kernel git commit!


Thanks. I'll see what I can find.
_________________
Some day there will only be free software.
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 722
Location: Winnipeg, Canada

Posted: Fri Jun 09, 2023 9:16 pm

There is an outstanding bug that reappeared with 6.1+ Linux kernels on some models of AMD graphics cards that rely on the amdgpu driver; it causes the GPU driver to generate a kernel panic. The bug results in an amdgpu driver initialization ring timeout.

I have two RX 5500 cards in different builds and neither system is functionally reliable with a 6.1 kernel. My only reliable resolution was forcing a downgrade to the 5.15 LTS kernel and linux-headers. Sometimes the kernel panic is so subtle that it's not greatly disruptive to system stability, but the amdgpu driver will ultimately be entirely unusable as a result.

here's the link to the kernel bugzilla thread

I added this to package.mask then recompiled the entire system build because major linux-headers changes are always a major build time consistency concern.

Code:
>=sys-kernel/gentoo-kernel-bin-6.1
>=virtual/dist-kernel-6.1
>=sys-kernel/linux-headers-6.1


Entirely eliminated the amdgpu driver ring timeouts on two systems with the same model gpu.
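
As a rough illustration of what those three `>=` entries do, the sketch below checks whether a given package version falls under a mask line. It is a simplified model resting on stated assumptions, not Portage's matching logic: it only understands the `>=` operator and plain dotted numeric versions (no `-r1` revisions, `_rc` suffixes, or slots), and `is_masked` is a hypothetical helper name.

```python
import re

# ">=" followed by category/package, then "-" and a dotted numeric version.
ATOM_RE = re.compile(r"^>=([\w+.-]+/[\w+.-]+?)-(\d+(?:\.\d+)*)$")


def parse_version(text):
    """Turn '6.1.20' into (6, 1, 20) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_masked(mask_line, package, version):
    """Return True if package-version is covered by a '>=' mask line."""
    match = ATOM_RE.match(mask_line.strip())
    if match is None:
        raise ValueError(f"unsupported mask line: {mask_line!r}")
    masked_package, minimum = match.groups()
    if package != masked_package:
        return False
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    line = ">=sys-kernel/gentoo-kernel-bin-6.1"
    for candidate in ("5.15.110", "6.1.20", "6.3.7"):
        masked = is_masked(line, "sys-kernel/gentoo-kernel-bin", candidate)
        print(candidate, "masked" if masked else "allowed")
```

Under this model every 6.1 and later version of the three packages is hidden from the resolver, which leaves the 5.15 series as the newest candidate.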
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!
RayDude
Advocate


Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Sun Jun 11, 2023 7:10 am

ali3nx wrote:
There is an outstanding bug that reappeared with 6.1+ Linux kernels on some models of AMD graphics cards that rely on the amdgpu driver; it causes the GPU driver to generate a kernel panic. The bug results in an amdgpu driver initialization ring timeout.

I have two RX 5500 cards in different builds and neither system is functionally reliable with a 6.1 kernel. My only reliable resolution was forcing a downgrade to the 5.15 LTS kernel and linux-headers. Sometimes the kernel panic is so subtle that it's not greatly disruptive to system stability, but the amdgpu driver will ultimately be entirely unusable as a result.

here's the link to the kernel bugzilla thread

I added this to package.mask then recompiled the entire system build because major linux-headers changes are always a major build time consistency concern.

Code:
>=sys-kernel/gentoo-kernel-bin-6.1
>=virtual/dist-kernel-6.1
>=sys-kernel/linux-headers-6.1


Entirely eliminated the amdgpu driver ring timeouts on two systems with the same model gpu.


Have you tried a 6.3 kernel?
_________________
Some day there will only be free software.
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 722
Location: Winnipeg, Canada

Posted: Sun Jun 11, 2023 9:03 am

RayDude wrote:

Have you tried a 6.3 kernel?


No, I haven't. I rely on one of the systems for production stability and the other system doesn't see a lot of use. Testing either one with a newer kernel is a significant amount of effort just to validate a kernel bug with a consistent system build, considering others have reported that the bug is reproducible on kernels up to 6.4.

I'm not eager to ruin a production reliable system build by recompiling everything with unstable toolchain packages.

The newer kernel is more likely to be unreliable and so far has been. The driver bug started manifesting for me with 6.x and wasn't at all a concern or symptom with 5.15 so I rolled back to the previous LTS.
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!
nvaert1986
Tux's lil' helper


Joined: 05 May 2019
Posts: 120

Posted: Fri Jun 16, 2023 9:27 am

I had a very similar issue with the 5.19.x and 6.0 kernels (the exact same symptoms, also sporadic, though the error message was slightly different). The issue disappeared for me with the 6.1 kernel and has stayed away up to 6.1.20 (that's the kernel I'm currently using on the system). You could try the 6.1.20 kernel to see if it resolves the issue. I'll be upgrading to the latest 6.1.x kernel this weekend and I'll let you know if anything unusual happens. I'm using an AMD Radeon RX 6750 XT on an Intel Core i9-12900K, if that matters.
RayDude
Advocate


Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Wed Jun 21, 2023 10:37 pm

I upgraded to 6.3.7 and the problem has gotten worse.

I think I may have a clue as to what is sending it over the edge.

I'm running a program called Vivado that eats RAM for lunch. Every time I run it, RAM usage climbs from roughly 25% to around 75%, and when it's done, usage drops back to 25%.

After that, twice today, I had failures. One was a video reset and the other was a hard lockup of the laptop.
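
One way to test that suspicion is to log memory use with timestamps and line it up against the amdgpu messages. This is a minimal Python sketch, assuming a Linux /proc/meminfo that provides `MemTotal` and `MemAvailable`; the function names and the sample numbers are made up for illustration.

```python
import time


def used_percent(meminfo_text):
    """Compute used memory as a percentage from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            # The first token after the colon is the numeric value (kB).
            fields[name.strip()] = int(rest.split()[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def log_once(path="/proc/meminfo"):
    """Return one timestamped line describing current memory use."""
    with open(path) as handle:
        percent = used_percent(handle.read())
    return f"{time.strftime('%b %d %H:%M:%S')} mem used {percent:.1f}%"


if __name__ == "__main__":
    # Made-up numbers: 16 GB total with 4 GB still available.
    sample = "MemTotal:       16000000 kB\nMemAvailable:    4000000 kB\n"
    print(f"{used_percent(sample):.1f}% used in the sample")
```

Calling `log_once()` every few seconds while the memory-heavy job runs, and appending the result to a file, would give timestamps to compare with the kernel log entries.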

I have no data on the video reset because the system was flaky so I rebooted it. And after reboot the USB mouse and keyboard didn't work so I had to shut down and reboot to get them working again.

I think this is all that applies to the hard crash in /var/log/messages:

Code:
Jun 21 10:16:38 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=16084272, emitted seq=16084274
Jun 21 10:16:38 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2136 thread Xorg:cs0 pid 2443
Jun 21 10:16:38 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
Jun 21 10:16:39 lenny kernel: amdgpu 0000:35:00.0: amdgpu: MODE2 reset
Jun 21 10:16:39 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 21 10:16:39 lenny kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Jun 21 10:16:39 lenny kernel: [drm] PSP is resuming...
Jun 21 10:16:39 lenny kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
Jun 21 10:16:40 lenny kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jun 21 10:16:40 lenny kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun 21 10:16:40 lenny kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jun 21 10:16:40 lenny kernel: [drm] JPEG decode initialized successfully.
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
Jun 21 10:16:40 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset(2) succeeded!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm] Skip scheduling IBs!
Jun 21 10:16:40 lenny kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!


This might be a hardware failure... Ugh ... but with all the issues the 680M has had with the amdgpu driver... I wonder.

I wonder if the AMD pro driver would work better... I might give it a go, even though I dropped nvidia so I could get away from kernel blobs.
_________________
Some day there will only be free software.
logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1531

Posted: Thu Jun 22, 2023 6:13 am

RayDude wrote:

This might be a hardware failure... Ugh ... but with all the issues the 680M has had with the amdgpu driver... I wonder.


Unfortunately this leans on the hw failure side.

Best Regards,
Georgi
ali3nx
l33t

Joined: 21 Sep 2003
Posts: 722
Location: Winnipeg, Canada

Posted: Thu Jun 22, 2023 3:40 pm

logrusx wrote:
RayDude wrote:

This might be a hardware failure... Ugh ... but with all the issues the 680M has had with the amdgpu driver... I wonder.


Unfortunately this leans on the hw failure side.

Best Regards,
Georgi


The 5.15 kernel entirely eliminated the problem for me. Perhaps build a new chroot starting with linux-headers and gentoo-kernel-bin 5.15.x, immediately followed by glibc, gcc, binutils and libtool, which should cover the core toolchain packages. Then do a world rebuild as a build-time consistency pass and continue from there.

A hardware failure would presumably also show up on the older long-term stable kernel. Taking the 6.x kernel branch on faith appears to lead to results like these.
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!


Last edited by ali3nx on Thu Jun 22, 2023 4:15 pm; edited 3 times in total
RayDude
Advocate

Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Thu Jun 22, 2023 3:42 pm

logrusx wrote:
RayDude wrote:

This might be a hardware failure... Ugh ... but with all the issues the 680M has had with the amdgpu driver... I wonder.


Unfortunately this leans on the hw failure side.

Best Regards,
Georgi


I found a thread on the Arch BBS about similar AMDGPU driver failures going back to late 2022. It was supposed to be fixed in 6.3.5, but others have still reported issues.

One guy said it got much better when he switched to Wayland.

So I again tried logging in with Plasma (Wayland) instead of Plasma (X), and for the first time ever it is working very well.

The last time I tried it, more than a year ago, the cursor location was different from where the cursor action took place.

I'm running Vivado now and so far, so good.

If this doesn't work, I'm going to try the AMD Pro driver for the heck of it.
_________________
Some day there will only be free software.
RayDude
Advocate

Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Thu Jun 22, 2023 4:12 pm

It just crashed hard a few minutes after completing a vivado run, but the driver was able to recover.

For future reference this is kernel 6.3.7-gentoo and video-amdgpu-22.0.0.

Here are the relevant portions of various logs:

dmesg
Code:
[61264.610587] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22499, emitted seq=22501
[61264.611006] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[61264.611357] amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
[61265.411621] amdgpu 0000:35:00.0: amdgpu: MODE2 reset
[61265.421668] amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
[61265.421897] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
[61265.421935] [drm] PSP is resuming...
[61265.443953] [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
[61265.733188] amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
[61265.742872] amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
[61265.742876] amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[61265.742879] amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
[61265.744673] amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
[61265.746404] [drm] DMUB hardware initialized: version=0x0400002E
[61266.213167] [drm] kiq ring mec 2 pipe 1 q 0
[61266.217890] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[61266.219005] [drm] JPEG decode initialized successfully.
[61266.219009] amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[61266.219012] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[61266.219014] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[61266.219016] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[61266.219017] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[61266.219018] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[61266.219019] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[61266.219020] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[61266.219022] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[61266.219024] amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[61266.219025] amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[61266.219026] amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[61266.219028] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[61266.219029] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[61266.219030] amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[61266.244112] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
[61266.244118] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
[61266.244139] amdgpu 0000:35:00.0: amdgpu: GPU reset(1) succeeded!
[61266.244388] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32772, for process Xwayland pid 3085 thread Xwayland:cs0 pid 3095)
[61266.244403] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800102bc3000 from client 0x1b (UTCL2)
[61266.244409] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
[61266.244413] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: TCP (0x8)
[61266.244416] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x1
[61266.244418] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61266.244420] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x5
[61266.244422] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61266.244424] amdgpu 0000:35:00.0: amdgpu:     RW: 0x1

[snipped 10 copies of above message by hand: raydude]

[61276.386549] gmc_v10_0_process_interrupt: 10 callbacks suppressed
[61276.386559] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32772, for process Xwayland pid 3085 thread Xwayland:cs0 pid 3095)
[61276.386573] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800102bc4000 from client 0x1b (UTCL2)
[61276.386579] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00141051
[61276.386582] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: TCP (0x8)
[61276.386584] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x1
[61276.386587] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61276.386589] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x5
[61276.386591] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0

[snipped a bunch of the above message: raydude]

[61276.386592] amdgpu 0000:35:00.0: amdgpu:     RW: 0x1
[61276.386736] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32772, for process Xwayland pid 3085 thread Xwayland:cs0 pid 3095)
[61276.386739] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800102bcd000 from client 0x1b (UTCL2)
[61276.386741] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61276.386743] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61276.386745] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61276.386746] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61276.386748] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61276.386749] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61276.386521] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[61276.386752] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61276.386755] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32772, for process Xwayland pid 3085 thread Xwayland:cs0 pid 3095)
[61276.386758] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800102bc9000 from client 0x1b (UTCL2)
[61276.386760] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61276.386761] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61276.386763] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61276.386764] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61276.386765] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61276.386767] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61276.386768] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61286.626490] gmc_v10_0_process_interrupt: 29 callbacks suppressed
[61286.626495] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32771, for process kwin_wayland pid 3027 thread kwin_wayla:cs0 pid 3076)
[61286.626500] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800108d17000 from client 0x1b (UTCL2)
[61286.626505] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501031
[61286.626507] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: TCP (0x8)
[61286.626509] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x1
[61286.626510] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61286.626511] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[61286.626511] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61286.626512] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61286.626514] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32771, for process kwin_wayland pid 3027 thread kwin_wayla:cs0 pid 3076)
[61286.626517] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800108d21000 from client 0x1b (UTCL2)
[61286.626518] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61286.626519] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61286.626520] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61286.626521] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61286.626521] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61286.626522] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61286.626523] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61286.626525] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32771, for process kwin_wayland pid 3027 thread kwin_wayla:cs0 pid 3076)
[61286.626526] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800108d2a000 from client 0x1b (UTCL2)
[61286.626527] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61286.626528] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61286.626529] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61286.626529] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61286.626530] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61286.626531] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61286.626531] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61286.626534] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32771, for process kwin_wayland pid 3027 thread kwin_wayla:cs0 pid 3076)
[61286.626536] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800108d28000 from client 0x1b (UTCL2)
[61286.626536] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61286.626537] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61286.626538] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61286.626539] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61286.626539] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61286.626540] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61286.626540] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0

[snipped a bunch of copies of the above message: raydude]

[61286.626581] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32771, for process kwin_wayland pid 3027 thread kwin_wayla:cs0 pid 3076)
[61286.626582] amdgpu 0000:35:00.0: amdgpu:   in page starting at address 0x0000800108d1e000 from client 0x1b (UTCL2)
[61286.626583] amdgpu 0000:35:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[61286.626584] amdgpu 0000:35:00.0: amdgpu:     Faulty UTCL2 client ID: CB/DB (0x0)
[61286.626585] amdgpu 0000:35:00.0: amdgpu:     MORE_FAULTS: 0x0
[61286.626585] amdgpu 0000:35:00.0: amdgpu:     WALKER_ERROR: 0x0
[61286.626586] amdgpu 0000:35:00.0: amdgpu:     PERMISSION_FAULTS: 0x0
[61286.626587] amdgpu 0000:35:00.0: amdgpu:     MAPPING_ERROR: 0x0
[61286.626587] amdgpu 0000:35:00.0: amdgpu:     RW: 0x0
[61286.627089] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered


messages
Code:
Jun 22 08:49:31 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22499, emitted seq=22501
Jun 22 08:49:31 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun 22 08:49:31 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: MODE2 reset
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 22 08:49:32 lenny kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Jun 22 08:49:32 lenny kernel: [drm] PSP is resuming...
Jun 22 08:49:32 lenny kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
Jun 22 08:49:32 lenny kernel: amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
Jun 22 08:49:32 lenny kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jun 22 08:49:33 lenny kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun 22 08:49:33 lenny kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jun 22 08:49:33 lenny kernel: [drm] JPEG decode initialized successfully.
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
Jun 22 08:49:33 lenny kernel: amdgpu 0000:35:00.0: amdgpu: GPU reset(1) succeeded!

[duplicate of dmesg found here: raydude]

Jun 22 08:49:53 lenny kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/ldac
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/aptx_hd
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx_hd
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/aptx
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/aac
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aac
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/sbc
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/sbc
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/sbc_xq
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/sbc_xq
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx_ll_1
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx_ll_0
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_1
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_0
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/faststream
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/faststream_duplex
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/opus_05
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/opus_05
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSink/opus_05_duplex
Jun 22 08:49:53 lenny bluetoothd[2150]: Endpoint unregistered: sender=:1.75 path=/MediaEndpoint/A2DPSource/opus_05_duplex
Jun 22 08:50:01 lenny login[2600]: pam_unix(login:session): session opened for user root(uid=0) by LOGIN(uid=0)
Jun 22 08:50:01 lenny elogind-daemon[2083]: New session 3 of user root.
Jun 22 08:50:01 lenny login[8949]: ROOT LOGIN  on '/dev/tty1'
Jun 22 08:51:26 lenny sddm-helper[2537]: pam_unix(sddm-greeter:session): session closed for user sddm
Jun 22 08:51:26 lenny elogind-daemon[2083]: Removed session c1.
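
Since the logs above are long and repetitive, here is a minimal Python sketch for condensing them. It counts amdgpu page faults per process and ring timeouts per ring. The regular expressions are my own guess based only on the message shapes quoted in this thread, and the function name `summarize` is hypothetical, so treat it as a starting point rather than a general amdgpu log parser.

```python
import re
from collections import Counter

# Assumption: patterns are derived only from the log lines quoted in this thread.
PAGE_FAULT = re.compile(r"\[gfxhub\] page fault \(.*?for process (\S+) pid (\d+)")
RING_TIMEOUT = re.compile(r"\*ERROR\* ring (\S+) timeout")


def summarize(lines):
    """Count page faults per (process, pid) and timeouts per ring name."""
    faults, timeouts = Counter(), Counter()
    for line in lines:
        match = PAGE_FAULT.search(line)
        if match:
            faults[(match.group(1), int(match.group(2)))] += 1
            continue
        match = RING_TIMEOUT.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return faults, timeouts


# Three lines copied from the dmesg output above.
SAMPLE = [
    "[61264.610587] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22499, emitted seq=22501",
    "[61266.244388] amdgpu 0000:35:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32772, for process Xwayland pid 3085 thread Xwayland:cs0 pid 3095)",
    "[61286.627089] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered",
]

faults, timeouts = summarize(SAMPLE)
for (process, pid), count in faults.most_common():
    print(f"{count} page fault(s): {process} (pid {pid})")
for ring, count in timeouts.most_common():
    print(f"{count} timeout(s): ring {ring}")
```

To run it against a saved log, pass the open file object to `summarize` instead of `SAMPLE`. It works the same on dmesg output and on /var/log/messages, because it ignores the timestamp prefix.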


I grabbed Xorg.log and Xorg.log.old, but they don't have anything useful. And the Xorg.log itself is incomplete, implying it was overwritten by the video rewrite. This seems like an error on Xorg's part.

I guess I'll try AMD's Pro driver next.
_________________
Some day there will only be free software.
logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1531

Posted: Thu Jun 22, 2023 7:42 pm

RayDude wrote:


I grabbed Xorg.log and Xorg.log.old, but they don't have anything useful. And the Xorg.log itself is incomplete, implying it was overwritten by the video rewrite. This seems like an error on Xorg's part.

I guess I'll try AMD's Pro driver next.


You said you logged into a Wayland session; it doesn't write to the Xorg logs. For me it writes to the syslog (the compositor is GNOME); for you it might be different.

Best Regards,
Georgi


Last edited by logrusx on Fri Jun 23, 2023 5:50 am; edited 1 time in total
CooSee
Veteran

Joined: 20 Nov 2004
Posts: 1438
Location: Earth

Posted: Thu Jun 22, 2023 10:03 pm

Code:
Unfortunately this leans on the hw failure side.

I would try any currently available Linux version booted from USB, to be sure!

8)
_________________
" Reality is an illusion that arises from a lack of honest communication "
---
" Man is curious by nature; what remains in the end is greed "
RayDude
Advocate

Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Sat Jun 24, 2023 10:49 pm

CooSee wrote:
Code:
Unfortunately this leans on the hw failure side.

I would try any currently available Linux version booted from USB, to be sure!

8)


That's a good idea. I'll give that a shot at some point.

It has behaved the last few days. I turned down the number of simultaneous threads in Vivado and it hasn't absorbed as much memory.

It turns out it was absorbing nearly all available memory, up to 29.5 GB of 32 GB, and since I reduced the thread count from 8 to 7 it hasn't been crashing.

I think that kinda shows that it might be a memory management issue, but I can't be positive.
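
To check whether the crashes line up with memory pressure, something like the following Python sketch could log how full RAM is while Vivado runs. It parses the standard `MemTotal` and `MemAvailable` fields from /proc/meminfo. The function names are hypothetical, and the sample numbers are made up to mirror the 29.5 GB of 32 GB figure above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kB} dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def used_fraction(info):
    """Fraction of RAM that is no longer available for new allocations."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def read_live():
    """Read the running system's figures (Linux only)."""
    with open("/proc/meminfo") as handle:
        return parse_meminfo(handle.read())


# Made-up sample: 2.5 of 32 units still available, i.e. 29.5 of 32 in use.
SAMPLE = "MemTotal:       32000000 kB\nMemAvailable:    2500000 kB\n"
print(f"{used_fraction(parse_meminfo(SAMPLE)):.1%} of RAM in use")
```

Calling `used_fraction(read_live())` every few seconds and writing the value next to a timestamp would give something to compare against the ring timeout times in dmesg.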
_________________
Some day there will only be free software.
ali3nx
l33t

Joined: 21 Sep 2003
Posts: 722
Location: Winnipeg, Canada

Posted: Sun Jun 25, 2023 12:15 am

Some Gentoo community members know that I've provided completed Gentoo chroot prebuilds for many years. Given the circumstances, I felt a Gentoo LTS prebuild could be useful to affected users for diagnostic testing of this kernel bug.

Perhaps this could be a stage3 build the releng team developers could consider officially providing.

I've started a new systemd desktop merged-usr stage3 chroot build, specifically configured for amdgpu support and restricted to linux-headers and gentoo-kernel-bin 5.15.x. It should be available in my prebuilds webserver directory within perhaps six to twelve hours of this post.

The only changes made are some minor USE flag alterations and other necessary make.conf configuration, the linux-headers and kernel version restrictions, and a world rebuild, followed by installing the xorg-server and xorg-apps package groups.

Completed 5.15 lts chroot prebuild is available here

There is a 100 Mbit upload bandwidth limit, so please try to be courteous if you notice a slower upload speed.
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!


Last edited by ali3nx on Sun Jun 25, 2023 5:07 am; edited 2 times in total
RayDude
Advocate

Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Sun Jun 25, 2023 2:46 am

ali3nx wrote:
Some Gentoo community members know that I've provided completed Gentoo chroot prebuilds for many years. Given the circumstances, I felt a Gentoo LTS prebuild could be useful to affected users for diagnostic testing of this kernel bug.

Perhaps this could be a stage3 build the releng team developers could consider officially providing.

I've started a new systemd desktop merged-usr stage3 chroot build, specifically configured for amdgpu support and restricted to linux-headers and gentoo-kernel-bin 5.15.x. It should be available in my prebuilds webserver directory within perhaps six to twelve hours of this post.

The only changes made are some minor USE flag alterations and other necessary make.conf configuration, the linux-headers and kernel version restrictions, and a world rebuild, followed by installing the xorg-server and xorg-apps package groups.

Current package progress at ~200 of 496


Thanks ali3nx.

Does this mean I could install your chroot environment, shut down X Windows, chroot into your image and start X Windows there to use your drivers to debug this?

This is probably over my head, but I'd love to give it a try since I'm out of work anyway.
_________________
Some day there will only be free software.
Back to top
View user's profile Send private message
ali3nx
l33t
l33t


Joined: 21 Sep 2003
Posts: 722
Location: Winnipeg, Canada

PostPosted: Sun Jun 25, 2023 2:58 am    Post subject: Reply with quote

If you want to try to reproduce the kernel bug with a purpose-built 5.15 system without altering your current install, you could set this new build up on either a separate partition on a disk of your choosing, or a new btrfs subvolume or zfs dataset.

It's a similar concept to dual booting multiple OS installs. Pooled storage filesystems are fantastic for multi-boot installations once you're familiar with the concept.

Perhaps it's possible to run this build environment from a chroot, but booting the 5.15 kernel together with a system built consistently against the 5.15 linux-headers gives a sane testing environment that could be considered valid for diagnostics.

My build is a chroot build because Gentoo systems are commonly built in a chroot environment. My custom Xeon-powered Larry enjoys working hard to aid fellow Gentoo users :)

It occurred to me that I've never seen anyone report this amdgpu ring timeout bug on a 5.15 kernel, nor have I experienced it myself, although the hardware I have available probably isn't a large enough sample to rule it out on Linux 5.15.

On my own systems the ring timeout is guaranteed to occur immediately at boot when running 6.1.x, but it never occurs with 5.15.x.

Perhaps building your own purpose-built 5.15 LTS Gentoo desktop is difficult or inconvenient because of limited computing horsepower. This prebuild could help you get started with that configuration, since the world rebuild for consistency against the 5.15 linux-headers has already been completed.
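
When switching between a 6.x and a 5.15 install for this kind of comparison, it is easy to lose track of which kernel a given crash log came from. The following is a small Python sketch, with hypothetical function names, that checks whether a release string (or the running kernel) belongs to the 5.15 series. It assumes the usual `major.minor.patch-suffix` shape, such as the `6.3.7-gentoo` mentioned earlier in the thread.

```python
import platform


def kernel_series(release):
    """Return (major, minor) from a release string such as '6.3.7-gentoo'."""
    major, minor = release.split(".")[:2]
    # Keep only the leading digits of the minor field, e.g. '15-rc1' -> '15'.
    digits = ""
    for char in minor:
        if not char.isdigit():
            break
        digits += char
    return int(major), int(digits)


def is_lts_515(release=None):
    """True when the given (or currently running) kernel is a 5.15.x kernel."""
    if release is None:
        release = platform.release()  # the same string `uname -r` prints
    return kernel_series(release) == (5, 15)


print(is_lts_515("6.3.7-gentoo"))
print(is_lts_515("5.15.118-gentoo-dist"))
```

Recording the result of `is_lts_515()` alongside each saved dmesg excerpt keeps the 5.15 and 6.x test runs from getting mixed up.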
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!
RayDude
Advocate

Joined: 29 May 2004
Posts: 2062
Location: San Jose, CA

Posted: Sun Jun 25, 2023 3:43 am

ali3nx wrote:
If you want to try to reproduce the kernel bug with a purpose-built 5.15 system without altering your current install, you could set this new build up on either a separate partition on a disk of your choosing, or a new btrfs subvolume or zfs dataset.

It's a similar concept to dual booting multiple OS installs. Pooled storage filesystems are fantastic for multi-boot installations once you're familiar with the concept.

Perhaps it's possible to run this build environment from a chroot, but booting the 5.15 kernel together with a system built consistently against the 5.15 linux-headers gives a sane testing environment that could be considered valid for diagnostics.

My build is a chroot build because Gentoo systems are commonly built in a chroot environment. My custom Xeon-powered Larry enjoys working hard to aid fellow Gentoo users :)

It occurred to me that I've never seen anyone report this amdgpu ring timeout bug on a 5.15 kernel, nor have I experienced it myself, although the hardware I have available probably isn't a large enough sample to rule it out on Linux 5.15.

On my own systems the ring timeout is guaranteed to occur immediately at boot when running 6.1.x, but it never occurs with 5.15.x.

Perhaps building your own purpose-built 5.15 LTS Gentoo desktop is difficult or inconvenient because of limited computing horsepower. This prebuild could help you get started with that configuration, since the world rebuild for consistency against the 5.15 linux-headers has already been completed.


I understand now. Thanks much for taking the time.

I'm upgrading to amdgpu-23 (for the heck of it) with my every-other-week emerge -DNuq @world update. If I get the ring buffer timeout again after running Vivado (with eight threads), I'll go back to 5.15.xx and see if it goes away.

Thanks again.
_________________
Some day there will only be free software.
All times are GMT