Gentoo Forums
AMDGPU, 2 identical card, unbind the second, ... Oops: 0000
Gentoo Forums Forum Index » Kernel & Hardware
Jooo
n00b


Joined: 26 Jul 2016
Posts: 2

Posted: Tue Dec 11, 2018 2:46 am    Post subject: AMDGPU, 2 identical card, unbind the second, ... Oops: 0000

I have two identical AMD Radeon RX 580 cards. When I boot, before starting gdm, I have to unbind the card I want to use with my virtual machine.

What I do is:

echo 0000:0c:00.0 > /sys/bus/pci/devices/0000:0c:00.0/driver/unbind
echo 0000:0c:00.1 > /sys/bus/pci/devices/0000:0c:00.1/driver/unbind
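The two echo lines above can be wrapped so that each PCI function of the card is detached from whatever driver currently owns it, and a function that is already unbound is skipped instead of producing a write error. This is only a sketch: the function names and the SYSFS override (there so it can be tried against a fake directory tree) are hypothetical, not part of any tool.

```shell
#!/bin/sh
# Sketch: detach every PCI function of one GPU from the driver that owns it.
# SYSFS is a hypothetical override so the functions can be tried on a fake
# tree; on a real system leave it at /sys and run as root.
SYSFS="${SYSFS:-/sys}"

# Unbind one PCI function (e.g. 0000:0c:00.0) from its current driver.
unbind_function() {
    dev="$SYSFS/bus/pci/devices/$1"
    if [ ! -e "$dev/driver" ]; then
        # No "driver" symlink: nothing owns this function, so skip it.
        echo "$1: no driver bound" >&2
        return 0
    fi
    # driver/ is a symlink to the owning driver's directory.
    echo "$1" > "$dev/driver/unbind"
}

# Unbind the VGA (.0) and HDMI audio (.1) functions of a card.
# $1 is the slot without the function suffix, e.g. 0000:0c:00
unbind_gpu() {
    for fn in "$1.0" "$1.1"; do
        unbind_function "$fn" || return 1
    done
}
```

Usage, as root before starting gdm: `unbind_gpu 0000:0c:00`.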

After that I start gdm.

Everything seems to be OK: I can start my VM with QEMU and everything works pretty well. But a bug keeps recurring in journalctl, and when I shut down my computer it freezes with a kernel panic.

To keep my logs from overflowing, I limited the journal size to 1M. But I would like to know whether there is a patch that fixes this issue, or any other configuration I could use to prevent it. I tried amdgpu.dc=0: it prevents the error message, but then getting gdm to start is a mess. To put it simply, it's not the solution :)

This is the error from journalctl:

déc 10 21:03:10 Jo kernel: CR2: 0000000000000000
déc 10 21:03:10 Jo kernel: ---[ end trace b8d59f6e48b82af2 ]---
déc 10 21:03:10 Jo kernel: RIP: 0010:sysfs_kf_seq_show+0x89/0x100
déc 10 21:03:10 Jo kernel: Code: 48 89 d1 31 c0 48 83 e7 f8 48 c7 02 00 00 00 00 48 c7 82 f8 0f 00 00 00 00 00 00 48 29 f9 81 c1 00 10 00 00 c1 e9 03 f3 48 ab <48> 8b 45 00 48 85 c0 74 5d 49 8b 09 4c 89 c7 48 8b 71 60 e8 ff 22
déc 10 21:03:10 Jo kernel: RSP: 0018:ffff9d9294213de0 EFLAGS: 00010216
déc 10 21:03:10 Jo kernel: RAX: 0000000000000000 RBX: ffff8b8f08f35880 RCX: 0000000000000000
déc 10 21:03:10 Jo kernel: RDX: ffff8b8f1efc2000 RSI: 0000000000001000 RDI: ffff8b8f1efc3000
déc 10 21:03:10 Jo kernel: RBP: 0000000000000000 R08: ffff8b8f2222a928 R09: ffff8b8f09717cc0
déc 10 21:03:10 Jo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
déc 10 21:03:10 Jo kernel: R13: 0000000000000001 R14: ffff8b8f14322300 R15: ffff8b8f08f35880
déc 10 21:03:10 Jo kernel: FS: 00007f83d29fa740(0000) GS:ffff8b8f3edc0000(0000) knlGS:0000000000000000
déc 10 21:03:10 Jo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
déc 10 21:03:10 Jo kernel: CR2: 0000000000000000 CR3: 0000000fedd72000 CR4: 00000000003406e0
déc 10 21:03:15 Jo kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
déc 10 21:03:15 Jo kernel: PGD 0 P4D 0
déc 10 21:03:15 Jo kernel: Oops: 0000 [#7] SMP NOPTI
déc 10 21:03:15 Jo kernel: CPU: 24 PID: 4313 Comm: sensors Tainted: G D 4.19.8-gentoo #1
déc 10 21:03:15 Jo kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 GAMING PRO CARBON AC (MS-7B09), BIOS 1.B0 08/09/2018
déc 10 21:03:15 Jo kernel: RIP: 0010:sysfs_kf_seq_show+0x89/0x100
déc 10 21:03:15 Jo kernel: Code: 48 89 d1 31 c0 48 83 e7 f8 48 c7 02 00 00 00 00 48 c7 82 f8 0f 00 00 00 00 00 00 48 29 f9 81 c1 00 10 00 00 c1 e9 03 f3 48 ab <48> 8b 45 00 48 85 c0 74 5d 49 8b 09 4c 89 c7 48 8b 71 60 e8 ff 22
déc 10 21:03:15 Jo kernel: RSP: 0018:ffff9d929473bde0 EFLAGS: 00010216
déc 10 21:03:15 Jo kernel: RAX: 0000000000000000 RBX: ffff8b8f270acd00 RCX: 0000000000000000
déc 10 21:03:15 Jo kernel: RDX: ffff8b8ef561d000 RSI: 0000000000001000 RDI: ffff8b8ef561e000
déc 10 21:03:15 Jo kernel: RBP: 0000000000000000 R08: ffff8b8f2222a928 R09: ffff8b8f2f795240
déc 10 21:03:15 Jo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
déc 10 21:03:15 Jo kernel: R13: 0000000000000001 R14: ffff8b8f18a6c700 R15: ffff8b8f270acd00
déc 10 21:03:15 Jo kernel: FS: 00007fac245fc740(0000) GS:ffff8b8f3ec00000(0000) knlGS:0000000000000000
déc 10 21:03:15 Jo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
déc 10 21:03:15 Jo kernel: CR2: 0000000000000000 CR3: 0000000fede76000 CR4: 00000000003406e0
déc 10 21:03:15 Jo kernel: Call Trace:
déc 10 21:03:15 Jo kernel: seq_read+0x14e/0x3e0
déc 10 21:03:15 Jo kernel: __vfs_read+0x31/0x170
déc 10 21:03:15 Jo kernel: ? __se_sys_newfstat+0x5a/0x70
déc 10 21:03:15 Jo kernel: vfs_read+0x85/0x110
déc 10 21:03:15 Jo kernel: ksys_read+0x4a/0xb0
déc 10 21:03:15 Jo kernel: do_syscall_64+0x43/0xf0
déc 10 21:03:15 Jo kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
déc 10 21:03:15 Jo kernel: RIP: 0033:0x7fac24158755
déc 10 21:03:15 Jo kernel: Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 85 6f 2d 00 8b 00 85 c0 75 2f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 73 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
déc 10 21:03:15 Jo kernel: RSP: 002b:00007ffce0aefd60 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
déc 10 21:03:15 Jo kernel: RAX: ffffffffffffffda RBX: 0000556339b53330 RCX: 00007fac24158755
déc 10 21:03:15 Jo kernel: RDX: 0000000000001000 RSI: 0000556339b53970 RDI: 0000000000000004
déc 10 21:03:15 Jo kernel: RBP: 0000000000000d68 R08: 0000000000000003 R09: 000000000000007c
déc 10 21:03:15 Jo kernel: R10: 0000556339b4a010 R11: 0000000000000246 R12: 00007fac244272a0
déc 10 21:03:15 Jo kernel: R13: 00007fac24426760 R14: 0000000000000000 R15: 0000556339b53330
déc 10 21:03:15 Jo kernel: Modules linked in: auth_rpcgss nfsv4 dns_resolver arc4 hid_logitech_hidpp amdkfd iwlmvm amdgpu mac80211 snd_hda_codec_realtek chash snd_usb_audio gpu_sched snd_hda_codec_generic snd_hda_codec_hdmi snd_usbmidi_lib drm_kms_helper iwlwifi snd_hda_intel snd_rawmidi kvm_amd syscopyarea snd_hda_codec sysfillrect snd_seq_device sysimgblt snd_hda_core fb_sys_fops btusb snd_hwdep efivars btrtl kvm pcspkr snd_pcm btbcm irqbypass btintel ttm hid_logitech_dj joydev input_leds bluetooth drm snd_timer k10temp i2c_piix4 cfg80211 ecdh_generic backlight snd rfkill button nct6775 hwmon_vid efivarfs xts aes_x86_64 ecb cbc sha512_generic sha1_generic libiscsi scsi_transport_iscsi ixgb ixgbe tulip cxgb3 cxgb mdio vxlan ip6_udp_tunnel udp_tunnel macvlan tg3 sky2 r8169 libphy pcnet32 mii e1000 bnx2 fuse nfs
déc 10 21:03:15 Jo kernel: lockd grace sunrpc jfs multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd xhci_plat_hcd ohci_pci ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd aic94xx libsas lpfc crc_t10dif crct10dif_common qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg
déc 10 21:03:15 Jo kernel: pdc_adma sata_inic162x sata_mv ata_piix sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix hid_logitech ff_memless usbhid igb xhci_pci i2c_algo_bit xhci_hcd ahci led_class libahci i2c_core usbcore dca libata usb_common
déc 10 21:03:15 Jo kernel: CR2: 0000000000000000
déc 10 21:03:15 Jo kernel: ---[ end trace b8d59f6e48b82af3 ]---
déc 10 21:03:15 Jo kernel: RIP: 0010:sysfs_kf_seq_show+0x89/0x100
déc 10 21:03:15 Jo kernel: Code: 48 89 d1 31 c0 48 83 e7 f8 48 c7 02 00 00 00 00 48 c7 82 f8 0f 00 00 00 00 00 00 48 29 f9 81 c1 00 10 00 00 c1 e9 03 f3 48 ab <48> 8b 45 00 48 85 c0 74 5d 49 8b 09 4c 89 c7 48 8b 71 60 e8 ff 22
déc 10 21:03:15 Jo kernel: RSP: 0018:ffff9d9294213de0 EFLAGS: 00010216
déc 10 21:03:15 Jo kernel: RAX: 0000000000000000 RBX: ffff8b8f08f35880 RCX: 0000000000000000
déc 10 21:03:15 Jo kernel: RDX: ffff8b8f1efc2000 RSI: 0000000000001000 RDI: ffff8b8f1efc3000
déc 10 21:03:15 Jo kernel: RBP: 0000000000000000 R08: ffff8b8f2222a928 R09: ffff8b8f09717cc0
déc 10 21:03:15 Jo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
déc 10 21:03:15 Jo kernel: R13: 0000000000000001 R14: ffff8b8f14322300 R15: ffff8b8f08f35880
déc 10 21:03:15 Jo kernel: FS: 00007fac245fc740(0000) GS:ffff8b8f3ec00000(0000) knlGS:0000000000000000
déc 10 21:03:15 Jo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
déc 10 21:03:15 Jo kernel: CR2: 0000000000000000 CR3: 0000000fede76000 CR4: 00000000003406e0
déc 10 21:03:20 Jo kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
déc 10 21:03:20 Jo kernel: PGD 0 P4D 0
déc 10 21:03:20 Jo kernel: Oops: 0000 [#8] SMP NOPTI
déc 10 21:03:20 Jo kernel: CPU: 24 PID: 4317 Comm: sensors Tainted: G D 4.19.8-gentoo #1
déc 10 21:03:20 Jo kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 GAMING PRO CARBON AC (MS-7B09), BIOS 1.B0 08/09/2018
déc 10 21:03:20 Jo kernel: RIP: 0010:sysfs_kf_seq_show+0x89/0x100
déc 10 21:03:20 Jo kernel: Code: 48 89 d1 31 c0 48 83 e7 f8 48 c7 02 00 00 00 00 48 c7 82 f8 0f 00 00 00 00 00 00 48 29 f9 81 c1 00 10 00 00 c1 e9 03 f3 48 ab <48> 8b 45 00 48 85 c0 74 5d 49 8b 09 4c 89 c7 48 8b 71 60 e8 ff 22
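In each report above the Comm: field is sensors and the RIP is in sysfs_kf_seq_show, so the faulting read came from the sensors process reading a sysfs attribute. One hedged diagnostic is to list the hwmon devices together with the PCI device and driver behind each one. The sketch below only resolves symlinks and never reads an attribute file, so it should not itself go through the show() path that faulted. Whether a hwmon node left behind by the unbound card is what sensors trips over is an assumption to check, not something the log proves; the function name and SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch: list hwmon devices with the device and driver behind each one.
# Only symlinks are resolved; no sysfs attribute file is read.
# SYSFS is a hypothetical override for testing; it defaults to /sys.
SYSFS="${SYSFS:-/sys}"

list_hwmon() {
    for h in "$SYSFS"/class/hwmon/hwmon*; do
        [ -e "$h" ] || continue          # glob matched nothing
        if [ -e "$h/device" ]; then
            dev=$(basename "$(readlink "$h/device")")
            if [ -e "$h/device/driver" ]; then
                drv=$(basename "$(readlink "$h/device/driver")")
            else
                drv=none                 # device exists but no driver bound
            fi
        else
            dev=-
            drv=-
        fi
        printf '%s %s %s\n' "$(basename "$h")" "$dev" "$drv"
    done
}
```

A line such as `hwmon3 0000:0c:00.0 none` would suggest a hwmon node whose PCI device no longer has a driver.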
Anon-E-moose
Watchman


Joined: 23 May 2008
Posts: 6098
Location: Dallas area

Posted: Tue Dec 11, 2018 10:36 am

You may need to unbind the card from the video driver before you unbind the device.

Example from when I had two NVIDIA cards; adapt as necessary:
# unbind nvidia
#echo 0000:02:00.0 > /sys/bus/pci/drivers/nouveau/unbind
#echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind

And how are you loading/unloading the virtual (VFIO) drivers?
I put it in a script: load vfio, bind the devices, then drop vfio when done (which releases the video card from it):

modprobe kvm-amd
modprobe vfio_pci
# radeon 6670 vga
echo "1002 6758" >/sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:04:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
# radeon 6670 audio hdmi
echo "1002 aa90" >/sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:04:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
# run vm
# when done
rmmod vfio_iommu_type1
rmmod vfio_pci
rmmod vfio
rmmod vfio_virqfd
rmmod kvm-amd
rmmod kvm
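The script above selects the card by vendor:device ID through new_id, and new_id applies to every device with that ID. With two identical RX 580s, both cards match, so the per-device driver_override attribute (present in the 4.19 kernel from the log) may be a more precise way to hand only one PCI address to vfio-pci. A minimal sketch, assuming vfio-pci is already loaded (modprobe vfio_pci) and the script runs as root; the function name and SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch: hand exactly one PCI function to vfio-pci via driver_override.
# Assumes vfio-pci is already loaded (modprobe vfio_pci) and runs as root.
# SYSFS is a hypothetical override for testing; it defaults to /sys.
SYSFS="${SYSFS:-/sys}"

bind_to_vfio() {
    dev="$SYSFS/bus/pci/devices/$1"
    # Detach from the current driver (amdgpu, snd_hda_intel, ...) if any.
    if [ -e "$dev/driver" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi
    # Restrict this one device to vfio-pci; other cards with the same
    # vendor:device ID keep matching their normal drivers.
    echo vfio-pci > "$dev/driver_override"
    # Re-probe the device; the PCI core now only considers vfio-pci.
    echo "$1" > "$SYSFS/bus/pci/drivers_probe"
}
```

Usage: `bind_to_vfio 0000:0c:00.0` and `bind_to_vfio 0000:0c:00.1`, in place of the new_id/bind pairs.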


If you're not going to drop vfio, you would probably need to unbind the video card from it at least.
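To release a card from vfio-pci without unloading the module, one option is to unbind it, clear any driver_override that was set, and ask the PCI core to re-probe so the normal ID-matched driver (amdgpu here) can claim it again. A sketch under those assumptions; the function name and SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch: release one PCI function from vfio-pci while leaving the module loaded.
# SYSFS is a hypothetical override for testing; it defaults to /sys.
SYSFS="${SYSFS:-/sys}"

release_from_vfio() {
    dev="$SYSFS/bus/pci/devices/$1"
    # Detach from vfio-pci (or whatever currently owns the function).
    if [ -e "$dev/driver" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi
    # Writing an empty line clears any driver_override set earlier.
    echo > "$dev/driver_override"
    # Re-probe so the normal ID-matched driver can claim the device.
    echo "$1" > "$SYSFS/bus/pci/drivers_probe"
}
```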

At least, those are my guesses.
bunder
Bodhisattva


Joined: 10 Apr 2004
Posts: 5934

Posted: Tue Dec 11, 2018 11:49 am

I believe your log got cut off at the top; there should be more above what you posted.
Jooo
n00b


Joined: 26 Jul 2016
Posts: 2

Posted: Wed Dec 12, 2018 2:49 am

Anon-E-moose wrote:
you may need to unbind the card from the video driver before you unbind the device.



I tried your command, echo 0000:0c:00.0 > /sys/bus/pci/drivers/amdgpu/unbind, and it does exactly the same thing as echo 0000:0c:00.0 > /sys/bus/pci/devices/0000:0c:00.0/driver/unbind.
If I unbind from amdgpu first, the other command becomes useless, and the same happens if I unbind the device first.
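That both commands behave the same is expected: a bound device's driver entry is a symlink into /sys/bus/pci/drivers/<name>, so .../devices/0000:0c:00.0/driver/unbind and /sys/bus/pci/drivers/amdgpu/unbind are the same file while amdgpu owns the card, and once either write succeeds the link is gone. A small sketch that reports which driver a function is bound to; the function name and SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch: print the driver a PCI function is bound to, or "none".
# SYSFS is a hypothetical override for testing; it defaults to /sys.
SYSFS="${SYSFS:-/sys}"

current_driver() {
    link="$SYSFS/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        # The link target ends in .../bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Running `current_driver 0000:0c:00.0` before and after the unbind should show the change from amdgpu to none.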

I also have a similar script: I unbind the same way after the VM is done, except that I don't do the rmmod. But it doesn't change anything. If I start my computer, unbind the device, start gdm and then decide to reboot, I get a kernel panic, even if I never started the VM.

The bug in journalctl starts to appear the moment I launch gdm (when amdgpu starts its work). It's as if amdgpu doesn't like having one device unbound while the other one is still in use.

I even tried rebinding the device; it seems to stop the error messages, but it doesn't prevent the kernel panic. I didn't do many tests, because what I really want is to prevent the bug while my VM is running.
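Rebinding is the reverse write, into the driver's bind file. A small sketch that first checks the driver directory exists (that is, the module is loaded); the function name and SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch: bind a PCI function to a named driver.
# $1 = PCI address, $2 = driver name as listed under /sys/bus/pci/drivers
# SYSFS is a hypothetical override for testing; it defaults to /sys.
SYSFS="${SYSFS:-/sys}"

bind_to_driver() {
    drv="$SYSFS/bus/pci/drivers/$2"
    if [ ! -d "$drv" ]; then
        echo "driver $2 not loaded" >&2
        return 1
    fi
    echo "$1" > "$drv/bind"
}
```

For this card that would be `bind_to_driver 0000:0c:00.0 amdgpu`, plus the audio function back to its original driver, typically `bind_to_driver 0000:0c:00.1 snd_hda_intel`.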

bunder wrote:
i believe your log got cut off at the top, there should be more up there.


I only posted one complete error message, because what you see between the red lines repeats indefinitely.
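The posted reports carry the [#7] and [#8] counters and the Tainted: G D flag (D means an oops or BUG already happened), so earlier oopses occurred in that boot and the first one is likely the informative report, as bunder suggests. A minimal awk-based sketch to pull just that one out, assuming each report starts at a line containing "BUG: " and ends at the "---[ end trace" marker, as in the log above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the first oops from a kernel log read on stdin.
# Assumes a report starts at a line containing "BUG: " and ends at the
# "---[ end trace" marker.
first_oops() {
    awk '/BUG: / { on = 1 }
         on { print }
         on && /---\[ end trace/ { exit }'
}
```

Usage: `journalctl -k -b --no-pager | first_oops` soon after starting gdm. With the journal capped at 1M, the first report may already have been rotated out if it is run much later.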




So I don't know what to do :) Is this a bug I should open a report for? Does a patch already exist? Did I miss something? lol
All times are GMT