System experiences terrible graphical/input lag after idle
Posted: Sun Dec 01, 2019 10:54 am

Starting one to two weeks ago, my system becomes almost completely unresponsive after being left idle. Idling generally causes xscreensaver to run, but I think this occurs even when it doesn't. It takes several seconds for the screen to update after I perform an action (like switching VTs) or, even worse, try to type anything into a terminal. Typing into a terminal results in hundreds of repeated keys spamming across the terminal, with the screen updating only every second or so, until I kill X.

I've found that killing my WM (fluxbox in this case) and restarting it restores normal functionality, but that is only a workaround. I'd like to figure out what's causing this and why.

Here are some specs:


Linux primarybox 5.4.1-gentoo #1 SMP Sat Nov 30 03:14:29 EST 2019 x86_64 AMD Ryzen Threadripper 2950X 16-Core Processor AuthenticAMD GNU/Linux
MemTotal:       65847124 kB
MemFree:        17709276 kB

[    0.000000] Linux version 5.4.1-gentoo (root@primarybox) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) #1 SMP Sat Nov 30 03:14:29 EST 2019
[    0.000000] Command line: BOOT_IMAGE=/kernel-5.4.1 root=/dev/mapper/root ro root_key=/keyfile.gpg crypt_root=/dev/sda2 root_trim=yes amd_iommu=on iommu=pt vfio-pci.ids=144d:a808,10de:1b81,10de:10f0 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 pcie_aspm=off

x11 driver in use: x11-drivers/nvidia-drivers-440.31-r1

This system is running KVM with VFIO for GPU passthrough to a guest Windows VM. I run several Linux VMs as well and use virt-manager to view them full screen with SPICE.

Some data points:

When the system is in its extreme lag phase I cannot interact with the guest VMs at all, and even typing into a terminal on the host lags (when the guest VM window is present).
However, when I remote into the same guest VMs from my laptop over Wi-Fi, the guests run flawlessly. In other words, it's the host VM display/GPU/interface that is lagging, while all the VMs on the system are running normally. Again, when I kill and rerun fluxbox the issue goes away.

Interesting error in dmesg on host:

followed by

[  200.633473] unchecked MSR access error: RDMSR from 0x48 at rIP: 0xffffffff8746a327 (svm_vcpu_run+0x607/0x810)
[  200.633474] Call Trace:
[  200.633478]  ? kvm_arch_vcpu_ioctl_run+0x875/0x1c40
[  200.633480]  ? __bpf_prog_run32+0x64/0x90
[  200.633481]  ? kvm_vcpu_ioctl+0x25e/0x600
[  200.633483]  ? do_vfs_ioctl+0x431/0x6b0
[  200.633484]  ? syscall_trace_enter+0x13e/0x2d0
[  200.633485]  ? ksys_ioctl+0x59/0x90
[  200.633485]  ? __x64_sys_ioctl+0x11/0x20
[  200.633486]  ? do_syscall_64+0x43/0x110
[  200.633488]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9


[62827.852617] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[62827.852684] caller _nv000906rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

I haven't changed anything other than updating packages every week or so. I'm happy to post more information as requested. Currently I think this might be a bug in NVIDIA's drivers, as there may have been an update to them. However, it could also be a gcc 9 bug, since that is something I updated within the time window. I have since rebuilt every package in @world with gcc 9 in case I missed something, but that hasn't fixed the problem either.
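
Since the only known change is the weekly package updates, it may help to list exactly what was merged inside the suspect window. The following is a rough sketch, not a replacement for tools like qlop: it assumes Portage's log lives at /var/log/emerge.log and that completed merges are logged as "<epoch>:  ::: completed emerge (N of M) <category/package-version> to /". The function name recent_merges and the 14-day default are made up for illustration.

```python
#!/usr/bin/env python3
"""List packages merged by Portage within the last N days (sketch)."""
import os
import re
import sys
import time
from datetime import datetime

# Assumed emerge.log line shape for a finished merge:
#   1575101700:  ::: completed emerge (1 of 2) cat/pkg-1.0 to /
COMPLETED = re.compile(r"^(\d+):\s+::: completed emerge \(\d+ of \d+\) (\S+) to ")


def recent_merges(lines, since):
    """Return [(epoch, package)] for completed merges at or after `since`."""
    merges = []
    for line in lines:
        match = COMPLETED.match(line)
        if match and int(match.group(1)) >= since:
            merges.append((int(match.group(1)), match.group(2)))
    return merges


if __name__ == "__main__":
    days = int(sys.argv[1]) if len(sys.argv) > 1 else 14
    log_path = "/var/log/emerge.log"  # assumed default location
    if os.path.exists(log_path):
        with open(log_path, errors="replace") as log:
            for stamp, package in recent_merges(log, time.time() - days * 86400):
                when = datetime.fromtimestamp(stamp).strftime("%Y-%m-%d %H:%M")
                print(when, package)
    else:
        print(f"{log_path} not found", file=sys.stderr)
```

Running it with a number of days as the only argument narrows or widens the window. Reading the log may require root or membership in the portage group.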
Posted: Sun Dec 01, 2019 5:02 pm

It still seems like you have a memory leak somewhere: the host runs out of memory and tries to swap (and perhaps you don't have any swap, so the machine starts puking instead).

What processes are running after it explodes?

Again, you should add some swap space. I have swap on all my machines.
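
To answer the question about which processes are running, a snapshot along these lines could be taken over SSH while the host is lagging. This is only a sketch: it assumes a Linux /proc filesystem, and the function names parse_kb_fields and top_processes are made up for illustration. It prints the memory and swap totals from /proc/meminfo, then the processes with the largest resident set size (VmRSS).

```python
#!/usr/bin/env python3
"""Snapshot host memory/swap and the largest processes by resident memory."""
import os


def parse_kb_fields(text):
    """Parse 'Key:   12345 kB' lines (as in /proc/meminfo) into {key: kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed '<number> kB' entries; skip everything else.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def top_processes(limit=10, proc="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as status:
                text = status.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = ""
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        rss = parse_kb_fields(text).get("VmRSS")  # kernel threads have none
        if rss is not None:
            rows.append((rss, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    with open("/proc/meminfo") as meminfo:
        mem = parse_kb_fields(meminfo.read())
    for key in ("MemTotal", "MemFree", "MemAvailable", "SwapTotal", "SwapFree"):
        print(f"{key:<13} {mem.get(key, 0):>12} kB")
    for rss, pid, name in top_processes():
        print(f"{rss:>12} kB  pid {pid:<7} {name}")
```

If SwapTotal shows 0 kB and one process (Xorg, fluxbox, or a qemu instance, for example) keeps growing between snapshots, that would support the memory-leak theory. If memory looks healthy during the lag, the cause is probably elsewhere.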
