Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Thu Jun 21, 2018 6:03 am Post subject: nvidia-smi stays at 100% CPU for 2h before able to login |
|
|
I have had this issue for a very long time and have learned to live with it, but I would like to fix it now. I have a Sandy Bridge Dell XPS 15, model L502x, with an Optimus setup. The issue happens across many different kernel versions and nvidia-drivers versions (even different series). When my laptop boots, an nvidia-smi process maxes out one of the four cores (eight with hyperthreading) for almost exactly 2 hours before it dies. It cannot be killed with 'kill -9'. While it is running, I cannot log in to my desktop environment. I'm running SDDM with KDE 5. SDDM is responsive until I enter my password and press Enter to log in to KDE; then the login screen freezes for 2 hours, after which I am logged in and can work.
If I blacklist the nvidia kernel modules, this doesn't happen.
Right now, this is always reproducible. However, there were some kernel/nvidia-drivers combinations where occasionally this wouldn't happen. I have a feeling that the precise timing when the nvidia-drivers are loaded may have something to do with it, but this is only a hunch. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
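A process that ignores 'kill -9' while using a full core is usually busy inside a kernel call (here, presumably the nvidia module), because signals are only acted on once the call returns. The following is a minimal sketch for checking the process state, assuming a Linux /proc filesystem; `proc_state` is a hypothetical helper name, not part of any nvidia tooling.

```shell
#!/bin/sh
# proc_state PID: print the one-letter state of a process, taken from
# the "State:" line of /proc/PID/status.
# R = running, S = sleeping, D = uninterruptible sleep, Z = zombie.
proc_state() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# Report the state of every running nvidia-smi process
# (prints nothing if there is none).
for pid in $(pgrep -x nvidia-smi 2>/dev/null); do
    printf '%s %s\n' "$pid" "$(proc_state "$pid")"
done
```

A state of `R` together with high system (not user) CPU time in top would point at a busy loop in kernel code; `D` would point at a blocked wait. As root, `/proc/PID/stack` may show where the process is stuck, if the kernel was built with stack trace support.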
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Tue Jun 26, 2018 9:00 am Post subject: |
|
|
Which nvidia driver version? |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Tue Jun 26, 2018 9:49 am Post subject: |
|
|
Keruskerfuerst wrote: | Which nvidia driver version? |
Right now I'm using 390.48. I cannot switch to the 396 series, because I have a 540M chip, which is no longer supported on this latest series. But the issue happened on earlier series too, including at least 381, 384, and 387. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Tue Jun 26, 2018 5:06 pm Post subject: |
|
|
Question is, why is nvidia-smi started anyway? It's a user space tool mostly used to monitor things, so it's not needed for operation of the driver. |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Tue Jun 26, 2018 10:42 pm Post subject: |
|
|
hhfeuer wrote: | Question is, why is nvidia-smi started anyway? It's a user space tool mostly used to monitor things, so it's not needed for operation of the driver. |
Good question! No idea though. There is a /etc/init.d/nvidia-smi startup script, but it's not started in any runlevel:
Code: | > rc-update -s -v | grep nvidia
nvidia-persistenced |
nvidia-smi | |
_________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
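Since no runlevel starts the service, two things can narrow down what launches nvidia-smi: the chain of parent processes of the running instance, and the files that mention it. This is a rough sketch assuming a Linux /proc filesystem; `ancestry` and `find_refs` are hypothetical helper names, and the directories in the usage note are only guesses at where to look.

```shell
#!/bin/sh
# ancestry PID: print "PID NAME" for a process and each of its parents,
# one per line, walking up until there is no parent left to read.
# NAME is the first word of the "Name:" field in /proc/PID/status.
ancestry() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2; exit }' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        pid=$(awk '/^PPid:/ { print $2; exit }' "/proc/$pid/status")
    done
}

# find_refs WORD DIR...: list files under the given directories that
# mention WORD. Errors for missing directories are suppressed.
find_refs() {
    word=$1
    shift
    grep -rl -- "$word" "$@" 2>/dev/null
}
```

Possible usage while the process is running: `ancestry "$(pgrep -x nvidia-smi | head -n 1)"`, then `find_refs nvidia-smi /lib/udev /lib64/udev /etc/udev /etc/init.d`. A parent named after the udev daemon would suggest a udev rule rather than an init script.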
|
|
|
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Wed Jun 27, 2018 6:05 am Post subject: |
|
|
Can you use the nouveau driver?
I have an Nvidia card too, and had installed the nvidia driver.
After a while, I uninstalled the nvidia driver and installed the nouveau driver. |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Wed Jun 27, 2018 6:07 am Post subject: |
|
|
Keruskerfuerst wrote: | Can you use the nouveau driver?
I have an Nvidia card too, and had installed the nvidia driver.
After a while, I uninstalled the nvidia driver and installed the nouveau driver. |
I would prefer not to, because I sometimes use CUDA, and I have a working Optimus config for Steam, and I do gaming too. Otherwise I would just use the Intel card and disable the nVidia chip completely. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Wed Jun 27, 2018 7:22 am Post subject: |
|
|
I looked at the ebuild and gentoo uses nvidia-smi in a udev rule to create the /dev nodes (a hacky way to do that imho) which can lead to those problems, so that script was changed in 396.24-r1:
https://bugs.gentoo.org/454740 |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Wed Jun 27, 2018 12:39 pm Post subject: |
|
|
hhfeuer wrote: | I looked at the ebuild and gentoo uses nvidia-smi in a udev rule to create the /dev nodes (a hacky way to do that imho) which can lead to those problems, so that script was changed in 396.24-r1:
https://bugs.gentoo.org/454740 |
Thanks for the pointer, this is useful info!
I just checked: the ebuild for driver version 390.48 (which I have installed) already uses the supposedly fixed script, which is installed as /lib64/udev/nvidia-udev.sh. The difference between the old and new script is this:
Code: | --- /usr/portage/x11-drivers/nvidia-drivers/files/nvidia-udev.sh 2015-08-09 02:38:18.000000000 +0200
+++ /usr/portage/x11-drivers/nvidia-drivers/files/nvidia-udev.sh-r1 2015-09-20 23:29:38.000000000 +0200
@@ -7,7 +7,10 @@
case $1 in
add|ADD)
- /opt/bin/nvidia-smi > /dev/null
+ #hopefully this prevents infinite loops like bug #454740
+ if lsmod | grep -iq nvidia; then
+ /opt/bin/nvidia-smi > /dev/null
+ fi
;;
remove|REMOVE)
rm -f /dev/nvidia* |
However, my issue still persists with the new "fixed" script. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Wed Jun 27, 2018 1:34 pm Post subject: |
|
|
And if you just remove the udev rule, just to see if that is the cause? |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Wed Jun 27, 2018 1:37 pm Post subject: |
|
|
BTW, which setup are you using, Bumblebee or PRIME? |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Wed Jun 27, 2018 7:01 pm Post subject: |
|
|
hhfeuer wrote: | And if you just remove the udev rule, just to see if that is the cause? |
Indeed, commenting out the "add|ADD" part with nvidia-smi in nvidia-udev.sh seems to help. But I reinstall the kernel quite often (especially since there are frequent releases due to the Spectre vulnerabilities), and it would be nice to have a more permanent solution. Also, as a general principle, I don't like to touch non-configuration files installed by the package manager. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
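One way to get a permanent change without editing packaged files: udev gives a file in /etc/udev/rules.d priority over a packaged rules file of the same name, and a symlink to /dev/null with that name disables the packaged file entirely. The sketch below assumes that mechanism; `mask_udev_rule` is a hypothetical helper and `99-example.rules` in the usage note is a placeholder, since the thread does not name the rules file that calls nvidia-udev.sh.

```shell
#!/bin/sh
# mask_udev_rule NAME [ETC_DIR]: disable a packaged udev rules file by
# creating a symlink with the same name, pointing to /dev/null, in
# ETC_DIR (default: /etc/udev/rules.d). The packaged file is untouched,
# so a reinstall of the package does not undo the change.
mask_udev_rule() {
    name=$1
    etc_dir=${2:-/etc/udev/rules.d}
    ln -sf /dev/null "$etc_dir/$name"
}
```

Possible usage as root: `mask_udev_rule 99-example.rules`, followed by `udevadm control --reload`. Note that this masks the whole rules file, including the "remove" branch, so the /dev/nvidia* nodes would then have to be created and cleaned up some other way. Copying the rules file into /etc/udev/rules.d and editing the copy would be the finer-grained alternative.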
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Wed Jun 27, 2018 7:02 pm Post subject: |
|
|
hhfeuer wrote: | BTW, which setup are you using, Bumblebee or PRIME? |
I'm using Bumblebee; PRIME wasn't available yet when I sorted out the Optimus situation for my laptop. Is there a good guide for PRIME? _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Thu Jun 28, 2018 10:46 am Post subject: |
|
|
Lori wrote: | hhfeuer wrote: | BTW, which setup are you using, Bumblebee or PRIME? |
I'm using Bumblebee; PRIME wasn't available yet when I sorted out the Optimus situation for my laptop. Is there a good guide for PRIME? |
Sorry, I just realized that PRIME is a nouveau thing; that's why I didn't switch to it. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Thu Jun 28, 2018 10:52 am Post subject: |
|
|
It's not a nouveau thing. It's a general DRM feature that works with nvidia, nouveau, amd and intel. There are two types of PRIME, though: PRIME output and PRIME offload. The proprietary driver only supports output, meaning no on-demand switching. Bumblebee tries to mimic offload, meaning on-demand switching. nouveau supports both types. |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Thu Jun 28, 2018 11:38 am Post subject: |
|
|
hhfeuer wrote: | It's not a nouveau thing. It's a general DRM feature that works with nvidia, nouveau, amd and intel. There are two types of PRIME, though: PRIME output and PRIME offload. The proprietary driver only supports output, meaning no on-demand switching. |
You mean it is possible to use the HDMI output with the proprietary driver? That would be awesome! I couldn't find any info on how to do that a while back when I searched. _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
hhfeuer Apprentice
Joined: 28 Jul 2005 Posts: 185
|
Posted: Thu Jun 28, 2018 2:49 pm Post subject: |
|
|
Using HDMI should be possible with both Bumblebee and PRIME. |
|
|
|
Adrien.D Apprentice
Joined: 18 Jan 2015 Posts: 157
|
Posted: Sat Jun 30, 2018 9:29 am Post subject: |
|
|
Hi,
Same problem for me.
Since an update, I can't start the MATE desktop or Fluxbox from lightdm.
Downgrading nvidia-drivers: does not help.
Disabling bumblebee at boot: works, but then I can only use the Intel card.
I opened a bug: https://bugs.gentoo.org/659646 _________________ Desktop : MSI Gaming Pro X470 - AMD Ryzen 5 2600X - RX 560 - OpenRC GNOME - gentoo-sources-6.1 LTS
Server : Acer Barebone - Intel i3-8100T - OpenRC CLI - gentoo-sources-5.4 LTS
VMs : A lot of VMS to practice Gentoo of course (proxmox, virtualbox) |
|
|
|
Adrien.D Apprentice
Joined: 18 Jan 2015 Posts: 157
|
Posted: Sun Jul 01, 2018 8:29 am Post subject: |
|
|
I tested the 4.17.3 kernel => it works! _________________ Desktop : MSI Gaming Pro X470 - AMD Ryzen 5 2600X - RX 560 - OpenRC GNOME - gentoo-sources-6.1 LTS
Server : Acer Barebone - Intel i3-8100T - OpenRC CLI - gentoo-sources-5.4 LTS
VMs : A lot of VMS to practice Gentoo of course (proxmox, virtualbox) |
|
|
|
Lori Guru
Joined: 30 Mar 2004 Posts: 338 Location: Barcelona, Spain
|
Posted: Sun Jul 01, 2018 1:07 pm Post subject: |
|
|
Adrien.D wrote: | I tested the 4.17.3 kernel => it works! |
What changed with the kernel update? _________________ "The hunt is sweeter then the kill."
Registered Linux User #176911 |
|
|
|
Adrien.D Apprentice
Joined: 18 Jan 2015 Posts: 157
|
Posted: Sun Jul 01, 2018 3:33 pm Post subject: |
|
|
Lori wrote: | Adrien.D wrote: | I tested the 4.17.3 kernel => it works! |
What changed with the kernel update? |
I can start a desktop environment again, and the nvidia-smi process no longer keeps one CPU busy for some time. _________________ Desktop : MSI Gaming Pro X470 - AMD Ryzen 5 2600X - RX 560 - OpenRC GNOME - gentoo-sources-6.1 LTS
Server : Acer Barebone - Intel i3-8100T - OpenRC CLI - gentoo-sources-5.4 LTS
VMs : A lot of VMS to practice Gentoo of course (proxmox, virtualbox) |
|
|
|
|