Gentoo Forums
nvidia-smi stays at 100% CPU for 2h before able to login

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Thu Jun 21, 2018 6:03 am    Post subject: nvidia-smi stays at 100% CPU for 2h before able to login

I have had this issue for a very long time and have learned to live with it, but I would like to fix it now. I have a Sandy Bridge Dell XPS 15, model L502x, with an Optimus setup. The issue happens across many different kernel versions and nvidia-drivers versions (even different series). When my laptop boots, an nvidia-smi process maxes out one of the four cores (eight with hyperthreading) for almost exactly 2 hours before it dies. It cannot be killed with 'kill -9'. While it is running, I cannot log in to my desktop environment. I'm running SDDM with KDE 5. SDDM is responsive until I enter my password and press Enter to log in to KDE; then the login screen freezes for 2 hours, after which I am logged in and can work.

If I blacklist the nvidia kernel modules, this doesn't happen.

Right now, this is always reproducible. However, there were some kernel/nvidia-drivers combinations where occasionally this wouldn't happen. I have a feeling that the precise timing when the nvidia-drivers are loaded may have something to do with it, but this is only a hunch.
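
For anyone debugging the same hang: a process that ignores 'kill -9' is usually stuck inside the kernel, so its scheduler state and wait channel are worth recording. A minimal sketch, assuming a Linux /proc filesystem; the helper name proc_state is made up for illustration:

```shell
# proc_state PID: print the scheduler state and kernel wait channel of PID.
# Reads only /proc, so it works even when the process ignores signals.
proc_state() {
    pid=$1
    # "State:" line of /proc/PID/status, e.g. "R (running)" or "D (disk sleep)"
    state=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
    # Kernel function the task is waiting in, if the kernel exposes it
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    printf '%s state=%s wchan=%s\n' "$pid" "$state" "${wchan:-unknown}"
}
```

Usage: for pid in $(pgrep -x nvidia-smi); do proc_state "$pid"; done. A state of R with a full core in use would suggest a busy loop in kernel code, while D would point to an uninterruptible wait.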
_________________
"The hunt is sweeter then the kill."
Registered Linux User #176911

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Tue Jun 26, 2018 9:00 am

Which nvidia driver version?

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Tue Jun 26, 2018 9:49 am

Keruskerfuerst wrote:
Which nvidia driver version?


Right now I'm using 390.48. I cannot switch to the 396 series, because I have a 540M chip, which is no longer supported on this latest series. But the issue happened on earlier series too, including at least 381, 384, and 387.

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Tue Jun 26, 2018 5:06 pm

Question is, why is nvidia-smi started anyway? It's a user space tool mostly used to monitor things, so it's not needed for operation of the driver.

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Tue Jun 26, 2018 10:42 pm

hhfeuer wrote:
Question is, why is nvidia-smi started anyway? It's a user space tool mostly used to monitor things, so it's not needed for operation of the driver.


Good question! No idea though. There is a /etc/init.d/nvidia-smi startup script, but it's not started in any runlevel:

Code:
> rc-update -s -v | grep nvidia
  nvidia-persistenced |                                       
           nvidia-smi |

Keruskerfuerst
Advocate
Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Wed Jun 27, 2018 6:05 am

Can you use the nouveau driver?

I have an Nvidia card too, and I had the nvidia driver installed.
After a while, I uninstalled it and installed the nouveau driver instead.

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Wed Jun 27, 2018 6:07 am

Keruskerfuerst wrote:
Can you use the nouveau driver?

I have an Nvidia card too, and I had the nvidia driver installed.
After a while, I uninstalled it and installed the nouveau driver instead.


I would prefer not to, because I sometimes use CUDA, and I have a working Optimus config for Steam, and I do gaming too. Otherwise I would just use the Intel card and disable the nVidia chip completely.

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Wed Jun 27, 2018 7:22 am

I looked at the ebuild: Gentoo uses nvidia-smi in a udev rule to create the /dev nodes (a hacky way to do that, IMHO), which can lead to problems like this, so the script was changed in 396.24-r1:
https://bugs.gentoo.org/454740
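
To confirm this on an installed system without reading the ebuild, one option is to search the udev directories for anything that mentions nvidia-smi. A rough sketch; the helper name find_udev_callers and the default search directories are assumptions, so adjust them to your layout:

```shell
# find_udev_callers PATTERN [DIR...]: list files under the given directories
# whose contents mention PATTERN.
find_udev_callers() {
    pattern=$1
    shift
    # Default locations are a guess; pass explicit directories to override.
    [ "$#" -gt 0 ] || set -- /lib/udev /lib64/udev /etc/udev
    # -r recurses, -l prints only the names of matching files
    grep -rl -- "$pattern" "$@" 2>/dev/null
}
```

Usage: find_udev_callers nvidia-smi. Any script it lists can then be traced back to the rule that runs it, for example with find_udev_callers nvidia-udev.sh.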

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Wed Jun 27, 2018 12:39 pm

hhfeuer wrote:
I looked at the ebuild: Gentoo uses nvidia-smi in a udev rule to create the /dev nodes (a hacky way to do that, IMHO), which can lead to problems like this, so the script was changed in 396.24-r1:
https://bugs.gentoo.org/454740


Thanks for the pointer, this is useful info!

I just checked that the 390.48 version driver ebuild (which I have installed) uses the same supposedly fixed script, which is installed into /lib64/udev/nvidia-udev.sh. The difference between the old and new script is this:

Code:
--- /usr/portage/x11-drivers/nvidia-drivers/files/nvidia-udev.sh   2015-08-09 02:38:18.000000000 +0200
+++ /usr/portage/x11-drivers/nvidia-drivers/files/nvidia-udev.sh-r1   2015-09-20 23:29:38.000000000 +0200
@@ -7,7 +7,10 @@

 case $1 in
    add|ADD)
-      /opt/bin/nvidia-smi > /dev/null
+      #hopefully this prevents infinite loops like bug #454740
+      if lsmod | grep -iq nvidia; then
+         /opt/bin/nvidia-smi > /dev/null
+      fi
       ;;
    remove|REMOVE)
       rm -f /dev/nvidia*


However, my issue still persists with the new "fixed" script.

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Wed Jun 27, 2018 1:34 pm

And if you just remove the udev rule, just to see if that is the cause?

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Wed Jun 27, 2018 1:37 pm

BTW, which setup are you using, Bumblebee or PRIME?

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Wed Jun 27, 2018 7:01 pm

hhfeuer wrote:
And if you just remove the udev rule, just to see if that is the cause?


Indeed, commenting out the "add|ADD" part with nvidia-smi in nvidia-udev.sh seems to help. But I reinstall the kernel quite often (especially since there are frequent releases due to the Spectre vulnerabilities), and it would be nice to have a more permanent solution. Also, as a general principle, I don't like to touch non-configuration files installed by the package manager.
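
One possible way to keep the workaround without editing package-owned files: udev lets a file in /etc/udev/rules.d override a same-named file in /lib/udev/rules.d, and a symlink to /dev/null there disables the packaged rule. A sketch, assuming the rule that calls nvidia-udev.sh is named 99-nvidia.rules (an assumption, check the actual name on your system); mask_udev_rule is a hypothetical helper:

```shell
# mask_udev_rule RULE_NAME [RULES_DIR]: disable a packaged udev rule by
# placing a same-named symlink to /dev/null in the local rules directory.
mask_udev_rule() {
    rule_name=$1
    # /etc/udev/rules.d takes precedence over /lib/udev/rules.d
    rules_dir=${2:-/etc/udev/rules.d}
    ln -sf /dev/null "$rules_dir/$rule_name"
}
```

Usage, as root: mask_udev_rule 99-nvidia.rules, followed by udevadm control --reload. With the rule masked, nvidia-udev.sh no longer runs, so the /dev/nvidia* nodes have to be created some other way.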

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Wed Jun 27, 2018 7:02 pm

hhfeuer wrote:
BTW, which setup are you using, Bumblebee or PRIME?


I'm using Bumblebee; PRIME wasn't available yet when I sorted out the Optimus situation for my laptop. Is there a good guide for PRIME?

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Wed Jun 27, 2018 10:43 pm

Of course it's just a workaround. The proper solution is to report on the mentioned bug that the script doesn't work with Bumblebee. IMHO that script is not needed anyway, since what it does is what nvidia-persistenced is meant for. That daemon comes with an init script and a systemd service, so it can simply be enabled when needed.
Currently, most PRIME guides are somewhat outdated. Some starting points would be
https://wiki.archlinux.org/index.php/NVIDIA_Optimus
https://devtalk.nvidia.com/default/topic/1022670/linux/official-driver-384-59-with-geforce-1050m-doesn-t-work-on-opensuse-tumbleweed-kde/post/5203910/#5203910
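
On OpenRC, enabling the nvidia-persistenced daemon mentioned above might look like the sketch below. The service name is taken from the rc-update output earlier in the thread; the choice of the default runlevel and the DRY_RUN wrapper are assumptions for illustration:

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Add the service to the default runlevel and start it right away.
enable_persistenced() {
    run rc-update add nvidia-persistenced default
    run rc-service nvidia-persistenced start
}
```

Call enable_persistenced as root; set DRY_RUN=1 first to only print the commands.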

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Thu Jun 28, 2018 10:46 am

Lori wrote:
hhfeuer wrote:
BTW, which setup are you using, Bumblebee or PRIME?


I'm using Bumblebee; PRIME wasn't available yet when I sorted out the Optimus situation for my laptop. Is there a good guide for PRIME?


Sorry, I just realized that PRIME is a nouveau thing; that's why I didn't switch to it.

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Thu Jun 28, 2018 10:52 am

It's not a nouveau thing. It's a general DRM thing that works with nvidia, nouveau, AMD and Intel. There are two types of PRIME, though: PRIME output and PRIME offload. The proprietary driver only supports output, meaning no on-demand switching. Bumblebee tries to mimic offload, meaning on-demand switching. nouveau supports both types.

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Thu Jun 28, 2018 11:38 am

hhfeuer wrote:
It's not a nouveau thing. It's a general DRM thing that works with nvidia, nouveau, AMD and Intel. There are two types of PRIME, though: PRIME output and PRIME offload. The proprietary driver only supports output, meaning no on-demand switching.


You mean it is possible to use the HDMI output with the proprietary driver? That would be awesome! I couldn't find any info on how to do that a while back when I searched.

hhfeuer
Apprentice
Joined: 28 Jul 2005
Posts: 185

Posted: Thu Jun 28, 2018 2:49 pm

Using HDMI should be possible with both Bumblebee and PRIME.

Adrien.D
Apprentice
Joined: 18 Jan 2015
Posts: 157

Posted: Sat Jun 30, 2018 9:29 am

Hi,
Same problem for me.
Since an update, I can't start the MATE desktop or Fluxbox from lightdm.
Downgrading nvidia-drivers did not help.
Disabling bumblebee at boot works, but then I can only use the Intel card.

I opened a bug: https://bugs.gentoo.org/659646
_________________
Desktop : MSI Gaming Pro X470 - AMD Ryzen 5 2600X - RX 560 - OpenRC GNOME - gentoo-sources-6.1 LTS
Server : Acer Barebone - Intel i3-8100T - OpenRC CLI - gentoo-sources-5.4 LTS
VMs : A lot of VMS to practice Gentoo of course :) (proxmox, virtualbox)

Adrien.D
Apprentice
Joined: 18 Jan 2015
Posts: 157

Posted: Sun Jul 01, 2018 8:29 am

I tested the 4.17.3 kernel => it works!

Lori
Guru
Joined: 30 Mar 2004
Posts: 338
Location: Barcelona, Spain

Posted: Sun Jul 01, 2018 1:07 pm

Adrien.D wrote:
I tested the 4.17.3 kernel => it works!


What changed with the kernel update?

Adrien.D
Apprentice
Joined: 18 Jan 2015
Posts: 157

Posted: Sun Jul 01, 2018 3:33 pm

Lori wrote:
Adrien.D wrote:
I tested the 4.17.3 kernel => it works!


What changed with the kernel update?


I can start a desktop environment again, and the nvidia-smi process no longer keeps one CPU busy for a long time.