Unusual nvidia-smi CPU usage that seems to prevent X from starting.

sbbg · n00b Joined: 02 Feb 2013 Posts: 23

Hi, Gentoo users & devs.

I am currently installing Gentoo on a Mi Notebook Air
(http://www.notebookcheck.net/Xiaomi-Mi-Air-13-3-inch-Notebook-Review.180561.0.html),
which happens to be an nVidia Optimus machine.
I followed the https://wiki.gentoo.org/wiki/NVIDIA/Optimus guide carefully.

After I finished most of the setup and rebooted,
I found that, most of the time, nvidia-smi uses 100% of a single CPU core for no apparent reason.
In this state I can't run startx or xdm normally. The screen hangs black, with only a non-blinking cursor in the top left corner.
No Xorg.0.log is left behind, which is very disturbing and frustrating.

Do you have any idea why this happens randomly?
Is there anything I should check again?

Thank you.

Roman_Gruber · Posted: Tue Apr 04, 2017 5:09 pm Post subject:

are you using the ~ branch? brand new hardware benefits the most from software updates from the ~ branch

dated hardware, 3 years or older can run on stable branch just fine usually

do you see the command line / shell? => init 3?

how do you start the x-server?

at least i would update the following components to the latest available in portage (~ marked) => gentoo-sources, mesa, intel gpu drivers, whole x server components, whole init components (e.g. openrc + eudev)

sbbg · n00b Joined: 02 Feb 2013 Posts: 23

Hi,

I can finally come up with more details and clues.

I just found out that when I choose "Recovery mode" in GRUB2, then enter the password and remount the root FS,
nvidia-smi no longer wastes CPU, and I can start X normally with both startx and XDM.

So I can't help thinking that something is wrong with a service that is started in a normal boot but not in recovery.
Here is the service status for both cases.

Normal boot:

Roman_Gruber · Posted: Wed Apr 05, 2017 8:55 am Post subject:

what's the content of grub.cfg

regarding both boot entries please? the working one and the bugged one

e.g. something like (starts with menuentry and ends with })

sbbg · n00b Joined: 02 Feb 2013 Posts: 23

Roman_Gruber · Posted: Wed Apr 05, 2017 1:26 pm Post subject:

You boot into single-user mode with your recovery entry.

sbbg · n00b Joined: 02 Feb 2013 Posts: 23

Hi,

Yes, I'm using the gentoo-sources 4.9.20 kernel.
Here is the uname output:

Roman_Gruber · Posted: Thu Apr 06, 2017 7:46 pm Post subject:

google: nvidia-smi hangs

https://devtalk.nvidia.com/default/topic/929282/nvidia-smi-hangs-cannot-be-killed-even-by-sigkill/?offset=1

http://unix.stackexchange.com/questions/255658/nvidia-smi-hangs-indefinitely-what-could-be-the-issue

http://www.linuxquestions.org/questions/slackware-14/nvidia-drivers-some-hard-freezes-on-current-4175452311/

alikasundara · n00b Joined: 10 Nov 2011 Posts: 5

Hi sbbg,

Have you had any luck with resolving that? I am facing the same issue, the kernel version is 4.9.6.

Thanks.

Borodux · n00b Joined: 21 May 2017 Posts: 1

Hi,

This problem arose for me after updating nvidia-drivers to 381.22. After several boot attempts, the X server may start successfully. Most of the time, though, nvidia-smi eats 100% CPU and the process is completely unkillable.

However, downgrading to 378.13 didn't solve the problem. I didn't note which version I had before, but all the distfiles are new now, so the old version might have changed too.

I use:
- 4.9.16-gentoo i686
- framebuffer (vesafb) for splash theme and console decorations
- 01:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 650 Ti] (rev a1)

Disabling console decorations and the framebuffer solved the problem. I also didn't find anything documenting this new behaviour.

Yamakuzure · Posted: Mon May 22, 2017 8:00 am Post subject:

I have the same issue if the 'nvidia' kernel module is loaded instead of 'nvidia-uvm'. (I am using BumbleBee, though)
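A quick way to see which of the two modules is currently loaded, as a minimal sketch. The helper name loaded_nvidia_modules is made up for illustration, and it only assumes the usual /proc/modules layout with the module name in the first column. Note that the kernel reports 'nvidia-uvm' with an underscore, as nvidia_uvm.

```shell
# Hypothetical helper: print the names of loaded kernel modules whose
# name starts with "nvidia". Reads /proc/modules-style lines on stdin,
# where the first whitespace-separated field is the module name.
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }'
}

# Usage on a live system (guarded so the snippet is harmless elsewhere).
if [ -r /proc/modules ]; then
    loaded_nvidia_modules < /proc/modules
fi
```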

When I run something through primusrun/optirun, it looks like this, and everything works just fine:

Zephyrus · Apprentice Joined: 01 Sep 2004 Posts: 204

I have a slightly different setup: I am using bumblebee on a W530, with the Intel integrated graphics driving the main laptop screen and bumblebee-Nvidia driving an external display.
However, I recently started to have the same issue, with nvidia-smi hanging at boot when run by udev. It did not happen at every boot, but seemingly at random, which made me suspect some kind of race condition at boot.
Indeed, I discovered that adding a small sleep (sleep 1.5 below) to /lib/udev/nvidia-udev.sh, the script that runs nvidia-smi at boot, works around the hang.
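The idea can be sketched roughly as follows. This is not the actual content of /lib/udev/nvidia-udev.sh, only an illustration: the function name run_after_delay and the NVIDIA_SMI_DELAY variable are hypothetical, and a fractional delay such as 1.5 assumes a sleep that accepts it (GNU coreutils does).

```shell
# Hypothetical sketch, not the real nvidia-udev.sh: wait a moment before
# running a command, in case the hang is caused by a boot-time race.
NVIDIA_SMI_DELAY="${NVIDIA_SMI_DELAY:-1.5}"   # seconds; assumed tunable

run_after_delay() {
    # $1 = delay in seconds, remaining arguments = command to run
    delay="$1"
    shift
    sleep "$delay"
    "$@"
}

# In the udev script, a bare nvidia-smi call would become something like:
#   run_after_delay "$NVIDIA_SMI_DELAY" nvidia-smi
```

The wrapper returns the exit status of the wrapped command, so any existing error handling around the call keeps working.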

Jimini · l33t Joined: 31 Oct 2006 Posts: 601 Location: Germany

I can confirm this behavior for gentoo-sources-4.9.95 and nvidia-drivers-390.42. After running startx, the black screen only shows a non-blinking prompt, and nvidia-smi generates 100% CPU load. top / ps also show status "D", which means "uninterruptible sleep", IIRC.
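For anyone who wants to check the same thing, here is a minimal sketch of how the state can be read from ps. The helper names describe_state and report_nvidia_smi are made up for illustration, and the ps invocation in the comment assumes procps.

```shell
# Hypothetical helper: translate the first letter of a ps STAT value.
describe_state() {
    case "$1" in
        D*) echo "uninterruptible sleep" ;;
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        T*) echo "stopped" ;;
        Z*) echo "zombie" ;;
        *)  echo "other" ;;
    esac
}

# Reads "pid stat pcpu comm" lines on stdin and reports nvidia-smi entries.
report_nvidia_smi() {
    while read -r pid stat pcpu comm; do
        [ "$comm" = "nvidia-smi" ] || continue
        echo "$pid $stat ($(describe_state "$stat")) cpu=$pcpu%"
    done
}

# Usage on a live system with procps ps:
#   ps -eo pid=,stat=,pcpu=,comm= | report_nvidia_smi
```

A process in state "D" cannot be interrupted by signals, which would fit the reports that nvidia-smi cannot be killed.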

Zephyrus' solution works in my case, so I could finally start the gui :-)

Best regards,
Jimini

Satarsa · Posted: Thu Jan 24, 2019 3:24 pm Post subject:

I hit the same issue, but Zephyrus's solution does not work for me.
The only thing that works is to prevent the nvidia kernel module from autoloading.
In my case I added to /etc/modprobe.d/blacklist.conf