sbbg n00b
Joined: 02 Feb 2013 Posts: 23
|
Posted: Tue Apr 04, 2017 4:19 am Post subject: Unusual nvidia-smi CPU usage and seems hindering X start. |
|
|
Hi, Gentoo users & devs.
I am currently installing Gentoo on a Mi Notebook Air,
http://www.notebookcheck.net/Xiaomi-Mi-Air-13-3-inch-Notebook-Review.180561.0.html,
which happens to be an nVidia Optimus machine.
I followed the https://wiki.gentoo.org/wiki/NVIDIA/Optimus guide carefully.
After finishing most of the setup and rebooting,
I found that most of the time nvidia-smi uses 100% of a single CPU core for no apparent reason.
In this state I can't run startx or xdm normally. It hangs on a black screen with only a non-blinking cursor in the top-left corner.
No Xorg.0.log is left behind, which is very disturbing and frustrating.
Do you have any idea why this happens randomly?
Is there anything I should check again?
Thank you. |
|
Roman_Gruber Advocate
Joined: 03 Oct 2006 Posts: 3846 Location: Austro Bavaria
|
Posted: Tue Apr 04, 2017 5:09 pm Post subject: |
|
|
Are you using the ~ branch? Brand-new hardware benefits the most from software updates in the ~ branch.
Dated hardware, 3 years or older, can usually run on the stable branch just fine.
Do you see the command line / shell? => init 3?
How do you start the X server?
At the least, I would update the following components to the latest available in Portage (~ marked) => gentoo-sources, mesa, Intel GPU drivers, the whole X server stack, the whole init stack (e.g. openrc + eudev) |
|
sbbg n00b
Joined: 02 Feb 2013 Posts: 23
|
Posted: Wed Apr 05, 2017 2:27 am Post subject: |
|
|
Roman_Gruber wrote: | Are you using the ~ branch? Brand-new hardware benefits the most from software updates in the ~ branch.
Dated hardware, 3 years or older, can usually run on the stable branch just fine.
Do you see the command line / shell? => init 3?
How do you start the X server?
At the least, I would update the following components to the latest available in Portage (~ marked) => gentoo-sources, mesa, Intel GPU drivers, the whole X server stack, the whole init stack (e.g. openrc + eudev) |
Sir,
first of all, thank you for the reply.
I'm actually using the ~amd64 branch right now.
Sorry, I'm not sure what you mean by init 3.
But this laptop does start all services in the boot, sysinit, nonetwork, and default runlevels without any problem.
So I also see the normal result of the [local] service and then the shell prompt.
I tried two ways to launch X:
(1) Via XDM, which is configured to start SDDM; the xrandr commands required by the Optimus guide are in /usr/share/sddm/scripts/Xsetup.
(2) By running "startx" as root. The xrandr commands are in .xinitrc in root's home directory.
The packages you mentioned are at the following versions:
* I only mask the 4.10 kernel for compatibility with nvidia-drivers;
Kernel: gentoo-sources 4.9.20 (it looks like the current nvidia-drivers does not get along well with kernel 4.10)
Mesa: mesa-17.0.3 USE="classic dri3 egl gallium gbm nptl vdpau wayland"
intel gpu: xf86-video-intel is NOT pulled in by the xorg-drivers meta package for some reason. I'm not sure whether this behavior is correct, although I specified VIDEO_CARDS="intel i965 nvidia" in make.conf
nvidia-drivers: 378.13
OpenRC: openrc-0.24.2
If you have any further suggestions, please let me know.
I have scratched my head so hard that I am running out of hair. Thank you. |
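For readers who have not seen the guide: in the usual NVIDIA Optimus (RandR 1.4 output offload) setup, the xrandr commands referred to above are the two calls sketched below. This is only an illustration, not the poster's actual files. The provider names `modesetting` and `NVIDIA-0` are assumptions that depend on the X configuration (`xrandr --listproviders` shows the real ones), and the `optimus_xsetup` function and the `XRANDR` override are made up here so the snippet can be dry-run without an X server.

```shell
#!/bin/sh
# Sketch of the xrandr calls an Optimus Xsetup or .xinitrc usually contains.
# XRANDR can be overridden (e.g. XRANDR=echo) to print the calls instead of
# executing them; this override is a hypothetical convenience.
XRANDR=${XRANDR:-xrandr}

optimus_xsetup() {
    # Use the NVIDIA GPU as the render source for the Intel (modesetting) outputs.
    "$XRANDR" --setprovideroutputsource modesetting NVIDIA-0 || return 1
    # Enable all connected outputs with their preferred modes.
    "$XRANDR" --auto
}
```

In Xsetup or .xinitrc the function would simply be called once, before the window manager or desktop session is started.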
|
sbbg n00b
Joined: 02 Feb 2013 Posts: 23
|
Posted: Wed Apr 05, 2017 6:16 am Post subject: Further details concerning init |
|
|
Hi,
I can finally provide more details and clues.
I just found out that when I choose "Recovery mode" in GRUB2, enter the password and remount the root FS,
nvidia-smi no longer wastes CPU, and I can start X normally with both startx and XDM.
So I can't help thinking that something is wrong with a service that runs in a normal boot but not in recovery mode.
Here is the service status for both cases.
Normal boot:
Code: |
Runlevel: shutdown
savecache [ stopped ]
killprocs [ stopped ]
mount-ro [ stopped ]
Runlevel: sysinit
sysfs [ started ]
devfs [ started ]
udev [ started ]
dmesg [ started ]
kmod-static-nodes [ started ]
udev-trigger [ started ]
Runlevel: default
dbus [ started ]
bluetooth [ started ]
wicd [ started ]
local [ started ]
Runlevel: nonetwork
local [ started ]
Runlevel: boot
modules [ started ]
hwclock [ started ]
hostname [ started ]
fsck [ started ]
root [ started ]
mtab [ started ]
localmount [ started ]
urandom [ started ]
sysctl [ started ]
bootmisc [ started ]
termencoding [ started ]
keymaps [ started ]
loopback [ started ]
procfs [ started ]
binfmt [ started ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
modules-load [ started ]
Dynamic Runlevel: manual
|
Recovery mode:
Code: |
Runlevel: shutdown
savecache [ stopped ]
killprocs [ stopped ]
mount-ro [ stopped ]
Runlevel: sysinit
sysfs [ started ]
devfs [ started ]
udev [ started ]
dmesg [ started ]
kmod-static-nodes [ started ]
udev-trigger [ started ]
Runlevel: default
dbus [ stopped ]
bluetooth [ stopped ]
wicd [ stopped ]
local [ stopped ]
Runlevel: nonetwork
local [ stopped ]
Runlevel: boot
modules [ stopped ]
hwclock [ stopped ]
hostname [ stopped ]
fsck [ stopped ]
root [ stopped ]
mtab [ stopped ]
localmount [ stopped ]
urandom [ stopped ]
sysctl [ stopped ]
bootmisc [ stopped ]
termencoding [ stopped ]
keymaps [ stopped ]
loopback [ stopped ]
procfs [ stopped ]
binfmt [ stopped ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
Dynamic Runlevel: manual
|
Can anyone help me investigate further where the problem could be?
Thank you. |
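One way to narrow this down is to save both listings to files and print only the services whose state differs. The following is a minimal sketch, assuming the two outputs above were saved with something like `rc-status --all > normal.txt` in each mode; the function name `rc_status_diff` and the file names are made up for illustration.

```shell
#!/bin/sh
# Compare two saved rc-status listings and print every service whose
# state differs, in the form "runlevel/service old -> new".
rc_status_diff() {
    awk '
        # Header lines such as "Runlevel: boot" or "Dynamic Runlevel: manual".
        /Runlevel:/ { runlevel = $NF; next }
        # Service lines such as "dbus [ started ]".
        $2 == "[" {
            key = runlevel "/" $1
            if (NR == FNR)                       # first file: remember the state
                state[key] = $3
            else if ((key in state) && state[key] != $3)
                print key, state[key], "->", $3
        }
    ' "$1" "$2"
}

# Example: rc_status_diff normal.txt recovery.txt
```

Services that appear in only one of the two listings are not reported by this sketch. The runlevel is part of the key because a service such as local can be listed in more than one runlevel.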
|
Roman_Gruber Advocate
Joined: 03 Oct 2006 Posts: 3846 Location: Austro Bavaria
|
Posted: Wed Apr 05, 2017 8:55 am Post subject: |
|
|
What are the contents of grub.cfg
for both boot entries, please? The working one and the buggy one.
E.g. something like this (starts with menuentry and ends with }):
Code: |
menuentry 'System BIOS setup' --users "" {
echo 'Entering system BIOS setup ...'
fwsetup
}
|
|
|
sbbg n00b
Joined: 02 Feb 2013 Posts: 23
|
Posted: Wed Apr 05, 2017 11:06 am Post subject: |
|
|
Roman_Gruber wrote: | What are the contents of grub.cfg
for both boot entries, please? The working one and the buggy one.
E.g. something like this (starts with menuentry and ends with }):
Code: |
menuentry 'System BIOS setup' --users "" {
echo 'Entering system BIOS setup ...'
fwsetup
}
|
|
Hi, thank you for reaching out again.
Here are the grub.cfg menu entries you asked for:
Normal boot:
Code: |
menuentry 'Gentoo GNU/Linux' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-7f39c493-22cd-4e55-8b40-71527175e0f9' {
    load_video
    insmod gzio
    insmod part_gpt
    insmod ext2
    if [ x$feature_platform_search_hint = xy ]; then
        search --no-floppy --fs-uuid --set=root 7f39c493-22cd-4e55-8b40-71527175e0f9
    else
        search --no-floppy --fs-uuid --set=root 7f39c493-22cd-4e55-8b40-71527175e0f9
    fi
    echo 'Loading Linux 4.9 ...'
    linux /boot/kernel-4.9 root=/dev/nvme0n1p5 ro
}
|
Recovery mode:
Code: |
menuentry 'Gentoo GNU/Linux, with Linux 4.9 (recovery mode)' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.9-recovery-7f39c493-22cd-4e55-8b40-71527175e0f9' {
    load_video
    insmod gzio
    insmod part_gpt
    insmod ext2
    if [ x$feature_platform_search_hint = xy ]; then
        search --no-floppy --fs-uuid --set=root 7f39c493-22cd-4e55-8b40-71527175e0f9
    else
        search --no-floppy --fs-uuid --set=root 7f39c493-22cd-4e55-8b40-71527175e0f9
    fi
    echo 'Loading Linux 4.9 ...'
    linux /boot/kernel-4.9 root=/dev/nvme0n1p5 ro single
}
|
I didn't actually change anything generated by grub2-mkconfig.
BTW, a really bad, partial workaround I tried is deleting nvidia-smi.
But of course I would like to know how this can actually happen.
Thank you. |
|
Roman_Gruber Advocate
Joined: 03 Oct 2006 Posts: 3846 Location: Austro Bavaria
|
Posted: Wed Apr 05, 2017 1:26 pm Post subject: |
|
|
Your recovery entry boots into single-user mode.
Is this 4.9.0?
uname -a
The latest available is 4.9.20; the latest stable on amd64 is 4.9.16.
A few years ago the devs just removed buggy kernel versions without notice. Several times an upgrade suddenly fixed things.
Quote: | OpenRC: openrc-0.24.2 |
Do you use openrc with eudev?
Code: |
ASUS-G75VW roman # qlist -Iv openrc eudev
app-admin/openrc-settingsd-1.0.1
sys-apps/openrc-0.24.2
sys-fs/eudev-3.2.1-r1
ASUS-G75VW roman #
|
I still wonder why your box hangs |
|
sbbg n00b
Joined: 02 Feb 2013 Posts: 23
|
Posted: Thu Apr 06, 2017 7:02 am Post subject: |
|
|
Hi,
Yes, I'm using 4.9.20 gentoo-sources kernel,
here is the uname output:
Code: |
Linux Pexus_Mobile 4.9.20-gentoo #3 SMP Tue Apr 4 06:19:24 CST 2017 x86_64 Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz GenuineIntel GNU/Linux
|
And yes, it uses eudev + openrc.
The result of the qlist command is as follows:
Code: |
sys-apps/openrc-0.24.2
sys-fs/eudev-3.2.1-r1
|
Please feel free to ask for any more details, even an account/password to log in if that would help.
Thank you again. |
|
Roman_Gruber Advocate
Joined: 03 Oct 2006 Posts: 3846 Location: Austro Bavaria
|
|
alikasundara n00b
Joined: 10 Nov 2011 Posts: 5
|
Posted: Wed Apr 12, 2017 7:56 pm Post subject: |
|
|
Hi sbbg,
Have you had any luck with resolving that? I am facing the same issue, the kernel version is 4.9.6.
Thanks. |
|
Borodux n00b
Joined: 21 May 2017 Posts: 1
|
Posted: Sun May 21, 2017 6:40 pm Post subject: |
|
|
Hi,
This problem arose for me after updating nvidia-drivers to 381.22. After several boot attempts it may start the X server successfully. Most of the time nvidia-smi eats 100% CPU and the process is completely unkillable.
However, downgrading to 378.13 didn't solve the problem. I didn't note which version I had before, but all distfiles are new now, so the old version might have changed too.
I use:
- 4.9.16-gentoo i686
- framebuffer (vesafb) for splash theme and console decorations
- 01:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 650 Ti] (rev a1)
Disabling console decorations and the framebuffer solved the problem. I also didn't find anything documented about this new behaviour. |
|
Yamakuzure Advocate
Joined: 21 Jun 2006 Posts: 2284 Location: Adendorf, Germany
|
Posted: Mon May 22, 2017 8:00 am Post subject: |
|
|
I have the same issue if the 'nvidia' kernel module is loaded instead of 'nvidia-uvm'. (I am using BumbleBee, though)
When I run something through primusrun/optirun, it looks like this, and everything works just fine:
Code: |
~ # lsmod | grep nvidia
nvidia_modeset 775279 1
nvidia_uvm 560959 0
nvidia 11458879 40 nvidia_modeset,nvidia_uvm
~ # nvidia-smi
Mon May 22 10:03:28 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2100M       Off  | 0000:01:00.0     Off |                  N/A |
| N/A   32C    P8    N/A /  N/A |      9MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     13563    G   Xorg                                             7MiB |
+-----------------------------------------------------------------------------+
|
This problem arose when I updated to kernel 4.9, so I do not think that it is some hard bug in nvidia-drivers. It seems that 'nvidia_uvm' is a must-have now.
_________________
Important German:
- "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
- "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.
|
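A quick way to check for the situation described in this post is to look for `nvidia_uvm` in the module list before starting X. This is a sketch only: `module_loaded` is a hypothetical helper, and whether loading `nvidia_uvm` actually avoids the hang on other machines is an assumption based on this one report.

```shell
#!/bin/sh
# Return 0 if module $1 is listed in a /proc/modules-style file
# ($2, defaulting to /proc/modules, which is what lsmod reads).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use (as root), left commented out because it changes system state:
# module_loaded nvidia_uvm || modprobe nvidia-uvm
```

Matching on the first field avoids the false positive that a plain `grep nvidia` would give, since the `nvidia` line also mentions `nvidia_uvm` in its "used by" column.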
|
Zephyrus Apprentice
Joined: 01 Sep 2004 Posts: 204
|
Posted: Sat Oct 14, 2017 9:45 am Post subject: |
|
|
I have a slightly different setup: I am using bumblebee on a W530, with the Intel integrated graphics driving the main laptop screen and bumblebee-Nvidia driving an external display.
However, I recently started to have the same issue, with nvidia-smi hanging at boot when run by udev. It did not happen at every boot, only seemingly at random, which made me suspect some kind of race condition at boot.
Indeed, I discovered that by adding a small sleep (sleep 1.5 below) to /lib/udev/nvidia-udev.sh, which was the script running nvidia-smi at boot,
Code: |
#hopefully this prevents infinite loops like bug #454740
if lsmod | grep -iq nvidia; then
    sleep 1.5
    /opt/bin/nvidia-smi > /dev/null
fi
|
the issue, at least on my system, disappears! |
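A fixed delay only makes a race less likely. One alternative, sketched here under the untested assumption that the hang occurs while the module is still initialising, is to poll the kernel's `/sys/module/<name>/initstate` file until it reads `live` before calling nvidia-smi. The helper name `wait_module_live`, the retry count and the optional third argument (an alternative sysfs root, useful for dry runs) are made up for illustration.

```shell
#!/bin/sh
# Poll until module $1 reports "live" in its initstate file.
# $2: number of 0.1 s polls (default 30). $3: sysfs module root
# (default /sys/module; hypothetical override for dry runs).
# Returns 0 once the module is live, 1 if the polls run out.
wait_module_live() {
    state_file="${3:-/sys/module}/$1/initstate"
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if [ "$(cat "$state_file" 2>/dev/null)" = "live" ]; then
            return 0
        fi
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}

# Possible use inside the udev helper, replacing the fixed sleep:
# if wait_module_live nvidia 30; then /opt/bin/nvidia-smi > /dev/null; fi
```

Like the `sleep 1.5` above, this relies on a sleep implementation that accepts fractional seconds.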
|
Jimini l33t
Joined: 31 Oct 2006 Posts: 601 Location: Germany
|
Posted: Sat Apr 28, 2018 6:40 pm Post subject: |
|
|
I can confirm this behavior for gentoo-sources-4.9.95 and nvidia-drivers-390.42. After running startx, the black screen shows only a non-blinking prompt, and nvidia-smi generates 100% CPU load. top / ps also show status "D", which means "uninterruptible sleep", IIRC.
Zephyrus' solution works in my case, so I could finally start the gui :-)
Best regards,
Jimini _________________ "The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents." (H.P. Lovecraft: The Call of Cthulhu) |
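State "D" in ps and top is indeed uninterruptible sleep, and a process in that state does not act on signals until it wakes up, which fits the "unkillable" reports earlier in the thread. To list such processes, the output of `ps -eo pid,stat,comm` can be filtered as in this small sketch (`dstate_filter` is a made-up helper name).

```shell
#!/bin/sh
# Read "PID STAT COMMAND" lines (with a header row) on stdin and print
# the PID and command of every process in uninterruptible sleep.
dstate_filter() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example: ps -eo pid,stat,comm | dstate_filter
```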
|
Satarsa n00b
Joined: 21 Sep 2005 Posts: 70 Location: Russia, St.-Petersburg
|
Posted: Thu Jan 24, 2019 3:24 pm Post subject: |
|
|
I hit the same issue, but Zephyrus's solution does not work for me.
The only thing that works is to prevent the nvidia kernel module from autoloading.
In my case I added to /etc/modprobe.d/blacklist.conf
|
|