Gentoo Forums
NVIDIA Bumblebee setup mismatch
TangoDrango
n00b


Joined: 04 Aug 2022
Posts: 4
Location: USA

Posted: Wed Sep 07, 2022 3:42 am    Post subject: NVIDIA Bumblebee setup mismatch

I currently have nvidia-drivers-515.65.01 installed and have set it up as outlined on the wiki page. However, when I try to use it I get the following output:

Quote:
$ optirun glxspheres64
[ 2366.129856] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the

[ 2366.129900] [ERROR]Aborting because fallback start is disabled.


Additionally,

Code:
nvidia-smi


Gives me

Code:
Failed to initialize NVML: Driver/library version mismatch
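
This NVML message usually means that the user-space libraries and the loaded kernel module come from different driver releases. A minimal sketch for narrowing that down, assuming the module is named nvidia and that modinfo is available: it compares the version of the module that is currently loaded with the version of the module file that modprobe would pick for the running kernel. The helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the NVIDIA module that is loaded with the one on disk.

# Version of the currently loaded module (empty if nvidia is not loaded).
loaded_version() {
    cat /sys/module/nvidia/version 2>/dev/null
}

# Version of the module file modprobe would load for the running kernel.
ondisk_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Print "match", "mismatch", or "unknown" for two version strings.
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

loaded=$(loaded_version)
ondisk=$(ondisk_version)
echo "loaded module : ${loaded:-not loaded}"
echo "on-disk module: ${ondisk:-not found}"
echo "result        : $(compare_versions "$loaded" "$ondisk")"
```

If the two differ, the loaded module probably came from somewhere other than the running kernel's module directory (an initramfs, for example). If both report the same old version, the new modules were most likely built for a different kernel.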


Additionally, dmesg outputs the following:
Code:
[   12.983059] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  510.73.05  Sat May  7 05:30:26 UTC 2022
...
[   27.849402] NVRM: API mismatch: the client has the version 515.65.01, but
               NVRM: this kernel module has the version 510.73.05.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.


which confuses me, because as far as I can tell the following outputs confirm that the loaded module really is 510.73.05, even though 515.65.01 is the only version I have installed:

Code:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.73.05  Sat May  7 05:30:26 UTC 2022
GCC version:  gcc version 11.3.0 (Gentoo 11.3.0 p4)
$ cat /sys/module/nvidia/version
510.73.05


I've deselected, removed, and re-emerged nvidia-drivers, and regenerated my dracut initramfs with the appropriate settings in nvidia.conf (shown below), so I don't believe there is an outdated kernel module hiding inside my initramfs, if that could be the issue.

Code:
$ cat /etc/dracut.conf.d/nvidia.conf
# Omit the nvidia driver from the ramdisk, to avoid needing to regenerate
# the ramdisk on upates
omit_drivers+=" nvidia nvidia-drm nvidia-modeset nvidia-uvm "
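
Whether the omit_drivers setting actually took effect can be checked by listing the contents of the image. A rough sketch, assuming dracut's lsinitrd tool is installed and that the image lives at /boot/initramfs-<kernel release>.img (the path is an assumption; pass the real one as the first argument if it differs). The filter_nvidia_modules helper is made up for illustration.

```shell
#!/bin/sh
# Sketch: list NVIDIA kernel modules packed into a dracut initramfs.

# Reduce a file listing on stdin to NVIDIA module files
# (nvidia.ko, nvidia-drm.ko.xz, nvidia-uvm.ko, ...).
filter_nvidia_modules() {
    grep -E '(^|/)nvidia[^/]*\.ko(\.[a-z0-9]+)?$'
}

# Assumed image path; adjust to whatever the bootloader actually loads.
image="${1:-/boot/initramfs-$(uname -r).img}"

if ! command -v lsinitrd >/dev/null 2>&1; then
    echo "lsinitrd not found (it ships with dracut)"
elif [ ! -r "$image" ]; then
    echo "cannot read $image (wrong path, or try as root)"
else
    found=$(lsinitrd "$image" | filter_nvidia_modules)
    if [ -n "$found" ]; then
        echo "NVIDIA modules inside $image:"
        echo "$found"
    else
        echo "no NVIDIA modules inside $image"
    fi
fi
```

Any hit here would mean an NVIDIA module is still being loaded from the initramfs, regardless of what is installed under /lib/modules.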


Additionally, and I believe this is a symptom rather than the cause, my bbswitch module is not loading. I have bumblebee installed, the config is default apart from Bridge=virtualgl being set, and I have added it to the default runlevel with rc-update. Whether I try to run glxgears/glxspheres64 through optirun or primusrun, I receive the following errors:

Code:
$ primusrun glxgears
primus: fatal: Bumblebee daemon reported: error: [XORG] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
$ optirun glxgears
[ 2758.585341] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the

[ 2758.585406] [ERROR]Aborting because fallback start is disabled.


And I'm back to square one with the kernel driver/api mismatch debacle.

Code:
optirun -vvv glxgears
[ 3284.654870] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[ 3284.655038] [INFO]Configured driver: nvidia
[ 3284.655439] [DEBUG]optirun version  starting...
[ 3284.655461] [DEBUG]Active configuration:
[ 3284.655465] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[ 3284.655469] [DEBUG] X display: :8
[ 3284.655473] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/opengl/nvidia/lib:/usr/lib/opengl/nvidia/lib
[ 3284.655477] [DEBUG] Socket path: /var/run/bumblebee.socket
[ 3284.655480] [DEBUG] Accel/display bridge: virtualgl
[ 3284.655483] [DEBUG] VGL Compression: rgb
[ 3284.655486] [DEBUG] VGLrun extra options:
[ 3284.655489] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[ 3284.666628] [INFO]Response: No - error: [XORG] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the

[ 3284.666650] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the

[ 3284.666654] [DEBUG]Socket closed.
[ 3284.666684] [ERROR]Aborting because fallback start is disabled.
[ 3284.666687] [DEBUG]Killing all remaining processes.


I haven't cut off any text; it really does just tell me to "Please see the "
I think it's mocking me.

I do not have any other nvidia-related packages installed. Only the driver, no CUDA toolkits or SDKs or the runtime toolkit.

If I need to provide any additional error logs or outputs please do let me know and I'll provide them as soon as I am able.
Ionen
Developer


Joined: 06 Dec 2018
Posts: 2949

Posted: Wed Sep 07, 2022 5:17 am

If it's really not in the initramfs, then the likely cause is that you're building nvidia-drivers against a kernel that's not the one being used. So you still have your 510 modules for the kernel you're booting, while the 515 ones are for another kernel and unused.

i.e. confirm kernel versions match between eselect kernel list (which controls the /usr/src/linux symlink that nvidia will use by default) and uname -a (check date on uname -a to ensure it's recent too).
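
That comparison can be sketched as a small script. It is only a heuristic: it assumes the sources directory is named linux-<version>, and it accepts a running release that carries an extra suffix such as -x86_64. The kernel_matches helper is a made-up name.

```shell
#!/bin/sh
# Sketch: check that /usr/src/linux points at the running kernel's sources.

# Succeed if the running release ($2, from `uname -r`) belongs to the
# sources directory name ($1, e.g. "linux-5.15.59-gentoo"). The release
# may carry an extra suffix such as "-x86_64".
kernel_matches() {
    src_version=${1#linux-}
    case "$2" in
        "$src_version"|"$src_version"-*) return 0 ;;
        *) return 1 ;;
    esac
}

src=$(basename "$(readlink -f /usr/src/linux 2>/dev/null)")
running=$(uname -r)

if kernel_matches "$src" "$running"; then
    echo "OK: $src matches running kernel $running"
else
    echo "MISMATCH: sources '$src' vs running kernel '$running'"
fi

# Show which kernel versions actually have NVIDIA modules installed.
find /lib/modules -name 'nvidia.ko*' 2>/dev/null
```

The final find is there because the module directory that contains nvidia.ko should be the one named after the output of uname -r.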

Edit:
Unrelated to your issues, but note that bumblebee is essentially dead/obsolete; you can control this with simple env vars nowadays:
https://download.nvidia.com/XFree86/Linux-x86_64/515.65.01/README/primerenderoffload.html
(not that I have such a setup myself and never experimented)
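
For reference, the environment variables described in that README can be wrapped in a small function. This is only a sketch: prime_run is a made-up name, and it only has an effect if the X server and driver are already set up for PRIME render offload as the README describes.

```shell
#!/bin/sh
# Sketch: run a single command on the NVIDIA GPU via PRIME render offload.
# The variable names come from the NVIDIA README; prime_run is hypothetical.
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        "$@"
}

# Example (needs glxinfo installed and offload configured):
# prime_run glxinfo | grep "OpenGL renderer"
```

Because the variables are set only for the wrapped command, everything else keeps rendering on the integrated GPU.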
TangoDrango
n00b


Joined: 04 Aug 2022
Posts: 4
Location: USA

Posted: Wed Sep 07, 2022 11:02 pm

Thank you very much for the information!
This is a fairly new install, I've only ever had the one kernel version. Nevertheless I checked as advised and sure enough they do match.

Code:

$ eselect kernel list
Available kernel symlink targets:
  [1]   linux-5.15.59-gentoo *
$ uname -a
Linux gentop 5.15.59-gentoo-x86_64 #1 SMP Sun Aug 7 02:10:13 HST 2022 x86_64 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz GenuineIntel GNU/Linux


Having said that, and acknowledging that I'm out of my depth here, I installed genkernel instead to see if maybe it was something that I had done while tweaking things inside the kernel.
Maybe my initial thought on the initramfs being benign was misplaced, or maybe it was something inside the kernel that was throwing a fit, but now when I run optirun I get a split-second flash of the program window being rendered followed by the following output:

Code:
$ optirun -vvv glxspheres64
[  257.917488] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[  257.917659] [INFO]Configured driver: nvidia
[  257.917918] [DEBUG]optirun version  starting...
[  257.917925] [DEBUG]Active configuration:
[  257.917929] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[  257.917932] [DEBUG] X display: :8
[  257.917936] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/opengl/nvidia/lib:/usr/lib/opengl/nvidia/lib
[  257.917940] [DEBUG] Socket path: /var/run/bumblebee.socket
[  257.917944] [DEBUG] Accel/display bridge: virtualgl
[  257.917947] [DEBUG] VGL Compression: rgb
[  257.917950] [DEBUG] VGLrun extra options:
[  257.917954] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[  260.016155] [INFO]Response: Yes. X is active.

[  260.016170] [INFO]Running application using virtualgl.
[  260.016234] [DEBUG]Process vglclient started, PID 4791.
[  260.016249] [DEBUG]Hiding stderr for execution of vglclient
[  260.017418] [DEBUG]SIGCHILD received, but wait failed with No child processes
[  260.017476] [DEBUG]Process vglrun started, PID 4793.
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x20
Context is Direct
OpenGL Renderer: NVIDIA GeForce RTX 3060 Laptop GPU/PCIe/SSE2
[VGL] ERROR: Could not connect to VGL client.  Make sure that vglclient is
[VGL]    running and that either the DISPLAY or VGL_CLIENT environment
[VGL]    variable points to the machine on which vglclient is running.
[VGL] ERROR: in connect--
[VGL]    294: Connection refused
[  260.108101] [DEBUG]SIGCHILD received, but wait failed with No child processes
[  260.108111] [DEBUG]Socket closed.
[  260.108123] [DEBUG]Killing all remaining processes.


Which is a good start, because (as far as I can tell) this means the kernel side of things is now working fine; I just can't use optirun because I never actually set up virtualgl properly.
Rather than fix that (partly because I'm not yet sure how to do so), I tried primusrun and it works perfectly!

Code:
$ primusrun glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x3d5
Context is Direct
OpenGL Renderer: NVIDIA GeForce RTX 3060 Laptop GPU/PCIe/SSE2
146.027690 frames/sec - 63.076077 Mpixels/sec
144.014407 frames/sec - 62.206447 Mpixels/sec
144.004484 frames/sec - 62.202161 Mpixels/sec
...


I still need to figure out how to set up virtualgl to get optirun working, but as far as I can tell the two accomplish the same task, so it isn't a pressing matter; it's more of an "I want to learn how this works" project.
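
As a starting point, here is a small sketch that reports which bridge bumblebee is configured to use. It assumes the config path shown in the debug output above and an INI-style Bridge= line; configured_bridge is a made-up helper name.

```shell
#!/bin/sh
# Sketch: show which bridge bumblebee is configured to use.

# Print the value of the first uncommented Bridge= line read from stdin.
configured_bridge() {
    sed -n 's/^[[:space:]]*Bridge[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' | head -n 1
}

conf=/etc/bumblebee/bumblebee.conf
if [ -r "$conf" ]; then
    echo "Bridge: $(configured_bridge < "$conf")"
fi

# To try the primus bridge for one run without editing the config
# (optirun accepts -b to select the bridge):
# optirun -b primus glxspheres64
```

If the -b override behaves the same as primusrun, then the virtualgl setup is the only piece left to debug.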

Also, regarding your note about environment variables, I'll definitely look into that, but again I'm a bit out of my depth here, so I'll need to do more research and figure out exactly what is going on in that document. Very useful information though; thank you very much for linking it!
logrusx
Advocate


Joined: 22 Feb 2018
Posts: 3162

Posted: Thu Sep 08, 2022 5:44 am

Have you tried sys-power/switcheroo-control?

Regards,
Georgi
All times are GMT