Author |
Message |
[n00b@localhost] Apprentice
Joined: 30 Aug 2004 Posts: 266 Location: London, UK
|
Posted: Sat Apr 27, 2013 10:09 pm Post subject: [SOLVED] Getting CUDA to work on Thinkpad W530 |
|
|
I have a Thinkpad W530 with an integrated Intel i915 GPU and a discrete nVidia Quadro K2000M GPU. I have left Optimus enabled in the BIOS (UEFI) and set up X to use the i915. I have installed the nvidia-drivers and the CUDA toolkit but when I try to run deviceQuery from the SDK I get the following:
Code: |
garyslaptop ~ # /opt/cuda/sdk/1_Utilities/deviceQuery/deviceQuery
/opt/cuda/sdk/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
|
Occasionally the error is 10 (invalid device ordinal).
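For reference, both codes can be decoded without rebuilding the SDK sample. This is only a minimal sketch: it assumes the runtime error numbering of the CUDA 5.0 toolkit used in this thread (as far as I know, later toolkits renumbered these values), and `describe_cuda_error` is a made-up helper name, not part of any CUDA API.

```python
# Runtime error codes as numbered in the CUDA 5.0 era toolkit.
# Assumption: this table is version-specific; check driver_types.h
# in your own toolkit before relying on it.
CUDA5_ERRORS = {
    0: ("cudaSuccess", "no error"),
    10: ("cudaErrorInvalidDevice", "invalid device ordinal"),
    38: ("cudaErrorNoDevice", "no CUDA-capable device is detected"),
}


def describe_cuda_error(code):
    """Return 'name (code): message' for a known code, or a fallback."""
    name, message = CUDA5_ERRORS.get(code, ("unknown", "not in this table"))
    return f"{name} ({code}): {message}"


if __name__ == "__main__":
    for code in (38, 10):
        print(describe_cuda_error(code))
```

Both codes say that the runtime could not reach a usable device, which fits a GPU that the driver never managed to initialise.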
Googling for this error turns up a million blogs where somebody says they have the same problem and it is solved by following the advice in the CUDA Getting Started Guide for Linux.
Sometimes the nvidia module is automatically loaded, sometimes not, and sometimes the device nodes are created, sometimes not. If the module is not loaded then loading it manually sometimes gives an error (dmesg says the card is not supported by the driver version but according to the website and driver README it is). If the device nodes are not there, sometimes mknod fails (i.e. returns 0 but the device node is still missing). In fact, the only consistent behaviour is that it doesn't work!
Code: |
garyslaptop ~ # modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
garyslaptop ~ # lsmod | grep nvidia
nvidia 9149524 1
garyslaptop ~ # dmesg | tail
[ 3155.427927] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
[ 3155.428048] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:0ffb)
NVRM: installed in this system is not supported by the 313.30
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 3155.428070] nvidia: probe of 0000:01:00.0 failed with error -1
[ 3155.428117] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 3155.428120] NVRM: None of the NVIDIA graphics adapters were initialized!
garyslaptop ~ # mknod -m 660 /dev/nvidia0 c 195 0
garyslaptop ~ # echo $?
0
garyslaptop ~ # ls /dev/nvidia0
ls: cannot access /dev/nvidia0: No such file or directory
garyslaptop ~ # mknod -m 660 /dev/nvidiactl c 195 255
garyslaptop ~ # echo $?
0
garyslaptop ~ # ls /dev/nvidiactl
ls: cannot access /dev/nvidiactl: No such file or directory
|
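Because the behaviour changes from boot to boot, it may help to capture the three observations from the transcript above (module loaded, device nodes present, GPU rejected by the driver) in one pass. The following is a rough sketch only. The function names are invented for illustration, and the regular expression assumes the NVRM message looks exactly like the one in the dmesg output above.

```python
import os
import re
import stat

NVIDIA_MAJOR = 195  # character-device major used in the mknod calls above


def module_loaded(modules_text, name="nvidia"):
    """True if `name` is the first field of any line (lsmod or /proc/modules)."""
    return any(line.split()[0] == name
               for line in modules_text.splitlines() if line.strip())


def node_ok(path, major=NVIDIA_MAJOR):
    """True only if `path` exists and is a char device with the given major."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    return stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) == major


def rejected_gpus(dmesg_text):
    """PCI IDs named in NVRM 'The NVIDIA GPU ... (PCI ID: ...)' probe messages."""
    pattern = r"NVRM: The NVIDIA GPU \S+ \(PCI ID: ([0-9a-f]{4}:[0-9a-f]{4})\)"
    return re.findall(pattern, dmesg_text)


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as fh:
            print("module loaded:", module_loaded(fh.read()))
    for node in ("/dev/nvidia0", "/dev/nvidiactl"):
        print(node, "ok:", node_ok(node))
```

Running it after every boot and keeping the output would show whether the three symptoms always occur together or vary independently.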
Has anyone been able to get CUDA working on this laptop? I bought it specifically for doing CUDA development after my T61 died.
I have tried nvidia-driver versions 304.88, 313.30 and 319.12 with nvidia-cuda-sdk-5.0.35-r1 and nvidia-cuda-toolkit-5.0.35-r4 and all give the same results. I have also tried disabling the integrated GPU in the BIOS after which the laptop does not boot (i.e. GRUB2 freezes).
Last edited by [n00b@localhost] on Fri May 03, 2013 10:52 am; edited 1 time in total |
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
|
|
|
[n00b@localhost] Apprentice
Joined: 30 Aug 2004 Posts: 266 Location: London, UK
|
Posted: Mon Apr 29, 2013 5:18 pm Post subject: |
|
|
I've tried 310.44 too with the same results. Support for the K2000M was first added in 304.22.
I've since found out more about the problem. It seems that on boot the BIOS keeps the nVidia card turned off until something uses it (this is the point of Optimus - only power up the nVidia GPU when needed to save power). I have installed bumblebee to turn the card on and off manually but after booting it only allows me to turn it on once and when I try to use it in any way (deviceQuery, nvidia-smi, optirun) the BIOS turns it off again (with the syslog error "GPU has fallen off the bus") and won't let anything turn it back on until a reboot. |
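To see exactly when the card drops out, the kernel log can be filtered for that message. This is a small sketch under two assumptions: the log line contains the phrase "fallen off the bus" as quoted above, and `dmesg` is readable by the current user. `bus_drop_lines` is a hypothetical helper name.

```python
import subprocess

MARKER = "fallen off the bus"


def bus_drop_lines(log_text):
    """Return every log line reporting that the GPU dropped off the bus."""
    return [line for line in log_text.splitlines() if MARKER in line.lower()]


if __name__ == "__main__":
    try:
        # dmesg may need root where kernel log access is restricted.
        log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:
        log = ""  # dmesg is not available on this system
    hits = bus_drop_lines(log)
    print(f"{len(hits)} bus-drop message(s)")
    for line in hits:
        print(line)
```

Comparing the timestamps of the hits with the moment deviceQuery, nvidia-smi or optirun was started should confirm whether first use is what triggers the power-off.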
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
|
|
|
[n00b@localhost] Apprentice
Joined: 30 Aug 2004 Posts: 266 Location: London, UK
|
Posted: Fri May 03, 2013 10:51 am Post subject: |
|
|
Since this laptop is only a fortnight old, I'm assuming it had the latest BIOS when it left the factory.
kernelOfTruth wrote: | I yet have to try bumblebee & the discrete graphics out under Linux - currently I have it completely turned off (T530)
does plugging the power source make a change ? is disabling the intel graphics driver entirely an option ? |
Despite the laptop being only a fortnight old, I have already broken the charging circuit, so I can no longer run it off the battery. There is a BIOS option to run the laptop with "Integrated graphics only", "Discrete graphics only" or "nVidia Optimus". With Integrated graphics only it works OK, but obviously I don't have access to my nVidia card (it doesn't even show up in lspci). Discrete graphics only is strange: the Intel GPU no longer shows up in lspci, so the display must be driven by the nVidia card, yet that doesn't stop the card from turning off! I also have to disable KMS to get it past GRUB. It then boots successfully, but the screen doesn't redraw and I have to ssh in to reboot. KMS works fine with the Intel card.
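The lspci observations for the three BIOS settings can be checked mechanically. The sketch below parses `lspci -nn` output and reports which GPU vendors are visible. The function names are invented, and the mapping from visible GPUs to BIOS mode is only an inference from the behaviour described in this post, not something Lenovo documents.

```python
import re
import subprocess

VENDORS = {"8086": "Intel", "10de": "NVIDIA"}  # PCI vendor IDs
ID_RE = re.compile(r"\[([0-9a-f]{4}):[0-9a-f]{4}\]")


def visible_gpus(lspci_nn_text):
    """Return the set of GPU vendors seen in `lspci -nn` output."""
    found = set()
    for line in lspci_nn_text.splitlines():
        # Display devices are listed as VGA or 3D controllers.
        if "VGA compatible controller" not in line and "3D controller" not in line:
            continue
        match = ID_RE.search(line)
        if match and match.group(1) in VENDORS:
            found.add(VENDORS[match.group(1)])
    return found


def likely_bios_mode(gpus):
    """Guess the BIOS graphics setting from the visible GPUs (an inference)."""
    if gpus == {"Intel", "NVIDIA"}:
        return "nVidia Optimus"
    if gpus == {"Intel"}:
        return "Integrated graphics only"
    if gpus == {"NVIDIA"}:
        return "Discrete graphics only"
    return "unknown"


if __name__ == "__main__":
    try:
        out = subprocess.run(["lspci", "-nn"], capture_output=True, text=True).stdout
    except OSError:
        out = ""  # lspci is not installed
    print(likely_bios_mode(visible_gpus(out)))
```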
My previous googling turned up most of those pages except the first f.g.o and nvnews.net links. My situation is slightly different to those as I didn't have a problem with interrupts although I did have to add "nomodeset" to the kernel command line to get SysRescCD to boot.
Enabling the four kernel options listed, however, (NO_HZ, RCU_FAST_NO_HZ, CALGARY_IOMMU and CALGARY_IOMMU_ENABLED_BY_DEFAULT) stops the "GPU has fallen off the bus" errors and the card turning off when I try to use it. I'm not entirely convinced the NO_HZ and RCU_FAST_NO_HZ options are needed though as they sound like they fix the interrupt problems I never had so I'll try a few kernels with and without those enabled.
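For anyone wanting to check their own kernel against those four options, here is a short sketch. It assumes the options appear with the usual `CONFIG_` prefix in the kernel configuration and that the running kernel exposes `/proc/config.gz` (which needs CONFIG_IKCONFIG_PROC). The function names are made up for this example.

```python
import gzip
import os

REQUIRED = (
    "CONFIG_NO_HZ",
    "CONFIG_RCU_FAST_NO_HZ",
    "CONFIG_CALGARY_IOMMU",
    "CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT",
)


def enabled_options(config_text):
    """Return the set of symbols set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled symbols appear as '# CONFIG_FOO is not set'.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(key)
    return enabled


def missing_options(config_text, required=REQUIRED):
    """List the required symbols that are not enabled, in REQUIRED order."""
    on = enabled_options(config_text)
    return [opt for opt in required if opt not in on]


if __name__ == "__main__":
    path = "/proc/config.gz"
    if os.path.exists(path):
        with gzip.open(path, "rt") as fh:
            print("missing:", missing_options(fh.read()) or "none")
    else:
        print(path, "not found; read your kernel source .config instead")
```

The same `missing_options` call works on the text of the `.config` file in a kernel source tree, which is handy when comparing several test kernels.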
I've just run deviceQuery from the CUDA SDK and my nVidia card is listed. The card also didn't turn itself off after using it. I've not tried getting bumblebee or optirun to work yet.
Thanks for all your help! |
|
|
|
[n00b@localhost] Apprentice
Joined: 30 Aug 2004 Posts: 266 Location: London, UK
|
Posted: Fri May 03, 2013 12:50 pm Post subject: |
|
|
So - I just tried a couple of kernels with CONFIG_NO_HZ and CONFIG_RCU_FAST_NO_HZ disabled and everything broke again.
Seems like those options are needed after all. |
|
|
|
|