To take advantage of the extra computational power of NVIDIA GPUs such as the Tesla series, we need to install the proper CUDA drivers on the system. The best choice is to use Docker with nvidia-docker GPU-enabled containers. The problem is that you can't install Docker on older Linux systems like Fedora 17: you need at least Fedora 24. Sadly, upgrading the server's OS is sometimes not a viable option.
OK, so we have to install the CUDA Toolkit and cuDNN libraries ourselves. This is still not easy on Linux systems, especially old ones. You have to solve the following two problems first:
- Disable the X Window server on the system
- Disable the Nouveau driver (on newer systems, the NVIDIA installer handles this automatically)
Here I will provide a walkthrough for an old system, Fedora 17.
## Download CUDA 8.0 and cuDNN 5.1
You can find the CUDA Toolkit 8.0 installer for your system on the official NVIDIA website. For cuDNN 5.1, you can find it HERE. Note that to download the cuDNN library you need to create an NVIDIA developer account first (in case you don't have one). After the download is finished, use an FTP client like FileZilla to transfer the installers onto your Fedora 17 server.
Then run the following command to make the CUDA run file executable:
chmod +x cuda_8.0.61_375.26_linux.run
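Large installers occasionally get corrupted in transit, so it can be worth checking the file before running it. The sketch below assumes you have a reference MD5 checksum for the installer (for example, one listed alongside the download); `verify_md5` is a hypothetical helper name, and the value in the usage comment is a placeholder, not a real checksum.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_md5() {
    local file="$1" expected="$2" actual
    [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Usage (replace the placeholder with the checksum you obtained):
# verify_md5 cuda_8.0.61_375.26_linux.run <expected_md5>
```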
## Update the system's build tools and kernel-devel
NVIDIA’s installer builds a kernel module for the driver and links it against your kernel, so it requires certain build tools to be installed.
sudo yum groupinstall "Development Tools"
sudo yum install kernel-devel kernel-headers
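The installer compiles against the sources of the kernel that is currently running, so the kernel-devel package has to match `uname -r` exactly. A minimal check, assuming kernel-devel places its tree under /usr/src/kernels/<release> as Fedora packages normally do (`kernel_devel_matches` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only if a kernel source tree exists for the
# given kernel release. Defaults: the running kernel, and / as the root.
kernel_devel_matches() {
    local release="${1:-$(uname -r)}" root="${2:-}"
    local dir="$root/usr/src/kernels"
    if [ -d "$dir/$release" ]; then
        echo "kernel-devel found for $release"
    else
        echo "no kernel-devel tree for $release; installed trees:" >&2
        # List whatever trees are present to make the mismatch visible.
        ls "$dir" >&2 2>/dev/null || true
        return 1
    fi
}

# Usage:
# kernel_devel_matches || echo "kernel and kernel-devel versions differ"
```

If the check fails, either install the matching package (for example `sudo yum install "kernel-devel-$(uname -r)"`, assuming your repositories still carry it) or update the kernel itself and reboot so that both versions agree.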
## Disable the X server by changing the default run-level to "3"
NVIDIA drivers only install if no X server is running. You can terminate the X server manually, but some components, buffers and modules won’t unload. So we need to boot directly into run-level “3”, which is text mode.
Fedora’s default run-level is defined through a symbolic link, which we will modify now and change back later.
sudo rm /etc/systemd/system/default.target
sudo ln -sf /lib/systemd/system/multi-user.target /etc/systemd/system/default.target
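Before rebooting, you can confirm where the symbolic link now points. A small sketch; `default_target` is a hypothetical helper name, and its optional argument defaults to the link modified above:

```shell
# Hypothetical helper: print the name of the systemd target that the
# default.target symlink points to; fail if the path is not a symlink.
default_target() {
    local link="${1:-/etc/systemd/system/default.target}"
    [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
    basename "$(readlink "$link")"
}

# Usage:
# default_target    # multi-user.target is the text-mode target
```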
## Disable the Nouveau driver manually (needed for old systems like Fedora 17)
There are three steps to disabling the Nouveau driver. Note that you need to be the super user for all of these operations.
1. Blacklist Nouveau in /etc/modprobe.d
We need to prevent the nouveau driver from loading a) at boot time and b) post-boot. This step prevents it from being loaded manually or through any dependent module.
We create a new config file disable-nouveau.conf, as the existing file blacklist.conf might be updated or overwritten by a system update. The lines are piped through sudo tee -a because a plain >> redirection would be performed by your unprivileged shell, not by sudo.
echo 'blacklist nouveau' | sudo tee -a /etc/modprobe.d/disable-nouveau.conf
echo 'options nouveau modeset=0' | sudo tee -a /etc/modprobe.d/disable-nouveau.conf
2. Blacklist Nouveau at boot time
Fedora ships nouveau as part of the boot image. That’s why blacklisting as in step 1 is not sufficient. We need to pass a parameter to the kernel at boot time that stops nouveau from loading.
In your file /boot/grub2/grub.cfg find the line that loads the kernel (yours might look slightly different but should start similarly):
linux /vmlinuz-3.6.3-1.fc17.x86_64 root=/dev/mapper/vg_fedo-lv_root ro rd.lvm.lv=vg_fedo/lv_swap rd.md=0 rd.dm=0 SYSFONT=True rd.lvm.lv=vg_fedo/lv_root rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8 rhgb quiet
And now add the parameter rdblacklist=nouveau to it:
linux /vmlinuz-3.6.3-1.fc17.x86_64 root=/dev/mapper/vg_fedo-lv_root ro rd.lvm.lv=vg_fedo/lv_swap rd.md=0 rd.dm=0 SYSFONT=True rd.lvm.lv=vg_fedo/lv_root rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8 rdblacklist=nouveau rhgb quiet
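If you would rather not edit the file by hand, the same change can be scripted. The sketch below assumes GNU sed and kernel entries that start with the keyword `linux`, as in the example above; `add_rdblacklist` is a hypothetical helper name. It appends the parameter at the end of the line rather than before `rhgb quiet`, which should be equivalent here.

```shell
# Hypothetical helper: append rdblacklist=nouveau to every "linux ..." kernel
# line of a GRUB 2 config that does not already carry it.
# A one-time copy of the untouched file is kept as <file>.orig.
add_rdblacklist() {
    local cfg="$1"
    [ -f "$cfg" ] || { echo "missing: $cfg" >&2; return 1; }
    [ -e "$cfg.orig" ] || cp -p "$cfg" "$cfg.orig"
    sed -i -e '/^[[:space:]]*linux[[:space:]]/{' \
           -e '/rdblacklist=nouveau/!s/[[:space:]]*$/ rdblacklist=nouveau/' \
           -e '}' "$cfg"
}

# Usage (from a root shell):
# add_rdblacklist /boot/grub2/grub.cfg
# After the next reboot, the parameter should show up in the running kernel's
# command line:
# grep -o 'rdblacklist=nouveau' /proc/cmdline
```

Keep in mind that grub.cfg is regenerated by grub2-mkconfig, so a later regeneration can drop a manual edit; adding the parameter to GRUB_CMDLINE_LINUX in /etc/default/grub as well makes it survive that.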
3. Remove / disable the nouveau driver in the kernel initramfs
For some OS versions like Fedora 17, this step is also required to successfully prevent the Nouveau driver from being loaded.
## Backup old initramfs nouveau image ##
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
## Create new initramfs image ##
dracut /boot/initramfs-$(uname -r).img $(uname -r)
Once these three steps are done, reboot the system. Now you are ready to install the CUDA driver.
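After the reboot, it is worth confirming that nouveau really stayed out of the kernel before starting the installer. A minimal sketch that reads the kernel's module list from /proc/modules (`module_loaded` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed if the named module is listed in a
# /proc/modules-style file (the module name is the first field of each line).
module_loaded() {
    local name="$1" modules="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules"
}

# Usage:
# if module_loaded nouveau; then
#     echo "nouveau is still loaded; recheck the three steps above"
# fi
```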
## Install the CUDA 8.0 driver and CUDA Toolkit
Now run the .run file you downloaded as the super user to start installing the CUDA driver. The leading ./ is needed because the current directory is normally not on the command search path.
sudo ./cuda_8.0.61_375.26_linux.run
Follow the instructions and you should be able to install the driver.
Once the installation is complete, run the following commands to add the environment variables for CUDA and the CUDA Toolkit:
echo -e "\n## CUDA and cuDNN paths" >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-8.0/bin:${PATH}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:${LD_LIBRARY_PATH}' >> ~/.bashrc
source ~/.bashrc
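Re-running the echo commands above appends duplicate lines to ~/.bashrc. If you expect to repeat the setup, a small guard avoids that; this is only a sketch, and `append_once` is a hypothetical helper name:

```shell
# Hypothetical helper: append a line to a file only if the file does not
# already contain that exact line.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
# append_once 'export PATH=/usr/local/cuda-8.0/bin:${PATH}' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:${LD_LIBRARY_PATH}' ~/.bashrc
```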
You can check the installation using the following commands.
which nvcc
# If installation is successful, you should see something like this:
# /usr/local/cuda-8.0/bin/nvcc
nvidia-smi
# If installation is successful, you should see something like this:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  Tesla K20c          Off  | 0000:03:00.0     Off |                    0 |
# | 30%   32C    P0    54W / 225W |      0MiB /  4742MiB |    100%      Default |
# +-------------------------------+----------------------+----------------------+
# +-----------------------------------------------------------------------------+
# | Processes:                                                       GPU Memory |
# |  GPU       PID  Type  Process name                               Usage      |
# |=============================================================================|
# |  No running processes found                                                 |
# +-----------------------------------------------------------------------------+
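For scripts, the full table is awkward to parse. nvidia-smi also has a query mode that prints selected fields as CSV. The sketch below wraps it and fails cleanly when the tool is missing; `gpu_summary` is a hypothetical helper name, and it assumes the installed nvidia-smi supports the `--query-gpu` option.

```shell
# Hypothetical helper: print one CSV line per GPU (name, driver version,
# total memory), or return 127 if nvidia-smi is not on the PATH.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH" >&2
        return 127
    fi
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

# Usage:
# gpu_summary
```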
## Install the cuDNN library
For most deep learning frameworks like TensorFlow, you also need the cuDNN library to run on the GPU. Let's install it as well.
Installing cuDNN is relatively simple: just run the following commands in the directory where the cuDNN tgz file is located:
tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp -a cuda/lib64/* /usr/local/cuda-8.0/lib64/
sudo cp -a cuda/include/* /usr/local/cuda-8.0/include/
sudo ldconfig
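To confirm that the headers landed in the right place, you can read the version back from the installed cudnn.h. This sketch assumes the header defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, which is how cuDNN 5.x headers are commonly checked; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print the cuDNN version recorded in a cudnn.h header
# as MAJOR.MINOR.PATCHLEVEL. Defaults to the CUDA 8.0 include directory.
cudnn_version() {
    local header="${1:-/usr/local/cuda-8.0/include/cudnn.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}

# Usage:
# cudnn_version    # for cuDNN 5.1 the output should start with 5.1
```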
Then you can delete the cuda folder and tgz file:
rm -rf cuda
rm cudnn-8.0-linux-x64-v5.1.tgz
## Change the default run-level back to "5" to use the graphical user interface again
We disabled the X server earlier in order to install the NVIDIA driver. Now that the installation is finished, we can enable it again.
sudo rm /etc/systemd/system/default.target
sudo ln -sf /lib/systemd/system/graphical.target /etc/systemd/system/default.target
Reboot your system again for the change to take effect.
Now that the installation of CUDA and cuDNN is finished, you can use your GPU in deep learning packages.