J'exécute une instance AWS EC2 g2.2xlarge avec Ubuntu 14.04 LTS . J'aimerais observer l'utilisation du processeur graphique lors de la formation de mes modèles TensorFlow . Une erreur s'est produite lors de l'exécution de 'nvidia-smi'.
ubuntu@ip-10-0-1-213:/etc/alternatives$ cd /usr/lib/nvidia-375/bin
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ ls
nvidia-bug-report.sh nvidia-debugdump nvidia-xconfig
nvidia-cuda-mps-control nvidia-persistenced
nvidia-cuda-mps-server nvidia-smi
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ ./nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ dpkg -l | grep nvidia
ii nvidia-346 352.63-0ubuntu0.14.04.1 AMD64 Transitional package for nvidia-346
ii nvidia-346-dev 346.46-0ubuntu1 AMD64 NVIDIA binary Xorg driver development files
ii nvidia-346-uvm 346.96-0ubuntu0.0.1 AMD64 Transitional package for nvidia-346
ii nvidia-352 375.26-0ubuntu1 AMD64 Transitional package for nvidia-375
ii nvidia-375 375.39-0ubuntu0.14.04.1 AMD64 NVIDIA binary driver - version 375.39
ii nvidia-375-dev 375.39-0ubuntu0.14.04.1 AMD64 NVIDIA binary Xorg driver development files
ii nvidia-modprobe 375.26-0ubuntu1 AMD64 Load the NVIDIA kernel driver and create device files
ii nvidia-opencl-icd-346 352.63-0ubuntu0.14.04.1 AMD64 Transitional package for nvidia-opencl-icd-352
ii nvidia-opencl-icd-352 375.26-0ubuntu1 AMD64 Transitional package for nvidia-opencl-icd-375
ii nvidia-opencl-icd-375 375.39-0ubuntu0.14.04.1 AMD64 NVIDIA OpenCL ICD
ii nvidia-prime 0.6.2.1 AMD64 Tools to enable NVIDIA's Prime
ii nvidia-settings 375.26-0ubuntu1 AMD64 Tool for configuring the NVIDIA graphics driver
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ lspci | grep -i nvidia
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$
$ inxi -G
Graphics: Card-1: Cirrus Logic Gd 5446
Card-2: NVIDIA GK104GL [GRID K520]
X.org: 1.15.1 driver: N/A tty size: 80x24 Advanced Data: N/A out of X
$ lspci -k | grep -A 2 -E "(VGA|3D)"
00:02.0 VGA compatible controller: Cirrus Logic Gd 5446
Subsystem: XenSource, Inc. Device 0001
Kernel driver in use: cirrus
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
Subsystem: NVIDIA Corporation Device 1014
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
J'ai suivi ces instructions pour installer CUDA 7 et cuDNN:
$Sudo apt-get -q2 update
$Sudo apt-get upgrade
$Sudo reboot
=============================================== ======================
Après le redémarrage, mettez à jour initramfs en exécutant '$ Sudo update-initramfs -u'
Maintenant, veuillez éditer le fichier /etc/modprobe.d/blacklist.conf sur la liste noire nouveau. Ouvrez le fichier dans un éditeur et insérez les lignes suivantes à la fin du fichier.
liste noire nouveau liste noire lbm-nouveau options nouveau modeset = 0 alias nouveau désactivé alias lbm-nouveau désactivé
Enregistrez et quittez le fichier.
Maintenant, installez les outils essentiels à la construction, mettez à jour initramfs et redémarrez comme ci-dessous:
$Sudo apt-get install linux-{headers,image,image-extra}-$(uname -r) build-essential
$Sudo update-initramfs -u
$Sudo reboot
=============================================== =======================
Après le redémarrage, exécutez les commandes suivantes pour installer Nvidia.
$Sudo wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
$Sudo chmod 700 ./cuda_7.0.28_linux.run
$Sudo ./cuda_7.0.28_linux.run
$Sudo update-initramfs -u
$Sudo reboot
=============================================== =======================
Maintenant que le système est installé, vérifiez l'installation en exécutant ce qui suit.
$Sudo modprobe nvidia
$Sudo nvidia-smi -q | head`enter code here`
Vous devriez voir la sortie comme 'nvidia.png'.
Maintenant, lancez les commandes suivantes . $
cd ~/NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery
$make
$./deviceQuery
Cependant, 'nvidia-smi' ne montre toujours pas l'activité du processeur graphique alors que Tensorflow est en train de former des modèles:
ubuntu@ip-10-0-1-48:~$ ipython
Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec 6 2015, 18:08:32)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
ubuntu@ip-10-0-1-48:~$ nvidia-smi
Thu Mar 30 05:45:26 2017
+------------------------------------------------------+
| NVIDIA-SMI 346.46 Driver Version: 346.46 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 35C P0 38W / 125W | 10MiB / 4095MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
J'ai résolu "Echec de NVIDIA-SMI car il ne pouvait pas communiquer avec le pilote NVIDIA" sur mon ordinateur portable ASUS avec GTX 950m et Ubuntu 18.04 en désactivant le contrôle de démarrage sécurisé du BIOS.
J'obtenais la même erreur sur mon Ubuntu 16.04 (noyau Linux 4.14) dans Google Compute Engine avec le processeur graphique K80. J'ai mis à jour le noyau à 4.14 et le problème a été résolu. Voici comment j'ai mis à jour mon noyau Linux de 4.13 à 4.14:
Step 1:
Check the existing kernel of your Ubuntu Linux:
uname -a
Step 2:
Ubuntu maintains a website for all the versions of kernel that have
been released. At the time of this writing, the latest stable release
of Ubuntu kernel is 4.15. If you go to this
link: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/, you will
see several links for download.
Step 3:
Download the appropriate files based on the type of OS you have. For 64
bit, I would download the following deb files:
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/linux-headers-
4.15.0-041500_4.15.0-041500.201802011154_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/linux-headers-
4.15.0-041500-generic_4.15.0-041500.201802011154_AMD64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/linux-image-
4.15.0-041500-generic_4.15.0-041500.201802011154_AMD64.deb
Step 4:
Install all the downloaded deb files:
Sudo dpkg -i *.deb
Step 5:
Reboot your machine and check if the kernel has been updated by:
uname -a
Vous devriez voir que votre noyau a été mis à jour et que nvidia-smi devrait fonctionner.
Exécutez ce qui suit pour obtenir le bon pilote NVIDIA:
Appareils ubuntu-drivers Sudo
Puis choisissez le bon et lancez:
Sudo apt install
Je devais installer le pilote NVIDIA 367.57 et CUDA 7.5 avec Tensorflow sur l'instance g2.2xlarge Ubuntu 14.04LTS. par exemple nvidia-graphics-drivers-367_367.57.orig.tar
Maintenant, le GPU GRID K520 fonctionne pendant que je forme des modèles tensorflow:
ubuntu@ip-10-0-1-70:~$ nvidia-smi
Sat Apr 1 18:03:32 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 39C P8 43W / 125W | 3800MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2254 C python 3798MiB |
+-----------------------------------------------------------------------------+
ubuntu@ip-10-0-1-70:~/NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GRID K520"
CUDA Driver Version / Runtime Version 8.0 / 7.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4036 MBytes (4232052736 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support Host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 3
Compute Mode:
< Default (multiple Host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = GRID K520
Result = PASS