I ran into a problem where my TensorFlow installation did not recognize the installed GPU, even though CUDA and the NVIDIA drivers were installed properly.
The test:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
returned an empty list. TensorFlow also reported that it could not find the CUDA drivers:
2024-01-30 14:57:42.015454: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
The output of the NVIDIA tools is correct and shows that CUDA is installed:
nvidia-smi

ubuntu@ip-bla-foo:~/build-nb$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
This tells us the CUDA version is 12.1. Ahhh!💡
Now, CUDA 12.1 is a release from 2023, and my guess was that TensorFlow 2.13 might not support it yet. For comparison, see the TensorFlow 2.15 announcement: https://blog.tensorflow.org/2023/11/whats-new-in-tensorflow-2-15.html
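To check that guess on the machine itself, one option is to compare the toolkit release reported by nvcc with the CUDA version the installed TensorFlow wheel was built against. This is only a rough sketch: parse_nvcc_release and tf_build_cuda_version are helper names made up for illustration, and the sample string is the nvcc line shown above. tf.sysconfig.get_build_info() is an existing TensorFlow call, but as far as I know the "cuda_version" key is only present on CUDA builds, so the helper returns None in every other case.

```python
import re


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tf_build_cuda_version():
    """Return the CUDA version TensorFlow was built against, or None.

    None means TensorFlow is not installed, or the installed build
    does not report a CUDA version (for example a CPU-only build).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_build_info().get("cuda_version")


# Sample line copied from the nvcc output above (hypothetical usage).
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print("Toolkit release:", parse_nvcc_release(sample))
print("TensorFlow built against CUDA:", tf_build_cuda_version())
```

If the two major versions differ, that is a hint (not a proof) that the installed TensorFlow expects different CUDA libraries than the ones on the system.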
OK, on Python 3.8 the latest version pip offered was TF 2.13, so getting a newer TensorFlow means moving to a newer Python first. Here is the fix:
- upgrade Python: sudo apt install python3.9
- create a new venv: virtualenv --python /usr/bin/python3.9 ~/.env-python3.9
- activate it: source ~/.env-python3.9/bin/activate
- upgrade pip: pip install --upgrade pip
- install TensorFlow with its CUDA extras: python3 -m pip install "tensorflow[and-cuda]==2.15.0.post1"
Test: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
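Before running the pip step it can help to confirm that the interpreter inside the venv is one pip will find a TensorFlow 2.15 wheel for. The sketch below assumes the 2.15 wheels on PyPI cover Python 3.9 to 3.11; that range is my understanding rather than something this post tested, and supports_tf_2_15 is a made-up helper name.

```python
import sys


def supports_tf_2_15(version_info=sys.version_info):
    """Return True if the Python version is in the assumed 3.9 to 3.11 range."""
    major_minor = tuple(version_info[:2])
    return (3, 9) <= major_minor <= (3, 11)


current = ".".join(str(part) for part in sys.version_info[:3])
if supports_tf_2_15():
    print(f"Python {current}: pip should offer TensorFlow 2.15")
else:
    print(f"Python {current}: outside the assumed 3.9 to 3.11 range")
```

Run it with the venv activated, so that python3 points at the new interpreter and not at the system Python 3.8.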
2024-01-30 15:27:04.458720: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-30 15:27:04.458772: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-30 15:27:04.459601: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-30 15:27:04.465334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-30 15:27:05.115551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-30 15:27:05.560865: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.585883: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-30 15:27:05.586100: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
The log is noisy, but the last line is the device list: TensorFlow now sees the GPU.
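For repeated checks, the one-liner can be turned into a small script that also quiets some of the log output. This is a sketch under assumptions: list_gpus is a made-up helper name, and setting TF_CPP_MIN_LOG_LEVEL to "2" before the import should hide the INFO and WARNING lines from the C++ side, though I have not verified that it silences every message shown above.

```python
import os

# Must be set before TensorFlow is imported to have any effect.
# "2" is meant to hide INFO and WARNING messages (assumption: some
# startup messages may still get through).
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")


def list_gpus():
    """Return the names of visible GPUs, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = list_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("TensorFlow found no GPU")
else:
    print("GPUs:", gpus)
```

An empty list here means the same thing as in the first test: TensorFlow imported fine but could not load the GPU libraries.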