2021-07-12 Setting up a deep learning environment on a GeForce RTX 3090: CUDA 11.1 + tensorflow 2.5.0 + python 3.8.3
The environment that this post got working has been exported to
https://download.csdn.net/download/Julse/20687132?spm=1001.2014.3001.5501
Setting up the environment ran into many problems. The versions below are the combination that finally worked; I tested it myself.
tensorflow-gpu 2.5.0
cudnn 8.1.0.77
python 3.8.3
cuda 11.1
For the version compatibility table, see
https://www.tensorflow.org/install/source
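Details of the successful installation
Installing tensorflow-gpu 2.5.0
conda activate <env-name>
pip install tensorflow-gpu==2.5.0 # use pip if conda install tensorflow-gpu==2.5.0 cannot find the package
Check whether the installation succeeded. If /device:GPU:0 appears in the output, move on to the next step.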
>>> import tensorflow as tf
>>> tf.__version__
'2.5.0'
>>> tf.test.gpu_device_name()
The following output appears
'/device:GPU:0'
The "skip gpu..." message no longer appears
2021-11-21 09:11:11.576578: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-21 09:11:11.586861: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-11-21 09:11:12.301775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.302507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:5e:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.303282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties:
pciBusID: 0000:b1:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.303954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties:
pciBusID: 0000:d9:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.304010: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-21 09:11:12.322885: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-11-21 09:11:12.322999: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-11-21 09:11:12.337252: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-11-21 09:11:12.342694: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-11-21 09:11:12.348923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-11-21 09:11:12.354146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-11-21 09:11:12.356096: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-11-21 09:11:12.362607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1, 2, 3
2021-11-21 09:11:12.363212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-21 09:11:17.044150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-21 09:11:17.044213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 1 2 3
2021-11-21 09:11:17.044240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N N N N
2021-11-21 09:11:17.044245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1: N N N N
2021-11-21 09:11:17.044249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 2: N N N N
2021-11-21 09:11:17.044254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 3: N N N N
2021-11-21 09:11:17.050196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 3793 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:3b:00.0, compute capability: 8.6)
2021-11-21 09:11:17.053391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:1 with 3665 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3090, pci bus id: 0000:5e:00.0, compute capability: 8.6)
2021-11-21 09:11:17.054353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:2 with 3663 MB memory) -> physical GPU (device: 2, name: GeForce RTX 3090, pci bus id: 0000:b1:00.0, compute capability: 8.6)
2021-11-21 09:11:17.055315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:3 with 3771 MB memory) -> physical GPU (device: 3, name: GeForce RTX 3090, pci bus id: 0000:d9:00.0, compute capability: 8.6)
'/device:GPU:0'
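Installing keras
pip install keras
With the conda environment activated, tensorflow was installed with pip, so keras should be installed with pip too. Otherwise conda installs a second tensorflow and the two conflict.
Once every keras in the code is changed to tensorflow.keras, the standalone keras package is no longer used, so there is no need to install it.
Installing cudnn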
conda install -c nvidia cudnn=8.1.0
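Problem 1: testing whether tensorflow installed successfully
Some blogs say this message can be ignored, but in my tests the GPU could not be used, which means the installation was not correct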
import tensorflow as tf
tf.test.gpu_device_name()
Message printed
I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags
Troubleshooting: I first looked up what oneDNN is
https://01.org/onednn
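It later turned out that tensorflow was simply not installed correctly and had to be uninstalled and reinstalled
One article says the message can be ignored, but the GPU still could not be used; that approach only hides the message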
https://blog.csdn.net/qq_39096123/article/details/100575784
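Problem 2: tensorflow vs. tensorflow-gpu
The official site says the two packages were separate in earlier versions, so I assumed that installing tensorflow 2.5 directly would be enough. In practice, a tensorflow built for CPU only did not work on the GPU.
Reference: https://www.jianshu.com/p/e772b880b4d2
Check whether tensorflow can use the GPU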
tf.config.list_physical_devices('GPU')
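An empty list comes back, which means no GPU was found
tf.test.is_built_with_cuda()
The directly installed tensorflow turned out not to be built with CUDA, so it cannot use the GPU
tensorflow-gpu should be installed instead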
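The two checks above can be combined into one small diagnostic. This is only a sketch: diagnose_gpu and probe_tensorflow are hypothetical helper names, not part of tensorflow. The only tensorflow calls it relies on are tf.test.is_built_with_cuda() and tf.config.list_physical_devices("GPU").

```python
from typing import List


def diagnose_gpu(built_with_cuda: bool, gpu_names: List[str]) -> str:
    """Turn the two check results into a short diagnosis (hypothetical helper)."""
    if not built_with_cuda:
        return "tensorflow was not built with CUDA; install a GPU build"
    if not gpu_names:
        return "CUDA build, but no GPU visible; check driver, CUDA and cuDNN libraries"
    return "OK: {} GPU(s) visible".format(len(gpu_names))


def probe_tensorflow() -> str:
    """Run both checks against the installed tensorflow, if there is one."""
    try:
        # Imported lazily so diagnose_gpu stays usable without tensorflow.
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed in this environment"
    gpus = tf.config.list_physical_devices("GPU")
    return diagnose_gpu(tf.test.is_built_with_cuda(), [gpu.name for gpu in gpus])
```

Calling print(probe_tensorflow()) inside the activated conda environment gives a one-line summary of which of the two checks failed.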
Problem 3: none of the conda channels had tensorflow-gpu=2.5.0, but pip did
Versions at the time
conda 4.9.2
pip 21.1.3 from /home/username/miniconda3/envs/envnames/lib/python3.8/site-packages/pip (python 3.8)
After installing with pip, the matching CUDA version was not installed automatically
Pick the tensorflow wheel that matches the python version
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-2.4.0-cp38-cp38-manylinux2010_x86_64.whl
After installing, the package list shows: tensorflow-gpu 2.4.0
The GPU still could not be used
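The wheel file name in the pip command above encodes which python it was built for (cp38 means CPython 3.8). As a rough sketch, the file name can be split into its fields and compared with the running interpreter before downloading. The helper names here are hypothetical, and the sketch assumes the common five-field wheel name without the optional build tag.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(url_or_name: str) -> WheelTags:
    """Split a wheel file name (or a URL ending in one) into its five fields."""
    filename = url_or_name.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:  # names with an optional build tag have 6 fields; not handled
        raise ValueError("unexpected wheel name: " + filename)
    return WheelTags(*parts)


def current_python_tag() -> str:
    """CPython tag of the running interpreter, e.g. 'cp38' for python 3.8."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


def wheel_matches_interpreter(url_or_name: str) -> bool:
    """True if the wheel's python tag equals the running interpreter's tag."""
    return parse_wheel_filename(url_or_name).python_tag == current_python_tag()
```

A matching python tag only means pip will accept the wheel. As the result above shows, it says nothing about whether the CUDA and cuDNN libraries the wheel expects are present.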
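Problem 4: tensorflow is the GPU version; does keras need a GPU version as well?
A keras-gpu package exists and can be installed through conda
After it is installed, conda automatically updates tensorflow
In other words, installing keras-gpu alone is enough; the matching tensorflow-gpu gets installed along with it
But in the python console tensorflow no longer worked, probably because pip had installed one tensorflow and conda another
In addition, the installed keras-gpu could not be imported with import keras, which the current program needs, so I abandoned this approach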
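One way to look for such a mixed install is to list every tensorflow or keras distribution that the active environment's python can see. The sketch below uses importlib.metadata from the standard library (python 3.8+); the two function names are hypothetical. Several entries with differing versions would hint that pip and conda each installed their own copy, though this is a heuristic and not a definitive test.

```python
from importlib import metadata  # standard library since python 3.8
from typing import Iterable, List, Optional, Tuple


def pick_tensorflow_like(
    dists: Iterable[Tuple[Optional[str], str]]
) -> List[Tuple[str, str]]:
    """Keep (name, version) pairs whose name starts with 'tensorflow' or 'keras'."""
    wanted = ("tensorflow", "keras")
    return sorted(
        (name, version)
        for name, version in dists
        if name and name.lower().startswith(wanted)
    )


def installed_tensorflow_like() -> List[Tuple[str, str]]:
    """Scan the active environment for matching installed distributions."""
    found = ((d.metadata["Name"], d.version) for d in metadata.distributions())
    return pick_tensorflow_like(found)
```

Running print(installed_tensorflow_like()) before and after a conda install shows which packages that command added or replaced.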
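Problem 5: tensorflow 2.5 and keras 2.4.3 may be incompatible
Running the code raised an error, and the error came from keras
Relationship between keras and tf.keras
Fix: change every keras in the code to tensorflow.keras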
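For a larger code base the import lines can be rewritten mechanically. The following is a minimal sketch, assuming the code only uses the simple forms "import keras", "import keras.x as y" and "from keras... import ..." at the start of a line. rewrite_keras_imports is a hypothetical helper, and other uses (for example keras mentioned inside strings) are left untouched.

```python
import re

# "import keras" (optionally followed by a comment) -> "from tensorflow import keras"
_PLAIN_IMPORT = re.compile(r"^([ \t]*)import keras([ \t]*(?:#.*)?)$", re.MULTILINE)
# "from keras.layers import Dense" / "import keras.backend as K" -> tensorflow.keras...
_DOTTED = re.compile(r"^([ \t]*)(from|import) keras(?=[. \t])", re.MULTILINE)


def rewrite_keras_imports(source: str) -> str:
    """Point simple keras import statements at tensorflow.keras instead."""
    # The plain form goes first so the name `keras` stays bound in the module.
    source = _PLAIN_IMPORT.sub(r"\1from tensorflow import keras\2", source)
    return _DOTTED.sub(r"\1\2 tensorflow.keras", source)
```

Because "import keras" becomes "from tensorflow import keras", existing references such as keras.layers.Dense in the body of the code keep working without further edits.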
Problem 6: cudnn error
Failed to get convolution algorithm.
This is probably because cuDNN failed to initialize,
so try looking to see if a warning log message was printed above.
Details
Loaded runtime CuDNN library: 8.0.5
but source was compiled with: 8.1.0.
CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Installing cudnn 8.1.0 fixes this
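The rule quoted in the error message (same major version, equal or higher minor version) is easy to check by hand, but it can also be written down as a small sketch. Both function names are hypothetical, and the version strings are the ones from the log above.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '8.1.0.77' into a tuple of integers."""
    # rstrip(".") tolerates the sentence-ending dot in the log line "8.1.0."
    return tuple(int(part) for part in text.strip().rstrip(".").split("."))


def cudnn_is_compatible(loaded: str, compiled: str) -> bool:
    """Same major version, and loaded minor version >= compiled minor version.

    Assumes both version strings have at least a major and a minor component.
    """
    loaded_v, compiled_v = parse_version(loaded), parse_version(compiled)
    return loaded_v[0] == compiled_v[0] and loaded_v[1] >= compiled_v[1]
```

With the values from the log, cudnn_is_compatible("8.0.5", "8.1.0") is False because the minor version 0 is lower than 1, while the 8.1.0.77 build that finally worked passes the check.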
pip install cudnn
ERROR: Could not find a version that satisfies the requirement cudnn (from versions: none)
ERROR: No matching distribution found for cudnn
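conda did find a cudnn package
but the default version did not meet the requirement
In the end, the command below turned out to install the correct version of cudnn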
https://anaconda.org/nvidia/cudnn
conda install -c nvidia cudnn
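Installing other CUDA versions
Configuring multiple CUDA and cuDNN versions on one server (different Linux accounts using different CUDA and cuDNN versions)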
https://www.cnblogs.com/sddai/p/10278005.html
Download link
https://developer.nvidia.com/cuda-toolkit-archive
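Download CUDA from the official site, extract it, and set the environment variables
That is, append the following lines to the end of .bashrc in your home directory, then run source .bashrc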
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/user/cuda/lib64
export PATH=$PATH:/home/user/cuda/bin
export CUDA_HOME=/home/user/cuda
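After running source .bashrc it is worth confirming that the variables actually took effect in the current shell. The sketch below is one way to do that from python. check_cuda_env and path_list_contains are hypothetical helpers, /home/user/cuda is the example path used above, and the check assumes that CUDA_HOME should name exactly one directory.

```python
import os
from typing import Dict, Mapping


def path_list_contains(value: str, directory: str) -> bool:
    """True if `directory` is one entry of a path list such as PATH."""
    wanted = os.path.normpath(directory)
    entries = (entry for entry in value.split(os.pathsep) if entry)
    return any(os.path.normpath(entry) == wanted for entry in entries)


def check_cuda_env(env: Mapping[str, str], cuda_root: str) -> Dict[str, bool]:
    """Report, per variable, whether it points at the given CUDA directory."""
    cuda_home = env.get("CUDA_HOME", "")
    return {
        "PATH": path_list_contains(env.get("PATH", ""), os.path.join(cuda_root, "bin")),
        "LD_LIBRARY_PATH": path_list_contains(
            env.get("LD_LIBRARY_PATH", ""), os.path.join(cuda_root, "lib64")
        ),
        # Assumption: CUDA_HOME holds a single directory, not a colon-separated list.
        "CUDA_HOME": bool(cuda_home)
        and os.path.normpath(cuda_home) == os.path.normpath(cuda_root),
    }
```

For example, print(check_cuda_env(os.environ, "/home/user/cuda")) prints a dictionary with one True or False per variable.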
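Unresolved questions
What an all-N matrix means compared with a matrix that has some Y entries, and whether it affects model training
My earlier understanding was that Y means the two GPUs in that pair can communicate with each other. At the moment every entry is N, yet a single program can still use multiple GPUs, so Y versus N has made no visible difference so far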
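To compare machines on this point, the matrix can be pulled out of the startup log programmatically. The sketch below only parses the log format shown earlier in this post and does not settle what Y and N mean. My reading, which is an assumption and not something verified here, is that Y marks a pair of GPUs for which direct peer-to-peer access is available. parse_interconnect_matrix and peer_pairs are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

# Matches rows such as "... gpu_device.cc:1277] 0: N Y N N"
_ROW = re.compile(r"\]\s*(\d+):\s+((?:[NY]\s*)+)$")


def parse_interconnect_matrix(log_text: str) -> Dict[int, List[bool]]:
    """Map each device id to its row of flags (True for Y, False for N)."""
    matrix = {}
    for line in log_text.splitlines():
        match = _ROW.search(line.rstrip())
        if match:
            flags = match.group(2).split()
            matrix[int(match.group(1))] = [flag == "Y" for flag in flags]
    return matrix


def peer_pairs(matrix: Dict[int, List[bool]]) -> List[Tuple[int, int]]:
    """List (row, column) pairs marked Y, assuming columns are device ids 0..n-1."""
    return [
        (row, col)
        for row in sorted(matrix)
        for col, flag in enumerate(matrix[row])
        if flag
    ]
```

For the log in this post every row is N, so peer_pairs returns an empty list.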
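Other issues:
When installing tensorflow-gpu==2.4, the file could not be found
After finding it on the site linked below, open the file details and copy the source_url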
conda install <source_url>
which installs tensorflow-gpu almost instantly
https://anaconda.org/anaconda/tensorflow-gpu/files
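Result after installing
After the install, import raised an error
Downloading the file showed that it contains only some basic metadata and no actual content
So this shortcut does not work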