nccl-tests error: how it was actually resolved in production

Latest recommended article published on 2025-05-13 15:53:17

清风 001

Licensed under CC 4.0 BY-SA

Column: AI大模型底层建设. Tags: chrome, front end

Article link: https://blog.csdn.net/jundao1997/article/details/145618484

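Error log

Resolution steps

Step 1: Confirm that OpenMPI is installed

Step 2: Locate the libmpi.so.40 file

Step 3: Update the library path

Method 1: Use ldconfig

Method 2: Set the LD_LIBRARY_PATH environment variable

Step 4: Verify that the library path has taken effect

Step 5: Re-run the test

Other possible problems and solutions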

Error log

fs@h1-6-gpu:~/nccl-tests$ cd /home/fs
fs@h1-6-gpu:~$ git clone https://github.com/NVIDIA/nccl-tests.git 
fatal: destination path 'nccl-tests' already exists and is not an empty directory.
fs@h1-6-gpu:~$ cd nccl-tests  && make MPI=1 MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi/
make -C src build BUILDDIR=/home/fs/nccl-tests/build
make[1]: Entering directory '/home/fs/nccl-tests/src'
make[1]: Leaving directory '/home/fs/nccl-tests/src'
fs@h1-6-gpu:~/nccl-tests$ ./build/all_reduce_perf -b 8 -e 512M -f 2 -g 8
./build/all_reduce_perf: error while loading shared libraries: libmpi.s
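
The log above ends with the dynamic loader failing to find a shared library, and the outline names libmpi.so.40 as the file to locate. As a rough sketch of the "locate the library" and "verify" steps, the two helper functions below (`missing_libs` and `find_lib_dir` are hypothetical names, not part of nccl-tests or OpenMPI) list the libraries a binary cannot resolve and search a directory tree for a given library file. The search root and the binary path are assumptions to adapt to your own system.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and search roots are illustrative assumptions.

# Print the names of shared libraries the loader cannot resolve for a binary.
# ldd reports such entries as "<name> => not found".
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Print the directory that contains the first file named $1 under root $2.
# Prints nothing if no such file exists.
find_lib_dir() {
    local name="$1" root="$2" hit
    hit=$(find "$root" -name "$name" -print -quit 2>/dev/null)
    if [ -n "$hit" ]; then
        dirname "$hit"
    fi
}
```

A possible use, assuming the library lives somewhere under /usr/lib: run `missing_libs ./build/all_reduce_perf` to see what is unresolved, then `export LD_LIBRARY_PATH="$(find_lib_dir libmpi.so.40 /usr/lib):$LD_LIBRARY_PATH"` for the current shell, and run `missing_libs` again to check that nothing is reported. Setting `LD_LIBRARY_PATH` only affects the current session; registering the directory with `ldconfig` is the system-wide alternative listed in the outline.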
