极智项目 | YOLO11 Object Detection: Training + TensorRT Deployment in Practice

Welcome to follow my WeChat official account [极智视界] for more of my technical posts.

Project download: link

Hi everyone, I am 极智视界. This post walks through a hands-on project: training a YOLO11 object detection model and deploying it with TensorRT.

1. Project Overview

  • Project author: 极智视界
  • Project init date: 20241001
  • Project description: train YOLO11 on the coco_minitrain_10k dataset and accelerate inference with the TensorRT Python API, including the ONNX export and onnx2trt conversion scripts
  • Project reference: the YOLO11 part is based on => https://github.com/ultralytics/ultralytics (the official repository of the YOLO11 algorithm)

2. Model Training

(1) Dataset Preparation

The dataset goes under datasets/coco_minitrain_10k.
The directory structure is as follows (a note on the label format follows the tree):

datasets/
└── coco_minitrain_10k/
    ├── annotations/
    │   ├── instances_train2017.json
    │   ├── instances_val2017.json
    │   ├── ... (other annotation files)
    ├── train2017/
    │   ├── 000000000001.jpg
    │   ├── ... (other training images)
    ├── val2017/
    │   ├── 000000000001.jpg
    │   ├── ... (other validation images)
    └── test2017/
        ├── 000000000001.jpg
        ├── ... (other test images)
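One note on labels: the ultralytics trainer reads YOLO-format txt label files, while the tree above only shows COCO-style JSON annotations. If the repository does not already ship converted labels, they can be generated with the converter bundled in ultralytics; this is only a sketch and the paths are assumptions:

# Convert COCO JSON annotations to YOLO txt labels (only needed if the
# dataset does not already include YOLO-format labels -- an assumption here)
from ultralytics.data.converter import convert_coco

convert_coco(
    labels_dir="datasets/coco_minitrain_10k/annotations",  # folder with instances_*.json
    cls91to80=True,  # map the 91 COCO category ids to the 80 used by YOLO
)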

(2) Training Environment Setup

conda create -n yolo11_py310 python=3.10

conda activate yolo11_py310

pip install -U -r train/requirements.txt
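After installation, a quick sanity check of the environment can be done from Python (a small sketch, not one of the project's scripts):

# Quick environment sanity check (not part of the project's scripts)
import torch
import ultralytics

print(ultralytics.__version__)    # installed ultralytics version
print(torch.cuda.is_available())  # should print True for GPU training
ultralytics.checks()              # prints a summary of the environment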

(3) Inference Test

First download the pretrained weights:

bash 0_download_wgts.sh

Then run the prediction test:

bash 1_run_predict_yolo11.sh

The prediction results are saved under the runs folder.
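For reference, the script wrapped by 1_run_predict_yolo11.sh is essentially a thin call into the ultralytics predict API. A minimal sketch; the source path and confidence threshold here are assumptions, not necessarily the project's exact values:

# Minimal prediction sketch with the downloaded pretrained weights
from ultralytics import YOLO

model = YOLO("wgts/yolo11n.pt")                    # weights from 0_download_wgts.sh
results = model.predict(
    source="datasets/coco_minitrain_10k/val2017",  # an image file or folder (assumed path)
    imgsz=640,
    conf=0.25,                                     # confidence threshold (assumed value)
    save=True,                                     # annotated images are written under runs/
)
print(results[0].boxes.xyxy)                       # detected boxes of the first image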

(4) Start Training

A one-click training script is already prepared; just run it:

bash 2_run_train_yolo11.sh

The code that does the work is simple and lives in train/train_yolo11.py, shown here with its imports:

import os

from ultralytics import YOLO

# curr_path: assumed to resolve to the project/script directory
curr_path = os.path.dirname(os.path.abspath(__file__))

# Load a pretrained model
model = YOLO(curr_path + "/wgts/yolo11n.pt")

# Train the model
train_results = model.train(
    data=curr_path + "/cfg/coco128.yaml",  # path to dataset YAML
    epochs=100,  # number of training epochs
    imgsz=640,  # training image size
    device="0",  # device to run on, i.e. device=0 or device=0,1,2,3 or device=cpu
)

# Evaluate model performance on the validation set
metrics = model.val()

The script mainly configures the training parameters, such as the dataset path, number of epochs, GPU id, and image size, and then kicks off training.
After training finishes, the training logs are written under the runs/train folder, including the val prediction images rendered during training.

That completes the training part.

3. Model Deployment

We deploy the model with TensorRT.

(1) Export ONNX

Run the one-click ONNX export script directly:

bash 3_run_export_onnx.sh

The script already runs an onnx-simplifier (sim) pass on the exported model.
The generated ONNX model and its simplified _sim counterpart are saved in the wgts folder.
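For reference, the export performed by 3_run_export_onnx.sh most likely amounts to an ultralytics export followed by an onnx-simplifier pass. A hedged sketch; the file names and opset are assumptions:

# Export YOLO11 weights to ONNX and simplify the graph
import onnx
from onnxsim import simplify
from ultralytics import YOLO

model = YOLO("wgts/yolo11n.pt")
onnx_path = model.export(format="onnx", imgsz=640, opset=12)  # returns the exported file path

sim_model, ok = simplify(onnx.load(onnx_path))                 # onnx-simplifier pass
assert ok, "onnx-simplifier check failed"
onnx.save(sim_model, onnx_path.replace(".onnx", "_sim.onnx"))  # e.g. wgts/yolo11n_sim.onnx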

(2) Install the TensorRT Environment

Download the TAR package of the matching TensorRT version from the NVIDIA website (https://developer.nvidia.com/tensorrt/download) and unpack it.
The basic steps are as follows:

tar zxvf TensorRT-xxx-.tar.gz

# symlink trtexec so it is on PATH
sudo ln -s /path/to/TensorRT/bin/trtexec /usr/local/bin
# verify it works
trtexec --help

# install the TensorRT Python bindings
cd python
pip install tensorrt-xxx.whl
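After installing the wheel, verify that the Python bindings import correctly and report the expected version:

# Verify the TensorRT Python bindings
import tensorrt as trt

print(trt.__version__)  # e.g. 10.5.0, matching the trtexec build below
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print("TensorRT builder created:", builder is not None)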

(3) Build the TensorRT Engine

Run the one-click engine build script directly:

bash 4_build_trt_engine.sh

On success, yolo11n.engine is generated under the wgts folder, along with logs similar to the following (a Python-API alternative to trtexec is sketched after the log):

[10/02/2024-21:28:48] [V] === Explanations of the performance metrics ===
[10/02/2024-21:28:48] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[10/02/2024-21:28:48] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[10/02/2024-21:28:48] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/02/2024-21:28:48] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/02/2024-21:28:48] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[10/02/2024-21:28:48] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[10/02/2024-21:28:48] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[10/02/2024-21:28:48] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[10/02/2024-21:28:48] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100500] [b18] # trtexec --onnx=../wgts/yolo11n_sim.onnx --saveEngine=../wgts/yolo11n.engine --fp16 --verbose
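The build script wraps trtexec (the exact command is visible at the end of the log above). As an alternative, the same FP16 engine can be built through the TensorRT Python API; the following is only a sketch, assuming TensorRT 10 and the paths used above:

# Build an FP16 engine from the simplified ONNX via the TensorRT Python API
# (alternative to trtexec; assumes TensorRT 10, where networks are always explicit-batch)
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)

with open("wgts/yolo11n_sim.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # same effect as trtexec --fp16
serialized = builder.build_serialized_network(network, config)

with open("wgts/yolo11n.engine", "wb") as f:
    f.write(serialized)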

(4) Run TensorRT Inference

Run the one-click inference script directly:

bash 5_infer_trt.sh

The actual TensorRT inference script is deploy/infer_trt.py.
A successful run prints a log like this:

 ------ trt infer success! ------ 

The inference result is saved to deploy/output.jpg.
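For reference, deploy/infer_trt.py roughly needs to deserialize the engine, copy the input to the GPU, execute, and copy the outputs back. The following is only a sketch of those steps using the TensorRT 10 tensor API with pycuda, with a dummy input and without the project's real pre/post-processing:

# Sketch of TensorRT-10-style inference with pycuda (dummy input, no pre/post-processing)
import numpy as np
import pycuda.autoinit  # noqa: F401  creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("wgts/yolo11n.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# YOLO11n exported at imgsz=640 expects a 1x3x640x640 float32 input (assumption)
inp = np.random.rand(1, 3, 640, 640).astype(np.float32)

stream = cuda.Stream()
device_bufs, host_outputs = {}, {}
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = tuple(context.get_tensor_shape(name))
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    device_bufs[name] = cuda.mem_alloc(int(np.prod(shape)) * np.dtype(dtype).itemsize)
    context.set_tensor_address(name, int(device_bufs[name]))
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        cuda.memcpy_htod_async(device_bufs[name], np.ascontiguousarray(inp), stream)
    else:
        host_outputs[name] = np.empty(shape, dtype=dtype)

context.execute_async_v3(stream.handle)                     # run inference
for name, out in host_outputs.items():
    cuda.memcpy_dtoh_async(out, device_bufs[name], stream)  # fetch outputs
stream.synchronize()
print({name: out.shape for name, out in host_outputs.items()})  # e.g. (1, 84, 8400)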

That wraps up this hands-on project on YOLO11 object detection training and TensorRT deployment. I hope this post helps your learning a little.
