SAHI Prediction Features in Depth: A Practical Guide to Sliced Inference and Batch Prediction

Originally published 2025-06-09 09:03:40

License: CC 4.0 BY-SA

sahi: Framework agnostic sliced/tiled inference + interactive ui + error analysis plots. Project URL: https://gitcode.com/gh_mirrors/sa/sahi

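1. Overview of SAHI Prediction Features

SAHI (Slicing Aided Hyper Inference) is an open-source library focused on improving object detection performance. Its main strength is handling detection on large or high-resolution images, where objects are often small relative to the frame. By running inference on slices of the image, SAHI works around the memory limits and the accuracy drop that conventional detectors run into when a large image is processed in one pass.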

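2. Core Prediction Methods

2.1 Sliced Inference

Sliced inference is SAHI's most distinctive feature. It splits a large image into overlapping smaller slices, runs detection on each slice, and then merges the results. This approach is a particularly good fit for high-resolution medical images, satellite and remote-sensing imagery, and similar scenarios.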

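from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

# Initialize the model (several frameworks are supported)
detection_model = AutoDetectionModel.from_pretrained(
    model_type='mmdet',  # options: mmdet/ultralytics/huggingface/torchvision
    model_path='path/to/model',
    config_path='path/to/config'  # only required by some frameworks
)

# Placeholder input: an image path or a numpy array
image = 'path/to/image.jpg'

# Run sliced prediction
result = get_sliced_prediction(
    image,
    detection_model,
    slice_height=256,  # slice height in pixels
    slice_width=256,   # slice width in pixels
    overlap_height_ratio=0.2,  # vertical overlap between adjacent slices
    overlap_width_ratio=0.2    # horizontal overlap between adjacent slices
)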
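
Conceptually, each slice is detected on its own, so its boxes come back in slice-local coordinates and have to be shifted into full-image coordinates before duplicates from overlapping slices are merged. The sketch below illustrates that idea with plain greedy NMS. It is a simplified stand-in rather than SAHI's actual postprocessing, and `shift_box`, `iou`, and `merge_detections` are hypothetical helper names introduced only for illustration.

```python
def shift_box(box, offset_x, offset_y):
    """Move a slice-local [x1, y1, x2, y2] box into full-image coordinates."""
    x1, y1, x2, y2 = box
    return [x1 + offset_x, y1 + offset_y, x2 + offset_x, y2 + offset_y]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs given in full-image coordinates.

    Assumption: a single class; real pipelines usually merge per class.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not heavily overlap a higher-scoring one
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# The same object seen by two neighbouring slices (offsets are illustrative)
from_slice_a = (shift_box([200, 100, 250, 150], 0, 0), 0.9)
from_slice_b = (shift_box([0, 100, 45, 150], 205, 0), 0.6)
print(merge_detections([from_slice_a, from_slice_b]))
```

After shifting, the two boxes overlap almost entirely, so only the higher-scoring one survives the merge.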

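Key parameters:

slice_height / slice_width: control the slice size; tune them to fit the available GPU memory
overlap_height_ratio / overlap_width_ratio: overlap between adjacent slices, which keeps objects from being cut at slice borders; typically 0.1 to 0.3
Supported frameworks include MMDetection, the YOLO family, HuggingFace, and other mainstream options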
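
To get a feel for how slice size and overlap ratio interact, the following sketch computes a slice grid in pure Python. It mirrors the general idea only: `compute_slice_boxes` is a hypothetical helper written for this article, not SAHI's own slicing function, and the rounding and edge handling are assumptions.

```python
def compute_slice_boxes(image_height, image_width,
                        slice_height=256, slice_width=256,
                        overlap_height_ratio=0.2, overlap_width_ratio=0.2):
    """Return [x_min, y_min, x_max, y_max] slice boxes covering the image."""
    # Step between slice origins: slice size minus the overlapping part
    y_step = slice_height - int(slice_height * overlap_height_ratio)
    x_step = slice_width - int(slice_width * overlap_width_ratio)

    boxes = []
    y_min = 0
    while True:
        y_max = min(y_min + slice_height, image_height)
        # Pull the last row back so it keeps the full slice height if possible
        y_start = max(0, y_max - slice_height)
        x_min = 0
        while True:
            x_max = min(x_min + slice_width, image_width)
            x_start = max(0, x_max - slice_width)
            boxes.append([x_start, y_start, x_max, y_max])
            if x_max >= image_width:
                break
            x_min += x_step
        if y_max >= image_height:
            break
        y_min += y_step
    return boxes


boxes = compute_slice_boxes(512, 512)
print(len(boxes))   # 9
print(boxes[1])     # [205, 0, 461, 256]
```

With 256-pixel slices and a 0.2 overlap, the step between slices is 205 pixels, so a 512x512 image needs a 3x3 grid. Smaller slices or larger overlaps increase the slice count, and with it the inference time.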

2.2 Standard Inference

For images of ordinary size, you can use standard inference directly:

from sahi.predict import get_prediction

result = get_prediction(
    image,           # image path or numpy array
    detection_model  # a model initialized the same way as above
)
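
Once a result is available, a common next step is to inspect what was detected. The sketch below counts detections per class. `summarize_predictions` is a hypothetical helper, and it assumes the result exposes `object_prediction_list` whose items carry `category.name` and `score.value`, as SAHI's prediction objects do.

```python
from collections import Counter


def summarize_predictions(result, min_score=0.0):
    """Count detections per class name, ignoring scores below min_score."""
    counts = Counter()
    for pred in result.object_prediction_list:
        if pred.score.value >= min_score:
            counts[pred.category.name] += 1
    return dict(counts)
```

For example, `summarize_predictions(result, min_score=0.5)` would return a dictionary mapping each class name to the number of detections scoring at least 0.5.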

2.3 Batch Inference

SAHI provides a batch prediction interface that processes every image in a folder:

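from sahi.predict import predict

# predict() writes its outputs to disk; return_dict=True makes it
# return the export directory instead of None
result = predict(
    model_type='ultralytics',
    model_path='yolov8n.pt',
    source='image_folder/',
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    model_device='cuda:0',  # run on the GPU
    return_dict=True
)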
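
If you need the per-image result objects in Python rather than files on disk, one option is to loop over the folder yourself. The sketch below is one way to do that under stated assumptions: `list_images` and `predict_folder` are hypothetical helpers, the suffix list is illustrative, and the prediction call is passed in as a callable so the loop stays independent of any particular model.

```python
from pathlib import Path

# Illustrative set of suffixes; adjust to the formats in your dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return image files directly inside folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def predict_folder(folder, predict_one):
    """Apply predict_one(image_path: str) to every image in folder.

    Returns a dict mapping file name to whatever predict_one returns.
    """
    return {path.name: predict_one(str(path)) for path in list_images(folder)}
```

In practice, `predict_one` could be something like `lambda p: get_sliced_prediction(p, detection_model, slice_height=512, slice_width=512)`, reusing the model initialized earlier.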

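3. Advanced Features

3.1 Class Filtering

In practice we often care about only some object classes. SAHI lets you exclude the classes you do not need, either by name or by ID: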

# Exclude by class name
exclude_classes_by_name = ["person", "car"]

# Or exclude by class ID
exclude_classes_by_id = [0, 2]

result = get_sliced_prediction(
    image,
    detection_model,
    exclude_classes_by_name=exclude_classes_by_name  # or pass exclude_classes_by_id instead
)
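
The same kind of filtering can also be applied after the fact, for instance when you want to keep the raw result and derive several filtered views from it. The sketch below shows the idea on a list of predictions. `filter_predictions` is a hypothetical helper, not part of SAHI, and it assumes each prediction exposes `category.name` and `category.id`.

```python
def filter_predictions(predictions, exclude_names=None, exclude_ids=None):
    """Drop predictions whose class name or class ID is in an exclusion list."""
    exclude_names = set(exclude_names or [])
    exclude_ids = set(exclude_ids or [])
    return [
        pred for pred in predictions
        if pred.category.name not in exclude_names
        and pred.category.id not in exclude_ids
    ]
```

A call such as `filter_predictions(result.object_prediction_list, exclude_names=["person", "car"])` would return only the remaining detections and leave the original list untouched.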

3.2 Visualizing and Exporting Results

SAHI offers a range of visualization options and several export formats:

# Export a customized visualization
result.export_visuals(
    export_dir="output/",
    text_size=1.5,      # label text size
    rect_th=3,          # bounding box thickness
    color=(0, 255, 0),  # RGB color value
    export_format="png"  # jpg/png supported
)

# Export in COCO format
coco_annotations = result.to_coco_annotations()
coco_predictions = result.to_coco_predictions(image_id=1)

# Export in a FiftyOne-compatible format
fo_detections = result.to_fiftyone_detections()
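
For readers unfamiliar with COCO-style predictions, the sketch below shows the core of the standard COCO results format: boxes are stored as [x, y, width, height] rather than as corner coordinates. `to_coco_prediction` is a hypothetical helper written for illustration; the records SAHI produces may contain additional keys.

```python
import json


def to_coco_prediction(box_xyxy, score, category_id, image_id):
    """Build one COCO-results-style record from a corner-format box."""
    x1, y1, x2, y2 = box_xyxy
    return {
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, width, height]
        "score": score,
    }


record = to_coco_prediction([10, 20, 50, 80], score=0.9, category_id=3, image_id=1)
print(json.dumps(record))
```

Here the corner box [10, 20, 50, 80] becomes the COCO box [10, 20, 40, 60]. A list of such records can be written to a JSON file for COCO-style evaluation.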