目标定位程序实现

MR_Colorful

已于 2025-04-10 10:14:14 修改

阅读量2.4k

点赞数 8

CC 4.0 BY-SA版权

分类专栏：视觉抓取文章标签：数码相机

于 2025-01-04 16:31:35 首次发布

本文链接：https://blog.csdn.net/MR_Colorful/article/details/144931121

视觉抓取专栏收录该内容

3 篇文章

订阅专栏

1：载入双目相机参数

2: 双目测距

2.1 消除畸变

在立体视觉系统中，在立体校正前进行畸变消除是必要的，因为相机的镜头通常会引入几何畸变，而这些畸变会影响图像的几何一致性，从而影响立体校正和后续的深度计算。

# 消除畸变
def undistortion(image, camera_matrix, dist_coeff):
    undistortion_image = cv2.undistort(image, camera_matrix, dist_coeff)

    return undistortion_image

2.2 立体校正

2.2.1 校正目的

立体校正利用双目标定的内外参数（焦距、成像原点、畸变系数）和双目相对位置关系（旋转矩阵和平移向量），分别对左右视图进行消除畸变和行对准，使得左右视图的成像原点坐标一致、两摄像头光轴平行、左右成像平面共面、对极线行对齐。

校正前的左右相机的光心并不是平行的，两个光心的连线就叫基线，像平面与基线的交点就是极点，像点与极点所在的直线就是极线，左右极线与基线构成的平面就是空间点对应的极平面。
校正后，极点在无穷远处，两个相机的光轴平行。像点在左右图像上的高度一致。这也就是极线校正的目标。校正后做后续的立体匹配时，只需在同一行上搜索左右像平面的匹配点即可，能使效率大大提高。

2.2.2 校正方法

        利用OpenCV中的stereoRectify()函数实现立体校正，stereoRectify()函数内部采用的是Bouguet的极线校正算法,Bouguet算法步骤：
        1、将右图像平面相对于左图像平面的旋转矩阵分解成两个矩阵Rl和Rr，叫做左右相机的合成旋转矩阵
        2、将左右相机各旋转一半，使得左右相机的光轴平行。此时左右相机的成像面达到平行，但是基线与成像平面不平行
        3、构造变换矩阵Rrect使得基线与成像平面平行。构造的方法是通过右相机相对于左相机的偏移矩阵T完成的
        4、通过合成旋转矩阵与变换矩阵相乘获得左右相机的整体旋转矩阵。左右相机坐标系乘以各自的整体旋转矩阵就可使得左右相机的主光轴平行，且像平面与基线平行
        5、通过上述的两个整体旋转矩阵，就能够得到理想的平行配置的双目立体系图像。校正后根据需要对图像进行裁剪，需重新选择一个图像中心，和图像边缘从而让左、右叠加部分最大

2.2.3 相关代码

def getRectifyTransform(height, width, config):
    # 读取内参和外参
    left_K = config.cam_matrix_left
    right_K = config.cam_matrix_right
    left_distortion = config.distortion_l
    right_distortion = config.distortion_r
    R = config.R
    T = config.T

    # 计算校正变换
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(left_K, left_distortion, right_K, right_distortion,
                                                      (width, height), R, T, alpha=0)
    map1x, map1y = cv2.initUndistortRectifyMap(left_K, left_distortion, R1, P1, (width, height), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(right_K, right_distortion, R2, P2, (width, height), cv2.CV_32FC1)

    return map1x, map1y, map2x, map2y, Q


# 畸变校正和立体校正
def rectifyImage(image1, image2, map1x, map1y, map2x, map2y):
    rectifyed_img1 = cv2.remap(image1, map1x, map1y, cv2.INTER_AREA)
    rectifyed_img2 = cv2.remap(image2, map2x, map2y, cv2.INTER_AREA)

    return rectifyed_img1, rectifyed_img2

2.3 立体匹配和视差计算

立体匹配也称作视差估计，立体匹配可划分为四个步骤：匹配代价计算、代价聚合、视差计算和视差优化。立体校正后的左右两幅图像得到后，匹配点是在同一行上的，可以使用OpenCV中的BM算法或者SGBM算法计算视差图。由于SGBM算法的表现要远远优于BM算法，因此采用SGBM算法获取视差图。在立体匹配生成视差图后，可以对视差图进行后处理，如滤波，空洞填充等方法，从而改善视差图的视觉效果

2.3.1 相关代码

def stereoMatchSGBM(left_image, right_image, down_scale=False):
    # SGBM匹配参数设置
    if left_image.ndim == 2:
        img_channels = 1
    else:
        img_channels = 3
    blockSize = 3
    paraml = {'minDisparity': 0,
              'numDisparities': 64,
              'blockSize': blockSize,
              'P1': 8 * img_channels * blockSize ** 2,
              'P2': 32 * img_channels * blockSize ** 2,
              'disp12MaxDiff': 1,
              'preFilterCap': 63,
              'uniquenessRatio': 15,
              'speckleWindowSize': 100,
              'speckleRange': 1,
              'mode': cv2.STEREO_SGBM_MODE_SGBM_3WAY
              }

    # 构建SGBM对象
    left_matcher = cv2.StereoSGBM_create(**paraml)
    paramr = paraml
    paramr['minDisparity'] = -paraml['numDisparities']
    right_matcher = cv2.StereoSGBM_create(**paramr)

    # 计算视差图
    size = (left_image.shape[1], left_image.shape[0])
    if down_scale == False:
        disparity_left = left_matcher.compute(left_image, right_image)
        disparity_right = right_matcher.compute(right_image, left_image)
    else:
        left_image_down = cv2.pyrDown(left_image)
        right_image_down = cv2.pyrDown(right_image)
        factor = left_image.shape[1] / left_image_down.shape[1]

        disparity_left_half = left_matcher.compute(left_image_down, right_image_down)
        disparity_right_half = right_matcher.compute(right_image_down, left_image_down)
        disparity_left = cv2.resize(disparity_left_half, size, interpolation=cv2.INTER_AREA)
        disparity_right = cv2.resize(disparity_right_half, size, interpolation=cv2.INTER_AREA)
        disparity_left = factor * disparity_left
        disparity_right = factor * disparity_right

    # 真实视差（因为SGBM算法得到的视差是×16的）
    trueDisp_left = disparity_left.astype(np.float32) / 16.
    trueDisp_right = disparity_right.astype(np.float32) / 16.

    return trueDisp_left, trueDisp_right

2.4 深度计算

得到视差图后，计算像素深度值，公式：depth = ( f * baseline) / disp
其中，depth表示深度图；f表示归一化的焦距，也就是内参中的fx； baseline是两个相机光心之间的距离，称作基线距离；disp是视差值。直接利用opencv中的cv2.reprojectImageTo3D（）函数计算深度图，代码如下：

def getDepthMapWithQ(disparityMap: np.ndarray, Q: np.ndarray) -> np.ndarray:
    points_3d = cv2.reprojectImageTo3D(disparityMap, Q)
    depthMap = points_3d[:, :, 2]
    reset_index = np.where(np.logical_or(depthMap < 0.0, depthMap > 65535.0))
    depthMap[reset_index] = 0
    return depthMap.astype(np.float32)

3：修改YOLOv5中detect.py中的部分代码

# Run inference
model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
seen, windows, dt = 0, [], (Profile(), Profile(), Profile())# 立体校正配置
config = stereoconfig.stereoCamera()

# 获取立体校正变换矩阵
map1x, map1y, map2x, map2y, Q = getRectifyTransform(720, 1280, config)

# 遍历数据集
for path, im, im0s, vid_cap, s in dataset:
    # 将图像转换为模型所需的格式
    with dt[0]:
        im = torch.from_numpy(im).to(model.device)
        im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # 扩展为批处理维度

    # 推理过程
    with dt[1]:
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(im, augment=augment, visualize=visualize)

    # 非极大值抑制
    with dt[2]:
        pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

    # 定义CSV文件路径
    csv_path = save_dir / 'predictions.csv'

    # 写入CSV文件的函数
    def write_to_csv(image_name, prediction, confidence):
        data = {'Image Name': image_name, 'Prediction': prediction, 'Confidence': confidence}
        with open(csv_path, mode='a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=data.keys())
            if not csv_path.is_file():
                writer.writeheader()
            writer.writerow(data)

    # 处理预测结果
    for i, det in enumerate(pred):  # 每张图片
        seen += 1
        if webcam:  # batch_size >= 1
            p, im0, frame = path[i], im0s[i].copy(), dataset.count
            s += f'{i}: '
        else:
            p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

        # 启动立体视觉线程
        thread = MyThread(stereo_threading, args=(config, im0, map1x, map1y, map2x, map2y, Q))
        thread.start()
        p = Path(p)  # 转换为Path对象
        save_path = str(save_dir / p.name)  # 图片保存路径
        txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # 标签保存路径
        s += '%gx%g ' % im.shape[2:]  # 打印图像尺寸
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # 归一化增益
        imc = im0.copy() if save_crop else im0  # 是否保存裁剪图像
        annotator = Annotator(im0, line_width=line_thickness, example=str(names))
        if len(det):
            # 将检测框从图像尺寸缩放到原始图像尺寸
            det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

            # 打印结果
            for c in det[:, 5].unique():
                n = (det[:, 5] == c).sum()  # 每类的检测数
                s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # 添加到字符串

            # 写入结果
            for *xyxy, conf, cls in reversed(det):
                c = int(cls)  # 类别整数
                label = names[c] if hide_conf else f'{names[c]}'
                confidence = float(conf)
                confidence_str = f'{confidence:.2f}'

                # 写入CSV文件
                if save_csv:
                    write_to_csv(p.name, label, confidence_str)

                # 写入TXT文件
                if save_txt:  # 写入文件
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # 归一化xywh
                    line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # 标签格式
                    with open(f'{txt_path}.txt', 'a') as f:
                        f.write(('%g ' * len(line)).rstrip() % line + '\n')

                # 在图像上添加检测框
                if save_img or save_crop or view_img:
                    c = int(cls)  # 类别整数
                    label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                    annotator.box_label(xyxy, label, color=colors(c, True))
                    # 双目测距代码
                    x_center = (xyxy[0] + xyxy[2]) / 2
                    y_center = (xyxy[1] + xyxy[3]) / 2
                    x_0 = int(x_center)
                    y_0 = int(y_center)
                    if 0 < x_0 < 1280:
                        x1 = xyxy[0]
                        x2 = xyxy[2]
                        y1 = xyxy[1]
                        y2 = xyxy[3]
                        thread.join()
                        points_3d = thread.get_result()
                        a = points_3d[int(y_0), int(x_0), 0] / 1000
                        b = points_3d[int(y_0), int(x_0), 1] / 1000
                        c = points_3d[int(y_0), int(x_0), 2] / 1000
                        distance = ((a ** 2 + b ** 2 + c ** 2) ** 0.5)
                        print(f'目标 {label} 的三维坐标为: ({a:.2f}, {b:.2f}, {c:.2f}) 米')
                        if distance != 0:
                            c = int(cls)  # 类别整数
                            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                            annotator.box_label(xyxy, label, color=colors(c, True))
                            print('点 (%d, %d) 的 %s 距离左摄像头的相对距离为 %0.2f m' % (
                                x_center, y_center, label, distance))
                            text_dis_avg = "dis:%0.2fm" % distance
                            cv2.putText(im0, text_dis_avg, (int(x1 + (x2 - x1) + 5), int(y1 + 30)), cv2.FONT_ITALIC,
                                        1.2, (255, 255, 255), 3)
        if save_crop:
            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

    # 显示结果
    im0 = annotator.result()
    if view_img:
        cv2.namedWindow("Webcam", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Webcam", 1280, 480)
        cv2.moveWindow("Webcam", 0, 100)
        cv2.imshow("Webcam", im0)
        cv2.waitKey(1)