Paper Notes: Channel Augmented Joint Learning for Visible-Infrared Recognition

CH-Yuan

Last modified: 2022-07-25 12:49:13

License: CC 4.0 BY-SA

First published: 2022-07-25 12:47:26

Article link: https://blog.csdn.net/yuanchheneducn/article/details/125915464

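This paper proposes data augmentation methods for visible-infrared image matching, namely random channel exchange and channel-level random erasing, which make the model more robust to color variation. It also introduces an enhanced channel-mixed learning strategy that uses a squared difference to handle intra- and cross-modality variations simultaneously, and a joint learning strategy that explicitly optimizes the outputs of the augmented images. The experiments show that these methods improve cross-modality recognition performance.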

code: https://gitee.com/mindspore/contrib/tree/master/papers/CAJ

Motivation:

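Existing image augmentation strategies mainly target single-modality visible images and do not take into account the image properties that matter when matching visible images against infrared images.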

Main contributions:

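Data augmentation: randomly exchanging color channels generates color-independent images. The operation can be combined with existing augmentation methods to improve robustness to color variation; paired with random erasing, it also simulates random occlusion and enriches image diversity.
Cross-modality metric learning: a Channel-mixed learning strategy uses a squared difference to handle intra- and cross-modality variations simultaneously; a further channel-augmented joint learning strategy explicitly optimizes the outputs of the augmented images.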

Image augmentation

Random Channel Exchangeable Augmentation

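This augmentation can be understood as follows: one of the three channels of the visible image is picked uniformly at random and used to produce a single-channel-like image. This encourages the model to learn the relation between each color channel of the visible image and the single-channel infrared image.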
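To make the effect concrete, here is a per-pixel sketch of the possible outcomes. It is an illustration only: the function names are hypothetical, the pixel value is made up, and a real implementation works on whole image tensors rather than single pixels. The grayscale weights are the usual luma weights for R, G, B.

```python
import random

# Luma weights for the optional grayscale outcome (R, G, B order).
GRAY_WEIGHTS = (0.2989, 0.5870, 0.1140)


def exchange_pixel(pixel, idx):
    """Return the pixel after channel exchange with outcome `idx`.

    idx 0, 1, 2 replicate the R, G, or B value across all three channels;
    any other idx uses the weighted gray value instead.
    """
    if idx in (0, 1, 2):
        value = pixel[idx]
    else:
        value = sum(w * c for w, c in zip(GRAY_WEIGHTS, pixel))
    return (value, value, value)


def random_exchange(pixel, gray=2, rng=random):
    """Draw idx uniformly from 0..gray (both ends inclusive) and apply it."""
    return exchange_pixel(pixel, rng.randint(0, gray))


pixel = (0.2, 0.5, 0.9)  # hypothetical RGB value in [0, 1]
for idx in range(4):
    print(idx, exchange_pixel(pixel, idx))
```

Whichever outcome is drawn, the three output channels are identical, so the color cue is removed while the image keeps the three-channel shape the network expects.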
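Channel-Level Random Erasing (CRE)

The erased region is replaced with the mean values of the R, G and B channels taken from ImageNet.
In addition, grayscale transformation (GA) and random horizontal flip (FP) are also used.

The code is as follows: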

from __future__ import absolute_import
import random
import math



class ChannelAdap:
    """Replicate one randomly chosen color channel across all three channels.

    With equal chance the R, G, or B channel is copied into the other two,
    or the image is left unchanged. `img` is a (3, H, W) tensor in RGB order
    and is modified in place.

    Args:
         probability: Stored but currently unused; the check that would
             apply it is commented out in `__call__`.
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):

        # if random.uniform(0, 1) > self.probability:
            # return img

        # random.randint is inclusive on both ends, so idx is 0, 1, 2 or 3.
        idx = random.randint(0, 3)

        if idx == 0:
            # Keep the R channel (index 0).
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # Keep the G channel (index 1).
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # Keep the B channel (index 2).
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        # idx == 3: leave the image unchanged.

        return img

class ChannelAdapGray:
    """Replicate one color channel, or optionally convert to grayscale.

    With equal chance the R, G, or B channel is copied into the other two.
    In the fourth case the image is converted to grayscale with probability
    `probability` and left unchanged otherwise. `img` is a (3, H, W) tensor
    in RGB order and is modified in place.

    Args:
         probability: Probability of the grayscale conversion in the fourth case.
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):

        # if random.uniform(0, 1) > self.probability:
            # return img

        # random.randint is inclusive on both ends, so idx is 0, 1, 2 or 3.
        idx = random.randint(0, 3)
        if idx == 0:
            # Keep the R channel (index 0).
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # Keep the G channel (index 1).
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # Keep the B channel (index 2).
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        elif random.uniform(0, 1) <= self.probability:
            # Weighted grayscale conversion, written to all three channels.
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        # Otherwise the image is left unchanged.
        return img

class ChannelRandomErasing:
    """Randomly select a rectangular region of an image and erase its pixels.

    Follows 'Random Erasing Data Augmentation' by Zhong et al. `img` is a
    (C, H, W) tensor and is modified in place.

    Args:
         probability: Probability that the erasing operation is performed.
         sl: Minimum proportion of erased area against input image.
         sh: Maximum proportion of erased area against input image.
         r1: Minimum aspect ratio of erased area (the maximum is 1 / r1).
    """

    def __init__(self, probability=0.5, sl=0.02, sh=0.4, r1=0.3):
        self.probability = probability
        # Per-channel fill values (R, G, B). Note: these numbers match the
        # widely used CIFAR-10 channel means; the usual ImageNet means are
        # (0.485, 0.456, 0.406).
        self.mean = [0.4914, 0.4822, 0.4465]
        self.sl = sl
        self.sh = sh
        self.r1 = r1

    def __call__(self, img):
        if random.uniform(0, 1) > self.probability:
            return img
        # Try up to 100 times to sample a box that fits inside the image.
        for _ in range(100):
            area = img.shape[1] * img.shape[2]

            target_area = random.uniform(self.sl, self.sh) * area
            aspect_ratio = random.uniform(self.r1, 1/self.r1)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            if w < img.shape[2] and h < img.shape[1]:
                # x1 is the top row and y1 the left column of the box.
                x1 = random.randint(0, img.shape[1] - h)
                y1 = random.randint(0, img.shape[2] - w)
                if img.shape[0] == 3:
                    img[0, x1:x1+h, y1:y1+w] = self.mean[0]
                    img[1, x1:x1+h, y1:y1+w] = self.mean[1]
                    img[2, x1:x1+h, y1:y1+w] = self.mean[2]
                else:
                    # Input without exactly 3 channels: only channel 0 is erased.
                    img[0, x1:x1+h, y1:y1+w] = self.mean[0]
                return img

        # No fitting box was found; return the image unchanged.
        return img

class ChannelExchange:
    """Replicate one randomly chosen color channel across all three channels.

    `img` is a (3, H, W) tensor in RGB order and is modified in place.

    Args:
         gray: Upper bound (inclusive) of the random index. With the default
             gray=2 only the R, G, B cases occur; with gray=3 the grayscale
             conversion is drawn as a fourth, equally likely case.
    """
    def __init__(self, gray=2):
        self.gray = gray

    def __call__(self, img):
        # random.randint is inclusive on both ends.
        idx = random.randint(0, self.gray)
        if idx == 0:
            # Keep the R channel (index 0).
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # Keep the G channel (index 1).
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # Keep the B channel (index 2).
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        else:
            # Weighted grayscale conversion, written to all three channels.
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        return img
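
The box-sampling step of the random erasing can be looked at in isolation. The following is a minimal standard-library sketch of that geometry only; the function name `sample_erase_box`, the `rng` and `attempts` parameters, and the 288x144 input size are assumptions made for illustration, not part of the listing above.

```python
import math
import random


def sample_erase_box(height, width, sl=0.02, sh=0.4, r1=0.3,
                     rng=random, attempts=100):
    """Sample (top, left, h, w) of a box to erase, or None if none fits.

    The box area is a random fraction in [sl, sh] of the image area and
    its aspect ratio h / w is drawn from [r1, 1 / r1].
    """
    area = height * width
    for _ in range(attempts):
        target_area = rng.uniform(sl, sh) * area
        aspect_ratio = rng.uniform(r1, 1 / r1)
        h = int(round(math.sqrt(target_area * aspect_ratio)))
        w = int(round(math.sqrt(target_area / aspect_ratio)))
        # Accept only boxes that fit strictly inside the image.
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    return None


# Example with an assumed 288x144 (height x width) input.
print(sample_erase_box(288, 144, rng=random.Random(0)))
```

Because h * w is approximately target_area and h / w is approximately aspect_ratio, the two square roots are what turn an (area, ratio) pair into a height and a width. Sampling can fail when the drawn box is larger than the image, which is why the loop retries a bounded number of times.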

Cross-modality metric learning

1. Enhanced Channel-Mixed Learning
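Build a batch that contains images from different modalities and optimize the relations among them directly, without taking the modality difference into account. The objective combines an identity loss with a weighted regularization triplet loss.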
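
For reference, the weighted regularization triplet loss is commonly written as below. This is a sketch of the general formulation of that loss, not a copy of the paper's equation, so the notation is an assumption: $d^{p}_{ij}$ is the distance between an anchor $x_i$ and a positive $x_j$ (same identity), $d^{n}_{ik}$ is the distance to a negative $x_k$ (different identity), and $\mathcal{P}_i$, $\mathcal{N}_i$ are the positive and negative sets of $x_i$ within the batch. The squared-difference enhancement mentioned earlier is not reproduced here.

```latex
\mathcal{L}_{wrt}(i) = \log\Bigl(1 + \exp\Bigl(\sum_{j} w^{p}_{ij}\, d^{p}_{ij} - \sum_{k} w^{n}_{ik}\, d^{n}_{ik}\Bigr)\Bigr),
\qquad
w^{p}_{ij} = \frac{\exp\bigl(d^{p}_{ij}\bigr)}{\sum_{d^{p}_{ij} \in \mathcal{P}_i} \exp\bigl(d^{p}_{ij}\bigr)},
\quad
w^{n}_{ik} = \frac{\exp\bigl(-d^{n}_{ik}\bigr)}{\sum_{d^{n}_{ik} \in \mathcal{N}_i} \exp\bigl(-d^{n}_{ik}\bigr)}
```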

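Note that pj and pk may come from the same modality or from different modalities. This is presumably what the authors mean by "mixed": images are drawn at random from a batch made up of both modalities, so the modality difference is ignored and intra- and inter-modality learning are optimized directly.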
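
The "mixed" idea can be sketched in a few lines. The following is a minimal standard-library illustration of a weighted regularization triplet loss over a mixed batch, under stated assumptions: the function names and the toy features are hypothetical, the anchor is excluded from its own positive set, and the squared-difference enhancement is left out. It is not the authors' implementation.

```python
import math


def softmax_weights(distances, sign):
    """Softmax over sign * distance, so the weights sum to 1."""
    scaled = [sign * d for d in distances]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mixed_wrt_loss(features, labels):
    """Weighted regularization triplet loss averaged over the batch anchors.

    Only identity labels are consulted; the modality of a sample is never
    looked at, so positives and negatives may come from either modality.
    """
    losses = []
    for i, anchor in enumerate(features):
        pos, neg = [], []
        for j, other in enumerate(features):
            if j == i:
                continue  # assumption: the anchor is not its own positive
            d = math.dist(anchor, other)
            (pos if labels[j] == labels[i] else neg).append(d)
        if not pos or not neg:
            continue  # this anchor forms no valid triplet in the batch
        w_pos = softmax_weights(pos, +1.0)  # farther positives weigh more
        w_neg = softmax_weights(neg, -1.0)  # closer negatives weigh more
        gap = (sum(w * d for w, d in zip(w_pos, pos))
               - sum(w * d for w, d in zip(w_neg, neg)))
        losses.append(softplus(gap))
    return sum(losses) / len(losses) if losses else 0.0


# Toy mixed batch: two identities, each seen once per modality (made-up values).
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
modalities = ["visible", "infrared", "visible", "infrared"]  # for the reader only
labels = [0, 0, 1, 1]
print(mixed_wrt_loss(features, labels))
```

The `modalities` list is never passed to the loss, which is the point: the same identity is pulled together and different identities are pushed apart regardless of which modality each image came from.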