Paper reading: Channel Augmented Joint Learning for Visible-Infrared Recognition
code: https://gitee.com/mindspore/contrib/tree/master/papers/CAJ
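Motivation:
Existing image augmentation strategies are mainly designed for single-modality visible images and do not take into account the imagery properties involved in visible-infrared matching.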
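Main contributions:
- Data augmentation: color-irrelevant images are generated by randomly exchanging the color channels. This can be combined with existing augmentation operations and improves robustness to color variations. Combined with random erasing, it also simulates random occlusions, which further enriches image diversity.
- For cross-modality metric learning, a channel-mixed learning strategy is proposed that uses the squared difference to handle intra-class and inter-class variations simultaneously. A channel-augmented joint learning strategy is further proposed to explicitly optimize the outputs of the augmented images.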
Image augmentation
- Random Channel Exchangeable Augmentation
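This augmentation can be understood as homogeneously generating the three channels of a visible image from one randomly chosen color channel. It encourages the model to learn the relationship between each color channel and single-channel visible images.
- Channel-Level Random Erasing (CRE)
Pixels in a randomly selected rectangular region are replaced with the means of the R, G and B channels computed on ImageNet.
- In addition, grayscale transformation (GA) and random horizontal flip (FP) are also used.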
The code is as follows:
```python
from __future__ import absolute_import

import math
import random

# All transforms below expect a channel-first 3 x H x W image (e.g. the
# tensor returned by torchvision.transforms.ToTensor, channels in R, G, B
# order) and modify it in place.


class ChannelAdap():
    """Randomly replicates one color channel to all three channels.

    idx is drawn uniformly from {0, 1, 2, 3}: 0/1/2 copy the R/G/B channel
    to every channel, 3 leaves the image unchanged.

    Args:
        probability: Stored but unused (the check in __call__ is commented out).
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):
        # if random.uniform(0, 1) > self.probability:
        #     return img
        idx = random.randint(0, 3)  # both ends inclusive
        if idx == 0:
            # copy the R channel to G and B
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # copy the G channel to R and B
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # copy the B channel to R and G
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        # idx == 3: keep the original image
        return img


class ChannelAdapGray():
    """Like ChannelAdap, but the fourth outcome may convert to grayscale.

    Args:
        probability: When idx == 3, probability of converting the image to
            grayscale instead of keeping it unchanged.
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):
        idx = random.randint(0, 3)
        if idx == 0:
            # copy the R channel to G and B
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # copy the G channel to R and B
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # copy the B channel to R and G
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        elif random.uniform(0, 1) <= self.probability:
            # grayscale conversion with standard luminance weights
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        return img


class ChannelRandomErasing():
    """Randomly selects a rectangle region in an image and erases its pixels.

    'Random Erasing Data Augmentation' by Zhong et al.
    Args:
        probability: The probability that the erasing operation is performed.
        sl: Minimum proportion of erased area against input image.
        sh: Maximum proportion of erased area against input image.
        r1: Minimum aspect ratio of erased area.
    """

    def __init__(self, probability=0.5, sl=0.02, sh=0.4, r1=0.3):
        self.probability = probability
        # Per-channel fill values used by the released code. These numbers are
        # the commonly used CIFAR-10 channel means (ImageNet: 0.485, 0.456, 0.406).
        self.mean = [0.4914, 0.4822, 0.4465]
        self.sl = sl
        self.sh = sh
        self.r1 = r1

    def __call__(self, img):
        if random.uniform(0, 1) > self.probability:
            return img

        # Try up to 100 times to sample a rectangle that fits inside the image.
        for _ in range(100):
            area = img.shape[1] * img.shape[2]
            target_area = random.uniform(self.sl, self.sh) * area
            aspect_ratio = random.uniform(self.r1, 1 / self.r1)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            if w < img.shape[2] and h < img.shape[1]:
                x1 = random.randint(0, img.shape[1] - h)  # top row
                y1 = random.randint(0, img.shape[2] - w)  # left column
                if img.shape[0] == 3:
                    # fill each channel of the region with its own mean value
                    img[0, x1:x1 + h, y1:y1 + w] = self.mean[0]
                    img[1, x1:x1 + h, y1:y1 + w] = self.mean[1]
                    img[2, x1:x1 + h, y1:y1 + w] = self.mean[2]
                else:
                    # single-channel input: fill with the first mean value
                    img[0, x1:x1 + h, y1:y1 + w] = self.mean[0]
                return img

        return img


class ChannelExchange():
    """Random channel exchangeable augmentation.

    idx is drawn uniformly from {0, ..., gray}: 0/1/2 replicate the R/G/B
    channel to all three channels; any larger idx converts the image to
    grayscale. With the default gray=2 only the three channel-replication
    outcomes can occur.
    """

    def __init__(self, gray=2):
        self.gray = gray

    def __call__(self, img):
        idx = random.randint(0, self.gray)

        if idx == 0:
            # copy the R channel to G and B
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # copy the G channel to R and B
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # copy the B channel to R and G
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        else:
            # grayscale conversion with standard luminance weights
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        return img
```
Cross-modality metric learning
1. Enhanced Channel-Mixed Learning
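A batch is built from images of different modalities, and the relationships among them are optimized directly, without regard to the modality gap. Both an identity loss and a weighted regularization triplet loss are optimized.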
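Note that p_j and p_k can come from the same modality or from different modalities. This is presumably what the authors mean by "mixed": images are drawn at random from a batch composed of mixed modalities, so intra- and inter-modality learning are optimized directly without considering the modality difference.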
Here d is the Euclidean distance between feature embeddings.
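The weighting strategy adaptively considers the contribution of each triplet and increases the contribution of hard samples (positive pairs with larger distances and negative pairs with smaller distances), so that all triplets in the batch are fully exploited.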
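The weighted regularization triplet loss described above can be sketched in plain Python. This is a minimal illustration assuming the AGW-style formulation: for each anchor, softmax weights are taken over the Euclidean distances to its positives (larger distance, larger weight) and over the negated distances to its negatives (smaller distance, larger weight), and the weighted distances are combined in the soft-margin term log(1 + exp(d_p - d_n)). Labels carry identity only, not modality, so positives and negatives can come from either modality, which is the "mixed" setting. The function names are hypothetical and not taken from the CAJ code.

```python
import math


def euclidean(a, b):
    """Euclidean distance d between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_regularization_triplet(features, labels):
    """Batch-mean WRT loss; hypothetical helper, AGW-style formulation (assumed)."""
    n = len(features)
    losses = []
    for i in range(n):
        # Labels ignore modality, so p_j / p_k may come from either modality.
        pos = [euclidean(features[i], features[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [euclidean(features[i], features[k])
               for k in range(n) if labels[k] != labels[i]]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        w_p = softmax(pos)                # far (hard) positives weigh more
        w_n = softmax([-d for d in neg])  # close (hard) negatives weigh more
        d_p = sum(w * d for w, d in zip(w_p, pos))
        d_n = sum(w * d for w, d in zip(w_n, neg))
        losses.append(softplus(d_p - d_n))
    return sum(losses) / len(losses) if losses else 0.0
```

In a real training loop this would operate on a batched distance matrix inside the deep-learning framework; the loop form here only keeps the weighting explicit.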
Enhanced Squared Difference
The commonly used formulation is L1; this paper instead adopts an enhanced squared difference.
The authors plot the corresponding function curves to analyze the benefit of this choice.
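To make the comparison concrete, the sketch below evaluates both terms on Δ = d_p - d_n (the weighted positive distance minus the weighted negative distance). It assumes that the plain term is the soft margin log(1 + exp(Δ)) and that the enhanced squared difference applies a sign-preserving square, log(1 + exp(γ·sign(Δ)·Δ²)). That exact form, the scale γ and the function names are assumptions for illustration, not formulas quoted from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def linear_term(delta):
    """Plain soft-margin term on delta = d_p - d_n."""
    return softplus(delta)


def squared_term(delta, gamma=1.0):
    """ASSUMED enhanced variant: sign-preserving square of delta, scaled by gamma."""
    signed_sq = math.copysign(delta * delta, delta)
    return softplus(gamma * signed_sq)


for delta in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"delta={delta:+.1f}  linear={linear_term(delta):.4f}  squared={squared_term(delta):.4f}")
```

Under this assumption, for |Δ| > 1 the squared variant penalizes violated triplets (Δ > 0) more heavily and drives already-satisfied ones (Δ < 0) toward zero faster, while for |Δ| < 1 it stays closer to log 2 ≈ 0.693.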
Experimental results
**2. Channel-Augmented Joint Learning**
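The channel-augmented images are explicitly treated as an auxiliary modality, so each batch contains visible RGB images, channel-augmented images and infrared images at the same time. This enlarges the batch; as before, the classifier and the metric-learning model are shared. The authors also tried using different models, but did not obtain better results.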
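The three-part batch described above can be sketched as follows. Images are modelled as plain nested lists of three channels so the example needs no framework; `random_channel_exchange` and `build_joint_batch` are hypothetical helpers, and the channel exchange here covers only the R/G/B replication outcomes. Each augmented copy keeps the identity label of its source visible image, and the whole batch would then be fed to the shared backbone, classifier and metric-learning loss.

```python
import random


def random_channel_exchange(img, rng):
    """Hypothetical helper: img is a list of 3 channels (lists of pixel values).

    Returns a new image in which one randomly chosen channel (R, G or B)
    is copied to all three channels; the input is left unchanged.
    """
    c = rng.randint(0, 2)
    return [list(img[c]) for _ in range(3)]


def build_joint_batch(visible, infrared, vis_labels, ir_labels, rng):
    """Assemble one batch: visible RGB + channel-augmented visible + infrared.

    Augmented copies reuse the identity labels of their source images.
    Modality tags (0 = visible, 1 = channel-augmented, 2 = infrared) are
    only for bookkeeping; the shared model does not need them.
    """
    augmented = [random_channel_exchange(img, rng) for img in visible]
    images = list(visible) + augmented + list(infrared)
    labels = list(vis_labels) + list(vis_labels) + list(ir_labels)
    modality = [0] * len(visible) + [1] * len(augmented) + [2] * len(infrared)
    return images, labels, modality


rng = random.Random(0)
vis = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
ir = [[[0, 0], [0, 0], [0, 0]]]
images, labels, modality = build_joint_batch(vis, ir, [0, 1], [0], rng)
print(labels, modality)
```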
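Experiments
- Analysis of the effect of each augmentation method
- Analysis of the performance of the squared distance
- Analysis of the performance of different learning strategies
- Comparison with other methods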