2301.02830v4
divided into two main categories. Each of them is discussed below:

Geometric Data Augmentation: Geometric data augmentation encompasses modifications to the geometric attributes of an image, including its position, orientation, and aspect ratio. This technique transforms the arrangement of pixels within an image through operations such as rotation, translation, and shearing. Figure 3 illustrates the most commonly employed geometric augmentations. These methods are widely used in computer vision to diversify the training data and improve the resilience of models to diverse transformations, and geometric data augmentation has become a critical component in the development of robust computer vision algorithms. Each of the geometric data augmentations is discussed below:

(i) Rotation: Rotation data augmentation involves rotating an image by a specified angle within the range of 0 to 360 degrees. The precise degree of rotation is a hyperparameter that requires careful consideration based on the nature and characteristics of the dataset. For instance, in the MNIST [27] dataset, rotating all digits by 180 degrees would not be a meaningful transformation, since a rotated 6 turns into a 9. A thorough understanding of the dataset is therefore necessary to determine the optimal degree of rotation and achieve the best results.

(ii) Translation: Translation data augmentation involves shifting an image upward, downward, right, or left, as illustrated in Figure 3, in order to provide a more diverse representation of the data. The magnitude of this augmentation must be selected with caution, as an excessive shift can substantially change the appearance of the image. For example, translating a digit 8 to the left by half the width of the image could produce an augmented image that resembles the digit 3. It is therefore imperative to consider the nature of the dataset when determining the magnitude of the translation to ensure its efficacy.

(iii) Shearing: Shearing data augmentation shifts one part of an image in one direction while the other part is shifted in the opposite direction. This technique can provide a new and diverse perspective on the data, thereby improving the robustness of a model. However, excessive shearing can significantly deform the image, making it difficult for the model to accurately recognize the objects within it. The amount of shearing applied must therefore be chosen carefully to avoid over-augmenting the images and introducing unwanted noise. For example, applying excessive shearing to a cat image during data augmentation may result in a distorted, stretched appearance, hindering the ability of a model to correctly classify the image as a cat. It is crucial to balance the amount of shearing against the desired level of diversity; used in this way, shearing can be a powerful tool for enhancing the generalization ability of computer vision models while avoiding the drawbacks of over-augmentation.

Non-Geometric Data Augmentations: The non-geometric data augmentation category focuses on modifications to the visual characteristics of an image, as opposed to its geometric shape. This includes techniques such as noise injection, flipping, cropping, resizing, and color space manipulation, as illustrated in Figure 4. These techniques can help improve the generalization performance of a model by exposing it to a wider variety of image variations during training. However, it is important to consider the trade-off between augmenting the data and preserving the integrity of the underlying information in the image. The following section outlines several classical non-geometric data augmentation approaches.

(i) Flipping: Flipping is an image data augmentation technique that flips an image either horizontally or vertically. The efficacy of this method has been demonstrated on various widely-used datasets, including CIFAR10 and CIFAR100 [74]. However, care must be taken
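The rotation, translation, and shearing operations above are all small index remappings. A minimal NumPy sketch (nearest-pixel shifts and 90-degree rotations only; library implementations typically use interpolated affine transforms for arbitrary angles and shear factors):

```python
import numpy as np

def translate(img, dy, dx, fill=0):
    """Shift an H x W image by (dy, dx); vacated pixels take `fill`."""
    out = np.full_like(img, fill)
    h, w = img.shape[:2]
    src_y = slice(max(-dy, 0), min(h - dy, h))
    src_x = slice(max(-dx, 0), min(w - dx, w))
    dst_y = slice(max(dy, 0), min(h + dy, h))
    dst_x = slice(max(dx, 0), min(w + dx, w))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def rotate90(img, k=1):
    """Rotate by k * 90 degrees; arbitrary angles need interpolation."""
    return np.rot90(img, k)

def shear_x(img, factor, fill=0):
    """Horizontal shear: row r is shifted sideways by round(factor * r)."""
    return np.stack([translate(img[r:r + 1], 0, int(round(factor * r)), fill)[0]
                     for r in range(img.shape[0])])
```

Vertical shear is the symmetric operation on columns, and flips are single slice operations (`img[:, ::-1]`).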
Image Data Augmentations
  Basic Image Data Augmentations
    Image Manipulation
      Geometric Manipulation: Rotation, Translation, Shearing
      Non-Geometric Manipulation: Flipping, Cropping, Noise injection, Color Space, Jitter, Kernel
    Image Erasing: Cutout, Random Erasing, Hide-and-Seek, GridMask
    Image Mixing
      Single Image Mixing: Local Augment, Self-Aug, SalfMix, KeepAugment, CutThumbnail, and many more
      Multi-Images Mixing: Mixup, CutMix, SaliencyMix, RSMDA, PuzzleMix, SnapMix, and many more
  Advanced Image Data Augmentations
    Auto Augment
      Reinforcement Learning Based: AutoAugment, Fast AutoAug, Faster AutoAug, Local Patch with RL, and many more
      Non-Reinforcement Learning Based: RandAug, ADA, and many more
    Feature Augmentation: FeatMatch, Feature Space (FS) Aug, Dataset Aug in FS, and many more
    Neural Style Transfer: STaDA, Style Aug, StyPath, and many more
Fig. 2. Image data augmentation taxonomy. Note: not all image data augmentation names are included in this taxonomy due to space limits. However, all relevant remaining image data augmentations are discussed as per the taxonomy, and the remaining sub-categories are discussed in the text.
Channels (C). By altering the values of each channel
separately, this technique can prevent a model from
becoming biased towards specific lighting conditions. The
most straightforward approach to perform color space
augmentation involves replacing a single channel within
the image with a randomly generated channel of the
same size, or with a channel filled with either 0 or 255.
The utilization of color space manipulation is commonly
observed in photo editing applications, where it is used
to adjust the brightness or darkness of the image [123].
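The channel-replacement strategy described above can be sketched as follows (the uniform choice among noise, an all-0 plane, and an all-255 plane is an illustrative assumption):

```python
import numpy as np

def channel_replace(img, rng=None):
    """Replace one randomly chosen channel of an H x W x C uint8 image
    with random noise, an all-0 plane, or an all-255 plane."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    c = int(rng.integers(img.shape[2]))   # which channel to replace
    mode = int(rng.integers(3))           # 0: noise, 1: zeros, 2: 255s
    if mode == 0:
        out[..., c] = rng.integers(0, 256, size=img.shape[:2], dtype=img.dtype)
    else:
        out[..., c] = 0 if mode == 1 else 255
    return out
```

Because only one channel is rewritten, the other channels (and hence most of the image content) are left untouched.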
Fig. 15. This image shows an example of reduced images, called thumbnails. After reducing the image to a size of 112×112 or 56×56, the dog is still recognizable even though many local details are lost; courtesy [156].
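The reduction shown in Fig. 15 can be approximated with simple block averaging (the published method [156] may use a different interpolation; this is an illustrative assumption), e.g. 224×224 → 112×112 with a factor of 2:

```python
import numpy as np

def thumbnail(img, factor):
    """Downscale an H x W (or H x W x C) image by an integer `factor`
    using block averaging; H and W must be divisible by factor."""
    h, w = img.shape[:2]
    assert h % factor == 0 and w % factor == 0
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))
```

Each output pixel is the mean of a factor × factor block, which discards local detail while preserving the global shape, as the caption describes.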
Fig. 22. Example masks and mixed images from CIFAR-10 for FMix,
example is from [52].
Fig. 23. This image shows the overview of MixMo augmentation, the image
is taken from [112].
Fig. 26. Diagram of the label guessing process used in MixMatch, courtesy [10].
Fig. 28. This image shows the procedure of FixMatch, image is taken from
[130].
Fig. 38. ObjectAug can perform various augmentation methods on each object to boost the performance of semantic segmentation. The left husky is scaled and shifted, while the right one is flipped and shifted. Thus, the boundaries between objects are extensively augmented to boost performance; the example is from [164].
Fig. 45. Example of a scale-aware search space which includes image-level and box-level augmentation; the example is from [18].
Fig. 50. Overview of Robust and Accurate Object detection via adversarial learning. The top image shows improved object detector accuracy on clean images; the middle, improved robustness against natural corruption; and the bottom, improved robustness against cross-dataset domain shift. The image is taken from [18].
Fig. 55. Overview of the original image and two stylized images by STaDA.
Image is taken from [169].
Fig. 58. Overview of generating synthetic COVID images from the healthy category. As the number of epochs grows, the quality of the synthetic images improves. An example is from [58].
(iv) A Neural Algorithm of Artistic Style: This work [42] introduces an artificial system (AS) based on a deep neural network that generates artistic images of high perceptual quality. The AS creates a neural embedding, uses the embedding to separate the style and content of an image, and then recombines the content and style of target images to generate the artistic image. A sample is shown in figure 59.

(v) Neural Style Transfer as Data Augmentation for Improving COVID-19 Diagnosis Classification: This work [58] shows the effectiveness of a cycle GAN, mostly used for neural style transfer, in augmenting COVID-19-negative x-ray images by converting them into positive COVID images, both to balance the dataset and to increase its diversity. It shows that augmenting the images with a cycle GAN can improve performance over several different CNN architectures. A sample of this augmentation is shown in figure 58.

Fig. 59. Overview of the styled image by the neural algorithm. Image is from [42].

III. RESULTS

In this section, we provide detailed results for various computer vision tasks such as image classification, object detection, and semantic segmentation. The main purpose is to show the effect of data augmentation on different CV tasks, and to do so we compile results from various SOTA data augmentation works.

A. Image Classification

In this section, we present the results of several SOTA data augmentation methods for supervised learning and semi-supervised learning. Both are discussed below:

1) Supervised learning results: In supervised learning, we have a large quantity of fully labeled data and use it to train the neural network (NN) model. Here, we compile and compare the results from several SOTA data augmentation methods in two tables, table I and table II. In table I, a + sign indicates that traditional data augmentations such as flipping, rotating, and cropping have been used along with the SOTA augmentation method. The datasets used are CIFAR10 [74], CIFAR100 [74], and ImageNet [26], and the networks used are WideResNet variants [55], PyramidNet variants, and several popular ResNet variants [55]. Accuracy is the evaluation metric used to compare the different algorithms; the higher, the better. As can be seen in table I and table II, each data augmentation significantly improves accuracy.

2) Semi-supervised learning: Semi-supervised learning (SSL) applies when labeled data is limited but unlabeled data is available at a large scale. Labeling the unlabeled data is tedious, time-consuming, and costly [79], [155]. To avoid these issues, SSL is used. There are several techniques
Accuracies
Method CIFAR10 CIFAR10+ CIFAR100 CIFAR100+
ResNet-18 (Baseline) 89.37 95.28 63.32 77.54
ResNet-18 + CutOut 90.69 96.25 65.02 80.58
ResNet-18 + Random Erasing 95.28 95.32 - -
ResNet-18 + CutMix 90.56 96.22 65.58 80.58
ResNet-18 + SaliencyMix 92.41 96.35 71.27 80.71
ResNet-18 + GridMask 95.28 96.54 - -
ResNet-50 (Baseline) 87.86 95.02 63.52 78.42
ResNet-50 + CutOut 91.16 96.14 67.03 78.62
ResNet-50 + CutMix 90.84 96.39 68.35 81.28
ResNet-50 + SaliencyMix 93.19 96.54 75.11 81.43
WideResNet-28-10 (Baseline) [141] 93.03 96.13 73.94 81.20
WideResNet-28-10 + CutOut [29] 94.46 96.92 76.06 81.59
WideResNet-28-10 + Random Erasing 96.2 96.92 81.59 82.27
WideResNet-28-10 + GridMask 96.13 97.24 - -
WideResNet-28-10 + CutMix 94.82 97.13 76.79 83.34
WideResNet-28-10 + PuzzleMix - - - 83.77
WideResNet-28-10 + SaliencyMix 95.96 97.24 80.55 83.44
Note: a + sign after a dataset name indicates that traditional data augmentation methods have been used.
TABLE I
BASELINE PERFORMANCE COMPARISON OF VARIOUS AUGMENTATIONS ON THE CIFAR10 AND CIFAR100 DATASETS.
of SSL, but recently data augmentation has been employed with the limited labeled data to increase the diversity of the data. Data augmentation with SSL has increased performance on different datasets and NN architectures. The datasets used are CIFAR10, CIFAR100, SVHN [103], and Mini-ImageNet. Several SSL techniques are used, such as PseudoLabel, SSL with memory, label propagation, mean teacher, etc. We compile the results from many SOTA SSL methods with data augmentation and present them in this work. The effect of data augmentation has also been shown with different numbers of labeled samples in SSL, as shown in table III, table IV, and table V.

B. Object detection

In this section, we discuss the effectiveness of various image data augmentation techniques on the frequently used COCO2017 [92], PASCAL VOC [35], VOC 2007 [33], and VOC 2012 [34] datasets, which are commonly used for object detection tasks. We compile results from various SOTA data augmentation methods in three tables, table VI, VII, and VIII. FRCN along with synthetic data gives the best mAP accuracy on the VOC 2007 dataset, as shown in table VII. Several classical and automatic data augmentation methods have shown promising performance using different SOTA models on the PASCAL VOC dataset, as shown in table VI. DetAdvProp achieves the highest score, outperforming AutoAugment [23], on the PASCAL VOC 2012 dataset, as shown in table VIII. The scores are in terms of mean average precision (mAP), average precision (AP) at an intersection over union (IoU) of 0.5 (AP50), and AP at an IoU of 0.75 (AP75).

C. Semantic Segmentation

This subsection includes semantic segmentation results on the PASCAL VOC and Cityscapes datasets, the ones most frequently used in research papers. In table IX and table X, we compile validation set results on the different datasets showing the effect of SOTA data augmentations on the semantic segmentation task. The results are reported in terms of mean intersection over union (mIoU) as the accuracy metric on the Cityscapes and PASCAL VOC datasets, as shown in table IX and table X, respectively. We found performance gains on a few metrics such as mIoU and mAP with several semantic segmentation models: DeepLabv3+ [160], DeepLab-v2 [104], Xception-65 [160], ExFuse [166], and Eff-L2 [172]. It has been observed that incorporating data augmentation techniques can enhance the performance of semantic segmentation models. Notably, advanced image data augmentation methods have demonstrated greater improvements in performance compared to traditional techniques; table IX and table X provide evidence of this improvement. The traditional data augmentations include rotation, scaling, flipping, and shifting [164].

IV. DISCUSSION AND FUTURE DIRECTIONS

A. Current approaches

It is proven that providing more data to the model improves model performance [50], [136]. A few current tendencies are discussed by Xu et al. [157]. Among these, one way is to collect data and label it manually, but this is not an efficient approach. A more efficient way is to apply data augmentation: the more data augmentations we apply, the better the improvement in performance, but only to a certain extent. Currently, image mixing and autoaugment methods are successful for image classification tasks, while scale-aware auto augment methods are showing promising results in detection and semantic segmentation tasks. But the performance of these data augmentations can vary with the number of augmentations applied, as it is known
CIFAR-10 CIFAR-100 ImageNet
Augmentation Accuracy (%) Model Accuracy (%) Model Accuracy (%) Model
Cutout [29] 97.04 WRN-28-10 81.59 WRN-28-10 77.1 ResNet-50
Random Erasing [170] 96.92 WRN-28-10 82.27 WRN-28-10 - -
Hide-and-Seek [129] 95.53 ResNet-110 78.13 ResNet-110 77.20 ResNet-50
GridMask [15] 97.24 WRN-28-10 - - 77.9 ResNet-50
LocalAugment [71] - - 95.92 WRN-22-10 76.87 ResNet-50
SalfMix [20] 96.62 PreActResNet-101 80.11 PreActResNet-101 - -
KeepAugment [47] 97.8 ResNet-28-10 - - 80.3 ResNet-101
Cut-Thumbnail [156] 97.8 ResNet-56 95.94 WRN-28-10 79.21 ResNet-50
MixUp [163] 97.3 WRN-28-10 82.5 WRN-28-10 77.9 ResNet-50
CutMix [162] 97.10 WRN-28-10 83.40 WRN-28-10 78.6 ResNet-50
SaliencyMix [141] 97.24 WRN-28-10 83.44 WRN-28-10 78.74 ResNet-50
PuzzleMix [70] - - 84.05 WRN-28-10 77.51 ResNet-50
FMix [52] 98.64 Pyramid 83.95 Dense 77.70 ResNet-101
MixMo [112] 96.38 WRN-28-10 82.40 WRN-28-10 - -
StyleMix [59] 96.44 PyramidNet-200 85.83 PyramidNet-200 77.29 PyramidNet-200
RandomMix [94] 98.02 WRN-28-10 84.84 WRN-28-10 77.88 WRN-28-10
MixMatch [10] 95.05 WRN-28-10 74.12 WRN-28-10 - -
ReMixMatch [9] 94.71 WRN-28-2 - - - -
FixMatch [130] 95.69 WRN-28-2 77.04 WRN-28-2 - -
AugMix [56] - - - - 77.6 ResNet-50
Improved Mixed-Example [135] 96.02 ResNet-18 80.3 ResNet-18 - -
RICAP [137] 97.18 WRN-28-10 82.56 ResNet-28-10 78.62 WRN-50-2
ResizeMix [111] 97.60 WRN-28-10 84.31 WRN-28-10 79.00 ResNet-50
AutoAugment [23] 97.40 WRN-28-10 82.90 WRN-28-10 83.50 AmoebaNet-C
Fast AutoAugment [90] 98.00 SS(26 2×96d) 85.10 SS(26 2×96d) 80.60 ResNet-200
Faster AutoAugment [53] 98.00 SS(26 2 × 112d) 84.40 SS(26 2×96d) 75.90 ResNet-50
Local Patch AutoAugment [91] 98.10 SS(26 2 × 112d) 85.90 SS(26 2×96d) 81.00 ResNet-200
RandAugment [24] 98.50 PyramidNet 83.30 WRN-28-10 85.00 EfficientNet-B7
TABLE II
PERFORMANCE COMPARISON OF THE VARIOUS IMAGE ERASING AND IMAGE MIXING AUGMENTATIONS FOR IMAGE CLASSIFICATION PROBLEMS. WRN AND SS STAND FOR WIDERESNET AND SHAKE-SHAKE, RESPECTIVELY.
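The erasing and mixing families compared in this table share two simple cores: erase a random region (Cutout [29] and relatives) or convexly blend two images together with their labels (MixUp [163] and relatives). A minimal sketch of both (the square zero-filled region and the reversed-batch pairing are illustrative simplifications, not the exact published recipes):

```python
import numpy as np

def cutout(img, size, rng=None):
    """Zero out one random size x size square; the square's centre may fall
    anywhere, so part of it can lie outside the image border."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    out = img.copy()
    out[max(cy - size // 2, 0):cy + size // 2,
        max(cx - size // 2, 0):cx + size // 2] = 0
    return out

def mixup(images, labels_onehot, alpha=0.2, rng=None):
    """Blend each example with its partner from the reversed batch using one
    Beta(alpha, alpha) weight; labels are mixed with the same weight, which
    is what keeps the augmented pair label-consistent."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    partner = slice(None, None, -1)  # reversed batch as mixing partners
    mixed_x = lam * images + (1 - lam) * images[partner]
    mixed_y = lam * labels_onehot + (1 - lam) * labels_onehot[partner]
    return mixed_x, mixed_y, lam
```

CutMix-style variants paste the region from the partner image instead of zero-filling it, and mix the labels in proportion to the pasted area.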
that the combined data augmentation methods show better performance than a single one [108], [158].

B. Theoretical aspects

There is no theoretical support available to explain why a specific augmentation improves performance, or which sample(s) should be augmented; the same aspect has been discussed by Yang et al. [158] and Shorten et al. [123]. In random erasing, for instance, we randomly erase a region of the image; this may sometimes erase discriminating features, and the erased image makes no sense to a human, yet the reason behind the performance improvement is still unknown, which is another open challenge. Most of the time, we find the optimal parameters of an augmentation through an extensive number of experiments, or we choose data augmentation based on our experience. There should instead be a mechanism for choosing data augmentation with theoretical support, considering model architecture and dataset size. Researching this theoretical aspect is another open challenge for the research community.

C. Optimal number of samples generation

It is a known fact that increasing the data size improves performance [50], [123], [136], [158], but only up to a point: increasing the number of samples will not improve performance after a certain number of samples [78]. What the optimal number of generated samples is, depending on the model architecture and dataset size, is a challenging aspect to be explored. Currently, researchers perform many experiments to find the optimal number of samples to generate [78], but this is not a feasible way, as it requires time and computational cost. Can we devise a mechanism to find the optimal number of samples? This remains an open research challenge.

D. Selection of data augmentation based on model architecture and dataset

Data augmentation selection depends on the nature of the dataset and the model architecture. On the MNIST [27] dataset, for example, some geometric transformations are not safe: rotating the digits 6 and 9 no longer preserves the label information. A densely parameterized CNN can easily overfit a weakly augmented dataset, while for a shallowly parameterized CNN, data augmentation may break generalization capability. This suggests that, when selecting a data augmentation, the nature of the dataset and the model architecture should be taken into account. Currently, numerous experiments are performed to find a model architecture and suitable data augmentation for a specific dataset. Devising a systematic approach to select the data augmentation based on dataset and model architecture is another gap to be filled.

E. Augmentations for spaces

Most data augmentation approaches have been explored at the image level (data space). Very few research works have explored augmentation at the feature level (feature space). The challenge that arises is: in which space should we apply data
TABLE III
COMPARISON ON CIFAR-10 AND SVHN. THE NUMBERS REPRESENT ERROR RATES ACROSS THREE RUNS.
CIFAR-10 SVHN
Method 40 labels 250 labels 1,000 labels 4,000 labels 40 labels 250 labels 1,000 labels 4,000 labels
VAT [101] - 36.03 ± 2.82 18.64 ± 0.40 11.05 ± 0.31 - 8.41 ± 1.01 5.98 ± 0.21 4.20 ± 0.15
Mean Teacher [138] - 47.32 ± 4.71 17.32±4.00 10.36±0.25 - 6.45±2.43 3.75±.10 3.39±0.11
MixMatch [10] 47.54±11.50 11.08±.87 7.75±.32 6.24±.06 42.55±14.53 3.78±.26 3.27±.31 2.89±.06
ReMixMatch [9] 19.10±9.64 6.27±0.34 5.73±0.16 5.14±0.04 3.34±0.20 3.10±0.50 2.83±0.30 2.42±0.09
UDA 29.05±5.93 8.76± 0.90 5.87± 0.13 5.29± 0.25 52.63±20.51 2.76± 0.17 2.55± 0.09 2.47± 0.15
SSL with Memory [17] - - - 11.9±0.22 - 8.83 4.21 -
Deep Co-Training [110] - - - 8.35± 0.06 - - 3.29 ±0.03 -
Weight Averaging [5] - - 15.58 ± 0.12 9.05 ± 0.21 - - - -
ICT [142] - - 15.48 ± 0.78 7.29 ± 0.02 - 4.78 ± 0.68 3.89 ± 0.04 -
Label Propagation [64] - - 16.93 ± 0.70 10.61 ± 0.28 - - - -
SNTG [96] - - 18.41 ± 0.52 9.89 ±0.34 - 4.29± 0.23 3.86 ±0.27 -
PLCB [4] - - 6.85 ±0.15 5.97± 0.15 - - - -
II-model [120] - 53.02 ±2.05 31.53 ± 0.98 17.41± 0.37 - 17.65 ±0.27 8.60± 0.18 5.57± 0.14
PseudoLabel [85] - 49.98 ±1.17 30.91 ±1.73 16.21 ± 0.11 - 21.16± 0.88 10.19 ± 0.41 5.71± 0.07
Mixup [163] - 47.43 ± 0.92 25.72 ± 0.66 13.15 ± 0.20 - 39.97 ± 1.89 16.79 ± 0.63 7.96 ±0.14
FeatMatch [81] - 7.50 ±0.64 5.76 ±0.07 4.91± 0.18 - 3.34± 0.19 3.10± 0.06 2.62 ±0.08
FixMatch [130] 13.81±3.37 5.07±0.65 - 4.26±0.05 3.96±2.17 2.48±0.38 2.28±0.11 -
SelfMatch [69] 93.19±1.08 95.13±0.26 - 95.94±0.08 96.58±1.02 97.37±0.43 97.49±0.07 -
TABLE IV
COMPARISON ON CIFAR-100 AND MINI-IMAGENET. THE NUMBERS REPRESENT ERROR RATES ACROSS TWO RUNS.
CIFAR-100 mini-ImageNet
Method 400 labels 4,000 labels 10,000 labels 4,000 labels 10,000 labels
II-model [120] - - 39.19± 0.36 - -
SNTG [96] - - 37.97± 0.29 - -
SSL with Memory [17] - - 34.51± 0.61 - -
Deep Co-Training [110] - - 34.63± 0.14 - -
Weight Averaging [5] - - 33.62± 0.54 - -
Mean Teacher [138] - 45.36 ±0.49 36.08± 0.51 72.51± 0.22 57.55 ± 1.11
Label Propagation [64] - 43.73 ±0.20 35.92 ±0.47 70.29± 0.81 57.58 ±1.47
PLCB [4] - 37.55 ±1.09 32.15 ±0.50 56.49 ±0.51 46.08 ± 0.11
FeatMatch - 31.06 ± 0.41 26.83 ± 0.04 39.05 ± 0.06 34.79 ± 0.22
MixMatch 67.61±1.32 - 28.31±0.33 - -
UDA 59.28±0.88 - 24.50±0.25 - -
ReMixMatch 44.28±2.06 - 23.03±0.56 - -
FixMatch 48.85±1.75 - 22.60±0.12 - -
augmentation, data space, or feature space? It is another interesting aspect that can be explored. For the current approaches, it seems to depend on the dataset, model architecture, and task. Current approaches conduct experiments in both data space and feature space and then select the best one [154], which is not an optimal way to find the data augmentation for a specific space. It is still an open challenge to be solved.

F. Open research questions

Despite the success of data augmentation techniques in different computer vision tasks, several challenges in SOTA data augmentation techniques remain unsolved. After thoroughly reviewing SOTA data augmentation approaches, we found several challenges and difficulties which are yet to be solved, as listed below:

• In image mixing techniques, label smoothing has been used: whatever portion of the images is mixed, the corresponding labels should be mixed accordingly. To the best of our knowledge, none has explored label smoothing for the image manipulation and image erasing subcategories, where part of the image is lost. For example, if an image portion is randomly cut out in cutout data augmentation, the corresponding label should be mixed. It is an interesting open research question.

• Currently, data augmentation is performed without considering the importance of an example. All examples may not be difficult for the neural network to learn, but some are. Thus, augmentation should be applied to those difficult examples by measuring the importance of the examples. How would a neural network behave if data augmentation were applied only to those difficult examples?

• In image mixing data augmentations, if we mix the salient parts of more than two images, so that all parts truly participate in the augmentation (unlike RICAP [137]), what is the effect in terms of accuracy and robustness against adversarial attacks? Note that the corresponding labels of these images would be mixed accordingly.

• In random data augmentation under the auto augmentation category, the order of augmentations has not been explored. We believe it has significant importance. What are the possible ways to explore the order of existing augmentations, such as first traditional data augmentations and then image mixing, or weight-based orderings?

• Finding an optimal and ordered number of data augmentations, and the optimal number of samples to be augmented, are open challenges. For example, in the RandAug method, an optimal number N of augmentations is found, but it is not known how many augmentations, in which order, and on which samples they should be applied.

TABLE V
COMPARISON OF TEST ERROR RATES ON CIFAR-10 & SVHN USING WIDERESNET-28 AND CNN-13.

V. CONCLUSION

This survey provides a comprehensive overview of state-of-the-art (SOTA) data augmentation techniques for addressing overfitting in computer vision tasks due to limited data. A detailed taxonomy of image data augmentation approaches is presented, along with an overview of each SOTA method and the results of its application to various computer vision tasks such as image classification, object detection, and semantic segmentation. The results for both supervised and semi-supervised learning are also compiled for easy comparison purposes. In addition, the available code for each data augmentation approach is provided to facilitate result reproducibility. The difficulties and challenges of data augmentation are also discussed, along with promising open research questions that have the potential to further advance the field. This survey is expected to benefit researchers in several ways: (i) a deeper understanding of data augmentation, (ii) the ability to easily compare results, and (iii) the ability to reproduce results with available code.

ACKNOWLEDGMENT

This research was supported by Science Foundation Ireland under grant numbers 18/CRT/6223 (SFI Centre for Research Training in Artificial Intelligence), SFI/12/RC/2289/P2 (Insight SFI Research Centre for Data Analytics), 13/RC/2094/P2 (Lero SFI Centre for Software) and 13/RC/2106/P2 (ADAPT SFI Research Centre for AI-Driven Digital Content Technology). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

REFERENCES

[1] Jiwoon Ahn, Sunghyun Cho, and Suha Kwak. Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2209–2218, 2019.
[2] Jiwoon Ahn and Suha Kwak. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4981–4990, 2018.
Method Detector BackBone AP AP50 AP75 APs APm APl
Hand-crafted:
Dropblock [44] RetinaNet ResNet-50 38.4 56.4 41.2 − − −
AutoAugment+color Ops [171] RetinaNet ResNet-50 37.5 - - − − −
geometric Ops [171] RetinaNet ResNet-50 38.6 - - − − −
bbox-only Ops [171] RetinaNet ResNet-50 39.0 - - − − −
Mix-up [167] Faster R-CNN ResNet-101 41.1 - - - - -
PSIS* [144] Faster R-CNN ResNet-101 40.2 61.1 44.2 22.3 45.7 51.6
Stitcher [19] Faster R-CNN ResNet-101 42.1 - - 26.9 45.5 54.1
GridMask [15] Faster R-CNN ResNeXt-101 42.6 65.0 46.5 - - -
InstaBoost* [37] Mask R-CNN ResNet-101 43.0 64.3 47.2 24.8 45.9 54.6
SNIP (MS test)* [127] Faster R-CNN ResNet-101-DCN-C4 44.4 66.2 49.9 27.3 47.4 56.9
SNIPER (MS test)* [128] Faster R-CNN ResNet-101-DCN-C4 46.1 67.0 51.6 29.6 48.9 58.1
Traditional Aug [158] Faster R-CNN ResNet-101 36.80 58.0 40.0 - - -
Traditional Aug* [31] CenterNet ResNet-101 41.15 58.01 45.30 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×) 37.4 58.7 40.5 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+GridMask (p = 0.3) 38.2 60.0 41.4 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.5) 38.1 60.1 41.2 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.7) 38.3 60.4 41.7 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.9) 38.0 60.1 41.2 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (4×) 35.7 56.0 38.3 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (4×)+ GridMask (p = 0.7) 39.2 60.8 42.2 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (1×) 41.2 63.3 44.8 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (2×) 40.4 62.2 43.8 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (2×)+ GridMask (p = 0.7) 42.6 65.0 46.5 - - -
KeepAugment: [47] Faster R-CNN ResNet50-C4 39.5 − − − − −
KeepAugment: [47] Faster R-CNN ResNet50-FPN 40.7 − − − − −
KeepAugment: [47] RetinaNet ResNet50-FPN 39.1 − − − − −
KeepAugment: [47] Faster R-CNN ResNet101-C4 42.2 − − − − −
KeepAugment: [47] Faster R-CNN ResNet101-FPN 42.9 − − − − −
KeepAugment: [47] RetinaNet ResNet101-FPN 41.2 − − − − −
DADAAugment: [88] RetinaNet ResNet-50 35.9 55.8 38.4 19.9 38.8 45.0
DADAAugment: [88] RetinaNet ResNet-50(DADA) 36.6 56.8 39.2 20.2 39.7 46.0
DADAAugment: [88] Faster R-CNN ResNet-50 36.6 58.8 39.6 21.6 39.8 45.0
DADAAugment: [88] Faster R-CNN ResNet-50 (DADA) 37.2 59.1 40.2 22.2 40.2 45.7
DADAAugment: [88] Mask R-CNN ResNet-50 37.4 59.3 40.7 22.2 40.6 46.3
DADAAugment: [88] Mask R-CNN ResNet-50(DADA) 37.8 59.6 41.1 22.4 40.9 46.6
AutoAugment: [16] EfficientDet D0 EfficientNet B0 34.4 52.8 36.7 53.1 40.2 13.9
Det-AdvProp: [16] EfficientDet D0 EfficientNet B0 34.7 52.9 37.2 54.1 40.6 13.9
AutoAugment: [16] EfficientDet D1 EfficientNet B1 40.1 59.2 43.2 57.9 45.7 19.9
Det-AdvProp: [16] EfficientDet D1 EfficientNet B1 40.5 59.2 43.3 58.8 46.2 20.6
AutoAugment: [16] EfficientDet D2 EfficientNet B2 43.5 62.8 46.6 59.8 48.7 23.9
Det-AdvProp: [16] EfficientDet D2 EfficientNet B2 43.8 62.6 47.3 61.0 49.6 25.6
AutoAugment: [16] EfficientDet D3 EfficientNet B3 47.0 66.0 50.8 63.0 51.7 29.8
Det-AdvProp: [16] EfficientDet D3 EfficientNet B3 47.6 66.3 51.4 64.0 52.2 30.2
AutoAugment: [16] EfficientDet D4 EfficientNet B4 49.5 68.7 53.7 64.9 54.0 31.9
Det-AdvProp: [16] EfficientDet D4 EfficientNet B4 49.8 68.6 54.2 65.2 54.2 32.4
AutoAugment: [16] EfficientDet D5 EfficientNet B5 51.5 70.4 56.0 65.2 56.1 35.4
Det-AdvProp: [16] EfficientDet D5 EfficientNet B5 51.8 70.7 56.3 66.1 56.2 36.2
Automatic:
AutoAug-det [171] RetinaNet ResNet-50 39.0 - - - - -
AutoAug-det [171] RetinaNet ResNet-101 40.4 - - - - -
AutoAugment [23] RetinaNet ResNet-200 42.1 - - - - -
AutoAug-det’ [171] RetinaNet ResNet-50 40.3 60.0 43.0 23.6 43.9 53.8
RandAugmnet* [24] RetinaNet ResNet-200 41.9 - - - - -
AutoAug-det [171] RetinaNet ResNet-101 41.8 61.5 44.8 24.4 45.9 55.9
RandAug [24] RetinaNet ResNet-101 40.1 - - - - -
RandAug? [10] RetinaNet ResNet-101 41.4 61.4 44.5 25.0 45.4 54.2
Scale-aware AutoAug [18] RetinaNet ResNet-50 41.3 61.0 44.1 25.2 44.5 54.6
Scale-aware AutoAug RetinaNet ResNet-101 43.1 62.8 46.0 26.2 46.8 56.7
Scale-aware AutoAug Faster R-CNN ResNet-101 44.2 65.6 48.6 29.4 47.9 56.7
Scale-aware AutoAug (MS test) Faster R-CNN ResNet-101-DCN-C4 47.0 68.6 52.1 32.3 49.3 60.4
Scale-aware AutoAug FCOS ResNet-101 44.0 62.7 47.3 28.2 47.8 56.1
Scale-aware AutoAug FCOS ResNeXt-32x8d-101-DCN 48.5 67.2 52.8 31.5 51.9 63.0
Scale-aware AutoAug (1200 size) FCOS ResNeXt-32x8d-101-DCN 49.6 68.5 54.1 35.7 52.5 62.4
Scale-aware AutoAug (MS Test) FCOS ResNeXt-32x8d-101-DCN 51.4 69.6 57.0 37.4 54.2 65.1
TABLE VI
DATA AUGMENTATION EFFECT ON DIFFERENT OBJECT DETECTION METHODS USING PASCAL VOC DATASET
Method TSet mAP aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv
FRCN [45] 7 66.9 74.5 78.3 69.2 53.2 36.6 77.3 78.2 82.0 40.7 72.7 67.9 79.6 79.2 73.0 69.0 30.1 65.4 70.2 75.8 65.8
FRCN* [148] 7 69.1 75.4 80.8 67.3 59.9 37.6 81.9 80.0 84.5 50.0 77.1 68.2 81.0 82.5 74.3 69.9 28.4 71.1 70.2 75.8 66.6
ASDN [148] 7 71.0 74.4 81.3 67.6 57.0 46.6 81.0 79.3 86.0 52.9 75.9 73.7 82.6 83.2 77.7 72.7 37.4 66.3 71.2 78.2 74.3
IRE 7 70.5 75.9 78.9 69.0 57.7 46.4 81.7 79.5 82.9 49.3 76.9 67.9 81.5 83.3 76.7 73.2 40.7 72.8 66.9 75.4 74.2
ORE 7 71.0 75.1 79.8 69.7 60.8 46.0 80.4 79.0 83.8 51.6 76.2 67.8 81.2 83.7 76.8 73.8 43.1 70.8 67.4 78.3 75.6
I+ORE 7 71.5 76.1 81.6 69.5 60.1 45.6 82.2 79.2 84.5 52.5 78.7 71.6 80.4 83.3 76.7 73.9 39.4 68.9 69.8 79.2 77.4
FRCN [45] 7+12 70.0 77.0 78.1 69.3 59.4 38.3 81.6 78.6 86.7 42.8 78.8 68.9 84.7 82.0 76.6 69.9 31.8 70.1 74.8 80.4 70.4
FRCN* [148] 7+12 74.8 78.5 81.0 74.7 67.9 53.4 85.6 84.4 86.2 57.4 80.1 72.2 85.2 84.2 77.6 76.1 45.3 75.7 72.3 81.8 77.3
IRE 7+12 75.6 79.0 84.1 76.3 66.9 52.7 84.5 84.4 88.7 58.0 82.9 71.1 84.8 84.4 78.6 76.7 45.5 77.1 76.3 82.5 76.8
ORE 7+12 75.8 79.4 81.6 75.6 66.5 52.7 85.5 84.7 88.3 58.7 82.9 72.8 85.0 84.3 79.3 76.3 46.3 76.3 74.9 86.0 78.2
I+ORE 7+12 76.2 79.6 82.5 75.7 70.5 55.1 85.2 84.4 88.4 58.6 82.6 73.9 84.2 84.7 78.8 76.3 46.7 77.9 75.9 83.3 79.3
SSD 7+12 77.4 81.7 85.4 75.7 69.6 49.9 84.9 85.8 87.4 61.5 82.3 79.2 86.6 87.1 84.7 78.9 50.0 77.4 79.1 86.2 76.3
SSD + SD (1x) [145] 7+12 78.1 83.2 84.5 76.1 72.1 50.2 85.2 86.3 87.8 63.7 82.8 80.1 85.2 87.2 84.8 80.0 51.5 77.0 82.0 86.1 76.9
SSD + SD (2x) [145] 7+12 78.3 83.6 85.0 76.2 72.0 51.3 85.1 87.2 87.6 64.2 82.5 81.9 85.5 86.5 85.9 81.2 51.2 72.3 82.8 86.9 78.4
SSD + SD (3x) [145] 7+12 77.8 80.4 85.0 76.3 70.1 50.4 84.8 86.3 88.2 61.0 83.5 79.5 87.2 86.9 85.9 78.8 51.2 76.9 79.4 86.5 77.9
FRCN [45] 7+12 73.2 76.5 79.0 70.9 65.5 52.1 83.1 84.7 86.4 52.0 81.9 65.7 84.8 84.6 77.5 76.7 38.8 73.6 73.9 83.0 72.6
FRCN+SD(1x) [156] 7 79.9 85.1 86.6 78.6 75.7 65.2 83.5 88.4 88.9 65.8 83.6 74.3 86.4 84.7 85.5 88.0 62.0 75.5 75.3 87.7 76.3
TABLE VII
VOC 2007 TEST DETECTION AVERAGE PRECISION (%). FRCN* REFERS TO FRCN WITH TRAINING SCHEDULE IN [148] AND SD REFERS TO SYNTHETIC DATA.
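The per-class numbers above are averaged into mAP under the PASCAL VOC 2007 protocol, which uses 11-point interpolated average precision. As a reminder of what those columns mean, here is a minimal sketch of that computation (the function name is ours; inputs are a detector's recall/precision values for one class, sorted by confidence):

```python
def voc_ap_11pt(recall, precision):
    """11-point interpolated AP (PASCAL VOC 2007 protocol)."""
    ap = 0.0
    for i in range(11):
        t = i / 10
        # Interpolated precision: best precision at any recall >= t.
        p = max((p for r, p in zip(recall, precision) if r >= t), default=0.0)
        ap += p / 11
    return ap
```

mAP is then the mean of this quantity over the 20 VOC classes.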
Model mAP AP50 AP75
EfficientDet-D0 55.6 77.6 61.4
+ AutoAugment 55.7 (+0.1) 77.7 (+0.1) 61.8 (+0.4)
+ Det-AdvProp 55.9 (+0.3) 77.9 (+0.3) 62.0 (+0.6)
EfficientDet-D1 60.8 82.0 66.7
+ AutoAugment 61.0 (+0.2) 82.2 (+0.2) 67.2 (+0.5)
+ Det-AdvProp 61.2 (+0.4) 82.3 (+0.3) 67.4 (+0.7)
EfficientDet-D2 63.3 83.6 69.3
+ AutoAugment 62.7 (-0.6) 83.3 (-0.3) 69.2 (-0.1)
+ Det-AdvProp 63.5 (+0.2) 83.8 (+0.2) 69.7 (+0.4)
EfficientDet-D3 65.7 85.3 71.8
+ AutoAugment 65.2 (-0.5) 85.1 (-0.2) 71.3 (-0.5)
+ Det-AdvProp 66.2 (+0.5) 85.9 (+0.6) 72.5 (+0.7)
EfficientDet-D4 67.0 86.0 73.0
+ AutoAugment 67.0 (+0.0) 86.3 (+0.3) 73.5 (+0.5)
+ Det-AdvProp 67.5 (+0.5) 86.6 (+0.6) 74.0 (+1.0)
EfficientDet-D5 67.4 86.9 73.8
+ AutoAugment 67.6 (+0.2) 87.2 (+0.3) 74.2 (+0.4)
+ Det-AdvProp 68.2 (+0.8) 87.6 (+0.7) 74.7 (+0.9)
TABLE VIII
RESULTS ON PASCAL VOC 2012. THE PROPOSED DET-ADVPROP GIVES THE HIGHEST SCORE ON EVERY MODEL AND METRIC. IT LARGELY OUTPERFORMS AUTOAUGMENT [23] WHEN FACING DOMAIN SHIFT.
[16] Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, and Boqing Gong. Robust and accurate object detection via adversarial learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16622–16631, 2021.
[17] Yanbei Chen, Xiatian Zhu, and Shaogang Gong. Semi-supervised deep learning with memory. In Proceedings of the European Conference on Computer Vision (ECCV), pages 268–283, 2018.
[18] Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, and Jiaya Jia. Scale-aware automatic augmentation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9563–9572, 2021.
[19] Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Gaofeng Meng, Shiming Xiang, Jian Sun, and Jiaya Jia. Stitcher: Feedback-driven data provider for object detection. arXiv preprint arXiv:2004.12432, 2(7):12, 2020.
[20] Jaehyeop Choi, Chaehyeon Lee, Donggyu Lee, and Heechul Jung. Salfmix: A novel single image-based data augmentation technique using a saliency map. Sensors, 21(24):8444, 2021.
[21] Peng Chu, Xiao Bian, Shaopeng Liu, and Haibin Ling. Feature space augmentation for long-tailed data. In European Conference on Computer Vision, pages 694–710. Springer, 2020.
[22] Pietro Antonio Cicalese, Aryan Mobiny, Pengyu Yuan, Jan Becker, Chandra Mohan, and Hien Van Nguyen. Stypath: Style-transfer data augmentation for robust histology image classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 351–361. Springer, 2020.
[3] Sidra Aleem, Teerath Kumar, Suzanne Little, Malika Bendechache, Rob Brennan, and Kevin McGuinness. Random data augmentation based enhancement: A generalized enhancement approach for medical datasets. 2022.
[4] Eric Arazo, Diego Ortego, Paul Albert, Noel E O'Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2020.
[5] Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. Improving consistency-based semi-supervised learning with weight averaging. arXiv preprint arXiv:1806.05594, 2(9):11, 2018.
[6] Soroush Baseri Saadi, Nazanin Tataei Sarshar, Soroush Sadeghi, Ramin Ranjbarzadeh, Mersedeh Kooshki Forooshani, and Malika Bendechache. Investigation of effectiveness of shuffled frog-leaping optimizer in training a convolution neural network. Journal of Healthcare Engineering, 2022, 2022.
[7] Markus Bayer, Marc-André Kaufhold, and Christian Reuter. A survey on data augmentation for text classification. ACM Computing Surveys, 2021.
[8] Sima Behpour, Kris M Kitani, and Brian D Ziebart. Ada: Adversarial data augmentation for object detection. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1243–1252. IEEE, 2019.
[9] David Berthelot, Nicholas Carlini, Ekin D Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel. Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785, 2019.
[10] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems, 32, 2019.
[11] Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, and Yao Shen. Precise single-stage detector. arXiv preprint arXiv:2210.04252, 2022.
[12] Aisha Chandio, Yao Shen, Malika Bendechache, Irum Inayat, and Teerath Kumar. Audd: Audio urdu digits dataset for automatic audio urdu digit recognition. Applied Sciences, 11(19):8842, 2021.
[13] Arslan Chaudhry, Puneet K Dokania, and Philip HS Torr. Discovering class-specific pixels for weakly-supervised semantic segmentation. arXiv preprint arXiv:1707.05821, 2017.
[14] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 801–818, 2018.
[15] Pengguang Chen, Shu Liu, Hengshuang Zhao, and Jiaya Jia. Gridmask data augmentation. arXiv preprint arXiv:2001.04086, 2020.
[23] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.
[24] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.
[25] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[26] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[27] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
[28] Terrance DeVries and Graham W Taylor. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538, 2017.
[29] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[30] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[31] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6569–6578, 2019.
[32] Dumitru Erhan, Aaron Courville, Yoshua Bengio, and Pascal Vincent. Why does unsupervised pre-training help deep learning? In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 201–208. JMLR Workshop and Conference Proceedings, 2010.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
[34] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
Method Model 1/8 1/4 1/2 7/8 Full
SDA [160] DeepLabV3Plus 74.1 - - - -
SDA + DSBN [160] DeepLabV3Plus 69.5 - - - -
SDA [160] DeepLabV3Plus - - - - 78.7
SDA + DSBN [160] DeepLabV3Plus - - - - 79.2
SDA [160] DeepLabV3Plus - - - 71.4 -
SDA + DSBN [160] DeepLabV3Plus - - - 72.5 -
AdvSemi [62] DeepLabV2 58.8 62.3 65.7 - 66.0
S4GAN + MT [100] DeepLabV2 59.3 61.9 - - 65.8
CutMix [41] DeepLabV2 60.3 63.87 - - 67.7
DST-CBC [40] DeepLabV2 60.5 64.4 - - 66.9
ClassMix [104] DeepLabV2 61.4 63.6 66.3 - 66.2
ECS [99] DeepLabv3Plus 67.4 70.7 72.9 - 74.8
DSBN [160] DeepLabV2 67.6 69.3 70.7 - 70.1
SSBN [160] DeepLabV3Plus 74.1 77.8 78.7 - 78.7
Adversarial [62] DeepLab-v2 - 58.8 62.3 65.7 -
s4GAN [100] DeepLab-v2 - 59.3 61.9 - 65.8
French et al [41] DeepLab-v2 51.20 60.34 63.87 - -
DST-CBC [40] DeepLab-v2 48.7 60.5 64.4 - -
ClassMix-Seg [104] DeepLab-v2 54.07 61.35 63.63 66.29
DeepLab V3plus [164] MobileNet - - - - 73.5
DeepLab V3plus [164] ResNet-50 - - - - 76.9
DeepLab V3plus [164] ResNet-101 - - - - 78.5
Baseline+ CutOut (16×16, p = 1) [164] MobileNet - - - - 72.8
Baseline+ CutMix (p = 1) [164] MobileNet - - - - 72.6
Baseline+ ObjectAug [164] MobileNet - - - - 73.5
TABLE IX
RESULTS OF PERFORMANCE (MIOU) ON CITYSCAPES VALIDATION SET
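The fraction columns (1/8, 1/4, 1/2, 7/8) denote the share of training images used with labels; the remainder serves as unlabeled data for the semi-supervised methods. A minimal sketch of producing such a split (the helper name, seed, and rounding are our illustration, not the exact protocol of the cited papers):

```python
import random

def labeled_split(image_ids, fraction, seed=0):
    """Deterministically sample a labeled subset covering `fraction` of the
    ids; the remainder is treated as the unlabeled pool."""
    rng = random.Random(seed)
    ids = sorted(image_ids)
    rng.shuffle(ids)
    n_labeled = max(1, round(len(ids) * fraction))
    return ids[:n_labeled], ids[n_labeled:]

# Cityscapes has 2975 finely annotated training images.
labeled, unlabeled = labeled_split(range(2975), 1 / 8)
```

Published methods typically report the mean over several such random splits, since the choice of labeled subset affects the final mIoU.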
[35] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88:303–308, 2009.
[36] Junsong Fan, Zhaoxiang Zhang, Chunfeng Song, and Tieniu Tan. Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4283–4292, 2020.
[37] Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, and Cewu Lu. Instaboost: Boosting instance segmentation via probability map guided copy-pasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 682–691, 2019.
[38] Li Fei-Fei, Robert Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
[39] Steven Y Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, and Eduard Hovy. Genaug: Data augmentation for finetuning text generators. arXiv preprint arXiv:2010.01794, 2020.
[40] Zhengyang Feng, Qianyu Zhou, Guangliang Cheng, Xin Tan, Jianping Shi, and Lizhuang Ma. Semi-supervised semantic segmentation via dynamic self-training and class-balanced curriculum. arXiv preprint arXiv:2004.08514, 1(2):5, 2020.
[41] Geoff French, Timo Aila, Samuli Laine, Michal Mackiewicz, and Graham Finlayson. Semi-supervised semantic segmentation needs strong, high-dimensional perturbations. 2019.
[42] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[43] Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2918–2928, 2021.
[44] Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V Le. Dropblock: A regularization method for convolutional networks. Advances in Neural Information Processing Systems, 31, 2018.
[45] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
[46] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[47] Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, and Qiang Liu. Keepaugment: A simple information-preserving data augmentation approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1055–1064, 2021.
[48] Gregory Griffin, Alex Holub, and Pietro Perona. Caltech-256 object category dataset. 2007.
[49] Jian Guo and Stephen Gould. Deep cnn ensemble with data augmentation for object detection. arXiv preprint arXiv:1506.07224, 2015.
[50] Alon Halevy, Peter Norvig, and Fernando Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2):8–12, 2009.
[51] Junlin Han, Pengfei Fang, Weihao Li, Jie Hong, Mohammad Ali Armin, Ian Reid, Lars Petersson, and Hongdong Li. You only cut once: Boosting data augmentation with a single cut. arXiv preprint arXiv:2201.12078, 2022.
[52] Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, and Jonathon Hare. Fmix: Enhancing mixed sample data augmentation. arXiv preprint arXiv:2002.12047, 2020.
[53] Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, and Hideki Nakayama. Faster autoaugment: Learning augmentation strategies using backpropagation. In European Conference on Computer Vision, pages 1–16. Springer, 2020.
[54] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
[55] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[56] Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. Augmix: A simple data processing method to improve robustness and uncertainty. arXiv preprint arXiv:1912.02781, 2019.
[57] Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021.
[58] Netzahualcoyotl Hernandez-Cruz, David Cato, and Jesus Favela. Neural style transfer as data augmentation for improving covid-19 diagnosis classification. SN Computer Science, 2(5):1–12, 2021.
[59] Minui Hong, Jinwoo Choi, and Gunhee Kim. Stylemix: Separating content and style for enhanced data augmentation. In Proceedings of the
Method Model 1/100 1/50 1/20 1/8 1/4 Full
GANSeg [131] VGG16 - - - - - 64.1
AdvSemSeg [62] ResNet-101 - - - - - 68.4
CCT [105] ResNet-50 - - - - - 69.4
PseudoSeg [173] ResNet-101 - - - - - 73.2
DSBN [160] ResNet-101 - - - - - 75.0
DSBN [160] Xception-65 - - - - - 79.3
Fully supervised [160] ResNet-101 - - - - - 78.3
Fully supervised [160] Xception-65 - - - - - 79.2
Adversarial [62] DeepLab-v2 - 57.2 64.7 69.5 72.1 -
s4GAN [100] DeepLab-v2 - 63.3 67.2 71.4 - 75.6
French et al. [41] DeepLab-v2 53.79 64.81 66.48 67.60 - -
DST-CBC [40] DeepLab-v2 61.6 65.5 69.3 70.7 71.8 -
ClassMix-Seg* [104] DeepLab-v2 54.18 66.15 67.77 71.00 72.45 -
Mixup [163] IRNet - - - - - 49
CutOut [29] IRNet - - - - - 48.9
CutMix [162] IRNet - - - - - 49.2
Random pasting [134] IRNet - - - - - 49.8
CCNN [107] VGG16 - - - - - 35.6
SEC [73] VGG16 - - - - - 51.1
STC [151] VGG16 - - - - - 51.2
AdvEra [150] VGG16 - - - - - 55.7
DCSP [13] ResNet101 - - - - - 61.9
MDC [152] VGG16 - - - - - 60.8
MCOF [147] ResNet101 - - - - - 61.2
DSRG [61] ResNet101 - - - - - 63.2
AffinityNet [2] ResNet-38 - - - - - 63.7
IRNet [1] ResNet50 - - - - - 64.8
FickleNet [86] ResNet101 - - - - - 65.3
SEAM [149] ResNet38 - - - - - 65.7
ICD [36] ResNet101 - - - - - 64.3
IRNet + CDA [134] ResNet50 - - - - - 66.4
SEAM + CDA [134] ResNet38 - - - - - 66.8
DeepLab V3 [164] MobileNet - - - - - 71.9
DeepLab V3 [164] ResNet-50 - - - - - 77.8
DeepLab V3 [164] ResNet-101 - - - - - 78.4
DeepLab V3plus [164] MobileNet - - - - - 73.8
DeepLab V3plus [164] ResNet-50 - - - - - 78.8
DeepLab V3plus [164] ResNet-101 - - - - - 79.6
Baseline + R.Rotation [164] ObjectAug - - - - - 69.5
Baseline + R.Scaling [164] ObjectAug - - - - - 70.3
Baseline + R.Flipping [164] ObjectAug - - - - - 69.6
Baseline + R.Shifting [164] ObjectAug - - - - - 70.7
Baseline + All [164] ObjectAug - - - - - 73.8
Baseline + CutOut (16×16, p = 0.5) [164] MobileNet - - - - - 71.9
Baseline + CutOut (16×16, p = 1) [164] MobileNet - - - - - 72.3
Baseline + CutMix (p = 0.5) [164] MobileNet - - - - - 72.7
Baseline + CutMix (p = 1) [164] MobileNet - - - - - 72.4
Baseline + ObjectAug [164] MobileNet - - - - - 73.8
Baseline + CutOut (16×16, p=0.5) + ObjectAug [164] MobileNet - - - - - 73.9
Baseline + CutMix (p=0.5) + ObjectAug [164] MobileNet - - - - - 74.1
DeepLabv3+ [14] EfficientNet-B7 - - - - - 84.6
ExFuse [166] EfficientNet-B7 - - - - - 85.8
Eff-B7 [172] EfficientNet-B7 - - - - - 85.2
Eff-L2 [172] EfficientNet-B7 - - - - - 88.7
Eff-B7 NAS-FPN [43] EfficientNet-B7 - - - - - 83.9
Eff-B7 NAS-FPN w/ Copy-Paste pre-training [43] EfficientNet-B7 - - - - - 86.6
TABLE X
RESULTS OF PERFORMANCE MEAN INTERSECTION OVER UNION (MIOU) ON THE PASCAL VOC 2012 VALIDATION SET
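Several rows above parameterize CutOut by a patch size and an application probability p, and CutMix by p. An illustrative NumPy sketch of both operations (the Beta(1, 1) mixing ratio follows the CutMix formulation; everything else is a plain re-implementation for clarity, not the cited authors' code):

```python
import numpy as np

def cutout(img, size=16, p=0.5, rng=None):
    """With probability p, zero out one random size x size square (CutOut)."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return img
    h, w = img.shape[:2]
    y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
    out = img.copy()
    # The patch is centered at (y, x) and clipped at the image border.
    out[max(0, y - size // 2):y + size // 2,
        max(0, x - size // 2):x + size // 2] = 0
    return out

def cutmix(img_a, img_b, label_a, label_b, p=0.5, rng=None):
    """With probability p, paste a random crop of img_b into img_a and
    mix the labels in proportion to the pasted area (CutMix)."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return img_a, label_a
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)  # target share kept from img_a
    ph, pw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    y = int(rng.integers(0, h - ph + 1))
    x = int(rng.integers(0, w - pw + 1))
    out = img_a.copy()
    out[y:y + ph, x:x + pw] = img_b[y:y + ph, x:x + pw]
    lam_adj = 1.0 - (ph * pw) / (h * w)  # actual share after truncation
    return out, lam_adj * label_a + (1.0 - lam_adj) * label_b
```

In the segmentation setting of the tables, the same box mixing is applied to the label maps rather than to scalar class labels.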
IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14862–14870, 2021.
[60] Shaoli Huang, Xinchao Wang, and Dacheng Tao. Snapmix: Semantically proportional mixing for augmenting fine-grained data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 1628–1636, 2021.
[61] Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, and Jingdong Wang. Weakly-supervised semantic segmentation network with deep seeded region growing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7014–7023, 2018.
[62] Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, and Ming-Hsuan Yang. Adversarial learning for semi-supervised semantic segmentation. arXiv preprint arXiv:1802.07934, 2018.
[63] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR, 2015.
[64] Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Label propagation for deep semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5070–5079, 2019.
[65] Jacob Jackson and John Schulman. Semi-supervised learning by label gradient alignment. arXiv preprint arXiv:1902.02336, 2019.
[66] Philip TG Jackson, Amir Atapour Abarghouei, Stephen Bonner, Toby P Breckon, and Boguslaw Obara. Style augmentation: Data augmentation via style randomization. In CVPR Workshops, volume 6, pages 10–11, 2019.
[67] Wisal Khan, Kislay Raj, Teerath Kumar, Arunabha M Roy, and Bin Luo. Introducing urdu digits dataset with demonstration of an efficient and robust noisy decoder-based pseudo example generator. Symmetry, 14(10):1976, 2022.
[68] Cherry Khosla and Baljit Singh Saini. Enhancing performance of deep learning models with different data augmentation techniques: A survey. In 2020 International Conference on Intelligent Engineering and Management (ICIEM), pages 79–85. IEEE, 2020.
[69] Byoungjip Kim, Jinho Choo, Yeong-Dae Kwon, Seongho Joe, Seungjai Min, and Youngjune Gwon. Selfmatch: Combining contrastive self-supervision and consistency for semi-supervised learning. arXiv preprint arXiv:2101.06480, 2021.
[70] Jang-Hyun Kim, Wonho Choo, and Hyun Oh Song. Puzzle mix: Exploiting saliency and local statistics for optimal mixup. In International Conference on Machine Learning, pages 5275–5285. PMLR, 2020.
[71] Youmin Kim, AFM Shahab Uddin, and Sung-Ho Bae. Local augment: Utilizing local bias property of convolutional neural networks for data augmentation. IEEE Access, 9:15191–15199, 2021.
[72] Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. Audio augmentation for speech recognition. In Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[73] Alexander Kolesnikov and Christoph H Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In European Conference on Computer Vision, pages 695–711. Springer, 2016.
[74] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[75] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.
[76] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[77] Teerath Kumar, Alessandra Mileo, Rob Brennan, and Malika Bendechache. Rsmda: Random slices mixing data augmentation. Applied Sciences, 13(3):1711, 2023.
[78] Teerath Kumar, Jinbae Park, Muhammad Salman Ali, AFM Uddin, and Sung-Ho Bae. Class specific autoencoders enhance sample diversity. Journal of Broadcast Engineering, 26(7):844–854, 2021.
[79] Teerath Kumar, Jinbae Park, Muhammad Salman Ali, AFM Shahab Uddin, Jong Hwan Ko, and Sung-Ho Bae. Binary-classifiers-enabled filters for semi-supervised learning. IEEE Access, 9:167663–167673, 2021.
[80] Teerath Kumar, Jinbae Park, and Sung-Ho Bae. Intra-class random erasing (icre) augmentation for audio classification. In Proceedings of the Korean Society of Broadcast Engineers Conference, pages 244–247. The Korean Institute of Broadcast and Media Engineers, 2020.
[81] Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, and Zsolt Kira. Featmatch: Feature-based augmentation for semi-supervised learning. In European Conference on Computer Vision, pages 479–495. Springer, 2020.
[82] Jiss Kuruvilla, Dhanya Sukumaran, Anjali Sankar, and Siji P Joy. A review on image processing and image segmentation. In 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), pages 198–203. IEEE, 2016.
[83] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
[84] Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. Advances in Neural Information Processing Systems, 33:19884–19895, 2020.
[85] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 896, 2013.
[86] Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, and Sungroh Yoon. Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5267–5276, 2019.
[87] Victor Lempitsky, Pushmeet Kohli, Carsten Rother, and Toby Sharp. Image segmentation with a bounding box prior. In 2009 IEEE 12th International Conference on Computer Vision, pages 277–284. IEEE, 2009.
[88] Yonggang Li, Guosheng Hu, Yongtao Wang, Timothy Hospedales, Neil M Robertson, and Yongxin Yang. Dada: Differentiable automatic data augmentation. arXiv preprint arXiv:2003.03780, 2020.
[89] JunHao Liew, Yunchao Wei, Wei Xiong, Sim-Heng Ong, and Jiashi Feng. Regional interactive image segmentation networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2746–2754. IEEE Computer Society, 2017.
[90] Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, and Sungwoong Kim. Fast autoaugment. Advances in Neural Information Processing Systems, 32, 2019.
[91] Shiqi Lin, Tao Yu, Ruoyu Feng, Xin Li, Xin Jin, and Zhibo Chen. Local patch autoaugment with multi-agent collaboration. arXiv preprint arXiv:2103.11099, 2021.
[92] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
[93] Pei Liu, Xuemin Wang, Chao Xiang, and Weiye Meng. A survey of text data augmentation. In 2020 International Conference on Computer Communication and Network Security (CCNS), pages 191–195. IEEE, 2020.
[94] Xiaoliang Liu, Furao Shen, Jian Zhao, and Changhai Nie. Randommix: A mixed sample data augmentation method with multiple mixed modes. arXiv preprint arXiv:2205.08728, 2022.
[95] Xiaolong Liu, Zhidong Deng, and Yuhan Yang. Recent progress in semantic image segmentation. Artificial Intelligence Review, 52(2):1089–1106, 2019.
[96] Yucen Luo, Jun Zhu, Mengxi Li, Yong Ren, and Bo Zhang. Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8896–8905, 2018.
[97] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[98] Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, and Mohammad Rastegari. Rangeaugment: Efficient online augmentation with range learning. arXiv preprint arXiv:2212.10553, 2022.
[99] Robert Mendel, Luis Antonio de Souza, David Rauber, Joao Paulo Papa, and Christoph Palm. Semi-supervised segmentation based on error-correcting supervision. In European Conference on Computer Vision, pages 141–157. Springer, 2020.
[100] Sudhanshu Mittal, Maxim Tatarchenko, and Thomas Brox. Semi-supervised semantic segmentation with high- and low-level consistency. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4):1369–1379, 2019.
[101] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.
[102] Loris Nanni, Gianluca Maguolo, and Michelangelo Paci. Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57:101084, 2020.
[103] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
[104] Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, and Lennart Svensson. Classmix: Segmentation-based data augmentation for semi-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1369–1378, 2021.
[105] Yassine Ouali, Céline Hudelot, and Myriam Tami. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12674–12684, 2020.
[106] Jinbae Park, Teerath Kumar, and Sung-Ho Bae. Search of an optimal sound augmentation policy for environmental sound classification with deep neural networks. In Proceedings of the Korean Society of Broadcast Engineers Conference, pages 18–21. The Korean Institute of Broadcast and Media Engineers, 2020.
[107] Deepak Pathak, Philipp Krahenbuhl, and Trevor Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1796–1804, 2015.
[108] Pornntiwa Pawara, Emmanuel Okafor, Lambert Schomaker, and Marco Wiering. Data augmentation for plant classification. In International Conference on Advanced Concepts for Intelligent Vision Systems, pages 615–626. Springer, 2017.
[109] Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
[110] Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan Yuille. Deep co-training for semi-supervised image recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 135–152, 2018.
[111] Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, and Xinggang Wang. Resizemix: Mixing data with preserved object information and true labels. arXiv preprint arXiv:2012.11101, 2020.
[112] Alexandre Ramé, Rémy Sun, and Matthieu Cord. Mixmo: Mixing multiple inputs for multiple outputs via deep subnetworks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 823–833, 2021.
[113] Ramin Ranjbarzadeh, Shadi Dorosti, Saeid Jafarzadeh Ghoushchi, Annalina Caputo, Erfan Babaee Tirkolaee, Sadia Samar Ali, Zahra Arshadi, and Malika Bendechache. Breast tumor localization and segmentation using machine learning techniques: Overview of datasets, findings, and methods. Computers in Biology and Medicine, page 106443, 2022.
[114] Ramin Ranjbarzadeh, Saeid Jafarzadeh Ghoushchi, Nazanin Tataei Sarshar, Erfan Babaee Tirkolaee, Sadia Samar Ali, Teerath Kumar, and Malika Bendechache. Me-ccnn: Multi-encoded images and a cascade convolutional neural network for breast tumor segmentation and recog-
[119] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[120] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Advances in Neural Information Processing Systems, 29, 2016.
[121] Jin-Woo Seo, Hong-Gyu Jung, and Seong-Whan Lee. Self-augmentation: Generalizing deep networks to unseen classes for few-shot learning. Neural Networks, 138:140–149, 2021.
[122] Ling Shao, Fan Zhu, and Xuelong Li. Transfer learning for visual categorization: A survey. IEEE Transactions on Neural Networks and Learning Systems, 26(5):1019–1034, 2014.
[123] Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1–48, 2019.
[124] Connor Shorten, Taghi M Khoshgoftaar, and Borko Furht. Text data augmentation for deep learning. Journal of Big Data, 8(1):1–34, 2021.
[125] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[126] Aditya Singh, Ramin Ranjbarzadeh, Kislay Raj, Teerath Kumar, and Arunabha M Roy. Understanding eeg signals for subject-wise definition of armoni activities. arXiv preprint arXiv:2301.00948, 2023.
[127] Bharat Singh and Larry S Davis. An analysis of scale invariance in object detection snip. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3578–3587, 2018.
[128] Bharat Singh, Mahyar Najibi, and Larry S Davis. Sniper: Efficient multi-scale training. Advances in Neural Information Processing Systems, 31, 2018.
[129] Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, and Yong Jae Lee. Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv preprint arXiv:1811.02545, 2018.
[130] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems, 33:596–608, 2020.
[131] Nasim Souly, Concetto Spampinato, and Mubarak Shah. Semi supervised semantic segmentation using generative adversarial network. In Proceedings of the IEEE International Conference on Computer Vision, pages 5688–5696, 2017.
[132] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[133] Xingzhe Su. A survey on data augmentation methods based on gan in computer vision. In The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, pages 852–865. Springer, 2020.
[134] Yukun Su, Ruizhou Sun, Guosheng Lin, and Qingyao Wu. Context decoupling augmentation for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer
nition. Artificial Intelligence Review, pages 1–38, 2023. vision, pages 7004–7014, 2021.
[115] Ramin Ranjbarzadeh, Nazanin Tataei Sarshar, Saeid [135] Cecilia Summers and Michael J Dinneen. Improved mixed-example
Jafarzadeh Ghoushchi, Mohammad Saleh Esfahani, Mahboub data augmentation. In 2019 IEEE Winter Conference on Applications
Parhizkar, Yaghoub Pourasad, Shokofeh Anari, and Malika of Computer Vision (WACV), pages 1262–1270. IEEE, 2019.
Bendechache. Mrfe-cnn: multi-route feature extraction model [136] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta.
for breast tumor segmentation in mammograms using a convolutional Revisiting unreasonable effectiveness of data in deep learning era. In
neural network. Annals of Operations Research, pages 1–22, 2022. Proceedings of the IEEE international conference on computer vision,
[116] Ramin Ranjbarzadeh, Payam Zarbakhsh, Annalina Caputo, Er- pages 843–852, 2017.
fan Babaee Tirkolaee, and Malika Bendechache. Brain tumor seg- [137] Ryo Takahashi, Takashi Matsubara, and Kuniaki Uehara. Ricap:
mentation based on an optimized convolutional neural network and an Random image cropping and patching data augmentation for deep
improved chimp optimization algorithm. Available at SSRN 4295236, cnns. In Asian conference on machine learning, pages 786–798. PMLR,
2022. 2018.
[117] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, [138] Antti Tarvainen and Harri Valpola. Mean teachers are better role
and Tapani Raiko. Semi-supervised learning with ladder networks. models: Weight-averaged consistency targets improve semi-supervised
Advances in neural information processing systems, 28, 2015. deep learning results. Advances in neural information processing
[118] Arunabha M Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay Raj. systems, 30, 2017.
Wildect-yolo: An efficient and robust computer vision-based accurate [139] Nazanin Tataei Sarshar, Ramin Ranjbarzadeh, Saeid
object localization model for automated endangered wildlife detection. Jafarzadeh Ghoushchi, Gabriel Gomes de Oliveira, Shokofeh
Ecological Informatics, page 101919, 2022. Anari, Mahboub Parhizkar, and Malika Bendechache. Glioma brain
tumor segmentation in four mri modalities using a convolutional neural [158] Suorong Yang, Weikang Xiao, Mengcheng Zhang, Suhan Guo, Jian
network and based on a transfer learning method. In Proceedings Zhao, and Furao Shen. Image data augmentation for deep learning: A
of the 7th Brazilian Technology Symposium (BTSym’21) Emerging survey. arXiv preprint arXiv:2204.08610, 2022.
Trends in Human Smart and Sustainable Future of Cities (Volume 1), [159] Jaejun Yoo, Namhyuk Ahn, and Kyung-Ah Sohn. Rethinking data
pages 386–402. Springer, 2022. augmentation for image super-resolution: A comprehensive analysis
[140] Muhammad Turab, Teerath Kumar, Malika Bendechache, and Takfari- and a new strategy. In Proceedings of the IEEE/CVF Conference on
nas Saber. Investigating multi-feature selection and ensembling for Computer Vision and Pattern Recognition, pages 8375–8384, 2020.
audio classification. arXiv preprint arXiv:2206.07511, 2022. [160] Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, and Hao Li. A
[141] AFM Uddin, Mst Monira, Wheemyung Shin, TaeChoong Chung, Sung- simple baseline for semi-supervised semantic segmentation with strong
Ho Bae, et al. Saliencymix: A saliency guided data augmentation data augmentation. In Proceedings of the IEEE/CVF International
strategy for better regularization. arXiv preprint arXiv:2006.01791, Conference on Computer Vision, pages 8229–8238, 2021.
2020. [161] Fei Yue, Chao Zhang, MingYang Yuan, Chen Xu, and YaLin Song.
[142] Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Yoshua Survey of image augmentation based on generative adversarial network.
Bengio, and David Lopez-Paz. Interpolation consistency training for In Journal of Physics: Conference Series, volume 2203, page 012052.
semi-supervised learning. arXiv preprint arXiv:1903.03825, 2019. IOP Publishing, 2022.
[162] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk
[143] Riccardo Volpi, Pietro Morerio, Silvio Savarese, and Vittorio Murino.
Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train
Adversarial feature augmentation for unsupervised domain adaptation.
strong classifiers with localizable features. In Proceedings of the
In Proceedings of the IEEE conference on computer vision and pattern
IEEE/CVF international conference on computer vision, pages 6023–
recognition, pages 5495–5504, 2018.
6032, 2019.
[144] Hao Wang, Qilong Wang, Fan Yang, Weiqi Zhang, and Wangmeng Zuo. [163] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-
Data augmentation for object detection via progressive and selective Paz. mixup: Beyond empirical risk minimization. arXiv preprint
instance-switching. arXiv preprint arXiv:1906.00358, 2019. arXiv:1710.09412, 2017.
[145] Ke Wang, Bin Fang, Jiye Qian, Su Yang, Xin Zhou, and Jie Zhou. [164] Jiawei Zhang, Yanchun Zhang, and Xiaowei Xu. Objectaug: object-
Perspective transformation data augmentation for object detection. level data augmentation for semantic image segmentation. In 2021
IEEE Access, 8:4935–4943, 2019. International Joint Conference on Neural Networks (IJCNN), pages
[146] Xiang Wang, Kai Wang, and Shiguo Lian. A survey on face data 1–8. IEEE, 2021.
augmentation for the training of deep neural networks. Neural [165] Xiaofeng Zhang, Zhangyang Wang, Dong Liu, Qifeng Lin, and Qing
computing and applications, 32(19):15503–15531, 2020. Ling. Deep adversarial data augmentation for extremely low data
[147] Xiang Wang, Shaodi You, Xi Li, and Huimin Ma. Weakly-supervised regimes. IEEE Transactions on Circuits and Systems for Video
semantic segmentation by iteratively mining common object features. Technology, 31(1):15–28, 2020.
In Proceedings of the IEEE conference on computer vision and pattern [166] Zhenli Zhang, Xiangyu Zhang, Chao Peng, Xiangyang Xue, and Jian
recognition, pages 1354–1362, 2018. Sun. Exfuse: Enhancing feature fusion for semantic segmentation. In
[148] Xiaolong Wang, Abhinav Shrivastava, and Abhinav Gupta. A-fast- Proceedings of the European conference on computer vision (ECCV),
rcnn: Hard positive generation via adversary for object detection. In pages 269–284, 2018.
Proceedings of the IEEE conference on computer vision and pattern [167] Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and
recognition, pages 2606–2615, 2017. Mu Li. Bag of freebies for training object detection neural networks.
[149] Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, and Xilin Chen. arXiv preprint arXiv:1902.04103, 2019.
Self-supervised equivariant attention mechanism for weakly supervised [168] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural
semantic segmentation. In Proceedings of the IEEE/CVF Conference on adversarial examples. arXiv preprint arXiv:1710.11342, 2017.
Computer Vision and Pattern Recognition, pages 12275–12284, 2020. [169] Xu Zheng, Tejo Chalasani, Koustav Ghosal, Sebastian Lutz, and Aljosa
[150] Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Smolic. Stada: Style transfer as data augmentation. arXiv preprint
Zhao, and Shuicheng Yan. Object region mining with adversarial arXiv:1909.01056, 2019.
erasing: A simple classification to semantic segmentation approach. In [170] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang.
Proceedings of the IEEE conference on computer vision and pattern Random erasing data augmentation. In Proceedings of the AAAI
recognition, pages 1568–1576, 2017. conference on artificial intelligence, volume 34, pages 13001–13008,
2020.
[151] Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming- [171] Barret Zoph, Ekin D Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon
Ming Cheng, Jiashi Feng, Yao Zhao, and Shuicheng Yan. Stc: A Shlens, and Quoc V Le. Learning data augmentation strategies for
simple to complex framework for weakly-supervised semantic segmen- object detection. In European conference on computer vision, pages
tation. IEEE transactions on pattern analysis and machine intelligence, 566–583. Springer, 2020.
39(11):2314–2320, 2016. [172] Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu,
[152] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Ekin Dogus Cubuk, and Quoc Le. Rethinking pre-training and self-
Thomas S Huang. Revisiting dilated convolution: A simple approach training. Advances in neural information processing systems, 33:3833–
for weakly-and semi-supervised semantic segmentation. In Proceedings 3845, 2020.
of the IEEE conference on computer vision and pattern recognition, [173] Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian,
pages 7268–7277, 2018. Jia-Bin Huang, and Tomas Pfister. Pseudoseg: Designing pseudo labels
[153] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of for semantic segmentation. arXiv preprint arXiv:2010.09713, 2020.
transfer learning. Journal of Big data, 3(1):1–40, 2016.
[154] Sebastien C Wong, Adam Gatt, Victor Stamatescu, and Mark D
McDonnell. Understanding data augmentation for classification: when
to warp? In 2016 international conference on digital image computing:
techniques and applications (DICTA), pages 1–6. IEEE, 2016.
[155] Shasha Xie, Hui Lin, and Yang Liu. Semi-supervised extractive
speech summarization via co-training algorithm. In Eleventh Annual
Conference of the International Speech Communication Association,
2010.
[156] Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng,
Tao Zhou, and Ming Liu. Cut-thumbnail: A novel data augmentation
for convolutional neural network. In Proceedings of the 29th ACM
International Conference on Multimedia, pages 1627–1635, 2021.
[157] Mingle Xu, Sook Yoon, Alvaro Fuentes, and Dong Sun Park. A
comprehensive survey of image augmentation techniques for deep
learning. arXiv preprint arXiv:2205.01491, 2022.