
Image Data Augmentation Approaches: A Comprehensive Survey and Future Directions

arXiv:2301.02830v4 [cs.CV] 12 Mar 2023

1st Teerath Kumar* — CRT-AI, ADAPT Research Centre, School of Computing, Dublin City University, Ireland; [email protected]
2nd Alessandra Mileo — INSIGHT & I-Form Research Centre, School of Computing, Dublin City University, Ireland; [email protected]
3rd Rob Brennan — ADAPT Research Centre, School of Computer Science, University College Dublin, Ireland; [email protected]
4th Malika Bendechache — ADAPT & Lero Research Centres, School of Computer Science, University of Galway, Ireland; [email protected]

* Corresponding author(s)

Abstract—Deep learning algorithms have demonstrated remarkable performance in various computer vision tasks; however, limited labeled data can lead to overfitting problems, hindering the network's performance on unseen data. To address this issue, various generalization techniques have been proposed, including dropout, normalization, and advanced data augmentation. Among these techniques, image data augmentation, which increases the dataset size by incorporating sample diversity, has received significant attention in recent times. In this survey, we focus on advanced image data augmentation techniques. We provide an overview of data augmentation, present a novel and comprehensive taxonomy of the reviewed data augmentation techniques, and discuss their strengths and limitations. Furthermore, we provide comprehensive results of the impact of data augmentation on three popular computer vision tasks: image classification, object detection, and semantic segmentation. For result reproducibility, the available codes of all data augmentation techniques have been compiled. Finally, we discuss the challenges and difficulties, as well as possible future directions for the research community. This survey provides several benefits: i) readers will gain a deeper understanding of how data augmentation can help address overfitting problems, ii) researchers will save time searching for comparison results, iii) the codes for the data augmentation techniques are available for result reproducibility, and iv) the discussion of future work will spark interest in the research community.

Index Terms—Computer vision, Data augmentation, Deep learning, Image classification, Object detection, Semantic segmentation, Survey

This research was supported by Science Foundation Ireland under grant numbers 18/CRT/6223 (SFI Centre for Research Training in Artificial Intelligence), SFI/12/RC/2289/P2 (Insight SFI Research Centre for Data Analytics), 13/RC/2094/P2 (Lero SFI Centre for Software) and 13/RC/2106/P2 (ADAPT SFI Research Centre for AI-Driven Digital Content Technology). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

I. INTRODUCTION & MOTIVATION

Deep learning models have gained popularity and achieved tremendous progress in computer vision (CV) tasks such as image classification [11], [55], [67], [75], [78], [79], [118], [125], object detection [46], [54], image segmentation [82], [87], [89], [95] and medical imaging [6], [113]–[116], [139]. This advancement has been propelled by various deep neural network architectures, powerful computation resources and the extensive availability of data [123]. Among all deep learning models, Convolutional Neural Networks (CNNs) have demonstrated remarkable performance in CV tasks. CNNs learn different features of an image by applying the convolution operation between the input image and a kernel. The initial layers of a CNN learn low-level features (e.g. edges, lines) while deeper layers learn more structured and complex features. The success of CNNs has further stimulated interest in using them for CV tasks. In addition to CNNs, Vision Transformers (ViT) [30] are also gaining popularity and have been widely used in deep learning for CV tasks.

However, these algorithms are data-intensive and often suffer from the overfitting problem [119], where the model performs well on training data but poorly on test data (unseen data). The issue is exacerbated when large amounts of data are not available, which can occur due to privacy concerns or the need for time-consuming and expensive human labeling [79], [123]. Despite the existence of large datasets such as ImageNet [25], overfitting remains a challenge because the standard training process only learns the important regions, but fails to learn less important features that are necessary for generalization [162]. Moreover, adversarial attacks [57], [97], [168] pose a threat to the accuracy of CNNs: small, invisible perturbations added to the input image can fool the network and cause it to fail to identify the correct features in an image.

To address these challenges, data augmentation is often applied, not just in CV tasks, but also in a range of domains such as audio [3], [12], [72], [80], [102], [106], [126], [140] and text [7], [39], [93], [124]. This survey will specifically focus on the CV domain.

Regularization is an effective method for generalizing Convolutional Neural Network (CNN) models from both architectural and data perspectives. Various forms of regularization have been developed, including Data Augmentation [170], Dropout [132], Batch Normalization [63], Transfer Learning [122], [153], and Pre-training [32]. Among these, Image Data Augmentation [170] has proven to be a useful form of regularization in several studies [76], [125], [170]. This technique expands the dataset by altering the sample's appearance or flavor [170] to provide a more diverse range of views. However, performing Image Data Augmentation directly on image data can increase the risk of overfitting and biases, making it both important and challenging, as discussed further in Section IV.

Generally, Image Data Augmentation addresses two primary problems in CNN models. The first problem is a shortage of data, or limited data, which can result in overfitting. Image Data Augmentation provides a solution by feeding the model with various scenarios of an image, making the model more generalized and allowing for the extraction of more information from the original dataset. The second problem relates to labeling, where the original dataset has a label for each sample. Augmenting a sample preserves the label of the original sample and assigns it to the augmented sample.

Numerous surveys have been conducted on the topic of image data augmentation. For example, Wang et al. explored and compared several traditional data augmentation techniques in their work [109], but this study was limited to image classification tasks only. In another study, Wang et al. reviewed the available data augmentation approaches for face recognition [146]. Khosla et al. briefly discussed warping and oversampling-based data augmentation approaches in their work [68]. However, the authors did not provide a comprehensive taxonomy or a thorough evaluation of the techniques they discussed. Shorten et al. presented a comprehensive survey on image data augmentation in their work [123]. The authors proposed a novel taxonomy, discussed future directions, and addressed the challenges associated with data augmentation. However, the survey lacked an evaluation of image data augmentation for various computer vision tasks. Additionally, as the study is three years old, it does not include the latest state-of-the-art augmentation methods such as CutMix and GridMask. Recently, Yang et al. conducted a survey on data augmentation in computer vision tasks [158]. However, their study only covered a few data augmentation methods and did not provide any code compilation for result reproducibility. Another study by Xu proposed a novel taxonomy for image data augmentations [157], but did not evaluate the techniques discussed. This paper presents an extended taxonomy for data augmentation and reviews state-of-the-art techniques. The source code used in this study is available for result reproducibility. It should be noted that this survey does not cover data augmentations based on generative adversarial networks (GANs), as they are out of the scope of this paper. Instead, we redirect the reader to [133], [161] for more details about GAN-based data augmentations.

The following are our contributions:
• A comprehensive image data augmentation taxonomy is presented.
• An extensive survey of state-of-the-art data augmentation techniques, complete with visual examples, is provided.
• The performance of state-of-the-art data augmentation techniques is evaluated and compared for several computer vision tasks.
• The challenges of data augmentation are highlighted and future directions are identified.
• The available codes for data augmentations, following the proposed taxonomy, are compiled for result reproducibility and made available at https://github.com/kmr2017/Advanced-Data-augmentation-codes.

The above contributions provide the following benefits:
• A better understanding of how data augmentation works to address the overfitting problem.
• Our comprehensive analysis and comparison of the existing data augmentation techniques will save researchers time when searching this field.
• Result reproducibility is facilitated by providing the source code for the different data augmentation techniques investigated.
• The discussion of future work will spark interest in the research community.

II. TAXONOMY AND BACKGROUND

The proposed taxonomy, presented in Figure 2, classifies data augmentation into two main branches: Basic and Advanced data augmentations. The former encompasses fundamental techniques for data augmentation, while the latter encompasses more complex techniques. The specifics of each data augmentation method are thoroughly discussed in subsequent sections.

A. Basic Image Data Augmentations

This section describes basic image data augmentation methods and their classification, as below:

• Image Manipulation
  – Geometric Manipulation
  – Non-Geometric Manipulation
• Image Erasing
  – Erasing
Fig. 1. Overfitting problem: on the left, overfitting is explained in terms of accuracy; after the inflection point (red dotted line), the training accuracy keeps increasing while the validation accuracy decreases. On the right, in terms of loss, the training loss keeps decreasing while the validation loss increases after the red dotted line. The figure is taken from https://www.baeldung.com/cs/ml-underfitting-overfitting

1) Image Manipulation: Image manipulation refers to changes made to an image with respect to its position or color. Positional manipulation is performed by adjusting the positions of the pixels, while color manipulation is performed by altering the pixel values of the image. Image manipulation is further divided into two main categories, each of which is discussed below.

Geometric Data Augmentation: Geometric data augmentation encompasses modifications to the geometric attributes of an image, including its position, orientation, and aspect ratio. This technique involves transforming the arrangement of pixels within an image through a variety of techniques such as rotation, translation, and shearing. Figure 3 illustrates the most commonly employed geometric augmentations. These methods are widely used in the domain of computer vision to diversify the training data and improve the resilience of models to diverse transformations. The utilization of geometric data augmentation has become a critical component in the development of robust computer vision algorithms. Each of the geometric data augmentations is discussed below:

(i) Rotation: Rotation data augmentation involves rotating an image by a specified angle within the range of 0 to 360 degrees. The precise degree of rotation is a hyperparameter that requires careful consideration based on the nature and characteristics of the dataset. For instance, in the MNIST [27] dataset, rotating all digits by 180 degrees would not be a meaningful transformation, since a rotated 6 turns into a 9. Therefore, a thorough understanding of the dataset is necessary to determine the optimal degree of rotation and achieve the best results.

(ii) Translation: Translation data augmentation involves shifting an image in any of the upward, downward, right, or left directions, as illustrated in Figure 3, in order to provide a more diverse representation of the data. The magnitude of this type of augmentation must be selected with caution, as an excessive shift can result in a substantial change in the appearance of the image. For example, translating a digit 8 to the left by half the width of the image could result in an augmented image that resembles the digit 3. Hence, it is imperative to consider the nature of the dataset when determining the magnitude of the translation augmentation to ensure its efficacy.

(iii) Shearing: Shearing data augmentation involves shifting one part of an image in one direction while the other part is shifted in the opposite direction. This technique can provide a new and diverse perspective on the data, thereby improving the robustness of a model. However, excessive shearing can cause significant deformation of the image, making it difficult for the model to accurately recognize the objects within it. For example, applying excessive shearing to a cat image during data augmentation may result in a distorted, stretched appearance, hindering the ability of a model to correctly classify the image as a cat. It is therefore crucial to find a balance between the amount of shearing applied and the desired level of diversity, since over-augmentation introduces significant noise. Used carefully, shearing can be a powerful tool for enhancing the generalization ability of computer vision models while avoiding these drawbacks.
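The geometric transformations above are available in common libraries; the following is a minimal sketch using torchvision, where the rotation, translation, and shear ranges are illustrative assumptions rather than recommended values.

```python
# A minimal sketch of the geometric augmentations discussed above
# (rotation, translation, and shearing) using torchvision.
from PIL import Image
from torchvision import transforms

geometric_augment = transforms.Compose([
    # Rotate by a random angle in [-30, 30] degrees; a dataset such as
    # MNIST would need a narrower range so a 6 is not turned into a 9.
    transforms.RandomRotation(degrees=30),
    # Translate by at most 10% of the width/height and shear by up to
    # 10 degrees; large values risk the distortions warned about above.
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), shear=10),
])

image = Image.open("example.jpg")     # any RGB input image
augmented = geometric_augment(image)  # a new, randomly transformed view
```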
Image Data Augmentations
• Basic Image Data Augmentations
  – Image Manipulation
    – Geometric Manipulation: Rotation, Translation, Shearing
    – Non-Geometric Manipulation: Flipping, Cropping, Noise injection, Color Space, Jitter, Kernel
  – Image Erasing
    – Erasing: Cutout, Random Erasing, Hide-and-Seek, GridMask
• Advanced Image Data Augmentations
  – Image Mixing
    – Single Image Mixing: Local Augment, Self-Aug, SalfMix, KeepAugment, CutThumbnail, and many more
    – Multi-Images Mixing: Mixup, CutMix, SaliencyMix, RSMDA, PuzzleMix, SnapMix, and many more
  – Auto Augment
    – Reinforcement Learning Based: AutoAugment, Fast AutoAug, Faster AutoAug, Local Patch with RL, and many more
    – Non-Reinforcement Learning Based: RandAug, ADA, and many more
  – Feature Augmentation: FeatMatch, Feature Space (FS) Aug, Dataset Aug in FS, and many more
  – Neural Style Transfer: STaDA, Style Aug, StyPath, and many more

Fig. 2. Image data augmentation taxonomy. Note: not all image data augmentation names are included in this taxonomy due to space limits; however, all relevant and remaining image data augmentations are discussed as per the taxonomy. The remaining sub-types of categories are discussed in the text.
Fig. 3. Overview of the geometric data augmentations.

Non-Geometric Data Augmentations: The non-geometric data augmentation category focuses on modifications to the visual characteristics of an image, as opposed to its geometric shape. This includes techniques such as noise injection, flipping, cropping, resizing, and color space manipulation, as illustrated in Figure 4. These techniques can help improve the generalization performance of a model by exposing it to a wider variety of image variations during training. However, it is important to consider the trade-off between augmenting the data and preserving the integrity of the underlying information in the image. The following section outlines several classical non-geometric data augmentation approaches.

Fig. 4. Overview of the non-geometric data augmentations.

(i) Flipping: Flipping is a type of image data augmentation technique that involves flipping an image either horizontally or vertically. The efficacy of this method has been demonstrated on various widely-used datasets, including CIFAR10 and CIFAR100 [74]. However, care must be taken when applying this technique, as the outcome may depend on the nature of the dataset. For instance, the horizontal flipping of the digit "2" in the Urdu digits dataset [67] may result in the appearance of the digit "6". As such, the choice of flipping must be made carefully to ensure that the desired level of augmentation is achieved without introducing significant noise into the data.

(ii) Cropping and resizing: Cropping is a common pre-processing data augmentation technique that can be applied randomly or to the center of the image. This technique involves trimming the image and then resizing it back to its original size, preserving the original label of the image. However, caution must be exercised when using cropping as a data augmentation method, as it may result in misleading information for the model, such as cropping the upper or lower part of the digit "8" and making it appear as the digit "0".

(iii) Noise Injection: Noise injection is a data augmentation technique that has been demonstrated to enhance the robustness of neural networks in learning features and defending against adversarial attacks. As shown in the survey of nine datasets from the UCI repository [123], the use of noise injection has resulted in impressive performance improvements.

(iv) Color Space: The manipulation of individual channel values within an image, also known as photometric augmentation, is a type of data augmentation that can help control the brightness of the image. Image data typically consists of three channels, Red (R), Green (G), and Blue (B), and has dimensions of Height (H) x Width (W) x Channels (C). By altering the values of each channel separately, this technique can prevent a model from becoming biased towards specific lighting conditions. The most straightforward approach to perform color space augmentation involves replacing a single channel within the image with a randomly generated channel of the same size, or with a channel filled with either 0 or 255. The utilization of color space manipulation is commonly observed in photo editing applications, where it is used to adjust the brightness or darkness of the image [123].

(v) Jitter: Jitter is a data augmentation technique that involves randomly altering the brightness, contrast, saturation, and hue of an image. The four hyperparameters, i.e., brightness, contrast, saturation, and hue, can be adjusted by specifying their minimum and maximum ranges. However, it is important to carefully select these ranges, as improper adjustments can negatively impact the image's content. For example, increasing the brightness of X-Ray images used for lung disease detection can result in the whitening and blending of the lungs in the X-Ray, hindering the diagnosis of the disease.

(vi) Kernel Filter: Kernel filtering is a form of data augmentation that sharpens or softens the image. This is achieved by applying a window of a specified size n x n, containing a Gaussian-blur or an edge filter, to the image. The Gaussian-blur filter serves to soften the image, while the edge filter sharpens its edges either horizontally or vertically.
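As a rough illustration of the techniques above, the sketch below composes flipping, cropping with resizing, color jitter, and Gaussian noise injection using torchvision; all parameter ranges are illustrative assumptions, and the noise transform is a hypothetical helper rather than a library class.

```python
# A minimal sketch of several non-geometric augmentations described
# above: flipping, cropping with resizing, color jitter, and noise.
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Noise injection: add zero-mean Gaussian noise to a tensor image."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, x):
        return (x + torch.randn_like(x) * self.std).clamp(0.0, 1.0)

non_geometric_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),      # unsafe for digit datasets
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),
])
```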
2) Image Erasing Data Augmentations: The data aug-
mentation technique of image erasing involves the process of
removing specific parts of an image and replacing them with
either 0, 255, or the mean of the entire dataset. This type of
data augmentation includes various methods such as cutout,
random erasing, hide-and-seek, and grid mask, each with their
unique implementation and purpose.
(i) Cutout: The Cutout data augmentation method involves the random removal of a sub-region within an image, which is then filled with a constant value such as 0 or 255, during the training phase. This approach has been shown to result in improved performance on widely used datasets [29]. An illustration of the Cutout data augmentation process is provided in Figure 16.
(ii) Random erasing: Random Erasing (RE) [170] randomly erases a sub-region in the image, similar to Cutout. The main difference is that RE randomly determines whether to mask out a region or not, and also determines the aspect ratio and size of the masked region. A demonstration of RE for different tasks is shown in figure 5.

Fig. 5. Random erasing examples for different tasks [170].

(iii) Hide-and-Seek: The process of hide-and-seek data augmentation [129] involves dividing an image into uniform squares of random size and then randomly removing a specified number of these squares. This technique aims to force neural networks to learn relevant features by hiding important information. A different view of the image is presented at each epoch, as depicted in figure 6. It is important to note that while this technique has been found to be effective in certain applications, it may also result in the removal of important information, which could negatively impact the performance of the model.

Fig. 6. An example of Hide-and-Seek augmentation [129].

(iv) GridMask: The GridMask data augmentation technique [15] aims to address the challenges associated with randomly removing regions from images. This process, which can completely erase objects or strip away context information, requires a trade-off between the two. To resolve this, GridMask creates a uniform masking pattern and applies it to images, as demonstrated in figure 7.

Fig. 7. This figure shows the procedure of GridMask augmentation: a mask is produced and then multiplied with the input image. Image is taken from [15].
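A minimal sketch of the erasing idea shared by the methods above is given below; the fixed patch size and fill value follow the Cutout description, while Random Erasing, as noted, would additionally randomize the decision to erase and the region's size and aspect ratio. The function name and defaults are assumptions.

```python
# A minimal Cutout-style sketch: zero out one randomly placed square
# patch of a tensor image.
import torch

def cutout(img: torch.Tensor, size: int = 8, fill: float = 0.0) -> torch.Tensor:
    """img: (C, H, W) tensor; returns a copy with one square region filled."""
    _, h, w = img.shape
    # Sample the patch center, then clamp the patch to the image bounds.
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.clone()
    out[:, y1:y2, x1:x2] = fill
    return out
```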
B. Advanced Image Data Augmentations
The field of computer vision has seen a surge in interest
regarding data augmentation techniques in recent times. This
has led to the development of a wide range of innovative
methods for augmenting image data, such as mixing im-
ages in novel ways, using reinforcement learning, feature-
based augmentation, and style-based augmentation. To better
understand these advancements, advanced data augmentation
techniques have been classified into different major categories.
These categories provide a useful framework for surveying
the current state of the field and identifying areas for further
research and development.
1) Image Mixing Data Augmentations: Image mixing data augmentation has gained popularity in computer vision research in recent years. This technique involves blending one or more images, including the same image, resulting in improved deep neural network model accuracy. We categorize image mixing data augmentation into two sub-categories: single image mixing and multi-image mixing. We compare the effectiveness of these sub-categories on benchmark datasets (such as CIFAR10, CIFAR100, ImageNet, etc.), as shown in Tables II-B5, II, VII, VIII and II-B5.
Single Image Mixing Data Augmentations: A single-image mixing technique uses only one image and mixes that image from different mixing points of view. Recently, there has been a lot of work done on single-image augmentation, such as LocalAugment, Self-Augmentation, SalfMix, and many more. Each SOTA single image mixing data augmentation is discussed below.

Fig. 8. An example of Global and Local Rotation Image; the example is taken from [71].
(i) Local Augment: Kim et al. [71] proposed a technique
called LocalAugment, which involves dividing an image
into smaller patches and applying different types of data
augmentation to each patch. The purpose of this technique
is to increase diversity in local features, which could help
reduce bias and improve generalization performance of
neural networks. While this approach does not preserve
the global structure of an image, it provides a rich
set of local features that can benefit neural network
training. Figures 8 and 9 provide visual representations of
the LocalAugment technique. Although LocalAugment
technique can generate diverse local features of an image,
it may not be well-suited for certain types of images that
have complex global structures requiring preservation of
global spatial relationships. Therefore, it may have some
limitations, which should be taken into account while
using this technique for image mixing data augmentation.

Fig. 9. Comparison of LocalAugment with CutOut, MixUp, etc.; example is taken from [71].

(ii) Self-Augmentation: This work [121] proposes self-augmentation, where a random region of an image is cropped and pasted randomly in the image, improving the generalization capability in few-shot learning. Moreover, self-augmentation combines regional dropout and knowledge distillation: knowledge from the trained large network is transferred to a small network. The process is demonstrated in figure 10.

Fig. 10. An example of self augmentation; image is taken from [121].

(iii) SalfMix: This work [20] focuses on whether it is possible to generalize neural networks based on single-image mixed augmentation. For that purpose, it proposes SalfMix: first, the salient part of the image is found to decide which part should be removed and which portion should be duplicated; the most salient regions are then cropped and placed into non-salient regions. This process is defined and compared with other techniques in figure 11.

Fig. 11. Conceptual comparison between the SalfMix method and other single image-based data augmentation methods; the example is taken from [20].

(iv) KeepAugment: KeepAugment [47] is introduced to prevent the distribution shift that degrades the performance of neural networks. The idea of KeepAugment is to increase fidelity by preserving the salient features of the image and augmenting the non-salient region. Preserved features help to increase diversity without shifting the distribution. KeepAugment is demonstrated in figure 12.

Fig. 12. This image shows the example of KeepAugment with other augmentations, courtesy [47].

(v) You Only Cut Once: You Only Cut Once (YOCO) [51] is introduced with the aim of recognizing objects from partial information and improving the diversity of augmentation, encouraging neural networks to perform better. YOCO cuts an image into two pieces, applies augmentation on each piece, and then concatenates the pieces back into one image. YOCO shows impressive performance compared with SOTA augmentations, sometimes outperforming them. It is easy to implement, has no parameters, and is easy to use. The YOCO augmentation process is shown in figure 13.

Fig. 13. An example of YOCO augmentation; image is taken from [51].

(vi) Cut-Thumbnail: Cut-Thumbnail [156] is a novel data augmentation that resizes the image to a certain small size and then randomly replaces a random region of the image with the resized image, aiming to alleviate the shape bias of the network. The advantage of Cut-Thumbnail is that it not only preserves the original image but also keeps its global view in the small resized image. On ImageNet, it shows impressive performance using ResNet50. The Cut-Thumbnail process and its comparison with other methods are shown in figure 15 and figure 14, respectively.

Fig. 14. Comparison between existing data augmentation methods and Cut-Thumbnail; the example is from [156].

Fig. 15. This image shows an example of reduced images that are called thumbnails. After reducing the image to a certain size of 112×112 or 56×56, the dog is still recognizable even though lots of local details are lost, courtesy [156].
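The following is a minimal sketch of the Cut-Thumbnail idea described above: the whole image is shrunk to a thumbnail that preserves its global view and pasted over a random region of the original. The thumbnail size and helper name are assumptions, and inputs are assumed to be larger than the thumbnail.

```python
# A minimal Cut-Thumbnail-style sketch: shrink the image to a small
# "thumbnail" and paste it over a random region of the original.
import torch
import torch.nn.functional as F

def cut_thumbnail(img: torch.Tensor, thumb: int = 56) -> torch.Tensor:
    """img: (C, H, W) tensor; returns a copy with its own thumbnail pasted in."""
    c, h, w = img.shape
    # Bilinear resize of the whole image preserves its global structure.
    small = F.interpolate(img.unsqueeze(0), size=(thumb, thumb),
                          mode="bilinear", align_corners=False).squeeze(0)
    y = torch.randint(h - thumb + 1, (1,)).item()
    x = torch.randint(w - thumb + 1, (1,)).item()
    out = img.clone()
    out[:, y:y + thumb, x:x + thumb] = small
    return out
```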
Multi-Images Mixing: Multi-images mixing data augmentation uses more than one image and applies different mixing strategies. Recently, many researchers have explored a variety of multi-images mixing strategies, and it remains a very active topic. Recent work includes Mixup, CutMix, SaliencyMix, and many more. Each of the relevant multi-image mixing data augmentation techniques is discussed below.

(i) Mixup: Mixup blends two images based on a blending factor (alpha), and the corresponding labels of these images are mixed in the same way. Mixup data augmentation [163] consistently improved performance not only in terms of accuracy but also in terms of robustness. Experiments on ImageNet-2012 [119], CIFAR-10, CIFAR-100, Google commands (https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html) and UCI datasets (http://archive.ics.uci.edu/ml/index.php) showed impressive results over SOTA methods. A further demonstration and comparison are shown in figure 16.

(ii) CutMix: CutMix tackles the issues of information loss and region dropout [162]. It is inspired by Cutout [29], where a random region is filled with 0 or 255; in CutMix, instead of filling the random region with 0 or 255, the region is filled with a patch from another image. Correspondingly, the labels are also mixed proportionally to the number of pixels mixed. It is compared with other methods in figure 16.

Fig. 16. Overview of Mixup, Cutout, and CutMix; example is from [162].
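Below is a minimal sketch of the two methods just described, following their usual formulations: Mixup samples a blending factor lam from a Beta(alpha, alpha) distribution, while CutMix fills a random box whose area fraction is 1 - lam with a patch from a partner image. In both cases the training loop (not shown) would weight the loss as lam * loss(y_a) + (1 - lam) * loss(y_b).

```python
# Minimal Mixup and CutMix sketches operating on a batch.
import numpy as np
import torch

def mixup(x, y, alpha=1.0):
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))          # partner sample for each image
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

def cutmix(x, y, alpha=1.0):
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    _, _, h, w = x.shape
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x1, x2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    x = x.clone()
    x[:, :, y1:y2, x1:x2] = x[idx, :, y1:y2, x1:x2]
    # Recompute lam from the actual box area after clipping at borders.
    lam = 1 - ((y2 - y1) * (x2 - x1) / (h * w))
    return x, y, y[idx], lam
```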
(iii) SaliencyMix: This technique addresses a problem of CutMix: it argues that filling a random region of the image with a patch from another image does not guarantee that the patch has rich information, and mixing the labels of such unguaranteed patches leads the model to learn unnecessary information about the patch [141]. To deal with that issue, SaliencyMix first selects the salient part of the image and pastes it to a random, salient, or non-salient region of another image. It is shown in figure 17 and figure 18.

Fig. 17. An example of SaliencyMix augmentation; image is taken from [141].

Fig. 18. This image shows the proposed SaliencyMix data augmentation procedure, courtesy [141].

(iv) RSMDA: Random Slices Mixing Data Augmentation: RSMDA [77] addresses the issue of feature loss in single image erasing data augmentation. RSMDA takes slices of one image and mixes them with another image alternately, and the corresponding labels are also mixed accordingly. This work further investigates three different strategies of RSMDA: row-wise slice mixing, column-wise slice mixing, and randomness of both. Row-wise slice mixing has shown superior performance. A demonstration of each of the slice mixing strategies is in figure 19.

Fig. 19. RSMDA's three different strategies. Image is taken from [77].

(v) Puzzle Mix: This article [70] proposes the Puzzle Mix data augmentation technique, which focuses on explicitly using salient information and basic statistics of the image wisely, with the aim of breaking the misleading supervision of neural networks in existing data augmentations. The demonstration is shown and compared with relevant methods in figure 20.

Fig. 20. A visual comparison of the mixup methods. Puzzle Mix ensures to contain sufficient target class information while preserving the local statistics of each input; the example is from [70].

(vi) SnapMix: The article [60] proposes Semantically Proportional Mixing (SnapMix), which utilises the class activation map (CAM) to reduce the label noise level. SnapMix creates the target label considering the actual salient pixels taking part in the augmented image, which ensures semantic correspondence between the augmented image and the mixed labels. The overall process is demonstrated and compared with closely matching augmentations in figure 21.

Fig. 21. A visual comparison of Mixup, CutMix, and SnapMix. The figure gives an example where a label generated by SnapMix is visually more consistent with the mixed image semantic structure compared to CutMix and Mixup, courtesy [60].

(vii) FMix: This article proposes FMix [52], a kind of mixed sample data augmentation (MSDA) that utilises random binary masks. These random binary masks are acquired by applying a threshold to low-frequency images obtained from Fourier space. Once the mask is obtained, one region of the mask is applied to one input and the other region is applied to the other input. The overall process is shown in figure 22.

Fig. 22. Example masks and mixed images from CIFAR-10 for FMix; example is from [52].

(viii) MixMo: This paper [112] focuses on learning multi-input multi-output networks via sub-networks. The main motivation of the paper is to replace direct hidden summing operations with more solid mechanisms. For that purpose, it proposes MixMo, which embeds M inputs into a shared space, mixes them, and passes the result to further layers for classification. The overall process is demonstrated in figure 23.

Fig. 23. This image shows the overview of MixMo augmentation; the image is taken from [112].

(ix) StyleMix: This paper [59] targets a problem of previous approaches, such as mixup-based data augmentations, which are unable to differentiate between content and style features. To remedy this problem, it proposes two approaches, StyleMix and StyleCutMix. This is the first work that separately deals with the content and style features of images very carefully, and it showed impressive performance on popular benchmark datasets. The overall process is defined and compared with SOTA approaches in figure 24.

Fig. 24. A visual comparison of StyleMix [59] and StyleCutMix with Mixup [163] and CutMix [162]; example is from [59].

(x) RandomMix: This work [94] improves generalization capability by proposing RandomMix, which randomly selects an augmentation from a set of image mixing augmentations and applies it to images, enabling the model to look at diverse samples. This method showed impressive results over SOTA image mixing methods. The overall demonstration is shown in figure 25.

Fig. 25. An illustrative example of RandomMix; image is taken from [94].

(xi) MixMatch: The MixMatch data augmentation technique is very useful in semi-supervised learning. MixMatch [10] augments a single image K times and passes all K images to a classifier, averages their predictions, and finally sharpens the predictions by adjusting their distribution temperature term. It is demonstrated in figure 26.

Fig. 26. Diagram of the label guessing process used in MixMatch, courtesy [10].

(xii) ReMixMatch: In this work [9], an extension of MixMatch [10] is proposed to make the prior work efficient by introducing distribution alignment and augmentation anchoring. The distribution alignment task aims to minimize the gap between the marginal distribution of predictions on unlabeled data and the marginal distribution of ground truth labels. On the other hand, augmentation anchoring feeds multiple strongly augmented versions of the input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. The process is illustrated in figure 27.

Fig. 27. Anchoring augmentation. It makes predictions on strong augmentations of the same image (blue) using the forecast for a weakly enhanced image (green, centre), courtesy [9].

(xiii) FixMatch: FixMatch [130] is a method for improving the performance of semi-supervised learning (SSL). It first assigns pseudo-labels to unlabeled images that have a predicted probability above a certain threshold, and then trains the model to match these labels using a cross-entropy loss on a strongly augmented version of the image. The process is illustrated in Figure 28.

Fig. 28. This image shows the procedure of FixMatch; image is taken from [130].

(xiv) AugMix: The work proposed in [56] presents AugMix, a data augmentation technique that aims to reduce the distribution gap between training and test data. AugMix applies M random augmentations to an input image, each with a random strength, and merges the resulting images to produce a new image that spans a wider area of the input space. The process is illustrated in Figure 29, where three branches perform separate augmentations and additional operations are added to increase diversity. The resulting images are then mixed to produce a final augmented image, which is effective in improving model robustness.

Fig. 29. An overall procedure of AugMix augmentation; example is from [56].
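A minimal sketch of FixMatch's pseudo-labeling step described above is shown below; `model`, `weak_augment`, and `strong_augment` are assumed to be supplied by the training code, and the confidence threshold is an illustrative value.

```python
# Minimal FixMatch-style loss on an unlabeled batch.
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, unlabeled_x, weak_augment, strong_augment,
                            threshold=0.95):
    with torch.no_grad():
        # Pseudo-label from the weakly augmented view.
        probs = F.softmax(model(weak_augment(unlabeled_x)), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()   # keep only confident predictions
    # Train the strongly augmented view to match the pseudo-labels.
    logits = model(strong_augment(unlabeled_x))
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (loss * mask).mean()
```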
(xv) Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation: The proposed approach in this work [43] involves copying and pasting instances from one image to another to create an augmented image. This simple technique has shown promising results and is easy to implement. Figure 30 illustrates the process, where instances from two images are pasted onto each other at different scales.

Fig. 30. Image augmentation performed by the simple Copy-Paste method; image courtesy [43].

(xvi) Improved Mixed-Example Data Augmentation: Recently, label non-preserving data augmentation techniques based on linear combinations of two examples have demonstrated promising results. In this paper [135], the authors investigate two research questions: (i) the reasons behind the success of these methods and (ii) the significance of linearity in data augmentations. Figure 31 illustrates the overall process.

Fig. 31. A visual comparison of linear methods and generalized augmentation performed by Improved Mixed-Example; image is taken from [135].

(xvii) Random image cropping and patching (RICAP): Random Image Cropping and Patching (RICAP) [137] is a data augmentation technique that cuts and mixes four images rather than two. The key idea behind RICAP is to crop a patch from each of the four images and then mix these patches to create an augmented image. The labels of the images are also mixed in proportion to the areas of the patches. This technique showed impressive performance on popular datasets, i.e., CIFAR10, CIFAR100, and ImageNet. A RICAP demonstration is shown in figure 32.

Fig. 32. A conceptual explanation of the RICAP data augmentation; the example is from [137].

(xviii) CutBlur: This article [159] explores and analyses existing data augmentation techniques for super-resolution and proposes another data augmentation technique for super-resolution, named CutBlur, which cuts high-resolution image patches and pastes them into the corresponding low-resolution images and vice-versa. CutBlur shows impressive performance on several super-resolution benchmark datasets. The process is illustrated in figures 33 and 34.

Fig. 33. A schematic illustration of the CutBlur operation; image is taken from [159].

Fig. 34. A visual comparison between high resolution, low resolution and CutBlur, courtesy [159].

(xix) ResizeMix: Mixing Data with Preserved Object Information and True Labels: The ResizeMix [111] method directly cuts and pastes the source data into the target image in four different ways: using the salient part, the non-salient part, a random part, or the resized source image as the patch, as shown in figure 35. It addresses two questions:

• How to obtain a patch from the source image?
• Where to paste the patch from the source image in the target image?

Furthermore, it was found that saliency information is not important to promote mixing data augmentation. ResizeMix is shown in figure 35.
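To make the patch-based mixing of RICAP and ResizeMix above concrete, the following is a minimal RICAP-style sketch: one random split point divides the output canvas into four quadrants, each filled with a crop from a different image, and the label weights are the quadrant area fractions. The function name and the Beta parameter are assumptions.

```python
# Minimal RICAP-style four-image patching sketch.
import numpy as np
import torch

def ricap(x, y, beta=0.3):
    """x: (B, C, H, W); returns mixed batch, four label sets, four weights."""
    b, _, h, w = x.shape
    cw = int(np.random.beta(beta, beta) * w)   # split column
    ch = int(np.random.beta(beta, beta) * h)   # split row
    sizes = [(ch, cw), (ch, w - cw), (h - ch, cw), (h - ch, w - cw)]
    out = x.clone()
    labels, weights = [], []
    for k, (ph, pw) in enumerate(sizes):
        idx = torch.randperm(b)                # a different partner per quadrant
        py = np.random.randint(h - ph + 1)     # random crop position
        px = np.random.randint(w - pw + 1)
        ty, tx = (0 if k in (0, 1) else ch), (0 if k in (0, 2) else cw)
        out[:, :, ty:ty + ph, tx:tx + pw] = x[idx, :, py:py + ph, px:px + pw]
        labels.append(y[idx])
        weights.append(ph * pw / (h * w))      # label weight = area fraction
    return out, labels, weights
```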
(xx) ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning: This research work [104] proposes a novel data augmentation for semi-supervised learning in the semantic segmentation task. It showed that traditional data augmentations are not as effective for image semantic segmentation as they are for image classification. The proposed data augmentation, named ClassMix, augments the training sample by mixing unlabeled samples, exploiting network predictions while taking object boundaries into account. It showed a massive performance gain on two common semantic segmentation datasets for semi-supervised learning. The overall process is shown in figure 36.

Fig. 36. A visual representation of ClassMix augmentation: two images are sampled, then based on the predictions of each image a binary mask is created. The mask is then used to mix the images and their predictions; the image is taken from [104].

(xxi) Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation (WSSS): This article [134] addresses a problem of traditional data augmentation techniques for WSSS: increasing semantic samples with the same contextual data does not add much value for object differentiation. For instance, in image classification, "cat" recognition depends on both the cat itself and its surrounding context, and these two contexts discourage the model from focusing only on the cat. To tackle that issue, this work proposes a novel data augmentation named Context Decoupling Augmentation (CDA). CDA increases the diversity of the specific object, and it guides the network to break the dependencies between object and contextual information. In this way, it provides augmentation while the network focuses on the object(s) only, rather than on the object(s) and their contextual information. A comparison of traditional data augmentation and CDA is shown in figure 37.

Fig. 37. A visual representation of the difference between the conventional augmentation approach and context decoupling augmentation (CDA); image is taken from [134].
(xxii) ObjectAug: Object-level Data Augmentation for Semantic Image Segmentation: This article [164] addresses the problem of image-level mixing data augmentation strategies, which fail to work for segmentation because object and background are coupled, and the boundaries of objects are not augmented due to their fixed semantic bond with the background. To mitigate this problem, this article [164] proposes a novel approach named ObjectAug, an object-level augmentation for semantic segmentation. First, it separates the object(s) and background from an image with the help of semantic labels; then each object is augmented using popular data augmentation techniques such as flipping and rotating. Pixel changes due to these data augmentations are restored using image inpainting. Finally, the object(s) and background are recombined to create an augmented image. Experimental results suggest that ObjectAug has shown performance improvement for segmentation tasks. ObjectAug is shown in figure 38.

Fig. 38. ObjectAug can perform various augmentation methods for each object to boost the performance of semantic segmentation. The left husky is scaled and shifted, while the right one is flipped and shifted. Thus, the boundaries between objects are extensively augmented to boost their performance; the example is from [164].

2) AutoAugment: The goal of this technique is to find the data augmentation policies from the training data. It solves the problem of finding the best augmentation policy as a discrete search problem. It consists of a search algorithm and a search space. Furthermore, these techniques are classified into two sub-categories, based on reinforcement learning and non-reinforcement learning:

• Reinforcement learning data augmentation
• Non-Reinforcement learning data augmentation

Reinforcement Learning data augmentations: Reinforcement learning data augmentation techniques generalize and improve the performance of deep networks in an environment.

(i) AutoAugment: This work [23] automatically finds the best data augmentation rather than relying on manual data augmentation. To address the limitations of manual search-based data augmentation, this article proposes AutoAugment, where a search space is designed with policies consisting of many sub-policies. Each sub-policy has two parameters: one is the image processing function, and the second one is the probability with magnitude. These sub-policies are found using reinforcement learning as the search algorithm. The overall process is demonstrated in figure 39.

Fig. 39. A visual overview of the sub-policies from ImageNet using AutoAugment; example is from [23].

(ii) Fast AutoAugment: Fast AutoAugment [90] addresses the problem of AutoAugment: AutoAugment takes a lot of time to find the optimal data augmentation strategy. To reduce the search time, Fast AutoAugment finds more optimal data augmentations using an efficient search strategy based on density matching. It reduces the training time by orders of magnitude compared to AutoAugment. The overall procedure is shown in figure 40.
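To illustrate the sub-policy structure that AutoAugment-style methods search over, here is a minimal sketch in which each sub-policy is a pair of (operation, probability, magnitude) triples; the two sub-policies and the magnitude mappings are illustrative assumptions, not searched results.

```python
# Minimal sketch of applying AutoAugment-style sub-policies.
import random
from PIL import Image, ImageEnhance

OPS = {
    "Rotate": lambda img, m: img.rotate(m * 3),  # magnitude -> degrees
    "Color":  lambda img, m: ImageEnhance.Color(img).enhance(1 + m / 10),
}

SUB_POLICIES = [
    [("Rotate", 0.8, 10), ("Color", 0.2, 8)],
    [("Color", 0.6, 4), ("Rotate", 0.4, 6)],
]

def apply_sub_policy(img: Image.Image) -> Image.Image:
    # Pick one sub-policy; each of its ops fires with its own probability.
    for name, prob, magnitude in random.choice(SUB_POLICIES):
        if random.random() < prob:
            img = OPS[name](img, magnitude)
    return img
```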

Fig. 40. An overall procedure of augmentation search by the Fast AutoAugment algorithm, courtesy [90].

(iii) Faster AutoAugment: This article proposes Faster AutoAugment [53], a policy intended to find effective data augmentation policies very efficiently. Faster AutoAugment is based on a differentiable augmentation search policy; it not only estimates gradients for many transformation operations with discrete parameters, but also provides a mechanism for choosing operations efficiently. Moreover, it introduces a training objective function, itself differentiable, with the aim of minimising the distance between the original and augmented distributions. The parameters of the augmentations are updated during backpropagation. The overall process is shown in figure 41.

Fig. 41. An overview of the Faster AutoAugment augmentation; image is taken from [53].

(iv) Reinforcement Learning with Augmented Data: This paper proposes Reinforcement Learning with Augmented Data (RAD) [84], an easily pluggable module that enhances the performance of RL algorithms by targeting two issues: i) learning data efficiency and ii) generalisation capability for new environments. Furthermore, it shows that traditional data augmentation techniques enable RL algorithms to outperform complex SOTA methods on pixel-based and state-based control tasks. The overall process is demonstrated in figure 42.

Fig. 42. An overview of the different augmentations investigated in RAD; the example is from [84].

(v) Local Patch AutoAugment with Multi-Agent Collaboration: This is the first work [91] that finds a patch-level data augmentation policy using reinforcement learning, named multi-agent reinforcement learning (MARL). MARL starts by dividing images into patches and jointly finds the optimal data augmentation policy for each patch. It shows competitive results on SOTA benchmarks. The overall process is defined in figure 43.
Fig. 43. An illustration of different automated augmentation policies, courtesy [91].

(vi) Learning Data Augmentation Strategies for Object Detection: This work [171] proposes to use AutoAugment to learn the best policies for object detection. It finds the best operation and its optimal value. Moreover, it addresses two key issues of augmentation for object detection:

a) Policies learned for classification cannot directly be applied to detection tasks, and dealing with bounding boxes adds more complexity when geometric augmentations are applied.
b) Most researchers think augmentation adds much less value compared to designing new network architectures, so it gets less attention; however, augmentation for object detection should be selected carefully.

Some sub-policies for this data augmentation are shown below:

Sub-policy 1. (Color, 0.2, 8), (Rotate, 0.8, 10)
Sub-policy 2. (BBox Only ShearY, 0.8, 5)
Sub-policy 3. (SolarizeAdd, 0.6, 8), (Brightness, 0.8, 10)
Sub-policy 4. (ShearY, 0.6, 10), (BBox Only Equalize, 0.6, 8)
Sub-policy 5. (Equalize, 0.6, 10), (TranslateX, 0.2, 2)

Fig. 44. Different data augmentation sub-policies explored; image is taken from [171].

(vii) Scale-aware Automatic Augmentation for Object Detection: This work [18] proposes a new data augmentation for object detection named Scale-aware AutoAug. First, it defines a search space where image-level and box-level data augmentations are prepared for scale invariance; secondly, it proposes a new search metric named Pareto scale balance to search augmentations effectively and efficiently. Some examples of the data augmentation are shown in figure 45.

Fig. 45. Example of the scale-aware search space, which includes image-level and box-level augmentation; the example is from [18].

3) Non-Reinforcement Learning data augmentations: In the auto-augment category, there are some approaches that do not require any reinforcement learning algorithm to find the best data augmentation; we refer to them as non-reinforcement learning data augmentations. We discuss a few of them below.

(i) RandAugment: Previous optimal-augmentation search uses reinforcement learning or some complex search strategy that takes a lot of time. RandAugment [24] removes the obstacle of a separate search phase, which makes training more complex and consequently adds computational cost overhead. To break this, RandAugment randomly applies N data augmentations, all at a single magnitude M. Some visualisations are shown in figure 46.

Fig. 46. Example images augmented by RandAugment; image is taken from [24].
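A minimal RandAugment-style sketch is shown below: N operations are sampled uniformly and all are applied at a single shared magnitude M. The small operation list and the per-operation magnitude mappings are illustrative assumptions.

```python
# Minimal RandAugment-style sketch with a tiny op subset.
import random
from PIL import Image, ImageEnhance, ImageOps

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    ops = [
        lambda im: im.rotate(m * 3),                               # rotation
        lambda im: ImageEnhance.Contrast(im).enhance(1 + m / 30),  # contrast
        lambda im: ImageEnhance.Sharpness(im).enhance(1 + m / 30), # sharpness
        lambda im: ImageOps.solarize(im, threshold=256 - m * 20),  # solarize
    ]
    for op in random.choices(ops, k=n):   # sample N ops uniformly at random
        img = op(img)
    return img
```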
(ii) RangeAugment: RangeAugment [98] is a data augmentation technique that aims to improve upon the shortcomings of existing approaches like AutoAugment and RandAugment. These methods use manually-defined ranges of magnitudes for each type of data augmentation, which can result in sub-optimal policies. In contrast, RangeAugment learns efficient ranges of magnitudes for each augmentation, and for composite data augmentations, by introducing an auxiliary loss based on image similarity. This loss is designed to control the magnitude ranges, resulting in more effective and optimal policies. The process of RangeAugment is illustrated in Figure 47.

Fig. 47. RangeAugment with neural network training [98].

(iii) ADA: Adversarial Data Augmentation for Object Detection: Data augmentation for object detection has improved performance, but it is difficult to understand whether these augmentations are optimal or not. This article [8] provides a systematic way to find the optimal adversarial perturbation of data augmentation from an object detection perspective, based on a game-theoretic interpretation, i.e., the Nash equilibrium, of the data. The Nash equilibrium provides the optimal bounding box predictor and the optimal design for data augmentation. Optimal adversarial perturbation refers to the worst perturbation of the ground truth, which forces the box predictor to learn from the most difficult distribution of samples. An example is shown in figure 48.

Fig. 48. Annotation distribution types. Adversarial augmentation chooses bounding boxes that are as distinct from the truth as possible while yet containing crucial object characteristics. The example is taken from [8].

(iv) Deep CNN Ensemble with Data Augmentation for Object Detection: This article [49] proposes a new variant of the regions with convolutional neural network (R-CNN) model with two core modifications in training and evaluation. First, it uses several different CNN models as an ensemble in R-CNN; secondly, it smartly augments the PASCAL VOC training examples with Microsoft COCO data by selecting a subset of the Microsoft COCO dataset that is consistent with PASCAL VOC. Consequently, it increases the dataset size and improves the performance. The schematic diagram is shown in figure 49.

Fig. 49. The proposed schematic diagram. The example is taken from [49].

(v) Robust and Accurate Object Detection via Adversarial Learning: This article [16] first shows the classifier performance gain from different data augmentations when fine-tuned for object detection tasks, and suggests that the performance in terms of accuracy or robustness does not improve. The article provides a unique way of exploring adversarial samples that helps to improve performance. To do so, it augments the examples during the fine-tuning stage for object detectors by exploring adversarial samples, which is considered model-dependent data augmentation. First, it picks the stronger adversarial sample from the detector's classification and localization layers and ensures the augmentation policy remains consistent. It showed significant performance gains in terms of accuracy and robustness on different object detection tasks. Furthermore, the robustness and accuracy of the proposed method are shown in figure 50.

Fig. 50. Overview of Robust and Accurate Object detection via adversarial learning. In the top image, it improves object detector accuracy on clean images. In the middle, it improves the detector's robustness against natural corruption, and at the bottom, it improves the robustness against cross-dataset domain shift. The image is taken from [18].

(vi) Perspective Transformation Data Augmentation for Object Detection: This article [145] proposes a new data augmentation for object detection named perspective transformation, which generates new images as if captured at different angles. Thus, it mimics images taken at angles where the camera cannot capture images. This method showed effectiveness on several object detection datasets. An example of the proposed data augmentation is shown in figure 51.

Fig. 51. Perspective transformation data augmentation. An example image is taken from [145].

(vii) Deep Adversarial Data Augmentation for Extremely Low Data Regimes: This article [165] addresses the issue of extremely low data regimes, where labeled data is very scarce and there is no unlabeled data at all. To deal with that problem, it proposes deep adversarial data augmentation (DADA), where data augmentation is formulated as the problem of training a class-conditional and supervised GAN. Furthermore, it also introduces a new discriminator loss with the aim of fitting data augmentation, where real and augmented samples are forced to participate equally and be consistent in finding decision boundaries.
4) Feature Augmentation: Feature augmentation is another category of data augmentation, where images are first transformed into an embedding or representation, and data augmentation is then performed on the embedding of the image. Recently, a few works have been done in this area; we concisely highlight selected works below.
(i) FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning: This work [81] presents a novel approach to data augmentation in feature space for SSL, inspired by image-based SSL methods that use a combination of image augmentations and consistency regularization. Image-based SSL methods are restricted to only conventional data augmentation. To break this limitation, the feature-based SSL method produces diverse features from complex data augmentations. One key point is that these advanced data augmentations exploit the information from both intra-class and inter-class representations extracted via clustering. The proposed method not only showed significant performance gains, such as an absolute 17.44% gain on miniImageNet, but also showed robustness on samples that are out-of-distribution. Moreover, the difference between image-level and feature-level augmentation and consistency is shown in figure 52.

Fig. 52. An overview of FeatMatch augmentation applied on images and features. Image is taken from [21].

(ii) Dataset Augmentation in Feature Space: This work [28] first uses an encoder-decoder to learn a representation, and then applies different transformations to the representation, such as adding noise, interpolating, or extrapolating. The proposed method has shown performance improvement on both static and sequential data. A demonstration of this augmentation is shown in figure 53.

Fig. 53. Overview of interpolation and extrapolation between handwritten characters. Original characters are shown in bold. Image is taken from [28].

(iii) Feature Space Augmentation for Long-Tailed Data: This paper [21] proposed a novel data augmentation in feature space to address the long-tailed issue and uplift the under-represented class samples. The proposed approach first separates features into class-generic and class-specific features with the help of class activation maps. Under-represented class samples are then generated by combining class-specific features of under-represented classes with class-generic features from other confusing classes. This enables diverse data and also deals with the problem of under-represented class samples. It has shown SOTA performance on different datasets. It is demonstrated in figure 54.

Fig. 54. Left: limited but well-spread data. Right: without sufficient data. Image is taken from [21].

(iv) Adversarial Feature Augmentation for Unsupervised Domain Adaptation: Generative Adversarial Networks (GANs) have shown promising results in unsupervised domain adaptation for learning target domain features indistinguishable from the source domain. This work [143] extends GANs with two contributions: i) it forces the feature extractor to be domain-invariant, and ii) it trains the feature extractor via data augmentation in feature space, named feature augmentation. This work explores data augmentation at the feature level with a GAN.

(v) Understanding data augmentation for classification: when to warp?: This paper [154] investigates the advantages of data augmentation in image space versus feature space during training. It proposes two approaches: i) data warping, which generates extra samples in image space using data augmentations, and ii) synthetic over-sampling, which generates samples in feature space. It also suggests that it is possible to apply general data augmentation techniques in feature space if reasonable data augmentations for the data are known.
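A minimal sketch of the feature-space operations described in [28] (noise, interpolation, and extrapolation between the features of same-class samples) is given below; the encoder producing the features and the default coefficients are assumptions.

```python
# Minimal feature-space augmentation sketch.
import torch

def augment_features(f_i: torch.Tensor, f_j: torch.Tensor,
                     mode: str = "extrapolate", lam: float = 0.5,
                     noise_std: float = 0.05) -> torch.Tensor:
    """f_i, f_j: encoded feature vectors of two same-class samples."""
    if mode == "noise":
        return f_i + noise_std * torch.randn_like(f_i)
    if mode == "interpolate":        # move f_i toward its neighbor
        return f_i + lam * (f_j - f_i)
    if mode == "extrapolate":        # push f_i away from its neighbor
        return f_i + lam * (f_i - f_j)
    raise ValueError(mode)
```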
Fig. 53. Overview of interpolation and extrapolation between handwritten characters. Original characters are shown in bold. Image is taken from [28].

(iii) Feature Space Augmentation for Long-Tailed Data: This paper [21] proposed a novel data augmentation in feature space to address the long-tailed issue and uplift the under-represented class samples. The proposed approach first separates class-specific features from class-generic features with the help of class activation maps. Under-represented class samples are then generated by injecting class-specific features of the under-represented classes into class-generic features from other, confusing classes. This yields diverse data and addresses the problem of under-represented class samples. It has shown SOTA performance on different datasets. It is demonstrated in figure 54.

Fig. 54. Left: limited but well-spread data. Right: without sufficient data. Image is taken from [21].

(iv) Adversarial Feature Augmentation for Unsupervised Domain Adaptation: Generative Adversarial Networks (GANs) have shown promising results in unsupervised domain adaptation for learning target-domain features indistinguishable from the source domain. This work [143] extends GANs with two contributions: i) it forces the feature extractor to be domain-invariant, and ii) it trains the extractor via data augmentation in feature space, named feature augmentation. This work explores data augmentation at the feature level with a GAN.
(v) Understanding data augmentation for classification: when to warp?: This paper [154] investigates the advantages of data augmentation in image space versus feature space during training. It proposes two approaches: i) data warping, which generates extra samples in image space using data augmentations, and ii) synthetic over-sampling, which generates samples in feature space. It also suggests that it is possible to apply general data augmentation techniques in feature space if reasonable data augmentations for the data are known.
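As flagged in item (ii) above, the following is a minimal sketch (our own illustration, not the code of [28]) of the three feature-space operations - noise, interpolation, and extrapolation - applied to encoder embeddings; the factors sigma and lam are assumed values.

```python
import torch

def augment_features(z, z_neighbor, sigma=0.1, lam=0.5):
    """Feature-space augmentation in the spirit of [28].
    z and z_neighbor are encoder embeddings of two same-class
    samples, each of shape (batch, dim)."""
    noisy = z + sigma * torch.randn_like(z)   # add Gaussian noise
    interp = z + lam * (z_neighbor - z)       # interpolate toward a neighbor
    extrap = z + lam * (z - z_neighbor)       # extrapolate away from it
    return noisy, interp, extrap
```

The augmented embeddings can then be decoded back to data space or fed directly to the classifier head during training.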
5) Neural Style Transfer: This is another category of data augmentation, which can transfer the artistic style of one image to another without changing its high-level semantics, bringing more variety to the training set. The main objective of neural style transfer is to generate a third image from two images, where one image provides the texture/style content and the other provides the high-level semantic content. We explore some of the SOTA augmentations in this sub-category.

(i) STaDA: Style Transfer as Data Augmentation: This work [169] thoroughly evaluated different SOTA neural style transfer algorithms as data augmentation for image classification tasks. It shows significant performance gains on the Caltech 101 [38] and Caltech 256 [48] datasets. Furthermore, it also combines neural style transfer algorithms with conventional data augmentation methods. A sample of this augmentation is shown in figure 55.

Fig. 55. Overview of the original image and two stylized images by STaDA. Image is taken from [169].

(ii) Style Augmentation: Data Augmentation via Style Randomization: This work [66] proposed a novel data augmentation named style augmentation (SA), based on neural style transfer. SA randomizes the color, contrast, and texture while maintaining the shape and semantic content during training. This is done by picking an arbitrary style transfer network for randomizing the style and by drawing the target style embedding from a multivariate normal distribution. It improves performance on three different tasks: classification, regression, and domain adaptation. A style augmentation sample is shown in figure 56.

Fig. 56. Overview of style augmentation applied to an image. The shape is preserved but the style, including color, texture, and contrast, is randomized. Image is from [66].
(iii) StyPath: Style-Transfer Data Augmentation for Robust Histology Image Classification: This paper [22] proposes a novel pipeline for Antibody-Mediated Rejection (AMR) classification in kidneys, based on StyPath data augmentation. StyPath is a style-transfer-based data augmentation intended to reduce bias. The proposed augmentation is much faster than SOTA augmentations for AMR classification. Some samples are shown in figure 57.

Fig. 57. Comparison of content and random initialization. The authors observe that output images initialized as noise appear distorted and discolored, and fail to retain the content fidelity. Image is from [22].

(iv) A Neural Algorithm of Artistic Style: This work [42] introduces an artificial system (AS) based on a deep neural network that generates artistic images of high perceptual quality. AS creates a neural embedding, uses the embedding to separate the style and content of an image, and then recombines the content and style of the target images to generate the artistic image (a sketch of this content/style objective is given after this list). A sample is shown in figure 59.

Fig. 59. Overview of the styled image by the neural algorithm. Image is from [42].
(v) Neural Style Transfer as Data Augmentation for Improving COVID-19 Diagnosis Classification: This work [58] shows the effectiveness of a cycle GAN, mostly used for neural style transfer, that augments COVID-19-negative x-ray images by converting them into COVID-positive images, both to balance the dataset and to increase its diversity. It shows that augmenting the images with a cycle GAN can improve performance across several different CNN architectures. A sample of this augmentation is shown in figure 58.

Fig. 58. Overview of generating synthetic COVID images from the healthy category. As the number of epochs grows, the quality of the synthetic images improves. An example is from [58].
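As referenced in item (iv), the following sketches the content/style objective of [42] using VGG features; the layer indices and weightings are illustrative assumptions rather than the exact configuration of the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()

def gram(feat):
    # Style is captured by correlations between feature maps.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def features(x, layers=(3, 8, 17, 26)):  # a few VGG blocks (assumed choice)
    out = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            out.append(x)
    return out

def style_transfer_loss(x, content_img, style_img, style_weight=1e4):
    fx, fc, fs = features(x), features(content_img), features(style_img)
    content_loss = F.mse_loss(fx[-1], fc[-1])  # match deep content features
    style_loss = sum(F.mse_loss(gram(a), gram(b)) for a, b in zip(fx, fs))
    return content_loss + style_weight * style_loss

# The stylized image x is then optimized (e.g., with L-BFGS),
# typically starting from the content image or from noise.
```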
III. RESULTS

In this section, we provide detailed results for various computer vision tasks: image classification, object detection, and semantic segmentation. The main purpose is to show the effect of data augmentation on these different CV tasks; to do so, we compile results from various SOTA data augmentation works.

A. Image Classification

In this section, we present the results of several SOTA data augmentation methods for supervised learning and semi-supervised learning. Both are discussed below:

1) Supervised learning results: In supervised learning, we have a large quantity of fully labeled data and use it to train the neural network (NN) model. In this section, we compile and compare the results from several SOTA data augmentation methods in two tables, table I and table II. In table I, the + sign indicates that traditional data augmentations, such as flipping, rotating, and cropping, have been used along with the SOTA augmentation methods. The datasets used are CIFAR10 [74], CIFAR100 [74] and ImageNet [26], and the networks used are WideResNet flavours, PyramidNet flavours, and several popular ResNet flavours [55]. Accuracy is the evaluation metric used to compare the different algorithms; the higher the accuracy, the better. As can be seen in table I and table II, each data augmentation has significantly improved the accuracy.
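For orientation when reading tables I and II, the following is a minimal sketch of one representative image mixing method, MixUp [163]; the Beta(alpha, alpha) sampling follows the standard formulation.

```python
import torch

def mixup(x, y_onehot, alpha=1.0):
    # Sample the mixing coefficient and pair each sample with another.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[idx]                # mix the images
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]  # mix the labels
    return x_mix, y_mix
```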
Accuracies
Method CIFAR10 CIFAR10+ CIFAR100 CIFAR100+
ResNet-18 (Baseline) 89.37 95.28 63.32 77.54
ResNet-18 + CutOut 90.69 96.25 65.02 80.58
ResNet-18 + Random Erasing 95.28 95.32 - -
ResNet-18 + CutMix 90.56 96.22 65.58 80.58
ResNet-18 + SaliencyMix 92.41 96.35 71.27 80.71
ResNet-18 + GridMask 95.28 96.54 - -
ResNet-50 (Baseline) 87.86 95.02 63.52 78.42
ResNet-50 + CutOut 91.16 96.14 67.03 78.62
ResNet-50 + CutMix 90.84 96.39 68.35 81.28
ResNet-50 + SaliencyMix 93.19 96.54 75.11 81.43
WideResNet-28-10 (Baseline) [141] 93.03 96.13 73.94 81.20
WideResNet-28-10 + CutOut [29] 94.46 96.92 76.06 81.59
WideResNet-28-10 + Random Erasing 96.2 96.92 81.59 82.27
WideResNet-28-10 + GridMask 96.13 97.24 - -
WideResNet-28-10 + CutMix 94.82 97.13 76.79 83.34
WideResNet-28-10 + PuzzleMix - - - 83.77
WideResNet-28-10 + SaliencyMix 95.96 97.24 80.55 83.44
Note: a + sign after the dataset name shows that traditional data augmentation methods have been used.
TABLE I. Baseline performance comparison of various augmentations on the CIFAR10 and CIFAR100 datasets.

2) Semi-supervised learning: Semi-supervised learning (SSL) is used when we have limited labeled data but unlabeled data is available at a large scale. Labeling the unlabeled data is tedious, time-consuming, and costly [79], [155]; SSL avoids these issues. There are several SSL techniques, and recently data augmentation has been employed with the limited labeled data to increase the diversity of the data. Data augmentation with SSL has increased performance on different datasets and NN architectures. The datasets used are CIFAR10, CIFAR100, SVHN [103] and Mini-ImageNet. Several SSL techniques are used, such as pseudo-labeling, SSL with memory, label propagation, mean teacher, etc. We compile the results from many SOTA SSL methods with data augmentation and present them in this work. The effect of data augmentation has also been shown with different numbers of labeled samples in SSL, as shown in table III, table IV, and table V.
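As a concrete example of how augmentation enters SSL, the following is a minimal sketch of the weak/strong consistency scheme popularized by FixMatch [130]; model, weak_aug, and strong_aug are placeholders, and the confidence threshold tau = 0.95 is the commonly used value.

```python
import torch
import torch.nn.functional as F

def unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlabeled)), dim=1)
        conf, pseudo = probs.max(dim=1)      # pseudo-labels from weak views
    logits = model(strong_aug(x_unlabeled))  # predict on strong views
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (loss * (conf >= tau).float()).mean()  # keep confident ones only
```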
B. Object detection

In this section, we discuss the effectiveness of various image data augmentation techniques on the frequently used COCO 2017 [92], PASCAL VOC [35], VOC 2007 [33], and VOC 2012 [34] datasets, which are commonly used for object detection tasks. We compile results from various SOTA data augmentation methods and present them in three tables: table VI, table VII, and table VIII. FRCNN along with synthetic data gives the best mAP on the VOC 2007 dataset, as shown in table VII. Several classical and automatic data augmentation methods have shown promising performance using different SOTA models on the PASCAL VOC dataset, as shown in table VI. Det-AdvProp achieves the highest score, outperforming AutoAugment [23], on the PASCAL VOC 2012 dataset, as shown in table VIII. The scores are reported in terms of mean average precision (mAP), average precision (AP) at an intersection over union (IoU) of 0.5 (AP50), and AP at an IoU of 0.75 (AP75).

C. Semantic Segmentation

This subsection includes semantic segmentation results on the PASCAL VOC and Cityscapes datasets, which are the most frequently used in several research papers. In table IX and table X, we compile validation-set results showing the effect of SOTA data augmentations on the semantic segmentation task. The results are reported in terms of mean intersection over union (mIoU) on the Cityscapes dataset and the PASCAL VOC dataset, as shown in table IX and table X, respectively. We found performance gains on metrics such as mIoU and mAP with several semantic segmentation models: DeepLabv3+ [160], DeepLab-v2 [104], Xception-65 [160], ExFuse [166] and Eff-L2 [172]. It has been observed that incorporating data augmentation techniques can enhance the performance of semantic segmentation models. Notably, advanced image data augmentation methods have demonstrated greater improvements in performance compared to traditional techniques; table IX and table X provide evidence of this improvement. The traditional data augmentations include rotation, scaling, flipping, and shifting [164].
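For reference, the two headline metrics used in tables VI-X can be computed as follows; this is a generic sketch of box IoU (the basis of AP50/AP75) and of mIoU over label maps, not the exact evaluation code of any benchmark.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def mean_iou(pred, target, num_classes):
    """mIoU over integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```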
IV. DISCUSSION AND FUTURE DIRECTIONS

A. Current approaches

It is proven that providing more data to the model improves its performance [50], [136]. A few current tendencies are discussed by Xu et al. [157]. Among these, one way is to collect more data and label it manually, but this is not efficient. Another, more efficient way is to apply data augmentation: the more data augmentations we apply, the better the improvement in performance, though only up to a certain extent. Currently, image mixing and autoaugment methods are successful for image classification tasks, and scale-aware auto-augment methods are showing promising results in detection and semantic segmentation tasks. But the performance of these data augmentations can vary with the number of augmentations applied, as it is known that combined data augmentation methods show better performance than a single one [108], [158].
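As an illustration of combining several augmentations, a typical CIFAR-style torchvision pipeline is shown below; the particular operations and magnitudes are arbitrary choices, not a recommended recipe.

```python
from torchvision import transforms

combined = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),  # erasing operates on tensors
])
```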
CIFAR-10 CIFAR-100 ImageNet
Augmentation Accuracy (%) Model Accuracy (%) Model Accuracy (%) Model
Cutout [29] 97.04 WRN-28-10 81.59 WRN-28-10 77.1 ResNet-50
Random Erasing [170] 96.92 WRN-28-10 82.27 WRN-28-10 - -
Hide-and-Seek [129] 95.53 ResNet-110 78.13 ResNet-110 77.20 ResNet-50
GridMask [15] 97.24 WRN-28-10 - - 77.9 ResNet-50
LocalAugment [71] - - 95.92 WRN-22-10 76.87 ResNet-50
SalfMix [20] 96.62 PreActResNet-101 80.11 PreActResNet-101 - -
KeepAugment [47] 97.8 ResNet-28-10 - - 80.3 ResNet-101
Cut-Thumbnail [156] 97.8 ResNet-56 95.94 WRN-28-10 79.21 ResNet-50
MixUp [163] 97.3 WRN-28-10 82.5 WRN-28-10 77.9 ResNet-50
CutMix [162] 97.10 WRN-28-10 83.40 WRN-28-10 78.6 ResNet-50
SaliencyMix [141] 97.24 WRN-28-10 83.44 WRN-28-10 78.74 ResNet-50
PuzzleMix [70] - - 84.05 WRN-28-10 77.51 ResNet-50
FMix [52] 98.64 Pyramid 83.95 Dense 77.70 ResNet-101
MixMo [112] 96.38 WRN-28-10 82.40 WRN-28-10 - -
StyleMix [59] 96.44 PyramidNet-200 85.83 PyramidNet-200 77.29 PyramidNet-200
RandomMix [94] 98.02 WRN-28-10 84.84 WRN-28-10 77.88 WRN-28-10
MixMatch [10] 95.05 WRN-28-10 74.12 WRN-28-10 - -
ReMixMatch [9] 94.71 WRN-28-2 - - - -
FixMatch [130] 95.69 WRN-28-2 77.04 WRN-28-2 - -
AugMix [56] - - - - 77.6 ResNet-50
Improved Mixed-Example [135] 96.02 ResNet-18 80.3 ResNet-18 - -
RICAP [137] 97.18 WRN-28-10 82.56 ResNet-28-10 78.62 WRN-50-2
ResizeMix [111] 97.60 WRN-28-10 84.31 WRN-28-10 79.00 ResNet-50
AutoAugment [23] 97.40 WRN-28-10 82.90 WRN-28-10 83.50 AmoebaNet-C
Fast AutoAugment [90] 98.00 SS(26 2×96d) 85.10 SS(26 2×96d) 80.60 ResNet-200
Faster AutoAugment [53] 98.00 SS(26 2 × 112d) 84.40 SS(26 2×96d) 75.90 ResNet-50
Local Patch AutoAugment [91] 98.10 SS(26 2 × 112d) 85.90 SS(26 2×96d) 81.00 ResNet-200
RandAugment [24] 98.50 PyramidNet 83.30 WRN-28-10 85.00 EfficientNet-B7
TABLE II. Performance comparison of the various image erasing and image mixing augmentations for image classification problems. WRN and SS stand for WideResNet and Shake-Shake, respectively.

B. Theoretical aspects

There is no theoretical support available to explain why a specific augmentation improves performance, or which sample(s) should be augmented; the same aspect has been discussed by Yang et al. [158] and Shorten et al. [123]. For instance, in random erasing we randomly erase a region of the image - this may sometimes remove discriminating features, and the erased image makes no sense to a human - yet the reason behind the resulting performance improvement is still unknown, which is another open challenge. Most of the time, we find the optimal parameters of an augmentation through an extensive number of experiments, or we choose a data augmentation based on our experience. There should instead be a mechanism for choosing the data augmentation with theoretical support, considering the model architecture and dataset size. Researching this theoretical aspect is another open challenge for the research community.

C. Optimal number of samples generation

It is a known fact that as we increase the data size, performance improves [50], [123], [136], [158], but this does not hold indefinitely: increasing the number of samples stops improving performance after a certain point [78]. What the optimal number of samples to generate is, depending on the model architecture and dataset size, is a challenging aspect to be explored. Currently, researchers perform many experiments to find the optimal number of samples to generate [78], but this is not feasible as it requires time and computational cost. Can we devise a mechanism to find an optimal number of samples? This remains an open research challenge.

D. Selection of data augmentation based on model architecture and dataset

Data augmentation selection depends on the nature of the dataset and the model architecture. For example, on the MNIST [27] dataset, geometric transformations are not safe: rotating the digits 6 and 9 will no longer preserve the label information. A densely parameterized CNN can easily overfit a weakly augmented dataset, while for a shallowly parameterized CNN, data augmentation may break its generalization capability. This suggests that, when selecting a data augmentation, the nature of the dataset and the model architecture should be taken into account. Currently, numerous experiments are performed to find the model architecture and the suitable data augmentation for a specific dataset. Devising a systematic approach to select the data augmentation based on the dataset and model architecture is another gap to be filled.
TABLE III. Comparison on CIFAR-10 and SVHN. The numbers represent error rates across three runs.

CIFAR-10 SVHN
Method 40 labels 250 labels 1,000 labels 4,000 labels 40 labels 250 labels 1,000 labels 4,000 labels
VAT [101] - 36.03 ± 2.82 18.64 ± 0.40 11.05 ± 0.31 - 8.41 ± 1.01 5.98 ± 0.21 4.20 ± 0.15
Mean Teacher [138] - 47.32 ± 4.71 17.32±4.00 10.36±0.25 - 6.45±2.43 3.75±.10 3.39±0.11
MixMatch [10] 47.54±11.50 11.08±.87 7.75±.32 6.24±.06 42.55±14.53 3.78±.26 3.27±.31 2.89±.06
ReMixMatch [9] 19.10±9.64 6.27±0.34 5.73±0.16 5.14±0.04 3.34±0.20 3.10±0.50 2.83±0.30 2.42±0.09
UDA 29.05±5.93 8.76± 0.90 5.87± 0.13 5.29± 0.25 52.63±20.51 2.76± 0.17 2.55± 0.09 2.47± 0.15
SSL with Memory [17] - - - 11.9±0.22 - 8.83 4.21 -
Deep Co-Training [110] - - - 8.35± 0.06 - - 3.29 ±0.03 -
Weight Averaging [5] - - 15.58 ± 0.12 9.05 ± 0.21 - - - -
ICT [142] - - 15.48 ± 0.78 7.29 ± 0.02 - 4.78 ± 0.68 3.89 ± 0.04 -
Label Propagation [64] - - 16.93 ± 0.70 10.61 ± 0.28 - - - -
SNTG [96] - - 18.41 ± 0.52 9.89 ±0.34 - 4.29± 0.23 3.86 ±0.27 -
PLCB [4] - - 6.85 ±0.15 5.97± 0.15 - - - -
II-model [120] - 53.02 ±2.05 31.53 ± 0.98 17.41± 0.37 - 17.65 ±0.27 8.60± 0.18 5.57± 0.14
PseudoLabel [85] - 49.98 ±1.17 30.91 ±1.73 16.21 ± 0.11 - 21.16± 0.88 10.19 ± 0.41 5.71± 0.07
Mixup [163] - 47.43 ± 0.92 25.72 ± 0.66 13.15 ± 0.20 - 39.97 ± 1.89 16.79 ± 0.63 7.96 ±0.14
FeatMatch [81] - 7.50 ±0.64 5.76 ±0.07 4.91± 0.18 - 3.34± 0.19 3.10± 0.06 2.62 ±0.08
FixMatch [130] 13.81±3.37 5.07±0.65 - 4.26±0.05 3.96±2.17 2.48±0.38 2.28±0.11 -
SelfMatch [69] 93.19±1.08 95.13±0.26 - 95.94±0.08 96.58±1.02 97.37±0.43 97.49±0.07 -

TABLE IV. Comparison on CIFAR-100 and mini-ImageNet. The numbers represent error rates across two runs.

CIFAR-100 mini-ImageNet
Method 400 labels 4,000 labels 10,000 labels 4,000 labels 10,000 labels
II-model [120] - - 39.19± 0.36 - -
SNTG [96] - - 37.97± 0.29 - -
SSL with Memory [17] - - 34.51± 0.61 - -
Deep Co-Training [110] - - 34.63± 0.14 - -
Weight Averaging [5] - - 33.62± 0.54 - -
Mean Teacher [138] - 45.36 ±0.49 36.08± 0.51 72.51± 0.22 57.55 ± 1.11
Label Propagation [64] - 43.73 ±0.20 35.92 ±0.47 70.29± 0.81 57.58 ±1.47
PLCB [4] - 37.55 ±1.09 32.15 ±0.50 56.49 ±0.51 46.08 ± 0.11
FeatMatch - 31.06 ± 0.41 26.83 ± 0.04 39.05 ± 0.06 34.79 ± 0.22
MixMatch 67.61±1.32 - 28.31±0.33 - -
UDA 59.28±0.88 - 24.50±0.25 - -
ReMixMatch 44.28±2.06 - 23.03±0.56 - -
FixMatch 48.85±1.75 - 22.60±0.12 - -

E. Augmentations for spaces

Most data augmentation approaches have been explored at the image level - data space. Very few research works have explored augmentation at the feature level - feature space. The challenge that arises here is: in which space should we apply data augmentation, data space or feature space? This is another interesting aspect that can be explored. For the current approaches, it seems to depend on the dataset, the model architecture, and the task. Currently, approaches conduct experiments in both data space and feature space and then select the best one [154]. This is not an optimal way to find the data augmentation for a specific space; it is still an open challenge to be solved.

F. Open research questions

Despite the success of data augmentation techniques in different computer vision tasks, several challenges in SOTA data augmentation techniques remain unsolved. After thoroughly reviewing SOTA data augmentation approaches, we found the following challenges and difficulties, which are yet to be solved:

• In image mixing techniques, label smoothing has been used: whatever portion of the images is mixed, the corresponding labels are mixed accordingly. To the best of our knowledge, none has explored label smoothing for the image manipulation and image erasing subcategories - where part of the image is lost. For example, if an image portion is randomly cut out in cutout data augmentation, the corresponding label should be adjusted accordingly (a speculative sketch of this idea is given below). It is an interesting open research question.
• Currently, data augmentation is performed without considering the importance of an example. All examples may not be difficult for the neural network to learn, but some are. Thus, augmentation could be applied to those difficult examples by measuring the importance of the examples. How would a neural network behave if data augmentation were applied to those difficult examples?
• In image mixing data augmentations, if we mix the salient parts of more than two images, so that all of them truly participate in the augmentation (unlike RICAP [137]), what is the effect in terms of accuracy and robustness against adversarial attacks? Note that the corresponding labels of these images would be mixed accordingly.
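As referenced in the first point above, the following speculative sketch illustrates one way the label could be softened in proportion to the erased area under CutOut; this is our illustration of the open question, not a published method.

```python
import torch

def cutout_with_soft_label(img, one_hot, size=16):
    _, h, w = img.shape
    y, x = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(0, y - size // 2), min(h, y + size // 2)
    x1, x2 = max(0, x - size // 2), min(w, x + size // 2)
    img = img.clone()
    img[:, y1:y2, x1:x2] = 0                        # erase the patch
    kept = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)  # surviving fraction
    return img, one_hot * kept                      # soften the label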
TABLE V. Comparison of test error rates on CIFAR-10 & SVHN using WideResNet-28 and CNN-13.

Approach Method CIFAR-10 (Nl =4000) SVHN(Nl =1000)


WideResNet-28
Supervised 20.26 ± 0.38 12.83 ± 0.47
Pseudo PL [85] 17.78 ± 0.57 7.62 ± 0.29
Labeling PL-CB [4] 6.28 ± 0.3 -
II Model [83] 16.37 ± 0.63 7.19 ± 0.27
Mean Teacher [138] 15.87 ± 0.28 5.65 ± 0.47
VAT [101] 13.86 ± 0.27 5.63 ± 0.20
Consistency VAT + EntMin [101] 13.13 ± 0.39 5.35 ± 0.19
LGA + VAT [65] 12.06 ± 0.19 6.58 ± 0.36
Regularization ICT [142] 7.66 ± 0.17 3.53 ± 0.07
MixMatch [10] 6.24 ± 0.06 3.27 ± 0.31
UDA 5.29 ± 0.25 2.46 ± 0.17
ReMixMatch (Berthelot et al. 2020) 5.14 ±0.04 2.42 ± 0.09
FixMatch [130] 4.26 ± 0.05 2.28 ± 0.11
CL 8.92 ± 0.03 5.65 ± 0.11
Pseudo CL+FA [90] 5.51 ± 0.14 2.90 ± 0.19
Labeling CL+FA [90]+Mixup [163] 5.09 ± 0.18 2.75 ± 0.15
CL+RA+Mixup [163] 5.27 ± 0.16 2.80 ± 0.188
CNN-13
Pseudo Labeling TSSDL-MT 9.30 ± 0.55 3.35 ± 0.27
LP-MT 10.61±0.28 -
Ladder net [117] 12.36±0.31 -
MeanTeacher [138] 12.31 ± 0.24 3.95 ± 0.19
Temporal ensembling [83] 12.16 ± 0.24 4.42 ± 0.16
Consistency VAT [101] 11.36 ± 0.34 5.42
Regularization NATEntMin [101] 10.55 ± 0.05 3.86
SNTG [96] 10.93 ± 0.14 3.86 ± 0.27
ICT [142] 7.29 ± 0.02 2.89 ± 0.04
Pseudo CL 9.81 ± 0.22 4.75 ± 0.28
Labeling CL+RA 5.92 ± 0.07 3.96 ± 0.10

• In the random data augmentation subcategory of automatic augmentation, the order of the augmentations has not been explored. We believe it has significant importance. What are the possible ways to explore the order of existing augmentations, such as applying traditional data augmentations first and then image mixing, or a weight-based ordering?
• Finding an optimal and ordered set of data augmentations, and the optimal number of samples to be augmented, are open challenges. For example, the RandAugment method finds an optimal number N of augmentations, but it is not known how many augmentations, in which order, and on which samples they should be applied.

V. CONCLUSION

This survey provides a comprehensive overview of state-of-the-art (SOTA) data augmentation techniques for addressing overfitting in computer vision tasks due to limited data. A detailed taxonomy of image data augmentation approaches is presented, along with an overview of each SOTA method and the results of its application to various computer vision tasks such as image classification, object detection, and semantic segmentation. The results for both supervised and semi-supervised learning are also compiled for easy comparison. In addition, the available code for each data augmentation approach is provided to facilitate result reproducibility. The difficulties and challenges of data augmentation are also discussed, along with promising open research questions that have the potential to further advance the field. This survey is expected to benefit researchers in several ways: (i) a deeper understanding of data augmentation, (ii) the ability to easily compare results, and (iii) the ability to reproduce results with available code.

ACKNOWLEDGMENT

This research was supported by Science Foundation Ireland under grant numbers 18/CRT/6223 (SFI Centre for Research Training in Artificial Intelligence), SFI/12/RC/2289/P2 (Insight SFI Research Centre for Data Analytics), 13/RC/2094/P2 (Lero SFI Centre for Software) and 13/RC/2106/P2 (ADAPT SFI Research Centre for AI-Driven Digital Content Technology). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

REFERENCES

[1] Jiwoon Ahn, Sunghyun Cho, and Suha Kwak. Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2209–2218, 2019.
[2] Jiwoon Ahn and Suha Kwak. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4981–4990, 2018.
Method Detector Backbone AP AP50 AP75 APs APm APl
Hand-crafted:
Dropblock [44] RetinaNet ResNet-50 38.4 56.4 41.2 − − −
AutoAugment+color Ops [171] RetinaNet ResNet-50 37.5 - - − − −
geometric Ops [171] RetinaNet ResNet-50 38.6 - - − − −
bbox-only Ops [171] RetinaNet ResNet-50 39.0 - - − − −
Mix-up [167] Faster R-CNN ResNet-101 41.1 - - - - -
PSIS* [144] Faster R-CNN ResNet-101 40.2 61.1 44.2 22.3 45.7 51.6
Stitcher [19] Faster R-CNN ResNet-101 42.1 - - 26.9 45.5 54.1
GridMask [15] Faster R-CNN ResNeXt-101 42.6 65.0 46.5 - - -
InstaBoost* [37] Mask R-CNN ResNet-101 43.0 64.3 47.2 24.8 45.9 54.6
SNIP (MS test)* [127] Faster R-CNN ResNet-101-DCN-C4 44.4 66.2 49.9 27.3 47.4 56.9
SNIPER (MS test)* [128] Faster R-CNN ResNet-101-DCN-C4 46.1 67.0 51.6 29.6 48.9 58.1
Traditional Aug [158] Faster R-CNN ResNet-101 36.80 58.0 40.0 - - -
Traditional Aug* [31] CenterNet ResNet-101 41.15 58.01 45.30 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×) 37.4 58.7 40.5 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+GridMask (p = 0.3) 38.2 60.0 41.4 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.5) 38.1 60.1 41.2 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.7) 38.3 60.4 41.7 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (2×)+ GridMask (p = 0.9) 38.0 60.1 41.2 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (4×) 35.7 56.0 38.3 - - -
Traditional Aug+ [15] Faster-RCNN 50-FPN (4×)+ GridMask (p = 0.7) 39.2 60.8 42.2 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (1×)) 41.2 63.3 44.8 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (2×)) 40.4 62.2 43.8 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (2×)+ GridMask (p = 0.7)) 42.6 65.0 46.5 - - -
Traditional Aug+ [15] Faster-RCNN X101-FPN (2×)+ GridMask (p = 0.7)) 42.6 65.0 46.5 - - -
KeepAugment: [47] Faster R-CNN ResNet50-C4 39.5 − − − − −
KeepAugment: [47] Faster R-CNN ResNet50-FPN 40.7 − − − − −
KeepAugment: [47] RetinaNet ResNet50-FPN 39.1 − − − − −
KeepAugment: [47] Faster R-CNN ResNet101-C4 42.2 − − − − −
KeepAugment: [47] Faster R-CNN ResNet101-FPN 42.9 − − − − −
KeepAugment: [47] RetinaNet ResNet101-FPN 41.2 − − − − −
DADAAugment: [88] RetinaNet ResNet-50 35.9 55.8 38.4 19.9 38.8 45.0
DADAAugment: [88] RetinaNet ResNet-50(DADA) 36.6 56.8 39.2 20.2 39.7 46.0
DADAAugment: [88] Faster R-CNN ResNet-50 36.6 58.8 39.6 21.6 39.8 45.0
DADAAugment: [88] Faster R-CNN ResNet-50 (DADA) 37.2 59.1 40.2 22.2 40.2 45.7
DADAAugment: [88] Mask R-CNN ResNet-50 37.4 59.3 40.7 22.2 40.6 46.3
DADAAugment: [88] Mask R-CNN ResNet-50(DADA) 37.8 59.6 41.1 22.4 40.9 46.6
AutoAugment: [16] EfficientDet D0 EfficientNet B0 34.4 52.8 36.7 53.1 40.2 13.9
Det-AdvProp: [16] EfficientDet D0 EfficientNet B0 34.7 52.9 37.2 54.1 40.6 13.9
AutoAugment: [16] EfficientDet D1 EfficientNet B1 40.1 59.2 43.2 57.9 45.7 19.9
Det-AdvProp: [16] EfficientDet D1 EfficientNet B1 40.5 59.2 43.3 58.8 46.2 20.6
AutoAugment: [16] EfficientDet D2 EfficientNet B2 43.5 62.8 46.6 59.8 48.7 23.9
Det-AdvProp: [16] EfficientDet D2 EfficientNet B2 43.8 62.6 47.3 61.0 49.6 25.6
AutoAugment: [16] EfficientDet D3 EfficientNet B3 47.0 66.0 50.8 63.0 51.7 29.8
Det-AdvProp: [16] EfficientDet D3 EfficientNet B3 47.6 66.3 51.4 64.0 52.2 30.2
AutoAugment: [16] EfficientDet D4 EfficientNet B4 49.5 68.7 53.7 64.9 54.0 31.9
Det-AdvProp: [16] EfficientDet D4 EfficientNet B4 49.8 68.6 54.2 65.2 54.2 32.4
AutoAugment: [16] EfficientDet D5 EfficientNet B5 51.5 70.4 56.0 65.2 56.1 35.4
Det-AdvProp: [16] EfficientDet D5 EfficientNet B5 51.8 70.7 56.3 66.1 56.2 36.2
Automatic:
AutoAug-det [171] RetinaNet ResNet-50 39.0 - - - - -
AutoAug-det [171] RetinaNet ResNet-101 40.4 - - - - -
AutoAugment [23] RetinaNet ResNet-200 42.1 - - - - -
AutoAug-det’ [171] RetinaNet ResNet-50 40.3 60.0 43.0 23.6 43.9 53.8
RandAugment* [24] RetinaNet ResNet-200 41.9 - - - - -
AutoAug-det [171] RetinaNet ResNet-101 41.8 61.5 44.8 24.4 45.9 55.9
RandAug [24] RetinaNet ResNet-101 40.1 - - - - -
RandAug? [10] RetinaNet ResNet-101 41.4 61.4 44.5 25.0 45.4 54.2
Scale-aware AutoAug [18] RetinaNet ResNet-50 41.3 61.0 44.1 25.2 44.5 54.6
Scale-aware AutoAug RetinaNet ResNet-101 43.1 62.8 46.0 26.2 46.8 56.7
Scale-aware AutoAug Faster R-CNN ResNet-101 44.2 65.6 48.6 29.4 47.9 56.7
Scale-aware AutoAug (MS test) Faster R-CNN ResNet-101-DCN-C4 47.0 68.6 52.1 32.3 49.3 60.4
Scale-aware AutoAug FCOS ResNet-101 44.0 62.7 47.3 28.2 47.8 56.1
Scale-aware AutoAug FCOS ResNeXt-32x8d-101-DCN 48.5 67.2 52.8 31.5 51.9 63.0
Scale-aware AutoAug (1200 size) FCOS ResNeXt-32x8d-101-DCN 49.6 68.5 54.1 35.7 52.5 62.4
Scale-aware AutoAug (MS Test) FCOS ResNeXt-32x8d-101-DCN 51.4 69.6 57.0 37.4 54.2 65.1
TABLE VI. Data augmentation effect on different object detection methods using the PASCAL VOC dataset.
Method TSet mAP aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv
FRCN [45] 7 66.9 74.5 78.3 69.2 53.2 36.6 77.3 78.2 82.0 40.7 72.7 67.9 79.6 79.2 73.0 69.0 30.1 65.4 70.2 75.8 65.8
FRCN* [148] 7 69.1 75.4 80.8 67.3 59.9 37.6 81.9 80.0 84.5 50.0 77.1 68.2 81.0 82.5 74.3 69.9 28.4 71.1 70.2 75.8 66.6
ASDN [148] 7 71.0 74.4 81.3 67.6 57.0 46.6 81.0 79.3 86.0 52.9 75.9 73.7 82.6 83.2 77.7 72.7 37.4 66.3 71.2 78.2 74.3
IRE 7 70.5 75.9 78.9 69.0 57.7 46.4 81.7 79.5 82.9 49.3 76.9 67.9 81.5 83.3 76.7 73.2 40.7 72.8 66.9 75.4 74.2
ORE 7 71.0 75.1 79.8 69.7 60.8 46.0 80.4 79.0 83.8 51.6 76.2 67.8 81.2 83.7 76.8 73.8 43.1 70.8 67.4 78.3 75.6
I+ORE 7 71.5 76.1 81.6 69.5 60.1 45.6 82.2 79.2 84.5 52.5 78.7 71.6 80.4 83.3 76.7 73.9 39.4 68.9 69.8 79.2 77.4
FRCN [45] 7+12 70.0 77.0 78.1 69.3 59.4 38.3 81.6 78.6 86.7 42.8 78.8 68.9 84.7 82.0 76.6 69.9 31.8 70.1 74.8 80.4 70.4
FRCN* [148] 7+12 74.8 78.5 81.0 74.7 67.9 53.4 85.6 84.4 86.2 57.4 80.1 72.2 85.2 84.2 77.6 76.1 45.3 75.7 72.3 81.8 77.3
IRE 7+12 75.6 79.0 84.1 76.3 66.9 52.7 84.5 84.4 88.7 58.0 82.9 71.1 84.8 84.4 78.6 76.7 45.5 77.1 76.3 82.5 76.8
ORE 7+12 75.8 79.4 81.6 75.6 66.5 52.7 85.5 84.7 88.3 58.7 82.9 72.8 85.0 84.3 79.3 76.3 46.3 76.3 74.9 86.0 78.2
I+ORE 7+12 76.2 79.6 82.5 75.7 70.5 55.1 85.2 84.4 88.4 58.6 82.6 73.9 84.2 84.7 78.8 76.3 46.7 77.9 75.9 83.3 79.3
SSD 7+12 77.4 81.7 85.4 75.7 69.6 49.9 84.9 85.8 87.4 61.5 82.3 79.2 86.6 87.1 84.7 78.9 50.0 77.4 79.1 86.2 76.3
SSD+ SD (1x) [145] 7+12 78.1 83.2 84.5 76.1 72.1 50.2 85.2 86.3 87.8 63.7 82.8 80.1 85.2 87.2 84.8 80.0 51.5 77.0 82.0 86.1 76.9
SSD + SD(2x) [145] 7+12 78.3 83.6 85.0 76.2 72.0 51.3 85.1 87.2 87.6 64.2 82.5 81.9 85.5 86.5 85.9 81.2 51.2 72.3 82.8 86.9 78.4
SSD +SD(3x) [145] 7+12 77.8 80.4 85.0 76.3 70.1 50.4 84.8 86.3 88.2 61.0 83.5 79.5 87.2 86.9 85.9 78.8 51.2 76.9 79.4 86.5 77.9
FRCN [45] 7+12 73.2 76.5 79.0 70.9 65.5 52.1 83.1 84.7 86.4 52.0 81.9 65.7 84.8 84.6 77.5 76.7 38.8 73.6 73.9 83.0 72.6
FRCN+SD(1x) [156] 7 79.9 85.1 86.6 78.6 75.7 65.2 83.5 88.4 88.9 65.8 83.6 74.3 86.4 84.7 85.5 88.0 62.0 75.5 75.3 87.7 76.3
TABLE VII. VOC 2007 test detection average precision (%). FRCN* refers to FRCN with the training schedule in [148] and SD refers to synthetic data.
Model mAP AP50 AP75
EfficientDet-D0 55.6 77.6 61.4
+ AutoAugment 55.7 (+0.1) 77.7 (+0.1) 61.8 (+0.4)
+ Det-AdvProp 55.9 (+0.3) 77.9 (+0.3) 62.0 (+0.6)
EfficientDet-D1 60.8 82.0 66.7
+ AutoAugment 61.0 (+0.2) 82.2 (+0.2) 67.2 (+0.5)
+ Det-AdvProp 61.2 (+0.4) 82.3 (+0.3) 67.4 (+0.7)
EfficientDet-D2 63.3 83.6 69.3
+ AutoAugment 62.7 (-0.6) 83.3 (-0.3) 69.2 (-0.1)
+ Det-AdvProp 63.5 (+0.2) 83.8 (+0.2) 69.7 (+0.4)
EfficientDet-D3 65.7 85.3 71.8
+ AutoAugment 65.2 (-0.5) 85.1 (-0.2) 71.3 (-0.5)
+ Det-AdvProp 66.2 (+0.5) 85.9 (+0.6) 72.5 (+0.7)
EfficientDet-D4 67.0 86.0 73.0
+ AutoAugment 67.0 (+0.0) 86.3 (+0.3) 73.5 (+0.5)
+ Det-AdvProp 67.5 (+0.5) 86.6 (+0.6) 74.0 (+1.0)
EfficientDet-D5 67.4 86.9 73.8
+ AutoAugment 67.6 (+0.2) 87.2 (+0.3) 74.2 (+0.4)
+ Det-AdvProp 68.2 (+0.8) 87.6 (+0.7) 74.7 (+0.9)
TABLE VIII. Results on PASCAL VOC 2012. The proposed Det-AdvProp gives the highest score on every model and metric. It largely outperforms AutoAugment [23] when facing domain shift.
[3] Sidra Aleem, Teerath Kumar, Suzanne Little, Malika Bendechache, Rob Brennan, and Kevin McGuinness. Random data augmentation based enhancement: A generalized enhancement approach for medical datasets. 2022.
[4] Eric Arazo, Diego Ortego, Paul Albert, Noel E O'Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2020.
[5] Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. Improving consistency-based semi-supervised learning with weight averaging. arXiv preprint arXiv:1806.05594, 2(9):11, 2018.
[6] Soroush Baseri Saadi, Nazanin Tataei Sarshar, Soroush Sadeghi, Ramin Ranjbarzadeh, Mersedeh Kooshki Forooshani, and Malika Bendechache. Investigation of effectiveness of shuffled frog-leaping optimizer in training a convolution neural network. Journal of Healthcare Engineering, 2022, 2022.
[7] Markus Bayer, Marc-André Kaufhold, and Christian Reuter. A survey on data augmentation for text classification. ACM Computing Surveys, 2021.
[8] Sima Behpour, Kris M Kitani, and Brian D Ziebart. Ada: Adversarial data augmentation for object detection. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1243–1252. IEEE, 2019.
[9] David Berthelot, Nicholas Carlini, Ekin D Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel. Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785, 2019.
[10] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. Advances in neural information processing systems, 32, 2019.
[11] Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, and Yao Shen. Precise single-stage detector. arXiv preprint arXiv:2210.04252, 2022.
[12] Aisha Chandio, Yao Shen, Malika Bendechache, Irum Inayat, and Teerath Kumar. Audd: audio urdu digits dataset for automatic audio urdu digit recognition. Applied Sciences, 11(19):8842, 2021.
[13] Arslan Chaudhry, Puneet K Dokania, and Philip HS Torr. Discovering class-specific pixels for weakly-supervised semantic segmentation. arXiv preprint arXiv:1707.05821, 2017.
[14] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 801–818, 2018.
[15] Pengguang Chen, Shu Liu, Hengshuang Zhao, and Jiaya Jia. Gridmask data augmentation. arXiv preprint arXiv:2001.04086, 2020.
[16] Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, and Boqing Gong. Robust and accurate object detection via adversarial learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16622–16631, 2021.
[17] Yanbei Chen, Xiatian Zhu, and Shaogang Gong. Semi-supervised deep learning with memory. In Proceedings of the European Conference on Computer Vision (ECCV), pages 268–283, 2018.
[18] Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, and Jiaya Jia. Scale-aware automatic augmentation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9563–9572, 2021.
[19] Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Gaofeng Meng, Shiming Xiang, Jian Sun, and Jiaya Jia. Stitcher: Feedback-driven data provider for object detection. arXiv preprint arXiv:2004.12432, 2(7):12, 2020.
[20] Jaehyeop Choi, Chaehyeon Lee, Donggyu Lee, and Heechul Jung. Salfmix: A novel single image-based data augmentation technique using a saliency map. Sensors, 21(24):8444, 2021.
[21] Peng Chu, Xiao Bian, Shaopeng Liu, and Haibin Ling. Feature space augmentation for long-tailed data. In European Conference on Computer Vision, pages 694–710. Springer, 2020.
[22] Pietro Antonio Cicalese, Aryan Mobiny, Pengyu Yuan, Jan Becker, Chandra Mohan, and Hien Van Nguyen. Stypath: Style-transfer data augmentation for robust histology image classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 351–361. Springer, 2020.
[23] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.
[24] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.
[25] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[26] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[27] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
[28] Terrance DeVries and Graham W Taylor. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538, 2017.
[29] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[30] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[31] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6569–6578, 2019.
[32] Dumitru Erhan, Aaron Courville, Yoshua Bengio, and Pascal Vincent. Why does unsupervised pre-training help deep learning? In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 201–208. JMLR Workshop and Conference Proceedings, 2010.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
[34] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
Method Model 1/8 1/4 1/2 7/8 Full
SDA [160] DeepLabV3Plus 74.1 - - - -
SDA + DSBN [160] DeepLabV3Plus 69.5 - - - -
SDA [160] DeepLabV3Plus - - - - 78.7
SDA + DSBN [160] DeepLabV3Plus - - - - 79.2
SDA [160] DeepLabV3Plus - - - 71.4 -
SDA + DSBN [160] DeepLabV3Plus - - - 72.5 -
AdvSemi [62] DeepLabV2 58.8 62.3 65.7 - 66.0
S4GAN + MT [100] DeepLabV2 59.3 61.9 - - 65.8
CutMix [41] DeepLabV2 60.3 63.87 - - 67.7
DST-CBC [40] DeepLabV2 60.5 64.4 - - 66.9
ClassMix [104] DeepLabV2 61.4 63.6 66.3 - 66.2
ECS [99] DeepLabv3Plus 67.4 70.7 72.9 - 74.8
DSBN [160] DeepLabV2 67.6 69.3 70.7 - 70.1
SSBN [160] DeepLabV3Plus 74.1 77.8 78.7 - 78.7
Adversarial [62] DeepLab-v2 - 58.8 62.3 65.7 -
s4GAN [100] DeepLab-v2 - 59.3 61.9 - 65.8
French et al [41] DeepLab-v2 51.20 60.34 63.87 - -
DST-CBC [40] DeepLab-v2 48.7 60.5 64.4 - -
ClassMix-Seg [104] DeepLab-v2 54.07 61.35 63.63 - 66.29
DeepLab V3plus [164] MobileNet - - - - 73.5
DeepLab V3plus [164] ResNet-50 - - - - 76.9
DeepLab V3plus [164] ResNet-101 - - - - 78.5
Baseline+ CutOut (16×16, p = 1) [164] MobileNet - - - - 72.8
Baseline+ CutMix (p = 1) [164] MobileNet - - - - 72.6
Baseline+ ObjectAug [164] MobileNet - - - - 73.5
TABLE IX. Results of performance (mIoU) on the Cityscapes validation set.

[35] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88:303–308, 2009.
[36] Junsong Fan, Zhaoxiang Zhang, Chunfeng Song, and Tieniu Tan. Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4283–4292, 2020.
[37] Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, and Cewu Lu. Instaboost: Boosting instance segmentation via probability map guided copy-pasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 682–691, 2019.
[38] Li Fei-Fei, Robert Fergus, and Pietro Perona. One-shot learning of object categories. IEEE transactions on pattern analysis and machine intelligence, 28(4):594–611, 2006.
[39] Steven Y Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, and Eduard Hovy. Genaug: Data augmentation for finetuning text generators. arXiv preprint arXiv:2010.01794, 2020.
[40] Zhengyang Feng, Qianyu Zhou, Guangliang Cheng, Xin Tan, Jianping Shi, and Lizhuang Ma. Semi-supervised semantic segmentation via dynamic self-training and class-balanced curriculum. arXiv preprint arXiv:2004.08514, 1(2):5, 2020.
[41] Geoff French, Timo Aila, Samuli Laine, Michal Mackiewicz, and Graham Finlayson. Semi-supervised semantic segmentation needs strong, high-dimensional perturbations. 2019.
[42] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[43] Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2918–2928, 2021.
[44] Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V Le. Dropblock: A regularization method for convolutional networks. Advances in neural information processing systems, 31, 2018.
[45] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
[46] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.
[47] Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, and Qiang Liu. Keepaugment: A simple information-preserving data augmentation approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1055–1064, 2021.
[48] Gregory Griffin, Alex Holub, and Pietro Perona. Caltech-256 object category dataset. 2007.
[49] Jian Guo and Stephen Gould. Deep cnn ensemble with data augmentation for object detection. arXiv preprint arXiv:1506.07224, 2015.
[50] Alon Halevy, Peter Norvig, and Fernando Pereira. The unreasonable effectiveness of data. IEEE intelligent systems, 24(2):8–12, 2009.
[51] Junlin Han, Pengfei Fang, Weihao Li, Jie Hong, Mohammad Ali Armin, Ian Reid, Lars Petersson, and Hongdong Li. You only cut once: Boosting data augmentation with a single cut. arXiv preprint arXiv:2201.12078, 2022.
[52] Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, and Jonathon Hare. Fmix: Enhancing mixed sample data augmentation. arXiv preprint arXiv:2002.12047, 2020.
[53] Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, and Hideki Nakayama. Faster autoaugment: Learning augmentation strategies using backpropagation. In European Conference on Computer Vision, pages 1–16. Springer, 2020.
[54] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
[55] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[56] Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. Augmix: A simple data processing method to improve robustness and uncertainty. arXiv preprint arXiv:1912.02781, 2019.
[57] Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021.
[58] Netzahualcoyotl Hernandez-Cruz, David Cato, and Jesus Favela. Neural style transfer as data augmentation for improving covid-19 diagnosis classification. SN Computer Science, 2(5):1–12, 2021.
[59] Minui Hong, Jinwoo Choi, and Gunhee Kim. Stylemix: Separating content and style for enhanced data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14862–14870, 2021.
Method Model 1/100 1/50 1/20 1/8 1/4 Full
GANSeg [131] VGG16 - - - - - 64.1
AdvSemSeg [62] ResNet-101 - - - - - 68.4
CCT [105] ResNet-50 - - - - - 69.4
PseudoSeg [173] ResNet-101 - - - - - 73.2
DSBN [160] ResNet-101 - - - - - 75.0
DSBN [160] Xception-65 - - - - - 79.3
Fully supervised [160] ResNet-101 - - - - - 78.3
Fully supervised [160] Xception-65 - - - - - 79.2
Adversarial [62] DeepLab-v2 - 57.2 64.7 69.5 72.1 -
s4GAN [100] DeepLab-v2 - 63.3 67.2 71.4 - 75.6
French et al. [41] DeepLab-v2 53.79 64.81 66.48 67.60 - -
DST-CBC [40] DeepLab-v2 61.6 65.5 69.3 70.7 71.8 -
ClassMix:Seg* [104] DeepLab-v2 54.18 66.15 67.77 71.00 72.45 -
Mixup [163] IRNet - - - - - 49
CutOut [29] IRNet - - - - - 48.9
CutMix [162] IRNet - - - - - 49.2
Random pasting [134] IRNet - - - - - 49.8
CCNN [107] VGG16 - - - - - 35.6
SEC [73] VGG16 - - - - - 51.1
STC [151] VGG16 - - - - - 51.2
AdvEra [150] VGG16 - - - - - 55.7
DCSP [13] ResNet101 - - - - - 61.9
MDC [152] VGG16 - - - - - 60.8
MCOF [147] ResNet101 - - - - - 61.2
DSRG [61] ResNet101 - - - - - 63.2
AffinityNet [2] ResNet-38 - - - - - 63.7
IRNet [1] ResNet50 - - - - - 64.8
FickleNet [86] ResNet101 - - - - - 65.3
SEAM [149] ResNet38 - - - - - 65.7
ICD [36] ResNet101 - - - - - 64.3
IRNet + CDA [134] ResNet50 - - - - - 66.4
SEAM + CDA [134] ResNet38 - - - - - 66.8
DeepLab V3 [164] MobileNet - - - - - 71.9
DeepLab V3 [164] ResNet-50 - - - - - 77.8
DeepLab V3 [164] ResNet-101 - - - - - 78.4
DeepLab V3plus [164] MobileNet - - - - - 73.8
DeepLab V3plus [164] ResNet-50 - - - - - 78.8
DeepLab V3plus [164] ResNet-101 - - - - - 79.6
Baseline+R.Rotation [164] ObjectAug - - - - - 69.5
Baseline +R.Scaling [164] ObjectAug - - - - - 70.3
Baseline + R.Flipping [164] ObjectAug - - - - - 69.6
Baseline + R.Shifting [164] ObjectAug - - - - - 70.7
Baseline + All [164] ObjectAug - - - - - 73.8
Baseline + CutOut (16×16, p = 0.5) [164] MobileNet - - - - - 71.9
Baseline + CutOut (16×16, p = 1) [164] MobileNet - - - - - 72.3
Baseline + CutMix (p = 0.5) [164] MobileNet - - - - - 72.7
Baseline + CutMix (p = 1) [164] MobileNet - - - - - 72.4
Baseline + ObjectAug [164] MobileNet - - - - - 73.8
Baseline + CutOut (16×16, p=0.5) + ObjectAug [164] MobileNet - - - - - 73.9
Baseline + CutMix (p=0.5) + ObjectAug [164] MobileNet - - - - - 74.1
DeepLabv3+ [14] EfficientNet-B7 - - - - - 84.6
ExFuse [166] EfficientNet-B7 - - - - - 85.8
Eff-B7 [172] EfficientNet-B7 - - - - - 85.2
Eff-L2 [172] EfficientNet-B7 - - - - - 88.7
Eff-B7 NAS-FPN [43] EfficientNet-B7 - - - - - 83.9
Eff-B7 NAS-FPN w/ Copy-Paste pre-training [43] EfficientNet-B7 - - - - - 86.6
TABLE X. Results of performance (mean intersection over union, mIoU) on the PASCAL VOC 2012 validation set.
IEEE/CVF Conference on Computer Vision and Pattern Recognition, [81] Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, and Zsolt Kira. Feat-
pages 14862–14870, 2021. match: Feature-based augmentation for semi-supervised learning. In
[60] Shaoli Huang, Xinchao Wang, and Dacheng Tao. Snapmix: Seman- European Conference on Computer Vision, pages 479–495. Springer,
tically proportional mixing for augmenting fine-grained data. In Pro- 2020.
ceedings of the AAAI Conference on Artificial Intelligence, volume 35, [82] Jiss Kuruvilla, Dhanya Sukumaran, Anjali Sankar, and Siji P Joy.
pages 1628–1636, 2021. A review on image processing and image segmentation. In 2016
[61] Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, and Jingdong international conference on data mining and advanced computing
Wang. Weakly-supervised semantic segmentation network with deep (SAPIENCE), pages 198–203. IEEE, 2016.
seeded region growing. In Proceedings of the IEEE conference on [83] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised
computer vision and pattern recognition, pages 7014–7023, 2018. learning. arXiv preprint arXiv:1610.02242, 2016.
[62] Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, and [84] Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel,
Ming-Hsuan Yang. Adversarial learning for semi-supervised semantic and Aravind Srinivas. Reinforcement learning with augmented data.
segmentation. arXiv preprint arXiv:1802.07934, 2018. Advances in Neural Information Processing Systems, 33:19884–19895,
[63] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating 2020.
deep network training by reducing internal covariate shift. In Interna- [85] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-
tional conference on machine learning, pages 448–456. PMLR, 2015. supervised learning method for deep neural networks. In Workshop
[64] Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Label on challenges in representation learning, ICML, volume 3, page 896,
propagation for deep semi-supervised learning. In Proceedings of the 2013.
IEEE/CVF Conference on Computer Vision and Pattern Recognition, [86] Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, and Sungroh
pages 5070–5079, 2019. Yoon. Ficklenet: Weakly and semi-supervised semantic image seg-
[65] Jacob Jackson and John Schulman. Semi-supervised learning by label mentation using stochastic inference. In Proceedings of the IEEE/CVF
gradient alignment. arXiv preprint arXiv:1902.02336, 2019. Conference on Computer Vision and Pattern Recognition, pages 5267–
[66] Philip TG Jackson, Amir Atapour Abarghouei, Stephen Bonner, Toby P 5276, 2019.
Breckon, and Boguslaw Obara. Style augmentation: data augmentation [87] Victor Lempitsky, Pushmeet Kohli, Carsten Rother, and Toby Sharp.
via style randomization. In CVPR workshops, volume 6, pages 10–11, Image segmentation with a bounding box prior. In 2009 IEEE 12th
2019. international conference on computer vision, pages 277–284. IEEE,
[67] Wisal Khan, Kislay Raj, Teerath Kumar, Arunabha M Roy, and Bin 2009.
Luo. Introducing urdu digits dataset with demonstration of an efficient [88] Yonggang Li, Guosheng Hu, Yongtao Wang, Timothy Hospedales,
and robust noisy decoder-based pseudo example generator. Symmetry, Neil M Robertson, and Yongxin Yang. Dada: Differentiable automatic
14(10):1976, 2022. data augmentation. arXiv preprint arXiv:2003.03780, 2020.
[68] Cherry Khosla and Baljit Singh Saini. Enhancing performance of [89] JunHao Liew, Yunchao Wei, Wei Xiong, Sim-Heng Ong, and Jiashi
deep learning models with different data augmentation techniques: A Feng. Regional interactive image segmentation networks. In 2017 IEEE
survey. In 2020 International Conference on Intelligent Engineering international conference on computer vision (ICCV), pages 2746–2754.
and Management (ICIEM), pages 79–85. IEEE, 2020. IEEE Computer Society, 2017.
[69] Byoungjip Kim, Jinho Choo, Yeong-Dae Kwon, Seongho Joe, Seungjai [90] Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, and Sungwoong
Min, and Youngjune Gwon. Selfmatch: Combining contrastive self- Kim. Fast autoaugment. Advances in Neural Information Processing
supervision and consistency for semi-supervised learning. arXiv Systems, 32, 2019.
preprint arXiv:2101.06480, 2021. [91] Shiqi Lin, Tao Yu, Ruoyu Feng, Xin Li, Xin Jin, and Zhibo Chen.
[70] Jang-Hyun Kim, Wonho Choo, and Hyun Oh Song. Puzzle mix: Ex- Local patch autoaugment with multi-agent collaboration. arXiv preprint
ploiting saliency and local statistics for optimal mixup. In International arXiv:2103.11099, 2021.
Conference on Machine Learning, pages 5275–5285. PMLR, 2020.
[71] Youmin Kim, AFM Shahab Uddin, and Sung-Ho Bae. Local augment: Utilizing local bias property of convolutional neural networks for data augmentation. IEEE Access, 9:15191–15199, 2021.
[72] Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. Audio augmentation for speech recognition. In Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[73] Alexander Kolesnikov and Christoph H Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In European conference on computer vision, pages 695–711. Springer, 2016.
[74] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[75] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
[76] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[77] Teerath Kumar, Alessandra Mileo, Rob Brennan, and Malika Bendechache. Rsmda: Random slices mixing data augmentation. Applied Sciences, 13(3):1711, 2023.
[78] Teerath Kumar, Jinbae Park, Muhammad Salman Ali, AFM Uddin, and Sung-Ho Bae. Class specific autoencoders enhance sample diversity. Journal of Broadcast Engineering, 26(7):844–854, 2021.
[79] Teerath Kumar, Jinbae Park, Muhammad Salman Ali, AFM Shahab Uddin, Jong Hwan Ko, and Sung-Ho Bae. Binary-classifiers-enabled filters for semi-supervised learning. IEEE Access, 9:167663–167673, 2021.
[80] Teerath Kumar, Jinbae Park, and Sung-Ho Bae. Intra-class random erasing (icre) augmentation for audio classification. In Proceedings Of The Korean Society Of Broadcast Engineers Conference, pages 244–247. The Korean Institute of Broadcast and Media Engineers, 2020.
[92] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
[93] Pei Liu, Xuemin Wang, Chao Xiang, and Weiye Meng. A survey of text data augmentation. In 2020 International Conference on Computer Communication and Network Security (CCNS), pages 191–195. IEEE, 2020.
[94] Xiaoliang Liu, Furao Shen, Jian Zhao, and Changhai Nie. Randommix: A mixed sample data augmentation method with multiple mixed modes. arXiv preprint arXiv:2205.08728, 2022.
[95] Xiaolong Liu, Zhidong Deng, and Yuhan Yang. Recent progress in semantic image segmentation. Artificial Intelligence Review, 52(2):1089–1106, 2019.
[96] Yucen Luo, Jun Zhu, Mengxi Li, Yong Ren, and Bo Zhang. Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8896–8905, 2018.
[97] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[98] Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, and Mohammad Rastegari. Rangeaugment: Efficient online augmentation with range learning. arXiv preprint arXiv:2212.10553, 2022.
[99] Robert Mendel, Luis Antonio de Souza, David Rauber, Joao Paulo Papa, and Christoph Palm. Semi-supervised segmentation based on error-correcting supervision. In European Conference on Computer Vision, pages 141–157. Springer, 2020.
[100] Sudhanshu Mittal, Maxim Tatarchenko, and Thomas Brox. Semi-supervised semantic segmentation with high- and low-level consistency. IEEE transactions on pattern analysis and machine intelligence, 43(4):1369–1379, 2019.
[101] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018.
[102] Loris Nanni, Gianluca Maguolo, and Michelangelo Paci. Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57:101084, 2020.
[103] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
[104] Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, and Lennart Svensson. Classmix: Segmentation-based data augmentation for semi-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1369–1378, 2021.
[105] Yassine Ouali, Céline Hudelot, and Myriam Tami. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12674–12684, 2020.
[106] Jinbae Park, Teerath Kumar, and Sung-Ho Bae. Search of an optimal sound augmentation policy for environmental sound classification with deep neural networks. In Proceedings Of The Korean Society Of Broadcast Engineers Conference, pages 18–21. The Korean Institute of Broadcast and Media Engineers, 2020.
[107] Deepak Pathak, Philipp Krahenbuhl, and Trevor Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1796–1804, 2015.
[108] Pornntiwa Pawara, Emmanuel Okafor, Lambert Schomaker, and Marco Wiering. Data augmentation for plant classification. In International conference on advanced concepts for intelligent vision systems, pages 615–626. Springer, 2017.
[109] Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
[110] Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan Yuille. Deep co-training for semi-supervised image recognition. In Proceedings of the european conference on computer vision (eccv), pages 135–152, 2018.
[111] Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, and Xinggang Wang. Resizemix: Mixing data with preserved object information and true labels. arXiv preprint arXiv:2012.11101, 2020.
[112] Alexandre Ramé, Rémy Sun, and Matthieu Cord. Mixmo: Mixing multiple inputs for multiple outputs via deep subnetworks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 823–833, 2021.
[113] Ramin Ranjbarzadeh, Shadi Dorosti, Saeid Jafarzadeh Ghoushchi, Annalina Caputo, Erfan Babaee Tirkolaee, Sadia Samar Ali, Zahra Arshadi, and Malika Bendechache. Breast tumor localization and segmentation using machine learning techniques: Overview of datasets, findings, and methods. Computers in Biology and Medicine, page 106443, 2022.
[114] Ramin Ranjbarzadeh, Saeid Jafarzadeh Ghoushchi, Nazanin Tataei Sarshar, Erfan Babaee Tirkolaee, Sadia Samar Ali, Teerath Kumar, and Malika Bendechache. Me-ccnn: Multi-encoded images and a cascade convolutional neural network for breast tumor segmentation and recognition. Artificial Intelligence Review, pages 1–38, 2023.
[115] Ramin Ranjbarzadeh, Nazanin Tataei Sarshar, Saeid Jafarzadeh Ghoushchi, Mohammad Saleh Esfahani, Mahboub Parhizkar, Yaghoub Pourasad, Shokofeh Anari, and Malika Bendechache. Mrfe-cnn: multi-route feature extraction model for breast tumor segmentation in mammograms using a convolutional neural network. Annals of Operations Research, pages 1–22, 2022.
[116] Ramin Ranjbarzadeh, Payam Zarbakhsh, Annalina Caputo, Erfan Babaee Tirkolaee, and Malika Bendechache. Brain tumor segmentation based on an optimized convolutional neural network and an improved chimp optimization algorithm. Available at SSRN 4295236, 2022.
[117] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. Advances in neural information processing systems, 28, 2015.
[118] Arunabha M Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay Raj. Wildect-yolo: An efficient and robust computer vision-based accurate object localization model for automated endangered wildlife detection. Ecological Informatics, page 101919, 2022.
[119] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
[120] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Advances in neural information processing systems, 29, 2016.
[121] Jin-Woo Seo, Hong-Gyu Jung, and Seong-Whan Lee. Self-augmentation: Generalizing deep networks to unseen classes for few-shot learning. Neural Networks, 138:140–149, 2021.
[122] Ling Shao, Fan Zhu, and Xuelong Li. Transfer learning for visual categorization: A survey. IEEE transactions on neural networks and learning systems, 26(5):1019–1034, 2014.
[123] Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1–48, 2019.
[124] Connor Shorten, Taghi M Khoshgoftaar, and Borko Furht. Text data augmentation for deep learning. Journal of Big Data, 8(1):1–34, 2021.
[125] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[126] Aditya Singh, Ramin Ranjbarzadeh, Kislay Raj, Teerath Kumar, and Arunabha M Roy. Understanding eeg signals for subject-wise definition of armoni activities. arXiv preprint arXiv:2301.00948, 2023.
[127] Bharat Singh and Larry S Davis. An analysis of scale invariance in object detection snip. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3578–3587, 2018.
[128] Bharat Singh, Mahyar Najibi, and Larry S Davis. Sniper: Efficient multi-scale training. Advances in neural information processing systems, 31, 2018.
[129] Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, and Yong Jae Lee. Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv preprint arXiv:1811.02545, 2018.
[130] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems, 33:596–608, 2020.
[131] Nasim Souly, Concetto Spampinato, and Mubarak Shah. Semi supervised semantic segmentation using generative adversarial network. In Proceedings of the IEEE international conference on computer vision, pages 5688–5696, 2017.
[132] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
[133] Xingzhe Su. A survey on data augmentation methods based on gan in computer vision. In The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, pages 852–865. Springer, 2020.
[134] Yukun Su, Ruizhou Sun, Guosheng Lin, and Qingyao Wu. Context decoupling augmentation for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7004–7014, 2021.
[135] Cecilia Summers and Michael J Dinneen. Improved mixed-example data augmentation. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1262–1270. IEEE, 2019.
[136] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision, pages 843–852, 2017.
[137] Ryo Takahashi, Takashi Matsubara, and Kuniaki Uehara. Ricap: Random image cropping and patching data augmentation for deep cnns. In Asian conference on machine learning, pages 786–798. PMLR, 2018.
[138] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems, 30, 2017.
[139] Nazanin Tataei Sarshar, Ramin Ranjbarzadeh, Saeid Jafarzadeh Ghoushchi, Gabriel Gomes de Oliveira, Shokofeh Anari, Mahboub Parhizkar, and Malika Bendechache. Glioma brain tumor segmentation in four mri modalities using a convolutional neural network and based on a transfer learning method. In Proceedings of the 7th Brazilian Technology Symposium (BTSym’21) Emerging Trends in Human Smart and Sustainable Future of Cities (Volume 1), pages 386–402. Springer, 2022.
[140] Muhammad Turab, Teerath Kumar, Malika Bendechache, and Takfarinas Saber. Investigating multi-feature selection and ensembling for audio classification. arXiv preprint arXiv:2206.07511, 2022.
[141] AFM Uddin, Mst Monira, Wheemyung Shin, TaeChoong Chung, Sung-Ho Bae, et al. Saliencymix: A saliency guided data augmentation strategy for better regularization. arXiv preprint arXiv:2006.01791, 2020.
[142] Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. arXiv preprint arXiv:1903.03825, 2019.
[143] Riccardo Volpi, Pietro Morerio, Silvio Savarese, and Vittorio Murino. Adversarial feature augmentation for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5495–5504, 2018.
[144] Hao Wang, Qilong Wang, Fan Yang, Weiqi Zhang, and Wangmeng Zuo. Data augmentation for object detection via progressive and selective instance-switching. arXiv preprint arXiv:1906.00358, 2019.
[145] Ke Wang, Bin Fang, Jiye Qian, Su Yang, Xin Zhou, and Jie Zhou. Perspective transformation data augmentation for object detection. IEEE Access, 8:4935–4943, 2019.
[146] Xiang Wang, Kai Wang, and Shiguo Lian. A survey on face data augmentation for the training of deep neural networks. Neural computing and applications, 32(19):15503–15531, 2020.
[147] Xiang Wang, Shaodi You, Xi Li, and Huimin Ma. Weakly-supervised semantic segmentation by iteratively mining common object features. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1354–1362, 2018.
[148] Xiaolong Wang, Abhinav Shrivastava, and Abhinav Gupta. A-fast-rcnn: Hard positive generation via adversary for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2606–2615, 2017.
[149] Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, and Xilin Chen. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12275–12284, 2020.
[150] Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, and Shuicheng Yan. Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1568–1576, 2017.
[151] Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Jiashi Feng, Yao Zhao, and Shuicheng Yan. Stc: A simple to complex framework for weakly-supervised semantic segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(11):2314–2320, 2016.
[152] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Thomas S Huang. Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7268–7277, 2018.
[153] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of transfer learning. Journal of Big Data, 3(1):1–40, 2016.
[154] Sebastien C Wong, Adam Gatt, Victor Stamatescu, and Mark D McDonnell. Understanding data augmentation for classification: when to warp? In 2016 international conference on digital image computing: techniques and applications (DICTA), pages 1–6. IEEE, 2016.
[155] Shasha Xie, Hui Lin, and Yang Liu. Semi-supervised extractive speech summarization via co-training algorithm. In Eleventh Annual Conference of the International Speech Communication Association, 2010.
[156] Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, and Ming Liu. Cut-thumbnail: A novel data augmentation for convolutional neural network. In Proceedings of the 29th ACM International Conference on Multimedia, pages 1627–1635, 2021.
[157] Mingle Xu, Sook Yoon, Alvaro Fuentes, and Dong Sun Park. A comprehensive survey of image augmentation techniques for deep learning. arXiv preprint arXiv:2205.01491, 2022.
[158] Suorong Yang, Weikang Xiao, Mengcheng Zhang, Suhan Guo, Jian Zhao, and Furao Shen. Image data augmentation for deep learning: A survey. arXiv preprint arXiv:2204.08610, 2022.
[159] Jaejun Yoo, Namhyuk Ahn, and Kyung-Ah Sohn. Rethinking data augmentation for image super-resolution: A comprehensive analysis and a new strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8375–8384, 2020.
[160] Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, and Hao Li. A simple baseline for semi-supervised semantic segmentation with strong data augmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8229–8238, 2021.
[161] Fei Yue, Chao Zhang, MingYang Yuan, Chen Xu, and YaLin Song. Survey of image augmentation based on generative adversarial network. In Journal of Physics: Conference Series, volume 2203, page 012052. IOP Publishing, 2022.
[162] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6023–6032, 2019.
[163] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
[164] Jiawei Zhang, Yanchun Zhang, and Xiaowei Xu. Objectaug: object-level data augmentation for semantic image segmentation. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021.
[165] Xiaofeng Zhang, Zhangyang Wang, Dong Liu, Qifeng Lin, and Qing Ling. Deep adversarial data augmentation for extremely low data regimes. IEEE Transactions on Circuits and Systems for Video Technology, 31(1):15–28, 2020.
[166] Zhenli Zhang, Xiangyu Zhang, Chao Peng, Xiangyang Xue, and Jian Sun. Exfuse: Enhancing feature fusion for semantic segmentation. In Proceedings of the European conference on computer vision (ECCV), pages 269–284, 2018.
[167] Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. Bag of freebies for training object detection neural networks. arXiv preprint arXiv:1902.04103, 2019.
[168] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. arXiv preprint arXiv:1710.11342, 2017.
[169] Xu Zheng, Tejo Chalasani, Koustav Ghosal, Sebastian Lutz, and Aljosa Smolic. Stada: Style transfer as data augmentation. arXiv preprint arXiv:1909.01056, 2019.
[170] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 13001–13008, 2020.
[171] Barret Zoph, Ekin D Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, and Quoc V Le. Learning data augmentation strategies for object detection. In European conference on computer vision, pages 566–583. Springer, 2020.
[172] Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, and Quoc Le. Rethinking pre-training and self-training. Advances in neural information processing systems, 33:3833–3845, 2020.
[173] Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, and Tomas Pfister. Pseudoseg: Designing pseudo labels for semantic segmentation. arXiv preprint arXiv:2010.09713, 2020.