A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications
Abstract—Generative adversarial networks (GANs) have recently become a hot research topic; however, they have been studied since
2014, and a large number of algorithms have been proposed. Nevertheless, few comprehensive studies explain the connections among
different GAN variants and how they have evolved. In this paper, we attempt to provide a review of the various GAN methods from the
perspectives of algorithms, theory, and applications. First, the motivations, mathematical representations, and structures of most GAN
algorithms are introduced in detail, and we compare their commonalities and differences. Second, theoretical issues related to GANs
are investigated. Finally, typical applications of GANs in image processing and computer vision, natural language processing, music,
speech and audio, the medical field, and data science are discussed.
TABLE 1: Overview of GAN Algorithms Discussed in Section 3
TABLE 2: Applications of GAN Algorithms Discussed in Section 5
To the best of our knowledge, this paper is the first to provide a comprehensive survey of GANs from the algorithm, theory, and application perspectives that covers recent progress. Furthermore, our paper focuses on applications related not only to image processing and computer vision but also to sequential data such as natural language processing, and to related areas such as the medical field.

The remainder of this paper is organized as follows. Related works are discussed in Section 2. Sections 3-5 introduce GANs from the algorithm, theory, and application perspectives. Tables 1 and 2 list the main GAN algorithms and application fields, which are discussed in Sections 3 and 5, respectively. Finally, Section 6 concludes the survey.

2 RELATED WORK

GANs belong to the class of generative algorithms. Generative algorithms and discriminative algorithms are two categories of machine learning algorithms. Approaches that explicitly or implicitly model the distributions of inputs as well as outputs are known as generative models [82]. Generative algorithms have become increasingly popular and important due to their wide practical applications.

2.1 Generative Algorithms

Generative algorithms can be classified into two classes: explicit density models and implicit density models.

2.1.1 Explicit Density Models

An explicit density model defines a probability density function $p_{model}(x; \theta)$ and utilizes true data to fit the parameters $\theta$. After training, new examples are produced using the trained model or distribution. The explicit density models include maximum likelihood estimation (MLE), approximate inference [83], [84], and the Markov chain method [85], [86], [87]. These explicit density models use an explicit distribution and have limitations. For instance, MLE is conducted on true data, and its parameters are directly updated based on the true data, which leads to an overly smooth generative model. The generative model learned by approximate inference only approaches the lower bound of the objective function rather than directly solving the objective function, because the objective function is difficult to solve. The Markov chain algorithm can be used to train generative models, but it is computationally expensive. Furthermore, explicit density models have a computational tractability problem: they may fail to reflect the complexity of the true data distribution and to learn high-dimensional data distributions [88].

2.1.2 Implicit Density Models

An implicit density model does not directly estimate or fit the data distribution; instead, it produces data instances from the distribution without an explicit hypothesis [89] and utilizes the produced examples to modify the model. Prior to GANs, implicit density models generally needed to be trained using either ancestral sampling [90] or Markov chain-based sampling, which is inefficient and limits their practical applications. GANs belong to the directed implicit density model category. A detailed summary and relevant papers can be found in [91].

2.1.3 Comparison of GANs and Other Generative Algorithms

GANs were proposed to overcome the disadvantages of other generative algorithms.
The basic idea behind adversarial learning is that the generator tries to create examples that are as realistic as possible to deceive the discriminator, while the discriminator tries to distinguish the generated fake examples from true examples. Both the generator and the discriminator are improved through adversarial learning. This adversarial process gives GANs notable advantages over other generative algorithms. The specific advantages of GANs over other generative algorithms are as follows.

1) GANs can parallelize generation across a single large image, which is difficult for other generative algorithms such as the pixel convolutional neural network (PixelCNN) [92] and fully visible belief networks (FVBNs) [93], [94].
2) The generator design has few restrictions.
3) GANs are subjectively thought to produce better examples than those produced by other methods.

Refer to [91] for more detailed discussions of these comparisons.

2.2 Adversarial Idea

The adversarial idea has been successfully applied in many areas, including machine learning, artificial intelligence, computer vision, and natural language processing. The 2016 defeat of the world's top human Go player by the AlphaGo model [95] engaged public interest in artificial intelligence. The intermediate version of AlphaGo utilizes two networks that compete with each other.

Adversarial examples [96], [97], [98], [99], [100], [101], [102], [103], [104], [105] also involve the adversarial idea. Adversarial examples are examples that differ substantially from real examples but are classified into a real category with high confidence, or examples that differ only slightly from real examples but are misclassified. This has recently become a very hot research topic [100], [101]. To defend against adversarial attacks, the works in [106], [107], [108], [109] utilized GANs.

Adversarial machine learning [110] is a minimax problem in which a defender, who builds the classifier that we want to work correctly, searches over the parameter space to find the parameters that reduce the cost of the classifier as much as possible. Simultaneously, the attacker searches over the model inputs to maximize the cost.

Adversarial ideas can thus be found in adversarial networks, adversarial machine learning, and adversarial examples; however, they have different objectives.

3 ALGORITHMS

In this section, we first introduce the original GANs, followed by their representative variants and training techniques.

3.1 GANs

The GAN framework is straightforward to implement when both models are neural networks. To learn the generator distribution $p_g$ over data $x$, a prior on input noise variables is defined as $p_z(z)$ [6], where $z$ is the noise variable. The generator then represents a mapping from noise space to data space as $G(z; \theta_g)$, where $G$ is a differentiable function represented by a neural network with parameters $\theta_g$. The other neural network, $D(x; \theta_d)$, is also defined with parameters $\theta_d$, but the output of $D(x)$ is a single scalar. $D(x)$ denotes the probability that $x$ came from the data rather than from the generator $G$. The discriminator $D$ is trained to maximize the probability of assigning the correct label to both real training data and fake examples generated by the generator $G$. Simultaneously, $G$ is trained to minimize $\log(1 - D(G(z)))$.

3.1.1 Objective Function

Different objective functions can be used in GANs.

3.1.1.1 Original minimax game: The objective function of GANs [6] is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (1)$$

where $-\log D(x)$ is the cross-entropy between $[1\ 0]^{\top}$ and $[D(x)\ \ 1-D(x)]^{\top}$. Similarly, $-\log(1 - D(G(z)))$ is the cross-entropy between $[0\ 1]^{\top}$ and $[D(G(z))\ \ 1-D(G(z))]^{\top}$. For a fixed $G$, the optimal discriminator $D$ [6] is given by

$$D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}. \quad (2)$$

The minimax game in (1) can be reformulated as

$$\begin{aligned} C(G) &= \max_D V(D, G) \\ &= \mathbb{E}_{x \sim p_{data}}[\log D_G^*(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D_G^*(G(z)))] \\ &= \mathbb{E}_{x \sim p_{data}}[\log D_G^*(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D_G^*(x))] \\ &= \mathbb{E}_{x \sim p_{data}}\!\left[\log \frac{p_{data}(x)}{\tfrac{1}{2}\,(p_{data}(x) + p_g(x))}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{\tfrac{1}{2}\,(p_{data}(x) + p_g(x))}\right] - 2\log 2. \end{aligned} \quad (3)$$

The Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence between two probability distributions $p(x)$ and $q(x)$ are defined as

$$KL(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx, \quad (4)$$

$$JS(p \,\|\, q) = \frac{1}{2} KL\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} KL\!\left(q \,\Big\|\, \frac{p+q}{2}\right). \quad (5)$$

Therefore, (3) is equal to

$$C(G) = KL\!\left(p_{data} \,\Big\|\, \frac{p_{data}+p_g}{2}\right) + KL\!\left(p_g \,\Big\|\, \frac{p_{data}+p_g}{2}\right) - 2\log 2 = 2\, JS(p_{data} \,\|\, p_g) - 2\log 2. \quad (6)$$

Thus, the objective function of GANs is related to the JS divergence.
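To make the alternating optimization of (1) concrete, the following is a minimal PyTorch-style sketch of one training step; it is an illustration rather than the reference implementation of [6], and the generator `G`, discriminator `D`, optimizers, and `z_dim` are assumed to be defined by the user.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    """One alternating update of the original minimax GAN objective in Eq. (1).
    D is assumed to output raw logits (sigmoid applied implicitly by the loss)."""
    bs = real.size(0)

    # Discriminator: maximize E[log D(x)] + E[log(1 - D(G(z)))]
    z = torch.randn(bs, z_dim)
    fake = G(z).detach()                       # block gradients into G
    d_real, d_fake = D(real), D(fake)
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator: minimize E[log(1 - D(G(z)))], the saturating form prescribed by Eq. (1)
    d_fake = D(G(torch.randn(bs, z_dim)))
    loss_G = -F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```

In practice the generator update above is usually replaced by the non-saturating variant discussed next, for the gradient reasons explained below.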
3.1.1.2 Non-saturating game: In some cases, Eq. (1) may not provide a sufficient gradient for $G$ to learn well. Generally, $G$ is poor during early learning, and the generated examples differ substantially from the training data. Therefore, $D$ can reject these early generated examples with high confidence. In this situation, $\log(1 - D(G(z)))$ saturates. However, we can train $G$ to maximize $\log(D(G(z)))$ rather than to minimize $\log(1 - D(G(z)))$. The cost for the generator then becomes

$$J^{(G)} = -\mathbb{E}_{z \sim p_z(z)}[\log(D(G(z)))] = -\mathbb{E}_{x \sim p_g}[\log(D(x))]. \quad (7)$$
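In code, switching from the saturating loss of (1) to the heuristic of (7) only changes the generator term; a minimal sketch under the same assumptions as above (logit-valued discriminator outputs, illustrative names):

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, non_saturating=True):
    """d_fake_logits: raw discriminator outputs for a batch of generated examples."""
    if non_saturating:
        # Eq. (7): maximize log D(G(z))  <=>  minimize -log D(G(z))
        return F.binary_cross_entropy_with_logits(
            d_fake_logits, torch.ones_like(d_fake_logits))
    # Eq. (1): minimize log(1 - D(G(z))); this tends to saturate early in training
    return -F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
```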
This new objective function results in the same fixed point in the dynamics of $D$ and $G$ but provides much larger gradients during the early learning process. The non-saturating game is heuristic and is not motivated by theory. However, the non-saturating game has other problems, such as an unstable numerical gradient for training $G$. With the optimal $D_G^*$, we have

$$\mathbb{E}_{x \sim p_g}\!\left[-\log D_G^*(x)\right] + \mathbb{E}_{x \sim p_g}\!\left[\log\left(1 - D_G^*(x)\right)\right] = \mathbb{E}_{x \sim p_g}\!\left[\log \frac{1 - D_G^*(x)}{D_G^*(x)}\right] = \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{data}(x)}\right] = KL(p_g \,\|\, p_{data}). \quad (8)$$

Therefore, $\mathbb{E}_{x \sim p_g}[-\log D_G^*(x)]$ is equal to

$$\mathbb{E}_{x \sim p_g}\!\left[-\log D_G^*(x)\right] = KL(p_g \,\|\, p_{data}) - \mathbb{E}_{x \sim p_g}\!\left[\log\left(1 - D_G^*(x)\right)\right]. \quad (9)$$

From (3) and (6), we have

$$\mathbb{E}_{x \sim p_{data}}\!\left[\log D_G^*(x)\right] + \mathbb{E}_{x \sim p_g}\!\left[\log\left(1 - D_G^*(x)\right)\right] = 2\, JS(p_{data} \,\|\, p_g) - 2\log 2. \quad (10)$$

Therefore, $\mathbb{E}_{x \sim p_g}[\log(1 - D_G^*(x))]$ equals

$$\mathbb{E}_{x \sim p_g}\!\left[\log\left(1 - D_G^*(x)\right)\right] = 2\, JS(p_{data} \,\|\, p_g) - 2\log 2 - \mathbb{E}_{x \sim p_{data}}\!\left[\log D_G^*(x)\right]. \quad (11)$$

By substituting (11) into (9), (9) reduces to

$$\mathbb{E}_{x \sim p_g}\!\left[-\log D_G^*(x)\right] = KL(p_g \,\|\, p_{data}) - 2\, JS(p_{data} \,\|\, p_g) + \mathbb{E}_{x \sim p_{data}}\!\left[\log D_G^*(x)\right] + 2\log 2. \quad (12)$$

From (12), we can see that optimizing the alternative $G$ loss in the non-saturating game is contradictory because the first term aims to minimize the divergence between the generated distribution and the real distribution, while the second term aims to maximize the divergence between these two distributions due to the negative sign. This results in an unstable numerical gradient when training $G$. Furthermore, the KL divergence is not a symmetric quantity, as reflected by the following two examples:

- If $p_{data}(x) \to 0$ and $p_g(x) \to 1$, we have $KL(p_g \,\|\, p_{data}) \to +\infty$.
- If $p_{data}(x) \to 1$ and $p_g(x) \to 0$, we have $KL(p_g \,\|\, p_{data}) \to 0$.
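This asymmetry can be checked numerically; below is a small NumPy illustration (not from the paper) with two discrete distributions whose probability values are chosen only for the example:

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence KL(p || q) in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p_data = np.array([0.5, 0.5])       # real data has two equally likely modes
p_g    = np.array([0.999, 0.001])   # generator collapses onto the first mode

print(kl(p_g, p_data))   # ~0.69: dropping a mode is penalized lightly
print(kl(p_data, p_g))   # ~2.76: the reverse direction penalizes it heavily
```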
small number of examples in each mini-batch domi-
nate the gradient computation. This demonstrates
Expg log 1 DG ðxÞ
(11) that variance reduction methods based on the maxi-
¼ 2JSðpdata kpg Þ 2log 2 Expdata log DG ðxÞ : mum likelihood game could be an important
research direction for improving GAN performance.
By substituting (11) into (9), (9) reduces to
Third, the heuristically motivated non-saturating
game has lower example variance, which is one pos-
Expg log DG ðxÞ
sible reason why it is more successful in real
¼ KLðpg pdata Þ 2JSðpdata kpg Þ (12) applications.
þ Expdata log DG ðxÞ þ 2log 2: GAN Lab [112] was proposed as an interactive visualiza-
tion tool designed for non-experts to learn and experiment
From (12), we can see that optimizing the alternative G loss with GANs. Bau et al. [113] presented an analytic frame-
in the non-saturating game is contradictory because the first work for visualizing and understanding GANs.
term aims to minimize the divergence between the generated
distribution and the real distribution while the second term 3.2 GAN Representative Variants
aims to maximize the divergence between these two distri- There are many papers related to GANs [114], [115], [116],
butions due to the negative sign. This results in an unstable [117], [118], [119], [120], [121], [122], [123], [124], [125], [126],
numerical gradient when training G. Furthermore, the KL such as least squares GAN (LSGAN) [23], cyclic-synthesized
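If the discriminator returns logits $l = \sigma^{-1}(D(\cdot))$, the cost in (13) reduces to $-\mathbb{E}[\exp(l)]$, since $\exp(\sigma^{-1}(a)) = a/(1-a)$. A minimal sketch (the assumption that `d_fake_logits` holds raw logits is ours, not the paper's):

```python
import torch

def ml_generator_loss(d_fake_logits):
    """Maximum likelihood game cost of Eq. (13): J(G) = -E[exp(sigma^{-1}(D(G(z))))].
    Note that most of the gradient comes from samples the discriminator is most
    confident are real, which matches the variance issue discussed in the text."""
    return -torch.exp(d_fake_logits).mean()
```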
A comparison of the original zero-sum game, the non-saturating game, and the maximum likelihood game is shown in Fig. 1. Three observations can be obtained from Fig. 1.

First, when the example is fake (the left end of the figure), both the maximum likelihood game and the original minimax game suffer from the vanishing gradient problem. The heuristically motivated non-saturating game does not have this problem.

Second, the maximum likelihood game also has the problem that almost all of the gradient comes from the right end of the curve, which means that a rather small number of examples in each mini-batch dominate the gradient computation. This demonstrates that variance reduction methods based on the maximum likelihood game could be an important research direction for improving GAN performance.

Third, the heuristically motivated non-saturating game has lower example variance, which is one possible reason why it is more successful in real applications.

GAN Lab [112] was proposed as an interactive visualization tool designed for non-experts to learn and experiment with GANs. Bau et al. [113] presented an analytic framework for visualizing and understanding GANs.

3.2 GAN Representative Variants

There are many papers related to GANs [114], [115], [116], [117], [118], [119], [120], [121], [122], [123], [124], [125], [126], such as least squares GAN (LSGAN) [23], cyclic-synthesized GAN (CSGAN) [127], and latent optimisation for GANs (LOGAN) [128]. In this subsection, we introduce the representative GAN variants.

3.2.1 InfoGAN

Rather than utilizing a single unstructured noise vector $z$, information maximizing GAN (InfoGAN) [17] decomposes the input noise vector into two parts: $z$, which is considered incompressible noise, and $c$, which is called the latent code and targets the significant structured semantic features of the real data distribution. InfoGAN [17] aims to solve the information-regularized minimax game $\min_G \max_D V_I(D, G) = V(D, G) - \lambda I(c; G(z, c))$, where $V(D, G)$ is the objective function of the original GANs, $I(c; G(z, c))$ is the mutual information between the latent code and the generator output, and $\lambda$ is a regularization parameter.
By comparing (15) and (16), we can see that the InfoGAN generator is similar to that of cGANs. However, the latent code $c$ of InfoGAN is not known in advance; it is discovered through the training process. Furthermore, the $\ell_1$ distance is used:

$$L_{\ell_1}(G) = \mathbb{E}_{x, y}\!\left[\|x - G(y)\|_1\right]. \quad (20)$$
$$\min_G V_{LSGAN}(G) = \mathbb{E}_{z \sim p_z(z)}\!\left[(D(G(z)) - c)^2\right], \quad (26)$$

where $c$ is the value that $G$ wants $D$ to believe for generated examples. The authors of [24] showed that LSGANs have two advantages over the original GANs.

1) The new decision boundary produced by $D$ imposes a large penalty on generated examples that lie far from the decision boundary, which forces the "low quality" generated examples to move toward the decision boundary. This approach is effective at generating higher quality examples.
2) Penalizing generated examples far from the decision boundary provides larger gradients when updating $G$, which overcomes the vanishing gradient problems of the original GANs.
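As a concrete illustration, the following sketch writes the least squares losses in the common parameterization of [24], where $a$ and $b$ are the discriminator's target labels for fake and real data and $c$ is the generator's target as in (26); the 1/2 factors and the function names are conventions assumed here, not mandated by the survey.

```python
import torch

def lsgan_losses(d_real, d_fake_for_d, d_fake_for_g, a=0.0, b=1.0, c=1.0):
    """Least squares GAN losses; d_* are raw (unbounded) discriminator outputs."""
    loss_D = 0.5 * ((d_real - b) ** 2).mean() + 0.5 * ((d_fake_for_d - a) ** 2).mean()
    loss_G = 0.5 * ((d_fake_for_g - c) ** 2).mean()    # Eq. (26)
    return loss_D, loss_G
```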
3.3.1.2 f-GAN: The KL divergence measures the difference between two probability distributions. A large class of assorted divergences are the so-called Ali-Silvey distances, also known as the f-divergences [163]. Given two probability distributions $P$ and $Q$ that have absolutely continuous density functions $p$ and $q$, respectively, with respect to a base measure $dx$ defined on the domain $\mathcal{X}$, the f-divergence is defined as

$$D_f(P \,\|\, Q) = \int_{\mathcal{X}} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx. \quad (27)$$

Different choices of $f$ recover popular divergences as special cases of the f-divergence. For example, if $f(a) = a \log a$, the f-divergence becomes the KL divergence. The original GANs [6] are a special case of f-GAN [20], which is based on the f-divergence. The authors of [20] showed that any f-divergence can be used for training GANs. Furthermore, [20] discussed the effect of different choices of divergence function on both the quality of the produced generative models and the training complexity. Im et al. [164] quantitatively evaluated GANs with the divergences proposed for training. Uehara et al. [165] further extended f-GAN by directly minimizing the f-divergence in the generator step; the ratio of the real and generated data distributions is then predicted in the discriminator step.
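The special-case claim for $f(a) = a \log a$ is easy to verify numerically; a small NumPy check (the two discrete distributions are arbitrary illustrative values):

```python
import numpy as np

def f_divergence(p, q, f):
    """Discrete version of Eq. (27): D_f(P||Q) = sum_x q(x) f(p(x)/q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

kl_via_f  = f_divergence(p, q, lambda a: a * np.log(a))   # f(a) = a log a
kl_direct = float(np.sum(p * np.log(p / q)))
print(kl_via_f, kl_direct)   # both ~0.18, i.e., the f-divergence reduces to KL(P||Q)
```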
3.3.1.3 Integral probability metrics (IPMs): Let $\mathcal{P}$ denote the set of all Borel probability measures on a topological space $(M, \mathcal{A})$. The integral probability metric (IPM) [166], [167] between two probability distributions $P \in \mathcal{P}$ and $Q \in \mathcal{P}$ is defined as

$$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \int_M f \, dP - \int_M f \, dQ \right|, \quad (28)$$

where $\mathcal{F}$ is a class of real-valued bounded measurable functions on $M$. IPMs include the reproducing kernel Hilbert space (RKHS)-induced maximum mean discrepancy (MMD) [168] and the Wasserstein distance used in WGAN.

MMD: The following definition of the MMD can be found in [169]. Here, $X$ represents the input domain, which is assumed to be a nonempty compact set.

Definition 1. Let $E$ be a class of functions $f: X \to \mathbb{R}$. Let $P$ and $Q$ be Borel probability distributions, and let $X = (x_1, \ldots, x_m)$ and $Y = (y_1, \ldots, y_n)$ be samples consisting of independent and identically distributed observations drawn from $P$ and $Q$, respectively. Then, the MMD and its empirical estimate are defined as

$$\mathrm{MMD}(E, P, Q) = \sup_{f \in E}\left(\mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)]\right), \qquad \mathrm{MMD}(E, X, Y) = \sup_{f \in E}\left(\frac{1}{m}\sum_{i=1}^{m} f(x_i) - \frac{1}{n}\sum_{i=1}^{n} f(y_i)\right). \quad (29)$$

When $E$ is the unit ball in a universal RKHS, Theorem 2.2 in [169] guarantees that $\mathrm{MMD}(E, P, Q)$ will detect any discrepancy between $P$ and $Q$. The MMD has been widely used for GANs [170], [171], [172], [173], [174], [175], [176], [177].
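Equation (29) is stated in its supremum form; when $E$ is the unit ball of an RKHS it reduces to the well-known kernel estimator, which is what MMD-based GANs actually compute. The sketch below uses a Gaussian kernel; the kernel choice, bandwidth, and sample sizes are illustrative assumptions rather than prescriptions of [169].

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased kernel estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    return gaussian_kernel(x, x, sigma).mean() \
         + gaussian_kernel(y, y, sigma).mean() \
         - 2.0 * gaussian_kernel(x, y, sigma).mean()

rng = np.random.default_rng(0)
real  = rng.normal(0.0, 1.0, size=(200, 2))
real2 = rng.normal(0.0, 1.0, size=(200, 2))   # second sample from the same distribution
fake  = rng.normal(1.5, 1.0, size=(200, 2))   # shifted "generated" distribution
print(mmd2(real, fake), mmd2(real, real2))    # clearly positive vs. close to zero
```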
WGAN: The authors of [21] conducted a comprehensive theoretical analysis of how the Wasserstein-1 distance behaves, in the context of learning distributions, in comparison with popular probability distances and divergences such as the total variation (TV) distance, the KL divergence, and the JS divergence. The Wasserstein-1 distance is defined as

$$W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\!\left[\|x - y\|\right], \quad (30)$$

where $\Pi(p_{data}, p_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are $p_{data}$ and $p_g$. However, the infimum in (30) is highly intractable. According to the Kantorovich-Rubinstein duality [178], we know that

$$W(p_{data}, p_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim p_{data}}[f(x)] - \mathbb{E}_{x \sim p_g}[f(x)], \quad (31)$$

where the supremum is taken over all 1-Lipschitz functions $f$. In [21], $\|f\|_L \le 1$ was replaced with $\|f\|_L \le K$ (considering $K$-Lipschitz functions for some constant $K$), which yields $K \cdot W(p_{data}, p_g)$. The authors of [21] used the following equation to approximate the Wasserstein-1 distance:

$$\max_{w \in \mathcal{W}} \mathbb{E}_{x \sim p_{data}(x)}[f_w(x)] - \mathbb{E}_{z \sim p_z(z)}[f_w(G(z))], \quad (32)$$

where $\{f_w\}_{w \in \mathcal{W}}$ is a parameterized family of functions that are all $K$-Lipschitz for some $K$, and $f_w$ can be realized by the discriminator $D$. When $D$ is optimized, (32) denotes the approximated Wasserstein-1 distance. The aim of $G$ is then to minimize (32) so as to make the generated distribution as close to the real distribution as possible. Therefore, the overall objective function of WGAN is

$$\min_G \max_{w \in \mathcal{W}} \mathbb{E}_{x \sim p_{data}(x)}[f_w(x)] - \mathbb{E}_{z \sim p_z(z)}[f_w(G(z))] = \min_G \max_D \mathbb{E}_{x \sim p_{data}(x)}[D(x)] - \mathbb{E}_{z \sim p_z(z)}[D(G(z))]. \quad (33)$$

By comparing (1) and (33), we can see three differences between the objective function of the original GANs and that of WGAN.

First, there is no log in the objective function of WGAN.
Second, the $D$ in the original GANs is utilized as a binary classifier, while the $D$ in WGAN is utilized to approximate the Wasserstein distance, which is a regression task. Therefore, the sigmoid function that appears in the last layer of $D$ is not used in WGAN; the discriminator of the original GANs outputs a value between zero and one, while no such constraint exists for WGAN.

Third, the $D$ in WGAN is required to be $K$-Lipschitz for some $K$; therefore, WGAN uses weight clipping.
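The following sketch shows one critic update implementing (33) with weight clipping; it is a minimal illustration, not the reference code of [21], and the clipping threshold, `z_dim`, and the practice of taking several critic steps per generator step are conventional assumptions.

```python
import torch

def wgan_critic_step(G, D, opt_D, real, z_dim=100, clip=0.01):
    """One critic update for WGAN (Eq. (33)): maximize E[D(x)] - E[D(G(z))],
    with weight clipping to (roughly) enforce the K-Lipschitz constraint."""
    z = torch.randn(real.size(0), z_dim)
    loss_D = -(D(real).mean() - D(G(z).detach()).mean())   # ascend by minimizing the negative
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    for p in D.parameters():            # weight clipping as in [21]
        p.data.clamp_(-clip, clip)
    return -loss_D.item()               # current estimate of the (scaled) Wasserstein-1 distance
```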
Compared with traditional GAN training, WGAN improves learning stability and provides meaningful learning curves that are useful for hyperparameter searches and debugging. However, approximating the $K$-Lipschitz constraint required by the Wasserstein-1 metric is challenging. WGAN-GP, proposed in [22], instead uses a gradient penalty to enforce the Lipschitz constraint, and the WGAN-GP objective function is

$$L = -\mathbb{E}_{x \sim p_{data}}[D(x)] + \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] + \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\!\left[\left(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\right)^2\right], \quad (34)$$

where the first two terms constitute the WGAN objective function and $\hat{x}$ is sampled from the distribution $p_{\hat{x}}$, which samples uniformly along straight lines between pairs of points drawn from the real data distribution $p_{data}$ and the generated distribution $p_g$. Gradient penalties are now a commonly used approach in GANs, following [179], [180], [181]. Some other methods are closely related to WGAN-GP, such as deep regret analytic GAN (DRAGAN) [182].
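A minimal sketch of the penalty term in (34) follows; the penalty coefficient (set to 10 in [22]) and the helper name are assumptions of this illustration.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """Gradient penalty of Eq. (34): x_hat is sampled uniformly on straight lines
    between real and generated examples; lam scales the penalty as in [22]."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```

The returned term is simply added to the two Wasserstein terms of the critic loss before backpropagation.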
Wu et al. [183] proposed a novel and relaxed version of the Wasserstein-1 metric called the Wasserstein divergence (W-div), which does not require the $K$-Lipschitz constraint. Based on W-div, Wu et al. [183] introduced a Wasserstein divergence objective for GANs (WGAN-div) that faithfully approximates W-div through optimization. The Wasserstein distance was argued to lead to biased gradients, and the use of the Cramér distance between two distributions was suggested and implemented in CramerGAN [184]. Other papers related to WGAN can be found in [185], [186], [187], [188], [189].
3.3.1.4 Spectrally normalized GANs (SN-GANs): A novel weight normalization method named spectral normalization, which stabilizes the training of the discriminator, was proposed in SN-GANs [26]. This normalization technique is both computationally efficient and easy to integrate into existing methods. Spectral normalization [26] uses a simple method to make the weight matrix $W$ satisfy the Lipschitz constraint $\sigma(W) = 1$:

$$\bar{W}_{SN}(W) := W / \sigma(W), \quad (35)$$

where $W$ is the weight matrix of each layer in $D$ and $\sigma(W)$ is the spectral norm of $W$. As shown in [26], SN-GANs can generate images of equal or better quality than previous training stabilization methods. In theory, spectral normalization can be applied to all GAN variants. Both BigGANs [39] and self-attention GAN (SAGAN) [38] use spectral normalization and have achieved good performance on ImageNet.
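Equation (35) can be applied per layer in practice; a short sketch below shows both the PyTorch built-in wrapper (which estimates $\sigma(W)$ with power iteration) and the same idea written out directly via an SVD-based norm. The layer sizes are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

# Built-in spectral normalization: each forward pass uses W / sigma(W), Eq. (35).
layer = nn.utils.spectral_norm(nn.Linear(128, 1))

# The same idea written out once, for illustration:
W = torch.randn(64, 128)
W_sn = W / torch.linalg.matrix_norm(W, ord=2)     # sigma(W) = largest singular value
print(torch.linalg.matrix_norm(W_sn, ord=2))      # ~1.0, so the layer is 1-Lipschitz
```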
3.3.1.5 Relativistic GANs (RGANs): In the original GANs, the discriminator can be defined, in terms of the non-transformed layer $C(x)$, as $D(x) = \sigma(C(x))$. A simple way to make the discriminator relativistic (i.e., to make the output of $D$ depend on both real and generated examples) [31] is to sample from pairs of real and generated data $\tilde{x} = (x_r, x_g)$ and define

$$D(\tilde{x}) = \sigma\!\left(C(x_r) - C(x_g)\right). \quad (36)$$

This modification can be interpreted in the following way [31]: $D$ estimates the probability that the given real example is more realistic than a randomly sampled generated example. Similarly, $D_{rev}(\tilde{x}) = \sigma(C(x_g) - C(x_r))$ can be interpreted as the probability that the given generated example is more realistic than a randomly sampled real example. The discriminator and generator loss functions of the relativistic standard GAN (RSGAN) are

$$L_D^{RSGAN} = -\mathbb{E}_{(x_r, x_g)}\!\left[\log \sigma\!\left(C(x_r) - C(x_g)\right)\right], \quad (37)$$

$$L_G^{RSGAN} = -\mathbb{E}_{(x_r, x_g)}\!\left[\log \sigma\!\left(C(x_g) - C(x_r)\right)\right]. \quad (38)$$

Most GANs can be parameterized as

$$L_D^{GAN} = \mathbb{E}_{x_r}\!\left[f_1(C(x_r))\right] + \mathbb{E}_{x_g}\!\left[f_2(C(x_g))\right], \quad (39)$$

$$L_G^{GAN} = \mathbb{E}_{x_r}\!\left[g_1(C(x_r))\right] + \mathbb{E}_{x_g}\!\left[g_2(C(x_g))\right], \quad (40)$$

where $f_1$, $f_2$, $g_1$, and $g_2$ are scalar-to-scalar functions. If we adopt a relativistic discriminator, the loss functions of these GANs become

$$L_D^{RGAN} = \mathbb{E}_{(x_r, x_g)}\!\left[f_1\!\left(C(x_r) - C(x_g)\right)\right] + \mathbb{E}_{(x_r, x_g)}\!\left[f_2\!\left(C(x_g) - C(x_r)\right)\right], \quad (41)$$

$$L_G^{RGAN} = \mathbb{E}_{(x_r, x_g)}\!\left[g_1\!\left(C(x_r) - C(x_g)\right)\right] + \mathbb{E}_{(x_r, x_g)}\!\left[g_2\!\left(C(x_g) - C(x_r)\right)\right]. \quad (42)$$
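Because $-\log\sigma(\cdot)$ is exactly a binary cross-entropy on logits, the RSGAN losses (37)-(38) can be written compactly; a sketch under the assumption that `c_real` and `c_fake` hold the non-transformed critic outputs for paired batches:

```python
import torch
import torch.nn.functional as F

def rsgan_losses(c_real, c_fake):
    """Relativistic standard GAN losses of Eqs. (37)-(38)."""
    loss_D = F.binary_cross_entropy_with_logits(
        c_real - c_fake, torch.ones_like(c_real))    # -E[log sigma(C(x_r) - C(x_g))]
    loss_G = F.binary_cross_entropy_with_logits(
        c_fake - c_real, torch.ones_like(c_fake))    # -E[log sigma(C(x_g) - C(x_r))]
    return loss_D, loss_G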
3.3.2 Skills

NIPS 2016 held a workshop on adversarial training and invited Soumith Chintala to give a talk called "How to train a GAN". This talk included assorted tips and tricks, such as the suggestion that, when labels are available, also training the discriminator to classify the examples is useful, as in AC-GAN [33]. Readers can refer to the GitHub repository associated with Soumith's talk, https://github.com/soumith/ganhacks, for more advice.

Salimans et al. [32] proposed useful and improved techniques for training GANs (ImprovedGANs), such as feature matching, mini-batch discrimination, historical averaging, one-sided label smoothing, and virtual batch normalization.
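Of the ImprovedGAN techniques, one-sided label smoothing is particularly simple to add; a hedged sketch (the smoothing value 0.9 and the function name are conventional choices, not fixed by [32]):

```python
import torch
import torch.nn.functional as F

def d_loss_one_sided_smoothing(d_real_logits, d_fake_logits, smooth=0.9):
    """Discriminator loss with one-sided label smoothing [32]: real targets are
    softened to `smooth`, while fake targets are kept at exactly 0."""
    real_targets = torch.full_like(d_real_logits, smooth)
    fake_targets = torch.zeros_like(d_fake_logits)
    return F.binary_cross_entropy_with_logits(d_real_logits, real_targets) + \
           F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets)
```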
3.3.3 Structure

The original GANs utilized multi-layer perceptrons (MLPs). Specific types of structures may be better suited to specific applications, e.g., recurrent neural networks (RNNs) for time series data and convolutional neural networks (CNNs) for images.

3.3.3.1 The original GANs: The original GANs used MLPs for both the generator $G$ and the discriminator $D$.
equivalent to maximizing the log-likelihood as the number of examples $m$ increases:

$$\begin{aligned}
\theta^* &= \arg\min_{\theta} KL(p_{data} \,\|\, p_g) \\
&= \arg\min_{\theta} \int p_{data}(x) \log \frac{p_{data}(x)}{p_g(x)}\, dx \\
&= \arg\min_{\theta} \left[\int p_{data}(x) \log p_{data}(x)\, dx - \int p_{data}(x) \log p_g(x)\, dx\right] \\
&= \arg\max_{\theta} \int p_{data}(x) \log p_g(x)\, dx \\
&= \arg\max_{\theta} \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} \log p_g(x_i).
\end{aligned} \quad (43)$$

The model probability distribution $p_\theta(x)$ is replaced with $p_g(x)$ for notational consistency. Refer to Chapter 5 of [197] for more information on MLE and other statistical estimators.

4.2 Mode Collapse

GANs are notoriously difficult to train, and they have been observed [29], [32] to suffer from mode collapse [198], [199], in which the generator learns to generate examples from only a few modes of the data distribution and misses many other modes, even if examples of the missing modes exist throughout the training data. In the worst case, the generator simply produces a single example (complete collapse) [159], [200]. In this subsection, we first introduce two viewpoints regarding GAN mode collapse. We then introduce methods that propose new objective functions or new structures to address the mode collapse problem.

4.2.1 Two Viewpoints: Divergence and Algorithmic

We can analyze and understand GAN mode collapse and instability from both divergence and algorithmic viewpoints.

Divergence viewpoint. Roth et al. [179] stabilized the training of GANs and their variants, such as f-divergence-based GANs (f-GAN), through regularization.

Algorithmic viewpoint. The numerics of common algorithms for training GANs were analyzed, and a new algorithm with better convergence was proposed, in [201]. Mescheder et al. [180] showed which training methods for GANs actually converge.

Other methods also exist that can reduce mode collapse in GANs. For example, PACGAN [202] alleviated mode collapse by changing the input to the discriminator.

4.3 Other Theoretical Issues

4.3.1 Do GANs Actually Learn the Distribution?

Perhaps the most crucial question in GAN theory is whether the data distribution is actually modeled. Do the true data distribution and the GAN generator distribution have well-defined densities? As [159] noted, neither distribution typically has a density in GANs. Furthermore, [159] studied and proved the problems involved in training GANs, such as saturation and instability, investigated directions to mitigate these problems, and introduced new tools to study them.

Several studies [44], [200], [203] both empirically and theoretically shed light on the fact that distributions learned by GANs suffer from mode collapse. In contrast, Bai et al. [204] showed that GANs can, in principle, learn distributions in the Wasserstein distance (or the KL divergence in many situations) with polynomial sample complexity if the discriminator class has strong discriminating power against the particular generator class (instead of against all possible generators). Liang et al. [205] studied how well GANs learn densities, including nonparametric and parametric target distributions. Singh et al. [206] further studied nonparametric density estimation with adversarial losses.

4.3.2 Divergence/Distance

Arora et al. [200] showed that GAN training may not have good generalization properties; e.g., training may look successful, but the generated distribution may be far from the real data distribution with respect to standard metrics. Popular distances such as the Wasserstein and JS distances may not generalize well. However, generalization can still occur by introducing a novel notion of distance between distributions, the neural net distance, which raises the question of whether other useful divergences exist.

4.3.3 Mathematical Perspectives Such as Optimization

Mohamed et al. [207] used their understanding of GANs to build connections to a diverse set of statistical ideas related to GANs. Gidel et al. [208] examined optimization approaches designed for GANs and cast GAN optimization problems into the general variational inequality framework. The convergence and robustness of training GANs with regularized optimal transport are discussed in [209].
5 APPLICATIONS

GANs have been applied to many fields, such as image processing, computer vision, and sequential data.

5.1 Image Processing and Computer Vision

The most successful applications of GANs are in image processing and computer vision, such as image super-resolution, image synthesis and manipulation, and video processing.

5.1.1 Super-Resolution (SR)

SRGAN [51], a GAN model for performing SR, was the first framework able to infer photo-realistic natural images for 4x upscaling factors. To further improve the visual quality of SRGAN, Wang et al. [52] thoroughly studied three of its key components and improved each one to derive an enhanced SRGAN (ESRGAN). For example, ESRGAN uses the idea from relativistic GANs [31] of having the discriminator predict relative realness rather than an absolute value. Benefiting from these improvements, ESRGAN won first place in the PIRM2018-SR Challenge (region 3) [211] and obtained the best perceptual index. Based on CycleGAN [154], cycle-in-cycle GANs [53] were proposed for unsupervised image SR. SRDGAN [54] was proposed to learn the noise prior for SR with DualGAN [158]. Deep tensor generative adversarial nets (TGAN) [55] were proposed to generate large high-quality images by exploiting tensor structures. Specific methods have been designed for face SR [212], [213], [214]. Other related methods can be found in [215], [216], [217], [218].

5.1.2 Image Synthesis and Manipulation

5.1.2.1 Faces: Pose related: Disentangled representation learning GAN (DR-GAN) [219] was proposed for pose-invariant face recognition. Huang et al. [57] proposed a two-pathway GAN (TP-GAN) for photorealistic frontal view synthesis that simultaneously perceives local details and global structures. Ma et al. [58] proposed the novel pose guided person generation network (PG2) that synthesizes person images in arbitrary poses based on a novel pose and an image of that person. Cao et al. [220] proposed a high-fidelity pose-invariant model for high-resolution face frontalization based on GANs. Siarohin et al. [221] proposed deformable GANs for pose-based human image generation. Pose-robust spatial-aware GAN (PSGAN) was proposed for customizable makeup transfer in [59].

Portrait related: APDrawingGAN [60] was proposed to generate artistic portrait drawings from face photos with hierarchical GANs. APDrawingGAN has software based on WeChat, and its results are shown in Fig. 5. GANs have also been used in other face-related applications, such as facial attribute changes [222] and portrait editing [223], [224], [225], [226].

Fig. 5. Given a photo such as the image in (a), APDrawingGAN can produce a corresponding artistic portrait drawing such as the image in (b).

Face generation: The quality of faces generated by GANs has steadily improved year over year; examples can be found in Sebastian Nowozin's GAN lecture materials (https://github.com/nowozin/mlss2018-madrid-gan). As shown in Fig. 4, faces generated by the original GANs [6] have poor visual quality and serve only as a proof of concept. Radford et al. [35] used better neural network architectures (deep convolutional neural networks) for generating faces. Roth et al. [179] addressed GAN training instability problems, which allowed larger architectures such as ResNet to be utilized. Karras et al. [36] utilized multi-scale training to enable megapixel face image generation with high fidelity.

Face generation [19], [227], [228], [229], [230], [231], [232], [233] is relatively easy because the problem includes only one class of objects. Every object is a face, most face datasets tend to be composed of people looking straight into the camera, and most face images are registered by putting the nose, eyes, and other landmarks in consistent locations.

5.1.2.2 General objects: Having GANs work on assorted datasets, such as ImageNet [147], which has a thousand different object classes, is somewhat more difficult. However, progress on this task has been rapid in recent years, and the quality of such generated images has steadily improved [180].

Most studies use GANs to synthesize 2D images [234], [235]; however, Wu et al. [236] synthesized three-dimensional (3D) novel objects such as cars, chairs, sofas, and tables using GANs and volumetric convolutions. Im et al. [237] generated images with recurrent adversarial networks. Yang et al. [238] proposed layered recursive GANs (LR-GAN) for image generation.
5.1.2.3 Interaction between a human being and an image generation process: Many applications involve interaction between a human being and an image generation process; however, realistic image manipulation in such situations is difficult because it requires allowing the user to control image modifications while still making them appear realistic. When the user does not have sufficient artistic skill, the image easily deviates from the manifold of natural images during editing. Interactive GAN (IGAN) [61] defines a class of image editing operations and constrains their output to lie on a learned manifold at all times. Introspective adversarial networks [62] also offer the ability to perform interactive photo editing; their results have been demonstrated mostly for face editing. GauGAN [63] can turn doodles into stunning, photorealistic landscapes.

5.1.3 Texture Synthesis

Texture synthesis is a classical problem in the image field. Markovian GAN (MGAN) [64] is a texture synthesis method based on GANs. By capturing the texture data of Markovian patches, MGAN can generate stylized videos and images very quickly, realizing real-time texture synthesis. Spatial GAN (SGAN) [65] was the first to apply GANs with fully unsupervised learning to texture synthesis. Periodic spatial GAN (PSGAN) [66] is an SGAN variant that can learn periodic textures from either a single image or a large complex dataset.

5.1.4 Object Detection

How can we learn an object detector that is invariant to deformations and occlusions? One way is to use a data-driven strategy: collect large-scale datasets that contain numerous object examples appearing under different conditions. Using this strategy, we can simply hope that the final classifier can use these numerous instances to learn invariances. However, can all possible deformations and occlusions be included in a dataset? Some deformations and occlusions are so rare that they almost never occur in real-world conditions; nevertheless, we want our method to be invariant to such situations. To address this problem, Wang et al. [239] used GANs to generate instances with deformations and occlusions. The goal of the generator is to generate instances that are difficult for the object detector to classify. By using a segmentation model and GANs, SeGAN [67] detected objects occluded by other objects in an image. To address the small object detection problem, Li et al. [68] proposed perceptual GANs, and Bai et al. [69] proposed an end-to-end multi-task GAN (MTGAN).

5.1.5 Video Applications

The first study to use GANs for video generation was [70]. Villegas et al. [240] proposed a deep neural network to predict future frames in natural video sequences using GANs. Denton and Birodkar [71] proposed a new model named disentangled representation net (DRNET) that learns disentangled image representations from video based on GANs. A novel video-to-video synthesis approach (video2video) under a generative adversarial learning framework was proposed in [73]. MoCoGAN [74] was proposed to decompose motion and content to generate videos [241], [242]. GANs have also been used in other video applications, such as video prediction [72], [243], [244] and video retargeting [245].

5.1.6 Other Image and Vision Applications

GANs have also been utilized in other image processing and computer vision tasks [246], [247], [248], such as object transfiguration [249], [250], semantic segmentation [251], visual saliency prediction [252], object tracking [253], [254], image dehazing [255], [256], [257], natural image matting [258], image inpainting [259], [260], image fusion [261], image completion [262], [263], and image classification [264]. Creswell et al. [265] showed that the representations learned by GANs can also be used for retrieval. GANs have also been used to anticipate where people will look next [266], [267].

5.2 Sequential Data

GANs have also made achievements in sequential data tasks, such as those involving natural language, music, speech, voice [268], [269], and time series data [270], [271], [272], [273].

Natural Language Processing (NLP). IRGAN [76], [77] was proposed for information retrieval (IR). Li et al. [274] used adversarial learning for neural dialogue generation. GANs have also been used for text generation [75], [275], [276], [277] and speech language processing [81]. KBGAN [278] was proposed to generate high-quality negative examples and was used in knowledge graph embeddings. Adversarial REward Learning (AREL) [279] was proposed for visual storytelling. DSGAN [280] was proposed for distant supervision relation extraction. ScratchGAN [281] was proposed to train a language GAN from scratch, without maximum likelihood pre-training.

Qiao et al. [78] learned text-to-image generation by redescription, and a text-conditioned auxiliary classifier GAN (TAC-GAN) [282] was also proposed for text-to-image tasks. GANs have also been widely used for image-to-text tasks (image captioning) [283], [284]. Furthermore, GANs have been utilized in other NLP applications, such as question-answer selection [285], [286], poetry generation [287], talent-job fit [288], and review detection and generation [289], [290].

Music. GANs have also been used to generate music, including continuous RNN-GAN (C-RNN-GAN) [79], objective-reinforced GAN (ORGAN) [80], and sequence GAN (SeqGAN) [81].

Speech and Audio. GANs have been used for speech and audio analysis, including synthesis [291], [292], [293], enhancement [294], and recognition [295].

5.3 Other Applications

Medical Field. GANs have been widely utilized in the medical field, for example for generating and designing DNA [296], [297], drug discovery [298], generating multi-label discrete patient records [299], medical image processing [300], [301], [302], [303], [304], [305], [306], [307], and doctor recommendation [308].
Data Science. GANs have been used to generate data [309], [310], [311], [312], [313], [314], [315], [316], to generate neural networks [317], to augment data [318], [319], to learn spatial representations [320], and in network embedding [321], heterogeneous information networks [322], and mobile user profiling [323].

Finally, GANs have been applied to many other areas, such as malware detection [324], steganography [325], [326], [327], [328], privacy preserving [329], [330], [331], social robots [332], and network pruning [333], [334].

6 CONCLUSION

This paper provides a comprehensive review of various aspects of GANs by elaborating on several perspectives, i.e., algorithms, theory, and applications. We believe that this survey will help readers gain a thorough understanding of the existing research on GANs. To conclude, we would like to note that, in order to maintain an appropriate size for the article, we had to limit the number of referenced studies. We therefore apologize to the authors of papers that were not cited.

ACKNOWLEDGMENTS

The authors would like to thank the NetEase course taught by Shuang Yang, Ian Goodfellow's invited talk at AAAI 19, the CVPR 2018 tutorial on GANs, and Sebastian Nowozin's MLSS 2018 GAN lecture materials. The authors would also like to thank Shuang Yang, Weinan Zhang, and the members of the Umich Yelab and Foreseer research groups for helpful discussions.
REFERENCES
[1] L. J. Ratliff, S. A. Burden, and S. S. Sastry, "Characterization and computation of local Nash equilibria in continuous games," in Proc. Annu. Allerton Conf. Commun., Control, Comput., 2013, pp. 917-924.
[2] J. Schmidhuber, "Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments," Inst. Comput. Sci., Tech. Univ. Munich, Germany, FKI-126, Tech. Rep., 1990.
[3] J. Schmidhuber, "A possibility for implementing curiosity and boredom in model-building neural controllers," in Proc. Int. Conf. Simul. Adaptive Behav., From Animals to Animats, 1991, pp. 222-227.
[4] J. Schmidhuber, "Art & science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways," M. Botta et al., Eds., Edizioni, pp. 98-112, 2009. [Online]. Available: https://people.idsia.ch/~juergen/onlinepub.html
[5] J. Schmidhuber, "Learning factorial codes by predictability minimization," Neural Comput., vol. 4, no. 6, pp. 863-879, 1992.
[6] I. Goodfellow et al., "Generative adversarial nets," in Proc. Neural Inf. Process. Syst., 2014, pp. 2672-2680.
[7] J. Schmidhuber, "Unsupervised minimax: Adversarial curiosity, generative adversarial networks, and predictability minimization," 2020, arXiv:1906.04493.
[8] X. Wu, K. Xu, and P. Hall, "A survey of image synthesis and editing with generative adversarial networks," Tsinghua Sci. Technol., vol. 22, no. 6, pp. 660-674, 2017.
[9] R. Zhou, C. Jiang, and Q. Xu, "A survey on generative adversarial network-based text-to-image synthesis," Neurocomputing, vol. 451, pp. 316-336, 2021.
[10] N. Torres-Reyes and S. Latifi, "Audio enhancement and synthesis using generative adversarial networks: A survey," Int. J. Comput. Appl., vol. 182, no. 35, pp. 27-31, 2019.
[11] K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang, "Generative adversarial networks: Introduction and outlook," IEEE/CAA J. Automatica Sinica, vol. 4, no. 4, pp. 588-598, 2017.
[12] Y. Hong, U. Hwang, J. Yoo, and S. Yoon, "How generative adversarial networks and their variants work: An overview," ACM Comput. Surv., vol. 52, no. 1, pp. 1-43, 2019.
[13] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Process. Mag., vol. 35, no. 1, pp. 53-65, Jan. 2018.
[14] Z. Wang, Q. She, and T. E. Ward, "Generative adversarial networks in computer vision: A survey and taxonomy," ACM Comput. Surv., vol. 54, no. 2, pp. 1-38, 2021.
[15] M. Zamorski, A. Zdobylak, M. Zieba, and J. Swiatek, "Generative adversarial networks: Recent developments," in Proc. Int. Conf. Artif. Intell. Soft Comput., 2019, pp. 248-258.
[16] Z. Pan, W. Yu, X. Yi, A. Khan, F. Yuan, and Y. Zheng, "Recent progress on generative adversarial networks (GANs): A survey," IEEE Access, vol. 7, pp. 36322-36333, 2019.
[17] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, "InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets," in Proc. Neural Inf. Process. Syst., 2016, pp. 2172-2180.
[18] M. Mirza and S. Osindero, "Conditional generative adversarial nets," 2014, arXiv:1411.1784.
[19] Y. Lu, Y.-W. Tai, and C.-K. Tang, "Attribute-guided face generation using conditional CycleGAN," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 282-297.
[20] S. Nowozin, B. Cseke, and R. Tomioka, "f-GAN: Training generative neural samplers using variational divergence minimization," in Proc. Neural Inf. Process. Syst., 2016, pp. 271-279.
[21] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 214-223.
[22] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of Wasserstein GANs," in Proc. Neural Inf. Process. Syst., 2017, pp. 5767-5777.
[23] G.-J. Qi, "Loss-sensitive generative adversarial networks on Lipschitz densities," Int. J. Comput. Vis., 2019, pp. 1-23.
[24] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks," in Proc. Int. Conf. Comput. Vis., 2017, pp. 2794-2802.
[25] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, "On the effectiveness of least squares generative adversarial networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 12, pp. 2947-2960, Dec. 2019.
[26] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, "Spectral normalization for generative adversarial networks," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1-26.
[27] J. H. Lim and J. C. Ye, "Geometric GAN," 2017, arXiv:1705.02894.
[28] D. Tran, R. Ranganath, and D. M. Blei, "Hierarchical implicit models and likelihood-free variational inference," in Proc. Neural Inf. Process. Syst., 2017, pp. 2794-2802.
[29] T. Che, Y. Li, A. P. Jacob, Y. Bengio, and W. Li, "Mode regularized generative adversarial networks," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-13.
[30] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, "Unrolled generative adversarial networks," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-25.
[31] A. Jolicoeur-Martineau, "The relativistic discriminator: A key element missing from standard GAN," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1-13.
[32] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Proc. Neural Inf. Process. Syst., 2016, pp. 2234-2242.
[33] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," in Proc. Int. Conf. Mach. Learn., 2017, pp. 2642-2651.
[34] E. L. Denton, S. Chintala, A. D. Szlam, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in Proc. Neural Inf. Process. Syst., 2015, pp. 1486-1494.
[35] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," in Proc. Int. Conf. Learn. Representations, 2016, pp. 1-16.
[36] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1-26.
[37] H. Zhang et al., "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," in Proc. Int. Conf. Comput. Vis., 2017, pp. 5907-5915.
[38] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention generative adversarial networks," in Proc. Int. Conf. Mach. Learn., 2019, pp. 7354-7363.
[39] A. Brock, J. Donahue, and K. Simonyan, "Large scale GAN training for high fidelity natural image synthesis," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1-29.
[40] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4401-4410.
[41] J. Zhao, M. Mathieu, and Y. LeCun, "Energy-based generative adversarial network," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-17.
[42] D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," 2017, arXiv:1703.10717.
[43] J. Donahue, P. Krähenbühl, and T. Darrell, "Adversarial feature learning," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-18.
[44] V. Dumoulin et al., "Adversarially learned inference," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-18.
[45] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "It takes (only) two: Adversarial generator-encoder networks," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 1250-1257.
[46] T. Nguyen, T. Le, H. Vu, and D. Phung, "Dual discriminator generative adversarial nets," in Proc. Neural Inf. Process. Syst., 2017, pp. 2670-2680.
[47] I. Durugkar, I. Gemp, and S. Mahadevan, "Generative multi-adversarial networks," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-14.
[48] Q. Hoang, T. D. Nguyen, T. Le, and D. Phung, "MGAN: Training generative adversarial nets with multiple generators," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1-24.
[49] A. Ghosh, V. Kulharia, V. P. Namboodiri, P. H. Torr, and P. K. Dokania, "Multi-agent diverse generative adversarial networks," in Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8513-8521.
[50] M.-Y. Liu and O. Tuzel, "Coupled generative adversarial networks," in Proc. Neural Inf. Process. Syst., 2016, pp. 469-477.
[51] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4681-4690.
[52] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 63-79.
[53] Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin, "Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2018, pp. 701-710.
[54] J. Guan, C. Pan, S. Li, and D. Yu, "SRDGAN: Learning the noise prior for super resolution with dual generative adversarial networks," 2019, arXiv:1903.11821.
[55] Z. Ding, X.-Y. Liu, M. Yin, W. Liu, and L. Kong, "Tensor super-resolution with generative adversarial nets: A large image generation approach," in Proc. Int. Workshop Hum. Brain Artif. Intell., 2019.
[56] L. Q. Tran, X. Yin, and X. Liu, "Representation learning by rotating your faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 12, pp. 3007-3021, Dec. 2019.
[57] R. Huang, S. Zhang, T. Li, and R. He, "Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis," in Proc. Int. Conf. Comput. Vis., 2017, pp. 2439-2448.
[58] L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool, "Pose guided person image generation," in Proc. Neural Inf. Process. Syst., 2017, pp. 406-416.
[59] W. Jiang et al., "PSGAN: Pose and expression robust spatial-aware GAN for customizable makeup transfer," in Proc. Conf. Comput. Vis. Pattern Recognit., 2020, pp. 5194-5202.
[60] R. Yi, Y.-J. Liu, Y.-K. Lai, and P. L. Rosin, "APDrawingGAN: Generating artistic portrait drawings from face photos with hierarchical GANs," in Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10743-10752.
[61] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros, "Generative visual manipulation on the natural image manifold," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 597-613.
[62] A. Brock, T. Lim, J. M. Ritchie, and N. Weston, "Neural photo editing with introspective adversarial networks," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1-15.
[63] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2337-2346.
[64] C. Li and M. Wand, "Precomputed real-time texture synthesis with Markovian generative adversarial networks," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 702-716.
[65] N. Jetchev, U. Bergmann, and R. Vollgraf, "Texture synthesis with spatial generative adversarial networks," in Proc. Neural Inf. Process. Syst. Adv. Learn. Workshop, 2016, pp. 1-11.
[66] U. Bergmann, N. Jetchev, and R. Vollgraf, "Learning texture manifolds with the periodic spatial GAN," in Proc. Int. Conf. Mach. Learn., 2017, pp. 469-477.
[67] K. Ehsani, R. Mottaghi, and A. Farhadi, "SeGAN: Segmenting and generating the invisible," in Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6144-6153.
[68] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, "Perceptual generative adversarial networks for small object detection," in Proc. Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1222-1230.
[69] Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, "SOD-MTGAN: Small object detection via multi-task generative adversarial network," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 206-221.
[70] C. Vondrick, H. Pirsiavash, and A. Torralba, "Generating videos with scene dynamics," in Proc. Neural Inf. Process. Syst., 2016, pp. 613-621.
[71] E. L. Denton and V. Birodkar, "Unsupervised learning of disentangled representations from video," in Proc. Neural Inf. Process. Syst., 2017, pp. 4414-4423.
[72] J. Walker, K. Marino, A. Gupta, and M. Hebert, "The pose knows: Video forecasting by generating pose futures," in Proc. Int. Conf. Comput. Vis., 2017, pp. 3332-3341.
[73] T.-C. Wang et al., "Video-to-video synthesis," in Proc. Neural Inf. Process. Syst., 2018, pp. 1152-1164.
[74] S. Tulyakov, M.-Y. Liu, X. Yang, and J. Kautz, "MoCoGAN: Decomposing motion and content for video generation," in Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1526-1535.
[75] K. Lin, D. Li, X. He, Z. Zhang, and M.-T. Sun, "Adversarial ranking for language generation," in Proc. Neural Inf. Process. Syst., 2017, pp. 3155-3165.
[76] J. Wang et al., "IRGAN: A minimax game for unifying generative and discriminative information retrieval models," in Proc. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2017, pp. 515-524.
[77] S. Lu, Z. Dou, X. Jun, J.-Y. Nie, and J.-R. Wen, "PSGAN: A minimax game for personalized search with limited and noisy click data," in Proc. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2019, pp. 555-564.
[78] T. Qiao, J. Zhang, D. Xu, and D. Tao, "MirrorGAN: Learning text-to-image generation by redescription," in Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. 1505-1514.
[79] O. Mogren, "C-RNN-GAN: Continuous recurrent neural networks with adversarial training," in Proc. Neural Inf. Process. Syst. Constructive Mach. Learn. Workshop, 2016, pp. 1-6.
[80] G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik, "Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models," 2018, arXiv:1705.10843.
[81] L. Yu, W. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2852-2858.
[82] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[83] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. Int. Conf. Learn. Representations, 2014, pp. 1-14.
[84] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," in Proc. Int. Conf. Mach. Learn., 2014, pp. 1278-1286.
[85] G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, Boltzmann Machines: Constraint Satisfaction Networks that Learn. Pittsburgh, PA, USA: Carnegie-Mellon Univ., 1984.
[86] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cogn. Sci., vol. 9, no. 1, pp. 147-169, 1985.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
GUI ET AL.: REVIEW ON GENERATIVE ADVERSARIAL NETWORKS: ALGORITHMS, THEORY, AND APPLICATIONS 3327
[87] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning [113] D. Bau et al., “GAN dissection: Visualizing and understanding
algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, generative adversarial networks,” in Proc. Int. Conf. Learn. Repre-
pp. 1527–1554, 2006. sentations, 2019, pp. 1–18.
[88] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune, [114] P. Wu, C. Zheng, and L. Pan, “A unified generative adversarial
“Synthesizing the preferred inputs for neurons in neural net- learning framework for improvement of skip-gram network
works via deep generator networks,” in Proc. Neural Inf. Process. representation learning methods,” IEEE Trans. Knowl. Data Eng.,
Syst., 2016, pp. 3387–3395. early access, Apr. 30, 2021, doi: 10.1109/TKDE.2021.3076766.
[89] Y. Bengio, E. Laufer, G. Alain, and J. Yosinski, “Deep generative [115] G.-J. Qi, L. Zhang, H. Hu, M. Edraki, J. Wang, and X.-S. Hua,
stochastic networks trainable by backprop,” in Proc. Int. Conf. “Global versus localized generative adversarial nets,” in Proc.
Mach. Learn., 2014, pp. 226–234. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1517–1525.
[90] Y. Bengio, L. Yao, G. Alain, and P. Vincent, “Generalized denois- [116] Z. Yu, Z. Zhang, W. Cao, C. Liu, J. P. Chen, and H. San Wong,
ing auto-encoders as generative models,” in Proc. Neural Inf. Pro- “GAN-based enhanced deep subspace clustering networks,”
cess. Syst., 2013, pp. 899–907. IEEE Trans. Knowl. Data Eng., early access, Sep., 21, 2021,
[91] I. Goodfellow, “NIPS 2016 tutorial: Generative adversarial doi: 10.1109/TKDE.2020.3025301.
networks,” 2017, arXiv:1701.00160. [117] M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath,
[92] T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, “CausalGAN: Learning causal implicit generative models with
“PixelCNN++: Improving the pixelCNN with discretized logistic adversarial training,” in Proc. Int. Conf. Learn. Representations,
mixture likelihood and other modifications,” in Proc. Int. Conf. 2018, pp. 1–37.
Learn. Representations, 2017, pp. 1–10. [118] H. Wang et al., “Learning graph representation with generative
[93] B. J. Frey, G. E. Hinton, and P. Dayan, “Does the wake-sleep algo- adversarial nets,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 8,
rithm produce good density estimators?,” in Proc. Neural Inf. Pro- pp. 3090–3103, Aug. 2021.
cess. Syst., 1996, pp. 661–667. [119] H. Zhao, S. Zhang, G. Wu, J. M. Moura, J. P. Costeira, and G. J.
[94] B. J. Frey, J. F. Brendan, and B. J. Frey, Graphical Models for Gordon, “Adversarial multiple source domain adaptation,” in
Machine Learning and Digital Communication. Cambridge, MA, Proc. Neural Inf. Process. Syst., 2018, pp. 8559–8570.
USA: MIT Press, 1998. [120] Y. Liu et al., “Generative adversarial active learning for unsuper-
[95] D. Silver, et al., “Mastering the game of go with deep neural net- vised outlier detection,” IEEE Trans. Knowl. Data Eng., vol. 32,
works and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016. no. 8, pp. 1517–1528, Aug. 2020.
[96] K. Eykholt et al., “Robust physical-world attacks on deep learn- [121] Y. Zhao, Z. Jin, G.-J. Qi, H. Lu, and X.-S. Hua, “An adversarial
ing visual classification,” in Proc. Conf. Comput. Vis. Pattern Recog- approach to hard triplet generation,” in Proc. Eur. Conf. Comput.
nit., 2018, pp. 1625–1634. Vis., 2018, pp. 501–517.
[97] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples [122] S. Feizi, F. Farnia, T. Ginart, and D. Tse, “Understanding GANs
in the physical world,” in Proc. Int. Conf. Learn. Representations in the LQG setting: Formulation, generalization and stability,”
Workshop, 2017, pp. 1–15. IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 304–311, May
[98] G. Elsayed et al., “Adversarial examples that fool both computer 2020.
vision and time-limited humans,” in Proc. Neural Inf. Process. [123] F. Farnia and D. Tse, “A convex duality framework for GANs,”
Syst., 2018, pp. 3910–3920. in Proc. Neural Inf. Process. Syst., 2018, pp. 5248–5258.
[99] X. Jia, X. Wei, X. Cao, and H. Foroosh, “ComDefend: An efficient [124] J. Zhao, J. Li, Y. Cheng, T. Sim, S. Yan, and J. Feng,
image compression model to defend adversarial examples,” in “Understanding humans in crowded scenes: Deep nested adver-
Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6084–6092. sarial learning and a new benchmark for multi-human parsing,”
[100] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give in Proc. ACM Multimedia Conf., 2018, pp. 792–800.
a false sense of security: Circumventing defenses to adversarial [125] A. Jahanian, L. Chai, and P. Isola, “On the ”steerability” of gener-
examples,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 274–283. ative adversarial networks,” in Proc. Int. Conf. Learn. Representa-
[101] D. Z€ ugner, A. Akbarnejad, and S. G€ unnemann, “Adversarial tions, 2020, pp. 1–31.
attacks on neural networks for graph data,” in Proc. SIGKDD [126] B. Zhu, J. Jiao, and D. Tse, “Deconstructing generative adversar-
Conf. Knowl. Discov. Data Mining, 2018, pp. 2847–2856. ial networks,” IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 7155–
[102] Y. Dong et al., “Boosting adversarial attacks with momentum,” in 7179, Nov. 2020.
Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9185–9193. [127] K. K. Babu and S. R. Dubey, “CSGAN: Cyclic-synthesized gener-
[103] C. Szegedy et al., “Intriguing properties of neural networks,” in ative adversarial networks for image-to-image transformation,”
Proc. Int. Conf. Learn. Representations, 2014, pp. 1–9. Expert Syst. Appl., vol. 169, pp. 1–12, 2021.
[104] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and har- [128] Y. Wu, J. Donahue, D. Balduzzi, K. Simonyan, and T. Lillicrap,
nessing adversarial examples,” in Proc. Int. Conf. Learn. Represen- “LoGAN: Latent optimisation for generative adversarial
tations, 2015, pp. 1–11. networks,” 2020, arXiv:1912.00953.
[105] J. Kos, I. Fischer, and D. Song, “Adversarial examples for genera- [129] T. Kurutach, A. Tamar, G. Yang, S. J. Russell, and P. Abbeel,
tive models,” in Proc. IEEE Secur. Privacy Workshops, 2018, “Learning plannable representations with causal infoGAN,” in
pp. 36–42. Proc. Neural Inf. Process. Syst., 2018, pp. 8733–8744.
[106] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: [130] A. Spurr, E. Aksan, and O. Hilliges, “Guiding infoGAN with
Protecting classifiers against adversarial attacks using generative semi-supervision,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl.
models,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–17. Discov. Databases, 2017, pp. 119–134.
[107] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep [131] A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski,
learning in computer vision: A survey,” IEEE Access, vol. 6, pp. “Plug & play generative networks: Conditional iterative genera-
14410–14430, 2018. tion of images in latent space,” in Proc. Conf. Comput. Vis. Pattern
[108] G. Jin, S. Shen, D. Zhang, F. Dai and Y. Zhang, “APE-GAN: Recognit., 2017, pp. 4467–4477.
Adversarial perturbation elimination with GAN,” in Proc. IEEE [132] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee,
Int. Conf. Acoust., Speech Signal Process., 2019, pp. 3842–3846. “Generative adversarial text to image synthesis,” in Proc. Int.
[109] H. Lee, S. Han, and J. Lee, “Generative adversarial trainer: Conf. Mach. Learn., 2016, pp. 1–10.
Defense to adversarial perturbations with GAN,” 2017, [133] S. Hong, D. Yang, J. Choi, and H. Lee, “Inferring semantic layout
arXiv:1705.03387. for hierarchical text-to-image synthesis,” in Proc. Conf. Comput.
[110] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar, Vis. Pattern Recognit., 2018, pp. 7986–7994.
“Adversarial machine learning,” in Proc. ACM Workshop Secur. [134] S. E. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee,
Artif. Intell., 2011, pp. 43–58. “Learning what and where to draw,” in Proc. Neural Inf. Process.
[111] I. J. Goodfellow, “On distinguishability criteria for estimating Syst., 2016, pp. 217–225.
generative models,” in Proc. Int. Conf. Learn. Representations Work- [135] H. Zhang et al., “StackGAN++: Realistic image synthesis with
shop, 2015, pp. 1–6. stacked generative adversarial networks,” IEEE Trans. Pattern
[112] M. Kahng, N. Thorat, D. H. P. Chau, F. B. Viegas, and M. Watten- Anal. Mach. Intell., vol. 41, no. 8, pp. 1947–1962, Aug. 2019.
berg, “GAN lab: Understanding complex deep generative mod- [136] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie,
els using interactive visual experimentation,” IEEE Trans. “Stacked generative adversarial networks,” in Proc. Conf. Com-
Visualization Comput. Graph., vol. 25, no. 1, pp. 1–11, Jan. 2018. put. Vis. Pattern Recognit., 2017, pp. 5077–5086.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
3328 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 4, APRIL 2023
[137] J. Gauthier, “Conditional generative adversarial nets for convo- [161] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and
lutional face generation,” Class Project for Stanford CS231N: Con- S. Hochreiter, “GANs trained by a two time-scale update rule
volutional Neural Netw. Vis. Recognit., Winter semester, vol. 2014, converge to a local nash equilibrium,” in Proc. Neural Inf. Process.
no. 5, 2014, Art. no. 5. Syst., 2017, pp. 6626–6637.
[138] G. Antipov, M. Baccouche, and J.-L. Dugelay, “Face aging with [162] M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet,
conditional generative adversarial networks,” in Proc. IEEE Int. “Are GANs created equal? A large-scale study,” in Proc. Neural
Conf. Image Process., 2017, pp. 2089–2093. Inf. Process. Syst., 2018, pp. 700–709.
[139] H. Tang, D. Xu, N. Sebe, Y. Wang, J. J. Corso, and Y. Yan, “Multi- [163] I. Csiszar, P. C. Shields, et al., “Information theory and statistics:
channel attention selection GAN with cascaded semantic guid- A tutorial,” Found. TrendsÒ Commun. Inf. Theory, vol. 1, no. 4,
ance for cross-view image translation,” in Proc. Conf. Comput. pp. 417–528, 2004.
Vis. Pattern Recognit., 2019, pp. 2417–2426. [164] D. J. Im, H. Ma, G. Taylor, and K. Branson, “Quantitatively eval-
[140] L. Karacan, Z. Akata, A. Erdem, and E. Erdem, “Learning to gen- uating Gans with divergences proposed for training,” in Proc.
erate images of outdoor scenes from attributes and semantic Int. Conf. Learn. Representations, 2018, pp. 1–30.
layouts,” 2016, arXiv:1612.00215. [165] M. Uehara, I. Sato, M. Suzuki, K. Nakayama, and Y. Matsuo,
[141] B. Dai, S. Fidler, R. Urtasun, and D. Lin, “Towards diverse and “Generative adversarial nets from a density ratio estimation
natural image descriptions via a conditional GAN,” in Proc. Int. perspective,” in Proc. Int. Conf. Learn. Representations, 2017,
Conf. Comput. Vis., 2017, pp. 2970–2979. pp. 1–16.
[142] S. Yao et al., “3D-aware scene manipulation via inverse graph- [166] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Sch€ olkopf,
ics,” in Proc. Neural Inf. Process. Syst., 2018, pp. 1887–1898. and G. R. Lanckriet, “Hilbert space embeddings and metrics on
[143] G. G. Chrysos, J. Kossaifi, and S. Zafeiriou, “Robust conditional probability measures,” J. Mach. Learn. Res., vol. 11, pp. 1517–
generative adversarial networks,” in Proc. Int. Conf. Learn. Repre- 1561, 2010.
sentations, 2019, pp. 1–27. [167] A. Uppal, S. Singh, and B. Poczos, “Nonparametric density esti-
[144] K. K. Thekumparampil, A. Khetan, Z. Lin, and S. Oh, mation & convergence of GANs under besov IPM losses,” in
“Robustness of conditional GANs to noisy labels,” in Proc. Neural Proc. Neural Inf. Process. Syst., 2019, pp. 9086–9097.
Inf. Process. Syst., 2018, pp. 10271–10282. [168] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Sch€ olkopf, and A.
[145] Q. Mao, H.-Y. Lee, H.-Y. Tseng, S. Ma, and M.-H. Yang, “Mode Smola, “A kernel two-sample test,” J. Mach. Learn. Res., vol. 13,
seeking generative adversarial networks for diverse image syn- pp. 723–773, Mar. 2012. [Online]. Available: https://jmlr.org/
thesis,” in Proc. Conf. Comput. Vis. Pattern Recognit., 2019, pp. papers/volume13/gretton12a/gretton12a.pdf
1429–1437. [169] K. M. Borgwardt, A. Gretton, M. J. Rasch, H. P. Kriegel, B.
[146] M. Gong, Y. Xu, C. Li, K. Zhang, and K. Batmanghelich, “Twin Sch€ olkopf, and A. J. Smola, “Integrating structured biological
auxilary classifiers GAN,” in Proc. Neural Inf. Process. Syst., 2019, data by kernel maximum mean discrepancy,” Bioinformatics, vol.
pp. 1328–1337. 22, no. 14, pp. e49–e57, 2006.
[147] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, [170] W. Wang, Y. Sun, and S. Halgamuge, “Improving MMD-GAN
“ImageNET: A large-scale hierarchical image database,” in Proc. training with repulsive loss function,” in Proc. Int. Conf. Learn.
IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255. Representations, 2019, pp. 1–24.
[148] G. Perarnau, J. Van De Weijer, B. Raducanu, and J. M. [171] M. Arbel, D. Sutherland, M. Bi nkowski, and A. Gretton, “On gra-
Alvarez, “Invertible conditional GANs for image editing,” in dient regularizers for MMD GANs,” in Proc. Neural Inf. Process.
Proc. Conf. Neural Inf. Process. Syst. Workshop Adversarial Train- Syst., 2018, pp. 6700–6710.
ing, 2016, pp. 1–9. [172] M. Bi nkowski, D. J. Sutherland, M. Arbel, and A. Gretton,
[149] M. Saito, E. Matsumoto, and S. Saito, “Temporal generative “Demystifying MMD GANs,” 2021, arXiv:1801.01401.
adversarial nets with singular value clipping,” in Proc. Int. Conf. [173] C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. P oczos,
Comput. Vis., 2017, pp. 2830–2839. “MMD GAN: Towards deeper understanding of moment
[150] K. Sricharan, R. Bala, M. Shreve, H. Ding, K. Saketh, and J. matching network,” in Proc. Neural Inf. Process. Syst., 2017,
Sun, “Semi-supervised conditional GANs,” 2017, arXiv:1708. pp. 2203–2213.
05789. [174] Y. Mroueh, C.-L. Li, T. Sercu, A. Raj, and Y. Cheng, “Sobolev
[151] T. Miyato and M. Koyama, “cGANS with projection discrimi- GAN,” in Proc. Int. Conf. Learn. Representations, 2018.
nator,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–21. [175] D. J. Sutherland et al., “Generative models and model criti-
[152] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image cism via optimized maximum mean discrepancy,” 2021,
translation with conditional adversarial networks,” in Proc. Conf. arXiv:1611.04488.
Comput. Vis. Pattern Recognit., 2017, pp. 1125–1134. [176] G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, “Training genera-
[153] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Cata- tive neural networks via maximum mean discrepancy opti-
nzaro, “High-resolution image synthesis and semantic manipu- mization,” in Proc. Conf. Uncertainty Artif. Intell., 2015, pp. 258–267.
lation with conditional GANs,” in Proc. Conf. Comput. Vis. Pattern [177] Y. Li, K. Swersky, and R. Zemel, “Generative moment matching
Recognit., 2018, pp. 8798–8807. networks,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 1718–1727.
[154] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to- [178] C. Villani, Optimal Transport: Old and New, Berlin, Germany:
image translation using cycle-consistent adversarial networks,” Springer, 2008.
in Proc. Int. Conf. Comput. Vis., 2017, pp. 2223–2232. [179] K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann, “Stabilizing
[155] C. Li et al., “Alice: Towards understanding adversarial learning training of generative adversarial networks through regu-
for joint distribution matching,” in Proc. Neural Inf. Process. Syst., larization,” in Proc. Neural Inf. Process. Syst., 2017, pp. 2018–2028.
2017, pp. 5495–5503. [180] L. Mescheder, A. Geiger, and S. Nowozin, “Which training meth-
[156] L. C. Tiao, E. V. Bonilla, and F. Ramos, “Cycle-consistent adver- ods for GANs do actually converge?,” in Proc. Int. Conf. Mach.
sarial learning as approximate Bayesian inference,” in Proc. Int. Learn., 2018, pp. 3481–3490.
Conf. Mach. Learn. Workshop Theor. Found. Appl. Deep Generative [181] W. Fedus, M. Rosca, B. Lakshminarayanan, A. M. Dai, S.
Models, 2018, pp. 1–17. Mohamed, and I. Goodfellow, “Many paths to equilibrium:
[157] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, “Learning to dis- GANs do not need to decrease a divergence at every step,” in
cover cross-domain relations with generative adversarial Proc. Int. Conf. Learn. Representations, 2018, pp. 1–18.
networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1857–1865. [182] N. Kodali, J. Abernethy, J. Hays, and Z. Kira, “On convergence
[158] Z. Yi, H. Zhang, P. Tan, and M. Gong, “DualGAN: Unsupervised and stability of GANs,” 2017, arXiv:1705.07215.
dual learning for image-to-image translation,” in Proc. Int. Conf. [183] J. Wu, Z. Huang, J. Thoma, D. Acharya, and L. Van Gool,
Comput. Vis., 2017, pp. 2849–2857. “Wasserstein divergence for GANs,” in Proc. Eur. Conf. Comput.
[159] M. Arjovsky and L. Bottou, “Towards principled methods for Vis., 2018, pp. 653–668.
training generative adversarial networks,” in Proc. Int. Conf. [184] M. G. Bellemare et al., “The cramer distance as a solution to
Learn. Representations, 2017, pp. 1–17. biased wasserstein gradients,” 2017, arXiv:1705.10743.
[160] A. Yadav, S. Shah, Z. Xu, D. Jacobs, and T. Goldstein, “Stabilizing [185] H. Petzka, A. Fischer, and D. Lukovnicov, “On the regularization
adversarial nets with prediction methods,” in Proc. Int. Conf. of wasserstein GANs,” in Proc. Int. Conf. Learn. Representations,
Learn. Representations, 2018, pp. 1–21. 2018, pp. 1–24.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
GUI ET AL.: REVIEW ON GENERATIVE ADVERSARIAL NETWORKS: ALGORITHMS, THEORY, AND APPLICATIONS 3329
[186] C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, [213] H. Zhu, A. Zheng, H. Huang, and R. He, “Arbitrary talking face
“Voice conversion from unaligned corpora using variational generation via attentional audio-visual coherence learning,” in
autoencoding wasserstein generative adversarial networks,” in Proc. Int. Joint Conf. Artif. Intell., 2020, pp. 2362–2368.
Proc. Interspeech, 2017, pp. 3364–3368. [214] H. Huang, R. He, Z. Sun, and T. Tan, “Wavelet domain genera-
[187] J. Adler and S. Lunz, “Banach wasserstein GAN,” in Proc. Neural tive adversarial network for multi-scale face hallucination,” Int.
Info. Process. Syst., 2018, pp. 6754–6763. J. Comput. Vis., vol. 127, no. 6–7, pp. 763–784, 2019.
[188] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, “Deep [215] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszar,
photo enhancer: Unpaired learning for image enhancement from “Amortised MAP inference for image super-resolution,” in Proc.
photographs with GANs,” in Proc. Conf. Comput. Vis. Pattern Rec- Int. Conf. Learn. Representations, 2017, pp. 1–17.
ognit., 2018, pp. 6306–6314. [216] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-
[189] S. Athey, G. Imbens, J. Metzger, and E. Munro, “Using wasser- time style transfer and super-resolution,” in Proc. Eur. Conf. Com-
stein generative adversarial networks for the design of monte put. Vis., 2016, pp. 694–711.
carlo simulations,” J. Econometrics, 2021. [217] X. Wang, K. Yu, C. Dong, and C. Change Loy, “Recovering realis-
[190] A. Krizhevsky, “Learning multiple layers of features from tiny tic texture in image super-resolution by deep spatial feature
images,” State College, PA, USA, Tech. Rep., 2009. transform,” in Proc. Conf. Comput. Vis. Pattern Recognit., 2018,
[191] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-based pp. 606–615.
learning applied to document recognition,” Proc. IEEE, vol. 86, [218] W. Zhang, Y. Liu, C. Dong, and Y. Qiao, “RankSRGAN:
no. 11, pp. 2278–2324, Nov. 1998. Generative adversarial networks with ranker for image super-
[192] J. Susskind, A. Anderson, and G. E. Hinton, “The toronto face data- resolution,” in Proc. Int. Conf. Comput. Vis., 2019, pp. 3096–3105.
set,” Univ. Toronto, ON, Canada, UTML, Tech. Rep. UTML TR, [219] L. Tran, X. Yin, and X. Liu, “Disentangled representation learn-
2010. ing GAN for pose-invariant face recognition,” in Proc. Conf. Com-
[193] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, put. Vis. Pattern Recognit., 2017, pp. 1415–1424.
“Striving for simplicity: The all convolutional net,” in Proc. Int. [220] J. Cao, Y. Hu, H. Zhang, R. He, and Z. Sun, “Learning a high
Conf. Learn. Representations Workshop, 2015, pp. 1–14. fidelity pose invariant model for high-resolution face
[194] A. A. Rusu et al., “Progressive neural networks,” 2016, frontalization,” in Proc. Neural Inf. Process. Syst., 2018, pp. 2867–
arXiv:1606.04671. 2877.
[195] T. Xu et al., “AttnGAN: Fine-grained text to image generation [221] A. Siarohin, E. Sangineto, S. Lathuiliere, and N. Sebe,
with attentional generative adversarial networks,” in Proc. Conf. “Deformable GANs for pose-based human image generation,” in
Comput. Vis. Pattern Recognit., 2018, pp. 1316–1324. Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3408–3416.
[196] T. R. Shaham, T. Dekel, and T. Michaeli, “SinGAN: Learning a [222] C. Wang, C. Wang, C. Xu, and D. Tao, “Tag disentangled genera-
generative model from a single natural image,” in Proc. Int. Conf. tive adversarial networks for object image re-rendering,” in Proc.
Comput. Vis., 2019, pp. 4570–4580. Int. Joint Conf. Artif. Intell., 2017, pp. 2901–2907.
[197] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cam- [223] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and
bridge, MA, USA: MIT Press, 2016. D. Samaras, “Neural face editing with intrinsic image disen-
[198] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sut- tangling,” in Proc. Conf. Comput. Vis. Pattern Recognit., 2017,
ton, “VEEGAN: Reducing mode collapse in GANs using implicit pp. 5541–5550.
variational learning,” in Proc. Neural Inf. Process. Syst., 2017, [224] H. Chang, J. Lu, F. Yu, and A. Finkelstein, “Pairedcyclegan:
pp. 3308–3318. Asymmetric style transfer for applying and removing make-
[199] D. Bau et al., “Seeing what a GAN cannot generate,” in Proc. Int. up,” in Proc. Conf. Comput. Vis. Pattern Recognit., 2018,
Conf. Comput. Vis., 2019, pp. 4502–4511. pp. 40–48.
[200] S. Arora, R. Ge, Y. Liang, T. Ma, and Y. Zhang, “Generalization [225] B. Dolhansky and C. Canton Ferrer, “Eye in-painting with exem-
and equilibrium in generative adversarial nets (GANs),” in Proc. plar generative adversarial networks,” in Proc. Conf. Comput. Vis.
Int. Conf. Mach. Learn., 2017, pp. 224–232. Pattern Recognit., 2018, pp. 7902–7911.
[201] L. Mescheder, S. Nowozin, and A. Geiger, “The numerics of [226] A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Mor-
GANs,” in Proc. Neural Inf. Process. Syst., 2017, pp. 1825–1835. eno-Noguer, “Ganimation: Anatomically-aware facial animation
[202] Z. Lin, A. Khetan, G. Fanti, and S. Oh, “PacGAN: The power of from a single image,” in Proc. Eur. Conf. Comput. Vis., 2018,
two samples in generative adversarial networks,” in Proc. Neural pp. 818–833.
Inf. Process. Syst., 2018, pp. 1498–1507. [227] C. Donahue, Z. C. Lipton, A. Balsubramani, and J. McAuley,
[203] S. Arora, A. Risteski, and Y. Zhang, “Do GANs learn the distribu- “Semantically decomposing the latent spaces of generative
tion? Some theory and empirics,” in Proc. Int. Conf. Learn. Repre- adversarial networks,” in Proc. Int. Conf. Learn. Representations,
sentations, 2018, pp. 1–16. 2018, pp. 1–19.
[204] Y. Bai, T. Ma, and A. Risteski, “Approximability of discrimina- [228] A. Duarte et al., “Wav2Pix: Speech-conditioned face generation
tors implies diversity in GANs,” in Proc. Int. Conf. Learn. Repre- using generative adversarial networks,” in Proc. Int. Conf.
sentations, 2019, pp. 1–45. Acoust., Speech Signal Process., 2019, pp. 8633–8637.
[205] T. Liang, “How well generative adversarial networks learn dis- [229] B. Gecer, S. Ploumpis, I. Kotsia, and S. Zafeiriou, “GANFIT: Gen-
tributions,” 2020, arXiv:1811.03179. erative adversarial network fitting for high fidelity 3D face
[206] S. Singh, A. Uppal, B. Li, C.-L. Li, M. Zaheer, and B. P oczos, reconstruction,” in Proc. Conf. Comput. Vis. Pattern Recognit.,
“Nonparametric density estimation with adversarial losses,” in 2019, pp. 1155–1164.
Proc. Neural Inf. Process. Syst., 2018, pp. 10246–10257. [230] Z. Shu, M. Sahasrabudhe, R. Alp Guler, D. Samaras, N. Paragios,
[207] S. Mohamed and B. Lakshminarayanan, “Learning in implicit and I. Kokkinos, “Deforming autoencoders: Unsupervised disen-
generative models,” 2017, arXiv:1610.03483. tangling of shape and appearance,” in Proc. Eur. Conf. Comput.
[208] G. Gidel, H. Berard, G. Vignoud, P. Vincent, and S. Lacoste-Julien, Vis., 2018, pp. 650–665.
“A variational inequality perspective on generative adversarial [231] C. Fu, X. Wu, Y. Hu, H. Huang, and R. He, “Dual variational gen-
networks,” in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–38. eration for low shot heterogeneous face recognition,” in Proc.
[209] M. Sanjabi, J. Ba, M. Razaviyayn, and J. D. Lee, “On the conver- Neural Inf. Process. Syst., 2019, pp. 2670–2679.
gence and robustness of training GANs with regularized optimal [232] J. Cao, Y. Hu, B. Yu, R. He, and Z. Sun, “3D aided duet GANs for
transport,” in Proc. Neural Inf. Process. Syst., 2018, pp. 7091–7101. multi-view face image synthesis,” IEEE Trans. Inf. Forensics
[210] V. Nagarajan, C. Raffel, and I. J. Goodfellow, “Theoretical Secur., vol. 14, no. 8, pp. 2028–2042, Aug. 2019.
insights into memorization in GANs,” in Proc. Neural Inf. Process. [233] Y. Liu, Q. Li, and Z. Sun, “Attribute-aware face aging with wave-
Syst. Workshop, 2018, pp. 1–10. let-based generative adversarial networks,” in Proc. Conf. Com-
[211] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik- put. Vis. Pattern Recognit., 2019, pp. 11877–118866.
Manor, “The 2018 PIRM challenge on perceptual image super- [234] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua, “CVAE-GAN: Fine-
resolution,” in Proc. Eur. Conf. Comput. Vis. Workshops, 2018, pp. grained image generation through asymmetric training,” in Proc.
334–355. Int. Conf. Comput. Vis., 2017, pp. 2745–2754.
[212] X. Yu and F. Porikli, “Ultra-resolving face images by discrimina- [235] H. Dong, S. Yu, C. Wu, and Y. Guo, “Semantic image synthesis
tive generative networks,” in Proc. Eur. Conf. Comput. Vis., 2016, via adversarial learning,” in Proc. Int. Conf. Comput. Vis., 2017,
pp. 318–333. pp. 5706–5714.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
3330 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 4, APRIL 2023
[236] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, [261] X. Liu, Y. Wang, and Q. Liu, “PsGAN: A generative adversarial
“Learning a probabilistic latent space of object shapes via 3D network for remote sensing image pan-sharpening,” in Proc.
generative-adversarial modeling,” in Proc. Neural Inf. Process. IEEE Int. Conf. Image Process., 2018, pp. 873–877.
Syst., 2016, pp. 82–90. [262] S. Zhao, J. Cui, Y. Sheng, Y. Dong, X. Liang, E. I. Chang, and
[237] D. J. Im, C. D. Kim, H. Jiang, and R. Memisevic, “Generating Y. Xu, “Large scale image completion via co-modulated genera-
images with recurrent adversarial networks,” 2016, arXiv:1602. tive adversarial networks,” in Proc. Int. Conf. Learn. Representa-
05110. tions, 2021, pp. 1–25.
[238] J. Yang, A. Kannan, D. Batra, and D. Parikh, “LR-GAN: Layered [263] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally
recursive generative adversarial networks for image generation,” consistent image completion,” ACM Trans. Graph., vol. 36, no. 4,
in Proc. Int. Conf. Learn. Representations, 2017, pp. 1–21. pp. 1–14, 2017.
[239] X. Wang, A. Shrivastava, and A. Gupta, “A-fast-RCNN: Hard [264] F. Liu, L. Jiao, and X. Tang, “Task-oriented GAN for polSAR
positive generation via adversary for object detection,” in Proc. image classification and clustering,” IEEE Trans. Neural Netw.
Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2606–2615. Learn. Syst., vol. 30, no. 9, pp. 2707–2719, Sep. 2019.
[240] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, “Decomposing [265] A. Creswell and A. A. Bharath, “Adversarial training
motion and content for natural video sequence prediction,” in for sketch retrieval,” in Proc. Eur. Conf. Comput. Vis., 2016,
Proc. Int. Conf. Learn. Representations, 2017, pp. 1–22. pp. 798–809.
[241] E. Santana and G. Hotz, “Learning a driving simulator,” 2016, [266] M. Zhang, K. Teck Ma, J. Hwee Lim, Q. Zhao, and J. Feng, “Deep
arXiv:1608.01230. future gaze: Gaze anticipation on egocentric videos using adver-
[242] C. Chan, S. Ginosar, T. Zhou, and A. A. Efros, “Everybody dance sarial networks,” in Proc. Conf. Comput. Vis. Pattern Recognit.,
now,” in Proc. Int. Conf. Comput. Vis., 2019, pp. 5932–5941. 2017, pp. 4372–4381.
[243] M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video [267] M. Zhang, K. T. Ma, J. Lim, Q. Zhao, and J. Feng, “Anticipating
prediction beyond mean square error,” in Proc. Int. Conf. Learn. where people will look using adversarial networks,” IEEE Trans.
Representations, 2016, pp. 1–14. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1783–1796, Aug. 2019.
[244] X. Liang, L. Lee, W. Dai, and E. P. Xing, “Dual motion GAN for [268] F. Fang, J. Yamagishi, I. Echizen, and J. Lorenzo-Trueba, “High-
future-flow embedded video prediction,” in Proc. Int. Conf. Com- quality nonparallel voice conversion based on cycle-consistent
put. Vis., 2017, pp. 1744–1752. adversarial network,” in Proc. IEEE Int. Conf. Acoust., Speech Sig-
[245] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh, “Recycle-GAN: nal Process., 2018, pp. 5279–5283.
Unsupervised video retargeting,” in Proc. Eur. Conf. Comput. Vis., [269] T. Kaneko and H. Kameoka, “Parallel-data-free voice conver-
2018, pp. 119–135. sion using cycle-consistent adversarial networks,” 2017,
[246] X. Liang, H. Zhang, L. Lin, and E. Xing, “Generative semantic arXiv:1711.11293.
manipulation with mask-contrasting GAN,” in Proc. Eur. Conf. [270] C. Esteban, S. L. Hyland, and G. R€atsch, “Real-valued (medical)
Comput. Vis., 2018, pp. 558–573. time series generation with recurrent conditional GANs,” 2017,
[247] Y. Chen, Y.-K. Lai, and Y.-J. Liu, “CartoonGAN: Generative arXiv:1706.02633.
adversarial networks for photo cartoonization,” in Proc. Conf. [271] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, “Eeg-GAN:
Comput. Vis. Pattern Recognit., 2018, pp. 9465–9474. Generative adversarial networks for electroencephalograhic
[248] R. Villegas, J. Yang, D. Ceylan, and H. Lee, “Neural kinematic (EEG) brain signals,” 2018, arXiv:1806.01875.
networks for unsupervised motion retargetting,” in Proc. Conf. [272] C. Donahue, J. McAuley, and M. Puckette, “Synthesizing audio
Comput. Vis. Pattern Recognit., 2018, pp. 8639–8648. with generative adversarial networks,” 2018, arXiv:1802.04208.
[249] S. Zhou, T. Xiao, Y. Yang, D. Feng, Q. He, and W. He, “GeneGAN: [273] D. Li, D. Chen, B. Jin, L. Shi, J. Goh, and S.-K. Ng, “MAD-GAN:
Learning object transfiguration and attribute subspace from Multivariate anomaly detection for time series data with genera-
unpaired data,” in Proc. Brit. Mach. Vis. Conf., 2017, pp. 1–13. tive adversarial networks,” in Proc. Int. Conf. Artif. Neural Netw.,
[250] H. Wu, S. Zheng, J. Zhang, and K. Huang, “GP-GAN: Towards 2019, pp. 703–716.
realistic high-resolution image blending,” in Proc. ACM Int. Conf. [274] J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, and D. Jurafsky,
Multimedia, 2019, pp. 2487–2495. “Adversarial learning for neural dialogue generation,” in Proc.
[251] N. Souly, C. Spampinato, and M. Shah, “Semi supervised seman- Conf. Empirical Methods Natural Lang. Process., 2017, pp. 2157–2169.
tic segmentation using generative adversarial network,” in Proc. [275] Y. Zhang, Z. Gan, and L. Carin, “Generating text via adversarial
Int. Conf. Comput. Vis., 2017, pp. 5688–5696. training,” in Proc. Conf. Neural Inf. Process. Syst. Workshop Adv.
[252] J. Pan et al., “SalGAN: Visual saliency prediction with generative Training, 2016, pp. 1–6.
adversarial networks,” 2018, arXiv:1701.01081. [276] W. Fedus, I. Goodfellow, and A. M. Dai, “MaskGAN: Better text
[253] Y. Song et al., “Vital: Visual tracking via adversarial learning,” in generation via filling in the _,” in Proc. Int. Confe. Learn. Represen-
Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8990–8999. tations, 2018, pp. 1–16.
[254] Y. Han, P. Zhang, W. Huang, Y. Zha, G. D. Cooper, and [277] S. Yang, J. Liu, W. Wang, and Z. Guo, “TET-GAN: Text effects
Y. Zhang, “Robust visual tracking based on adversarial unla- transfer via stylization and destylization,” in Proc. AAAI Conf.
beled instance generation with label smoothing loss regu- Artif. Intelli., 2019, pp. 1238–1245.
larization,” Pattern Recognit., vol. 97, pp. 1–15, 2020. [278] L. Cai and W. Y. Wang, “KBGAN: Adversarial learning for
[255] D. Engin, A. Genç, and H. Kemal Ekenel, “Cycle-dehaze: knowledge graph embeddings,” in Proc. Annu. Conf. North Amer.
Enhanced cycleGAN for single image dehazing,” in Proc. IEEE/ Chapter Assoc. Comput. Linguistics, 2018, pp. 1–10.
CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2018, pp. [279] X. Wang, W. Chen, Y.-F. Wang, and W. Y. Wang, “No metrics are
825–833. perfect: Adversarial reward learning for visual storytelling,” in
[256] X. Yang, Z. Xu, and J. Luo, “Towards perceptual image dehazing Proc. Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 1–15.
by physics-based disentanglement and adversarial training,” in [280] P. Qin, W. Xu, and W. Y. Wang, “Dsgan: generative adversarial
Proc. AAAI Conf. Artif. Intell., 2018, pp. 7485–7492. training for distant supervision relation extraction,” in Proc.
[257] W. Liu, X. Hou, J. Duan, and G. Qiu, “End-to-end single image Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 1–10.
fog removal using enhanced cycle consistent adversarial [281] C. d. M. d’Autume, M. Rosca, J. Rae, and S. Mohamed, “Training
networks,” IEEE Trans. Image Process., vol. 29, pp. 7819–7833, language GANs from scratch,” in Proc. Neural Inf. Process. Syst.,
2020. [Online]. Available: https://ieeexplore.ieee.org/abstract/ 2019, pp. 4302–4313.
document/9139368 [282] A. Dash, J. C. B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal,
[258] S. Lutz, K. Amplianitis, and A. Smolic, “AlphaGAN: Generative “TAC-GAN-text conditioned auxiliary classifier generative
adversarial networks for natural image matting,” in Proc. Brit. adversarial network,” 2017, arXiv:1703.06412.
Mach. Vis. Conf., 2018, pp. 1–17. [283] T.-H. Chen, Y.-H. Liao, C.-Y. Chuang, W.-T. Hsu, J. Fu, and
[259] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa- M. Sun, “Show, adapt and tell: Adversarial training of cross-
Johnson, and M. N. Do, “Semantic image inpainting with deep domain image captioner,” in Proc. Int. Conf. Comput. Vis., 2017,
generative models,” in Proc. Conf. Comput. Vis. Pattern Recognit., pp. 521–530.
2017, pp. 5485–5493. [284] R. Shetty, M. Rohrbach, L. Anne Hendricks, M. Fritz, and
[260] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, B. Schiele, “Speaking the same language: Matching machine to
“Generative image inpainting with contextual attention,” in Proc. human captions by adversarial training,” in Proc. Int. Conf. Com-
Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5505–5514. put. Vis., 2017, pp. 4135–4144.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
GUI ET AL.: REVIEW ON GENERATIVE ADVERSARIAL NETWORKS: ALGORITHMS, THEORY, AND APPLICATIONS 3331
[285] S. Rao and H. Daum e III, “Answer-based adversarial training for [307] G. St-Yves and T. Naselaris, “Generative adversarial networks
generating clarification questions,” in Proc. Annu. Conf. North conditioned on brain activity reconstruct seen images,” in Proc.
Amer. Chapter Assoc. Comput. Linguistics, 2019, pp. 1–14. IEEE Int. Conf. Syst., Man, Cybern, 2018, pp. 1054–1061.
[286] X. Yang et al., “Adversarial training for community question [308] B. Tian, Y. Zhang, X. Chen, C. Xing, and C. Li, “DRGAN: A gan-
answer selection based on multi-scale matching,” in Proc. AAAI based framework for doctor recommendation in chinese on-line
Conf. Artifi. Intell., 2019, pp. 395–402. QA communities,” in Proc. Int. Conf. Database Syst. Adv. Appl.,
[287] B. Liu, J. Fu, M. P. Kato, and M. Yoshikawa, “Beyond narrative 2019, pp. 444–447.
description: Generating poetry from images by multi-adversarial [309] Z. Zheng, L. Zheng, and Y. Yang, “Unlabeled samples generated
training,” in Proc. ACM Multimedia Conf., 2018, pp. 783–791. by GAN improve the person re-identification baseline in vitro,”
[288] Y. Luo, H. Zhang, Y. Wen, and X. Zhang, “ResumeGAN: An in Proc. Int. Conf. Comput. Vis., 2017, pp. 3754–3762.
optimized deep representation learning framework for talent-job [310] B. Chang, Q. Zhang, S. Pan, and L. Meng, “Generating handwrit-
fit via adversarial learning,” in Proc. ACM Int. Conf. Inf. Knowl. ten chinese characters using cyclegan,” in Proc. IEEE Winter
Manage., 2019, pp. 1101–1110. Confe. Appl. Comput. Vis., 2018, pp. 199–207.
[289] C. Garbacea, S. Carton, S. Yan, and Q. Mei, “Judge the judges: A [311] L. Sixt, B. Wild, and T. Landgraf, “RenderGAN: Generating real-
large-scale evaluation study of neural language models for istic labeled data,” Front. Robot. AI, vol. 5, pp. 1–9, 2018.
online review generation,” in Proc. Conf. Empirical Methods Natu- [312] D. Xu, S. Yuan, L. Zhang, and X. Wu, “FairGAN: Fairness-aware
ral Lang. Process. Int. Joint Conf., 2019, pp. 3966–3979. generative adversarial networks,” in Proc. IEEE Int. Conf. Big
[290] H. Aghakhani, A. Machiry, S. Nilizadeh, C. Kruegel, and G. Vigna, Data, 2018, pp. 570–575.
“Detecting deceptive reviews using generative adversarial [313] M.-C. Lee, B. Gao, and R. Zhang, “Rare query expansion through
networks,” in Proc. IEEE Secur. Privacy Workshops, 2018, pp. 89–95. generative adversarial networks in search advertising,” in Proc.
[291] K. Takuhiro, K. Hirokazu, H. Nobukatsu, I. Yusuke, H. Kaoru, ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2018, pp.
and K. Kunio, “Generative adversarial network-based postfilter- 500–508.
ing for statistical parametric speech synthesis,” in Proc. IEEE Int. [314] D. Xu, Y. Wu, S. Yuan, L. Zhang, and X. Wu, “Achieving causal
Conf. Acoust., Speech Signal Process., 2017, pp. 4910–4914. fairness through generative adversarial networks,” in Proc. Int.
[292] Y. Saito, S. Takamichi, and H. Saruwatari, “Statistical parametric Joint Conf. Artif. Intell., 2019, pp. 1452–1458.
speech synthesis incorporating generative adversarial [315] M. O. Turkoglu, W. Thong, L. Spreeuwers, and B. Kicanaoglu,
networks,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, “A layer-based sequential framework for scene generation with
no. 1, pp. 84–96, Jan. 2018. gans,” in Proc. AAAI Conf. Artif. Intell., 2019, pp. 8901–8908.
[293] C. Donahue, J. McAuley, and M. Puckette, “Adversarial audio [316] A. El-Nouby et al., “Tell, draw, and repeat: Generating and modi-
synthesis,” in Proc. Int. Conf. Learn. Representations, 2019, fying images based on continual linguistic instruction,” in Proc.
pp. 1–16. Int. Conf. Comput. Vis., 2019, pp. 10303–10311.
[294] S. Pascual, A. Bonafonte, and J. Serra, “SeGAN: Speech enhance- [317] N. Ratzlaff and L. Fuxin, “HyperGAN: A generative model for
ment generative adversarial network,” in Proc. Interspeech, 2017, diverse, performant neural networks,” in Proc. Int. Conf. Mach.
pp. 3642–3646. Learn., 2019, pp. 5361–5369.
[295] C. Donahue, B. Li, and R. Prabhavalkar, “Exploring speech [318] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger,
enhancement with generative adversarial networks for robust and H. Greenspan, “GAN-based synthetic medical image aug-
speech recognition,” in Proc. IEEE Int. Conf. Acoust., Speech Signal mentation for increased CNN performance in liver lesion classi-
Process., 2018, pp. 5024–5028. fication,” Neurocomputing, vol. 321, pp. 321–331, 2018.
[296] N. Killoran, L. J. Lee, A. Delong, D. Duvenaud, and B. J. Frey, [319] Q. Wang, H. Yin, H. Wang, Q. V. H. Nguyen, Z. Huang, and
“Generating and designing dna with deep generative models,” in L. Cui, “Enhancing collaborative filtering with generative
Proc. Conf. Neural Inf. Process. Syst. Comput. Biol. Workshop, 2017, augmentation,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov.
pp. 1–19. Data Mining, 2019, pp. 548–556.
[297] A. Gupta and J. Zou, “Feedback GAN for DNA optimizes protein [320] Y. Zhang, Y. Fu, P. Wang, X. Li, and Y. Zheng, “Unifying inter-
functions,” Nat. Mach. Intell., vol. 1, no. 2, pp. 105–111, 2019. region autocorrelation and intra-region structures for spatial
[298] M. Benhenda, “Chemgan challenge for drug discovery: Can AI embedding via collective adversarial learning,” in Proc. ACM
reproduce natural chemical diversity?,” 2017, arXiv:1708.08227. SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2019, pp. 1700–
[299] E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun, 1708.
“Generating multi-label discrete patient records using generative [321] H. Gao, J. Pei, and H. Huang, “ProGAN: Network embedding
adversarial networks,” in Proc. Mach. Learn. Healthcare, 2017, via proximity generative adversarial network,” in Proc. ACM
pp. 1–20. SIGKDD Inte. Conf. Knowl. Discov. Data Mining, 2019, pp. 1308–
[300] W. Dai et al., “SCAN: Structure correcting adversarial network 1316.
for organ segmentation in chest X-rays,” in Proc. Deep Learn. Med. [322] B. Hu, Y. Fang, and C. Shi, “Adversarial learning on heteroge-
Image Anal. Multimodal Learn. Clin. Decis. Support, 2018, pp. 1–10. neous information networks,” in Proc. ACM SIGKDD Int. Conf.
[301] T. Schlegl, P. Seeb€ ock, S. M. Waldstein, U. Schmidt-Erfurth, and Knowl. Discov. Data Mining, 2019, pp. 120–129.
G. Langs, “Unsupervised anomaly detection with generative [323] P. Wang, Y. Fu, H. Xiong, and X. Li, “Adversarial substruc-
adversarial networks to guide marker discovery,” in Proc. Int. tured representation learning for mobile user profiling,” in
Conf. Inf. Process. Med. Imaging, 2017, pp. 146–157. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining,
[302] J. M. Wolterink, A. M. Dinkla, M. H. Savenije, P. R. Seevinck, C. 2019, pp. 130–138.
A. van den Berg, and I. Isgum, “Deep MR to CT synthesis using [324] W. Hu and Y. Tan, “Generating adversarial malware examples
unpaired data,” in Proc. Int. Workshop Simulation Synth. Med. for black-box attacks based on GAN,” 2017, arXiv:1702.05983.
Imaging, 2017, pp. 14–23. [325] C. Chu, A. Zhmoginov, and M. Sandler, “CycleGAN: A master of
[303] T. M. Quan, T. Nguyen-Duc, and W.-K. Jeong, “Compressed steganography,” in Proc. Confe. Neural Inf. Process. Syst. Workshop
sensing MRI reconstruction using a generative adversarial net- Mach. Deception, 2017, pp. 1–6.
work with a cyclic loss,” IEEE Trans. Med. Imaging, vol. 37, no. 6, [326] D. Volkhonskiy, I. Nazarov, B. Borisenko, and E. Burnaev,
pp. 1488–1497, Jun. 2018. “Steganographic generative adversarial networks,” in Proc.
[304] M. Mardani et al., “Deep generative adversarial neural networks Confe. Neural Inf. Process. Syst. Workshop Adv. Training, 2016,
for compressive sensing mri,” IEEE Trans. Med. Imaging, vol. 38, pp. 1–15.
no. 1, pp. 167–179, Jan. 2019. [327] H. Shi, J. Dong, W. Wang, Y. Qian, and X. Zhang, “SSGAN: Secure
[305] Y. Xue, T. Xu, H. Zhang, L. R. Long, and X. Huang, “SeGAN: steganography based on generative adversarial networks,” in
Adversarial network with multi-scale l1 loss for medical image Proc. Pacific Rim Conf. Multimedia, 2017, pp. 534–544.
segmentation,” Neuroinformatics, vol. 16, no. 3–4, pp. 383–392, [328] J. Hayes and G. Danezis, “Generating steganographic images via
2018. adversarial training,” in Proc. Neural Inf. Process. Syst., 2017,
[306] Q. Yang et al., “Low-dose ct image denoising using a generative pp. 1954–1963.
adversarial network with Wasserstein distance and perceptual [329] M. Abadi and D. G. Andersen, “Learning to protect commu-
loss,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1348–1357, Jun. nications with adversarial neural cryptography,” 2016,
2018. arXiv:1610.06918.
Authorized licensed use limited to: University of Obuda. Downloaded on April 08,2024 at 18:15:32 UTC from IEEE Xplore. Restrictions apply.
3332 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 4, APRIL 2023
Yonggang Wen (Fellow, IEEE) received the PhD degree in electrical engineering and computer science (with a minor in Western literature) from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2008. He is currently a professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He has been with Cisco, San Jose, CA, USA, where he led product development in a content delivery network, which had a revenue impact of $3 billion globally. His work in multi-screen cloud social TV has been featured by global media (more than 1600 news articles from more than 29 countries). He has authored or coauthored more than 140 papers in top journals and prestigious conferences.
Jie Gui (Senior Member, IEEE) received the BS degree in computer science from Hohai University, Nanjing, China, in 2004, the MS degree in computer applied technology from the Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei, China, in 2007, and the PhD degree in pattern recognition and intelligent systems from the University of Science and Technology of China, Hefei, China, in 2010. He is currently a professor with the School of Cyber Science and Engineering, Southeast University. His research interests include machine learning, pattern recognition, and image processing. He is currently an associate editor for Neurocomputing. He has authored or coauthored more than 60 papers in international journals and conferences, including IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology, IEEE TSMCS, KDD, AAAI, and ACM MM. He serves as an area chair, senior PC member, or PC member for many conferences, including NeurIPS and ICML. He is a senior member of the ACM and a CCF distinguished member.

Zhenan Sun (Senior Member, IEEE) received the BE degree in industrial automation from the Dalian University of Technology, Dalian, China, in 1999, the MS degree in system engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2002, and the PhD degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China, in 2006. Since 2006, he has been a faculty member with the National Laboratory of Pattern Recognition, CASIA, where he is currently a professor with the Center for Research on Intelligent Perception and Computing.

Dacheng Tao (Fellow, IEEE) is currently a professor of computer science and ARC laureate fellow with the School of Computer Science and the Faculty of Engineering, and the inaugural director of the UBTECH Sydney Artificial Intelligence Centre, The University of Sydney. His research interests include artificial intelligence; his results have been expounded in one monograph and more than 200 publications in prestigious journals and at prominent conferences, including IEEE Transactions on Pattern Analysis and Machine Intelligence, IJCV, JMLR, AAAI, IJCAI, NIPS, ICML, CVPR, ICCV, ECCV, ICDM, and KDD, with several best paper awards. He received the 2018 IEEE ICDM Research Contributions Award and the 2015 Australian Scopus-Eureka Prize. He is a fellow of the ACM and the Australian Academy of Science.

Jieping Ye (Fellow, IEEE) received the PhD degree in computer science and engineering from the University of Minnesota in 2005. He is currently a VP of Beike, China, and also a professor with the University of Michigan, Ann Arbor, MI, USA. His research interests include Big Data, machine learning, and data mining, with applications in transportation and biomedicine. He has served as a senior program committee member, area chair, or program committee vice chair for many conferences, including NIPS, ICML, KDD, IJCAI, ICDM, and SDM. He was an associate editor for Data Mining and Knowledge Discovery and IEEE Transactions on Knowledge and Data Engineering. He received best paper awards at ICML and KDD and the NSF CAREER Award in 2010.