
Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?

Rameen Abdal Yipeng Qin Peter Wonka


KAUST
[email protected] [email protected] [email protected]

Abstract

We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and if the embedding is semantically meaningful.

1. Introduction

Generative Adversarial Networks (GANs) are very successfully applied in various computer vision applications, e.g. texture synthesis [18, 33, 28], video generation [31, 30], image-to-image translation [11, 36, 1, 24] and object detection [19]. In the past few years, the quality of images synthesized by GANs has increased rapidly. Compared to the seminal DCGAN framework [25] in 2015, the current state-of-the-art GANs [13, 3, 14, 36, 37] can synthesize at a much higher resolution and produce significantly more realistic images. Among them, StyleGAN [14] makes use of an intermediate W latent space that holds the promise of enabling some controlled image modifications. We believe that image modifications are a lot more exciting when it becomes possible to modify a given image rather than a randomly GAN-generated one. This leads to the natural question whether it is possible to embed a given photograph into the GAN latent space.

To tackle this question, we build an embedding algorithm that can map a given image I into the latent space of StyleGAN pre-trained on the FFHQ dataset. One of our important insights is that the generalization ability of the pre-trained StyleGAN is significantly enhanced when using an extended latent space W+ (see Sec. 3.3). As a consequence, somewhat surprisingly, our embedding algorithm is not only able to embed human face images, but also successfully embeds non-face images from different classes. Therefore, we continue our investigation by analyzing the quality of the embedding to see if the embedding is semantically meaningful. To this end, we propose to use three basic operations on vectors in the latent space: linear interpolation, crossover, and adding a vector and a scaled difference vector. These operations correspond to three semantic image processing applications: morphing, style transfer, and expression transfer. As a result, we gain more insight into the structure of the latent space and can solve the mystery of why even instances of non-face images such as cars can be embedded.

Our contributions include:

• An efficient embedding algorithm which can map a given image into the extended latent space W+ of a pre-trained StyleGAN.

• We study multiple questions providing insight into the structure of the StyleGAN latent space, e.g.: What type of images can be embedded? What type of faces can be embedded? What latent space can be used for the embedding?

• We propose to use three basic operations on vectors to study the quality of the embedding. As a result, we can better understand the latent space and how different classes of images are embedded. As a byproduct, we obtain excellent results on multiple face image editing applications including morphing, style transfer, and expression transfer.

2. Related Work

High-quality GANs Starting from the groundbreaking work by Goodfellow et al. [8] in 2014, the entire computer vision community has witnessed the fast-paced improvements on GANs in the past years.
Figure 1: Top row: input images. Bottom row: results of embedding the images into the StyleGAN latent space.

For image generation tasks, DCGAN [25] is the first milestone that lays down the foundation of GAN architectures as fully convolutional neural networks. Since then, various efforts have been made to improve the performance of GANs from different aspects, e.g. the loss function [21, 2], the regularization or normalization [9, 23], and the architecture [9]. However, due to the limitation of computational power and the shortage of high-quality training data, these works were only tested on low-resolution and poor-quality datasets collected for classification / recognition tasks. Addressing this issue, Karras et al. collected the first high-quality human face dataset, CelebA-HQ, and proposed a progressive strategy to train GANs for high-resolution image generation tasks [13]. Their ProGAN is the first GAN that can generate realistic human faces at a high resolution of 1024 × 1024. However, the generation of high-quality images from complex datasets (e.g. ImageNet) remains a challenge. To this end, Brock et al. proposed BigGAN and argued that the training of GANs benefits dramatically from large batch sizes [3]. Their BigGAN can generate realistic samples and smooth interpolations spanning different classes. Recently, Karras et al. collected a more diverse and higher quality human face dataset, FFHQ, and proposed a new generator architecture inspired by the idea of neural style transfer [10], which further improves the performance of GANs on human face generation tasks [14]. However, the lack of control over image modification, ascribed to the interpretability of neural networks, is still an open problem. In this paper, we tackle the interpretability problem by embedding user-specified images back into the GAN latent space, which leads to a variety of potential applications.

Latent Space Embedding In general, there are two existing approaches to embed instances from the image space into the latent space: i) learn an encoder that maps a given image to the latent space (e.g. the Variational Auto-Encoder [15]); ii) select a random initial latent code and optimize it using gradient descent [35, 4]. The first approach provides a fast solution for image embedding by performing a forward pass through the encoder neural network. However, it usually has problems generalizing beyond the training dataset. In this paper, we decided to build on the second approach as the more general and stable solution.

Perceptual Loss and Style Transfer Traditionally, the low-level similarity between two images is measured in the pixel space with L1/L2 loss functions. In the past years, inspired by the success of complex image classification [17, 20], Gatys et al. [6, 7] observed that the learned filters of the VGG image classification model [20] are excellent general-purpose feature extractors and proposed to use the covariance statistics of the extracted features to measure the high-level similarity between images perceptually, which was then formalized as the perceptual loss [12, 5]. To demonstrate the power of their method, they showed promising results on style transfer [7]. Specifically, they argued that different layers of the VGG neural network extract image features at different scales and can be separated into content and style. To accelerate the initial algorithm, Johnson et al. [12] proposed to train a neural network to solve the optimization problem of [7], which can transfer the style of a given image to any other image in real time. The only limitation of their method is that separate neural networks need to be trained for different style images. Finally, this issue was resolved by Huang and Belongie [10] with adaptive instance normalization. As a result, they can transfer arbitrary styles in real time.
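To make the adaptive instance normalization operation concrete, here is a minimal sketch (our own illustration, not code from this paper): it follows the standard AdaIN formulation of Huang and Belongie [10], where the content features are re-normalized so their per-channel statistics match those of the style features. The function name and PyTorch framing are assumptions for illustration.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: shift/scale the content features
    so their per-channel mean and std match those of the style features.
    Both tensors are assumed to have shape (N, C, H, W)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```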

Figure 2: Top row: the input images. Bottom row: the embedded results. (a) Standard embedding results. (b) Translation 140 pixels to the right. (c) Translation 160 pixels to the left. (d) Zoom out by 2×. (e) Zoom in by 2×. (f) 90° rotation. (g) 180° rotation.

3. What images can be embedded into the StyleGAN latent space?

We set out to study the question if it is even possible to embed images into the StyleGAN latent space. This question is not trivial, because our initial embedding experiments with faces and with other GANs resulted in faces that were no longer recognizable as the same person. Due to the improved variability of the FFHQ dataset and the superior quality of the StyleGAN architecture, there is renewed hope that embedding existing images in the latent space is possible.

Transformation                 | L (×10⁵) | ‖w∗ − w̄‖
Translation (right 140 pixels) | 0.782    | 48.56
Translation (left 160 pixels)  | 0.406    | 44.12
Zoom out (2×)                  | 0.225    | 38.04
Zoom in (2×)                   | 0.718    | 40.55
90° rotation                   | 0.622    | 47.21
180° rotation                  | 0.599    | 42.93

Table 1: Embedding results for the transformed images. L is the loss (Eq. 1) after optimization. ‖w∗ − w̄‖ is the distance between the latent code w∗ and the latent code w̄ (Section 5.1) of the average face [14].
3.1. Embedding Results for Various Image Classes

To test our method, we collect a small-scale dataset of 25 diverse images spanning 5 categories (i.e. faces, cats, dogs, cars, and paintings). Details of the dataset are shown in the supplementary material. We use the code provided by StyleGAN [14] to preprocess the face images. This preprocessing includes registration to a canonical face position.

To better understand the structure and attributes of the latent space, it is beneficial to study the embedding of a larger variety of image classes. We choose faces of cats, dogs, and paintings as they share the overall structure with human faces, but are depicted in a very different style. Cars are selected as they have no structural similarity to faces.

Figure 1 shows the embedding results, with one example for each image class in the collected test dataset. It can be observed that the embedded Obama face is of very high perceptual quality and faithfully reproduces the input. However, it is noted that the embedded face is slightly smoothed and minor details are absent.

Going beyond faces, interestingly, we find that although the StyleGAN generator is trained on a human face dataset, the embedding algorithm is capable of going far beyond human faces. As Figure 1 shows, although slightly worse than those of human faces, we can obtain reasonable and relatively high-quality embeddings of cats, dogs and even paintings and cars. This reveals the effective embedding capability of the algorithm and the generality of the learned filters of the generator.

Another interesting question is how the quality of the pre-trained latent space affects the embedding. To conduct these tests we also used StyleGANs trained on cars, cats, ... The quality of these results is significantly lower, as shown in the supplementary materials.

3.2. How Robust is the Embedding of Face Images?

Affine Transformation As Figure 2 and Table 1 show, the performance of StyleGAN embedding is very sensitive to affine transformations (translation, resizing and rotation). Among them, translation seems to have the worst performance, as it can fail to produce a valid face embedding. For resizing and rotation, the results are valid faces. However, they are blurry, lose many details, and are still worse than the normal embedding. From these observations, we argue that the generalization ability of GANs is sensitive to affine transformations, which implies that the learned representations are still scale and position dependent to some extent.

Figure 3: Stress test results on defective image embedding. Top row: the input images. Bottom row: the embedded results.

Embedding Defective Images As Figure 3 shows, the performance of StyleGAN embedding is quite robust to defects in images. It can be observed that the embeddings of different facial features are independent of each other. For example, removing the nose does not have an obvious influence on the embedding of the eyes and the mouth. On the one hand, this phenomenon is good for general image editing applications. On the other hand, it shows that the latent space does not force the embedded image to be a complete face, i.e. it does not inpaint the missing information.

3.3. Which Latent Space to Choose?

There are multiple latent spaces in StyleGAN [14] that could be used for an embedding. Two obvious candidates are the initial latent space Z and the intermediate latent space W. The 512-dimensional vectors w ∈ W are obtained from the 512-dimensional vectors z ∈ Z by passing them through a fully connected neural network. An important insight of our work is that it is not easily possible to embed into W or Z directly. Therefore, we propose to embed into an extended latent space W+. W+ is a concatenation of 18 different 512-dimensional w vectors, one for each layer of the StyleGAN architecture that can receive input via AdaIN. As shown in Figure 5 (c)(d), embedding into W directly does not give reasonable results. Another interesting question is how important the learned network weights are for the result. We answer this question in Figure 5 (b)(e) by showing an embedding into a network that is simply initialized with random weights.

4. How Meaningful is the Embedding?

We propose three tests to evaluate if an embedding is semantically meaningful. Each of these tests can be conducted by simple latent code manipulations of vectors wi, and these tests correspond to semantic image editing applications in computer vision and computer graphics: morphing, expression transfer, and style transfer. We consider a test successful if the resulting manipulation produces high-quality images.

Figure 4: Morphing between two embedded images (the left-most and right-most ones).

4.1. Morphing

Image morphing is a longstanding research topic in computer graphics and computer vision, e.g. [32, 26, 27, 29, 34, 16]. Given two embedded images with their respective latent vectors w1 and w2, morphing is computed by a linear interpolation, w = λw1 + (1 − λ)w2, λ ∈ (0, 1), and subsequent image generation using the new code w. As Figure 4 shows, our method generates high-quality morphing between face images (rows 1, 2, 3) but fails on non-face images in both in-class (row 4) and inter-class (row 5) morphing. Interestingly, it can be observed that there are contours of human faces in the intermediate images of the inter-class morphing, which shows that the latent space structure of this StyleGAN is dedicated to human faces. We therefore conjecture that non-face images are actually embedded in the following way: the initial layers create a face-like structure, but the later layers paint over this structure so that it is no longer recognizable. While an extensive study of morphing itself is beyond the scope of this paper, we believe that the face morphing results are excellent and might be superior to the current state of the art. We leave this investigation to future work.

Figure 5: (a) Original images. Embedding results into the original space W: (b) using random weights in the network layers; (c) with w̄ initialization; (d) with random initialization. Embedding results into the W+ space: (e) using random weights in the network layers; (f) with w̄ initialization; (g) with random initialization.

Figure 6: First column: style image; second column: embedded stylized image using the style loss from the conv4_2 layer of VGG-16; third to sixth columns: style transfer by replacing the latent codes of the last 9 layers of the base image with those of the embedded style image.

4.2. Style Transfer

Given two latent codes w1 and w2, style transfer is computed by a crossover operation [14]. We show style transfer results between an embedded stylized image and other face images (Figure 6) and between embedded images from different classes (Figure 8).

More specifically, in Figure 8, we retain the latent codes of the embedded content image for the first 9 layers (corresponding to spatial resolutions 4² to 64²) and override the latent codes with those of the style image for the last 9 layers (corresponding to spatial resolutions 64² to 1024²). Our method is able to transfer the low-level features (e.g. colors and textures) but fails to faithfully maintain the content structure of non-face images (second column of Figure 8), especially the painting. This phenomenon reveals that the generalization and expressive power of StyleGAN is more likely to reside in the style layers corresponding to the higher spatial resolutions.

Figure 8: Style transfer between the embedded style image (first column) and the embedded content images (first row).

4.3. Expression Transfer and Face Reenactment

Given three input vectors w1, w2, and w3, expression transfer is computed as w = w1 + λ(w3 − w2), where w1 is the latent code of the target image, w2 corresponds to a neutral expression of the source image, and w3 corresponds to a more distinct expression. For example, w3 could correspond to a smiling face and w2 to an expressionless face of the same person. To eliminate noise (e.g. background noise), we heuristically set a lower-bound threshold on the L2 norm of the channels of the difference latent code; below it, the channel is replaced by a zero vector. For the above experiment, the selected value of the threshold is 1. We normalize the resultant vectors to control the intensity of an expression in a particular direction. Such a code is relatively independent of the source faces and can be used to transfer expressions (Figure 7). We believe that these expression transfer results are also of very high quality. Additional results are available in the supplementary materials and the accompanying video.

Figure 7: Results on expression transfer. The first row shows the reference images from the IMPA-FACES3D [22] dataset. In the following rows, the middle image in each of the examples is the embedded image, whose expression is gradually transferred to the reference expression (on the right) and in the opposite direction (on the left), respectively. More results are included in the supplementary material.

5. Embedding Algorithm

Our method follows a straightforward optimization framework [4] to embed a given image onto the manifold of the pre-trained generator. Starting from a suitable initialization w, we search for an optimized vector w∗ that minimizes a loss function measuring the similarity between the given image and the image generated from w∗. Algorithm 1 shows the pseudo-code of our method. An interesting aspect of this work is that not all design choices lead to good results and that experimenting with the design choices provides further insights into the embedding.
Algorithm 1: Latent Space Embedding for GANs

Input: An image I ∈ R^(n×m×3) to embed; a pre-trained generator G(·).
Output: The embedded latent code w∗ and the embedded image G(w∗), optimized via F′.

1: Initialize the latent code w∗ = w;
2: while not converged do
3:     L ← L_percept(G(w∗), I) + (λ_mse/N) ‖G(w∗) − I‖²₂;
4:     w∗ ← w∗ − η F′(∇_w∗ L);
5: end
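A minimal PyTorch sketch of Algorithm 1 might look as follows; this is our own illustration, assuming a pre-trained generator `G` that maps a (1, 18, 512) W+ code to an image, and a `perceptual_loss` along the lines of Eq. 2 (a sketch of which appears in Sec. 5.2). The optimizer settings follow Sec. 5.3; here F′ is the Adam update rule.

```python
import torch

def embed_image(G, target, w_init, perceptual_loss,
                num_steps=5000, lr=0.01, lambda_mse=1.0):
    """Optimize a W+ code w* so that G(w*) reproduces `target`.
    `w_init` is either the mean latent code or a random draw (Sec. 5.1)."""
    w = w_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr, betas=(0.9, 0.999), eps=1e-8)
    n = target.numel()  # N = n x n x 3, the number of scalars in the image
    for _ in range(num_steps):
        optimizer.zero_grad()
        image = G(w)
        # Eq. 1: perceptual loss plus normalized pixel-wise MSE
        loss = perceptual_loss(image, target) \
               + lambda_mse / n * ((image - target) ** 2).sum()
        loss.backward()
        optimizer.step()
    return w.detach()
```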
5.1. Initialization

We investigate two design choices for the initialization. The first choice is random initialization. In this case, each variable is sampled independently from a uniform distribution U[−1, 1].

Data class | w Init. | L (×10⁵) | ‖w∗ − w̄‖
Face       | w = w̄  | 0.309    | 30.67
Face       | Random  | 0.351    | 35.60
Cat        | w = w̄  | 0.752    | 70.86
Cat        | Random  | 0.740    | 70.97
Dog        | w = w̄  | 0.922    | 74.78
Dog        | Random  | 0.845    | 75.14
Painting   | w = w̄  | 3.530    | 103.61
Painting   | Random  | 3.451    | 105.29
Car        | w = w̄  | 1.390    | 82.53
Car        | Random  | 1.269    | 82.60

Table 2: Algorithmic choice justification for the latent code initialization. w Init. is the initialization method for the latent code w. L is the mean of the loss (Eq. 1) after optimization. ‖w∗ − w̄‖ is the distance between the latent codes w∗ and w̄ of the average face [14].

The second choice is motivated by the observation that the distance to the mean latent vector w̄ can be used to identify low quality faces [14]. Therefore, we propose to use w̄ as initialization and expect the optimization to converge to a vector w∗ that is closer to w̄.

To evaluate these two design choices, we compared the loss values and the distance ‖w∗ − w̄‖ between the optimized latent code w∗ and w̄ after optimization. As Table 2 shows, initializing w = w̄ for face image embeddings not only makes the optimized w∗ closer to w̄, but also achieves a much lower loss value. However, for images in other classes (e.g. dog), random initialization proves to be the better option. Intuitively, this phenomenon suggests that the distribution has only one cluster of faces; the other instances (e.g. dogs, cats) are scattered points surrounding the cluster without obvious patterns. Qualitative results are shown in Figure 5 (f)(g).

5.2. Loss Function

To measure the similarity between the input image and the embedded image during optimization, we employ a loss function that is a weighted combination of the VGG-16 perceptual loss [12] and the pixel-wise MSE loss:

    w∗ = min_w L_percept(G(w), I) + (λ_mse/N) ‖G(w) − I‖²₂    (1)

where I ∈ R^(n×n×3) is the input image, G(·) is the pre-trained generator, N is the number of scalars in the image (i.e. N = n × n × 3), w is the latent code to optimize, and λ_mse = 1 is empirically chosen for good performance. For the perceptual loss term L_percept(·) in Eq. 1, we use:

    L_percept(I1, I2) = Σ_{j=1..4} (λ_j/N_j) ‖F_j(I1) − F_j(I2)‖²₂    (2)

where I1, I2 ∈ R^(n×n×3) are the input images, F_j is the feature output of VGG-16 layers conv1_1, conv1_2, conv3_2 and conv4_2 respectively, N_j is the number of scalars in the j-th layer output, and λ_j = 1 for all j is empirically chosen for good performance.

Our choice of the perceptual loss together with the pixel-wise MSE loss comes from the fact that the pixel-wise MSE loss alone cannot find a high quality embedding. The perceptual loss therefore acts as a regularizer that guides the optimization into the right region of the latent space.
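As a concrete reading of Eq. 2, here is a sketch in PyTorch (our own, not the paper's code). The indices of conv1_1, conv1_2, conv3_2 and conv4_2 inside torchvision's VGG-16 feature stack are an assumption to verify against your torchvision version.

```python
import torch
import torchvision

class VGG16PerceptualLoss(torch.nn.Module):
    """Eq. 2: sum of per-layer normalized squared feature differences,
    taken at conv1_1, conv1_2, conv3_2 and conv4_2 (lambda_j = 1)."""
    # Assumed indices of the four conv layers in vgg16().features;
    # verify against your torchvision version.
    LAYER_IDS = (0, 2, 12, 19)

    def __init__(self):
        super().__init__()
        # torchvision >= 0.13 weights API; older versions use pretrained=True.
        self.vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.LAYER_IDS:
                feats.append(x)
            if i >= max(self.LAYER_IDS):
                break
        return feats

    def forward(self, img1, img2):
        loss = 0.0
        for f1, f2 in zip(self.features(img1), self.features(img2)):
            loss = loss + ((f1 - f2) ** 2).sum() / f1.numel()  # lambda_j / N_j
        return loss
```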

Figure 9: Algorithmic choice justification on the loss function. Each row shows the results of an image from one of the five different classes in our test dataset. From left to right, each column shows: (1) the original image; (2) pixel-wise MSE loss only; (3) perceptual loss on the VGG-16 conv3_2 layer only; (4) pixel-wise MSE loss and VGG-16 conv3_2; (5) perceptual loss (Eq. 2) only; (6) our loss function (Eq. 1). More results are included in the supplementary material.

We perform an ablation study to justify our choice of loss function in Eq. 1. As Figure 9 shows, using the pixel-wise MSE loss term alone (column 2) embeds the general colors well but fails to capture the features of non-face images. In addition, it has a smoothing effect that does not preserve details even for the human faces. Interestingly, because the pixel-wise MSE loss works in the pixel space and ignores differences in feature space, its embedding results on non-face images (e.g. the car and the painting) have a tendency towards the average face of the pre-trained StyleGAN [14]. This problem is addressed by the perceptual losses (columns 3, 5), which measure image similarity in the feature space. Since our embedding task requires the embedded image to be close to the input at all scales, we found that matching the features at multiple layers of the VGG-16 network (column 5) works better than using only a single layer (column 3). This further motivates us to combine the pixel-wise MSE loss with the perceptual loss (columns 4, 6), in that the pixel-wise MSE loss can be viewed as the lowest-level perceptual loss at pixel scale. Column 6 of Figure 9 shows the embedding results of our final choice (pixel-wise MSE + multi-layer perceptual loss), which achieves the best performance among the different algorithmic choices.

5.3. Other Parameters

We use the Adam optimizer with a learning rate of 0.01, β1 = 0.9, β2 = 0.999, and ε = 1e−8 in all our experiments. We use 5000 gradient descent steps for the optimization, taking less than 7 minutes per image on a 32GB Nvidia TITAN V100 GPU.

To justify our choice of 5000 optimization steps, we investigated the change in the loss function as a function of the number of iterations. As Figure 10 shows, the loss value of the human face image drops the quickest and converges at around 1000 optimization steps; those of the cat, the dog and the car images converge more slowly at around 3000 optimization steps; while the painting curve is the slowest and converges at around 5000 optimization steps. We choose to optimize the loss function for 5000 steps in all our experiments.

Figure 10: Loss values vs. the number of optimization steps.

Iterative Embedding We tested the robustness of the proposed method on iterative embedding, i.e. we iteratively take the embedding results as new input images and do the embedding again. This process is repeated seven times. As Figure 11 shows, although it is guaranteed that the input image exists in the model distribution after the first embedding, the performance of the proposed method slowly degenerates (more details are lost) with the number of iterative embeddings. The reason for this observation may be that the employed optimization approach suffers from slow convergence around a local optimum. For the embeddings other than human faces, the stochastic initial latent codes may also be a factor in the degradation. In summary, these observations show that our embedding approach can easily reach reasonably “good” embeddings on the model distribution, although “perfect” embeddings are hard to reach.

Figure 11: Stress test results on iterative embedding. The left-most column shows the original images and the subsequent columns are the results of iterative embedding.

6. Conclusion

We proposed an efficient algorithm to embed a given image into the latent space of StyleGAN. This algorithm enables semantic image editing operations, such as image morphing, style transfer, and expression transfer. We also used the algorithm to study multiple aspects of the StyleGAN latent space. We proposed experiments to analyze what type of images can be embedded, how they are embedded, and how meaningful the embedding is. Important conclusions of our work are that embedding works best into the extended latent space W+ and that any type of image can be embedded. However, only the embedding of faces is semantically meaningful.

Our framework still has several limitations. First, we inherit image artifacts present in the pre-trained StyleGAN, which we illustrate in the supplementary materials. Second, the optimization takes several minutes, and an embedding algorithm that can work in under a second would be more appealing for interactive editing.

In future work, we hope to extend our framework to process videos in addition to static images. Further, we would like to explore embeddings into GANs trained on three-dimensional data, such as point clouds or meshes.

Acknowledgement This work was supported by the KAUST Office of Sponsored Research (OSR) under Award No. OSR-CRG2017-3426.

References

[1] Yazeed Alharbi, Neil Smith, and Peter Wonka. Latent filter scaling for multimodal unsupervised image-to-image translation. arXiv preprint arXiv:1812.09877, 2018. 1

[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 214–223, 2017. 2
[3] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019. 1, 2
[4] Antonia Creswell and Anil Anthony Bharath. Inverting the generator of a generative adversarial network. IEEE Transactions on Neural Networks and Learning Systems, 2018. 2, 6
[5] Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems, pages 658–666, 2016. 2
[6] Leon Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, 2015. 2
[7] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv, Aug 2015. 2
[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014. 1
[9] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017. 2
[10] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, 2017. 2
[11] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. CVPR, 2017. 1
[12] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, 2016. 2, 7
[13] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018. 1, 2
[14] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948, 2018. 1, 2, 3, 4, 5, 7
[15] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013. 2
[16] Pavel Korshunov and Touradj Ebrahimi. Using face morphing to protect privacy. In 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 208–213. IEEE, 2013. 4
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012. 2
[18] Chuan Li and Michael Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III, 2016. 1
[19] Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, and Shuicheng Yan. Perceptual generative adversarial networks for small object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 1
[20] Shuying Liu and Weihong Deng. Very deep convolutional neural network based image classification using small training sample size. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Nov 2015. 2
[21] Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017. 2
[22] Jesús P Mena-Chalco, Luiz Velho, and RM Cesar Junior. 3D human face reconstruction using principal components spaces. In Proceedings of WTD SIBGRAPI Conference on Graphics, Patterns and Images, 2011. 6
[23] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018. 2
[24] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1
[25] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015. 1
[26] Ulrich Scherhag, Christian Rathgeb, Johannes Merkle, Ralph Breithaupt, and Christoph Busch. Face recognition systems under morphing attacks: A survey. IEEE Access, 7, 2019. 4
[27] Clemens Seibold, Wojciech Samek, Anna Hilsmann, and Peter Eisert. Detection of face morphing attacks by deep learning. In International Workshop on Digital Watermarking, pages 107–120. Springer, 2017. 4
[28] Ron Slossberg, Gil Shamai, and Ron Kimmel. High quality facial surface and texture synthesis via generative adversarial networks. In European Conference on Computer Vision, pages 498–513. Springer, 2018. 1
[29] Mark Steyvers. Morphing techniques for manipulating face images. Behavior Research Methods, Instruments, & Computers, 31(2):359–369, 1999. 4
[30] Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz. MoCoGAN: Decomposing motion and content for video generation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. 1
[31] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Generating videos with scene dynamics. In Advances in Neural Information Processing Systems 29, 2016. 1
[32] George Wolberg. Image morphing: a survey. The Visual Computer, 14(8), 1998. 4
[33] Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. TextureGAN: Controlling deep image synthesis with texture patches. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. 1
[34] Fei Yang, Eli Shechtman, Jue Wang, Lubomir Bourdev, and Dimitris Metaxas. Face morphing using 3D-aware appearance optimization. In Proceedings of Graphics Interface 2012, pages 93–99. Canadian Information Processing Society, 2012. 4
[35] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative visual manipulation on the natural image manifold. Lecture Notes in Computer Science, pages 597–613, 2016. 2
[36] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017. 1
[37] Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems 30, pages 465–476. Curran Associates, Inc., 2017. 1

