Proceedings of 12th IOE Graduate Conference

Peer Reviewed
ISSN: 2350-8914 (Online), 2350-8906 (Print)
Year: 2022 Month: October Volume: 12

Generating Progressed Face Ageing Image using Generative Adversarial Network
Niraj Kumar Gupta a , Sharan Thapa b , Bal Krishna Nyaupane c , Prabin Nepali d
a, b, c, d Department of Electronics & Computer Engineering, Paschimanchal Campus, IOE, Tribhuvan University, Nepal
a [email protected], b [email protected], c [email protected], d [email protected]

Abstract
People are naturally curious about how their own face will look in the near future. Among the various techniques used for modeling progressed aged face images, the Generative Adversarial Network (GAN) and its extension, the conditional GAN with regression, have shown impressive results. This research work aims to generate progressed facial images using the proposed model. The model takes an input image of size 256*256 and a target age in the range 1 to 80. The age encoder converts the input into eighteen intermediate style vectors in the latent space of the StyleGAN domain, which are fed to the StyleGAN2 generator to produce the target aged image. The output aged face image is passed to an age predictor to estimate its age. The eighteen styles control features of the generated output image such as pose, hair, face shape and eyes. The loss between the estimated and the target age, along with other losses, is used to update the model so that it produces an aged version of the input face image of size 1024*1024. The UTKFace dataset has been used to train the model. The model is able to generate plausible progressed aged face images in the range of 1 to 80 for a single front-facing image.
Keywords
Generative Adversarial Networks, Pixel to Style to Pixel, StyleGAN, Face Aging, Latent Space

Pages: 1009 – 1016

1. Introduction

Face aging is the sequential change in facial skin tone and structure that shapes the future appearance of an individual. Modeling it can be used in the search for missing persons and fugitive criminals, and in cinematography. Face aging image generation has high uncertainty and is strongly affected by an individual's facial expression, posture, illumination and image resolution. The aging rate also varies from person to person, which makes it considerably harder for a model to generalize.

Generative Adversarial Networks (GANs) are a type of deep neural network introduced by Ian Goodfellow et al. in 2014 [1]. A GAN employs a generator and a discriminator network that train each other through repeated cycles of generation and discrimination while attempting to mislead one another. The discriminator is trained to distinguish fake data from true data, while the generator is trained to produce fake data. The structure of GANs is shown in figure 1. GANs provide a powerful framework for rapidly creating data from supplied data probability distributions, in contrast to CNNs, which are incapable of retrieving finer details and usually create fuzzy pictures. The GAN optimization process is a minimax problem, and it finishes at the Nash equilibrium point. At a Nash equilibrium, each player's strategy is the best response to the strategies of all other players.

The GAN model has advanced with the use of conditional GANs (cGANs). They have been shown to be superior to standard GANs since they enable the generation of images with criteria or qualities specified in advance.

In the proposed model, the target age condition is given along with the input face image to generate the aged face image. The progressed aged face image is obtained by generating aged face images at various ages and then concatenating them to display the face aging progression. Aging is an unavoidable and ongoing process. Face age progression is required in various situations such as the search for a missing child or person, fleeing criminals, entertainment, security checkpoints, and so on. Human aging may be divided into two stages: childhood to adulthood and adulthood to old age. Cranial development occurs
from childhood through adulthood, and the latter is marked by changes in skin shape and texture. As a result, a deep neural network model that can render a given face at numerous ages can assist in identifying the person at the desired age. Similarly, in today's digital era, facial biometric security (at immigration, security checkpoints, and so on) has been deployed; these systems may be made more resilient using face aging modeling, without the hassle of gathering facial data every time for cross verification. Cinematography makes extensive use of face aging since it depicts characters over a lengthy period of time, and a model allows for the easy synthesis of their many aged versions. It can also be used for amusement, letting people visualize differently aged versions of themselves.

Figure 1: The Typical Structure of Generative Adversarial Networks (GANs).

This paper considers the problem of generating progressed face images and assessing aging accuracy. When generating an aged face image, visual fidelity, aging accuracy, and identity preservation are crucial factors. In most cases, the visual fidelity of a synthetic face image is judged in terms of human perception. A contemporary metric such as the Frechet Inception Distance provides a quantitative evaluation of visual quality. Aging accuracy determines whether a synthetic face image is within the intended age range or not; age estimators and user studies are the two quantitative evaluation methods employed. Identity preservation can be evaluated with three methods: automatic face verification [2], automatic face identification, or user groups. Metrics for face identification are best calculated using ArcFace and FaceNet [3].

The main objective of this paper is to develop a methodology based on a generative adversarial network for generating progressed aged face images.

2. Related Work

Face aging methods are broadly categorized into two types, viz. traditional methods and deep learning based methods. The traditional methods are categorized into physical-model based and prototype based methods. Similarly, deep learning based models use three approaches for generating aged face images: translation-based, sequence-based and condition-based, as shown in figure 2.

Figure 2: Face age progress and types.

Extra age labels were first incorporated into the network model by Zhang et al. [4]. The authors implement a Conditional Adversarial Autoencoder (CAAE) network, assuming that all images of the human face lie on a high-dimensional manifold. A convolutional encoder maps an input face image to the latent space. Once the images are projected into the latent space, the encoded samples are moved in the direction of age change according to the age label. A decoder network is then used to recreate the input facial image with the ageing effect. To construct the identity-preservation loss (L_ID), Yang et al. [5] used a pre-trained deep face descriptor to extract identity-related feature vectors from both young and aged face images. The network is penalized for large identity differences by calculating the Euclidean distance between the associated identity-related feature vectors. Similarly, L_age includes an age classifier that penalizes the difference between the age of the synthesized face image and the target age, to prevent the synthesized face from straying from the target age. Following this logic, Wang et al. [6] proposed an Identity Preserving GAN (IPCGAN), which combines an identity-preserving component with a pre-trained CNN that functions as an age estimator. Shen et al. [7] describe InterFaceGAN, a model for manipulating facial features in a given face image. InterFaceGAN operates in the latent space of a previously developed facial image generation model, such as StyleGAN [8]. InterFaceGAN leverages the well-structured latent space by looking for the linear boundaries that divide the latent space into two subspaces in terms of a
binary semantic. Finally, an individual's age is changed continuously by shifting a latent vector perpendicular to the boundary. However, the further the latent vector is moved in one direction, the greater the change in the identity of the original face.

Vanilla GANs are effective at creating crisp images but, due to model stability issues, they are limited to small image dimensions. Progressive growing of GANs [9] is a reliable method for training GAN models that produces large, high-quality images by gradually expanding the size of the model throughout the training procedure. Progressive GAN does not use batch normalization; instead it uses two other techniques, mini-batch standard deviation and pixel-wise normalization. After each convolution layer, the generator applies pixel-wise normalization, which normalizes the activation at each pixel across the channels. This is a type of activation constraint known more broadly as local response normalization. In this GAN, the bias of each layer is set to zero, the model weights are initialized from a random Gaussian and then rescaled using the He weight normalization technique, and the model is optimized using the Adam optimizer.

Pixel2Style2Pixel [10] introduces a technique that transforms the input image into an intermediate z and an extended latent vector w+, which can be used with the StyleGAN generator for easy manipulation of facial attributes by traversing the extended latent vector w+. The encoder architecture of [10] is used to achieve the objective of this paper.

StyleGAN1 [11] is an advancement of the progressive growing GAN for generating high resolution images. The StyleGAN generator no longer takes a latent space point directly as input; instead, two new sources of randomness are employed to build a synthetic image: a standalone mapping network and noise layers. The mapping network produces a vector that defines the styles and is injected at each point in the generator model via a new layer called adaptive instance normalization. This style vector provides control over the style of the resulting image. The output images of StyleGAN1 have blob-like artifacts. In StyleGAN2 [12], adaptive instance normalization is broken down into a modulation and a demodulation step, which is shown empirically to resolve the blob-like artifacts produced by StyleGAN1.

Or-El et al. [13] introduce the Lifespan age transformation synthesis scheme, which generates aging images by interpolating between age groups using latent vectors and is trained on the FFHQ-Aging data. Similarly, Yao et al. [14] describe the generation of high resolution aged images with a model trained on FFHQ.

3. Methodology

The main framework of this paper is shown in figure 3.

Figure 3: Model architecture.

3.1 Age Encoder

The age encoder is based on the Pixel2Style2Pixel encoder architecture, and its input has four channels. The target age, which is randomly sampled, is appended to the input image as a constant-valued channel. The age encoder extracts feature maps at three spatial levels, corresponding to the fine, medium and coarse style groups of StyleGAN. From these three levels, the map-to-style convolutional neural network blocks produce 18 different style latent vector codes. I_target age is the 4-channel input comprising 3 image channels and 1 age channel. The age channel is stacked onto the image tensor using broadcasting, that is, the target age is expanded to a tensor of shape (1, image width, image height):

$$ \text{Age vector tensor} = \text{target age} \ast (1, \text{image width}, \text{image height}) \tag{1} $$

$$ I \equiv \text{Input image} $$

$$ I_{\text{target age}} = \mathrm{concatenate}(I, \text{target age}) $$

The encoder is structured as follows. The input is fed to a conv2d block with 4 input channels, 64 filters, kernel size (3, 3), stride (1, 1) and padding (1, 1), followed by batch normalization and a parametric rectified linear unit (PReLU) activation. The output of this convolution block is fed to 24 subsequent ResNet blocks. Each ResNet block consists of maxpool2d followed by batch normalization, convolution, PReLU, conv2d and then batchnorm2d.
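The construction of the four-channel encoder input (equation 1 and the concatenation step) can be illustrated with a minimal NumPy sketch. The function name `make_age_conditioned_input`, the channel-first layout and the use of the raw age value without any scaling are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def make_age_conditioned_input(image, target_age):
    """Append a constant age channel to a channel-first RGB image.

    image: array of shape (3, H, W); target_age: scalar age.
    Returns an array of shape (4, H, W).
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim != 3 or image.shape[0] != 3:
        raise ValueError("expected an image of shape (3, H, W)")
    _, height, width = image.shape
    # Eq. (1): broadcast the scalar age over a (1, H, W) plane.
    age_plane = np.float32(target_age) * np.ones((1, height, width), dtype=np.float32)
    # I_target_age = concatenate(I, age plane) along the channel axis.
    return np.concatenate([image, age_plane], axis=0)

rgb = np.zeros((3, 256, 256), dtype=np.float32)
x = make_age_conditioned_input(rgb, target_age=60)
print(x.shape)  # (4, 256, 256)
```

The resulting tensor matches the 4 input channels expected by the first conv2d block described above.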

Figure 4: Age Encoder.

The output is then passed to the next block, which is also a convolution block, consisting of average pooling, conv2d, ReLU activation, conv2d again and finally a sigmoid activation function. There are 24 ResNet+convolution blocks; the three style feature maps, namely middle, coarse and fine, are taken from blocks 7, 21 and 24 and are then fed to the map-to-style blocks. Each map-to-style block is a convolutional neural network block. The three styles are converted into an 18 * 512 latent vector code: of the 18 style codes of size 512 each, the fine style forms latent vector codes 1-3, the middle style forms codes 4-7 and the coarse style forms codes 8-18.

The three feature maps c1, c2 and c3 have dimensions (128, 64, 64), (256, 32, 32) and (512, 16, 16) respectively. Following the encoder of [10], c3 is upsampled and added to c2, giving an output of dimension (512, 32, 32), denoted p2. Similarly, p2 is upsampled and added to c1, giving p1 of dimension (512, 64, 64). The output of c3 (fine style) is fed to map2style blocks 1-3, each of which generates an extended latent vector of dimension 512. Similarly, p2 is fed to map2style blocks 4-7, which also produce extended latent vectors of size 512, and p1 is fed to map2style blocks 8-18, which likewise produce 512-dimensional latent vectors. Thus the age encoder produces an 18*512 extended latent vector in total. This latent vector is called extended because it is in the StyleGAN domain.

3.2 StyleGAN Generator

The StyleGAN generator is a pre-trained StyleGAN2 generator. Its input is the 18 style latent vectors from the map-to-style blocks. The W space produced by the map-to-style blocks is separated from the image space, and the factors of variation are more linear in it. In StyleGAN2, the first synthesis block is fed with a 512*4*4 constant input. This input is convolved with a kernel of size 3*3. The result is passed to a (toRGB) block that converts it into RGB channels, followed by an upsample block that increases the spatial dimension by a factor of 2. Upsampling is performed after every two convolution blocks, giving 18 synthesis blocks. Thus the constant input is progressively convolved and upsampled from 4*4 to 8*8, 16*16, 32*32, 64*64, 128*128, 256*256 and 512*512, and the generator finally outputs 1024*1024.

In figure 5, A denotes a linear layer and B denotes a broadcast and scaling operation applied to single-channel noise. StyleGAN2 has an improved generator network in which the AdaIN operation is replaced with a weight modulation and demodulation step. According to the creators of StyleGAN2, this removes the droplet artifacts in images generated by the earlier StyleGAN generator, which were caused by the normalization step of adaptive instance normalization. The style vector code per layer is computed as in equation 2, and the convolution weights w are modulated as in equation 3.

$$ s_i = f_{A_i}(W_i) \tag{2} $$

$$ W'_{i,j,k} = s_i \times w_{i,j,k} \tag{3} $$

Similarly, the convolution weights are demodulated as follows, where i is the input channel, j is the output channel and k is the kernel index.

$$ W''_{i,j,k} = \frac{W'_{i,j,k}}{\sqrt{\sum_{i,k} {W'_{i,j,k}}^{2} + \epsilon}} \tag{4} $$

In addition, path length regularization has been introduced, which encourages a fixed-size step in W+ to result in a non-zero change of constant magnitude in the image generated by the generator.

StyleGAN2 uses residual connections with down-sampling in the discriminator and skip connections with up-sampling in the generator. At the beginning of training, the contribution of the low-resolution layers is large, and subsequently the high-resolution layers take over.

As a result, the generator starts with a learned constant and then proceeds through a sequence of
blocks, with the feature map resolution doubling at each block. Each block generates an RGB image; these are scaled and summed to give the final full-resolution RGB image.

Figure 5: Age Encoder details.

The 18 styles help to control the features of the generated output image. The coarse styles help to control pose, hair and face shape; the middle styles help to control features such as the eyes; and the fine styles help to control the color scheme of the face. Latent vectors 8 and 9 control the hair style and color, and can be used to change those of the source image given a target image.

The problem of face aging is solved using a conditional GAN regression model. Here, the condition is the target age: the model tries to convert the input image into an aged face image of the target age. The model uses the age of the image produced by the generator, as estimated by a state-of-the-art pretrained model. The L2 loss between the target age and the age predicted by the age predictor is used to train the encoder network of the face aging model. Mathematically,

$$ \text{Aging loss} = \lVert \text{target age} - AP(Net(I_{\text{target age}})) \rVert_2 \tag{5} $$

In a vanilla GAN, the discriminator compares the generated image (fake) with the target image (true) to compute a BCE/MAE/MSE loss. Here, the age predictor uses the generated image to predict an estimated age. The L2 loss is then calculated by comparing the estimated age with the target age, and this loss, together with the other losses, is fed back to the age encoder network to update the weights of the model. The process repeats until the achievable loss saturates or converges. Hence, the discriminator's role of making the generator produce more realistic data is performed by the age predictor, which makes this a regression in GAN [15].

3.3 Training Objective function

The training objective for the proposed GAN network is the sum of the forward loss and the cyclic loss, which is minimized in aggregate.

3.4 L2 loss

The L2 loss is the MSE loss, used to learn pixel-level similarities between the input image and the target aged face image. As age increases, the shape of the face changes, which motivates putting a higher weight on this loss.

$$ L_2(I_{\text{target age}}) = \lVert I - Net(I_{\text{target age}}) \rVert_2 \tag{6} $$

3.5 Cropped L2 loss

Here the L2 loss is calculated on a cropped part of the face image to give more significance to the center of the face. The crop is taken as image[13:227,15:229,:], that is, the cropped image covers rows 13 to 227 and columns 15 to 229 for all 3 channels of the image. The training images and their cropped parts are discussed in the datasets section.

$$ L_2(I_{\text{target age}})_{\text{cropped}} = \lVert I - Net(I_{\text{target age}})_{\text{cropped}} \rVert_2 \tag{7} $$
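The L2 and cropped L2 losses of equations 6 and 7 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names are hypothetical, images are assumed to be stored as (height, width, channel) arrays of equal shape, the loss is taken as the mean squared error, and the crop window is applied to both the source and the generated image.

```python
import numpy as np

# Crop window quoted in Section 3.5; Python slices cover rows 13..226 and columns 15..228.
CROP_ROWS = slice(13, 227)
CROP_COLS = slice(15, 229)

def l2_loss(source, generated):
    """Mean squared error between two images of identical shape."""
    source = np.asarray(source, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    if source.shape != generated.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((source - generated) ** 2))

def cropped_l2_loss(source, generated):
    """MSE restricted to the central face region of (H, W, C) images."""
    source = np.asarray(source)
    generated = np.asarray(generated)
    return l2_loss(source[CROP_ROWS, CROP_COLS, :], generated[CROP_ROWS, CROP_COLS, :])

rng = np.random.default_rng(0)
src = rng.random((256, 256, 3))
gen = src.copy()
gen[:13, :, :] += 0.5  # perturb only rows that lie outside the crop window

print(l2_loss(src, gen) > 0.0)     # True: the full-image loss sees the change
print(cropped_l2_loss(src, gen))   # 0.0: the cropped loss ignores it
```

The example shows why the cropped term gives more significance to the center of the face: differences near the image border do not contribute to it at all.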

3.6 Learned Perceptual Image Patch Similarity Loss

A high LPIPS score indicates that the image patches are perceptually dissimilar. The perceptual features of an image are computed by a pretrained VGG network, denoted FE.

$$ L_{LPIPS}(I_{\text{target age}}) = \lVert FE(I) - FE(Net(I_{\text{target age}})) \rVert_2 \tag{8} $$

3.7 Cropped Learned Perceptual Image Patch Similarity Loss

This loss focuses on the center region of the face image to calculate the cropped perceptual image patch similarity loss.

$$ L_{LPIPS}(I_{\text{target age}})_{\text{cropped}} = \lVert FE(I) - FE(Net(I_{\text{target age}}))_{\text{cropped}} \rVert_2 \tag{9} $$

3.8 Identity loss

Preserving the identity of the face is the key point when a face is transformed during training. The identity loss is therefore calculated using the cosine similarity between the features of the source image and those of the generated image. Identity preservation is weighted less for a large age difference between the target and the source image, and more for a small one.

$$ L_{ID}(I_{\text{target age}}) = A \ast \left(1 - \langle R(I), R(Net(I_{\text{target age}})) \rangle\right) \tag{10} $$

where R is the pretrained ArcFace model and ⟨·,·⟩ denotes cosine similarity. The weight function A(.) is defined by

$$ A = 0.25 \ast \cos\left(\pi \ast \frac{\lvert \text{source age} - \text{target age} \rvert}{80}\right) + 0.75 \tag{11} $$

The value of A is at its minimum when the difference between the source and target age is high, and vice versa.

3.9 Aging loss

The aging loss is the L2 loss between the supplied target age and the age predicted by the pretrained age predictor AP.

$$ \text{Aging loss} = \lVert \text{target age} - AP(Net(I_{\text{target age}})) \rVert_2 \tag{12} $$

3.10 Cyclic Loss

The cyclic loss is calculated for the robustness of the network: when the image generated by the network is passed back as input together with the age predicted by the age predictor, the network must be able to regenerate the source image.

Figure 6: Cycle Loss.

Loss Forward = J * L2 loss + K * cropped L2 loss + L * LPIPS loss + M * cropped LPIPS loss + O * identity loss + P * aging loss

where J, K, L, M, O and P are the weights of the respective losses.

4. Datasets

The dataset for this paper has been taken from the UTKFace Kaggle repository. This repository has two folders, UTKFace and crop_part1, with 23708 face images and 9780 images respectively. Image filenames follow a convention of the form age_gender_X_serialno.

In this paper, 3200 images are used for training the model and 800 images for validating it. Sample images from the dataset are displayed together with their cropped versions.

Figure 7: Sample dataset image with cropped image side by side.

5. Results and Discussion

5.1 Implementation Details

The model is able to generate progressed face images given an input image of a person aged below 80. The obtained images reflect aged versions of the given input. The output images fall into the age groups 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70 and 70-80. The results are obtained with the following hyperparameters:

Input image size: 256*256, Output image size: 256*256, Image Batch size = 2, Number of input
Channel = 4, Learning rate for Ranger Optimizer = 0.0001, Alpha for ranger = 0.5, Number of batch for ranger optimizer = 6, Iteration = 16,000, L2 lambda = 0.25, L2 lambda crop = 1, LPIPS lambda = 0.1, LPIPS lambda crop = 0.6, ID lambda = 0.1, L2 aging lambda = 7, Cycle lambda = 1
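To make the weighting concrete, the sketch below evaluates the identity weight of equation 11 and combines individual loss values into the forward loss using the lambda values listed above. The function names, the dictionary keys, the mapping of each lambda onto the weights of the forward loss, and the example loss values are illustrative assumptions; they are not taken from the paper's implementation. The cycle lambda is left out because it weights the cyclic loss rather than a term of the forward loss.

```python
import math

# Loss weights as listed in Section 5.1 (key names are hypothetical).
LAMBDAS = {
    "l2": 0.25,
    "l2_crop": 1.0,
    "lpips": 0.1,
    "lpips_crop": 0.6,
    "id": 0.1,
    "aging": 7.0,
}

def identity_weight(source_age, target_age, max_age=80.0):
    """Eq. (11): A = 0.25 * cos(pi * |source - target| / 80) + 0.75."""
    gap = abs(source_age - target_age)
    return 0.25 * math.cos(math.pi * gap / max_age) + 0.75

def forward_loss(losses, lambdas=LAMBDAS):
    """Weighted sum of the individual loss terms (the 'Loss Forward' formula)."""
    missing = set(lambdas) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(lambdas[name] * losses[name] for name in lambdas)

# A shrinks from 1.0 (no age gap) to 0.5 (an 80-year gap).
print(identity_weight(30, 30))  # 1.0
print(identity_weight(0, 80))   # 0.5

# Made-up loss values, used only to show the arithmetic.
example = {"l2": 0.04, "l2_crop": 0.03, "lpips": 0.5,
           "lpips_crop": 0.4, "id": 0.2, "aging": 0.01}
total = forward_loss(example)
print(total)
```

With these weights the aging term is scaled far more strongly than the identity term, which is consistent with the model being trained primarily to hit the target age.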

5.2 Result and Discussions


The loss of the encoder is high at the beginning of training because the model has not yet seen enough data. As training progresses, the encoder learns from the data and the loss gradually decreases. The loss decreases roughly exponentially, but sudden peaks are visible in the loss graph. These peaks occur when the model fails to construct a face close to the target age. During training, the target age is chosen uniformly between 1 and 80. After 14000 iterations the loss remains almost constant with some variance.

Figure 8: Encoder Loss

The model generates the progressed aged face image from only a single input image. As people get older, the color of their hair also changes. The proposed model works on faces, but it can also be used to manipulate hair color: the hair color of a source face image can be changed to the hair color of a target image with the same trained model. For this, the eighteen style latent vectors of the source and the target are computed, and style latent vectors 8 and 9 of the source are then replaced by those of the target. The resulting latent vector is passed to the generator to change the hair color.
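The style replacement described above amounts to copying two rows of the 18 x 512 latent code. The following NumPy sketch illustrates it with random stand-in latent codes; the function name `swap_styles` is hypothetical, and the style numbers 8 and 9 are assumed to be 1-based, in line with the 1-18 numbering used for the style codes in this paper.

```python
import numpy as np

NUM_STYLES, STYLE_DIM = 18, 512
HAIR_STYLES = (8, 9)  # 1-based style indices named in the text (assumed numbering)

def swap_styles(source_codes, target_codes, styles=HAIR_STYLES):
    """Return a copy of source_codes with the given 1-based styles taken from target_codes."""
    source_codes = np.asarray(source_codes)
    target_codes = np.asarray(target_codes)
    expected = (NUM_STYLES, STYLE_DIM)
    if source_codes.shape != expected or target_codes.shape != expected:
        raise ValueError(f"expected latent codes of shape {expected}")
    mixed = source_codes.copy()
    for style in styles:
        mixed[style - 1] = target_codes[style - 1]  # convert to a 0-based row index
    return mixed

# Random stand-ins for the encoder outputs of a source and a target image.
rng = np.random.default_rng(0)
w_source = rng.standard_normal((NUM_STYLES, STYLE_DIM))
w_target = rng.standard_normal((NUM_STYLES, STYLE_DIM))

w_mixed = swap_styles(w_source, w_target)
# w_mixed would then be passed to the pre-trained generator (not shown here).
print(np.array_equal(w_mixed[7:9], w_target[7:9]))  # True
print(np.array_equal(w_mixed[:7], w_source[:7]))    # True
```

All other styles stay untouched, so the identity, pose and age information carried by the remaining sixteen vectors of the source is passed to the generator unchanged.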

5.3 Testimony Images


The following images were generated by the trained model on the test dataset.

Figure 9: Output image generated by proposed model on left column and on right by HRFAE

6. Conclusion

The proposed model is based on the GAN framework and synthesizes progressed aged face images. The input image is converted into an intermediate extended latent vector code by the age encoder, which is fed into the generator to generate the aged face image. The results generated by the proposed model are comparable to those of the state-of-the-art model HRFAE [14], as seen from the recovered age accuracy. The model is able to generate plausible progressed aged face images in the range of 1 to 80 for a single front-facing image. The proposed model can be further improved by preserving the background of the face image; generating multiple aged face images from group photos is part of future work.

References

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[2] Yunfan Liu, Qi Li, and Zhenan Sun. Attribute-aware face aging with wavelet-based generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11877–11886, 2019.
[3] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015.
[4] Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5810–5818, 2017.
[5] Hongyu Yang, Di Huang, Yunhong Wang, and Anil K Jain. Learning continuous face age progression: A pyramid of GANs. IEEE transactions on pattern analysis and machine intelligence, 43(2):499–515, 2019.
[6] Brandon Amos, Bartosz Ludwiczuk, Mahadev Satyanarayanan, et al. OpenFace: A general-purpose face recognition library with mobile applications. CMU School of Computer Science, 6(2):20, 2016.
[7] Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of GANs for semantic face editing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9243–9252, 2020.
[8] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
[9] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[10] Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a StyleGAN encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2287–2296, 2021.
[11] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
[12] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8110–8119, 2020.
[13] Roy Or-El, Soumyadip Sengupta, Ohad Fried, Eli Shechtman, and Ira Kemelmacher-Shlizerman. Lifespan age transformation synthesis. In European Conference on Computer Vision, pages 739–755. Springer, 2020.
[14] Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, and Pierre Hellier. High resolution face age editing. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 8624–8631. IEEE, 2021.
[15] Lucy Chai, Jonas Wulff, and Phillip Isola. Using latent space regression to analyze and leverage compositionality in GANs. arXiv preprint arXiv:2103.10426, 2021.