Generating Progressed Face Ageing Image using Generative Adversarial Network
IOEGC 12 131 12188
Peer Reviewed
ISSN: 2350-8914 (Online), 2350-8906 (Print)
Year: 2022 Month: October Volume: 12
Abstract
The aged version of one's own face image is a matter of curiosity: how one would look in the near future. Among the various techniques used for modelling progressed (aged) face images, the Generative Adversarial Network (GAN) and its extension, the conditional GAN with regression, have shown astonishing results. This research work aims to generate progressed facial images using the proposed model. The model takes an input image of size 256 × 256 and a target age in the range 1 to 80 for the generation of the aged image. The input is converted by the age encoder into eighteen intermediate styles of latent space in the StyleGAN domain, which are fed to the StyleGAN2 generator to produce the target aged image. The output aged face image is passed to an age predictor to estimate its age. The eighteen styles control features of the generated output image such as pose, hair, face shape and eyes. The loss between the estimated and the target age, along with other losses, is used to update the model so that it produces the aged version of the input face image at a size of 1024 × 1024. The UTKFace dataset has been used to train the model. The model is able to generate plausible progressed aged face images in the range of 1 to 80 for a single front-facing image.
Keywords
Generative Adversarial Networks, Pixel to Style to Pixel, StyleGAN, Face Aging, Latent Space
binary semantic. Finally, an individual's age is continually varied by shifting a latent vector perpendicular to the boundary. However, the further the latent vector is moved in one direction, the greater the change seen in the identity of the original data.

Vanilla GANs are effective at creating crisp images, but owing to model stability they are limited to small image dimensions. Progressive growing GAN [9] is a reliable method for training GAN models that produces large, high-quality images by gradually expanding the size of the model throughout the training procedure. Progressive GAN does not use batch normalization; instead it uses two other techniques, mini-batch standard deviation and pixel-wise normalization. After each convolution layer, the generator performs pixel-wise normalization, which normalizes each pixel value of the activation across the channels. This is a type of activation constraint known more broadly as local response normalization. In this GAN, the bias for each layer is set to zero, the model weights are initialized from a random Gaussian before being rescaled using the He weight-scaling technique, and the model is optimized with the Adam optimizer.
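The pixel-wise normalization step described above can be illustrated with a minimal PyTorch-style sketch; the module name and epsilon value are illustrative assumptions, not taken from the Progressive GAN code:

    import torch
    import torch.nn as nn

    class PixelwiseNorm(nn.Module):
        # Normalize each spatial position of an activation map across its channels,
        # as done after every convolution layer in the Progressive GAN generator.
        def __init__(self, eps: float = 1e-8):
            super().__init__()
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x has shape (batch, channels, height, width); the mean of the squared
            # activations is taken over the channel dimension for every pixel.
            return x / torch.sqrt(x.pow(2).mean(dim=1, keepdim=True) + self.eps)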
Pixel2Style2Pixel [10] introduces a technique that transforms the input image into an intermediate, extended latent vector w+ that can be used with the StyleGAN generator, allowing easy manipulation of facial attributes by traversing the extended latent vector w+. The encoder architecture from paper [10] is used unchanged to achieve the objective of this paper.

StyleGAN1 [11] is an advancement of the progressive growing GAN for generating high-resolution images. The StyleGAN generator no longer accepts a latent space point as input; instead, two new sources of randomness are employed to build a synthetic image: a separate mapping network and noise layers. The mapping network produces a vector that defines the styles and injects them at each point in the generator model via a new layer called adaptive instance normalization. This style vector provides control over the style of the resulting image. The output images of StyleGAN1 contain blob-like artifacts. In StyleGAN2 [12] the adaptive instance normalization is broken down into a modulation and a demodulation process, which is shown empirically to resolve the blob-like artifacts produced by StyleGAN1.

Or et al. [13] introduce a lifespan age transformation synthesis scheme that generates aged images by interpolating between age groups using a latent vector trained on the FFHQ-Aging data. Similarly, Yao et al. [14] describe the generation of high-resolution aged images with a model trained on FFHQ.

3. Methodology

The main framework of this paper is shown in Figure 3.

Figure 3: Model architecture.

3.1 Age Encoder

The age encoder is based on the Pixel2Style2Pixel encoder architecture; its input has four channels. The input image is combined with a randomly sampled target age, added as a constant-valued channel. The age encoder extracts feature maps at three spatial levels, corresponding to the fine, medium and coarse style groups of StyleGAN. From these three levels of spatial styles, map2style convolutional neural network blocks convert them into 18 different styles of latent vector codes. I_target_age is the 4-channel input comprising the 3 channels of the image and the 1 channel of age. The age vector is stacked onto the image tensor using vector broadcasting.

Age vector tensor = target age × ones(1, image width, image height)    (1)

I ≡ Input image
I_target_age = concatenate(I, target age)

The structure of the encoder is such that the input is fed to a conv2d block with 4 input channels, 64 filters, kernel size (3, 3), stride (1, 1) and padding (1, 1), followed by batch normalization and then a parametric rectified linear (PReLU) activation. The output of this convolution block is fed to 24 subsequent ResNet blocks. Each ResNet block is characterized by maxpool2d, followed by batch normalization, convolution, PReLU, a further conv2d and then batchnorm2d.
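The construction of the 4-channel input described by Equation (1) can be sketched as follows; the function name and shapes are illustrative assumptions rather than code from the paper:

    import torch

    def make_age_conditioned_input(image: torch.Tensor, target_age: float) -> torch.Tensor:
        # image: tensor of shape (3, H, W) holding the face image.
        # target_age: scalar target age in the range 1-80.
        # Returns a (4, H, W) tensor: 3 image channels plus 1 constant age channel.
        _, h, w = image.shape
        age_plane = torch.full((1, h, w), float(target_age))   # broadcast age to (1, H, W)
        return torch.cat([image, age_plane], dim=0)            # concatenate(I, target age)

    # Example: a 256x256 input image conditioned on target age 60.
    x = make_age_conditioned_input(torch.rand(3, 256, 256), target_age=60)
    print(x.shape)  # torch.Size([4, 256, 256])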
and is added to c3, and thus the resulting output is of dimension (512, 32, 32); call this style p2. Similarly, the output from c1 is again up-sampled and added to p2, giving p1 of dimension (512, 64, 64). The output of c3, the fine style, is fed to map2style blocks (1-3), which generate extended latent vectors of dimension 512; similarly, the output of p2 is fed to map2style blocks (4-7), which also produce 512-dimensional extended latent vectors, and the output of p1 is fed to map2style blocks (8-18), which likewise produce 512-dimensional latent vectors. Thus the age encoder in total produces an 18 × 512 extended latent vector. This latent vector is called extended because it lies in the StyleGAN domain.
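How the three spatial levels could be reduced to the 18 × 512 extended latent code is sketched below; the map2style block here is a simplified stand-in (strided convolutions down to a single 512-vector) and the 16 × 16 size assumed for c3 is an assumption, so this is not the exact architecture of the paper:

    import torch
    import torch.nn as nn

    class Map2Style(nn.Module):
        # Collapse a (512, S, S) feature map into one 512-d style vector with
        # strided convolutions (simplified stand-in for the pSp map2style block).
        def __init__(self, spatial: int):
            super().__init__()
            layers = []
            while spatial > 1:
                layers += [nn.Conv2d(512, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2)]
                spatial = (spatial + 1) // 2
            self.net = nn.Sequential(*layers)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x).flatten(1)  # (batch, 512)

    # Assumed spatial sizes: c3 -> 16x16 (styles 1-3), p2 -> 32x32 (styles 4-7),
    # p1 -> 64x64 (styles 8-18), giving 3 + 4 + 11 = 18 style vectors in total.
    blocks = nn.ModuleList(
        [Map2Style(16) for _ in range(3)]
        + [Map2Style(32) for _ in range(4)]
        + [Map2Style(64) for _ in range(11)]
    )
    c3, p2, p1 = torch.rand(1, 512, 16, 16), torch.rand(1, 512, 32, 32), torch.rand(1, 512, 64, 64)
    features = [c3] * 3 + [p2] * 4 + [p1] * 11
    w_plus = torch.stack([blk(f) for blk, f in zip(blocks, features)], dim=1)
    print(w_plus.shape)  # torch.Size([1, 18, 512])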
3.2 StyleGAN Generator

The StyleGAN generator is a pre-trained StyleGAN2 generator. Its input is the 18-style latent vector from the map2style blocks. The W space produced by map2style is separated from the image space, and its factors of variation are more linear in nature. In the StyleGAN2 generator, the style s_i modulates the convolution weight as

W'_{i,j,k} = s_i · w_{i,j,k}    (3)

Similarly, the convolution weight is demodulated as follows, where i is the input channel, j is the output channel and k is the kernel index:

W''_{i,j,k} = W'_{i,j,k} / sqrt( Σ_{i,k} (W'_{i,j,k})^2 + ε )    (4)
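Equations (3) and (4) can be written compactly as below; this is an illustrative sketch of the weight modulation/demodulation step, not the pre-trained generator's actual code:

    import torch

    def modulate_demodulate(weight: torch.Tensor, style: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        # weight: convolution weight of shape (out_channels j, in_channels i, kh, kw).
        # style:  per-input-channel scales s_i of shape (in_channels,).
        # Equation (3): scale every input channel of the weight by its style value.
        w_prime = weight * style.view(1, -1, 1, 1)
        # Equation (4): normalize each output channel j over input channels and kernel taps.
        denom = torch.sqrt(w_prime.pow(2).sum(dim=(1, 2, 3), keepdim=True) + eps)
        return w_prime / denom

    w = torch.randn(512, 512, 3, 3)   # convolution weight
    s = torch.rand(512) + 0.5         # style scales for the input channels
    w_demod = modulate_demodulate(w, s)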
Another feature, called path length regularization, has also been introduced; it encourages a step of constant size in W+ to produce a non-zero shift of constant magnitude in the image generated by the generator.
StyleGAN2 leverages residual connections with down-sampling in the discriminator and skip connections with up-sampling in the generator. At the beginning of the training period the contribution of the low-resolution layers is large, and the high-resolution layers subsequently take over. As a result, the generator starts with a learned constant and then proceeds through a sequence of blocks, with the feature map size being doubled at each block. Each block generates an RGB picture, which is then scaled and summed to give the final full-resolution RGB image.

The 18 styles help to control the features of the generated output image. The coarse styles help control pose, hair and face shape; similarly, the middle styles control features such as the eyes, and the fine styles control the color scheme of the face. Latent vectors 8 and 9 are used to control the hair style and color of the source image given the target image.

The problem of face aging is solved using a conditional GAN regression model. Here, the condition is the target age to which the model tries to convert the input image, producing the aged face image of that age. The model takes the age of the generated image as estimated by a state-of-the-art pre-trained age predictor. The L2 loss between the target age and the age predicted by the age predictor is also used to train the encoder network of the face aging model. Mathematically,

Aging loss = ∥ target age − AP(Net(I_target_age)) ∥_2    (5)

where AP is the age predictor and Net is the proposed aging network. In a vanilla GAN, the discriminator compares the generated image (fake) with the target image (true) to compute a BCE/MAE/MSE loss. Here, the age predictor uses the generated image to predict an estimated age. The L2 loss is then calculated by comparing the estimated age with the target age, and this loss, together with the other losses, is fed back to the age encoder network to update the weights of the model. The process repeats until the achievable loss saturates/converges. Hence the role the discriminator network plays in making the generator produce more realistic data is performed here by the age predictor, and thus this is regression in a GAN [15].
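The aging loss of Equation (5) can be sketched as follows; age_predictor and aging_net are placeholders for the pre-trained age predictor and the proposed network, and the MSE form of the L2 objective is an assumption:

    import torch
    import torch.nn.functional as F

    def aging_loss(aging_net, age_predictor, x_with_age: torch.Tensor, target_age: torch.Tensor) -> torch.Tensor:
        # aging_net:     age encoder + StyleGAN2 generator, maps the 4-channel input to an aged face.
        # age_predictor: frozen network that regresses an age from a face image.
        # x_with_age:    batch of 4-channel inputs (image + constant age plane).
        # target_age:    batch of target ages, shape (batch,).
        generated = aging_net(x_with_age)                      # generated aged face images
        estimated_age = age_predictor(generated).squeeze(-1)   # estimated ages, shape (batch,)
        return F.mse_loss(estimated_age, target_age.float())   # Equation (5)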
3.3 Training Objective function

The training objective for the proposed GAN network is the sum of the forward loss and the cyclic loss, which in aggregate need to be minimized.

3.4 L2 loss

The L2 loss is the MSE loss used to learn pixel-level similarity between the input image and the target aged face image. As age grows, the shape of the face changes, which encourages putting a higher weight on this loss.

L2(I_target_age) = ∥ I − Net(I_target_age) ∥_2    (6)

3.5 Cropped L2 loss

Here, the L2 loss is calculated for the cropped part of the face image to give more significance to the center of the face. The crop has been taken as image[13:227, 15:229, :], i.e. the cropped image considers rows 13 to 227 and columns 15 to 229 of the image for all 3 channels. The training image and its cropped part are discussed in the datasets section.

L2_cropped(I_target_age) = ∥ I_cropped − Net(I_target_age)_cropped ∥_2    (7)
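A short sketch of the pixel-wise L2 loss and its cropped variant (Equations (6) and (7)) follows; the crop indices are those stated above, while the loss weights in the final comment are illustrative placeholders rather than the paper's tuned values:

    import torch
    import torch.nn.functional as F

    def l2_losses(input_image: torch.Tensor, generated: torch.Tensor):
        # Both tensors have shape (batch, 3, H, W); the crop indices assume H, W >= 229.
        full_l2 = F.mse_loss(generated, input_image)                 # Equation (6)
        cropped_l2 = F.mse_loss(generated[:, :, 13:227, 15:229],     # Equation (7)
                                input_image[:, :, 13:227, 15:229])
        return full_l2, cropped_l2

    # Example combination of the loss terms (weights are placeholders):
    # total_loss = aging_loss + 1.0 * full_l2 + 1.0 * cropped_l2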