APM 598 Final Project Report

Applications of Generative Image Models

Akshat Sharma Aradhita Sharma Apoorva Uplap


[email protected] [email protected] [email protected]

Abstract

This paper describes applications of generative models for creating new
images. Generative models are a class of statistical models that learn an
existing distribution of samples and generate new examples that plausibly
come from it. Our project focuses on three applications of generative image
models: 1) image restoration, which restores old, deteriorated images;
2) image upscaling, which enlarges the restored image; and 3) neural style
transfer, which gives the images an artistic touch.

1 Introduction
Generative models have lately made news for applications such as creating fake images and
swapping faces in celebrity photos, but these applications pose a serious social
challenge: discriminating between real and fake images. Generative models are built
with unsupervised learning (analyzing the structure of unlabeled data), and once the
structure is learnt, new data that did not previously exist can be created. The two popular
kinds of generative models are the Generative Adversarial Network (GAN) and the Variational
Autoencoder (VAE). This project focuses on explaining three key applications of
generative modeling for images:
1. Image Restoration: restoring old, degraded photos using variational autoencoders.
2. Image Upscaling: enlarging and enhancing a small image using an enhanced super-resolution GAN.
3. Neural Style Transfer: generating a digital image that adopts the style of a different image.

2 Implementation
We implemented the project to test state-of-the-art generative networks for image
restoration and for image upscaling using super-resolution, and then used style transfer to
give the images an artistic touch. Further details for each of these methods are provided
below:

2.1 Microsoft Image Restoration


Research has shown that looking back at old photos triggers feelings of happiness and
love, and that reminiscing about special moments is more relaxing than meditating [5].
But old printed photos from before the digital era are deteriorating because of aging and
improper handling. There are methods to restore these photos using classical image
processing, but it is difficult to construct a signal-dependent noise filtering model or a
generalized image restoration model that way. Hence, an attempt is made to restore these
old images using a deep learning approach in order to construct a generalized model.

2.1.1 Previous image restoration methods

Image degradation is classified into structured and unstructured degradation. Blur,
camera misfocus, color fading, noise and low resolution are examples of unstructured
degradation, while patches, holes, marks and scratches are examples of structured
degradation, which is more challenging to deal with than the former. Image denoising,
deblurring, and local smoothness priors can be used to repair both structured and
unstructured degradation in old images. However, little attention is usually paid to
repairing color fading or poor resolution, and thus restored photos still appear outdated.

Real old photos exhibit a mixture of unknown degradations (any combination of
structured and unstructured degradation) that is difficult to characterize accurately,
which makes it hard to create a degradation model that realistically renders old-photo
artifacts. Hence, there is a need to construct a degradation model that includes real
degraded images as well as synthetically generated data.

2.1.2 Latent space translation

Previous deep learning methods used supervised learning, which did not give good results
for real old photos because the degradation model consisted of synthetically generated
degraded photos that were nowhere near similar to real old photos. As a result, a
domain gap is created between real old photos and the photos synthesized for training the
model. To reduce this domain gap, triplet domain translation is used to bridge the
domains of real old photos (R), synthetic photos (X) constructed for training, and the
ground-truth domain (Y) consisting of images without degradation. Images are denoted
r ∈ R, x ∈ X and y ∈ Y, and each domain is mapped to its corresponding latent space through
E_R: R → Z_R,  E_X: X → Z_X,  E_Y: Y → Z_Y.
The latent spaces of synthetic photos and real old photos are aligned in a shared domain
such that Z_R and Z_X are close to each other (Z_R ≈ Z_X), using variational
autoencoders (VAEs). This shared latent space is used for performing image restoration.
After latent space translation, a real old photo r can be restored by sequentially
performing the mappings
r_{R→Y} = G_Y ◦ T_Z ◦ E_R(r).

Figure 1 : Latent translation method for three domains [6]
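As a rough illustration of this composition, the sketch below uses toy stand-ins for the three trained components (E_R, T_Z and G_Y here are hypothetical Keras models, not the networks of [6]) to show how a restored image is obtained by encoding, translating in latent space, and decoding:

import tensorflow as tf

# Hypothetical stand-ins for the three trained components: E_R (encoder of the
# real/synthetic-photo VAE), T_Z (latent-space mapping), G_Y (clean-photo generator).
def make_encoder(latent_ch=64):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(latent_ch, 3, strides=2, padding="same"),
    ])

def make_latent_mapping(latent_ch=64):
    # operates entirely in latent space, so it keeps the latent resolution
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(latent_ch, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(latent_ch, 3, padding="same"),
    ])

def make_generator(out_ch=3):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(out_ch, 3, strides=2, padding="same", activation="sigmoid"),
    ])

E_R, T_Z, G_Y = make_encoder(), make_latent_mapping(), make_generator()

def restore(r):
    # r_{R->Y} = G_Y( T_Z( E_R(r) ) ): encode the old photo, translate its latent
    # code toward the clean domain, then decode with the clean-domain generator.
    return G_Y(T_Z(E_R(r)))

old_photo = tf.random.uniform((1, 256, 256, 3))  # placeholder for a degraded photo
restored = restore(old_photo)                    # shape (1, 256, 256, 3)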


2.1.3 Variational Autoencoders (VAEs)

Autoencoders take high-dimensional input data and compress it by passing it through an
encoder to create a smaller representation (of lower dimension than the input), known as
the "(bottleneck) latent space representation", which is then given as input to a decoder
that reconstructs the high-dimensional data. The reconstructed data and the input data are
compared to obtain an error function, and an iterative optimisation method is used to
train the neural network to generate the reconstructed output images. However, a plain
autoencoder produces unrealistic images when variation is expected, because of the
discontinuities in its latent space representation.

The variational autoencoder has a continuous latent space representation, because
the encoded distribution is regularized during training. Hence, VAEs can
generate new data with different variations. In a VAE, rather than a single point, the input
is encoded as a distribution (close to a standard normal distribution) over the latent
space. Here, the bottleneck vector is replaced by two separate independent vectors
representing the standard deviation (σ) and the mean (μ) of the encoded distribution. Samples
from this distribution (the sampled latent vector) are fed to the decoder network, which acts
as a generator. The divergence between the learned distribution and the standard normal
distribution is a loss term that is minimized along with the reconstruction loss. In order
to train this network by backpropagation, the reparameterization trick is used: a standard
normal variable ϵ is randomly sampled and is not learnt, whereas the mean and standard
deviation are updated in each iteration. The sampled latent vector is written as z = μ + σ ⊙ ϵ.

Figure 2 : Variational Autoencoder network diagram [7]
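The following minimal sketch (TensorFlow, with a small fully connected encoder and decoder and toy 28×28 inputs as assumptions) shows the reparameterized sample z = μ + σ⊙ϵ and the combined reconstruction and KL loss; it is not the restoration network itself:

import tensorflow as tf

latent_dim = 16

# Encoder outputs the mean and log-variance of the approximate posterior distribution.
encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(2 * latent_dim),               # [mu, log_var]
])

# Decoder maps a sampled latent vector back to image space (28x28 is illustrative).
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28, 1)),
])

def vae_loss(x):
    mu, log_var = tf.split(encoder(x), 2, axis=-1)
    eps = tf.random.normal(tf.shape(mu))                  # epsilon is sampled, not learned
    z = mu + tf.exp(0.5 * log_var) * eps                  # reparameterization: z = mu + sigma * eps
    x_hat = decoder(z)
    recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_hat), axis=[1, 2, 3]))
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1))
    return recon + kl                                     # reconstruction + KL divergence

x = tf.random.uniform((8, 28, 28, 1))                     # placeholder batch
loss = vae_loss(x)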

2.1.4 Image restoration through latent mapping

Figure 3 describes the architecture of the proposed network. Here, two VAEs are
used. One VAE covers old photos "R" and synthetic photos "X", with encoder E_{R,X} and
generator G_{R,X}; old photos and synthetic photos share this VAE, so that both kinds of
degraded images are mapped into the same shared latent space. The other VAE is used for
the output image domain "Y", with encoder E_Y and generator G_Y. VAEs are used because
they learn the mapping of old photos and synthetic photos and generalize well to real
photos by reducing the domain gap. Afterwards, image restoration is performed for the
synthetic pair {X, Y} using a mapping T composed of "ResBlocks" and "Partial Nonlocal
Blocks". The nonlocal blocks deal with structured degradation (patches, holes, scratches)
and the ResBlocks deal with unstructured degradation (blur, color fading, noise, low
resolution). The combination of these blocks enhances the capability of the latent
mapping. Given the aligned latent space Z_R ≈ Z_X, the generator G_Y then produces a
clean image without degradation.

Figure 3 : Image restoration network [6]
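A minimal sketch of the ResBlock part of such a mapping is given below (TensorFlow, with illustrative filter counts and latent resolution); the partial nonlocal block, which attends over the scratch-masked regions, is omitted for brevity:

import tensorflow as tf

def res_block(x, filters=64):
    # Two 3x3 convolutions plus an identity skip connection: the block only has
    # to learn a residual correction on top of its input latent features.
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    return tf.keras.layers.Add()([x, y])

# Toy latent mapping built from a few ResBlocks (illustrative latent resolution).
inp = tf.keras.Input(shape=(64, 64, 64))
out = inp
for _ in range(4):
    out = res_block(out)
mapping_T = tf.keras.Model(inp, out)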

2.1.5 Face refinement network

Old photos capture the special moments people reminisce about, which often include the
faces of loved ones. When generating restored images, unwanted textures are sometimes
observed on the generated faces. Therefore, a face refinement network is included to
recover the fine details of the faces present in the old photos from the latent space "z".
As a result, the perceptual quality of the faces is greatly enhanced.

Figure 4 : Face refinement network to enhance the quality of faces [8]


2.2 Image Up-Scaling

In many CSI movies, there's that scene where someone finds a small and obscured image,
and they get a clear picture out of it by zooming and enhancing it. Is this really possible?
Mostly no, those movies are nowhere near technically accurate. But, to some extent, yes.
It is indeed possible to enlarge and enhance images. The process of upscaling and
enhancing an image is called super-resolution.

2.2.1 Initial Ideas for Image Up-Scaling (Using SR-CNNs)

In information theory, there is a concept called the data processing inequality. It states
that no matter how we process data, we cannot add information that is not already there.
This implies that missing data cannot be recovered by further processing. Does that mean
super-resolution is theoretically impossible? Not if we have an additional source of
information.
A neural network can learn to hallucinate details based on prior information
it collects from a large set of images. The details added to an image this way still do
not violate the data processing inequality, because the information is there, somewhere
in the training set, even if it is not in the input image. First, we can create a dataset by
collecting high-resolution images and downscaling them, or we can simply use one of
the existing super-resolution datasets, such as the DIV2K dataset. Then, we can build a
convolutional neural network that takes only the low-resolution images as input, and
train it to produce higher-resolution images that best match the originals. As
shown in figure 5, the SRCNN [1] paper simply minimized the squared difference
between the pixel values to produce images that are as close as possible to the original
high-resolution images.

Figure 5 : Super Resolution-CNN
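A minimal SRCNN-style model can be sketched in Keras as follows; the 9-1-5 filter sizes follow [1], while the channel counts and the training call are illustrative assumptions:

import tensorflow as tf

# SRCNN-style model: the input is the low-resolution image after bicubic upscaling,
# and the network refines it. Filter sizes 9-1-5 follow the paper; channel counts
# (64, 32) are typical choices.
srcnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 9, padding="same", activation="relu",
                           input_shape=(None, None, 3)),
    tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(3, 5, padding="same"),
])

# Training minimizes the pixel-wise squared error against the true high-resolution image.
srcnn.compile(optimizer="adam", loss="mse")
# srcnn.fit(bicubic_upscaled_lr, ground_truth_hr, ...)  # hypothetical training tensors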

Before understanding super-resolution using GANs it will be good to know more about
how Generative Adversarial Networks work in general.

2.2.2 Generative Adversarial Networks (General Idea)

In GANs [2], we have two neural networks: a Generator and a Discriminator. The
generator tries to generate a realistic image and the discriminator tries to determine
whether that image is real or fake. Imagine a counterfeiter who wants to create an image
that looks identical to a real painting; it is obviously fake, but he takes it to a pawn
shop to try to get some money for it. The store owner then critiques the artwork to
determine whether or not it is real. This is exactly how GANs work: the counterfeiter is
the generator and the critic is the discriminator. We feed low-resolution images to the
generator and it creates a high-resolution image (the artwork), and then the discriminator
tries to tell whether it is fake or real. As can be seen in figure 6, there are two models
(both neural networks). The generator receives the input Z (which can be a low-resolution
image) and outputs X̂. X̂ is fed to the discriminator network, which compares X̂ with X,
where X is a real high-resolution image, and judges whether X̂ is real or fake.

Figure 6 : Generative Adversarial Networks

The loss is such that the generator is incentivized to generate X̂ such that X̂ ≈ X, and
the discriminator is incentivized to differentiate between X̂ and X.
In theory, the generator eventually becomes so good that it generates X̂
indistinguishable from X, and the discriminator says X̂ is real every time.
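A minimal sketch of this adversarial objective, using the standard binary cross-entropy losses, is given below (TensorFlow; generator and discriminator stand for any Keras models with compatible shapes, and the training step is illustrative rather than the exact procedure of any particular paper):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # The critic is rewarded for labelling real images as 1 and generated images as 0.
    return bce(tf.ones_like(real_logits), real_logits) + \
           bce(tf.zeros_like(fake_logits), fake_logits)

def generator_loss(fake_logits):
    # The counterfeiter is rewarded when the critic labels its output as real.
    return bce(tf.ones_like(fake_logits), fake_logits)

@tf.function
def train_step(generator, discriminator, g_opt, d_opt, z, x_real):
    # One simultaneous update of both players.
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        x_fake = generator(z, training=True)
        d_loss = discriminator_loss(discriminator(x_real, training=True),
                                    discriminator(x_fake, training=True))
        g_loss = generator_loss(discriminator(x_fake, training=True))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss

In the super-resolution setting, z is replaced by the low-resolution image and x_real by its high-resolution counterpart.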

2.2.3 Super Resolution-GAN

There is a GAN-based super-resolution system called SRGAN [3]. It uses a generator
network that takes low-resolution images as input and tries to produce their high-resolution
versions. It also uses a discriminator network that tries to tell whether a given image is a
real high-resolution image or an image upscaled by the generator. Both networks are trained
simultaneously, and they both get better over time. Once training is done, only the
generator is needed to upscale low-resolution images. In addition to this adversarial
training setup, SRGAN also used a VGG-based perceptual loss function.
Figure 7 : SRGAN architecture
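A rough sketch of such a VGG-based content loss is given below; choosing the block5_conv4 feature map is an assumption for illustration rather than the exact layer used in [3]:

import tensorflow as tf

# Frozen VGG19 feature extractor; SRGAN computes its content loss on VGG feature maps.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False

def vgg_content_loss(hr, sr):
    # Mean squared error between VGG feature maps of the real high-resolution image
    # and the generator output; both images are assumed to be in [0, 1].
    hr = tf.keras.applications.vgg19.preprocess_input(hr * 255.0)
    sr = tf.keras.applications.vgg19.preprocess_input(sr * 255.0)
    return tf.reduce_mean(tf.square(feature_extractor(hr) - feature_extractor(sr)))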

2.2.4 Enhanced Super Resolution-GAN

There's another paper called Enhanced SRGAN [4], which proposes a few tricks to
improve the results further. Enhanced-SRGAN, or ESRGAN for short, somehow got
popular in the gaming community. It was used for upscaling vintage games, and it
worked pretty well. It's surprising how well it worked on video game graphics despite
being trained only on natural images.

Figure 8 : ESR-GAN architecture

One of the enhancements made was the removal of batch normalization layers in
their network architecture. Batch normalization does help a lot for many computer vision
tasks. But for image-processing related tasks, such as super-resolution or image
restoration in general, batch normalization can create some artifacts. Researchers also
added more layers and connections to this model architecture. It's not surprising that a
more sophisticated model resulted in better images, but deeper models can be trickier to
train, especially if they are not using batch normalization layers. So, the authors of
ESRGAN used some tricks like residual scaling to stabilize the training of such a
network. In addition to the changes in the model architecture, they also modified the loss
functions. We have used ESR-GAN for implementation of super-resolution.
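The residual scaling trick mentioned above can be sketched as follows; the filter count and the scaling constant β = 0.2 are illustrative assumptions, not the exact ESRGAN configuration:

import tensorflow as tf

def scaled_residual_block(x, filters=64, beta=0.2):
    # Residual scaling: the residual branch is multiplied by a small constant before
    # being added back, which helps stabilize very deep networks trained without
    # batch normalization.
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    y = tf.keras.layers.LeakyReLU(0.2)(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    return x + beta * y

inp = tf.keras.Input(shape=(None, None, 64))
out = scaled_residual_block(scaled_residual_block(inp))
block_stack = tf.keras.Model(inp, out)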
2.3 Neural Style Transfer

In this section, we take an artistic image (the style), such as a Van Gogh painting or a
psychedelic image, and capture its features. The style is then applied to an ordinary
photograph (the content), and we can visualize the artistic result. The motivation for
producing such a style-transferred image is to imagine how a person would be painted by
Van Gogh, or simply artistic curiosity.

2.3.1 Fast Style Transfer with TF-Hub

The model available in TF-Hub was built by the team at Google Brain [9]. It was
trained on the ImageNet dataset [10] for content images and on the Kaggle Painter by
Numbers dataset [11] along with the Describable Textures Dataset [12] for style images.
The model consists of two networks, one for style prediction and another for style
transfer. The Style Prediction Network is loosely based on the Inception-v3 architecture
[13] and predicts an embedding vector S, which is the input to the Style Transfer
Network along with the content image. The Style Transfer Network largely follows [14].
The objective of the style transfer model is to minimize
ℒ_c(x, c) + λ_s ℒ_s(x, s),
where ℒ_c is the content loss, ℒ_s is the style loss, and λ_s is a Lagrange multiplier that
weights the relative strength of the style loss. The content and style losses are defined as

ℒ_c = Σ_{j∈C} (1/n_j) ‖f_j(x) − f_j(c)‖₂²

ℒ_s = Σ_{i∈S} (1/n_i) ‖G[f_i(x)] − G[f_i(s)]‖_F

where f_l(x) are the network activations in the l-th layer, n_l is the number of units in
the l-th layer, and G[f_l(x)] is the square, symmetric Gram matrix that measures the
spatially averaged correlation structure across the filters of the l-th layer activations.
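These losses translate directly into code. The sketch below mirrors the Gram-matrix computation used in our appendix implementation; the normalization by the number of spatial positions is the convention used there:

import tensorflow as tf

def gram_matrix(features):
    # G[f]: channel-by-channel correlations of the layer activations, averaged over
    # spatial positions, which keeps texture statistics and discards layout.
    result = tf.linalg.einsum('bijc,bijd->bcd', features, features)
    num_positions = tf.cast(tf.shape(features)[1] * tf.shape(features)[2], tf.float32)
    return result / num_positions

def content_loss(f_x, f_c):
    # squared distance between activations of the generated image and the content image
    return tf.reduce_mean(tf.square(f_x - f_c))

def style_loss(f_x, f_s):
    # distance between Gram matrices of the generated image and the style image
    return tf.reduce_mean(tf.square(gram_matrix(f_x) - gram_matrix(f_s)))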

2.3.2 Style Transfer with VGG19

Here, we implemented style transfer using the pretrained VGG-19 network [15]. First we
load the content image and test the VGG19 network to check whether the image
classification model predicts the correct label. We then load the VGG19 network without
the classification head, take its intermediate layers, and use them to represent the
content and style images, which is equivalent to the latent space representation of
generative networks. We can do this because, somewhere before the classification label is
predicted, the model acts as a feature extractor. By using the intermediate layers we
describe the content and style of the input images: the content of an image is given by
the values of the intermediate feature maps, and the style of the image by the means and
correlations across the various feature maps. After building the content and style tensor
extractor, we run gradient descent with the Adam optimizer, setting style and content
weights. We also regularize the high-frequency components of the image with a total
variation loss, which is computed from neighboring-pixel differences and therefore acts
essentially as an edge detector. We use TensorFlow's built-in total variation function for this.
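A condensed sketch of this setup is given below (the complete implementation is in the appendix). The layer names are the VGG19 layers we use; the random placeholder image only illustrates the expected input range:

import tensorflow as tf

# Intermediate VGG19 layers used as the feature extractor (classification head dropped).
content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
feature_extractor = tf.keras.Model(vgg.input, outputs)

# Placeholder image in [0, 1]; real inputs come from load_img in the appendix.
img = tf.random.uniform((1, 384, 512, 3))
features = feature_extractor(tf.keras.applications.vgg19.preprocess_input(img * 255.0))

# Total variation regularizer that penalizes high-frequency (edge-like) noise in the image.
tv = tf.image.total_variation(img)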
3 Results

3.1 Image Restoration

In this project, we give the old degraded photo as input, and the system removes the
unstructured degradation and returns a clean restored image as output. In order to restore
structurally degraded images, we need to tell the system that the image contains
scratches, so that it deals with both the unstructured and structured degradation and
returns a clean output image.

3.1.1 Restoration results for unstructured degraded images


It is seen in the resulting images that the color of the image is restored, noise and blur
are removed, the face is enhanced, and we obtain a clean image.

Table 1 : Results of image restoration for unstructured degradation


Degraded Image Restored Image Degraded Image Restored Image

3.1.2 Restoration results for structured degraded images (including scratches)

It is seen in the resulting images that scratches and rough patches present in the degraded
image are fixed, color is restored, the face is enhanced, and we obtain a clean image.
Table 2 : Results of image restoration for structured degradation
Degraded Image Restored Image

3.2 Image Upscaling using ESR-GAN

We took the restored image and passed it through the ESR-GAN generator. The results
were as follows -
Figure 9 : Low Resolution Image input to ESRGAN

Figure 10 : High Resolution Output from ESRGAN


3.3 Style Transfer

Content Image Psychedelic Style TF-Hub output Our output

Figure 11 : Style transfer on content image with TF-Hub and our model

We used the content-style image pairs as given in figure 12.

Figure 12 : Content-style image pairs


The content images, in order, are of an eagle, the Taj Mahal, a self-taken photograph, a
car, and the Golden Gate Bridge, obtained from [16].
The style images, in order, are of a psychedelic poster and famous paintings: The Starry
Night by Van Gogh, Guernica by Picasso, The Weeping Woman by Picasso, A Sunday Afternoon
on the Island of La Grande Jatte by Georges Seurat, and The Garden of Earthly Delights by
Hieronymus Bosch, obtained from [16].
We can see the gradual stylization of each content image by printing it at different
epochs. The following results showcase the progress of stylization:

Figure 13 : Gradual stylization of each content image at different epochs


4 Experiments

We conducted some experiments to test our model and find some empirical properties of
the generative models that we have used for image restoration, image up-scaling and style
transfer.

4.1 Image Restoration

Apart from the old images available in Microsoft's dataset, we tested this model with our
own old printed images to observe the model's efficiency. As observed in figure 14, the
faces present in the images are enhanced and the color is fixed; however, scratches and
unwanted patches are not completely removed (see the top right corner of the first
restored image and the left center of the second restored image). Datasets including
degraded images with such types of patches could help improve the model.

Degraded Image Restored Image

Figure 14 : Image restoration results for our old degraded photos

4.2 Image Up-Scaling

We conducted an experiment to see how the upscaled image compares to the original
image. We fed the image as given in figure 15 as the input to the network. The upscaled
image obtained is shown in figure 16.
Figure 15 : Input image given to the model
Figure 16 : Upscaled output image
If we take a close look at the face in figure 17, we have interesting observations -

Figure 17 : Side by side comparison of ESRGAN output and input

The ESRGAN applies some smoothing and paints in some of the details. We can
clearly see that the teeth are not visible in the input image, yet the generative model has
painted in those details (in this case teeth), which is undesirable. Also, taking a closer
look at the right ear, we can see that the model has failed to draw the ear properly. This
gives us some insight into the validity of our model and what kind of training data should
be used for further training. The ESRGAN model used here was trained for anime upscaling;
training the ESR-GAN model more on real-life datasets rather than on anime datasets could
help alleviate these problems.

4.3 Style Transfer

We conducted experiments to see how different content images are stylized using just one
style image. The following content images come from self-taken photographs and
screenshots [18] [19], while the style is the psychedelic image shown in figure 18. The
style vs. content loss for each is also plotted.

Figure 18 : Style image


Figure 19 : Psychedelic style transfer on different input images.

As we can see from using the same style for different content images, the features toward
the outer parts of the style image (the bubbles) are applied to the outer regions of the
content images in different areas. The style transfer is not uniform across content images,
and it is not a simple superimposition of the content and style images. The style features
are extracted and transferred in varying degrees and orientations according to the features
of the content image.
5 Discussion and Conclusion

5.1 Image Restoration

From our study, it can be concluded that the use of VAEs has helped in reducing the
domain gap and generating realistic clean images from old degraded images. This model
works efficiently for restoring old degraded images with unstructured degradations.
However, it is not as effective for some types of structured degradation (patches). We
used the Google Colab platform, which runs entirely in the cloud and provides access to a
GPU. We observed that CUDA runs out of memory when images with large file sizes are given
as input to this model. Hence, this model could be improved to handle large images and to
run well on local machines.

5.2 Image Up-Scaling

After the experiments, it can be concluded that although the results of ESR-GAN are
quite good, there is still room for improvement, especially with regard to the datasets
on which the model is trained. Also, since training GANs is extremely hard, the issues
noted in the experiments may also arise from overfitting. To conclude, the results are
already good and will only get better with more research.

5.3 Style Transfer


After experimenting with different content and style images, we conclude that style
transfer is a very useful tool for artists, designers, and the curious. Elmyr de Hory made
millions by forging artwork and selling it to art dealers and museums [17]. The skill
needed by a forger can now be emulated by computers to capture the artistic style of a
painting and apply it to different photographs. One can visualize the artwork of long-lost
painters without going to museums or galleries, simply by trying style transfer on one's
own images. This convenient tool could be included among the various filters and layouts
that a modern smartphone camera or any social media platform offers. We observed that
style transfer is best visualized for a content image whose colors contrast with those of
the applied style. Also, sometimes only the predominant features of the artwork are
captured, and the transfer of style may be only partial when the image and the style have
similar features.

6 References
[1] Image Super-Resolution Using Deep Convolutional Networks, https://arxiv.org/abs/1501.00092
[2] Generative Adversarial Networks, https://arxiv.org/abs/1406.2661
[3] Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, https://arxiv.org/abs/1609.04802
[4] ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, https://arxiv.org/abs/1809.00219
[5] Digitalcameraworld.com, "It's official: looking at old photos is more relaxing than meditating", https://www.digitalcameraworld.com/news/its-official-looking-at-old-photos-is-more-relaxing-than-meditating
[6] Wan, Ziyu, Zhang, Bo, Chen, Dongdong, Zhang, Pan, Chen, Dong, Liao, Jing, and Wen, Fang. Bringing Old Photos Back to Life. CVPR 2020, pp. 2744-2754, doi: 10.1109/CVPR42600.2020.00282.
[7] towardsdatascience.com, "Intuitively understanding variational autoencoders", https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf
[8] Z. Wan et al., "Old Photo Restoration via Deep Latent Space Translation," IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2022.3163183.
[9] Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. Proceedings of the British Machine Vision Conference (BMVC), 2017.
[10] ImageNet dataset, https://www.image-net.org/
[11] Kaggle Painter by Numbers, https://www.kaggle.com/competitions/painter-by-numbers/data
[12] Describable Textures Dataset, https://www.robots.ox.ac.uk/~vgg/data/dtd/
[13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. IEEE Computer Vision and Pattern Recognition (CVPR), 2015.
[14] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. International Conference on Learned Representations (ICLR), 2016.
[15] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[16] https://www.freeimages.com/
[17] C. Irving. Fake: the story of Elmyr de Hory: the greatest art forger of our time. McGraw-Hill, 1969.
[18] Eiichiro Oda, TOEI Animation, FUNimation Entertainment. (2022). One Piece: Episode 1015.
[19] Hajime Isayama, Wit Studio. (2013). Shingeki no Kyojin: Episode 12.
Appendix
Submitted to : Sebastien Motsch, Submitted by : Akshat Sharma ([email protected]) ,
Aradhita Sharma ([email protected]), Apoorva Uplap ([email protected])
#◢ Microsoft's Image Restoration
!git clone https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life.git photo_restoration

Cloning into 'photo_restoration'...


remote: Enumerating objects: 498, done.
remote: Total 498 (delta 0), reused 0 (delta 0), pack-reused 498

#◢ Set up the environment


# pull the syncBN repo
%cd photo_restoration/Face_Enhancement/models/networks
!git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
!cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
%cd ../../../

%cd Global/detection_models
!git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
!cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
%cd ../../

# download the landmark detection model


%cd Face_Detection/
!wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
!bzip2 -d shape_predictor_68_face_landmarks.dat.bz2
%cd ../

# download the pretrained model


%cd Face_Enhancement/
!wget https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Face_Enhancement/checkpoints.zip
!unzip checkpoints.zip
%cd ../

%cd Global/
!wget https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Global/checkpoints.zip
!unzip checkpoints.zip
%cd ../

/content/photo_restoration/Face_Enhancement/models/networks
Cloning into 'Synchronized-BatchNorm-PyTorch'...
remote: Enumerating objects: 188, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 188 (delta 10), reused 27 (delta 10), pack-reused 161
Cloning into 'Synchronized-BatchNorm-PyTorch'...
remote: Enumerating objects: 188, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 188 (delta 10), reused 27 (delta 10), pack-reused 161
Resolving dlib.net (dlib.net)... 107.180.26.78
Connecting to dlib.net (dlib.net)|107.180.26.78|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 64040097 (61M)
Saving to: ‘shape_predictor_68_face_landmarks.dat.bz2’

shape_predictor_68_ 100%[===================>]  61.07M  49.5MB/s    in 1.2s

2022-04-29 18:25:28 (49.5 MB/s) - ‘shape_predictor_68_face_landmarks.dat.bz2’ saved [64040097/64040097]

/content/photo_restoration
/content/photo_restoration/Face_Enhancement
--2022-04-29 18:25:41--  https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Face_Enhancement/checkpoints.zip
Resolving facevc.blob.core.windows.net
(facevc.blob.core.windows.net)... 20.150.78.196
Connecting to facevc.blob.core.windows.net
(facevc.blob.core.windows.net)|20.150.78.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 684354563 (653M) [application/x-zip-compressed]
Saving to: ‘checkpoints.zip’

checkpoints.zip     100%[===================>] 652.65M  18.5MB/s    in 28s

2022-04-29 18:26:09 (23.3 MB/s) - ‘checkpoints.zip’ saved [684354563/684354563]

Archive: checkpoints.zip
creating: checkpoints/
creating: checkpoints/Setting_9_epoch_100/
inflating: checkpoints/Setting_9_epoch_100/latest_net_G.pth
creating: checkpoints/FaceSR_512/
inflating: checkpoints/FaceSR_512/latest_net_G.pth
/content/photo_restoration
/content/photo_restoration/Global
--2022-04-29 18:26:17--  https://facevc.blob.core.windows.net/zhanbo/old_photo/pretrain/Global/checkpoints.zip
Resolving facevc.blob.core.windows.net
(facevc.blob.core.windows.net)... 20.150.78.196
Connecting to facevc.blob.core.windows.net
(facevc.blob.core.windows.net)|20.150.78.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2036400762 (1.9G) [application/x-zip-compressed]
Saving to: ‘checkpoints.zip’

checkpoints.zip     100%[===================>]   1.90G  23.1MB/s    in 97s

2022-04-29 18:27:54 (20.0 MB/s) - ‘checkpoints.zip’ saved [2036400762/2036400762]

Archive: checkpoints.zip
creating: checkpoints/
creating: checkpoints/restoration/
creating: checkpoints/restoration/VAE_B_scratch/
inflating: checkpoints/restoration/VAE_B_scratch/latest_net_G.pth
inflating:
checkpoints/restoration/VAE_B_scratch/latest_optimizer_G.pth
inflating:
checkpoints/restoration/VAE_B_scratch/latest_optimizer_D.pth
inflating: checkpoints/restoration/VAE_B_scratch/latest_net_D.pth
creating: checkpoints/restoration/VAE_A_quality/
inflating: checkpoints/restoration/VAE_A_quality/latest_net_G.pth
inflating:
checkpoints/restoration/VAE_A_quality/latest_net_featD.pth
inflating:
checkpoints/restoration/VAE_A_quality/latest_optimizer_G.pth
inflating:
checkpoints/restoration/VAE_A_quality/latest_optimizer_D.pth
inflating:
checkpoints/restoration/VAE_A_quality/latest_optimizer_featD.pth
inflating: checkpoints/restoration/VAE_A_quality/latest_net_D.pth
creating: checkpoints/restoration/mapping_Patch_Attention/
inflating:
checkpoints/restoration/mapping_Patch_Attention/latest_net_mapping_net
.pth
inflating:
checkpoints/restoration/mapping_Patch_Attention/latest_net_D.pth
creating: checkpoints/restoration/mapping_quality/
inflating:
checkpoints/restoration/mapping_quality/latest_net_mapping_net.pth
inflating:
checkpoints/restoration/mapping_quality/latest_optimizer_mapping_net.p
th
inflating:
checkpoints/restoration/mapping_quality/latest_optimizer_D.pth
inflating: checkpoints/restoration/mapping_quality/latest_net_D.pth

creating: checkpoints/restoration/mapping_scratch/
inflating:
checkpoints/restoration/mapping_scratch/latest_net_mapping_net.pth
inflating:
checkpoints/restoration/mapping_scratch/latest_optimizer_mapping_net.p
th
inflating:
checkpoints/restoration/mapping_scratch/latest_optimizer_D.pth
inflating: checkpoints/restoration/mapping_scratch/latest_net_D.pth

creating: checkpoints/restoration/VAE_B_quality/
inflating: checkpoints/restoration/VAE_B_quality/latest_net_G.pth
inflating:
checkpoints/restoration/VAE_B_quality/latest_optimizer_G.pth
inflating:
checkpoints/restoration/VAE_B_quality/latest_optimizer_D.pth
inflating: checkpoints/restoration/VAE_B_quality/latest_net_D.pth
creating: checkpoints/detection/
inflating: checkpoints/detection/FT_Epoch_latest.pt
/content/photo_restoration

! pip install -r requirements.txt

Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-


packages (from -r requirements.txt (line 1)) (1.11.0+cu113)
Requirement already satisfied: torchvision in
/usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line
2)) (0.12.0+cu113)
Requirement already satisfied: dlib in /usr/local/lib/python3.7/dist-
packages (from -r requirements.txt (line 3)) (19.18.0)
Requirement already satisfied: scikit-image in
/usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line
4)) (0.18.3)
Requirement already satisfied: easydict in
/usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line
5)) (1.9)
Requirement already satisfied: PyYAML in
/usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line
6)) (3.13)
Collecting dominate>=2.3.1
Downloading dominate-2.6.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-
packages (from -r requirements.txt (line 8)) (0.3.4)
Collecting tensorboardX
Downloading tensorboardX-2.5-py2.py3-none-any.whl (125 kB)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages
(from -r requirements.txt (line 10)) (1.4.1)
Requirement already satisfied: opencv-python in
/usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line
11)) (4.1.2.30)
Collecting einops
Downloading einops-0.4.1-py3-none-any.whl (28 kB)
Collecting PySimpleGUI
Downloading PySimpleGUI-4.59.0-py3-none-any.whl (493 kB)
Requirement already satisfied: typing-extensions in
/usr/local/lib/python3.7/dist-packages (from torch->-r
requirements.txt (line 1)) (4.2.0)
Requirement already satisfied: requests in
/usr/local/lib/python3.7/dist-packages (from torchvision->-r
requirements.txt (line 2)) (2.23.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-
packages (from torchvision->-r requirements.txt (line 2)) (1.21.6)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in
/usr/local/lib/python3.7/dist-packages (from torchvision->-r
requirements.txt (line 2)) (7.1.2)
Requirement already satisfied: tifffile>=2019.7.26 in
/usr/local/lib/python3.7/dist-packages (from scikit-image->-r
requirements.txt (line 4)) (2021.11.2)
Requirement already satisfied: imageio>=2.3.0 in
/usr/local/lib/python3.7/dist-packages (from scikit-image->-r
requirements.txt (line 4)) (2.4.1)
Requirement already satisfied: networkx>=2.0 in
/usr/local/lib/python3.7/dist-packages (from scikit-image->-r
requirements.txt (line 4)) (2.6.3)
Requirement already satisfied: PyWavelets>=1.1.1 in
/usr/local/lib/python3.7/dist-packages (from scikit-image->-r
requirements.txt (line 4)) (1.3.0)
Requirement already satisfied: matplotlib!=3.0.0,>=2.0.0 in
/usr/local/lib/python3.7/dist-packages (from scikit-image->-r
requirements.txt (line 4)) (3.2.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!
=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from
matplotlib!=3.0.0,>=2.0.0->scikit-image->-r requirements.txt (line 4))
(3.0.8)
Requirement already satisfied: kiwisolver>=1.0.1 in
/usr/local/lib/python3.7/dist-packages (from matplotlib!
=3.0.0,>=2.0.0->scikit-image->-r requirements.txt (line 4)) (1.4.2)
Requirement already satisfied: cycler>=0.10 in
/usr/local/lib/python3.7/dist-packages (from matplotlib!
=3.0.0,>=2.0.0->scikit-image->-r requirements.txt (line 4)) (0.11.0)
Requirement already satisfied: python-dateutil>=2.1 in
/usr/local/lib/python3.7/dist-packages (from matplotlib!
=3.0.0,>=2.0.0->scikit-image->-r requirements.txt (line 4)) (2.8.2)
Requirement already satisfied: six>=1.5 in
/usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1-
>matplotlib!=3.0.0,>=2.0.0->scikit-image->-r requirements.txt (line
4)) (1.15.0)
Requirement already satisfied: protobuf>=3.8.0 in
/usr/local/lib/python3.7/dist-packages (from tensorboardX->-r
requirements.txt (line 9)) (3.17.3)
Requirement already satisfied: chardet<4,>=3.0.2 in
/usr/local/lib/python3.7/dist-packages (from requests->torchvision->-r
requirements.txt (line 2)) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in
/usr/local/lib/python3.7/dist-packages (from requests->torchvision->-r
requirements.txt (line 2)) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in
/usr/local/lib/python3.7/dist-packages (from requests->torchvision->-r
requirements.txt (line 2)) (2021.10.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
in /usr/local/lib/python3.7/dist-packages (from requests->torchvision-
>-r requirements.txt (line 2)) (1.24.3)
Installing collected packages: tensorboardX, PySimpleGUI, einops,
dominate
Successfully installed PySimpleGUI-4.59.0 dominate-2.6.0 einops-0.4.1
tensorboardX-2.5

#◢ Run the code

Restore photos (normal mode)


%cd /content/photo_restoration/
input_folder = "test_images/old"
output_folder = "output"

import os
basepath = os.getcwd()
input_path = os.path.join(basepath, input_folder)
output_path = os.path.join(basepath, output_folder)
os.mkdir(output_path)
!rm -rf /content/photo_restoration/output/*
!python run.py --input_folder /content/photo_restoration/test_images/old --output_folder /content/photo_restoration/output/ --GPU 0

/content/photo_restoration
Running Stage 1: Overall restoration
Mapping: You are using the mapping model without global restoration.
Now you are processing a.png
Now you are processing b.png
Now you are processing c.png
Now you are processing d.png
Now you are processing e.png
Now you are processing f.png
Now you are processing g.png
Now you are processing h.png
Finish Stage 1 ...

Running Stage 2: Face Detection


Warning: There is no face in b.png
Warning: There is no face in d.png
1
1
Warning: There is no face in e.png
1
1
Warning: There is no face in f.png
Finish Stage 2 ...

Running Stage 3: Face Enhancement


The main GPU is
0
dataset [FaceTestDataset] of size 4 was created
The size of the latent vector size is [8,8]
Network [SPADEGenerator] was created. Total number of parameters: 92.1
million. To see the architecture, do print(network).
hi :)
Finish Stage 3 ...

Running Stage 4: Blending


Warning: There is no face in b.png
Warning: There is no face in d.png
Warning: There is no face in e.png
Warning: There is no face in f.png
Finish Stage 4 ...

All the processing is done. Please check the results.

import io
import IPython.display
import numpy as np
import PIL.Image

def imshow(a, format='png', jpeg_fallback=True):
    a = np.asarray(a, dtype=np.uint8)
    data = io.BytesIO()
    PIL.Image.fromarray(a).save(data, format)
    im_data = data.getvalue()
    try:
        disp = IPython.display.display(IPython.display.Image(im_data))
    except IOError:
        if jpeg_fallback and format != 'jpeg':
            print(('Warning: image was too large to display in format "{}"; '
                   'trying jpeg instead.').format(format))
            return imshow(a, format='jpeg')
        else:
            raise
    return disp

def make_grid(I1, I2, resize=True):
    I1 = np.asarray(I1)
    H, W = I1.shape[0], I1.shape[1]

    if I1.ndim >= 3:
        I2 = np.asarray(I2.resize((W, H)))
        I_combine = np.zeros((H, W*2, 3))
        I_combine[:, :W, :] = I1[:, :, :3]
        I_combine[:, W:, :] = I2[:, :, :3]
    else:
        I2 = np.asarray(I2.resize((W, H)).convert('L'))
        I_combine = np.zeros((H, W*2))
        I_combine[:, :W] = I1[:, :]
        I_combine[:, W:] = I2[:, :]
    I_combine = PIL.Image.fromarray(np.uint8(I_combine))

    W_base = 600
    if resize:
        ratio = W_base / (W*2)
        H_new = int(H * ratio)
        I_combine = I_combine.resize((W_base, H_new), PIL.Image.LANCZOS)

    return I_combine

filenames = os.listdir(os.path.join(input_path))
filenames.sort()

for filename in filenames:
    print(filename)
    image_original = PIL.Image.open(os.path.join(input_path, filename))
    image_restore = PIL.Image.open(os.path.join(output_path, 'final_output', filename))

    display(make_grid(image_original, image_restore))

a.png
b.png
c.png

d.png

e.png
f.png
g.png

h.png

Restore the photos with scratches


!rm -rf /content/photo_restoration/output/*
!python run.py --input_folder /content/photo_restoration/test_images/old_w_scratch/ --output_folder /content/photo_restoration/output/ --GPU 0 --with_scratch
Running Stage 1: Overall restoration
initializing the dataloader
model weights loaded
directory of testing image:
/content/photo_restoration/test_images/old_w_scratch
processing a.png
processing ab.png
processing b.png
processing c.png
processing d.png
You are using NL + Res
Now you are processing a.png
Now you are processing ab.png
Now you are processing b.png
Now you are processing c.png
Now you are processing d.png
Finish Stage 1 ...

Running Stage 2: Face Detection


1
2
1
1
Warning: There is no face in ab.png
Finish Stage 2 ...

Running Stage 3: Face Enhancement


The main GPU is
0
dataset [FaceTestDataset] of size 5 was created
The size of the latent vector size is [8,8]
Network [SPADEGenerator] was created. Total number of parameters: 92.1
million. To see the architecture, do print(network).
hi :)
Finish Stage 3 ...

Running Stage 4: Blending


Warning: There is no face in ab.png
Finish Stage 4 ...

All the processing is done. Please check the results.

%cd /content/photo_restoration/
import os
import io
import IPython.display
import numpy as np
import PIL.Image

basepath = os.getcwd()
input_folder = "test_images/old_w_scratch"
output_folder = "output"
input_path = os.path.join(basepath, input_folder)
output_path = os.path.join(basepath, output_folder)

filenames = os.listdir(os.path.join(input_path))
filenames.sort()

for filename in filenames:
    print("file", filename)
    image_original = PIL.Image.open(os.path.join(input_path, filename))
    image_restore = PIL.Image.open(os.path.join(output_path, 'final_output', filename))

    display(make_grid(image_original, image_restore))

/content/photo_restoration
file a.png

file ab.png
file b.png

file c.png
file d.png

#◢ SUPER-RESOLUTION (ESRGAN)
import os
import time
from PIL import Image
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
os.environ["TFHUB_DOWNLOAD_PROGRESS"] = "True"

# Declaring Constants
#IMAGE_PATH = "original.png"
IMAGE_PATH = "/content/photo_restoration/output/final_output/b.png"
SAVED_MODEL_PATH = "https://tfhub.dev/captain-pool/esrgan-tf2/1"
def preprocess_image(image_path):
    """Loads image from path and preprocesses to make it model ready.
    Args:
      image_path: Path to the image file
    """
    hr_image = tf.image.decode_image(tf.io.read_file(image_path))
    # If PNG, remove the alpha channel. The model only supports
    # images with 3 color channels.
    if hr_image.shape[-1] == 4:
        hr_image = hr_image[..., :-1]
    hr_size = (tf.convert_to_tensor(hr_image.shape[:-1]) // 4) * 4
    hr_image = tf.image.crop_to_bounding_box(hr_image, 0, 0, hr_size[0], hr_size[1])
    hr_image = tf.cast(hr_image, tf.float32)
    return tf.expand_dims(hr_image, 0)

def save_image(image, filename):
    """Saves unscaled Tensor Images.
    Args:
      image: 3D image tensor. [height, width, channels]
      filename: Name of the file to save.
    """
    if not isinstance(image, Image.Image):
        image = tf.clip_by_value(image, 0, 255)
        image = Image.fromarray(tf.cast(image, tf.uint8).numpy())
    image.save("%s.jpg" % filename)
    print("Saved as %s.jpg" % filename)

%matplotlib inline
def plot_image(image, title=""):
    """Plots images from image tensors.
    Args:
      image: 3D image tensor. [height, width, channels].
      title: Title to display in the plot.
    """
    image = np.asarray(image)
    image = tf.clip_by_value(image, 0, 255)
    image = Image.fromarray(tf.cast(image, tf.uint8).numpy())
    plt.imshow(image)
    plt.axis("off")
    plt.title(title)

hr_image = preprocess_image(IMAGE_PATH)

# Plotting Original Resolution image


plot_image(tf.squeeze(hr_image), title="Original Image")
save_image(tf.squeeze(hr_image), filename="Original Image")

Saved as Original Image.jpg


model = hub.load(SAVED_MODEL_PATH)

Downloaded https://tfhub.dev/captain-pool/esrgan-tf2/1, Total size:


20.60MB

start = time.time()
fake_image = model(hr_image)
fake_image = tf.squeeze(fake_image)
print("Time Taken: %f" % (time.time() - start))

Time Taken: 13.227104

# Plotting Super Resolution Image


plot_image(tf.squeeze(fake_image), title="Super Resolution")
save_image(tf.squeeze(fake_image), filename="Super Resolution")

Saved as Super Resolution.jpg


from google.colab.patches import cv2_imshow
import cv2

img = cv2.imread("/content/photo_restoration/test_images/old_w_scratch/b.png")
cv2_imshow(img)
img = cv2.imread("/content/photo_restoration/output/final_output/b.png")
cv2_imshow(img)
img = cv2.imread("/content/photo_restoration/Super Resolution.jpg")
cv2_imshow(img)
from google.colab.patches import cv2_imshow
import cv2

img = cv2.imread("/content/photo_restoration/test_images/old_w_scratch/ab.png")
cv2_imshow(img)
img = cv2.imread("/content/photo_restoration/output/final_output/ab.png")
cv2_imshow(img)
img = cv2.imread("/content/photo_restoration/Super Resolution.jpg")
cv2_imshow(img)
Neural Style Transfer
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

import os
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import warnings
import random
from tensorflow import keras
from keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
import numpy as np # for math and arrays
import pandas as pd

# Load compressed models from tensorflow_hub


os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED'

import IPython.display as display


import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12, 12)
mpl.rcParams['axes.grid'] = False

import PIL.Image
import time
import functools

# Function to convert a TF tensor to image


def tensor_to_image(tensor):
    tensor = tensor*255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)

# Function to load an image as a tensor from specified path


def load_img(path_to_img):
    max_dim = 512
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)

    img = tf.image.resize(img, new_shape)
    img = img[tf.newaxis, :]
    return img

# Function to display an image


def imshow(image, title=None):
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)

    plt.imshow(image)
    if title:
        plt.title(title)

Load the Content And Style Images


from google.colab import files
uploaded = files.upload()

<IPython.core.display.HTML object>

Saving Psychedelic.jpeg to Psychedelic.jpeg

#content_image = load_img('drive/MyDrive/StyleTransfer/Content/Eagle.jpg')
content_image = load_img('/content/photo_restoration/output/final_output/b.png')

style_image = load_img('Psychedelic.jpeg')

plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')

plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
Show the Output given by the TF Hub Style Transfer Model
import tensorflow_hub as hub
hub_model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
stylized_image = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized_image)

Downloaded https://tfhub.dev/google/magenta/arbitrary-image-
stylization-v1-256/2, Total size: 89.77MB
Load the VGG19 model from Keras
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape

Downloading data from https://storage.googleapis.com/tensorflow/keras-


applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels.h5
574717952/574710816 [==============================] - 6s 0us/step
574726144/574710816 [==============================] - 6s 0us/step

TensorShape([1, 1000])

Predict the label for the image to test whether VGG is working for Image classification
predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]

Downloading data from


https://storage.googleapis.com/download.tensorflow.org/data/imagenet_c
lass_index.json
40960/35363 [==================================] - 0s 0us/step
49152/35363 [=========================================] - 0s 0us/step

[('bow_tie', 0.27081335),
('military_uniform', 0.077421315),
('cowboy_hat', 0.05717317),
('seat_belt', 0.023244375),
('barbershop', 0.022434594)]

Print the layers of the VGG19 network


vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')

print()
for layer in vgg.layers:
    print(layer.name)

Downloading data from https://storage.googleapis.com/tensorflow/keras-


applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80142336/80134624 [==============================] - 1s 0us/step
80150528/80134624 [==============================] - 1s 0us/step

input_2
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool

Assign intermediate layers for style and content


content_layers = ['block5_conv2']

style_layers = ['block1_conv1',
'block2_conv1',
'block3_conv1',
'block4_conv1',
'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

def vgg_layers(layer_names):
    """Creates a vgg model that returns a list of intermediate output values."""
    # Load our model. Load pretrained VGG, trained on imagenet data
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False

    outputs = [vgg.get_layer(name).output for name in layer_names]

    model = tf.keras.Model([vgg.input], outputs)
    return model

style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

#Look at the statistics of each layer's output


for name, output in zip(style_layers, style_outputs):
    print(name)
    print("  shape: ", output.numpy().shape)
    print("  min: ", output.numpy().min())
    print("  max: ", output.numpy().max())
    print("  mean: ", output.numpy().mean())
    print()

block1_conv1
shape: (1, 384, 512, 64)
min: 0.0
max: 797.95276
mean: 35.36682

block2_conv1
shape: (1, 192, 256, 128)
min: 0.0
max: 5022.046
mean: 198.26442

block3_conv1
shape: (1, 96, 128, 256)
min: 0.0
max: 8470.7705
mean: 225.4287

block4_conv1
shape: (1, 48, 64, 512)
min: 0.0
max: 21628.117
mean: 784.6738

block5_conv1
shape: (1, 24, 32, 512)
min: 0.0
max: 3131.514
mean: 53.838207

# Function to calculate the Gram Matrix


def gram_matrix(input_tensor):
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    return result/(num_locations)

class StyleContentModel(tf.keras.models.Model):
    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.vgg.trainable = False

    def call(self, inputs):
        "Expects float input in [0,1]"
        inputs = inputs*255.0
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        outputs = self.vgg(preprocessed_input)
        style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                          outputs[self.num_style_layers:])

        style_outputs = [gram_matrix(style_output)
                         for style_output in style_outputs]

        content_dict = {content_name: value
                        for content_name, value
                        in zip(self.content_layers, content_outputs)}

        style_dict = {style_name: value
                      for style_name, value
                      in zip(self.style_layers, style_outputs)}

        return {'content': content_dict, 'style': style_dict}

extractor = StyleContentModel(style_layers, content_layers)

results = extractor(tf.constant(content_image))

print('Styles:')
for name, output in sorted(results['style'].items()):
    print("  ", name)
    print("    shape: ", output.numpy().shape)
    print("    min: ", output.numpy().min())
    print("    max: ", output.numpy().max())
    print("    mean: ", output.numpy().mean())
    print()

print("Contents:")
for name, output in sorted(results['content'].items()):
    print("  ", name)
    print("    shape: ", output.numpy().shape)
    print("    min: ", output.numpy().min())
    print("    max: ", output.numpy().max())
    print("    mean: ", output.numpy().mean())

Styles:
block1_conv1
shape: (1, 64, 64)
min: 0.008798338
max: 20073.625
mean: 146.20078

block2_conv1
shape: (1, 128, 128)
min: 0.0
max: 24671.422
mean: 5006.4023

block3_conv1
shape: (1, 256, 256)
min: 0.0
max: 105253.766
mean: 4994.0215
block4_conv1
shape: (1, 512, 512)
min: 0.0
max: 1066156.0
mean: 71412.266

block5_conv1
shape: (1, 512, 512)
min: 0.0
max: 73688.96
mean: 636.2896

Contents:
block5_conv2
shape: (1, 32, 32, 512)
min: 0.0
max: 1013.4015
mean: 8.951838

style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

image = tf.Variable(content_image)

def clip_0_1(image):
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

style_weight=1e-2
content_weight=1e4

rank_1_tensor = tf.constant([])

def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss

@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))

Run the train_step function a couple of times to check if it is working.


train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
Add total variation loss to the loss function.
tf.image.total_variation(image).numpy()

array([53277.508], dtype=float32)

total_variation_weight=30

@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)
        loss += total_variation_weight*tf.image.total_variation(image)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))

image = tf.Variable(content_image)

Run the Style Transfer Model


import time
start = time.time()

epochs = 50
steps_per_epoch = 100

step = 0
for n in range(epochs):
    for m in range(steps_per_epoch):
        step += 1
        train_step(image)
        print(".", end='', flush=True)
    display.clear_output(wait=True)
    display.display(tensor_to_image(image))
    print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))
Train step: 5000
Total time: 1658.4

Download the stylized output image


file_name = 'stylized-image.png'
tensor_to_image(image).save(file_name)

try:
    from google.colab import files
except ImportError:
    pass
else:
    files.download(file_name)

<IPython.core.display.Javascript object>
<IPython.core.display.Javascript object>
