
CHAPTER 10

Variational auto-encoder

OUTLINE
• Introduction to Auto-Encoder (AE)
• Introduction to Variational Auto-Encoder (VAE)
• Implementing the VAE model for reconstructing images

10.1 Introduction to auto-encoder


The aim of an Auto-Encoder (AE) is to learn to compress data while minimizing the error in reconstructing it. To
accomplish this, an AE is composed of two main parts: an encoder and a decoder [1–3]. The encoder is responsible
for compressing the input into a lower-dimensional latent space representation. This latent representation is referred
to as the code, and the decoder decodes the code back to the input. In training the AE model, the goal is for the
output of the decoder and the input of the encoder to be as similar as possible. For example, an MNIST handwritten
image of size 28 × 28 is compressed into a 2D vector code by the encoder, and then the resulting code is decoded back
to the original 28 × 28 image through the decoder. Fig. 10.1 presents the training schematic diagram of the AE model.

(Figure: Input → Encoder → Code (latent space representation, e.g., 2 dimensions) → Decoder → Output; the loss is computed between the Output and the Input.)
FIG. 10.1 Auto-Encoder (AE) training schematic diagram.
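To make Fig. 10.1 concrete, the following is a minimal AE sketch in Keras. It is not the book's exact architecture; the layer sizes are illustrative assumptions, but it shows the encoder compressing a 28 × 28 image to a 2-D code and the decoder reconstructing the image from that code.

from tensorflow import keras

# Minimal Auto-Encoder sketch (illustrative layer sizes, not the book's exact model)
img_inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Flatten()(img_inputs)
x = keras.layers.Dense(128, activation='relu')(x)
code = keras.layers.Dense(2)(x)  # 2-D latent code

x = keras.layers.Dense(128, activation='relu')(code)
x = keras.layers.Dense(28 * 28, activation='sigmoid')(x)
img_outputs = keras.layers.Reshape((28, 28, 1))(x)

ae = keras.Model(img_inputs, img_outputs, name='autoencoder')
# The training target is the input image itself, so the loss
# measures the difference between the Output and the Input.
ae.compile(optimizer='adam', loss='binary_crossentropy')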

However, when training the AE model, there are no constraints on the latent representations generated by the
encoder, so it cannot be ensured that every output generated by the decoder has meaningful attributes of the original
input data. For example, 225 sets of 2D vector codes, with values linearly sampled from −1.5 to +1.5, are sent to the
decoder of an AE model to generate 225 images. As observed in Fig. 10.2, most of the output images are unrecognizable;
only the images "0," "3," "5," and "8" in the lower right corner can be identified. In the next section, we introduce the
Variational Auto-Encoder (VAE), which adds constraints on the latent representations to address this problem.


FIG. 10.2 Example of generated images of the Auto-Encoder (AE) model. Not all generated images can be recognized.

10.2 Introduction to variational auto-encoder

10.2.1 Introduction to VAE


VAE [4] is an advanced version of AE that learns a data-generating distribution, allowing it to sample random codes,
or latent representations, from the latent distribution to generate output data with characteristics similar to those of
the input data. Fig. 10.3 shows the differences between VAE and AE.

(Figure: (A) Auto-Encoder: input data → Encoder → latent representation → Decoder → output data. (B) Variational Auto-Encoder: input data → Encoder → latent distribution → sampling → sampled latent representation → Decoder → output data.)


FIG. 10.3 Differences between Auto-Encoder (AE) and Variational Auto-Encoder (VAE).

Fig. 10.4 shows the operation concept of the VAE. As shown, the encoder receives input data and outputs two vectors,
a vector of means (μ) and a vector of variances (σ²), for generating the latent distribution. Then, a code, or latent
representation, is randomly sampled from the latent distribution and passed through the decoder to generate the
output data.

FIG. 10.4 Conceptual diagram of Variational Auto-Encoder (VAE).


As for why the VAE samples its codes randomly from latent distributions, here is an intuitive example. As shown in
Fig. 10.5, two images, A and B, are used as inputs to the encoder. At the output of the encoder, two codes, Code A and
Code B, are sampled and sent to the decoder to produce Output A and Output B, respectively. Because the two
distributions intersect, an additional Output C can be produced from this intersection, lying between Output A and
Output B. This design gives the outputs of the VAE a continuous relationship, such as the continuous relationship
between Output A, Output B, and Output C.

FIG. 10.5 Advantages of using latent distributions in VAE.

10.2.2 Operation of VAE


As introduced previously, instead of directly outputting values for the latent space representation as the AE does,
the encoder of a VAE outputs two vectors, the mean (μ) and the variance (σ²), for generating latent distributions.
The decoder takes a randomly sampled code from the latent distributions to reconstruct the original input. When
training the VAE model, the relationship of the parameters in the model with respect to the final loss is calculated
by the backpropagation algorithm. However, this cannot be done for the random sampling process, because sampling
provides no value to differentiate through. To overcome this problem, the VAE employs a reparameterization trick
for sampling: an ε is randomly sampled from a standard normal distribution and combined with the parameters μ
and σ² of the latent distribution to compute the output code of the encoder, defined as C = exp(σ²) · ε + μ, as shown
in Fig. 10.6. With this trick, μ and σ² of the latent distribution can be optimized during the training process while
still allowing random sampling from that distribution.

(Figure: Input → Encoder → μ and σ²; C = exp(σ²) · ε + μ, where ε is randomly sampled from a standard normal distribution; C → Decoder → Output.)

FIG. 10.6 Diagram of the Variational Auto-Encoder (VAE).

A more intuitive explanation of the code C = exp(σ²) · ε + μ is that μ is the code of the AE; in the VAE, a noise term
(exp(σ²) · ε) is added to this code to produce the code C, in the hope that C can still be decoded back to the original input.
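To make the trick concrete, here is a minimal sketch of the computation in TensorFlow; the values of μ and σ² are illustrative placeholders for what the encoder would output.

import tensorflow as tf

# Illustrative encoder outputs for a batch of one sample (latent dimension 2)
z_mean = tf.constant([[0.5, -0.3]])   # mu
z_var = tf.constant([[0.1, 0.2]])     # sigma^2
# epsilon is randomly sampled from a standard normal distribution
epsilon = tf.random.normal(shape=tf.shape(z_mean))
# C = exp(sigma^2) * epsilon + mu: the randomness is isolated in epsilon,
# so gradients can flow back through z_mean and z_var.
code = tf.exp(z_var) * epsilon + z_mean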
Taking an example similar to that in Section 10.1, the encoder of the VAE produces 225 sets of 2D vector codes, and
these sets are then passed through the decoder to generate 225 handwritten digit images, as shown in Fig. 10.7. As
shown, each image is a digit that is easy to recognize, and each is smoothly transformed into its neighbors.

FIG. 10.7 Example of generated images of the Variational Auto-Encoder (VAE).

10.2.3 Variational auto-encoder loss function


The training goal of the VAE model is to make the predicted output and the input as similar as possible. If the data
consists of image samples, the binary cross-entropy (BCE) loss between each pixel of the reconstructed image and the
input image can be employed for training the model. This loss is also called the reconstruction loss, described as follows:

$$\mathrm{Loss}_{reconstruction} = \frac{1}{N}\sum_{i=1}^{N}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{c=1}^{C} \mathrm{binary\_crossentropy}\!\left(x_{x,y,c}^{(i)},\ \hat{y}_{x,y,c}^{(i)}\right)$$

x: input image.
ŷ: output image (reconstructed image).
W: width of the image.
H: height of the image.
C: number of image channels.
N: amount of data in a batch.
However, using only the reconstruction loss to train the VAE model is not enough. As shown in Fig. 10.6, exp(σ²)
controls the scale of the noise (exp(σ²) · ε), where σ² is learned by the encoder. If the encoder learns to output exp(σ²)
as 0, no noise is added when computing the output code. If there is no noise in computing the code, the output code in
the VAE model is similar to that of the AE model, which means no constraints are added on the encoded representations
of the VAE model. The graph of exp(σ²) is shown in Fig. 10.8: the smaller the value of σ², the closer the value of exp(σ²)
gets to 0.

FIG. 10.8 The graph of exp(σ²).

In order to solve this problem, Loss_{σ²} is employed to limit the value of σ². If the value of Loss_{σ²} is equal to 0, σ² must be 0;
at that point, the value of exp(σ²) is equal to 1, so the problem of the encoder updating in the direction of exp(σ²) = 0 is
solved. Fig. 10.9 shows the graph of Loss_{σ²}; its formula is as follows:

$$\mathrm{Loss}_{\sigma^2} = \frac{1}{2N}\sum_{i=1}^{N}\left[\exp\left(\sigma_i^2\right) - \left(1 + \sigma_i^2\right)\right]$$

(Figure: three panels plotting exp(σ²), 1 + σ², and exp(σ²) − (1 + σ²) for σ² from −4 to 4.)

FIG. 10.9 The graph of exp(σᵢ²) − (1 + σᵢ²).

In addition, L2 regularization is applied to μ, formulated as follows:

$$\mathrm{Loss}_{\mu} = \frac{1}{2N}\sum_{i=1}^{N}\mu_i^2$$

Finally, Loss_{μ,σ²} is established by combining Loss_{μ} and Loss_{σ²}. It is also known as the Kullback–Leibler divergence
loss (KL loss) and is expressed as follows:

$$\mathrm{Loss}_{\mu,\sigma^2} = \frac{1}{2N}\sum_{i=1}^{N}\left[\exp\left(\sigma_i^2\right) - \left(1 + \sigma_i^2\right) + \mu_i^2\right]$$

μ: mean value (one of the outputs of the encoder).
σ²: variance (the other output of the encoder).
N: batch size.
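As a sanity check, this loss can be written in a few lines of TensorFlow; the following sketch uses the same expression that appears in the chapter's source code in Section 10.3 (the variable names z_mean and z_var are the ones used there).

import tensorflow as tf

def kl_loss(z_mean, z_var):
    # Loss_{mu, sigma^2} = (1/2N) * sum_i [exp(sigma_i^2) - (1 + sigma_i^2) + mu_i^2]
    # tf.reduce_mean averages over the batch (and latent dimensions), giving the 1/N factor.
    return 0.5 * tf.reduce_mean(tf.exp(z_var) - (1 + z_var) + tf.square(z_mean))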

10.3 Experiment: Implementation of variational auto-encoder model


This section introduces an example program of building a VAE model, which is trained and tested on the MNIST
handwritten digit dataset [5] for compressing and reconstructing images. Fig. 10.10 shows some generated images of
the VAE model.

FIG. 10.10 Handwritten digit images generated by the Variational Auto-Encoder (VAE).

10.3.1 Create project


Because the VAE models in this chapter are much more complicated than the example programs in the previous
chapters, we use the PyCharm IDE to write the source code and train the model. In the following, we outline the
process of creating the project.
1. Create a new project: Click "File" → "New Project," as shown in Fig. 10.11.

FIG. 10.11 Creating a new project in PyCharm.


2. Set a directory of the new project, as shown in Fig. 10.12.

FIG. 10.12 Setting a new project directory.

3. Configure a Python interpreter: Open "Project Interpreter: Python 3.6," select "Existing interpreter," and set the
Python interpreter, as shown in Fig. 10.13.

FIG. 10.13 Interpreter setting.



4. Create the project: Click the "Create" button to create the new project, as shown in Fig. 10.14.

FIG. 10.14 Creating a project.

Supplementary explanation
The code examples of the VAE project in this chapter can be downloaded at https://github.com/taipeitechmmslab/MMSLAB-DL/tree/master, as shown in Fig. 10.15.

FIG. 10.15 The source code of Variational Auto-Encoder (VAE) on GitHub.



10.3.2 Introduction to the dataset


We use the MNIST handwritten digit dataset [5] for training and testing the VAE model in this chapter. MNIST
contains 60,000 training samples and 10,000 test samples, which are grayscale images of size 28 × 28. The dataset can
be loaded through TensorFlow Datasets as follows.

import tensorflow_datasets as tfds

# Load training data
train_data, info = tfds.load("mnist", split=tfds.Split.TRAIN, with_info=True)
# Load test data
test_data = tfds.load("mnist", split=tfds.Split.TEST)
# Display information about the dataset
print(info)
Result:
tfds.core.DatasetInfo(
    name='mnist',
    version=1.0.0,
    description='The MNIST database of handwritten digits.',
    urls=['https://storage.googleapis.com/cvdf-datasets/mnist/'],
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

Fig. 10.16 shows some example images of the MNIST dataset.

FIG. 10.16 Handwritten digits from the MNIST dataset.



10.3.3 Building a Variational auto-encoder model


1. Directory and files
Create a folder for storing the Python files of the VAE project, as shown in Fig. 10.17. Here is a brief introduction
to these files.
▪ train.py: source code for training the VAE model
▪ test.py: source code for evaluating the VAE model
▪ utils:
  • models.py: source code for the VAE model and the custom network layers
  • losses.py: source code for the custom loss function
  • callbacks.py: source code for the custom callback functions

FIG. 10.17 Variational Auto-Encoder (VAE) project.

2. Implement the VAE model


Fig. 10.18 is a flowchart of the source code for building the VAE model.

1. Creating helper functions: functions for constructing the VAE model, the Variational Auto-Encoder loss function, and custom callback functions.
2. Building and training the VAE model: import packages, prepare the data, set callbacks, create the VAE model, set the optimizer and loss function, and train the VAE model.
3. Visualizing the results: image generation, and observation of the training results with TensorBoard.

FIG. 10.18 Flowchart of the source code for the Variational Auto-Encoder (VAE) model.
(a) Creating helper functions
▪ Functions for constructing the VAE model
The source code of the VAE model is written in the “models.py” file. Fig. 10.19 shows the architecture of the VAE.

FIG. 10.19 The architecture of Variational Auto-Encoder (VAE).



Creating the "create_vae_model" function.
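The listing below is a minimal sketch of "create_vae_model"; the convolutional layer sizes are illustrative assumptions rather than the book's exact architecture, but the structure follows Fig. 10.19: an encoder that outputs z_mean and z_var, the custom Sampling layer described next, a decoder, and the KL loss attached with "vae.add_loss" as discussed in the supplementary explanation below.

import tensorflow as tf
from tensorflow import keras

def create_vae_model(input_shape, latent_dim):
    # --- Encoder: compress the input image into mu (z_mean) and sigma^2 (z_var) ---
    img_inputs = keras.Input(shape=input_shape)
    x = keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(img_inputs)
    x = keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(16, activation='relu')(x)
    z_mean = keras.layers.Dense(latent_dim)(x)
    z_var = keras.layers.Dense(latent_dim)(x)
    z = Sampling()([z_mean, z_var])  # reparameterization trick
    encoder = keras.Model(img_inputs, [z_mean, z_var, z], name='encoder')

    # --- Decoder: reconstruct the image from a sampled code ---
    latent_inputs = keras.Input(shape=(latent_dim,))
    x = keras.layers.Dense(7 * 7 * 64, activation='relu')(latent_inputs)
    x = keras.layers.Reshape((7, 7, 64))(x)
    x = keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    img_outputs = keras.layers.Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(x)
    decoder = keras.Model(latent_inputs, img_outputs, name='decoder')

    # --- VAE: encoder + decoder, with the KL loss added as an internal loss ---
    vae_inputs = keras.Input(shape=input_shape)
    z_mean, z_var, z = encoder(vae_inputs)
    vae_outputs = decoder(z)
    vae = keras.Model(vae_inputs, vae_outputs, name='vae')
    kl_loss = 0.5 * tf.reduce_mean(tf.exp(z_var) - (1 + z_var) + tf.square(z_mean))
    vae.add_loss(kl_loss)
    return vae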



The sampling layer in Fig. 10.19 is a custom network layer. Fig. 10.20 describes the output of this layer.

FIG. 10.20 Sampling in the Variational Auto-Encoder (VAE) model.

▪ The source code of the sampling custom network layer

import tensorflow as tf
from tensorflow import keras

class Sampling(keras.layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]  # batch size
        dim = tf.shape(z_mean)[1]    # latent dimension
        # Randomly sample epsilon from a standard normal distribution
        epsilon = tf.random.normal(shape=(batch, dim))
        # Reparameterization trick: C = exp(sigma^2) * epsilon + mu
        return z_mean + tf.exp(z_log_var) * epsilon
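A quick usage sketch of the layer (the shapes are illustrative): with z_mean = 0 and z_log_var = 0, exp(0) = 1, so the layer simply returns a standard normal sample of the same shape.

# Illustrative usage of the Sampling layer
z_mean = tf.zeros((4, 2))      # batch of 4 samples, latent dimension 2
z_log_var = tf.zeros((4, 2))
z = Sampling()([z_mean, z_log_var])
print(z.shape)                 # (4, 2)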

Supplementary explanation
Loss_{μ,σ²} (the KL loss) is used to optimize μ and σ² in the middle of the VAE model, so this internal loss must be declared
when the network layers are built. The "vae.add_loss" call used in the "create_vae_model" function above is one way to
accomplish this. We can also create a custom network model or a custom network layer to add the Loss_{μ,σ²} loss function.
Example 1: custom network model

class VariationalAutoEncoder(keras.Model):
    def __init__(self, name='autoencoder', **kwargs):
        super(VariationalAutoEncoder, self).__init__(name=name, **kwargs)
        self.encoder = Encoder()
        self.decoder = Decoder()
        self.sampling = Sampling()

    def call(self, inputs):
        z_mean, z_var = self.encoder(inputs)
        z = self.sampling([z_mean, z_var])
        img_output = self.decoder(z)
        kl_loss = 0.5 * tf.reduce_mean(tf.exp(z_var) - (1 + z_var) + tf.square(z_mean))
        self.add_loss(kl_loss)
        return img_output

Example 2: custom network layer

class KLLoss(keras.layers.Layer):
    def call(self, inputs):
        z_mean = inputs[0]
        z_var = inputs[1]
        kl_loss = 0.5 * tf.reduce_mean(tf.exp(z_var) - (1 + z_var) + tf.square(z_mean))
        self.add_loss(kl_loss)
        return z_mean, z_var

# ... Omit the convolution layers and fully connected layers of the Encoder ...
z_mean = keras.layers.Dense(latent_dim)(x)
z_var = keras.layers.Dense(latent_dim)(x)
z_mean, z_var = KLLoss()([z_mean, z_var])
z = Sampling()([z_mean, z_var])
encoder = keras.Model(inputs=img_inputs, outputs=z, name='encoder')

▪ VAE loss function

$$\mathrm{Loss}_{reconstruction} = \frac{1}{N}\sum_{i=1}^{N}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{c=1}^{C} \mathrm{binary\_crossentropy}\!\left(x_{x,y,c}^{(i)},\ \hat{y}_{x,y,c}^{(i)}\right)$$

x: input image.
ŷ: output image.
W: width of the image.
H: height of the image.
C: number of image channels.
N: amount of data in a batch.
The reconstruction loss function is written in the “losses.py” file.
def reconstruction_loss(y_true, y_pred):
    # Binary cross-entropy loss between each pixel of the
    # generated image and the input image
    bce = -(y_true * tf.math.log(y_pred + 1e-07) +
            (1 - y_true) * tf.math.log(1 - y_pred + 1e-07))
    # Sum over height, width, and channels, then average over the batch
    return tf.reduce_mean(tf.reduce_sum(bce, axis=[1, 2, 3]))
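For reference, a roughly equivalent sketch using Keras' built-in binary cross-entropy: the built-in function averages over the last (channel) axis, so for single-channel MNIST images only the spatial axes remain to be summed. This is an alternative formulation, not the file's actual code, and small numerical differences from the epsilon handling are expected.

import tensorflow as tf
from tensorflow import keras

def reconstruction_loss_keras(y_true, y_pred):
    # keras.losses.binary_crossentropy returns per-pixel losses of shape (batch, H, W)
    bce = keras.losses.binary_crossentropy(y_true, y_pred)
    # Sum over the spatial axes, then average over the batch
    return tf.reduce_mean(tf.reduce_sum(bce, axis=[1, 2]))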

• Custom callback functions
The source code is written in the "callbacks.py" file.
• "SaveDecoderModel" class: checks the loss at every epoch; if the loss has improved, the decoder model is saved
(similar to keras.callbacks.ModelCheckpoint).
• "SaveDecoderOutput" class: at the end of each epoch, the decoder model generates 225 images, which are written
into the TensorBoard log file for observing how the output changes.
SaveDecoderModel
import os
import numpy as np
import tensorflow as tf

class SaveDecoderModel(tf.keras.callbacks.Callback):
    def __init__(self, weights_file, monitor='loss', save_weights_only=False):
        super(SaveDecoderModel, self).__init__()
        self.weights_file = weights_file  # Decoder model storage path
        self.best = np.Inf  # Initialize the best loss to infinity
        self.monitor = monitor
        self.save_weights_only = save_weights_only  # Save only model weights

    def on_epoch_end(self, epoch, logs=None):
        """
        At the end of each epoch, if the loss has improved, the model or
        model weights are saved
        """
        loss = logs.get(self.monitor)  # Get the monitored value
        if loss < self.best:
            if self.save_weights_only:
                # Save only the weights of the Decoder model
                self.model.get_layer('decoder').save_weights(self.weights_file)
            else:
                # Save the complete Decoder model
                self.model.get_layer('decoder').save(self.weights_file)
            self.best = loss

SaveDecoderOutput
class SaveDecoderOutput(tf.keras.callbacks.Callback):
    def __init__(self, image_size, log_dir):
        super(SaveDecoderOutput, self).__init__()
        self.size = image_size
        self.log_dir = log_dir  # Storage path of the TensorBoard log file
        n = 15  # For generating (15x15) images
        self.save_images = np.zeros((image_size * n, image_size * n, 1))
        self.grid_x = np.linspace(-1.5, 1.5, n)
        self.grid_y = np.linspace(-1.5, 1.5, n)

    def on_train_begin(self, logs=None):
        """ The TensorBoard log file is created before training starts """
        path = os.path.join(self.log_dir, 'images')
        self.writer = tf.summary.create_file_writer(path)

    def on_epoch_end(self, epoch, logs=None):
        """
        225 images are generated and written into the log file
        """
        for i, yi in enumerate(self.grid_x):
            for j, xi in enumerate(self.grid_y):
                # Generate a code
                z_sample = np.array([[xi, yi]])
                # The Decoder generates an image
                img = self.model.get_layer('decoder')(z_sample)
                # Place the image into the grid
                self.save_images[i*self.size:(i+1)*self.size,
                                 j*self.size:(j+1)*self.size] = img.numpy()[0]
        # Write the 225 generated images to the TensorBoard log file
        with self.writer.as_default():
            tf.summary.image("Decoder output", [self.save_images], step=epoch)

(b) Building and training the VAE model


The source code for training the VAE model is written in a “train.py” file.
▪ Import packages

import os
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from utils.models import create_vae_model
from utils.losses import reconstruction_loss
from utils.callbacks import SaveDecoderOutput, SaveDecoderModel
▪ Preparing data
• Data normalization

def parse_fn(dataset, input_size=(28, 28)):
    x = tf.cast(dataset['image'], tf.float32)
    # Resize the image to the network input size
    x = tf.image.resize(x, input_size)
    # Normalize the image to [0, 1]
    x = x / 255.
    # Return the training data and the target (the input itself)
    return x, x

• Load MNIST dataset

train_data, info = tfds.load('mnist', split=tfds.Split.TRAIN, with_info=True)
test_data = tfds.load('mnist', split=tfds.Split.TEST)

• Set data

AUTOTUNE = tf.data.experimental.AUTOTUNE  # Automatic tuning mode
batch_size = 16  # Batch size
train_num = info.splits['train'].num_examples  # Number of training samples

# Shuffle the training data
train_data = train_data.shuffle(train_num)
# Parse the training data
train_data = train_data.map(parse_fn, num_parallel_calls=AUTOTUNE)
# Set the batch size to 16 and turn on prefetch mode
train_data = train_data.batch(batch_size).prefetch(buffer_size=AUTOTUNE)

# Parse the test data
test_data = test_data.map(parse_fn, num_parallel_calls=AUTOTUNE)
# Set the batch size to 16 and turn on prefetch mode
test_data = test_data.batch(batch_size).prefetch(buffer_size=AUTOTUNE)

▪ Set callback

# Create a directory to save model weights
log_dirs = 'logs_vae'
model_dir = log_dirs + '/models'
os.makedirs(model_dir, exist_ok=True)

# Save the training log as a TensorBoard log file
model_tb = keras.callbacks.TensorBoard(log_dir=log_dirs)
# Store the best model weights
model_sdw = SaveDecoderModel(model_dir + '/best_model.h5', monitor='val_loss')
# Write the images generated by the Decoder to the TensorBoard log file
model_testd = SaveDecoderOutput(28, log_dir=log_dirs)

▪ Create VAE model

# The input size of the VAE model
input_shape = (28, 28, 1)
# Dimension of the latent space
latent_dim = 2
# Create the VAE model
vae_model = create_vae_model(input_shape, latent_dim)

▪ Set the optimizer and loss function

optimizer = tf.keras.optimizers.RMSprop()
vae_model.compile(optimizer, loss=reconstruction_loss)

▪ Train the VAE model

vae_model.fit(train_data,
epochs=20,
validation_data=test_data,
callbacks=[model_tb, model_sdw, model_testd])

(c) Visualization results


▪ Image generation
The output images of the VAE model are generated by the decoder network. The source code for testing the model is
written in the "test.py" file. Fig. 10.21 presents the test results.
Import packages:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

Load trained model for generating images:

size = 28  # Output image size
n = 15  # Generate (15x15) images
save_images = np.zeros((size * n, size * n, 1))
grid_x = np.linspace(-1.5, 1.5, n)
grid_y = np.linspace(-1.5, 1.5, n)

# Load the trained Decoder model
model = tf.keras.models.load_model('logs_vae/models/best_model.h5')
for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        # Generate a code
        z_sample = np.array([[xi, yi]])
        # Generate an image
        img = model(z_sample)
        # Save the image for display
        save_images[i * size: (i + 1) * size, j * size: (j + 1) * size] = img.numpy()[0]

# Display the generated images
plt.imshow(save_images[..., 0], cmap='gray')
plt.show()
Result:

FIG. 10.21 The reconstructed images of the Variational Auto-Encoder (VAE) model.

▪ Observation of training results with TensorBoard


Open TensorBoard from the command line:

tensorboard --logdir logs_vae

The prediction changes of the VAE model during training can be observed through TensorBoard, as shown in
Fig. 10.22. Note that the recorded images in Fig. 10.22 are obtained by using the custom callback "SaveDecoderOutput."

FIG. 10.22 Training results of the Variational Auto-Encoder (VAE) model on TensorBoard.

References
[1] G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[2] G.E. Hinton, R. Zemel, Autoencoders, minimum description length and Helmholtz free energy, Adv. Neural Inf. Proces. Syst. 6 (1994) 3–10.
[3] P. Vincent, et al., Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (12) (2010).
[4] D.P. Kingma, M. Welling, Auto-encoding variational Bayes, in: 2nd International Conference on Learning Representations, Canada, April 14–16, 2014 [Online]. Available: http://arxiv.org/abs/1312.6114.
[5] L. Deng, The MNIST Database of handwritten digit images for machine learning research [Best of the Web], IEEE Signal Process. Mag. 29 (6) (2012) 141–142.
