Chapter 10
Variational auto-encoder
OUTLINE
• Introduction to Auto-Encoder (AE)
• Introduction to Variational Auto-Encoder (VAE)
• Implementing the VAE model for reconstructing images
However, when training the AE model, there are no constraints on the latent representations generated by the encoder, so there is no guarantee that every output generated by the decoder carries meaningful attributes of the original input data. For example, 225 sets of 2D vector codes are sampled from the latent space of an AE model, with the code values linearly spaced from −1.5 to +1.5. These 225 codes are sent to the decoder to generate 225 images. As observed in Fig. 10.2, only the images “0,” “3,” “5,” and “8” in the lower right corner contain recognizable characters. In the next section, we introduce the Variational Auto-Encoder (VAE), which adds constraints on the latent representations to address this problem.
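The 225 codes described above form a 15 × 15 grid over the 2D latent space. A minimal NumPy sketch of how such a grid can be built, assuming a 2D latent space and the −1.5 to +1.5 range stated above (the function name `make_latent_grid` is hypothetical, not from the book's code):

```python
import numpy as np


def make_latent_grid(n=15, low=-1.5, high=1.5):
    """Return an (n*n, 2) array of 2D codes linearly spaced over [low, high]."""
    # n evenly spaced values per latent dimension (endpoints included)
    grid_x = np.linspace(low, high, n)
    grid_y = np.linspace(low, high, n)
    # Every (x, y) combination; each row is one code to feed to the decoder
    xx, yy = np.meshgrid(grid_x, grid_y)
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


codes = make_latent_grid()
print(codes.shape)  # (225, 2)
```

Each row of `codes` would be passed to the decoder to produce one of the 225 images.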
Principles and Labs for Deep Learning 235 Copyright © 2021 Elsevier Inc. All rights reserved.
https://doi.org/10.1016/B978-0-323-90198-7.00010-0
236 10. Variational auto-encoder
FIG. 10.2 Example of generated images of the Auto-Encoder (AE) model. Not all generated images can be recognized.
Fig. 10.4 shows the operating concept of the VAE. The encoder receives input data and outputs two vectors, a mean vector (μ) and a variance vector (σ²), which define the latent distribution. A code, or latent representation, is then randomly sampled from this latent distribution and passed through the decoder to generate the output data.
[Diagram: Input → Encoder → μ and σ²; ε is randomly sampled from a standard normal distribution; C = exp(σ²) ∗ ε + μ; C → Decoder → Output]
FIG. 10.6 Diagram of Variational Auto-Encoder (VAE).
still allowing random sampling from that distribution. A more intuitive explanation for using the code C = exp(σ²) ∗ ε + μ is that μ can be regarded as the code of the AE; the VAE adds the noise exp(σ²) ∗ ε to this code to produce the code C, in the hope that C can still be decoded back to the original input.
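The sampling step C = exp(σ²) ∗ ε + μ can be sketched in NumPy as follows. It mirrors the formula in the text; note that the widely used Keras VAE example instead treats the encoder's second output as a log-variance and scales ε by exp(0.5 · σ²). The function name `sample_code` is hypothetical. The sketch also shows that when exp(σ²) approaches 0, the noise vanishes and C collapses to μ, i.e., an AE-like code.

```python
import numpy as np


def sample_code(mu, sigma2, rng):
    """Sample C = exp(sigma2) * eps + mu with eps ~ N(0, I)."""
    # eps is drawn from a standard normal distribution, one value per latent dimension
    eps = rng.standard_normal(size=np.shape(mu))
    # exp(sigma2) scales the noise; mu shifts it
    return np.exp(sigma2) * eps + mu


rng = np.random.default_rng(0)
mu = np.array([[0.5, -1.0]])
# sigma2 = 0 gives a noise scale of exp(0) = 1
c_noisy = sample_code(mu, np.zeros_like(mu), rng)
# A very negative sigma2 makes exp(sigma2) close to 0, so C is almost exactly mu
c_plain = sample_code(mu, np.full_like(mu, -20.0), rng)
print(np.allclose(c_plain, mu, atol=1e-6))  # True
```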
Taking an example similar to the one in Section 10.1, 225 sets of 2D vector codes are sampled from the latent space of the VAE and passed through the decoder to generate 225 handwritten digit images, as shown in Fig. 10.7. Each image is an easily recognizable digit, and neighboring images transform smoothly into one another.
$$\mathrm{Loss}_{\mathrm{reconstruction}} = \frac{1}{N}\sum_{i=1}^{N}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{c=1}^{C}\mathrm{binary\_crossentropy}\left(x_{x,y,c},\ \hat{y}_{x,y,c}\right)$$
x: input image.
ŷ: output image or reconstructed image.
W: width of image.
H: height of image.
C: the number of image channels.
N: the amount of data in a batch.
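The reconstruction loss above sums the per-pixel binary cross-entropy over width, height, and channels, then averages over the N images in a batch. A minimal NumPy sketch, assuming images of shape (N, W, H, C) with pixel values in [0, 1]; the function name `reconstruction_loss_np` and the small constant that guards against log(0) are assumptions for illustration:

```python
import numpy as np


def reconstruction_loss_np(x, y_hat, eps=1e-7):
    """Batch-averaged sum of per-pixel binary cross-entropy; x, y_hat: (N, W, H, C)."""
    # Per-pixel binary cross-entropy; eps avoids log(0)
    bce = -(x * np.log(y_hat + eps) + (1 - x) * np.log(1 - y_hat + eps))
    # Sum over x, y, c (axes 1-3), then average over the batch (axis 0)
    return bce.sum(axis=(1, 2, 3)).mean()


x = np.ones((2, 2, 2, 1))     # two 2x2 single-channel "images"
y_hat = np.full_like(x, 0.5)  # the decoder predicts 0.5 for every pixel
print(round(reconstruction_loss_np(x, y_hat), 4))  # 2.7726, i.e. 4 * ln 2
```

Each of the four pixels contributes −ln 0.5 = ln 2, so every image scores about 4 ln 2, and so does the batch average.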
However, reconstruction loss alone is not enough to train the VAE model. As shown in Fig. 10.6, exp(σ²) controls the scale of the noise exp(σ²) ∗ ε, and σ² is learned by the encoder. Because the added noise makes reconstruction harder, minimizing only the reconstruction loss encourages the encoder to drive exp(σ²) toward 0. If exp(σ²) is 0, no noise is added when computing the code, so the code of the VAE model becomes similar to that of the AE model, and the constraints on the encoded representations are effectively lost. The graph of exp(σ²) is shown in Fig. 10.8: the smaller σ² is, the closer exp(σ²) gets to 0.
10.2 Introduction to variational auto-encoder 239
To solve this problem, Loss_σ² is employed to constrain σ². Loss_σ² reaches its minimum value of 0 only when σ² = 0, at which point exp(σ²) = 1, so the encoder is no longer driven toward exp(σ²) = 0. Fig. 10.9 shows the graph of Loss_σ²; its formula is as follows:
$$\mathrm{Loss}_{\sigma^2} = \frac{1}{2N}\sum_{i=1}^{N}\left(\exp\left(\sigma_i^2\right) - \left(1 + \sigma_i^2\right)\right)$$
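A quick numerical check of the claim above: the per-element term exp(σ²) − (1 + σ²) is never negative and equals 0 only at σ² = 0, so minimizing Loss_σ² pulls σ² toward 0 and exp(σ²) toward 1. A NumPy sketch; the function name `loss_sigma2` is hypothetical, and summing over the latent dimensions as well as the batch is an assumption:

```python
import numpy as np


def loss_sigma2(sigma2):
    """Loss_sigma2 = 1/(2N) * sum(exp(sigma2) - (1 + sigma2)); sigma2 has shape (N, latent_dim)."""
    sigma2 = np.asarray(sigma2, dtype=float)
    n = sigma2.shape[0]  # N: number of samples in the batch
    # Sum over all samples and (by assumption) all latent dimensions
    return np.sum(np.exp(sigma2) - (1.0 + sigma2)) / (2 * n)


values = np.linspace(-4, 4, 81)  # the sigma^2 range plotted in Fig. 10.9
terms = np.exp(values) - (1.0 + values)
print(terms.min() >= 0.0)              # True
print(loss_sigma2(np.zeros((4, 2))))   # 0.0
```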
[Plots: exp(σ²), 1 + σ², and exp(σ²) − (1 + σ²), each over σ² from −4 to 4]
$$\mathrm{Loss}_{\mu} = \frac{1}{2N}\sum_{i=1}^{N}\mu_i^2$$
Finally, Loss_μ,σ² is obtained by combining Loss_μ and Loss_σ²; it is also known as the Kullback–Leibler divergence loss (KL loss) and is expressed as follows:
$$\mathrm{Loss}_{\mu,\sigma^2} = \frac{1}{2N}\sum_{i=1}^{N}\left(\exp\left(\sigma_i^2\right) - \left(1 + \sigma_i^2\right) + \mu_i^2\right)$$
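The KL loss is simply the sum of Loss_μ and Loss_σ², and it reaches 0 when μ = 0 and σ² = 0. A NumPy sketch that checks both properties on random inputs; the function name `kl_loss` is hypothetical, and summing over the latent dimensions as well as the batch is an assumption:

```python
import numpy as np


def kl_loss(mu, sigma2):
    """Loss_{mu,sigma2} = 1/(2N) * sum(exp(sigma2) - (1 + sigma2) + mu**2)."""
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n = mu.shape[0]  # N: number of samples in the batch
    return np.sum(np.exp(sigma2) - (1.0 + sigma2) + mu ** 2) / (2 * n)


rng = np.random.default_rng(1)
mu = rng.normal(size=(8, 2))      # N = 8 samples, 2D latent space
sigma2 = rng.normal(size=(8, 2))
# Loss_mu and Loss_sigma2 computed separately
loss_mu = np.sum(mu ** 2) / (2 * 8)
loss_s2 = np.sum(np.exp(sigma2) - (1.0 + sigma2)) / (2 * 8)
print(np.isclose(kl_loss(mu, sigma2), loss_mu + loss_s2))  # True
print(kl_loss(np.zeros((8, 2)), np.zeros((8, 2))))          # 0.0
```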
FIG. 10.10 Handwritten digit images generated by the Variational Auto-Encoder (VAE).
3. Configure a Python interpreter: Open “Project Interpreter: Python 3.6,” select “Existing interpreter,” and set the Python interpreter, as shown in Fig. 10.13.
4. Create a project: Click the “Create” button to create a new project, as shown in Fig. 10.14.
Supplementary explanation
The code examples of the VAE project in this chapter can be downloaded from https://github.com/taipeitechmmslab/MMSLAB-DL/tree/master, as shown in Fig. 10.15.
FIG. 10.18 Flowchart of the source code for the Variational Auto-Encoder (VAE) model.
10.3 Experiment: Implementation of variational auto-encoder model 245
(a) Creating helper functions
▪ Functions for constructing the VAE model
The source code of the VAE model is written in the “models.py” file. Fig. 10.19 shows the architecture of the VAE.
The sampling layer in Fig. 10.19 is a custom network layer. Fig. 10.20 describes the output of this layer.
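import tensorflow as tf
from tensorflow import keras


class Sampling(keras.layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs  # mu and sigma^2 produced by the encoder
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        # epsilon is sampled from a standard normal distribution
        epsilon = tf.random.normal(shape=(batch, dim))
        # C = exp(sigma^2) * epsilon + mu, as in Fig. 10.6
        return z_mean + tf.exp(z_log_var) * epsilon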
Supplementary explanation
Loss_μ,σ² (KL loss) is used to optimize μ and σ², which are produced in the middle of the VAE model, so this internal loss must be declared when the network layers are built. The “vae.add_loss” call used in the “create_vae_model” function above is one way to accomplish this. Alternatively, we can create a “custom network model” or a “custom network layer” that adds the Loss_μ,σ² loss function.
Example 1: custom network model
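from tensorflow import keras


class VariationalAutoEncoder(keras.Model):
    def __init__(self, name='autoencoder', **kwargs):
        super(VariationalAutoEncoder, self).__init__(name=name, **kwargs)
        # Encoder and Decoder are assumed to be custom layers defined elsewhere;
        # Sampling is the custom layer shown above
        self.encoder = Encoder()
        self.decoder = Decoder()
        self.sampling = Sampling()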
$$\mathrm{Loss}_{\mathrm{reconstruction}} = \frac{1}{N}\sum_{i=1}^{N}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{c=1}^{C}\mathrm{binary\_crossentropy}\left(x_{x,y,c},\ \hat{y}_{x,y,c}\right)$$
x: input image.
ŷ: output image.
W: width of image.
H: height of image.
C: the number of image channels.
N: the amount of data in a batch.
The reconstruction loss function is written in the “losses.py” file.
import tensorflow as tf


def reconstruction_loss(y_true, y_pred):
    # Binary cross-entropy between each pixel of the generated image and the input image;
    # 1e-07 prevents log(0)
    bce = -(y_true * tf.math.log(y_pred + 1e-07) +
            (1 - y_true) * tf.math.log(1 - y_pred + 1e-07))
    # Sum over width, height, and channels, then average over the batch
    return tf.reduce_mean(tf.reduce_sum(bce, axis=[1, 2, 3]))
• Custom callback
Please write the source code in the “callbacks.py” file.
• “SaveDecoderModel” class: Checks the monitored loss every epoch and saves the decoder model whenever the loss improves (similar to keras.callbacks.ModelCheckpoint).
• “SaveDecoderOutput” class: In each epoch, uses the decoder model to generate 225 images and writes them into the TensorBoard log file so that changes in the output can be observed.
SaveDecoderModel
import numpy as np
import tensorflow as tf


class SaveDecoderModel(tf.keras.callbacks.Callback):
    def __init__(self, weights_file, monitor='loss', save_weights_only=False):
        super(SaveDecoderModel, self).__init__()
        self.weights_file = weights_file  # Decoder model storage path
        self.best = np.inf  # Best monitored value so far; start at infinity (np.Inf was removed in NumPy 2.0)
        self.monitor = monitor  # Name of the monitored quantity, e.g. 'loss'
        self.save_weights_only = save_weights_only  # If True, save only the model weights
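The improvement check that “SaveDecoderModel” performs every epoch can be sketched in plain Python. This is a hypothetical illustration of the decision logic only: the class `BestLossTracker` and its `update` method are not part of the book's code, the loss values are illustrative inputs rather than training results, and the actual saving of the decoder is replaced by a returned flag.

```python
import math


class BestLossTracker:
    """Hypothetical helper: decides whether the monitored loss has improved."""

    def __init__(self):
        # Start from infinity so the first epoch always counts as an improvement
        self.best = math.inf

    def update(self, current):
        # Signal a save only when the monitored value beats the best seen so far
        if current < self.best:
            self.best = current
            return True
        return False


tracker = BestLossTracker()
history = [150.2, 140.8, 142.3, 139.5]  # illustrative per-epoch loss values
decisions = [tracker.update(loss) for loss in history]
print(decisions)     # [True, True, False, True]
print(tracker.best)  # 139.5
```

In a Keras callback, this comparison would typically live in `on_epoch_end(self, epoch, logs=None)`, reading the current value from `logs[self.monitor]`.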
SaveDecoderOutput
class SaveDecoderOutput(tf.keras.callbacks.Callback):
    def __init__(self, image_size, log_dir):
        super(SaveDecoderOutput, self).__init__()
        self.size = image_size
        self.log_dir = log_dir  # the storage path of TensorBoard log file
        n = 15  # for generating (15x15) images
        self.save_images = np.zeros((image_size * n, image_size * n, 1))
        self.grid_x = np.linspace(-1.5, 1.5, n)
        self.grid_y = np.linspace(-1.5, 1.5, n)
import os
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from utils.models import create_vae_model
from utils.losses import reconstruction_loss
from utils.callbacks import SaveDecoderOutput, SaveDecoderModel
▪ Preparing data
• Data normalization
• Set data
# Test data
test_data = test_data.map(parse_fn, num_parallel_calls=AUTOTUNE)
# Set the batch size to 16 and turn on prefetch mode
test_data = test_data.batch(batch_size).prefetch(buffer_size=AUTOTUNE)
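The “Data normalization” step is not listed in full here. A minimal NumPy sketch of what it typically involves for this task, assuming 28 × 28 × 1 uint8 MNIST images: pixel values are scaled from [0, 255] to [0, 1] (binary cross-entropy expects targets in that range) and then grouped into batches of 16. The functions `normalize` and `make_batches` are hypothetical and are not the book's `parse_fn`; the random array stands in for real data.

```python
import numpy as np


def normalize(images):
    """Scale uint8 pixels from [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


def make_batches(images, batch_size=16):
    """Split an image array into consecutive batches (the last may be smaller)."""
    return [images[i:i + batch_size] for i in range(0, len(images), batch_size)]


# Stand-in data: 40 random 28x28x1 "images"
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(40, 28, 28, 1), dtype=np.uint8)
batches = make_batches(normalize(raw))
print(len(batches), batches[0].shape, batches[-1].shape)  # 3 (16, 28, 28, 1) (8, 28, 28, 1)
```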
▪ Set callback
optimizer = tf.keras.optimizers.RMSprop()
vae_model.compile(optimizer, loss=reconstruction_loss)
vae_model.fit(train_data,
              epochs=20,
              validation_data=test_data,
              callbacks=[model_tb, model_sdw, model_testd])
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
# Generate Codes
z_sample = np.array([[xi, yi]])
# Generate images
img = model(z_sample)
# Save images for displaying
save_images[i * size: (i + 1) * size, j * size: (j + 1) * size] = img.numpy()[0]
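The fragment above places each decoded image into a larger canvas. A self-contained NumPy sketch of the full 15 × 15 mosaic loop, assuming 28 × 28 single-channel decoder outputs; `fake_decoder` is a hypothetical stand-in for the trained decoder model, `build_mosaic` is a hypothetical name, and mapping grid_y to rows and grid_x to columns is an assumption:

```python
import numpy as np


def fake_decoder(z):
    """Hypothetical stand-in for the decoder: returns a (1, 28, 28, 1) image."""
    value = (z[0, 0] + z[0, 1] + 3.0) / 6.0  # map the 2D code to a gray level in [0, 1]
    return np.full((1, 28, 28, 1), value)


def build_mosaic(decoder, n=15, size=28, low=-1.5, high=1.5):
    """Decode an n x n grid of 2D codes and tile the images into one canvas."""
    save_images = np.zeros((size * n, size * n, 1))
    grid_x = np.linspace(low, high, n)
    grid_y = np.linspace(low, high, n)
    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])  # one 2D code
            img = decoder(z_sample)          # shape (1, size, size, 1)
            save_images[i * size:(i + 1) * size, j * size:(j + 1) * size] = img[0]
    return save_images


canvas = build_mosaic(fake_decoder)
print(canvas.shape)  # (420, 420, 1)
```

With a real Keras decoder the output is a tensor rather than a NumPy array, which is why the fragment above calls `img.numpy()`.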
FIG. 10.21 The reconstructed images of the Variational Auto-Encoder (VAE) model.
The changes in the VAE model's predictions during training can be observed through TensorBoard, as shown in Fig. 10.22. Note that the recorded images in Fig. 10.22 are produced by the custom callback “SaveDecoderOutput.”
FIG. 10.22 Training results of the Variational Auto-Encoder (VAE) model on TensorBoard.