Unit 3
Autoencoders are a specific type of feedforward neural network trained to copy the input to
the output.
The input is compressed into a lower-dimensional code, and the output is then
reconstructed from this representation.
The code is also called the latent space representation; it is a compact summary, or
compression, of the input.
Why copy the input to the output?
If the only purpose of autoencoders were to copy the input to the output, they would be useless.
Instead, we hope that, by training the autoencoder to copy the input to the output, the latent
representation h will take on useful properties.
This can be achieved by creating constraints on the copying task. One way to obtain useful features
from the autoencoder is to constrain h to have a smaller dimension than x; in this case the
autoencoder is called undercomplete.
By training an undercomplete representation, we force the autoencoder to learn the most salient
features of the training data.
An autoencoder consists of three components:
1. Encoder
2. Code
3. Decoder
The Encoder layer compresses the input image into a latent space representation. It encodes the input image as a compressed representation
in a reduced dimension.
The Code layer represents the compressed input fed to the decoder layer.
The Decoder layer decodes the encoded image back to the original dimension. The decoded image is reconstructed from the latent space
representation and is a lossy reconstruction of the original image.
Architecture of Autoencoder
The main objective is to produce an output that is identical to the input; the
dimensionality of the input and the output must therefore be the same.
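A minimal sketch of this encoder-code-decoder structure in Keras is shown below. The concrete sizes (flattened 28x28 images, i.e. 784 inputs, and a 32-dimensional code) are illustrative assumptions, not values prescribed by the text.

from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # e.g. flattened 28x28 images (assumption)
code_size = 32    # dimension of the latent space, the "code" (assumption)

# Encoder: compresses the input into the latent representation
inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(code_size, activation="relu", name="code")(inputs)

# Decoder: reconstructs the input from the code (same dimensionality as the input)
outputs = layers.Dense(input_dim, activation="sigmoid", name="reconstruction")(code)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Training: the target is the input itself (the copying task)
# x_train = ...  # your data, scaled to [0, 1]
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)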
Hyperparameters of an autoencoder
There are four hyperparameters that must be set before training an autoencoder.
They are as follows:
Code size: It is the number of nodes in the middle layer.
Number of layers: The autoencoder can be as deep as we like, not counting
the input and the output layers.
Number of nodes per layer: The number of nodes per layer decreases with each
subsequent layer of the encoder and increases back in decoder.
Loss function: Mean squared error or binary cross-entropy can be used as the loss
function. (All four choices are illustrated in the sketch below.)
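The sketch below maps these four hyperparameters onto a deeper Keras autoencoder. The concrete values (code size 16, encoder layer sizes 256 and 64, MSE loss) are assumptions chosen only for illustration.

from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784                      # assumption: flattened 28x28 inputs
code_size = 16                       # 1. code size: nodes in the middle layer
encoder_sizes = [256, 64]            # 2./3. number of layers and nodes per layer,
decoder_sizes = encoder_sizes[::-1]  #      decreasing in the encoder, mirrored in the decoder
loss = "mse"                         # 4. loss function: "mse" or "binary_crossentropy"

x = inputs = keras.Input(shape=(input_dim,))
for units in encoder_sizes:          # encoder stack
    x = layers.Dense(units, activation="relu")(x)
x = layers.Dense(code_size, activation="relu", name="code")(x)
for units in decoder_sizes:          # decoder stack
    x = layers.Dense(units, activation="relu")(x)
outputs = layers.Dense(input_dim, activation="sigmoid")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss=loss)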
Features of Autoencoders
Data dependent: Autoencoders are compression techniques in which the model can
be used only on data similar to what it was trained on. For example, an autoencoder
trained on images of handwritten digits will compress images of faces poorly, because
the features it learns are specific to its training data.
Linear Autoencoder
A linear autoencoder uses only linear activation functions in its layers. It can be trained with a
single-layer encoder and a single-layer decoder: one hidden layer, linear activations, and squared-error loss.
It maps D-dimensional inputs onto a K-dimensional subspace, so the network computes a linear function.
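A minimal sketch of such a linear autoencoder is given below, assuming D = 100 and K = 10 purely for illustration.

from tensorflow import keras
from tensorflow.keras import layers

D, K = 100, 10  # input dimension and code (subspace) dimension, K < D (assumptions)

inputs = keras.Input(shape=(D,))
code = layers.Dense(K, activation="linear", use_bias=False)(inputs)   # encoder: x -> Wx
outputs = layers.Dense(D, activation="linear", use_bias=False)(code)  # decoder: h -> Vh

linear_ae = keras.Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")  # squared-error loss

With squared-error loss, the optimal linear autoencoder spans the same K-dimensional subspace that PCA finds, although the learned basis need not be orthonormal.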
Undercomplete Autoencoder
While reconstructing an image, we do not want the neural network to simply copy the input to the output; that
would not yield any useful representation.
An autoencoder should be able to reconstruct the input data efficiently, but by learning useful properties
of the data rather than memorizing it.
There are many ways to capture important properties when training an autoencoder. The most basic one is to
reduce the dimensionality of the code.
We can do that by making the hidden coding have a lower dimensionality than the input data; an autoencoder whose
code dimension is less than the input dimension is said to be undercomplete.
This way of obtaining reduced-dimensionality data is similar to PCA, where we also try to reduce the dimensionality of the
original data. The loss function for this process can be described as
L(x, r) = L(x, g(f(x)))
where L is the loss function, which penalizes the reconstruction r = g(f(x)) for being dissimilar from the input x.
Undercomplete Autoencoder
An autoencoder in which the dimension of the code is less than the dimension of
the input is called an undercomplete autoencoder.
Undercomplete autoencoder limits the amount of information that can flow through
the network by constraining the number of hidden nodes.
Ideally this encoding learns and describes the latent attributes of the input data.
An undercomplete autoencoder has a sandwich architecture that keeps the code size
small. Training the autoencoder to perform the input-copying task then results in the code
h taking on useful properties.
Undercomplete autoencoder
Learning an undercomplete representation forces the autoencoder to capture the most
salient features of the training data, so it cannot simply copy the inputs to the
outputs.
Advantages-
● Undercomplete autoencoders do not need any regularization as they maximize the probability of data
rather than copying the input to the output.
Drawbacks-
● Using an overparameterized model without sufficient training data can lead to overfitting.
Overcomplete autoencoder
Undercomplete autoencoders, whose code dimension is less than the input
dimension, can learn the most salient features of the data distribution. However, these
autoencoders fail to learn anything useful if the encoder and decoder are given too
much capacity.
A similar problem occurs if the hidden code has a dimension greater than the input;
such models are called overcomplete autoencoders. In this case, even a linear encoder and
a linear decoder can learn to copy the input to the output without learning anything
useful about the data distribution.
Regularised Autoencoder
There are other ways to constrain the reconstruction of an autoencoder than imposing a
hidden layer of smaller dimension than the input.
Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code
size small, regularized autoencoders use a loss function that encourages the model to have other
properties besides the ability to copy its input to its output.
In practice, we usually find two types of regularized autoencoder: the sparse autoencoder and the
denoising autoencoder.
Sparse Autoencoders
Sparse autoencoders are simply autoencoders whose training criterion involves
a sparsity constraint or penalty.
Sparse autoencoders are used to learn features for another task such as
classification.
An autoencoder that has been regularized to be sparse must respond to the unique
statistical features of the dataset it has been trained on. In this way, training to
perform the copying task with a sparsity penalty can yield a model that has learned
useful features as a by-product.
Sparse Autoencoders
Sparse autoencoders can have more hidden nodes than input nodes, yet they can still discover important features
in the data. In the usual visualization of a generic sparse autoencoder, the opacity of a node corresponds to its
level of activation.
A sparsity constraint is introduced on the hidden layer to prevent the output layer from simply copying the input data. Sparsity
may be obtained by additional terms in the loss function during the training process, either by comparing the
probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but
the strongest hidden unit activations.
Some of the most powerful AIs in the 2010s involved sparse autoencoders stacked inside of deep neural
networks.
Sparse autoencoders
Advantages-
● Sparse autoencoders add a sparsity penalty that keeps hidden activations close to, but not exactly, zero. The penalty is
applied on the hidden layer in addition to the reconstruction error, which helps prevent overfitting.
● They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes. This
prevents the autoencoder from using all of the hidden nodes at a time and forces only a reduced number of hidden
nodes to be used.
Drawbacks-
● For this to work, it is essential that the individual nodes of a trained model that activate are data
dependent, and that different inputs result in activations of different nodes through the network.
Sparse autoencoders
● Another way to constrain the reconstruction of an autoencoder is to impose a
constraint in its loss. We could, for example, add a regularization term to the
loss function. Doing this makes the autoencoder learn a sparse representation
of the data.
● A sparsity constraint is introduced on the hidden layer such that only a fraction of
the nodes have non-zero values; these are called active nodes. As a result, only a reduced
number of hidden nodes are used at a time.
● A penalty term is added to the loss function such that only a fraction of the nodes
become active, so the autoencoder is forced to represent each input as a
combination of a small number of nodes and to discover salient features in the data.
● The sparsity penalty pushes activation values close to zero, but not exactly zero. In addition to the
reconstruction error, the sparsity penalty is applied on the hidden layer, which
prevents overfitting. (A minimal sketch of such a penalty follows this list.)
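A minimal sketch of a sparse autoencoder follows, using an L1 activity regularizer on the hidden layer as one common form of sparsity penalty (the KL-divergence penalty mentioned above is an alternative). The layer sizes and the penalty weight 1e-4 are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

input_dim = 784    # assumption
hidden_dim = 1024  # more hidden nodes than inputs (overcomplete code)

inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(
    hidden_dim,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-4),  # sparsity penalty on the activations
)(inputs)
outputs = layers.Dense(input_dim, activation="sigmoid")(code)

sparse_ae = keras.Model(inputs, outputs)
# Total loss = reconstruction error + sparsity penalty added by the regularizer
sparse_ae.compile(optimizer="adam", loss="binary_crossentropy")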
Sparse autoencoders
In sparse autoencoders, the number of hidden nodes can be greater than the number of input nodes. Since only a
small subset of the nodes will be active at any time, this method works even if the
code size is large.
The penalty term can simply be treated as a regularizer term added to a feedforward
network whose primary task is to copy the input to the output.
Specifically, if the autoencoder is too big, then it can just learn the data, so the output equals the
input, and does not perform any useful representation learning or dimensionality reduction.
Denoising autoencoders solve this problem by corrupting the input data on purpose, adding noise
or masking some of the input values.
Denoising Autoencoder
This type of Autoencoder is an alternative to the concept of regular Autoencoder we just discussed,
which is prone to a high risk of overfitting.
In the case of a Denoising Autoencoder, the data is partially corrupted by noise added to
the input vector in a stochastic manner.
Then, the model is trained to predict the original, uncorrupted data point as its output.
Training process of Denoising autoencoders
● An input x is sampled from our dataset.
● A corrupted version x̃ of x is obtained, for example through a corruption process C(x̃ | x) that adds noise or masks values.
● The pair (x, x̃) is used as a training example.
Just like a regular AE, our DAE is a feedforward network that can be trained with gradient-based approximate
minimization, such as minibatch gradient descent.
During training, the aim is to minimize the negative log-likelihood cost function.
Thus, our model learns a reconstruction vector field D(E(x)) - x, which points from corrupted points back toward the clean data.
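A minimal sketch of this training process, assuming Gaussian noise with standard deviation 0.2 as the corruption and illustrative layer sizes:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_size = 784, 64  # assumptions

inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(code_size, activation="relu")(inputs)
outputs = layers.Dense(input_dim, activation="sigmoid")(code)
dae = keras.Model(inputs, outputs)
dae.compile(optimizer="adam", loss="mse")

# Corruption process: add Gaussian noise to x and clip back to [0, 1]
# x_train = ...  # clean data scaled to [0, 1]
# x_noisy = np.clip(x_train + 0.2 * np.random.normal(size=x_train.shape), 0.0, 1.0)
# dae.fit(x_noisy, x_train, epochs=10, batch_size=128)  # the target is the clean x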
Denoising autoencoders
Rather than adding a penalty to the loss function, we can obtain an autoencoder that learns
something useful by changing the reconstruction error term of the loss function.
This can be done by adding some noise to the input image and making the autoencoder learn to
remove it.
By this means, the encoder extracts the most important features and learns a more robust
representation of the data.
Advantages of Denoising autoencoders
● Learns more robust filters
Contractive Autoencoder
The main goal of a contractive autoencoder is to have a robust learned representation that is less sensitive to
small variations in the data.
The penalty term is the Frobenius norm of the Jacobian matrix of the hidden-layer activations, calculated with
respect to the input.
From the mathematical point of view, this gives the effect of contraction by adding an additional term to the
reconstruction cost; the term is the squared Frobenius norm of the Jacobian matrix of the encoder activations.
If this value is zero, it means that as we change the input values, we do not observe any change in the learned
hidden representations.
But if the value is very large, then the learned representation is unstable as the input values change.
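A sketch of how such a contractive penalty could be computed, assuming a single-layer sigmoid encoder and an illustrative penalty weight lam; the Jacobian of the code with respect to the input is obtained with TensorFlow's GradientTape.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_size, lam = 784, 64, 1e-4  # illustrative assumptions

encoder = keras.Sequential([layers.Dense(code_size, activation="sigmoid")])
decoder = keras.Sequential([layers.Dense(input_dim, activation="sigmoid")])

def contractive_loss(x):
    # x: float tensor of shape (batch, input_dim)
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)                                 # code f(x)
    jac = tape.batch_jacobian(h, x)                    # shape (batch, code_size, input_dim)
    frob_sq = tf.reduce_sum(tf.square(jac), axis=[1, 2])      # ||J_f(x)||_F^2 per example
    recon = tf.reduce_sum(tf.square(decoder(h) - x), axis=1)  # reconstruction error
    return tf.reduce_mean(recon + lam * frob_sq)

# usage (illustrative):
# x = tf.convert_to_tensor(x_batch, dtype=tf.float32)
# loss = contractive_loss(x)  # minimize with an optimizer inside a custom training step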
Link between Denoising and Contractive autoencoder
There is a connection between the denoising autoencoder and the contractive autoencoder:
the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that
maps x to r = g(f(x)).
In other words, denoising autoencoders make the reconstruction function resist small but finite sized
perturbations of the input, whereas contractive autoencoders make the feature extraction function resist
infinitesimal perturbations of the input.
When using the Jacobian based contractive penalty to pretrain features f(x) for use with a classifier, the best
classification accuracy usually results from applying the contractive penalty to f(x) rather than to g(f(x)).
There are some important equations we need to know first before deriving contractive autoencoder. Before
going there, we'll touch base on the Frobenius norm of the Jacobian matrix.
The Frobenius norm, also called the Euclidean norm, is the matrix norm of an m x n matrix A defined as the square
root of the sum of the absolute squares of its elements.
The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function. So when the
matrix is a square matrix, both the matrix and its determinant are referred to as the Jacobian.
The loss function of a contractive autoencoder can then be written as
L(x, g(f(x))) + λ ||J_f(x)||_F^2
where the penalty term, λ ||J_f(x)||_F^2, is the squared Frobenius norm of the Jacobian matrix of partial derivatives
of the encoder function f with respect to the input x.
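A small numpy sketch of these two definitions for a single sigmoid encoder layer h = sigmoid(Wx + b); the dimensions and random values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3                                # input and code dimensions (assumptions)
W, b = rng.normal(size=(K, D)), rng.normal(size=K)
x = rng.normal(size=D)

h = 1.0 / (1.0 + np.exp(-(W @ x + b)))     # encoder output f(x)

# Jacobian of f at x: all first-order partial derivatives dh_i / dx_j
J = (h * (1.0 - h))[:, None] * W           # shape (K, D)

# Squared Frobenius norm: sum of the absolute squares of the entries
penalty = np.sum(J ** 2)
print(penalty)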
Image Compression using autoencoders
Image Compression Using Autoencoders in Keras | Paperspace Blog