Unit 3

An autoencoder is an unsupervised neural network that learns compressed encodings of input data. It consists of an encoder that compresses the input into a latent space representation, and a decoder that reconstructs the output from this encoding. Undercomplete autoencoders have a hidden layer with fewer nodes than the input, forcing them to learn only the most salient features of the data rather than simply copying the input. Sparse autoencoders add a sparsity penalty during training so that only a few hidden nodes activate at a time, helping them learn useful features.


Autoencoders

Deep learning Chapter 3


Autoencoder
An autoencoder is an unsupervised learning technique. It is an artificial neural network used
to learn encodings of unlabelled data, i.e. to perform representation learning.

Autoencoders are a specific type of feedforward neural network trained to copy the input to
the output.

A bottleneck is imposed on the network so that it must represent a compressed version of the
original input.

The input is compressed into a lower-dimensional code, and the output is then reconstructed
from this representation.

The code is also called the latent space representation; it is a compact summary, or
compression, of the input.
Why copy the input to the output?
If the only purpose of autoencoders were to copy the input to the output, they would be useless.

Indeed, we hope that, by training the autoencoder to copy the input to the output, the latent
representation h will take on useful properties.

This can be achieved by creating constraints on the copying task. One way to obtain useful features
from the autoencoder is to constrain h to have a smaller dimension than x; in this case the
autoencoder is called undercomplete.

By learning an undercomplete representation, the autoencoder is forced to capture the most salient
features of the training data.


Components of the Autoencoders
An Autoencoder is a type of neural network that can learn to reconstruct images, text, and other data from compressed versions of
themselves.

An Autoencoder consists of three layers:

1. Encoder
2. Code
3. Decoder

The encoder layer compresses the input image into a latent space representation, encoding it as a compressed representation
in a reduced dimension.

The compressed image is a distorted version of the original image.

The code layer represents the compressed input that is fed to the decoder layer.

The decoder layer decodes the encoded image back to the original dimension. The output is reconstructed from the latent space
representation and is a lossy reconstruction of the original image.
Architecture of the Autoencoder

The architecture of an autoencoder contains a hidden layer h.

This hidden layer h is the code, and it is used to represent the input.
The network has two parts:
1) An encoder function h = f(x)
2) A decoder that produces a reconstruction r = g(h)
The encoder and decoder are both fully connected feedforward neural networks, and the structure
of the decoder mirrors that of the encoder. The code is a single layer with a dimension of our
choice; the code size (the number of nodes in the code layer) is a hyperparameter that must be
set before training the autoencoder.
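A minimal sketch of this structure in Keras is shown below (Keras is assumed here; the layer sizes, activations, and training call are illustrative choices, not prescribed by the slides).

from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # e.g. a flattened 28x28 image
code_size = 32    # dimension of the code h (a hyperparameter)

inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(code_size, activation='relu')(inputs)       # encoder: h = f(x)
outputs = layers.Dense(input_dim, activation='sigmoid')(h)   # decoder: r = g(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# The target is the input itself; x_train stands for the unlabelled training data.
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)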
Autoencoder working
The input is passed through the encoder to produce the code. The decoder then
produces the output using the code. Though it is not required, the architecture of the decoder is
typically the mirror image of the encoder.

The main objective is to produce an output that is identical to the input, so the
dimensionality of the input and the output must be the same.
Hyperparameters during autoencoder working
There are four hyperparameters that must be set before training an autoencoder (a short code
sketch follows the list):
Code size: the number of nodes in the middle layer.
Number of layers: the autoencoder can be as deep as we like, not counting the input and output
layers.
Number of nodes per layer: the number of nodes per layer typically decreases with each
subsequent layer of the encoder and increases back in the decoder.
Loss function: mean squared error or binary cross-entropy can be used as the loss function.
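The sketch below shows one way these four hyperparameters might appear in code (Keras assumed; the specific sizes and loss are arbitrary illustrative choices).

from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784
code_size = 32                    # 1) code size: nodes in the middle layer
encoder_sizes = [256, 128, 64]    # 2) + 3) number of layers and nodes per layer,
decoder_sizes = [64, 128, 256]    #    decreasing in the encoder, increasing in the decoder

x = inputs = keras.Input(shape=(input_dim,))
for n in encoder_sizes:
    x = layers.Dense(n, activation='relu')(x)
code = layers.Dense(code_size, activation='relu')(x)
x = code
for n in decoder_sizes:
    x = layers.Dense(n, activation='relu')(x)
outputs = layers.Dense(input_dim, activation='sigmoid')(x)

deep_autoencoder = keras.Model(inputs, outputs)
# 4) loss function: 'mse' or 'binary_crossentropy' (for inputs scaled to [0, 1])
deep_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')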
Features of Autoencoders
Data-dependent: Autoencoders are compression techniques that can only be used on data similar
to what they were trained on. For example, an autoencoder trained to compress images of houses
cannot be used to compress images of human faces.

Lossy compression: Reconstructing the original data from the compressed representation results
in a degraded output.
Linear Autoencoder
Autoencoders consist of two main parts: an encoder and a decoder (figure 1). They work by encoding the input data,
applying an activation function, and finally decoding the data to produce the output. A bottleneck (the h layer or layers)
is imposed on the input features, compressing them into fewer dimensions. Thus, if some inherent structure exists within
the data, the autoencoder model will identify and leverage it to produce the output.

A linear autoencoder uses only linear activation functions (or no activation functions) in its layers. It can be trained
with a single-layer encoder and a single-layer decoder: one hidden layer, linear activations, and squared error loss.
It maps D-dimensional inputs onto a K-dimensional subspace, so the network computes a linear function.
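A minimal sketch of such a linear autoencoder (Keras assumed; D and K are placeholder values):

from tensorflow import keras
from tensorflow.keras import layers

D, K = 100, 10
inputs = keras.Input(shape=(D,))
h = layers.Dense(K, activation=None)(inputs)      # linear encoder onto a K-dimensional subspace
outputs = layers.Dense(D, activation=None)(h)     # linear decoder back to D dimensions
linear_ae = keras.Model(inputs, outputs)
linear_ae.compile(optimizer='adam', loss='mse')   # squared error loss

With linear layers and squared error loss, the learned code spans the same subspace as the top K principal components (PCA), though not necessarily with the same basis vectors.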
Under Complete Autoencoder
While reconstructing an image, we do not want the neural network to simply copy the input to the output; this
kind of memorization leads to overfitting and poor generalization.

An autoencoder should be able to reconstruct the input data efficiently, but by learning its useful properties
rather than memorizing it.

There are many ways to capture important properties when training an autoencoder. Let's start by getting to
know undercomplete autoencoders.


Undercomplete Autoencoder
In the previous section, we discussed that we want our autoencoder to learn the important features of the input data
instead of memorizing and copying the input data to the output.

We can achieve this by giving the hidden code a lower dimensionality than the input data. When the encoding
h has a smaller dimension than x, the autoencoder is called an undercomplete autoencoder.

This way of obtaining reduced-dimensionality data is similar to PCA, which also reduces the dimensionality of the
original data. The loss function for this process can be written as

L(x, r) = L(x, g(f(x)))

where L is a loss function that penalizes the reconstruction r = g(f(x)) for being dissimilar to the input x.
Undercomplete Autoencoder
An autoencoder in which the dimension of the code is less than the dimension of
the input is called an undercomplete autoencoder.
An undercomplete autoencoder limits the amount of information that can flow through
the network by constraining the number of hidden nodes.
Ideally, this encoding learns and describes the latent attributes of the input data.
An undercomplete autoencoder has a sandwich architecture that keeps the code size
small. Training the autoencoder to perform the input-copying task results in the code
h taking on useful properties.
Undercomplete autoencoder
Learning an undercomplete representation forces the autoencoder to capture the
salient features of the training data, so it is not able to simply copy the inputs to the
outputs.

Learning is described by minimizing the loss function L(x, g(f(x))).

When the decoder is linear and L is the mean squared error, an undercomplete
autoencoder learns to span the same subspace as PCA.
Undercomplete Autoencoder
The objective of an undercomplete autoencoder is to capture the most important features present in the data.
Undercomplete autoencoders have a smaller dimension for the hidden layer compared to the input layer, which
helps them obtain important features from the data. They minimize the loss function by penalizing g(f(x)) for
being different from the input x.

Advantages-

● Undercomplete autoencoders do not need any regularization as they maximize the probability of data
rather than copying the input to the output.

Drawbacks-

● Using an overparameterized model due to lack of sufficient training data can create overfitting.
Overcomplete autoencoder
Undercomplete autoencoders, with code dimensions less than the input
dimensions, can learn the most salient features of the data distribution. However, these
autoencoders fail to learn anything useful if the encoder and decoder are given too
much capacity.

A similar problem occurs if the hidden code has a dimension greater than the input;
such autoencoders are called overcomplete. In this case, even a linear encoder and
linear decoder can learn to copy the input to the output without learning anything
useful about the data distribution.
Regularised Autoencoder
There are other ways to constrain the reconstruction of an autoencoder than to impose a
hidden layer of smaller dimension than the input.

Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code
size small, regularized autoencoders use a loss function that encourages the model to have other
properties besides the ability to copy its input to its output.

In practice, we usually find two types of regularized autoencoder: the sparse autoencoder and the
denoising autoencoder.
Sparse Autoencoders
Sparse autoencoders are simply autoencoders whose training criterion involves
a sparsity constraint or penalty.
Sparse autoencoders are used to learn features for another task, such as
classification.
An autoencoder that has been regularised to be sparse must respond to unique
statistical features of the dataset it has been trained on. In this way, training to
perform the copying task with a sparsity penalty can yield a model that has learned
useful features as a by-product.
Sparse Autoencoders
Sparse autoencoders can have more hidden nodes than input nodes, yet they can still discover important features
in the data. In a typical visualization of a sparse autoencoder, the opacity of a node corresponds to its
level of activation.

A sparsity constraint is introduced on the hidden layer to prevent the output layer from simply copying the input
data. Sparsity may be obtained through additional terms in the loss function during training, either by comparing
the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all
but the strongest hidden unit activations.
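As a concrete, hedged sketch, a sparsity term can be added via an L1 activity penalty on the hidden layer in Keras; the sizes and the penalty weight below are illustrative assumptions, not values from the slides.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

input_dim = 784
hidden_dim = 1024   # more hidden nodes than input nodes (overcomplete code)

inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(hidden_dim, activation='relu',
                 activity_regularizer=regularizers.l1(1e-5))(inputs)  # sparsity penalty added to the loss
outputs = layers.Dense(input_dim, activation='sigmoid')(h)

sparse_ae = keras.Model(inputs, outputs)
sparse_ae.compile(optimizer='adam', loss='binary_crossentropy')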

Some of the most powerful AIs in the 2010s involved sparse autoencoders stacked inside of deep neural
networks.
Sparse autoencoders
Advantages-

● Sparse autoencoders apply a sparsity penalty, which keeps hidden activations close to zero but not exactly zero,
on the hidden layer in addition to the reconstruction error. This helps prevent overfitting.
● They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes. This
prevents the autoencoder from using all of the hidden nodes at a time and forces only a reduced number of hidden
nodes to be used (see the sketch after this list).

Drawbacks-

● For this to work, it is essential that the individual nodes of a trained model that activate are data-
dependent, and that different inputs result in the activation of different nodes through the network.
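The "zero out all but the strongest activations" idea mentioned above is sometimes implemented as a k-sparse autoencoder. A hedged sketch of that operation (TensorFlow assumed; k and the shapes are illustrative):

import tensorflow as tf

def k_sparse(h, k):
    # h: hidden activations of shape (batch, hidden_dim)
    values, _ = tf.math.top_k(h, k=k)                      # the k largest activations per example
    threshold = values[:, -1:]                             # the smallest value that is kept
    return tf.where(h >= threshold, h, tf.zeros_like(h))   # zero out everything below it

# Example: keep only the 25 strongest of 1024 hidden activations per input.
# h_sparse = k_sparse(h, k=25)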

Sparse autoencoders
● Another way to constrain the reconstruction of an autoencoder is to impose a
constraint on its loss. We could, for example, add a regularization term to the
loss function. Doing this makes the autoencoder learn a sparse representation
of the data.
● A sparsity constraint is introduced on the hidden layer so that only a fraction of the
nodes have non-zero values; these are called active nodes. As a result, only a reduced
number of hidden nodes are used at a time.
● A penalty term is added to the loss function so that only a fraction of the nodes
become active. The autoencoder is thus forced to represent each input as a
combination of a small number of nodes and to discover salient features in the data.
● The sparsity penalty keeps activations close to zero but not exactly zero. It is applied
on the hidden layer in addition to the reconstruction error, which helps prevent
overfitting.
Sparse autoencoders
In sparse autoencoders, there can be more hidden nodes than input nodes. As only a
small subset of the nodes will be active at any time, this method works even if the
code size is large.

The penalty term can simply be treated as a regularizer added to a feedforward
network whose primary task is to copy the input to the output.

The network can also perform some supervised task.


Denoising autoencoders
A denoising autoencoder is a modification of the autoencoder that prevents the network from learning the
identity function.

Specifically, if the autoencoder is too big, then it can just learn the data, so the output equals the
input, and does not perform any useful representation learning or dimensionality reduction.

Denoising autoencoders solve this problem by corrupting the input data on purpose, adding noise
or masking some of the input values.
Denoising Autoencoder
This type of autoencoder is an alternative to the regular autoencoder we just discussed,
which is prone to a high risk of overfitting.

In the case of a Denoising Autoencoder, the data is partially corrupted by noises added to
the input vector in a stochastic manner.

Then, the model is trained to predict the original, uncorrupted data point as its output.
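A hedged sketch of this setup: corrupt the input with additive Gaussian noise and train the model to reconstruct the clean input (Keras assumed; the noise level, sizes, and training call are illustrative assumptions).

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_size = 784, 32
inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(code_size, activation='relu')(inputs)
outputs = layers.Dense(input_dim, activation='sigmoid')(h)
denoising_ae = keras.Model(inputs, outputs)
denoising_ae.compile(optimizer='adam', loss='mse')

# Stochastic corruption: x_tilde sampled around x by adding Gaussian noise.
# x_train stands for training data scaled to [0, 1].
# x_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)
# denoising_ae.fit(x_noisy, x_train, epochs=10, batch_size=256)  # input: x_tilde, target: x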
Training process of Denoising autoencoders
● An input x is sampled from our dataset.

● A corrupted version x̃ of this input is sampled from a stochastic mapping M(x̃ | x).

● The pair (x, x̃) is used as a training example.

Just like a regular AE, the DAE is a feedforward network that can be trained with gradient-based approximate
minimization of the negative log-likelihood.

In the accompanying figure, the data are represented by blue dots, and the corrupted data remain within a circle of
equiprobable corruption around each point.

During training, the aim is to minimize the negative log-likelihood cost function.

Thus, the model learns a reconstruction vector field D(E(x)) − x; in the figure, some of these vectors are represented by red arrows.
Denoising autoencoders
Rather than adding a penalty to the loss function, we can obtain an autoencoder that learns
something useful by changing the reconstruction error term of the loss function.

This can be done by adding some noise to the input image and making the autoencoder learn to
remove it.

By this means, the encoder extracts the most important features and learns a more robust
representation of the data.
Advantages of Denoising autoencoders
● Learns more robust filters

● Prevents the network from learning a simple identity function

● Decreases the risk of overfitting that can be problematic with regular AE


Contractive autoencoder
A contractive autoencoder aims to learn representations that are invariant to unimportant transformations of
the given data.

The main goal of the contractive autoencoder is to have a robust learned representation that is less sensitive to
small variations in the data.

A penalty term is applied to the loss function to make the representation robust.

In order to make the derivatives of f as small as possible, the contractive autoencoder introduces an explicit
regularizer on the code h = f(x).

The penalty term is the Frobenius norm of the Jacobian matrix of the hidden layer activations with respect to
the input.

The squared Frobenius norm of the Jacobian matrix is the sum of the squares of all its elements.


The goal of the contractive autoencoder is to reduce the representation's sensitivity towards the training
input data. In order to achieve this, we must add a regularizer or penalty term to the cost function that the
autoencoder is trying to minimize.

From a mathematical point of view, the contraction effect is obtained by adding to the reconstruction cost a
term based on the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.

If this value is zero, it means that as we change the input values, we do not observe any change in the learned
hidden representations.

But if the value is very large, then the learned representation is unstable as the input values change.
Link between Denoising and Contractive autoencoder
There is a connection between the denoising autoencoder and the contractive autoencoder:

the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that
maps x to r = g(f(x)).

In other words, denoising autoencoders make the reconstruction function resist small but finite sized
perturbations of the input, whereas contractive autoencoders make the feature extraction function resist
infinitesimal perturbations of the input.
When using the Jacobian based contractive penalty to pretrain features f(x) for use with a classifier, the best
classification accuracy usually results from applying the contractive penalty to f(x) rather than to g(f(x)).

There are some important equations we need to know first before deriving contractive autoencoder. Before
going there, we'll touch base on the Frobenius norm of the Jacobian matrix.

The Frobenius norm, also called the Euclidean norm, is the matrix norm of an m×n matrix A defined as the square
root of the sum of the absolute squares of its elements.

The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function. So when the
matrix is a square matrix, both the matrix and its determinant are referred to as the Jacobian.
The contractive autoencoder loss can therefore be written as L(x, g(f(x))) + λ‖J_f(x)‖²_F, where the penalty term
λ‖J_f(x)‖²_F is the squared Frobenius norm of the Jacobian matrix of partial derivatives of the encoder output with
respect to the input: ‖J_f(x)‖²_F = Σ_ij (∂h_j(x) / ∂x_i)².
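A hedged sketch of how this penalty could be computed in TensorFlow (assumed library; lam and the layer sizes are illustrative, and training would wrap this in an outer GradientTape over the model weights):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_size, lam = 784, 32, 1e-4
inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(code_size, activation='sigmoid')(inputs)
outputs = layers.Dense(input_dim, activation='sigmoid')(h)
cae = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, h)

def contractive_loss(x_batch):
    x_batch = tf.convert_to_tensor(x_batch, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x_batch)
        code = encoder(x_batch)                           # h = f(x)
    recon = cae(x_batch)                                  # g(f(x))
    mse = tf.reduce_mean(tf.square(x_batch - recon))      # reconstruction error
    jac = tape.batch_jacobian(code, x_batch)              # shape (batch, code_size, input_dim)
    frob_sq = tf.reduce_sum(tf.square(jac), axis=[1, 2])  # squared Frobenius norm per example
    return mse + lam * tf.reduce_mean(frob_sq)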
Image Compression using autoencoders
Image Compression Using Autoencoders in Keras | Paperspace Blog

This is also the fifth practical of the Deep Learning course.
