
Deep Learning: UNIT 3

Autoencoders and relation to PCA:


Autoencoders and Principal Component Analysis (PCA) are both techniques used for
dimensionality reduction and feature extraction, but they have key differences in how they
achieve this.

1. Principal Component Analysis (PCA)

PCA is a linear technique that finds an optimal set of orthogonal axes (principal components)
along which the data varies the most. It projects data onto these components to reduce
dimensionality while retaining as much variance as possible. PCA is mathematically
straightforward and uses Singular Value Decomposition (SVD) to compute the principal
components.

Linear transformation

Finds directions of maximum variance

Uses eigenvectors of the covariance matrix

Optimal in terms of minimizing reconstruction error for linear projections

Interpretable as rotations and projections in feature space

2. Autoencoders

Autoencoders are a type of neural network that learn efficient data representations in an
unsupervised manner. They consist of an encoder, which compresses input data into a lower-
dimensional latent space, and a decoder, which reconstructs the original data from this
compressed representation. Autoencoders can learn both linear and nonlinear transformations,
making them more flexible than PCA.

Can learn nonlinear mappings

Typically consist of multiple layers (deep autoencoders)

Minimize reconstruction error using neural network optimization techniques (e.g., backpropagation)

Can incorporate additional constraints like sparsity or denoising

Relation Between PCA and Autoencoders

A linear autoencoder (with a single hidden layer and linear activation functions) behaves
similarly to PCA. It finds a subspace that captures maximum variance, similar to PCA’s principal
components.
Unlike PCA, nonlinear autoencoders can capture more complex patterns in data by learning a
more flexible manifold structure.

Autoencoders, especially deep autoencoders, can learn hierarchical representations, which PCA cannot. PCA is deterministic and has a closed-form solution, while autoencoders require training with optimization methods.
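To make the relationship concrete, here is a minimal sketch (assuming PyTorch, NumPy, and scikit-learn are available; the data, dimensions, and training settings are purely illustrative) that fits a single-hidden-layer linear autoencoder and a 3-component PCA on the same centred data and compares their reconstruction errors:

import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# Toy data: 500 samples, 10 features, centred (as PCA assumes)
X = np.random.randn(500, 10).astype(np.float32)
X -= X.mean(axis=0)

# PCA: closed-form, keeps the top 3 directions of maximum variance
pca = PCA(n_components=3).fit(X)
pca_recon = pca.inverse_transform(pca.transform(X))

# Linear autoencoder: one hidden layer, no activation function, no bias
encoder = nn.Linear(10, 3, bias=False)
decoder = nn.Linear(3, 10, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
x = torch.from_numpy(X)

for step in range(2000):
    opt.zero_grad()
    recon = decoder(encoder(x))            # reconstruct from the 3-D code
    loss = ((recon - x) ** 2).mean()       # reconstruction error (MSE)
    loss.backward()
    opt.step()

# The two errors end up close: the learned 3-D subspace spans roughly the
# same space as the top 3 principal components, though the autoencoder's
# weight vectors need not be orthogonal or ordered by variance.
print(((pca_recon - X) ** 2).mean(), loss.item())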

Which One to Use?

Use PCA when you need a fast, interpretable, and optimal linear transformation.

Use Autoencoders when your data is complex and you suspect nonlinear structures that PCA
cannot capture.

Regularization In Autoencoders:
Regularization in autoencoders helps improve their generalization ability by preventing
overfitting and ensuring meaningful feature extraction. Various regularization techniques can
be applied to autoencoders, including:

1. L1 & L2 Regularization (Weight Decay)

L1 Regularization (Lasso) promotes sparsity in the weights, encouraging certain connections to be zero.

L2 Regularization (Ridge) prevents large weight values, leading to a more stable and smooth representation.
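A minimal sketch of how both penalties might be attached to an autoencoder's training step, assuming PyTorch; the architecture and penalty strengths are illustrative. L2 is passed to the optimizer as weight_decay, while L1 is added explicitly to the loss:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784))

# L2 regularization (weight decay) handled directly by the optimizer
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

l1_lambda = 1e-5
x = torch.randn(32, 784)                     # a dummy mini-batch

opt.zero_grad()
recon = model(x)
loss = ((recon - x) ** 2).mean()             # reconstruction error
# L1 penalty: sum of absolute weight values pushes weights toward zero
l1_penalty = sum(p.abs().sum() for p in model.parameters())
(loss + l1_lambda * l1_penalty).backward()
opt.step()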

2. Sparse Autoencoders

Introduces a sparsity constraint on the hidden units using KL divergence or L1 regularization.

Ensures that only a subset of neurons activate, leading to efficient feature learning.

3. Denoising Autoencoders

Adds noise (e.g., Gaussian, salt-and-pepper) to the input and trains the network to reconstruct
the original clean data.

Encourages robustness and prevents the model from memorizing training data.

4. Contractive Autoencoders

Adds a penalty term on the Jacobian of the encoder to minimize sensitivity to small input
variations.

Forces the latent representation to be robust to slight changes in input.
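A hedged sketch of the contractive penalty for a single sigmoid encoder layer, assuming PyTorch; sizes and the penalty weight lam are illustrative. For h = sigmoid(Wx + b), the squared Frobenius norm of the encoder Jacobian dh/dx has the closed form used below:

import torch
import torch.nn as nn

enc = nn.Linear(784, 64)
dec = nn.Linear(64, 784)
lam = 1e-4
x = torch.randn(32, 784)

h = torch.sigmoid(enc(x))                     # hidden code
recon = dec(h)
recon_loss = ((recon - x) ** 2).mean()

# Jacobian of a sigmoid layer: J_ij = h_i (1 - h_i) * W_ij, so
# ||J||_F^2 = sum_i (h_i (1 - h_i))^2 * sum_j W_ij^2
dh = (h * (1 - h)) ** 2                       # shape (batch, 64)
w_sq = (enc.weight ** 2).sum(dim=1)           # shape (64,)
contractive_penalty = (dh * w_sq).sum(dim=1).mean()

loss = recon_loss + lam * contractive_penalty
loss.backward()                               # gradients now include the penalty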

5. Variational Autoencoders (VAE)

Introduces a probabilistic framework by enforcing a prior distribution (e.g., Gaussian) on the latent space.

Uses KL divergence to regularize the latent distribution, ensuring structured and meaningful embeddings.
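For a diagonal Gaussian posterior and a standard normal prior, the KL regularizer has a closed form. A small sketch, assuming PyTorch; mu and logvar stand in for the outputs of an encoder and are illustrative:

import torch

# q(z|x) = N(mu, sigma^2), p(z) = N(0, I):
# KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
mu = torch.randn(32, 16)          # stand-in encoder outputs
logvar = torch.randn(32, 16)

kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

# Reparameterization trick: sample z = mu + sigma * eps so gradients flow
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps
# total_loss = reconstruction_loss + beta * kl   (beta = 1 for a plain VAE)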

6. Dropout Regularization

Randomly drops neurons during training to prevent over-reliance on specific features.

Encourages redundancy and robustness in learned representations.
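As a small illustration (PyTorch assumed; layer sizes and the dropout rate are arbitrary), dropout can be inserted between encoder layers and is switched off automatically in evaluation mode:

import torch.nn as nn

# Dropout randomly zeroes a fraction of activations during training;
# calling model.eval() disables it for inference.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(256, 64),  nn.ReLU(),
)
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))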

7. Batch Normalization & Layer Normalization

Normalizes activations to stabilize training and reduce internal covariate shifts.

Improves generalization and speeds up convergence.

Autoencoders are a specialized class of algorithms that can learn efficient representations of input data without the need for labels; they are a class of artificial neural networks designed for unsupervised learning. Learning to compress and effectively represent input data without explicit labels is the essential principle of an autoencoder. This is accomplished using a two-part structure that consists of an encoder and a decoder. The encoder transforms the input data into a reduced-dimensional representation, often referred to as the “latent space” or “encoding”. From that representation, the decoder rebuilds the original input. This cycle of encoding and decoding forces the network to capture the essential features and meaningful patterns in the data.
Denoising Autoencoders:
A denoising autoencoder is a modification of the original autoencoder in which, instead of giving the original input, we give a corrupted or noisy version of the input to the encoder, while the decoder's loss is calculated with respect to the original input only. This results in more efficient learning, and the risk of the autoencoder becoming an identity function is significantly reduced.

Denoising Autoencoders (DAEs) are a type of autoencoder designed to remove noise from data
by learning a robust representation of the input. They are widely used in image processing,
speech enhancement, and feature learning.

How Denoising Autoencoders Work

• Corrupting the Input: A noisy version of the input is created by adding noise (e.g.,
Gaussian noise, salt-and-pepper noise, or occlusions).

• Encoding: The noisy input is passed through an encoder, which maps it to a lower-
dimensional latent space.

• Decoding: The decoder reconstructs the denoised version of the input from the latent
representation.

• Loss Function: The model is trained using a loss function that minimizes the difference
between the reconstructed output and the clean input.
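A minimal training-step sketch, assuming PyTorch; the architecture, noise level, and dummy batch are illustrative. The key point is that the noisy input is encoded but the loss is measured against the clean input:

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

clean = torch.rand(32, 784)                   # dummy batch of clean inputs
noisy = clean + 0.3 * torch.randn_like(clean) # corrupt with Gaussian noise

opt.zero_grad()
recon = decoder(encoder(noisy))               # encode and decode the NOISY input
loss = ((recon - clean) ** 2).mean()          # ...but compare against the CLEAN input
loss.backward()
opt.step()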

Applications of Denoising Autoencoders

• Image Denoising: Removing noise from images (e.g., medical imaging, photography).

• Speech Enhancement: Improving the quality of speech signals.

• Feature Learning: Extracting robust representations for downstream tasks like classification.

• Anomaly Detection: Identifying irregularities in data by comparing reconstructed outputs to the original input.
Sparse Autoencoders:
Sparse autoencoders impose a sparsity constraint on the hidden units so that only a small fraction of neurons activate for any given input. Common ways to enforce this are:

KL Divergence: Encourages the average activation of neurons to match a desired sparsity level.

L1 Regularization (Lasso): Promotes sparsity by penalizing large weights.

Hidden Layer Activation: Uses non-linear activation functions (ReLU, Sigmoid, Tanh) to control neuron activation; the average activation of neurons is kept low to enforce sparsity.
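A hedged sketch of the KL-divergence sparsity penalty, assuming PyTorch; the target sparsity rho and the weight beta are illustrative. The penalty compares the average activation of each hidden unit with the desired level rho:

import torch
import torch.nn as nn

enc = nn.Linear(784, 64)
dec = nn.Linear(64, 784)
rho, beta = 0.05, 1e-2                        # target sparsity and penalty weight
x = torch.rand(32, 784)

h = torch.sigmoid(enc(x))                     # activations lie in (0, 1)
recon = dec(h)
recon_loss = ((recon - x) ** 2).mean()

# Average activation of each hidden unit over the batch
rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
# KL(rho || rho_hat), summed over hidden units: small only when rho_hat ~ rho
kl = (rho * torch.log(rho / rho_hat)
      + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

loss = recon_loss + beta * kl
loss.backward()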

Bias Variance Tradeoff:


The bias-variance tradeoff is a fundamental concept in machine learning and statistics that
describes the balance between two sources of error that affect the performance of predictive
models:

1. Bias

Bias refers to the error introduced by approximating a real-world problem with a simplified
model.

High bias means the model makes strong assumptions about the data, leading to underfitting.

Example: A linear regression model used to fit a complex, highly non-linear dataset will have
high bias.

2. Variance

Variance refers to how much the model's predictions fluctuate based on the training data.

High variance means the model is too sensitive to small fluctuations in the training set, leading
to overfitting.
Example: A deep neural network that perfectly fits the training data but performs poorly on
new data has high variance.

The Tradeoff

Increasing model complexity reduces bias but increases variance.

Simplifying the model reduces variance but increases bias.

The goal is to find the optimal balance where both bias and variance are minimized to achieve
the lowest total error.
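The tradeoff can be observed numerically. The sketch below (NumPy assumed; the target function, noise level, and polynomial degrees are arbitrary choices) repeatedly fits polynomials of different degrees to small noisy samples and estimates the squared bias and the variance of each model class:

import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)
x_test = np.linspace(0, 1, 50)

for degree in (1, 4, 9):                      # simple, moderate, very flexible
    preds = []
    for _ in range(200):                      # many training sets of 20 noisy points
        x = rng.uniform(0, 1, 20)
        y = true_f(x) + rng.normal(0, 0.3, 20)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")

Low-degree fits show high bias and low variance; high-degree fits show the reverse.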

Graphical Representation

The typical error curve shows:

• High bias: The model performs poorly on both training and test data.

• High variance: The model does well on training data but poorly on test data.

• Optimal point: A balance where both errors are minimized.

How to Manage the Tradeoff

• Regularization (e.g., L1/L2 penalties): Prevents overfitting by discouraging overly complex models.

• Cross-validation: Helps detect high variance and tune model complexity.

• Feature selection: Reducing irrelevant features can help control variance.

• Ensemble methods (e.g., bagging, boosting): Help reduce variance while maintaining low bias.

Bias is the difference between the values predicted by the machine learning model and the correct values. High bias gives a large error on both training and test data, so an algorithm should ideally be low-biased to avoid the problem of underfitting. With high bias, the predictions follow an overly simple form (for example, a straight line fitted to data that is not linear) and therefore do not fit the data set accurately. Such fitting is known as underfitting of the data, and it happens when the hypothesis is too simple or linear in nature.
Early Stopping:
In regularization by early stopping, we stop training the model when its performance on the validation set starts getting worse: increasing loss, decreasing accuracy, or a poorer value of the chosen scoring metric. If the error on the training dataset and the validation dataset are plotted together, both errors decrease with the number of iterations up to the point where the model starts to overfit. After this point, the training error still decreases but the validation error increases.

Even if training is continued after this point, early stopping essentially returns the set of parameters that were in use at that point, and so is equivalent to stopping training there. The final parameters returned therefore give the model lower variance and better generalization: the model at the time training is stopped typically generalizes better than the model with the least training error.


Early stopping can be thought of as implicit regularization, in contrast to explicit regularization via weight decay. The method is also efficient: it works with the training data that is available, which is not always plentiful, and it requires less training time than many other regularization methods. Repeating the early stopping process many times, however, may result in the model overfitting the validation dataset, just as overfitting can occur on the training data.

The number of iterations (i.e., epochs) taken to train the model can be considered a hyperparameter. An optimal value for this hyperparameter then has to be found (by hyperparameter tuning) for the learning model to perform at its best.

Early stopping is a regularization technique used in machine learning to prevent overfitting by stopping the training process when a model’s performance on a validation dataset starts to degrade.

How Early Stopping Works

• Monitor Performance – During training, the model’s loss or accuracy is evaluated on both the training and validation datasets.
• Detect Overfitting – If the validation loss starts increasing while the training loss
continues to decrease, the model is likely overfitting.

• Stop Training – Training is stopped when the validation loss (or another metric) has not
improved for a set number of epochs (patience).

• Use Best Model – The model is typically restored to the weights from the epoch with the
best validation performance.
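A minimal, self-contained sketch of this loop, assuming PyTorch; the model, synthetic data, learning rate, and patience value are all illustrative choices rather than recommendations:

import copy
import torch
import torch.nn as nn

# Tiny illustrative setup: a linear model on synthetic data with a held-out split
torch.manual_seed(0)
X = torch.randn(200, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(200, 1)
x_tr, y_tr, x_val, y_val = X[:150], y[:150], X[150:], y[150:]

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

patience, wait = 5, 0
best_loss, best_weights = float("inf"), None

for epoch in range(500):
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()     # one training step per "epoch"
    opt.step()

    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if val_loss < best_loss:                  # validation improved: save a checkpoint
        best_loss, wait = val_loss, 0
        best_weights = copy.deepcopy(model.state_dict())
    else:
        wait += 1
        if wait >= patience:                  # no improvement for `patience` epochs
            break

model.load_state_dict(best_weights)           # restore the best-performing weights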

Dataset Augmentation:
The best way to make a machine learning model generalize better is to train it on more data. Of
course, in practice, the amount of data we have is limited. One way to get around this problem
is to create new data and add it to the training set.

Data augmentation is easiest for classification. A classifier takes a high-dimensional input x and summarizes it with a single category identity y, so its main task is to be invariant to a wide variety of transformations; we can therefore generate new training pairs (x, y) simply by transforming the inputs x.

This approach does not generalize easily to other problems. For example, in a density estimation problem it is not possible to generate new data without first solving the density estimation task itself.

Dataset augmentation is a technique used in deep learning to artificially expand the size and
diversity of training datasets by applying various transformations to the existing data. This helps
improve model generalization, reduce overfitting, and make models more robust to real-world
variations.
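For image classification, label-preserving transformations are commonly applied on the fly as each batch is drawn. A small sketch, assuming torchvision is available; the particular transformations, their parameters, and the dataset path are illustrative:

from torchvision import transforms

# Each training image is randomly transformed every time it is drawn, so the
# effective training set is much larger than the stored one.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# e.g. train_set = torchvision.datasets.ImageFolder("path/to/train", transform=train_transform)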

Parameter Tying and Parameter Sharing:


One form of parameter tying regularises the parameters of one model, trained as a classifier in a supervised paradigm, to be close to the parameters of another model trained in an unsupervised paradigm (to capture the distribution of the observed input data). The architectures can be designed so that many of the parameters in the classifier model are paired with corresponding parameters in the unsupervised model. While a parameter norm penalty is one way to regularise sets of parameters to be close to one another, a more prevalent approach is to use constraints that force sets of parameters to be equal. Because we view the various models or model components as sharing a single, unique set of parameters, this form of regularisation is commonly referred to as parameter sharing. A significant advantage of parameter sharing over regularising the parameters to be close (through a norm penalty) is that only the shared subset of parameters (the unique set) needs to be retained in memory. In models such as the convolutional neural network, this can result in a large reduction in memory footprint.

Convolutional neural networks (CNNs) used in computer vision are by far the most widespread and extensive use of parameter sharing. Many statistical properties of natural images are invariant to translation: a photo of a cat, for example, can be shifted one pixel to the right and still be a photo of a cat. CNNs take this property into account by sharing parameters across multiple image locations; the same feature (a hidden unit with the same weights) is computed at different locations in the input. This means that whether the cat appears in column i or column i + 1 of the image, we can find it with the same cat detector.

Thanks to parameter sharing, CNNs have been able to reduce the number of unique model parameters and to grow greatly in size without requiring a comparable increase in training data. This remains one of the best illustrations of how domain knowledge can be efficiently integrated into a network architecture.

In the context of machine learning, "parameter sharing" refers to the practice of using the same set of parameters across different parts of a model. This allows different sections to learn similar features and reduces the overall number of parameters needed, which is particularly useful in convolutional neural networks (CNNs), where the same feature might be present at different locations within an image. It makes the model more efficient and more robust by leveraging shared information across various parts of the data.
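A small illustration of the savings, assuming PyTorch; the input size and the number of feature maps are arbitrary:

import torch.nn as nn

# A fully connected layer mapping a 28x28 image to 10 feature maps of the same
# size would need roughly 784 * 7840 separate weights. A convolutional layer
# instead reuses one small kernel at every spatial location:
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, padding=1)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)   # 10 * (3*3*1) weights + 10 biases = 100 parameters, shared across all positions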

Greedy Layer Wise Training:


Artificial intelligence has undergone a revolution thanks to neural networks, which have made significant strides possible in areas such as speech recognition, computer vision, and natural language processing. Training deep neural networks, however, can be difficult, particularly when working with big, complicated datasets. One method that tackles some of these issues is greedy layer-wise pre-training, which initializes the parameters of a deep neural network layer by layer.

Greedy layer-wise pre-training initializes the parameters of deep neural networks layer by layer, beginning with the first layer and working through each one that follows. At each step, a layer is trained as if it were a stand-alone model, taking its input from the layer before it and passing its output to the layer after it. The training objective is typically to develop useful representations of the input data.

Processes of Greedy Layer-Wise Pre-Training

The process of greedy layer-wise pre-training can be staged as follows:


• Initialization: The neural network's first layer is trained on its own using an autoencoder or another unsupervised learning strategy. The aim is to learn a collection of features that highlight important elements of the input data.

• Feature Extraction: Once the first layer has been trained, its activations are used as features to train the subsequent layer. As this process is repeated, each layer learns to represent the traits discovered by the layer before it at a higher level of abstraction.

• Fine-Tuning: Once every layer has been pretrained in this way, the network is adjusted as a whole using supervised learning methods. To maximize performance on a particular job, this entails simultaneously modifying all of the network's parameters using a labeled dataset.
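A compact sketch of this procedure, assuming PyTorch; the layer sizes, the random stand-in for unlabeled data, and the helper pretrain_layer are illustrative. Each layer is trained as the encoder of a small autoencoder on the activations of the previous layer, and the stack is then fine-tuned with a supervised head:

import torch
import torch.nn as nn

def pretrain_layer(layer, data, epochs=10, lr=1e-3):
    # Train `layer` as the encoder of a small autoencoder on `data`.
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        h = torch.relu(layer(data))
        loss = ((decoder(h) - data) ** 2).mean()
        loss.backward()
        opt.step()
    return torch.relu(layer(data)).detach()    # activations feed the next layer

layers = [nn.Linear(784, 256), nn.Linear(256, 64)]
x = torch.rand(512, 784)                       # stand-in for unlabeled data
for layer in layers:                           # greedy: one layer at a time
    x = pretrain_layer(layer, x)

# Fine-tuning: stack the pretrained layers, add a classifier head, and train
# the whole network on labeled data with a supervised loss.
net = nn.Sequential(layers[0], nn.ReLU(), layers[1], nn.ReLU(), nn.Linear(64, 10))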

Better Activation Functions:


An activation function is a mathematical function applied to the output of a neuron. It
introduces non-linearity into the model, allowing the network to learn and represent complex
patterns in the data. Without this non-linearity feature, a neural network would behave like a
linear regression model, no matter how many layers it has.

The activation function decides whether a neuron should be activated by calculating the
weighted sum of inputs and adding a bias term. This helps the model make complex decisions
and predictions by introducing non-linearities to the output of each neuron.

An artificial neural network (ANN) is an information-processing paradigm that draws inspiration from the brain. ANNs learn by example, much as people do. Through a learning process, an ANN is tailored to a particular application, such as pattern recognition or data classification. Learning changes the synaptic interconnections that exist between the neurons.

Which activation function to employ in the hidden layers and at the output layer of the network is one of the decisions you must make when creating a neural network. This section discusses a few of the alternatives.

The nerve impulse in neurology serves as a model for activation functions in computer science. A chain reaction permits a neuron to "fire" and send a signal to nearby neurons if the induced voltage between its interior and exterior exceeds a threshold value known as the action potential. The resulting series of activations, known as a "spike train", enables motor neurons to transfer commands from the brain to the limbs and sensory neurons to transmit sensations from the digits to the brain.

In artificial neural networks, an activation function is one that outputs a small value for small inputs and a larger value if its inputs exceed a threshold. An activation function "fires" if the inputs are big enough; otherwise, nothing happens. An activation function, then, acts as a gate that checks whether an incoming value is higher than a threshold value.

Activation functions are essential because they introduce non-linearities into neural networks and enable them to learn powerful operations. If the activation functions were removed, a feedforward neural network would collapse into a simple linear function or matrix transformation of its input.

By computing a weighted sum and then adding a bias to it, the activation function determines whether a neuron should be activated. Its purpose is to introduce non-linearity into a neuron's output.

Explanation: As we are aware, neurons in neural networks operate in accordance with their weights, biases, and corresponding activation functions. Based on the error, the parameters of the neural network are adjusted; this process is known as back-propagation. Activation functions make back-propagation possible because they supply the gradients, along with the error, needed to update the weights and biases.
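A quick numerical comparison of a few common activation functions, assuming PyTorch; the input range is arbitrary:

import torch

x = torch.linspace(-3, 3, 7)

sigmoid = torch.sigmoid(x)        # squashes to (0, 1); saturates for large |x|
tanh = torch.tanh(x)              # squashes to (-1, 1); zero-centred
relu = torch.relu(x)              # max(0, x); cheap and does not saturate for x > 0
leaky = torch.nn.functional.leaky_relu(x, negative_slope=0.01)  # small slope for x < 0

print(torch.stack([x, sigmoid, tanh, relu, leaky]))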

Better Weight Initialization Methods:


Weight initialization is an essential aspect of training neural networks, influencing their convergence speed, stability, and overall performance. Initializing the weights of a neural network properly can lead to quicker convergence during training and better generalization on unseen data.

A neural network may be considered as a function with learnable parameters, which are
commonly referred to as weights and biases. Now, when neural nets are first trained, these
parameters (typically the weights) are initialized in a variety of ways, including using constant
values like 0's and 1's, values sampled from some distribution (typically a uniform distribution
or normal distribution), and other sophisticated schemes such as Xavier Initialization.

A neural network's performance is heavily influenced by how its parameters are initialized when it first begins training. If we initialize them at random for each run, the results are almost certain to be non-reproducible and may even underperform. On the other hand, if we initialize them with constant values, the network may take an extremely long time to converge, and we also lose the benefit of randomness, which gives a neural net the ability to reach convergence faster via gradient-based learning. We clearly require a better initialization technique.

Challenges of Weight Initialisation

Weight initialization presents a hurdle owing to the non-linear activation functions employed in neural networks, such as sigmoid, tanh, and ReLU. These activation functions operate optimally within particular ranges. For example, the sigmoid function returns values between 0 and 1, whereas tanh returns values between -1 and 1. If the initial weights are too large or too small, the activations might become saturated, resulting in vanishing gradients or slow convergence.

Another problem is keeping the variation of activations and gradients consistent across the
network's layers. As the signal travels through numerous levels, it might increase or diminish,
compromising training stability. Proper weight initialization strategies strive to overcome these
problems while also ensuring robust and efficient neural network training.
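A short sketch of two widely used schemes, assuming PyTorch; the layer sizes are arbitrary. Xavier/Glorot initialization is usually paired with sigmoid or tanh activations, while He/Kaiming initialization accounts for ReLU zeroing half of its inputs:

import torch.nn as nn

layer_tanh = nn.Linear(256, 128)
layer_relu = nn.Linear(256, 128)

# Xavier/Glorot: keeps activation variance roughly constant across layers
nn.init.xavier_uniform_(layer_tanh.weight)
nn.init.zeros_(layer_tanh.bias)

# He/Kaiming: scaled for ReLU non-linearities
nn.init.kaiming_normal_(layer_relu.weight, nonlinearity="relu")
nn.init.zeros_(layer_relu.bias)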
Batch Normalization:
Batch normalization (also known as batch norm) is a method used to make training of artificial
neural networks faster and more stable through normalization of the layers' inputs by re-
centering and re-scaling. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.[1]

The reasons behind the effectiveness of batch normalization remain under discussion. It was
believed that it can mitigate the problem of internal covariate shift, where parameter
initialization and changes in the distribution of the inputs of each layer affect the learning rate
of the network.[1] Recently, some scholars have argued that batch normalization does not
reduce internal covariate shift, but rather smooths the objective function, which in turn
improves the performance.[2] However, at initialization, batch normalization in fact induces
severe gradient explosion in deep networks, which is only alleviated by skip connections in
residual networks.[3] Others maintain that batch normalization achieves length-direction decoupling, and thereby accelerates neural networks.

Batch normalization was introduced to mitigate the internal covariate shift problem in neural
networks by Sergey Ioffe and Christian Szegedy in 2015. The normalization process involves
calculating the mean and variance of each feature in a mini-batch and then scaling and shifting
the features using these statistics. This ensures that the input to each layer remains roughly in
the same distribution, regardless of changes in the distribution of earlier layers' outputs.
Consequently, Batch Normalization helps in stabilizing the training process, enabling higher
learning rates and faster convergence.
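A minimal sketch of the computation described above, assuming PyTorch; the batch size, feature dimension, and initial values of the learnable scale and shift are illustrative:

import torch

x = torch.randn(32, 64)                       # a mini-batch of 64-dimensional activations
gamma = torch.ones(64)                        # learnable scale
beta = torch.zeros(64)                        # learnable shift
eps = 1e-5

mean = x.mean(dim=0)                          # per-feature mean over the batch
var = x.var(dim=0, unbiased=False)            # per-feature variance over the batch
x_hat = (x - mean) / torch.sqrt(var + eps)    # normalize each feature
y = gamma * x_hat + beta                      # scale and shift

# The built-in layer performs the same computation and also tracks running
# statistics for use at inference time:
# bn = torch.nn.BatchNorm1d(64); y = bn(x)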

Batch normalization is a deep learning approach that has been shown to significantly improve
the efficiency and reliability of neural network models. It is particularly useful for training very
deep networks, as it can help to reduce the internal covariate shift that can occur during
training.

Batch normalization is a technique for normalizing the interlayer outputs of a neural network. As a result, the next layer receives a “reset” of the output distribution from the preceding layer, allowing it to analyze the data more effectively.

The term “internal covariate shift” is used to describe the effect that updating the parameters
of the layers above it has on the distribution of inputs to the current layer during deep learning
training. This can make the optimization process more difficult and can slow down the
convergence of the model.
