Deep Learning

The document provides an overview of deep learning and neural networks, detailing their structure, types, and functions such as forward and backward propagation. It discusses various activation functions, loss functions, optimizers, and regularization methods, highlighting their importance in training neural networks. Additionally, it addresses the limitations of artificial neural networks, including computational complexity and the need for large datasets.


DEEP LEARNING VS MACHINE LEARNING:

DEEP LEARNING?
NEURAL NETWORKS?
A neural network is a computational model inspired by the human brain
that is used in machine learning and artificial intelligence. It consists of
layers of interconnected neurons (also called nodes), which process and
transmit information.

Basic Structure of a Neural Network:

1. Input Layer – Takes in data.
2. Hidden Layers – Perform computations using weights, biases, and activation functions.
3. Output Layer – Produces the final result.

Neural networks learn by adjusting weights through a process called backpropagation, using optimization techniques like gradient descent. They are widely used in tasks like image recognition, natural language processing, and predictive analytics.

ANN (artificial neural network): the basic feedforward neural network
CNN (convolutional neural network): image classification, object detection
RNN (recurrent neural network): speech/text/audio, sequential data
GAN (generative adversarial network): generating text/images

ANN >>
Artificial Neural Networks contain artificial neurons which are called units.
Artificial Neural Network has an input layer, an output layer as well as
hidden layers. The input layer receives data from the outside world which
the neural network needs to analyze or learn about. Then this data passes
through one or multiple hidden layers that transform the input into data that
is valuable for the output layer. Finally, the output layer produces the network's response to the input data provided. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.

STRUCTURE OF ANN

PERCEPTRON :
A perceptron is a neural network unit that does a precise computation to
detect features in the input data. Perceptron is mainly used to classify the
data into two parts. Therefore, it is also known as Linear Binary Classifier.
Perceptron uses the step function as its activation function. The activation function is used to map the output into the required range, such as (0, 1) or (-1, 1).
o Input value or One input layer: The input layer of the perceptron is made
of artificial input neurons and takes the initial data into the system for
further processing.
o Weights and Bias:
Weight: It represents the strength of the connection between units. If the weight from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional parameter whose task is to shift the output along with the weighted sum of the inputs to the next neuron.
o Net sum: It calculates the weighted sum of the inputs plus the bias.
o Activation Function: Whether a neuron is activated or not is determined by the activation function, which is applied to the net sum to produce the result (see the sketch below).
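A minimal NumPy sketch of a single perceptron with a step activation; the weights and bias below (an AND-gate configuration) are illustrative values, not taken from the notes:

```python
import numpy as np

def step(z):
    # Step activation: fires 1 if the net sum is non-negative, else 0
    return np.where(z >= 0, 1, 0)

def perceptron(x, w, b):
    # Net sum: weighted sum of inputs plus bias, then the step activation
    return step(np.dot(w, x) + b)

# Illustrative weights/bias that make the perceptron behave like a logical AND gate
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
```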
LIMITATIONS OF ANN:
1. Computational Complexity

● ANNs require high computational power, especially for deep networks.
● Training large models is time-consuming and demands GPUs/TPUs.

2. Large Training Data Requirement

● ANNs need a large amount of labeled data for effective learning.
● Insufficient data can lead to overfitting or poor generalization.

3. Black Box Nature

● Neural networks work like a black box: it is difficult to interpret how decisions are made.
● Lack of explainability makes them hard to debug or trust in critical applications.

4. Overfitting and Underfitting

● Overfitting: the ANN learns too much from the training data and fails on new data.
● Underfitting: the ANN fails to capture complex patterns due to insufficient training.

5. Local Minima in Optimization


● The training process relies on gradient descent, which may get stuck in local minima, affecting model performance.

6. Hyperparameter Sensitivity

● ANN performance depends on choosing optimal hyperparameters (learning rate, number of layers, neurons, etc.).
● Tuning hyperparameters is complex and time-consuming.

7. Lack of Standardization

● No single architecture fits all problems.
● Model design and tuning require trial-and-error approaches.

8. High Energy Consumption

● Training deep ANNs consumes a lot of electricity, making them less sustainable.

9. Poor Performance on Small Datasets

● Unlike traditional machine learning models, ANNs struggle when data is limited.
● Small datasets can lead to poor generalization and unreliable predictions.

10. Sensitivity to Noisy Data

● ANNs can misinterpret noisy or irrelevant data, leading to incorrect predictions.
● Data preprocessing and feature engineering are crucial to avoid this issue.

FORWARD PROPAGATION :
Forward propagation (or forward pass) refers to the calculation and storage of
intermediate variables (including outputs) for a neural network in order from the input
layer to the output layer.
The forward pass applies the same computation at every layer as a normal neural network: weights, bias, and an activation function applied to the previous layer's output (see the sketch below).
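A small NumPy sketch of a forward pass that stores the intermediate variables in order from the input layer to the output layer; the layer sizes and the choice of sigmoid here are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs, 4 hidden units, 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    # Store every intermediate value; backpropagation reuses them
    z1 = W1 @ x + b1        # hidden pre-activation
    a1 = sigmoid(z1)        # hidden activation
    z2 = W2 @ a1 + b2       # output pre-activation
    y_hat = sigmoid(z2)     # network output
    return {"z1": z1, "a1": a1, "z2": z2, "y_hat": y_hat}

print(forward(np.array([0.5, -1.0, 2.0]))["y_hat"])
```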

BACKWARD PROPAGATION :
Backpropagation (Backward Propagation of Errors) is a supervised learning
algorithm used to train artificial neural networks. It minimizes the error by
adjusting weights and biases through gradient descent.
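A hedged sketch of one backpropagation step for a single sigmoid neuron trained with squared-error loss and gradient descent; the input, target, initial weights, and learning rate are made-up numbers for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: one input vector, one target value
x = np.array([0.5, 1.0])
t = 1.0                      # target output
w = np.array([0.2, -0.4])    # initial weights
b = 0.1                      # initial bias
lr = 0.5                     # learning rate

# Forward pass
z = w @ x + b
y = sigmoid(z)
loss = 0.5 * (y - t) ** 2

# Backward pass: chain rule, dL/dw = (y - t) * y * (1 - y) * x
delta = (y - t) * y * (1 - y)
grad_w = delta * x
grad_b = delta

# Gradient-descent update of weights and bias
w -= lr * grad_w
b -= lr * grad_b
print("loss:", loss, "updated w:", w, "updated b:", b)
```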

NUMERICALS

ACTIVATION FUNCTION :
If we do not use this function, our neural network will not be able to capture non-linear patterns in the data; it will only capture linear relationships.
An activation function determines the output of a neuron in a neural network
by adding non-linearity, enabling the network to learn complex patterns from
the data.
Types of Activation Functions

1. Linear Activation Function

The linear activation function is a straight line defined by A(x) = x.
The range of the output spans from (−∞, +∞).

2. Non-Linear Activation Functions

(i) Sigmoid Function

The output ranges between 0 and 1, hence useful for binary classification.

(ii) Tanh Activation Function

Outputs values from -1 to +1.

(iii) ReLU (Rectified Linear Unit) Function

It is defined by A(x) = max(0, x): if the input x is positive, ReLU returns x; if the input is negative, it returns 0.
Value range: [0, ∞), meaning the function only outputs non-negative values.
ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.

3. Exponential Linear Units

(i) Softmax Function

to handle multi-class classification problems.

It transforms raw output scores from a neural network into probabilities. It works by squashing the output values of each class into the range of 0 to 1, while ensuring that the sum of all probabilities equals 1.
Softmax is a non-linear activation function.
ReLU outputs positive values directly
and zero for negatives, while Tanh maps inputs between -1 and 1.
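A short NumPy sketch of the activation functions listed above, so their output ranges can be checked directly:

```python
import numpy as np

def linear(x):   return x                          # range (-inf, +inf)
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)
def tanh(x):     return np.tanh(x)                 # range (-1, 1)
def relu(x):     return np.maximum(0.0, x)         # range [0, inf)

def softmax(x):
    # Subtract the max for numerical stability; outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.5])
print("sigmoid:", sigmoid(x))
print("tanh:   ", tanh(x))
print("relu:   ", relu(x))
print("softmax:", softmax(x), "sum =", softmax(x).sum())
```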
LOSS FUNCTION :
Loss quantifies how well a model performs during a training phase.
Loss functions measure the difference between predicted and actual values.
Mean absolute error (MAE)​
Also known as L1 loss, this loss function calculates the average absolute
difference between predicted and actual values.

Mean squared error (MSE)
Also known as L2 loss, this loss function calculates the average of the squared differences between predicted and actual values.

Huber loss (combination of MSE and MAE)
Also known as smooth L1 loss, this loss function combines the strengths of MAE and MSE.

Hinge loss​
This loss function works well for classification problems when target values
are in the set of {-1,1}.
Binary cross entropy​
This loss function measures the difference between predicted binary
outcomes and actual values.

Categorical cross entropy
This loss function measures the difference between predicted and actual output in categorical form.
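A minimal NumPy sketch of a few of the loss functions above (MAE, MSE, Huber, binary cross-entropy); the sample predictions are made-up values for illustration:

```python
import numpy as np

def mae(y_true, y_pred):
    # L1 loss: average absolute difference
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # L2 loss: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def huber(y_true, y_pred, delta=1.0):
    # Quadratic near zero (like MSE), linear for large errors (like MAE)
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.mean(np.where(small, 0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip predicted probabilities to avoid log(0)
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mae(y_true, y_pred), mse(y_true, y_pred),
      huber(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```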

OPTIMISER:
An optimizer improves the speed of training and minimizes the loss function, e.g., via gradient descent. There are three types of gradient descent (see the sketch after this list):
Batch: the entire dataset is used for each parameter update.
Stochastic: the parameters are updated after every single data point.
Mini-batch: the parameters are updated after each batch of a fixed batch size.
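A minimal sketch of the three gradient descent variants expressed as one update loop; `linreg_grad` is a hypothetical gradient function used only to make the demo runnable:

```python
import numpy as np

def gradient_descent(X, y, params, grad_fn, lr=0.01, batch_size=None, epochs=200):
    # batch_size=None -> batch GD (whole dataset per update)
    # batch_size=1    -> stochastic GD (one sample per update)
    # batch_size=k    -> mini-batch GD (k samples per update)
    n = len(X)
    size = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = np.random.permutation(n)       # shuffle each epoch
        for start in range(0, n, size):
            batch = idx[start:start + size]
            params -= lr * grad_fn(X[batch], y[batch], params)
    return params

# Hypothetical gradient for linear regression, used only to demo the loop
def linreg_grad(Xb, yb, w):
    return 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)

X = np.random.default_rng(0).normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(gradient_descent(X, y, np.zeros(3), linreg_grad, batch_size=16))
```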

Challenges with these were:

● Deciding the learning rate was difficult.
● Learning rate schedulers were introduced to solve this, but the schedule has to be predefined before training even starts.
● Finding the minimum in a multidimensional loss surface is tough.
● There is little chance of escaping a local minimum.

SGD: Stochastic Gradient Descent (SGD) is a basic optimization algorithm used to minimize the loss function by updating model parameters based on the gradient of the loss function. Unlike Batch Gradient Descent, which uses the entire dataset, SGD updates the parameters using a single data point per iteration, making it more efficient for large datasets.
ADAM: Adam (Adaptive Moment Estimation) is an optimization algorithm that combines the advantages of SGD (Stochastic Gradient Descent) with Momentum and RMSprop. It is widely used in deep learning because it adapts the learning rate for each parameter dynamically, leading to faster and more stable convergence.

● Merges the momentum and learning-rate-decay concepts.
● Formula: maintains exponentially weighted averages of the gradient (first moment) and of the squared gradient (second moment); see the sketch below.
● Bias correction: both moment estimates are corrected because they are initialized at zero.
● Typical learning rate: around 0.001 to 0.01.
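A hedged sketch of the Adam update with bias correction; the defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and the toy objective f(w) = w² are standard/illustrative assumptions, not taken from the notes:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment (momentum) and second moment (RMSprop-style) estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with a per-parameter adaptive step size
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(w) = w^2 starting from w = 5
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print("w after 500 Adam steps:", w)
```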

Regularization methods :
It is a technique used to reduce errors by fitting the function appropriately
on the given training set and avoiding overfitting.
L1: (Lasso) Least absolute shrinkage and selection operator. It adds the absolute value of the magnitude of the coefficients as a penalty term to the loss function. It also helps achieve feature selection by driving the weights of features that do not serve any purpose in the model toward zero.
Best for: Sparse models where feature selection is important.

L2: (Ridge) It adds the squared magnitude of the coefficients as a penalty term to the loss function. It encourages small weight values instead of zeroing them out and helps reduce model complexity without eliminating features.
Best for: Preventing overfitting in deep networks.
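A minimal sketch of how L1 and L2 penalty terms are added to a base loss; the coefficient values are illustrative:

```python
import numpy as np

def regularized_loss(base_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    # L1 (Lasso): sum of absolute weights -> pushes some weights toward exactly zero
    l1_penalty = l1_lambda * np.sum(np.abs(weights))
    # L2 (Ridge): sum of squared weights -> keeps weights small but non-zero
    l2_penalty = l2_lambda * np.sum(weights ** 2)
    return base_loss + l1_penalty + l2_penalty

w = np.array([0.5, -1.2, 0.0, 3.0])
print(regularized_loss(0.8, w, l1_lambda=0.01))   # base loss plus an L1 penalty
print(regularized_loss(0.8, w, l2_lambda=0.01))   # base loss plus an L2 penalty
```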

Dropout: Dropout is a regularization technique that randomly drops neurons during training to prevent overfitting.

During Training:
● Each neuron is randomly deactivated (set to 0) with probability p (e.g., p = 0.2 means 20% of neurons are dropped).
● The remaining active neurons are scaled up by 1/(1 − p) to maintain the overall scale of activations.

During Testing:

● No neurons are dropped.
● The full network is used, ensuring that learned representations are stable (see the sketch below).
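A minimal NumPy sketch of (inverted) dropout as described above: neurons are dropped with probability p and the survivors are scaled by 1/(1 − p) during training, while nothing is dropped at test time:

```python
import numpy as np

def dropout(activations, p=0.2, training=True):
    # p is the probability of dropping a neuron (e.g. 0.2 -> 20% dropped)
    if not training:
        return activations          # at test time the full network is used
    mask = (np.random.rand(*activations.shape) >= p).astype(activations.dtype)
    # Inverted dropout: scale the survivors by 1/(1-p) so the expected
    # activation scale matches test time
    return activations * mask / (1.0 - p)

a = np.ones(10)
print(dropout(a, p=0.2, training=True))    # some entries zeroed, rest scaled to 1.25
print(dropout(a, p=0.2, training=False))   # unchanged at test time
```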

When to Use Dropout?

✅ Deep Neural Networks (DNNs) with many parameters.
✅ Overfitting is observed (training accuracy >> validation accuracy).
✅ Fully connected layers in CNNs (not always in convolutional layers).

❌ When NOT to use Dropout?
❌ If the dataset is small, dropout may remove too much information.
❌ Not usually needed with Batch Normalization, as BN already stabilizes activations.

Batch Normalisation :

Batch Normalization (BatchNorm) normalizes activations (between hidden layers) in a neural network to improve training speed and stability. It reduces internal covariate shift, allowing for higher learning rates and faster convergence. The main idea is to normalize the input of each layer across a mini-batch of data, which reduces the covariate shift between the layers.

● It speeds up training.
● It decreases the importance of the initial weights.
● It regularizes the model (see the sketch below).
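A minimal NumPy sketch of batch normalization over a mini-batch (training-time behaviour only; the running statistics used at inference are omitted). The learnable scale gamma and shift beta are set to 1 and 0 here for illustration:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalise each feature across the mini-batch (axis 0), then
    # rescale and shift with the learnable parameters gamma and beta
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4))
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(6))  # ~0 mean, ~1 std per feature
```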


Hyperparameter tuning :
Hyperparameters are external settings in a neural network (not learned from
data) that affect model performance. Hyperparameter tuning is the process of
finding the best combination of these values to optimize performance.

1. Learning Rate (η)

● Controls the step size of weight updates during training.
● Too high → Convergence is unstable.
● Too low → Training is slow.

Tuning Strategy:

● Start with 0.001 for Adam, 0.01 for SGD.
● Use a learning rate schedule (e.g., decay over time).

2. Batch Size

● Number of training samples processed before updating weights.
● Small batch size (e.g., 32) → Noisy but better generalization.
● Large batch size (e.g., 512) → Faster but may overfit.
Tuning Strategy:

● Use 32, 64, or 128 (common choices).
● Try small batches for better generalization.

3. Number of Epochs

● Too few epochs → Underfitting.
● Too many epochs → Overfitting.

Tuning Strategy:

●​ Use early stopping to stop when validation loss stops improving.
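A minimal sketch of the early-stopping rule: stop once the validation loss has not improved for a given number of epochs (the patience value and the loss sequence below are illustrative):

```python
def early_stopping(val_losses, patience=3):
    # Return the epoch index at which training should stop, given the
    # per-epoch validation losses (illustrative helper, not a full trainer)
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0      # improvement: reset the patience counter
        else:
            waited += 1                 # no improvement this epoch
        if waited >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss improves, then plateaus -> stop 3 epochs after the best one
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.7, 0.72]
print(early_stopping(losses, patience=3))  # -> 6
```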

4. Optimizer (SGD vs. Adam)

● SGD → Works well for large datasets, better generalization.
● Adam → Faster convergence, good for deep networks.

Tuning Strategy:

● Default: Adam (start with it).
● Try SGD with momentum if generalization is poor.

5. Number of Hidden Layers and Neurons

● Too few layers/neurons → Underfitting.
● Too many layers/neurons → Overfitting, slow training.

Tuning Strategy:

● Start with 2-3 hidden layers (for most tasks).
● Use powers of 2 for the number of neurons (e.g., 64, 128, 256).
● Use dropout & regularization to prevent overfitting.
6. Activation Functions

● ReLU → Best for hidden layers.
● Sigmoid/Tanh → Only for the output layer in binary tasks.
● Softmax → For multi-class classification.
