
LecML-3: Neural Networks

The document outlines a course on Data Science, specifically focusing on Neural Networks, taught by Dr. Devesh Bhimsaria at IIT Roorkee. It covers the origins, architectures, learning processes, and common issues associated with neural networks, including problems like vanishing and exploding gradients, slow convergence, overfitting, and underfitting. Solutions to these challenges are also discussed, emphasizing the importance of proper initialization, choice of activation functions, and optimization techniques.


Data Science

DAI-101 Spring 2024-25

Dr. Devesh Bhimsaria


Office: F9, Old Building
Department of Biosciences and Bioengineering
Indian Institute of Technology–Roorkee
[email protected]
Neural Network



Slide credit: wiki
Neural Network
• Origins: Algorithms that try to mimic the brain.
• Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
• Recent resurgence: State-of-the-art technique for many applications.
• Artificial neural networks are not nearly as complex or intricate as the actual brain structure.



Slide credit: Andrew Ng
Neural Network



Slide credit: Eric Eaton
Neuron Model: Logistic Unit

Do we really need an activation function??



Slide credit: Eric Eaton
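
The slide asks whether an activation function is really needed. As a hedged illustration (the weights, bias, and input below are made up, not taken from the slide), this minimal NumPy sketch implements a single logistic unit, ŷ = σ(wᵀx + b), and shows that without the nonlinearity a stack of layers collapses into a single linear map:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    # One neuron: weighted sum of the inputs followed by the sigmoid.
    return sigmoid(np.dot(w, x) + b)

# Hypothetical weights and input, purely for illustration.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2
print(logistic_unit(x, w, b))          # a value in (0, 1)

# Why the nonlinearity matters: two purely linear layers
# W2 @ (W1 @ x) equal one linear layer (W2 @ W1) @ x,
# so without an activation, depth adds no expressive power.
W1 = np.random.randn(4, 3)
W2 = np.random.randn(2, 4)
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True
```
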
Neural Network



Feed-Forward Process



Slide credit: Eric Eaton
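
A minimal sketch of the feed-forward process for a fully connected network, assuming sigmoid activations and a hypothetical 3-4-2 architecture (layer sizes and parameters are illustrative, not taken from the slide's figure): each layer computes a⁽ˡ⁺¹⁾ = σ(W⁽ˡ⁾a⁽ˡ⁾ + b⁽ˡ⁾).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    """Propagate an input through each layer in turn."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # z = W a + b, then the activation
    return a

# Hypothetical 3-4-2 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases  = [np.zeros(4), np.zeros(2)]

x = np.array([1.0, 0.5, -0.5])
print(feed_forward(x, weights, biases))  # two output activations
```
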
Other Network Architectures



Slide credit: Eric Eaton
Multiple Output Units: One-vs-Rest



Neural Network Classification

Equivalent to the dimensions of 𝑥



Slide credit: Andrew Ng
Neural Network Examples
Representing Boolean Functions



Slide credit: Eric Eaton
Representing Boolean Functions



Slide credit: Eric Eaton
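
The standard example behind these slides is that a single logistic unit with large weights can act as a Boolean gate; the parameters below (bias −30 with weights 20, 20 for AND, bias −10 for OR) are the usual illustrative choice and may differ from the slide's figures.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gate(x1, x2, b, w1, w2):
    # A single neuron acting on two binary inputs.
    return sigmoid(b + w1 * x1 + w2 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = logistic_gate(x1, x2, b=-30, w1=20, w2=20)  # ~ x1 AND x2
        or_out  = logistic_gate(x1, x2, b=-10, w1=20, w2=20)  # ~ x1 OR x2
        print(x1, x2, round(and_out, 3), round(or_out, 3))
```

XOR/XNOR, by contrast, is not linearly separable and cannot be computed by a single such unit, which is presumably why the next slide combines these units through a hidden layer.
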
Combining Representations to Create Non-Linear Functions



Slide credit: Eric Eaton
Layering Representations



Slide credit: Eric Eaton
Layering Representations



Slide credit: Eric Eaton
Neural Network Learning
Perceptron Learning Rule



Slide credit: Eric Eaton
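
The slide's figure is not reproduced here; as a hedged sketch of the perceptron learning rule, the update is w ← w + η (y − ŷ) x, applied whenever a prediction is wrong. The toy data set below is hypothetical.

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Classic perceptron rule: w += lr * (y - y_hat) * x, with the bias folded in."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = 1 if xi @ w >= 0 else 0     # threshold activation
            w += lr * (yi - y_hat) * xi         # update only on mistakes
    return w

# Hypothetical linearly separable toy data: class 1 roughly when x1 + x2 > 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1])
w = perceptron_train(X, y)
print(w)   # learned weights (last entry is the bias)
```
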
Neural Network



Learning in NN: Backpropagation

• Using mean squared error (as in linear regression) is not ideal for logistic regression, because the resulting cost is non-convex and converges poorly.
• The logarithm in the cost comes from the maximum-likelihood criterion (a short numerical comparison follows this slide).


Slide credit: Eric Eaton
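
A hedged numerical sketch of the point above (variable names are mine, not the slide's): the cross-entropy cost that maximum likelihood yields for a sigmoid output, J = −(1/m) Σ [y log ŷ + (1 − y) log(1 − ŷ)], penalizes confident mistakes far more strongly than squared error, and, unlike MSE composed with a sigmoid, is convex for a single logistic unit.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # Negative log-likelihood of Bernoulli labels under predictions y_hat.
    y_hat = np.clip(y_hat, eps, 1 - eps)        # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Confidently wrong predictions are penalised much more by cross-entropy.
y     = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.01, 0.99, 0.9])   # the first two are badly wrong
print("cross-entropy:", cross_entropy(y, y_hat))   # ~3.1
print("mse          :", mse(y, y_hat))             # ~0.66
```
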
Cost Function



Slide credit: Eric Eaton
Forward Propagation

Do we really need an activation function??



Slide credit: Eric Eaton
Backpropagation Intuition



Slide credit: Eric Eaton
Backpropagation Intuition



Slide credit: Eric Eaton
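
Since the intuition slides are mostly figures, here is a minimal, hedged sketch of one backpropagation step for a single hidden layer with sigmoid units and cross-entropy loss (the layer sizes, names, and learning rate are my own choices, not the slides'): the output error δ₂ = ŷ − y is pushed back through the weights as δ₁ = (W₂ᵀ δ₂) ⊙ a₁ ⊙ (1 − a₁).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    # Forward pass
    a1 = sigmoid(W1 @ x + b1)          # hidden activations
    y_hat = sigmoid(W2 @ a1 + b2)      # output activation

    # Backward pass (cross-entropy loss with a sigmoid output)
    delta2 = y_hat - y                           # error at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # error pushed back to the hidden layer

    # Gradient-descent updates (in place)
    W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1
    return y_hat

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x, y = np.array([0.2, 0.7]), np.array([1.0])

for _ in range(100):
    y_hat = backprop_step(x, y, W1, b1, W2, b2)
print(y_hat)   # moves toward the target of 1.0
```
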
Backpropagation Issues
• “Backprop is the cockroach of machine learning. It’s ugly, and annoying, but you just can’t get rid of it.” - Geoff Hinton
• Problems
• Vanishing Gradient Problem
• Exploding Gradient Problem
• Slow Convergence
• Getting Stuck in Local Minima
• Overfitting (High Variance)
• Underfitting (High Bias)



Vanishing Gradient Problem
What?
• In deep networks, gradients become very small as they propagate backward.
• Weight updates shrink, causing earlier layers to learn very slowly or not at all.
• This leads to slow convergence, or the network stalling entirely.

Why?
• Activation functions like sigmoid and tanh squash values into small ranges.
• Their derivatives are also small (<= 0.25 for sigmoid), leading to small gradients.
• This shrinks the gradient exponentially as it moves backward.

Solutions:
• Use ReLU (Rectified Linear Unit), which has a gradient of 1 for positive inputs (compared numerically with the sigmoid in the sketch after this slide).
• Use Batch Normalization to stabilize activations.
• Use better weight initialization: Xavier initialization (for sigmoid/tanh) keeps the variance of outputs similar across layers; He initialization (for ReLU) accounts for ReLU's non-zero mean and asymmetry.



Slide credit: Online
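
A small numerical sketch of the shrinkage described above (the depth of 20 is chosen arbitrarily): backpropagating through many sigmoid layers multiplies local derivatives of at most 0.25, while ReLU contributes a derivative of 1 on its active side.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth = 20
z = 0.0                                      # pre-activation where the sigmoid slope is largest
sig_grad  = sigmoid(z) * (1 - sigmoid(z))    # = 0.25, the sigmoid's maximum slope
relu_grad = 1.0                              # ReLU derivative for positive inputs

# Backpropagating through `depth` layers multiplies these local slopes together
# (weight factors are ignored here for simplicity).
print("sigmoid chain:", sig_grad ** depth)   # ~9.1e-13: effectively vanished
print("relu chain   :", relu_grad ** depth)  # 1.0: the signal is preserved
```
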
Exploding Gradient Problem
What?
• Gradients become too large and cause unstable training.
• Weight updates explode, leading to NaN or large numbers.

Why?
• Happens in very deep networks with large weight updates.
• Poor weight initialization or using large learning rates.

Solutions:
• Use Gradient Clipping (cap gradients at a threshold; a short sketch follows this slide).
• Use Xavier/He weight initialization.
• Reduce the learning rate.
Slide credit: Online
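
A minimal sketch of gradient clipping by global norm, one common way to implement the "cap gradients at a threshold" idea above (the threshold and gradient values are hypothetical):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is <= max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Hypothetical exploding gradients for two parameter tensors.
grads = [np.array([300.0, -400.0]), np.array([[1200.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
print("norm before:", norm_before)                                   # 1300.0
print("norm after :", np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # 5.0
```
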
Slow Convergence
What?
• Training takes too long due to small weight updates.
• Learning stagnates, especially in deep networks.

Why?


• Poor weight initialization.
• Poor choice of activation function (e.g., sigmoid in deep networks).
• Inefficient optimizer (e.g., using plain SGD instead of adaptive optimizers).
Note: Stochastic GD uses just one random data point (or mini-batch) per update instead of computing the gradient over the entire dataset.

Solutions:
• Use optimizers like Adam, RMSprop, or Momentum (a momentum-vs-plain-SGD sketch follows this slide).
• Use a learning rate scheduler (reduce the learning rate when needed).
• Use Batch Normalization to speed up training.



Slide credit: Online
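
As a hedged illustration of why momentum-style optimizers converge faster than plain SGD, the sketch below compares the two on a deliberately ill-conditioned quadratic (the function, learning rate, and β are made up for illustration; Adam and RMSprop add per-parameter adaptive scaling on top of similar ideas).

```python
import numpy as np

# Ill-conditioned quadratic: J(w) = 0.5 * (100*w1**2 + w2**2).
grad = lambda w: np.array([100.0 * w[0], 1.0 * w[1]])

def sgd(w, lr=0.009, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def sgd_momentum(w, lr=0.009, beta=0.9, steps=200):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)   # accumulate a velocity
        w = w - lr * v
    return w

w0 = np.array([1.0, 1.0])
print("plain SGD     :", sgd(w0))           # the w2 coordinate has only decayed to ~0.16
print("SGD + momentum:", sgd_momentum(w0))  # both coordinates are now close to zero
```
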
Getting Stuck in Local Minima
Why?
• Non-convex loss functions (common in deep networks).
• Poor initialization of weights.
• Using SGD without momentum makes it harder to escape.

Solutions:
• Use optimizers like Adam, Momentum, or RMSprop, which
help escape local minima.
• Increase network capacity (more neurons/layers) to create
a smoother loss surface.
• Train longer or use learning rate annealing (manually reducing the learning rate over time).



Slide credit: Online
Overfitting (High Variance)
What?
• Performs well on training data but poorly on test data.

Why?
• Too many parameters compared to the training data.
• No regularization used (e.g., dropout, L2 regularization).
• Too many training epochs.

Solutions:
• Use Dropout (randomly disable neurons during training).
• Use L2 Regularization (Weight Decay) to prevent large weights; both techniques are sketched after this slide.
• Increase training data or use data augmentation.



Slide credit: Online
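
A hedged, framework-free sketch of the two regularizers named above (the keep probability and decay constant are arbitrary): inverted dropout zeroes a random subset of activations during training and rescales the rest, while L2 regularization (weight decay) adds (λ/2)‖w‖² to the cost, so every update also shrinks the weights toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, keep_prob=0.8, training=True):
    """Inverted dropout: zero a fraction of activations and rescale the survivors."""
    if not training:
        return a                      # no dropout at test time
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob       # rescale so the expected activation is unchanged

def l2_weight_update(w, grad_w, lr=0.1, lam=0.01):
    """Gradient step on J + (lam/2)*||w||^2, i.e. the gradient plus lam*w (weight decay)."""
    return w - lr * (grad_w + lam * w)

a = np.ones(10)
print(dropout(a))                     # dropped entries become 0, survivors become 1/0.8 = 1.25

w, grad_w = np.array([2.0, -3.0]), np.array([0.5, 0.5])
print(l2_weight_update(w, grad_w))    # the lam*w term pulls each weight toward zero on top of the data gradient
```
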
Underfitting (High Bias)
What?
• The model is too simple and can’t capture the data’s complexity.
• Both training and test performance are poor.

Why?
• The network is too shallow.
• Not enough neurons in hidden layers.
• Regularization that is too strong, making the model too restrictive.

Solutions:
• Increase the number of layers or neurons in hidden layers.
• Reduce regularization strength (e.g., lower L2 penalty).
• Train for more epochs.



Slide credit: Online
Thank You
• Thanks to those who made their material available online. I have tried to acknowledge the person or website from which material was taken.
• All my slides/notes excluding third-party material are licensed by various authors including myself under https://creativecommons.org/licenses/by-nc/4.0/

