ANN-Unit 7 - Parameter Tuning & Normalization

The document outlines topics related to deep neural networks including hyperparameter tuning, batch normalization, mini-batches, regularization, softmax, and orthogonalization. It discusses exploring hyperparameters randomly from coarse to fine levels and exponential parameter tuning. Batch normalization is explained as normalizing data before activation layers to speed up training and improve model accuracy. Mini-batches and gradient descent implementation with batch normalization are also covered. Softmax regression and its use of the softmax activation function for classification problems are described.


Applied Neural Networks
Unit – 7
Dr. Muhammad Usman Arif
1/1/2024

Lecture Outline
▪ Deep Neural Networks
▪ Hyper-parameter Tuning
▪ Batch Normalization
▪ Mini-Batches
▪ Regularization
▪ Softmax
▪ Orthogonalization


Hyper-parameter Tuning
▪ α (learning rate)
▪ β (momentum)
▪ β1, β2, ε (Adam)
▪ # hidden layers
▪ # hidden units
▪ Learning rates
▪ Mini-batch size
▪ Activation functions
▪ …


Hyper-parameter Tuning
▪ Don't use a grid of preset values
▪ Explore the hyper-parameter space randomly
▪ Coarse to fine: after a coarse random search, zoom in on the best-performing region and sample more densely (see the sketch after this list)
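
A minimal sketch of this coarse random search in NumPy; the parameter ranges and the number of trials are illustrative assumptions, not values from the slides:

import numpy as np

trials = []
for _ in range(25):                              # coarse pass: 25 random combinations
    alpha = 10 ** (-4 * np.random.rand())        # learning rate, sampled on a log scale
    n_hidden = np.random.randint(10, 200)        # number of hidden units, sampled uniformly
    trials.append((alpha, n_hidden))             # in practice: train and record dev-set error

# Fine pass: resample more densely inside the region around the best trials.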


Exponential Parameter Tuning

▪ Choose the right scale: sample on a logarithmic scale rather than uniformly.
▪ r = -4 * np.random.rand(), so r ∈ [-4, 0]
▪ α = 10^r, so α ∈ (10^-4, 10^0)
▪ If β is to be explored between 0.9 and 0.999, then sample r ∈ [-3, -1] and set
  1 - β = 10^r, i.e. β = 1 - 10^r
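
The same sampling written out in NumPy (a minimal sketch; variable names are illustrative):

import numpy as np

# Learning rate alpha on a log scale: alpha in (10^-4, 10^0)
r = -4 * np.random.rand()          # r is uniform on [-4, 0]
alpha = 10 ** r

# Momentum-style beta between 0.9 and 0.999: sample 1 - beta on a log scale
r = np.random.uniform(-3, -1)      # r is uniform on [-3, -1]
beta = 1 - 10 ** r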


Panda vs. Caviar
▪ Panda approach: babysit a single model, nudging its hyper-parameters as training progresses (when compute is scarce).
▪ Caviar approach: train many models with different hyper-parameter settings in parallel and keep the best one.


Batch Normalization
▪ As we discussed before, normalizing the data is very important for a machine learning model.
▪ The batch normalization layer normalizes the data before the activation layer; it makes the model train faster and become more accurate.
▪ Using batch normalization speeds up model training, decreases the importance of the initial weights, regularizes the model a little, and makes it a little better (see the layer sketch below).
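
As one concrete illustration, in Keras (listed later under deep learning frameworks) a batch normalization layer is typically placed between a linear layer and its activation. This is a hedged sketch, not code from the slides, and the layer sizes are arbitrary:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, use_bias=False, input_shape=(20,)),  # linear step (bias dropped; BN adds its own shift)
    layers.BatchNormalization(),                           # normalize z before the activation
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])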


Batch Normalization Implementation

z_norm^(i) = (z^(i) - μ_B) / sqrt(σ² + ε)

z̃^(i) = γ · z_norm^(i) + β

where γ and β are learnable parameters, and μ_B and σ² are the mean and variance computed over the mini-batch.
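
A minimal NumPy sketch of these two equations for one layer's pre-activations (the function name, shapes, and eps value are illustrative assumptions):

import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    # z: (n_units, m) pre-activations for one mini-batch
    mu = z.mean(axis=1, keepdims=True)        # per-unit mean over the mini-batch
    var = z.var(axis=1, keepdims=True)        # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * z_norm + beta              # z_tilde: learnable scale and shift

# Example: 4 hidden units, mini-batch of 8 examples
z = np.random.randn(4, 8)
z_tilde = batch_norm_forward(z, gamma=np.ones((4, 1)), beta=np.zeros((4, 1)))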

Adding Batch Normalization to a Neural Network

X → Z^[1] → [Batch Norm: γ^[1], β^[1]] → z̃^[1] → a^[1] = g^[1](z̃^[1]) → (W^[2], b^[2]) → Z^[2] → [Batch Norm: γ^[2], β^[2]] → z̃^[2] → a^[2] → …


Working with Mini-Batches

The same pipeline is applied to every mini-batch X^{t}:

X^{1} → Z^[1] → [Batch Norm: γ^[1], β^[1]] → z̃^[1] → a^[1] = g^[1](z̃^[1]) → (W^[2], b^[2]) → Z^[2] → [Batch Norm: γ^[2], β^[2]] → z̃^[2] → a^[2] → …
X^{2} → Z^[1] → [Batch Norm] → z̃^[1] → a^[1] → Z^[2] → [Batch Norm] → z̃^[2] → a^[2] → …
X^{3} → …

Parameters: W^[l], b^[l], γ^[l] and β^[l].
Because batch norm subtracts the per-batch mean, the bias b^[l] is cancelled out (β^[l] takes over its role), so
z^[l] = W^[l] a^[l-1] + b^[l]   reduces to   z^[l] = W^[l] a^[l-1],
then z_norm^(i) is computed and z̃^(i) = γ z_norm^(i) + β.

Implementing Gradient Descent

for t = 1 … num_mini_batches:
    Compute forward propagation on X^{t}
    In each hidden layer, use batch norm to replace z^[l] with z̃^[l]
    Use backprop to compute dW^[l], dγ^[l] and dβ^[l] (b^[l] is dropped, as above)
    Update the parameters:
        W^[l] := W^[l] - α dW^[l]
        β^[l] := β^[l] - α dβ^[l]
        γ^[l] := γ^[l] - α dγ^[l]

Works with momentum, RMSprop and Adam as well. A sketch of the update step follows.
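
A minimal NumPy sketch of the update step only, assuming the gradients have already been produced by backprop (the dictionary keys, shapes, and learning rate are illustrative):

import numpy as np

def update_parameters(params, grads, alpha):
    # Plain gradient descent step for W[l], gamma[l], beta[l] (b[l] is dropped with batch norm)
    for key in params:
        params[key] -= alpha * grads[key]
    return params

# One hidden layer with 4 units fed by 3 inputs
params = {"W1": 0.01 * np.random.randn(4, 3),
          "gamma1": np.ones((4, 1)),
          "beta1": np.zeros((4, 1))}
grads = {"W1": np.random.randn(4, 3),        # stand-ins for dW1, dgamma1, dbeta1
         "gamma1": np.random.randn(4, 1),
         "beta1": np.random.randn(4, 1)}
params = update_parameters(params, grads, alpha=0.01)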

Learning on Shifting Input Distribution

▪ Covariate shift: the distribution of the inputs X changes over time, even though the underlying mapping X → Y stays the same, so a model trained on the old input distribution may need to be retrained.


Why this is a Problem with Neural Networks

[Diagram: a deep network with parameters W^[1], b^[1] … W^[4], b^[4]; the activations a_1^[2] … a_4^[2] of layer 2 feed layer 3.]

▪ From the perspective of a later layer, its inputs (the previous layer's activations) keep shifting as the earlier weights are updated during training, i.e. the hidden units experience covariate shift. Batch norm limits how much this distribution can move around, so each layer can learn somewhat more independently.


Batch Normalization as Regularization

▪ Each mini-batch is scaled by the mean/variance computed on just that mini-batch.
▪ This adds some noise to the values z^[l] within that mini-batch, so, similar to dropout, it adds some noise to each hidden layer's activations.
▪ This has a slight regularization effect.


Softmax Regression


Softmax Layer

z^[l] = W^[l] a^[l-1] + b^[l]

Activation function (for C = 4 classes):

t = e^(z^[l])   (element-wise)

a^[l] = t / Σ_{j=1}^{4} t_j ,   i.e.   a_i^[l] = t_i / Σ_{j=1}^{4} t_j

The softmax maps the 4×1 vector z^[l] to a 4×1 vector a^[l] of probabilities that sum to 1 (see the sketch below).
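
A minimal NumPy sketch of this activation; the input values are arbitrary and the max subtraction is a standard numerical-stability trick not shown on the slide:

import numpy as np

def softmax(z):
    # z: (C, 1) pre-activation vector for one example
    t = np.exp(z - np.max(z))   # element-wise exponential (shifted for stability)
    return t / np.sum(t)        # normalize so the outputs sum to 1

z = np.array([[5.0], [2.0], [-1.0], [3.0]])
a = softmax(z)                  # a vector of class probabilities
print(a.ravel(), a.sum())       # sums to 1.0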


Softmax Examples


Softmax Classifier

▪ Softmax vs. hardmax: a hardmax maps the largest element of z^[l] to 1 and all the other elements to 0, e.g. [1, 0, 0, 0]^T, whereas softmax produces a softer mapping to probabilities.
▪ Softmax regression generalizes logistic regression to C classes; with C = 2 it reduces to logistic regression. (A small comparison is sketched below.)
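
A small NumPy comparison of the two mappings on an arbitrary example vector (a sketch, not from the slides):

import numpy as np

z = np.array([5.0, 2.0, -1.0, 3.0])
t = np.exp(z - z.max())
softmax_out = t / t.sum()                    # ≈ [0.84, 0.04, 0.002, 0.11]
hardmax_out = (z == z.max()).astype(float)   # [1., 0., 0., 0.]
print(softmax_out, hardmax_out)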


Loss Function

Example (C = 4):   y = [0, 1, 0, 0]^T ,   ŷ = [0.3, 0.2, 0.1, 0.4]^T

Loss for one example:

    L(ŷ, y) = - Σ_{j=1}^{4} y_j log ŷ_j

Here only y_2 = 1, so L(ŷ, y) = -y_2 log ŷ_2 = -log ŷ_2.

Cost over the training set:

    J(w, b, …) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i))

Stacking labels and predictions column-wise gives 4 × m matrices:

    Y = [y^(1), y^(2), y^(3), …] =   [0   1    0   …]
                                     [1   0    0   …]
                                     [0   0    1   …]
                                     [0   0    0   …]

    Ŷ = [ŷ^(1), ŷ^(2), ŷ^(3), …] =   [0.3  0.5   0.2  …]
                                     [0.2  0.2   0.1  …]
                                     [0.1  0.15  0.6  …]
                                     [0.4  0.15  0.1  …]
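
A minimal NumPy sketch of this loss on the example above (the eps guard is a small addition to avoid log(0)):

import numpy as np

def cross_entropy_loss(y_hat, y, eps=1e-12):
    # y, y_hat: (C, m) one-hot labels and softmax outputs
    m = y.shape[1]
    return -np.sum(y * np.log(y_hat + eps)) / m

y = np.array([[0.0], [1.0], [0.0], [0.0]])
y_hat = np.array([[0.3], [0.2], [0.1], [0.4]])
print(cross_entropy_loss(y_hat, y))   # -log(0.2) ≈ 1.609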

Gradient Descent with Softmax

▪ Backpropagation:   z^[l] → a^[l] = ŷ → L(ŷ, y)

The gradient of the cost with respect to the output-layer pre-activations is simply

    dz^[l] = ∂J/∂z^[l] = ŷ - y
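
A short NumPy sketch of this backward step for one example; the softmax helper and the input values are illustrative:

import numpy as np

def softmax(z):
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

z = np.array([[5.0], [2.0], [-1.0], [3.0]])   # output-layer pre-activations
y = np.array([[0.0], [1.0], [0.0], [0.0]])    # one-hot label
y_hat = softmax(z)
dz = y_hat - y                                # dz[l] = y_hat - y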


Deep Learning Frameworks

▪ Caffe/Caffe2
▪ CNTK
▪ DL4J
▪ Keras
▪ Lasagne
▪ mxnet
▪ PaddlePaddle
▪ TensorFlow
▪ Theano
▪ Torch

Choosing a deep learning framework:
- Ease of programming (development and deployment)
- Running speed
- Truly open (open source with good governance)

ML Strategies to Improve the System

Ideas:
▪ Collect more data
▪ Collect more diverse training set
▪ Train algorithm longer with gradient descent
▪ Try Adam instead of gradient descent
▪ Try bigger network
▪ Try smaller network
▪ Try dropout
▪ Add L2 regularization
▪ Network architecture
▪ Activation functions
▪ # hidden units
▪ …


Orthogonalization
▪ Orthogonalization (or orthogonality) is a system design property that ensures that modifying an instruction or a component of an algorithm does not create or propagate side effects to other components of the system. It makes it easier to verify the algorithms independently of one another, and it reduces testing and development time.



Assumptions for Orthogonalization

▪ When a supervised learning system is designed, these are the 4 assumptions that need to be true and orthogonal:
1. Fit the training set well on the cost function
   If it doesn't fit well, using a bigger neural network or switching to a better optimization algorithm might help.
2. Fit the development set well on the cost function
   If it doesn't fit well, regularization or using a bigger training set might help.
3. Fit the test set well on the cost function
   If it doesn't fit well, using a bigger development set might help.
4. Perform well in the real world
   If it doesn't perform well, the dev/test set distribution is not set correctly or the cost function is not measuring the right thing.
