Intro to Neural Networks
Eng. Abdallah Bashir
Session topics:
1. Introduction to Neural Networks.
2. Neural Networks Basics.
3. Shallow neural networks.
4. Deep Neural Networks.
1. Introduction to Neural Networks
1.1 What is a neuron?
The input is the size of the house (x).
The output is the price (y).
• It is a linear regression problem because the price as a function of size is a continuous output.
• We know prices can never be negative, so we apply a function called the Rectified Linear Unit (ReLU), which starts at zero.
• Single neuron = linear regression (a small sketch follows).
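As a rough illustration, a single "neuron" for this example can be written as linear regression followed by ReLU. The weight w and bias b below are made-up values, not from the slides:

import numpy as np

def single_neuron_price(size, w, b):
    # A single "neuron": linear regression followed by ReLU,
    # so the predicted price can never go negative.
    return np.maximum(0, w * size + b)

# Example with hypothetical parameters:
print(single_neuron_price(size=2000, w=0.15, b=-50))  # -> 250.0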
1.2 Neural Network Architecture
• The price of a house can be affected by
other features such as size, number of
bedrooms, zip code and wealth.
• The role of the neural network is to predict the price, and it will automatically generate the hidden units. We only need to give it the inputs x and the output y.
A NN consists of an input layer, hidden layers, and an output layer. Each input will be connected to the hidden layer, and the NN will decide the connections.
Supervised learning means we have the (X, Y) pairs and we need to learn the function that maps X to Y.
1.3 Supervised Learning with Neural Networks
There are different types of neural networks for supervised learning, including:
• Standard NN (useful for structured data)
• CNN, or convolutional neural network (useful in computer vision)
• RNN, or recurrent neural network (useful in speech recognition and NLP)
• Hybrid/custom NN, or a collection of NN types
1.4 Structured vs Unstructured Data
• Structured data is like databases and tables.
• Unstructured data is like images, video, audio, and text.
1.5 Why is deep learning taking off?
Deep learning is taking off for 3 reasons:
1. Data:
• For small amounts of data, a NN performs about as well as linear regression or an SVM (support vector machine).
• For big data, a small NN is better than an SVM.
• For big data, a big NN is better than a medium NN, which is better than a small NN.
2. Computation:
•GPUs.
•Powerful CPUs.
•Distributed computing.
3. Algorithm:
• Creative algorithms have appeared that changed the way NNs work (for example, the switch from sigmoid to ReLU activations made gradient descent much faster).
2. Neural Networks Basics
2.1 Binary Classification
In a binary classification problem, the
result is a discrete value output.
For example:
• account hacked (1) or not hacked (0)
• object is a cat (1) or not a cat (0)
Example: Cat vs Non-Cat
The goal is to train a classifier whose input is an image represented by a feature vector x, and which predicts whether the corresponding label y is 1 or 0; in this case, whether this is a cat image (1) or a non-cat image (0).
The value in each cell represents a pixel intensity, and these values are used to create a feature vector of dimension n. In pattern recognition and machine learning, a feature vector represents an object, in this case a cat or non-cat.
To create the feature vector x, the pixel intensity values are "unrolled" or "reshaped" for each color channel. The dimension of the input feature vector x is Nx = 64 x 64 x 3 = 12,288 (see the sketch below).
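A minimal numpy sketch of the unrolling step. The image here is random, just to show the shapes:

import numpy as np

# Hypothetical 64x64 RGB image with pixel intensities in [0, 255].
image = np.random.randint(0, 256, size=(64, 64, 3))

# "Unroll"/"reshape" the pixel intensities into a single feature vector x.
x = image.reshape(64 * 64 * 3, 1)
print(x.shape)  # (12288, 1)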
2.1.1 Neural Networks Notations
Here are some of the notations:
• m is the number of examples in the dataset.
• Nx is the size of the input vector.
• Ny is the size of the output vector.
• x(1) is the first input vector.
• y(1) is the first output vector.
• X = [x(1) x(2) ... x(m)]  (a matrix of shape (Nx, m); a stacking sketch follows this list)
• Y = [y(1) y(2) ... y(m)]  (a matrix of shape (Ny, m))
• L is the number of layers.
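A small sketch of how the m example vectors are stacked into the matrices X and Y. The sizes below are assumptions for illustration:

import numpy as np

Nx, m = 12288, 100  # assumed sizes for illustration
x_examples = [np.random.randn(Nx, 1) for _ in range(m)]
y_labels = [np.random.randint(0, 2) for _ in range(m)]

# Stack the m column vectors side by side: X has shape (Nx, m), Y has shape (1, m).
X = np.hstack(x_examples)
Y = np.array(y_labels).reshape(1, m)
print(X.shape, Y.shape)  # (12288, 100) (1, 100)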
2.2 Logistic Regression
Logistic regression is a learning algorithm used in a supervised learning problem where the output labels y are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training data.
Example: Cat vs Non-Cat
Given an image represented by a feature vector x, the algorithm evaluates the probability of a cat being in that image.
The parameters used in logistic regression are the weight vector w (an Nx-dimensional vector) and the bias b (a real number). A minimal prediction sketch follows.
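A small sketch of the prediction this model makes with those parameters. The function names are illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_cat_probability(x, w, b):
    # x: (Nx, 1) feature vector, w: (Nx, 1) weights, b: scalar bias.
    # Returns y' = sigmoid(w.T x + b), the estimated probability of a cat.
    return sigmoid(np.dot(w.T, x) + b)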
2.2.1 Cost Function
To train the parameters w and b, we need to define a cost function.
Loss Function:
The loss function measures the discrepancy between the prediction y' and the desired output y:
L(y', y) = -(y log(y') + (1 - y) log(1 - y'))
To explain this loss function, consider the two cases:
• if y = 1 ==> L(y', 1) = -log(y') ==> we want y' to be as close to 1 as possible.
• if y = 0 ==> L(y', 0) = -log(1 - y') ==> we want y' to be as close to 0 as possible.
• Then the cost function will be:
J(w, b) = (1/m) * sum over i of L(y'(i), y(i))
• The loss function computes the error for a single training example.
• The cost function is the average of the loss function over the entire training set (a small code sketch follows).
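A small numpy sketch of this loss and cost; the names are illustrative:

import numpy as np

def loss(y_hat, y):
    # Cross-entropy loss for a single example (also works element-wise on arrays).
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    # Average of the per-example losses over the whole training set.
    return np.mean(loss(Y_hat, Y))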
2.2.2 Gradient Descent
• The goal is to find w and b that minimize the cost function J(w, b).
• First we initialize w and b (to zeros, or to random values) and then iteratively try to improve them.
• In logistic regression people usually use zeros instead of random values.
• The gradient descent algorithm repeats:
• w = w - alpha * dw, where alpha is the learning rate and dw is the derivative of the cost with respect to w (the change to apply to w). The derivative is also the slope in the w direction.
• w = w - alpha * dJ(w, b)/dw (how much the function slopes in the w direction)
• b = b - alpha * dJ(w, b)/db (how much the function slopes in the b direction)
Gradient Descent
(Figure: the cost J(w, b) plotted against w; gradient descent moves downhill toward the minimum.)
Computing derivatives
Computation graph example with a = 5, b = 3, c = 2:
u = bc = 6
v = a + u = 11
J = 3v = 33
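A minimal sketch of how the derivatives of J can be computed by the chain rule, walking the graph from right to left. The numeric values match the example above:

# Forward pass through the small computation graph.
a, b, c = 5, 3, 2
u = b * c          # 6
v = a + u          # 11
J = 3 * v          # 33

# Backward pass (chain rule), moving right to left through the graph.
dJ_dv = 3
dJ_da = dJ_dv * 1  # v = a + u  ->  dv/da = 1  =>  3
dJ_du = dJ_dv * 1  # dv/du = 1                =>  3
dJ_db = dJ_du * c  # u = b*c   ->  du/db = c  =>  6
dJ_dc = dJ_du * b  # du/dc = b                =>  9
print(J, dJ_da, dJ_db, dJ_dc)  # 33 3 6 9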
2.2.3 Vectorizing Logistic Regression
• As input we have a matrix X of shape (Nx, m) and a matrix Y of shape (Ny, m).
• We then compute all the examples at once: [z(1), z(2), ..., z(m)] = w.T * X + [b, b, ..., b].
This can be written in Python as:
Z = np.dot(W.T, X) + b     # Z shape is (1, m)
A = 1 / (1 + np.exp(-Z))   # A shape is (1, m)
Vectorizing Logistic Regression's Gradient Output:
• dz = A - Y # dz shape is (1, m)
• dw = np.dot(X, dz.T) / m #dw shape is (Nx, 1)
• db = dz.sum() / m # db shape is (1, 1)
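Putting the pieces together, here is a hedged sketch of a full vectorized logistic regression training loop in numpy. The function and variable names are illustrative, not from the slides:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic_regression(X, Y, num_iterations=1000, learning_rate=0.01):
    # X: (Nx, m) input matrix, Y: (1, m) labels in {0, 1}.
    Nx, m = X.shape
    W = np.zeros((Nx, 1))
    b = 0.0
    for i in range(num_iterations):
        # Forward propagation (vectorized over all m examples).
        Z = np.dot(W.T, X) + b                 # (1, m)
        A = sigmoid(Z)                         # (1, m)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        # Backward propagation (gradients).
        dZ = A - Y                             # (1, m)
        dW = np.dot(X, dZ.T) / m               # (Nx, 1)
        db = np.sum(dZ) / m
        # Gradient descent update.
        W = W - learning_rate * dW
        b = b - learning_rate * db
    return W, b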
Side Notes
The main steps for building a Neural Network
are:
•Define the model structure (such as number of
input features and outputs)
•Initialize the model's parameters.
•Loop.
• Calculate current loss (forward propagation)
• Calculate current gradient (backward propagation)
• Update parameters (gradient descent)
Side Notes
•Preprocessing the dataset is important.
• Tuning the learning rate (which is an example of a "hyperparameter") can make a big difference to the algorithm.
• kaggle.com is a good place for datasets and competitions.
3. Shallow Neural Networks
3.1 Neural Networks Overview
• In logistic regression we had:
z = w.T x + b,  a = σ(z),  L(a, y)
• In a neural network with one hidden layer we will have:
z[1] = W[1] x + b[1],  a[1] = σ(z[1])
z[2] = W[2] a[1] + b[2],  a[2] = σ(z[2]),  L(a[2], y)
(The diagrams show a small network with inputs x1, x2, x3 and output ŷ.)
3.2 Shallow Neural Network Representation
• We will define a neural network that has one hidden layer.
• A NN consists of an input layer, hidden layers, and an output layer.
• "Hidden" means we don't see the values of those layers in the training set.
• a[0] = x (the input layer).
• a[1] represents the activations of the hidden neurons.
• a[2] represents the output layer.
• We call this a 2-layer NN; the input layer isn't counted.
3.3 Forward Propagation
For m training examples we stack the columns into matrices:
X = [x(1) x(2) ... x(m)]
A[1] = [a[1](1) a[1](2) ... a[1](m)]
The vectorized forward propagation is then:
Z[1] = W[1] X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2] A[1] + b[2]
A[2] = σ(Z[2])
Here is some information about the network above:
1) Nh = 4 (hidden neurons)
2) Nx = 3 (input features)
3) Shapes of the variables:
I. W1 is the weight matrix of the first (hidden) layer; its shape is (noOfHiddenNeurons, Nx)
II. b1 is the bias vector of the first layer; its shape is (noOfHiddenNeurons, 1)
III. z1 is the result of z1 = W1*x + b1; for a single example its shape is (noOfHiddenNeurons, 1)
IV. a1 is the result of a1 = sigmoid(z1); its shape is (noOfHiddenNeurons, 1)
V. W2 is the weight matrix of the second (output) layer; its shape is (1, noOfHiddenNeurons)
VI. b2 is the bias of the second layer; its shape is (1, 1)
VII. z2 is the result of z2 = W2*a1 + b2; its shape is (1, 1)
VIII. a2 is the result of a2 = sigmoid(z2); its shape is (1, 1)
• Pseudocode for forward propagation in the 2-layer NN, assuming X has shape (Nx, m):
Z1 = np.dot(W1, X) + b1   # shape of Z1 is (noOfHiddenNeurons, m)
A1 = sigmoid(Z1)          # shape of A1 is (noOfHiddenNeurons, m)
Z2 = np.dot(W2, A1) + b2  # shape of Z2 is (1, m)
A2 = sigmoid(Z2)          # shape of A2 is (1, m)
3.4 Activation Functions
• In computational networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard computer chip circuit can be seen as a digital network of activation functions that are either "ON" (1) or "OFF" (0).
• So far we have been using sigmoid, but in some cases other functions can be a lot better.
• Sigmoid can lead to a gradient descent problem where the updates become very small.
• The sigmoid activation function has range (0, 1): A = 1 / (1 + np.exp(-z))  # where z is the input matrix
• The tanh activation function has range (-1, 1) (a shifted and rescaled version of sigmoid).
• It turns out that the tanh activation usually works better than the sigmoid activation for hidden units.
• A disadvantage of sigmoid and tanh is that if the input is too small or too large, the slope is near zero, which slows gradient descent.
• One popular activation function that solves the slow-gradient problem is ReLU: ReLU = max(0, z)  # if z is negative the slope is 0; if z is positive the slope stays constant.
• A basic rule for choosing activation functions: if the output is a 0/1 classification, use sigmoid for the output layer and ReLU for the hidden units (implementations are sketched below).
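For reference, a small numpy sketch of the activation functions discussed above and their derivatives (the derivatives are what backpropagation uses later). The function names are illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

# Derivatives, used later in backpropagation:
def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2

def relu_derivative(z):
    return (z > 0).astype(float)  # slope 0 for z < 0, slope 1 for z > 0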
Side Notes
• In a NN you will make a lot of design choices, such as:
• Number of hidden layers.
• Number of neurons in each hidden layer.
• Learning rate (the most important parameter).
• Activation functions.
• And others.
3.5 Backpropagation
• This is where all the magic happens!
NN parameters:
o n[0] = Nx
o n[1] = NoOfHiddenNeurons
o n[2] = NoOfOutputNeurons = 1
o W1 shape is (n[1],n[0])
o b1 shape is (n[1],1)
o W2 shape is (n[2],n[1])
o b2 shape is (n[2],1)
Then Gradient descent:
Repeat:
Compute predictions (y'[i], i = 0,...m)
Get derivatives: dW1, db1, dW2, db2
Update: W1 = W1 - LearningRate * dW1
b1 = b1 - LearningRate * db1
W2 = W2 - LearningRate * dW2
b2 = b2 - LearningRate * db2
Forward propagation:
o Z1 = W1A0 + b1   # A0 is X
o A1 = g1(Z1)
o Z2 = W2A1 + b2
o A2 = Sigmoid(Z2) # Sigmoid because the output is between 0 and 1
Backpropagation:
o dZ2 = A2 - Y
o dW2 = (dZ2 * A1.T) / m
o db2 = Sum(dZ2) / m
o dZ1 = (W2.T * dZ2) * g1'(Z1)  # element-wise product (*)
o dW1 = (dZ1 * A0.T) / m        # A0 = X
o db1 = Sum(dZ1) / m
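A hedged numpy sketch of one forward/backward pass for this 2-layer network, assuming a tanh hidden activation so that g1'(Z1) = 1 - A1**2 (the slides keep g1 generic):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_backward(X, Y, W1, b1, W2, b2):
    # X: (Nx, m), Y: (1, m); W1: (n1, Nx), b1: (n1, 1); W2: (1, n1), b2: (1, 1).
    m = X.shape[1]
    # Forward propagation (tanh hidden layer, sigmoid output).
    Z1 = np.dot(W1, X) + b1          # (n1, m)
    A1 = np.tanh(Z1)                 # (n1, m)
    Z2 = np.dot(W2, A1) + b2         # (1, m)
    A2 = sigmoid(Z2)                 # (1, m)
    # Backward propagation.
    dZ2 = A2 - Y                     # (1, m)
    dW2 = np.dot(dZ2, A1.T) / m      # (1, n1)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)   # tanh derivative as g1'(Z1)
    dW1 = np.dot(dZ1, X.T) / m       # (n1, Nx)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return A2, grads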
3.6 Random Initialization
• In logistic regression it wasn't important to initialize the weights randomly, but in a NN we have to initialize them randomly.
• If we initialize all the weights with zeros, the NN won't work (initializing the biases with zeros is OK):
• All hidden units will be completely identical (symmetric) and compute exactly the same function.
• On each gradient descent iteration all the hidden units will get the same update.
• To solve this we initialize the weights with small random numbers:
• W1 = np.random.randn(2, 2) * 0.01  # 0.01 keeps the initial weights small
• b1 = np.zeros((2, 1))              # b can be zero; it doesn't cause the symmetry problem
4. Deep Neural Networks
4.1 Deep L-layer neural network
•Shallow NN is a NN with one or two layers.
•Deep NN is a NN with three or more layers.
•We will use the notation L to denote the number
of layers in a NN.
•n[l] is the number of neurons in a specific layer l.
• n[0] denotes the number of neurons in the input layer, and n[L] denotes the number of neurons in the output layer.
• g[l] is the activation function of layer l.
4.2 Forward Propagation in a Deep Network
Forward propagation general rule for m inputs (a loop sketch follows):
• Z[l] = W[l]A[l-1] + b[l]
• A[l] = g[l](Z[l])
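A minimal sketch of this general rule as a loop over the L layers. The choice of ReLU hidden activations and a sigmoid output is an assumption for illustration; the names are illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def deep_forward(X, parameters, L):
    # parameters holds W1..WL and b1..bL; X has shape (n[0], m).
    A = X
    for l in range(1, L + 1):
        W = parameters["W" + str(l)]
        b = parameters["b" + str(l)]
        Z = np.dot(W, A) + b                    # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L else relu(Z)   # A[l] = g[l](Z[l])
    return A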
4.2.1 Matrix Dimensions
• The dimension of W[l] is (n[l], n[l-1]); read it right to left, from layer l-1 to layer l.
• The dimension of b[l] is (n[l], 1).
• dW has the same shape as W, and db has the same shape as b.
• The dimensions of Z[l], A[l], dZ[l], and dA[l] are (n[l], m) (a parameter-initialization sketch with these shapes follows).
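A small sketch that initializes parameters with exactly these dimensions and asserts the shapes; the helper name is illustrative:

import numpy as np

def initialize_parameters(layer_dims):
    # layer_dims = [n[0], n[1], ..., n[L]]
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # W[l] is (n[l], n[l-1]) and b[l] is (n[l], 1).
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters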
4.3 Intuition about deep representation
• Earlier layers of a deep network detect simple features and later layers compose them into more complex ones.
• Face recognition application: Image ==> Edges ==> Face parts ==> Faces ==> desired face.
• Audio recognition application: Audio ==> Low-level sound features (like sss, bb) ==> Phonemes ==> Words ==> Sentences.
4.4 Parameters vs Hyperparameters
• The main parameters of the NN are W and b.
• Hyperparameters (parameters that control the algorithm) include:
• Learning rate.
• Number of iterations.
• Number of hidden layers L.
• Number of hidden units n[l].
• Choice of activation functions.
• You have to try out hyperparameter values yourself.
4.5 NN and the Human Brain!
• The analogy that "it is like the brain" has become a really oversimplified explanation.
• There is only a very simplistic analogy between a single logistic unit and a single neuron in the brain.
• No one today understands exactly how a neuron in the human brain works.
• No one today knows exactly how many neurons are in the brain.