CS217_2024_lec11
Disclaimer: These notes aggregate content from several texts and have not been subjected to the usual
scrutiny deserved by formal publications. If you find errors, please bring to the notice of the Instructor.
Consider the example of points shown in the diagram below: some points are marked with crosses (×) and others with circles (◦), and we have to classify them into two different classes.
If we use logistic regression for classification, the decision boundary in the 2-D case is linear, so it is not possible to classify the points perfectly: no single line can separate the × and ◦ points into two different regions.
\[
f(X, w) = \frac{1}{1 + e^{-w^{T}X}} \qquad \text{(Logistic Regression)}
\]
So, to solve this problem of the linear decision boundary, we can use the concept of the basis function, which
gives us a non-linear decision boundary.
Let us take the basis function as:
\[
\Phi(X) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_2^2 \end{bmatrix}^{T}
\]
Now the decision boundary can be circular, and by properly adjusting the weights we can find a circular decision boundary that contains all the ◦ points.
\[
f(\Phi(X), w) = \frac{1}{1 + e^{-w^{T}\Phi(X)}} \qquad \text{(Logistic Regression with basis function)}
\]
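As an illustrative sketch (not from the lecture), hand-picked weights on this basis expansion produce a circular decision boundary; the data points and weights below are hypothetical:

```python
import numpy as np

def phi(X):
    """Quadratic basis function Phi(X) = [1, x1, x2, x1^2, x2^2]^T, row-wise."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x2**2], axis=1)

def f(X, w):
    """Logistic regression on the transformed features."""
    return 1.0 / (1.0 + np.exp(-phi(X) @ w))

# Hand-picked weights encoding the circle x1^2 + x2^2 = 1:
# w^T Phi(x) = 1 - x1^2 - x2^2 is positive inside the unit circle.
w = np.array([1.0, 0.0, 0.0, -1.0, -1.0])

inside = np.array([[0.0, 0.0], [0.5, 0.5]])    # circle (o) points
outside = np.array([[2.0, 0.0], [-1.5, 1.5]])  # cross (x) points
print(f(inside, w) > 0.5)   # both classified as class 1
print(f(outside, w) > 0.5)  # both classified as class 0
```

The linear model in the transformed feature space thus realizes a non-linear boundary in the original space.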
In Neural Networks, our goal is to attain non-linear behaviour without the need for explicit programming
dedicated to non-linearity; this is the fundamental principle behind Neural Networks.
11-2 Lecture 11: Introduction to Neural Networks
• 1950s- Nathaniel Rochester from the IBM research laboratories led the first effort to simulate a neural
network. Unfortunately for him, the first attempt to do so failed.
• 1957- The first hardware implementation of perceptron was Mark I Perceptron machine built in 1957
at the Cornell Aeronautical Laboratory by psychologist Frank Rosenblatt, funded by the Information
Systems Branch of the United States Office of Naval Research and the Rome Air Development Center.
• 1982- Interest in the field was renewed. John Hopfield of Caltech presented a paper to the National
Academy of Sciences. His approach was to create more useful machines by using bidirectional lines.
Previously, the connections between neurons were only one way.
• 1982- At the US-Japan Joint Conference on Cooperative/Competitive Neural Networks, Japan announced its Fifth-Generation effort, which left the US worried about being left behind.
• 1997- A recurrent neural network (RNN) framework, Long Short-Term Memory (LSTM), was proposed
by Schmidhuber & Hochreiter.
First, let us understand why neural networks are called neural networks. The way an actual neuron
works involves the accumulation of electric potential, which, when exceeding a particular value, causes the
pre-synaptic neuron to discharge across the axon and stimulate the post-synaptic neuron. The human brain’s
capabilities are incredible compared to what we can do even with state-of-the-art neural networks.
In the following diagram, we illustrate the analogy between the neuron structure and the artificial neurons
in a neural network.
If we have multiple features, each is passed through an affine transformation, which is basically a weighted
sum of the input features plus a bias term, giving us something resembling a regression equation. We then
pass this result through our activation function, which gives us some form of probability. This probability
determines whether the neuron will fire — our result can then be plugged into our loss function to assess
the algorithm’s performance.
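A single artificial neuron as described above can be sketched as follows (the feature values, weights, and bias are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: affine transformation, then sigmoid activation."""
    z = np.dot(w, x) + b             # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))  # activation: the firing "probability"

x = np.array([0.5, -1.0, 2.0])  # illustrative feature values
w = np.array([0.4, 0.3, 0.1])   # illustrative weights
b = 0.1
print(neuron(x, w, b))  # a value in (0, 1)
```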
Here is a multi-layer neural network; our goal is to learn the weights W and biases b of all layers so as to minimize the loss J.
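A minimal sketch of the forward pass through such a network, assuming two layers with sigmoid activations and randomly initialized parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass through a 2-layer network: each layer applies W, b, then sigmoid."""
    (W1, b1), (W2, b2) = params
    h = sigmoid(W1 @ x + b1)     # hidden layer
    return sigmoid(W2 @ h + b2)  # output layer

rng = np.random.default_rng(0)
params = [(rng.standard_normal((4, 3)), np.zeros(4)),   # layer 1: 3 inputs -> 4 hidden
          (rng.standard_normal((1, 4)), np.zeros(1))]   # layer 2: 4 hidden -> 1 output
x = np.array([1.0, 2.0, -1.0])
print(forward(x, params))  # a value in (0, 1)
```

Training then consists of adjusting every W and b so that this output matches the labels under the loss J.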
The activation function is analogous to the build-up of electrical potential in biological neurons, which fires
once a certain activation potential is reached. This activation potential is mimicked in artificial neural
networks using probability. The activation function should do two things:-
1. Ensure non-linearity to capture complex features that are not linear.
2. Ensure gradients remain large through the hidden layers in Deep Neural Networks; otherwise, we may
encounter the vanishing gradient problem.
Following are some of the popular activation functions:-
1. Sigmoid (σ) :-
\[
\sigma(x) = \frac{1}{1 + e^{-x}}
\]
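A quick numeric sketch of the sigmoid and its derivative; note how the gradient saturates for large inputs, which is exactly the vanishing-gradient concern mentioned above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # derivative of the sigmoid; peaks at 0.25 when x = 0

print(sigmoid(0.0))       # 0.5
print(sigmoid_grad(0.0))  # 0.25, the largest the gradient ever gets
print(sigmoid_grad(10.0)) # ~4.5e-5: saturation, the source of vanishing gradients
```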
Feedforward Neural Network: It is one of the broad types of Artificial Neural Networks, where the
flow of information is unidirectional and is from the inputs to outputs through hidden layers without any
cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Following are the
steps involved in training a neural network:
\[
J(\theta) = \frac{1}{N} \sum_{i=1}^{N} l(NN(x_i, \theta), y_i)
\]
\[
l(NN(x_i, \theta), y_i) = -\left[ y_i \log(NN(x_i, \theta)) + (1 - y_i) \log(1 - NN(x_i, \theta)) \right]
\]
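These two quantities can be computed directly, assuming the network outputs a probability p = NN(x_i, θ) for each example (the predictions and labels below are made up):

```python
import numpy as np

def bce_loss(p, y):
    """Binary cross-entropy for one example: prediction p = NN(x, theta), label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def total_loss(preds, labels):
    """J(theta): the average per-example loss over the N training examples."""
    return np.mean([bce_loss(p, y) for p, y in zip(preds, labels)])

preds = np.array([0.9, 0.2, 0.7])
labels = np.array([1, 0, 1])
print(total_loss(preds, labels))  # small, since the predictions match the labels
```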
• Stochastic Gradient Descent (SGD)
– Pick Random Data Point: Randomly select a data point (x_i, y_i) from the dataset.
– Compute Gradient of the Loss: Calculate the gradient of the loss function with respect to
the parameters θ for the selected data point.
\[
\nabla_{\theta}\, l(x_i, y_i)
\]
– Update Parameters: Update the parameters θ using the gradient descent update rule:
\[
\theta_{t+1} = \theta_t - \eta \nabla_{\theta}\, l(x_i, y_i)
\]
Where η (eta) is the learning rate, controlling the size of the steps taken during optimization.
– Mini-Batch Variant: Instead of updating parameters with single data points, one can use mini-
batches of data (B) to compute gradients and update parameters. This often leads to more stable
convergence.
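The SGD steps above can be sketched for a one-neuron (logistic regression) model on a hypothetical toy dataset:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Toy linearly separable data: the label is 1 when x1 + x2 > 0.
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = np.zeros(2)
eta = 0.1  # learning rate
for step in range(2000):
    i = rng.integers(len(X))      # pick a random data point
    p = sigmoid(X[i] @ theta)     # forward pass
    grad = (p - y[i]) * X[i]      # gradient of the BCE loss w.r.t. theta
    theta = theta - eta * grad    # parameter update rule

preds = (sigmoid(X @ theta) > 0.5).astype(float)
print(np.mean(preds == y))  # training accuracy, close to 1.0
```

Replacing the single index `i` by a random batch of indices and averaging the per-example gradients gives the mini-batch variant.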
11.6 Backpropagation
The whole training of neural networks rests on backpropagation: given a loss l and the set of all parameters θ, we must be able to compute dl/dθ, so that each parameter can be trained via the update θ₁ ← θ₁ − (dl/dθ₁) × learning rate.
Backpropagation efficiently computes gradients in neural networks by utilizing the chain rule of differentiation. For example, suppose b = f(a) and l = g(b), and we want dl/da. Instead of differentiating the composition directly, we first compute db/da, then dl/db, and finally dl/da = (dl/db) × (db/da).
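A tiny numeric instance of this chain-rule computation, with hypothetical functions f(a) = a² and g(b) = 3b:

```python
# Chain rule on a tiny computation graph: b = f(a) = a**2, l = g(b) = 3*b.
a = 2.0
b = a ** 2   # forward: b = 4
l = 3 * b    # forward: l = 12

dl_db = 3.0            # dl/db, from g
db_da = 2 * a          # db/da, from f
dl_da = dl_db * db_da  # chain rule: dl/da = (dl/db)(db/da) = 12
print(dl_da)
```

Backpropagation organizes exactly these local-derivative products, layer by layer, from the loss back to every parameter.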
[Figure: a fully connected computation graph with input nodes x₁, x₂, x₃, intermediate nodes u₁, …, u₄, and output nodes g₁, g₂, g₃.]
In a multi-layer neural network, each layer consists of nodes and connections between nodes carry weights.
During both forward pass (computing the output) and backward pass (computing gradients for training),
derivatives play a crucial role. Here, we discuss the computation of derivatives with respect to inputs and
the matrix representation of these derivatives.
The derivative ∂u⃗/∂x⃗ represents the sensitivity of the intermediate layer u⃗ to changes in the input x⃗. It can be represented as a matrix whose entry in row i, column j is the partial derivative of u_j with respect to x_i. This matrix is commonly known as the Jacobian matrix.
\[
\frac{\partial \vec{u}}{\partial \vec{x}} =
\begin{bmatrix}
\frac{\partial u_1}{\partial x_1} & \frac{\partial u_2}{\partial x_1} & \cdots & \frac{\partial u_k}{\partial x_1} \\
\frac{\partial u_1}{\partial x_2} & \frac{\partial u_2}{\partial x_2} & \cdots & \frac{\partial u_k}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial u_1}{\partial x_d} & \frac{\partial u_2}{\partial x_d} & \cdots & \frac{\partial u_k}{\partial x_d}
\end{bmatrix}
\]
Similarly, the matrix ∂g⃗/∂u⃗ has entries representing the partial derivatives of each element in g⃗ with respect to each element in u⃗.
\[
\frac{\partial \vec{g}}{\partial \vec{u}} =
\begin{bmatrix}
\frac{\partial g_1}{\partial u_1} & \frac{\partial g_2}{\partial u_1} & \cdots & \frac{\partial g_m}{\partial u_1} \\
\frac{\partial g_1}{\partial u_2} & \frac{\partial g_2}{\partial u_2} & \cdots & \frac{\partial g_m}{\partial u_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial u_k} & \frac{\partial g_2}{\partial u_k} & \cdots & \frac{\partial g_m}{\partial u_k}
\end{bmatrix}
\]
Consider vectors x⃗, u⃗, and g⃗, where every node in x⃗ is connected to every node in u⃗, and every node in u⃗ is connected to every node in g⃗. The derivative ∂g_i/∂x_j represents the sensitivity of each element g_i in g⃗ to changes in each element x_j in x⃗. This derivative can be computed using the chain rule:
\[
\frac{\partial g_i}{\partial x_j} = \sum_{z=1}^{k} \frac{\partial u_z}{\partial x_j} \cdot \frac{\partial g_i}{\partial u_z}
\]
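A numeric check of this summed chain rule, using hypothetical linear maps so each Jacobian is just a constant matrix (written here with one row per output component):

```python
import numpy as np

# Hypothetical linear maps, so the Jacobians are the matrices themselves:
# u = A x  (Jacobian dg/dx components: du/dx = A, k=3 outputs, d=2 inputs)
# g = B u  (Jacobian dg/du = B, m=2 outputs, k=3 inputs)
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
B = np.array([[1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])

# dg_i/dx_j = sum_z (du_z/dx_j)(dg_i/du_z): the matrix product of the Jacobians.
dg_dx = B @ A

x = np.array([1.0, -1.0])
g_direct = B @ (A @ x)       # the composed map evaluated directly
print(dg_dx)                 # the 2x2 Jacobian of g with respect to x
print(dg_dx @ x - g_direct)  # zero for linear maps: the Jacobian reproduces the map
```

The sum over z in the formula is exactly the inner product performed by the matrix multiplication.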
Here, we sum over all elements u_z, where each term in the sum involves the product of two partial derivatives.