NEURAL NETWORKS
By B RAJESWARI
TGTWRDC (GIRLS), KHAMMAM
Bio-inspired Multi-Layer Networks
1. What is a Multi-Layer Neural Network?
A Multi-Layer Perceptron (MLP) is an artificial neural
network composed of multiple layers of neurons.
•It is inspired by how biological neurons process information.
Structure of a Multi-Layer Network
•A typical two-layer neural network consists of:
Input Layer – Receives input features (e.g., image
pixels).
Hidden Layer – Applies weighted transformations and
activation functions.
Output Layer – Produces the final prediction.
Diagram of a Simple Two-Layer Network
Here:
•Each hidden neuron receives inputs and applies a non-linear
function (activation function).
•The output neuron combines hidden activations to make a
final decision.
2. How Does a Multi-Layer Network Work?
•The network performs forward propagation (to make
predictions) and backpropagation (to update weights).
Step 1: Compute Hidden Layer Activations
Each hidden neuron receives inputs and computes an activation: hi = f(āˆ‘j wij xj + bi), where f is a non-linear activation function.
Graph of Activation Functions
•tanh function (smooth, non-linear, differentiable)
•ReLU function (better for deep networks)
•The image compares the sign function, sign(x), and the hyperbolic tangent function, tanh(x).
Explanation:
Sign Function sign(x):
•It produces discrete output values: āˆ’1 for negative numbers, 0 at zero, and +1 for positive numbers.
•It is not differentiable at x = 0 due to the discontinuity.
Example: Let's compare outputs for a few values of x:
x     sign(x)   tanh(x)
-2    -1        -0.964
-1    -1        -0.761
 0     0         0
 1     1         0.761
 2     1         0.964
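As a quick check, a short Python sketch (NumPy assumed) reproduces these values:

```python
import numpy as np

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# np.sign returns -1, 0, or +1; np.tanh is the smooth, differentiable alternative.
for x, s, t in zip(xs, np.sign(xs), np.tanh(xs)):
    print(f"x = {x:+.0f}   sign(x) = {s:+.0f}   tanh(x) = {t:+.3f}")
```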
3. Training a Multi-Layer Network (Backpropagation)
• To train the network, we minimize the error between
predicted and actual outputs.
• Loss Function (Error)
• A common choice is Mean Squared Error (MSE): L = (1/n) āˆ‘i (yi āˆ’ Å·i)²
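A minimal sketch of this loss in Python (NumPy assumed; the array values are made up for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])   # hypothetical targets
y_pred = np.array([0.9, 0.2, 0.7])   # hypothetical predictions
print(mse(y_true, y_pred))           # 0.0466...
```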
Meaning of Activation in Neural Networks
•Activation in a neural network refers to the value computed by a
neuron after applying a mathematical function to its inputs.
•It determines whether the neuron should be "active" (contribute to the
next layer) or not.
•Each neuron in a layer computes its activation using the formula:
a = f(z), where:
z = āˆ‘i vi hi + b
is the weighted sum of inputs plus bias.
•f(z) is an activation function that introduces non-linearity.
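A small sketch of this computation (NumPy assumed; the weights and inputs are illustrative):

```python
import numpy as np

def activation(h, v, b, f=np.tanh):
    """Compute a = f(z) where z = sum_i v_i * h_i + b."""
    z = np.dot(v, h) + b
    return f(z)

h = np.array([0.5, -1.0, 2.0])   # hypothetical hidden activations
v = np.array([0.3, 0.8, -0.2])   # hypothetical weights
print(activation(h, v, b=0.1))
```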
Example
Let's consider a simple neural network with one hidden neuron and one
output neuron.
Step 1: Compute Weighted Sum
Suppose:
Hidden neuron activation h1 = 2.0
Weight v1 = 0.5
Bias bo = 1.0
Using the formula:
z = (v1 ā‹… h1) + bo = (0.5 ā‹… 2.0) + 1.0 = 2.0
Step 2: Apply Activation Function
If we use a ReLU activation function f(z) = max(0, z), then:
a = max(0, 2.0) = 2.0
So, the final activation output is 2.0.
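The same two steps in Python (a sketch; the variable names mirror the example above):

```python
def relu(z):
    return max(0.0, z)

h1, v1, b_o = 2.0, 0.5, 1.0       # values from the example above

z = v1 * h1 + b_o                 # Step 1: weighted sum -> 2.0
a = relu(z)                       # Step 2: ReLU activation -> 2.0
print(z, a)
```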
How to Solve XOR?
Single-layer perceptron fails because XOR is not linearly separable.
Multi-layer network solves it using a hidden layer.
šŸ”½ Diagram: Two-Layer Network for XOR
Explanation of the Small XOR Data Set
The table shown in the image represents a small XOR dataset, which is
commonly used in machine learning to illustrate the need for a non-linear
model.
Understanding the Columns
y: The target output (label).
x0: Bias term (always +1).
x1, x2: Input features.
Each row represents a training example.
XOR Logic
The dataset is essentially the XOR (Exclusive OR) function, where the output y
depends on the inputs x1 and x2:
y = x1 āŠ• x2
where āŠ• denotes the XOR operation.
However, this dataset uses ±1 notation instead of 0 and 1, which is
common in some neural network training approaches.
Step 3: Understanding the Computation
Let's verify this using truth table values:
x1   x2   h1 = OR(x1, x2)   h2 = AND(x1, x2)   Å· = āˆ’2h2 + h1
0    0    0                 0                  0
0    1    1                 0                  1
1    0    1                 0                  1
1    1    1                 1                  āˆ’2(1) + 1 = āˆ’1
Thresholding the raw output at zero (negative values map to 0), this matches the expected XOR function:
•(0,0) → 0
•(0,1) → 1
•(1,0) → 1
•(1,1) → 0
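A short sketch that checks this construction (plain Python; the threshold-at-zero step is the assumption noted above):

```python
def xor_via_two_layer(x1, x2):
    """Hidden layer: h1 = OR, h2 = AND; output: y = -2*h2 + h1, thresholded at 0."""
    h1 = int(x1 or x2)            # OR gate
    h2 = int(x1 and x2)           # AND gate
    raw = -2 * h2 + h1            # linear combination from the table
    return 1 if raw > 0 else 0    # threshold: non-positive values map to 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_via_two_layer(x1, x2))
```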
X-----------------X (Linear Model āŒ)
| |
| O O |
| |
X-----------------X
X-----O-----X (Multi-Layer Model āœ…)
| | |
| O | O |
| | |
X-----O-----X
5. Why Are Multi-Layer Networks Powerful?
•The Universal Approximation Theorem
•A single-layer network can only represent linear functions.
•A two-layer network can approximate any function given enough
neurons.
•Deeper networks can learn complex patterns more efficiently.
Backpropagation Algorithm
•The Backpropagation Algorithm is a key method for training multi-layer
neural networks.
• It adjusts the weights of a neural network using gradient descent and the
chain rule of calculus.
1. What is Backpropagation?
Backpropagation allows a neural network to learn by:
1. Computing the error between predicted and actual values.
2. Propagating the error backward through the network.
3. Adjusting the weights to minimize the error.
2. Steps of Backpropagation
Step 2: Compute Error
The error at the output layer is calculated as:
e = y āˆ’ Å·
where:
y is the actual target output.
Å· is the predicted output from the model.
Step 3: Compute Gradients (Backward Pass)
This step involves computing how much we need to adjust the
weights to reduce the error.
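A minimal sketch of one backpropagation step for a one-hidden-neuron network (NumPy assumed; a tanh hidden unit, the learning rate, and the starting weights are illustrative assumptions, not the exact numbers from the slides):

```python
import numpy as np

def tanh_deriv(z):
    return 1.0 - np.tanh(z) ** 2

# Tiny network: x -> (w, tanh) -> h -> (v, linear) -> y_hat
x, y = 1.0, 0.5          # one training example (illustrative values)
w, v = 0.4, 0.6          # initial weights (illustrative)
lr = 0.01                # learning rate

# Forward pass
z = w * x
h = np.tanh(z)
y_hat = v * h

# Backward pass: error and gradients via the chain rule (loss = 0.5 * e**2)
e = y - y_hat
grad_v = -e * h
grad_w = -e * v * tanh_deriv(z) * x

# Gradient descent update
v -= lr * grad_v
w -= lr * grad_w
print(w, v)
```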
Next Steps
Repeat the Forward Pass:
Use the updated weights (w=0.4005, v=0.6014) to compute the new
hidden layer activation and output prediction.
Compute New Error:
Compare the new prediction with the actual target y.
Compute the new error e = y āˆ’ Å·.
Perform Another Backpropagation Step:
Compute new gradients for w and v.
Update weights again using gradient descent.
Continue Training for Multiple Epochs:
Iterate this process for many epochs until the error is minimized,
meaning the model has learned to approximate the target output well.
Evaluate Performance:
Once training is done, test the model on new data to check its accuracy
and generalization.
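Putting these steps together, a training loop might look like the following (a sketch reusing the tiny one-hidden-neuron network above; the epoch count and data are illustrative):

```python
import numpy as np

def train(x, y, w=0.4, v=0.6, lr=0.01, epochs=100):
    """Repeat forward pass, error, gradients, and weight updates for several epochs."""
    for epoch in range(epochs):
        z = w * x
        h = np.tanh(z)
        y_hat = v * h                       # forward pass
        e = y - y_hat                       # error
        grad_v = -e * h                     # backward pass (chain rule)
        grad_w = -e * v * (1 - np.tanh(z) ** 2) * x
        v -= lr * grad_v                    # gradient descent updates
        w -= lr * grad_w
    return w, v

print(train(x=1.0, y=0.5))
```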
5. Key Takeaways
āœ” Backpropagation uses gradient descent to adjust weights.
āœ” It propagates errors from output to hidden layers.
āœ” Allows deep networks to learn complex patterns.
āœ” Used in almost all modern deep learning models.
Initialization and Convergence of Neural
Networks
This section discusses:
• Why Initialization Matters
• Problems with Poor Initialization
• Good Initialization Techniques
• Challenges in Convergence
• Strategies for Faster Convergence
1. Why Does Initialization Matter?
šŸ”¹ What Happens with Poor Initialization?
• Weight initialization is a critical step in training neural networks.
Poor initialization can lead to issues such as slow convergence,
unstable training, or the complete failure of the network to learn.
Below is a detailed breakdown of why proper initialization is
important and what happens if it is not done correctly.
1. All Weights = 0 → The Network Never Learns āŒ
• If all weights are initialized to zero, the network fails to learn anything.
• This is because all neurons will have the same gradients and will update identically.
• This leads to symmetry in the network, meaning all neurons in the same layer behave the same way, making the network incapable of learning diverse features.
āœ… Solution: Randomly initialize weights with small values to break symmetry.
2. Too Large Weights → Exploding Gradients āŒ
When weights are initialized with very large values, the
gradients during backpropagation can also become excessively large.
This leads to unstable training because the weight updates are
too drastic, causing the network to diverge rather than converge.
The problem is especially severe in deep networks, where
multiple layers amplify these large gradients, making
optimization difficult.
āœ… Solution: Use Xavier Initialization or He Initialization, which
scales weights appropriately to prevent large gradient values.
3. Too Small Weights → Vanishing Gradients āŒ
If the weights are initialized with very small values, the
gradients in the deeper layers become extremely small during backpropagation.
This slows down learning since the weight updates are
negligible.
The problem is particularly common when using activation
functions like sigmoid or tanh, where gradients shrink as they
propagate backward.
As a result, earlier layers learn very slowly, while later layers
receive better updates, leading to inefficient training.
āœ… Solution:
Use ReLU activation functions instead of sigmoid/tanh.
Use He Initialization, which is designed for ReLU-based
networks to maintain proper gradient flow.
2. Common Weight Initialization Methods
•To prevent these issues, we use smart initialization techniques (a sketch of the common schemes follows below).
(1) Random Initialization (Old Approach āŒ)
• Assign random values (e.g., small Gaussian noise).
Problem: It can still cause vanishing/exploding gradients.
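A sketch of the initialization schemes mentioned in this section (NumPy assumed; fan_in and fan_out stand for a layer's input and output sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_init(fan_in, fan_out, scale=0.01):
    """Naive approach: small Gaussian noise (can still vanish/explode in deep nets)."""
    return rng.normal(0.0, scale, size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot initialization: suited to sigmoid/tanh layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He initialization: suited to ReLU layers; variance scaled by 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(256, 128)
print(W.std())   # roughly sqrt(2/256) ā‰ˆ 0.088
```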
3. Challenges in Convergence
šŸ”¹ (1) Vanishing Gradients šŸ˜“
In deep networks, gradients shrink → slow learning.
āœ” Solution: Use ReLU + He initialization.
šŸ”¹ (2) Exploding Gradients šŸ’„
In deep networks, gradients become too large.
āœ” Solution: Use gradient clipping or He initialization (a clipping sketch follows below).
šŸ”¹ (3) Poor Local Minima 😩
The network gets stuck in bad solutions.
āœ” Solution: Use batch normalization and adaptive optimizers (Adam, RMSprop).
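A minimal sketch of gradient clipping by global norm (NumPy assumed; the threshold is illustrative):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([30.0, -40.0])]          # norm = 50, too large
print(clip_by_global_norm(grads))          # rescaled to norm 5
```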
Convergence of Randomly Initialized Networks
•The graph illustrates how different weight
initialization methods affect the convergence of
a neural network during training.
•The x-axis represents the number of
iterations (training progress), while the y-axis
represents the test error (lower is better).
Key Observations:
1. Zero Initialization Fails to Converge Efficiently
The curve labeled "zero-init" remains significantly higher than the others,
meaning the network struggles to reduce test error effectively.
This happens because initializing all weights to zero leads to symmetry in the
network, causing neurons to learn the same features, making training
ineffective.
2. Random Initialization Leads to Faster Convergence
• The multiple colored lines represent different runs of the network with
random weight initialization.
• These networks show better and faster reduction in test error
compared to zero initialization.
• However, the convergence rate varies among different runs, suggesting
that the choice of initialization distribution can still impact training
efficiency.
3. Final Performance Variation
• Some curves (e.g., green and blue) achieve lower test errors faster, while
others (e.g., purple) take longer or oscillate more.
• This highlights the importance of using optimized initialization
techniques (e.g., Xavier or He initialization) instead of purely random
initialization.
4. Strategies to Improve Convergence
Problem                 Solution
Slow learning           Use adaptive learning rates (Adam, RMSprop)
Vanishing gradients     Use ReLU + He initialization
Exploding gradients     Apply gradient clipping
Poor local minima       Use batch normalization
6. Key Takeaways
āœ” Good initialization speeds up training and prevents bad
convergence.
āœ” Xavier (Glorot) → Best for sigmoid/tanh networks.
āœ” He Initialization → Best for ReLU-based deep networks.
āœ” LeCun Initialization → Works well for small networks with
sigmoid.
Beyond Two Layers in Neural Networks
(Deep Learning)
• Neural networks can go beyond two layers to create deep neural
networks (DNNs), which allow them to learn more complex
patterns.
• This section will explain:
• Why go beyond two layers?
• Deep Networks Structure (Diagrams)
• Forward Propagation and Backpropagation in Deep Networks
• Advantages & Challenges of Deep Networks
• Graphs showing Training Convergence
Why Go Beyond Two Layers?
• A two-layer network can approximate any function, but deeper networks offer:
āœ” Fewer neurons for the same task (efficient representation)
āœ” Better generalization (learns hierarchical patterns)
āœ” Solves complex tasks (e.g., image recognition, NLP)
šŸ”½ Comparison: Two-Layer vs. Multi-Layer Networks
Feature                        Two-Layer Network                   Multi-Layer Network
Function Approximation         Can approximate any function        More efficient & expressive
Training Complexity            Fewer parameters, easier to train   More parameters, harder to optimize
Performance on Complex Tasks   Limited to simple problems          Handles deep features (vision, NLP)
2. Deep Networks Structure (Diagrams)
A two-layer network consists of
one hidden layer between input and
output:
šŸ”½ Two-Layer Neural Network (Shallow)
Multi-Layer Neural Network (Deep)
Step 2: Backpropagation
• Compute error at output
• Propagate errors backward through layers
• Update weights using gradient descent
šŸ”½ Graph of Backpropagation
4. Advantages & Challenges of Deep Networks
šŸ”¹ Advantages
āœ” Feature Hierarchies: First layers learn simple patterns (edges), deeper layers learn complex features (objects).
āœ” Efficient Representation: Fewer parameters required than a wide network.
āœ” Better Generalization: Captures higher-level abstract representations.
šŸ”¹ Challenges
āŒ Vanishing Gradients: Lower layers get very small updates, slowing
learning.
āŒ Exploding Gradients: Large updates make training unstable.
āŒ More Parameters: Harder to optimize.
Breadth vs Depth, Basis Functions
What is Breadth vs. Depth?
• Neural networks can be designed to be wide (more neurons per layer)
or deep (more layers with fewer neurons per layer).
• Wide Networks: Have more neurons per layer but fewer layers.
• Deep Networks: Have fewer neurons per layer but many layers.
Why Consider Deeper Networks If Two-Layer Networks
Are Universal Function Approximators?
•The Universal Approximation Theorem states that a two-layer (shallow)
neural network with enough hidden units can approximate any function.
•However, this does not mean that a shallow network is the most efficient
way to represent every function.
•Some functions require an exponential number of neurons in a shallow
network, whereas a deep network can achieve the same result with far
fewer neurons.
Circuit Complexity and the Parity Function
•To understand why deep networks can be more efficient, we look
at circuit complexity:
Consider the parity function, which determines whether
the number of 1s in a binary input is odd or even:
1 if the number of 1s is odd
0 if the number of 1s is even
If we use XOR gates in a circuit:
•A deep network (logarithmic depth) can compute parity with O(D)
gates (linear in the number of inputs).
•A shallow network (constant depth) would require exponentially many
gates, making it inefficient.
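A sketch of the deep view in Python: parity over D inputs computed by a balanced tree of pairwise XORs, which has depth roughly log2(D) and uses O(D) gates in total:

```python
def parity(bits):
    """Compute parity with a balanced tree of XOR gates (depth ~ log2(D), O(D) gates)."""
    layer = list(bits)
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(layer[i] ^ layer[i + 1])   # one XOR gate
        if len(layer) % 2:                         # carry an unpaired element forward
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

print(parity([1, 0, 1, 1]))   # three 1s -> parity 1
```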
Trade-Off Between Breadth and Depth
Feature                       Wide Network (Breadth)   Deep Network (Depth)
Number of Layers              Small (1-2)              Large (4+)
Number of Neurons per Layer   Large                    Small
Computational Complexity      Low                      High
Expressiveness                Limited                  Can approximate complex functions
Training Time                 Faster                   Slower
Vanishing Gradient Problem    Less likely              More likely
Best For                      Simple tasks             Hierarchical features (e.g., image recognition)
Basis Functions in Neural Networks
Neural networks can approximate both linear and complex non-linear
functions. A natural question arises:
šŸ‘‰ Can a neural network mimic a k-Nearest Neighbors (KNN)
classifier efficiently?
•The answer lies in using Radial Basis Functions (RBFs), which
transform a neural network into a structure that behaves similarly to
KNN.
1. What is a Basis Function?
•A basis function is a mathematical function used to transform input data
before passing it to the network. It helps in:
•Feature transformation: Mapping input space to a more useful
representation.
•Better function approximation: Capturing complex relationships in data.
šŸ”¹ Example:
•Linear units use a dot product transformation of the input.
•Radial basis units instead use a Gaussian of the distance to a center: hi(x) = exp(āˆ’Ī³i ā€–x āˆ’ wiā€–Ā²).
•Here, wi is the center of the radial function.
•γi controls the width of the Gaussian function.
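A sketch of a Gaussian radial basis unit under that formula (NumPy assumed; the centers and γ are illustrative):

```python
import numpy as np

def rbf(x, center, gamma):
    """Gaussian radial basis unit: exp(-gamma * ||x - center||^2)."""
    return np.exp(-gamma * np.sum((x - center) ** 2))

x = np.array([1.8])
print(rbf(x, center=np.array([1.5]), gamma=1.0))   # close to the center -> activation near 1
print(rbf(x, center=np.array([3.0]), gamma=1.0))   # farther from the center -> much smaller activation
```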
2. How Do RBF Networks Mimic KNN?
a) KNN Classifier
•KNN makes predictions based on distances: It finds the closest K points
to a given query point and assigns the most common label.
b) RBF Networks
•Instead of directly storing all training points, RBF neurons act like
prototypes.
•Each hidden unit in an RBF network corresponds to a "prototype" data
point.
•The output is determined by a weighted sum of these RBF neurons,
similar to KNN’s distance-weighted voting.
Key Idea:
•Large γ → Localized activation (behaves like KNN, considering only
nearby points)
•Small γ → Broad activation (behaves like a generalizing model,
considering distant points too)
3. Example: RBF vs. KNN
•Imagine we have a 1D dataset where we classify points based on proximity:
X (Input)   Y (Class)
1.0         A
1.5         A
2.0         B
2.5         B
•KNN (K=1): Predicts class based on the nearest neighbor.
•RBF Network: Centers an RBF neuron at each data point.
•Uses a Gaussian function to determine influence.
•The output is a weighted sum of RBF neuron activations.
If a new point X = 1.8 is given:
KNN (K=1) → Predicts Class B (its nearest neighbor is 2.0, at distance 0.2)
An RBF network with a suitably large γ behaves similarly!
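A sketch comparing both predictions on this tiny dataset (NumPy assumed; γ and the ±1 label encoding are illustrative choices):

```python
import numpy as np

X = np.array([1.0, 1.5, 2.0, 2.5])          # training inputs from the table
y = np.array([+1, +1, -1, -1])              # class A -> +1, class B -> -1 (illustrative encoding)

def knn_1(x_new):
    """1-nearest-neighbor prediction."""
    return y[np.argmin(np.abs(X - x_new))]

def rbf_predict(x_new, gamma=5.0):
    """Distance-weighted vote using Gaussian RBF activations centered at each training point."""
    weights = np.exp(-gamma * (X - x_new) ** 2)
    return np.sign(np.sum(weights * y))

x_new = 1.8
print("KNN:", "A" if knn_1(x_new) > 0 else "B")          # nearest neighbor is 2.0 -> B
print("RBF:", "A" if rbf_predict(x_new) > 0 else "B")    # weighted vote also favors B
```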
4. Advantages of RBF Networks Over KNN
Feature              KNN                       RBF Network
Memory Usage         Stores entire dataset     Stores fewer prototypes
Computational Cost   Slow for large datasets   Fast after training
Generalization       Sensitive to noise        Can learn smoother decision boundaries
Training             No training needed        Requires optimization of centers & γ
Thank you