
Introduction to Deep Learning
Ones Sidhom
2024-2025
Email: [email protected]
What is Deep Learning?

Deep Learning is a sub-field of Artificial Intelligence (AI), and more specifically of Machine Learning, which relies on deep neural networks to model and learn complex representations from large quantities of data.
Machine Learning vs. Deep Learning

In traditional machine learning, features are usually extracted manually by experts (hand-crafted feature extraction), which can be a laborious process requiring in-depth domain knowledge.

Machine learning algorithms are then used to classify the data according to these features.
Machine Learning vs. Deep Learning

Deep learning uses deep neural networks that can automatically extract relevant features directly from raw data.

This avoids the manual feature engineering process and can lead to better performance, especially for complex tasks.
Deep learning
Deep learning generally requires large data sets for several reasons:

• Automatic feature learning: Deep neural networks automatically learn to extract relevant features from data. To do this, they need a large number of examples in order to identify complex patterns in the data.

• Model complexity: Deep neural networks are complex models with a large number of parameters. To train them effectively and avoid overfitting, they require a large volume of data, which favors generalization to new examples rather than memorization of the training data.
Deep learning
Deep learning generally requires large data sets for several reasons:

• Data variability: Real data can be highly varied, containing noise or anomalies. A large data set
helps to better represent data diversity and improve model robustness.

• Accuracy: To achieve high levels of accuracy, deep neural networks need to be trained on
large, high-quality data sets.
Layers in artificial neural networks (ANNs)

An artificial neural network (ANN) is composed of three main types of layers:
• Input layer
• Hidden layers
• Output layer
Layers in artificial neural networks (ANNs)

1. Input layer:
• Role: Receives raw data.
• Function: Transmits data to hidden layers.
• Example: For an image, the input layer contains neurons for each pixel value.
Layers in artificial neural networks (ANNs)

2. Hidden layers:
• Hidden layers are the intermediate layers between input and output layers.
• They perform most of the calculations required by the network. The number and size of hidden layers
can vary according to the complexity of the task.
• Each hidden layer applies a set of weights and biases to the input data, followed by an activation
function
Layers in artificial neural networks (ANNs)

3. Output layer:
• The output layer is the last layer of an ANN.
• It produces output predictions.
• The number of neurons in this layer corresponds to the number of classes in a
classification problem or the number of outputs in a regression problem.
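
As an illustration only (not from the original slides), here is a minimal sketch of these three layer types using the Keras API; the layer sizes are hypothetical:

```python
# Minimal sketch, assuming TensorFlow/Keras is available.
# The sizes (784 inputs, 128 hidden units, 10 classes) are hypothetical.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),              # input layer: one value per pixel
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer: weights, biases, activation
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: one neuron per class
])
model.summary()
```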
What is a single-layer perceptron?
This is one of the oldest neural networks, proposed by Frank Rosenblatt in 1958. The perceptron is also considered the simplest form of an artificial neural network.

The main functions of the perceptron are as follows:

• It takes data from the input layer.
• The weights are multiplied by the inputs and the sum is calculated.
• The sum is passed to a non-linear function to produce the output.
What is a single-layer perceptron?

[Diagram: a single neuron. Features x1 and x2 enter with weights w1 and w2, a bias b1 is added, and the neuron produces the output z1.]
Calculating the output of a single neuron

• Without activation function: z = w1·x1 + w2·x2 + b
• With activation: a = f(z) = f(w1·x1 + w2·x2 + b)
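
A minimal numeric sketch of both cases (the input, weight and bias values are made up for illustration, and sigmoid is assumed as the activation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights and bias for one neuron
x1, x2 = 0.5, -1.2
w1, w2 = 0.8, 0.3
b = 0.1

z = w1 * x1 + w2 * x2 + b   # weighted sum, without activation
a = sigmoid(z)              # output with the activation function
print(z, a)                 # 0.14, ~0.535
```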


Forward Propagation

• Forward propagation is the first stage of computation in a neural network. It is the passage of information from the inputs to the output, applying weights, biases and activation functions layer by layer.

• Objective: calculate the network output from the inputs.


Forward Propagation

Forward Propagation stages

In a neural network, each neuron performs two main operations:
1. Calculation of the weighted sum of the inputs: z = w·x + b
2. Application of the activation function: a = f(z), where f is an activation function.

⮚ These operations are repeated for each layer of the network until the final output is obtained, as in the sketch below.
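
A compact sketch of a full forward pass (random weights, hypothetical layer sizes, sigmoid activation assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=2)                      # two input features

# Hidden layer: weighted sum, then activation
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
a1 = sigmoid(W1 @ x + b1)

# Output layer: the same two operations repeated
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y_hat = sigmoid(W2 @ a1 + b2)
print(y_hat)
```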
Forward Propagation

[Diagrams, built up over several slides: the same two-input network computed step by step. Inputs x1 and x2 are combined with weights w1…w6 and biases b1…b4; at each step, the network weights are passed through the equation z = w·x + b to calculate each neuron's entry.]
Activation functions

• An activation function is a transformation applied to a neuron's output: a = f(z).

Role of the activation function in a neural network
1. Introduce non-linearity: without it, a network would be equivalent to a simple linear regression, even with several layers. Thanks to non-linearity, the network can:

• Learn complex relationships in the data
• Approximate virtually any function

2. Decide whether to activate a neuron
• It allows each neuron to determine whether or not it should "activate", depending on the value of z.

3. Control the flow of information
• It acts as an adaptive filter on incoming information: it can block, attenuate or amplify signals.
Activation functions

Sigmoid

• Output between 0 and 1.
• Used in binary classification networks.

[Plot: the sigmoid curve rising from 0 to 1 over z ∈ [-6, 6], crossing 0.5 at z = 0.]

Softmax

• Used in the last layer for multi-class classification.
• Converts outputs into normalized probabilities.
Activation functions
Tanh

• Output between -1 and 1.
• Used in intermediate layers to center data around zero.

ReLU

• Replaces negative values with 0.
• Very popular in deep networks due to its simplicity and efficiency.
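
The four functions above can be written in a few lines; a minimal NumPy sketch (the test values are arbitrary):

```python
import numpy as np

def sigmoid(z):              # output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                 # output in (-1, 1), centered around zero
    return np.tanh(z)

def relu(z):                 # replaces negative values with 0
    return np.maximum(0.0, z)

def softmax(z):              # converts outputs into normalized probabilities
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```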
Activation functions
Rules for choosing the activation function

For hidden layers:
• The activation of a hidden layer must never be linear.

For the output layer:
• Sigmoid for binary classification
• Softmax for multi-class classification
• Linear activation for regression
Cost Function
• The loss function and the cost function are at the heart of deep learning. They are used to measure the error between what the neural network predicts and the truth (the true labels or values).

• In practice, these two terms are often used interchangeably, but technically:
▪ Loss function: the error for a single example (input-output pair),
▪ Cost function: the average error over the entire dataset (or a mini-batch).

• The loss functions enable the network:

▪ To know how wrong it is
▪ To adjust its weights to learn, via gradient descent
▪ To optimize model performance
Cost Function

Loss function | Problem | Use when
Binary Cross-Entropy | Binary classification | 2 classes + sigmoid
Categorical Cross-Entropy | Multi-class classification | Multiple classes + softmax + one-hot labels
Sparse Categorical Cross-Entropy | Multi-class classification | Multiple classes without one-hot labels
MSE (Mean Squared Error) | Regression | Classical regression (derivative more stable than MAE)
MAE (Mean Absolute Error) | Regression | Regression with outliers present (less sensitive to outliers than MSE)
Huber Loss | Regression | Stable regression, resistant to outliers (mix between MSE and MAE)
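
As a sketch, two of these losses written out in NumPy (these are the standard textbook formulas, not code from the slides):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary Cross-Entropy for labels in {0, 1} and sigmoid outputs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```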
Backpropagation
• Backpropagation is a technique used in deep learning to train neural networks.
• It works by progressively adjusting the weights and biases to reduce the error (the cost function).
• At each iteration (or epoch), the model moves against the direction of the error gradient to improve its predictions.
• This technique typically uses an optimization algorithm such as gradient descent. The calculation of the gradient is based on the mathematical chain rule, which ensures that even the deepest layers of the network are correctly updated.
Backpropagation
• Backpropagation calculates the contribution of each weight and bias to the cost function.

1. Error calculation: measure the difference between prediction and reality.
2. Gradient calculation: the network computes the gradient of the error with respect to the weights and biases. The opposite of this gradient indicates the direction in which the error decreases most rapidly.
3. Updating weights and biases: the network uses this gradient information to update the weights and biases across all layers.
4. Iterative process: with each iteration, the network's weights and biases gradually approach the values that minimize the overall error.
Backpropagation

• True value: y
• Cost: C = y − a(2)
• a(2) = α(z(2))
• z(2) = w2·a(1) + b2
• a(1) = α(z(1))
• z(1) = w1·a(0) + b1

[Diagram, built up over several slides: the computation graph in which w1, a(0) and b1 feed z(1); a(1) = α(z(1)); w2, a(1) and b2 feed z(2); a(2) = α(z(2)); and finally a(2) and y give the cost C.]
Backpropagation
• We want to know how modifying an element in this tree will affect the cost function → a partial derivative.

∂C/∂w2 = the partial derivative of C with respect to w2

• The cost function cannot be differentiated directly with respect to the weights, as the relationship is indirect (it passes through several intermediate functions).
• This is why we use the chain rule, which allows us to split the derivative into several smaller partial derivatives, layer by layer. This lets us calculate the impact of the weights on the error, and thus adjust them correctly.
Backpropagation

∂C/∂w2 = ∂z(2)/∂w2 · ∂a(2)/∂z(2) · ∂C/∂a(2)

• The chain rule allows you to "move up" the network gradually, by splitting the global derivative into a product of partial derivatives at each stage.

[Diagram, repeated over several slides: the same computation graph, highlighting the path from w2 through z(2) and a(2) up to C.]
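
A numeric sketch of this chain rule on the two-layer graph above. Sigmoid activations and a squared cost C = ½(y − a(2))² are assumptions made here so that the derivatives are well defined; all values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scalar network matching the graph on the slides
a0, y = 0.5, 1.0          # input and true value
w1, b1 = 0.4, 0.1
w2, b2 = -0.6, 0.2

# Forward pass
z1 = w1 * a0 + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)
C = 0.5 * (y - a2) ** 2   # assumed differentiable cost

# Backward pass: the chain rule, moving up the graph
dC_da2 = -(y - a2)
da2_dz2 = a2 * (1 - a2)   # derivative of the sigmoid
dz2_dw2 = a1
dC_dw2 = dC_da2 * da2_dz2 * dz2_dw2   # dC/dw2, as on the slide

# One more hop of the chain rule reaches w1
dC_dw1 = dC_da2 * da2_dz2 * w2 * a1 * (1 - a1) * a0
print(dC_dw2, dC_dw1)
```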
Learning rate
• The learning rate is a fundamental parameter in training a neural network. It directly influences the speed and quality of learning.
• The learning rate (often denoted η) controls how much the network weights are modified at each learning step, depending on the error (the loss).
• It is a speed factor used in the gradient descent algorithm.
• The learning rate determines the amplitude of each weight correction.
Learning rate

[Three plots of cost versus a weight w:]
• Too low: very slow to learn (many epochs)
• Well chosen: fast, stable convergence to a good minimum
• Too high: the model may never converge, or may diverge completely

Learning rate
1. High learning rate = instability
• The model hops around the minimum.

2. Learning rate too high = divergence
• The learning rate is so high that every update skips far past the minimum.

3. Learning rate too low = very slow learning
• The model learns, but takes a long time to converge.
• Risk of wasting resources.

4. Good learning rate = fast, stable convergence
• Minimizes the loss quickly.
• The network learns efficiently, without oscillations or instability.
Gradient descent
The weight update rule is: w ← w − η · ∂C/∂w

For a positive gradient:
• As the gradient is positive, the subtraction effectively reduces w, and thus the cost function.

For a negative gradient:
• Since the gradient is negative, subtracting it increases w, which again reduces the cost function.
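
A short sketch of this update rule on a toy cost C(w) = (w − 3)², whose gradient is 2(w − 3); the starting point and learning rate are illustrative:

```python
def grad(w):
    # Gradient of the toy cost C(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1          # illustrative start and learning rate
for step in range(50):
    w -= lr * grad(w)     # a positive gradient decreases w; a negative one increases it
print(w)                  # converges near the minimum at w = 3
```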