
Introduction to Deep Learning
Ones Sidhom
2024-2025
Email: [email protected]
What is Deep Learning?

Deep Learning is a sub-field of Artificial Intelligence (AI), and more specifically of Machine Learning, which relies on deep neural networks to model and learn complex representations from large quantities of data.
Machine Learning vs. Deep Learning

In traditional machine learning, features are usually extracted manually by experts (hand-crafted feature extraction), which can be a laborious process requiring in-depth domain knowledge.

Machine learning algorithms are then used to classify the data according to these features.
Machine Learning vs. Deep Learning

Deep learning uses deep neural networks that can automatically extract relevant features directly from raw data.

This avoids the manual feature engineering process and can lead to better performance, especially for complex tasks.
Deep learning
Deep learning generally requires large data sets for several reasons:

• Automatic feature learning: Deep neural networks automatically learn to extract relevant features from data. To do this, they need a large number of examples in order to identify complex patterns in the data.

• Model complexity: Deep neural networks are complex models with a large number of parameters. To train them effectively and avoid overfitting, they require a large volume of data, which favors generalization to new examples rather than memorization of the training data.
Deep learning
Deep learning generally requires large data sets for several reasons:

• Data variability: Real data can be highly varied, containing noise or anomalies. A large data set
helps to better represent data diversity and improve model robustness.

• Accuracy: To achieve high levels of accuracy, deep neural networks need to be trained on
large, high-quality data sets.
Layers in artificial neural networks (ANNs)

An artificial neural network (ANN) is composed of three main types of layers:
• Input layer
• Hidden layers
• Output layer
Layers in artificial neural networks (ANNs)

1. Input layer:
• Role: Receives raw data.
• Function: Transmits data to hidden layers.
• Example: For an image, the input layer contains neurons for each pixel value.
Layers in artificial neural networks (ANNs)

2. Hidden layers:
• Hidden layers are the intermediate layers between input and output layers.
• They perform most of the calculations required by the network. The number and size of hidden layers
can vary according to the complexity of the task.
• Each hidden layer applies a set of weights and biases to the input data, followed by an activation
function
Layers in artificial neural networks (ANNs)

3. Output layer:
• The output layer is the last layer of an ANN.
• It produces output predictions.
• The number of neurons in this layer corresponds to the number of classes in a
classification problem or the number of outputs in a regression problem.
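
As an illustration only (not from the original slides), here is a minimal sketch of these three layer types using the Keras API; the layer sizes are hypothetical:

```python
# Minimal sketch, assuming TensorFlow/Keras is available.
# The sizes (784 inputs, 128 hidden units, 10 classes) are hypothetical.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),              # input layer: one value per pixel
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer: weights, biases, activation
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: one neuron per class
])
model.summary()
```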
What is a single-layer perceptron?
This is one of the oldest neural networks, proposed by Frank Rosenblatt in 1958. The perceptron is also considered the simplest form of an artificial neural network.

The main functions of the perceptron are as follows:

• It takes data from the input layer.
• The weights are multiplied by the inputs and the sum is calculated.
• The sum is passed to a non-linear function to produce the output.
What is a single-layer perceptron?

[Diagram: a single neuron. Features x1 and x2 enter with weights w1 and w2, a bias b1 is added, and the neuron produces the output z1.]
Calculating the output of a single neuron

• Without activation function: z = w1·x1 + w2·x2 + b
• With activation: a = f(z) = f(w1·x1 + w2·x2 + b)
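
A minimal numeric sketch of both cases (the input, weight and bias values are made up for illustration, and sigmoid is assumed as the activation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights and bias for one neuron
x1, x2 = 0.5, -1.2
w1, w2 = 0.8, 0.3
b = 0.1

z = w1 * x1 + w2 * x2 + b   # weighted sum, without activation
a = sigmoid(z)              # output with the activation function
print(z, a)                 # 0.14, ~0.535
```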


Forward Propagation

• Forward propagation is the first stage of computation in a neural network. It is the passage of information from the inputs to the output, applying weights, biases and activation functions layer by layer.

• Objective: calculate the network output from the inputs.


Forward Propagation

Forward Propagation stages

In a neural network, each neuron performs two main operations:
1. Calculation of the weighted sum of the inputs: z = w·x + b
2. Application of the activation function: a = f(z), where f is an activation function.

⮚ These operations are repeated for each layer of the network until the final output is obtained, as in the sketch below.
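
A compact sketch of a full forward pass (random weights, hypothetical layer sizes, sigmoid activation assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=2)                      # two input features

# Hidden layer: weighted sum, then activation
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
a1 = sigmoid(W1 @ x + b1)

# Output layer: the same two operations repeated
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y_hat = sigmoid(W2 @ a1 + b2)
print(y_hat)
```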
Forward Propagation

[Diagrams, built up over several slides: the same two-input network computed step by step. Inputs x1 and x2 are combined with weights w1…w6 and biases b1…b4; at each step, the network weights are passed through the equation z = w·x + b to calculate each neuron's entry.]
Activation functions

• An activation function is a transformation applied to a neuron's output: a = f(z).

Role of the activation function in a neural network
1. Introduce non-linearity: without it, a network would be equivalent to a simple linear regression, even with several layers. Thanks to non-linearity, the network can:

• Learn complex relationships in the data
• Approximate virtually any function

2. Decide whether to activate a neuron
• It allows each neuron to determine whether or not it should "activate", depending on the value of z.

3. Control the flow of information
• It acts as an adaptive filter on incoming information: it can block, attenuate or amplify signals.
Activation functions

Sigmoid

• Output between 0 and 1.
• Used in binary classification networks.

[Plot: the sigmoid curve rising from 0 to 1 over z ∈ [-6, 6], crossing 0.5 at z = 0.]

Softmax

• Used in the last layer for multi-class classification.
• Converts outputs into normalized probabilities.
Activation functions
Tanh

• Output between -1 and 1.
• Used in intermediate layers to center data around zero.

ReLU

• Replaces negative values with 0.
• Very popular in deep networks due to its simplicity and efficiency.
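
The four functions above can be written in a few lines; a minimal NumPy sketch (the test values are arbitrary):

```python
import numpy as np

def sigmoid(z):              # output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                 # output in (-1, 1), centered around zero
    return np.tanh(z)

def relu(z):                 # replaces negative values with 0
    return np.maximum(0.0, z)

def softmax(z):              # converts outputs into normalized probabilities
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```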
Activation functions
Rules for choosing the activation function

For hidden layers:
• The activation of a hidden layer must never be linear.

For the output layer:
• Sigmoid for binary classification
• Softmax for multi-class classification
• Linear activation for regression
Cost Function
• The loss function and the cost function are at the heart of deep learning. They are used to measure the error between what the neural network predicts and the truth (the true labels or values).

• In practice, these two terms are often used interchangeably, but technically:
▪ Loss function: the error for a single example (input-output pair),
▪ Cost function: the average error over the entire dataset (or a mini-batch).

• The loss functions enable the network:

▪ To know how wrong it is
▪ To adjust its weights to learn, via gradient descent
▪ To optimize model performance
Cost Function

Loss function | Problem | Use when
Binary Cross-Entropy | Binary classification | 2 classes + sigmoid
Categorical Cross-Entropy | Multi-class classification | Multiple classes + softmax + one-hot labels
Sparse Categorical Cross-Entropy | Multi-class classification | Multiple classes without one-hot labels
MSE (Mean Squared Error) | Regression | Classical regression (derivative more stable than MAE)
MAE (Mean Absolute Error) | Regression | Regression with outliers present (less sensitive to outliers than MSE)
Huber Loss | Regression | Stable regression, resistant to outliers (mix between MSE and MAE)
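
As a sketch, two of these losses written out in NumPy (these are the standard textbook formulas, not code from the slides):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary Cross-Entropy for labels in {0, 1} and sigmoid outputs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```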
Backpropagation
• Backpropagation is a technique used in deep learning to train neural networks.
• It works by progressively adjusting the weights and biases to reduce the error (the cost function).
• At each iteration (or epoch), the model moves against the direction of the error gradient to improve its predictions.
• This technique typically uses an optimization algorithm such as gradient descent. The calculation of the gradient is based on the mathematical chain rule, which ensures that even the deepest layers of the network are correctly updated.
Backpropagation
• Backpropagation calculates the contribution of each weight and bias to the cost function.

1. Error calculation: measure the difference between prediction and reality.
2. Gradient calculation: the network computes the gradient of the error with respect to the weights and biases. The opposite of this gradient indicates the direction in which the error decreases most rapidly.
3. Updating weights and biases: the network uses this gradient information to update the weights and biases across all layers.
4. Iterative process: with each iteration, the network's weights and biases gradually approach the values that minimize the overall error.
Backpropagation

• True value: y
• Cost: C = y − a(2)
• a(2) = α(z(2))
• z(2) = w2·a(1) + b2
• a(1) = α(z(1))
• z(1) = w1·a(0) + b1

[Diagram, built up over several slides: the computation graph in which w1, a(0) and b1 feed z(1); a(1) = α(z(1)); w2, a(1) and b2 feed z(2); a(2) = α(z(2)); and finally a(2) and y give the cost C.]
Backpropagation
• We want to know how modifying an element in this tree will affect the cost function → a partial derivative.

∂C/∂w2 = the partial derivative of C with respect to w2

• The cost function cannot be differentiated directly with respect to the weights, as the relationship is indirect (it passes through several intermediate functions).
• This is why we use the chain rule, which allows us to split the derivative into several smaller partial derivatives, layer by layer. This lets us calculate the impact of the weights on the error, and thus adjust them correctly.
Backpropagation

∂C/∂w2 = ∂z(2)/∂w2 · ∂a(2)/∂z(2) · ∂C/∂a(2)

• The chain rule allows you to "move up" the network gradually, by splitting the global derivative into a product of partial derivatives at each stage.

[Diagram, repeated over several slides: the same computation graph, highlighting the path from w2 through z(2) and a(2) up to C.]
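
A numeric sketch of this chain rule on the two-layer graph above. Sigmoid activations and a squared cost C = ½(y − a(2))² are assumptions made here so that the derivatives are well defined; all values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scalar network matching the graph on the slides
a0, y = 0.5, 1.0          # input and true value
w1, b1 = 0.4, 0.1
w2, b2 = -0.6, 0.2

# Forward pass
z1 = w1 * a0 + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)
C = 0.5 * (y - a2) ** 2   # assumed differentiable cost

# Backward pass: the chain rule, moving up the graph
dC_da2 = -(y - a2)
da2_dz2 = a2 * (1 - a2)   # derivative of the sigmoid
dz2_dw2 = a1
dC_dw2 = dC_da2 * da2_dz2 * dz2_dw2   # dC/dw2, as on the slide

# One more hop of the chain rule reaches w1
dC_dw1 = dC_da2 * da2_dz2 * w2 * a1 * (1 - a1) * a0
print(dC_dw2, dC_dw1)
```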
Learning rate
• The learning rate is a fundamental parameter in training a neural network. It directly influences the speed and quality of learning.
• The learning rate (often denoted η) controls how much the network weights are modified at each learning step, depending on the error (the loss).
• It is a speed factor used in the gradient descent algorithm.
• The learning rate determines the amplitude of each weight correction.
Learning rate

[Three plots of cost versus a weight w:]
• Too low: very slow to learn (many epochs)
• Well chosen: fast, stable convergence to a good minimum
• Too high: the model may never converge, or may diverge completely

Learning rate
1. High learning rate = instability
• The model hops around the minimum.

2. Learning rate too high = divergence
• The learning rate is so high that every update skips far past the minimum.

3. Learning rate too low = very slow learning
• The model learns, but takes a long time to converge.
• Risk of wasting resources.

4. Good learning rate = fast, stable convergence
• Minimizes the loss quickly.
• The network learns efficiently, without oscillations or instability.
Gradient descent
The weight update rule is: w ← w − η · ∂C/∂w

For a positive gradient:
• As the gradient is positive, the subtraction effectively reduces w, and thus the cost function.

For a negative gradient:
• Since the gradient is negative, subtracting it increases w, which again reduces the cost function.
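
A short sketch of this update rule on a toy cost C(w) = (w − 3)², whose gradient is 2(w − 3); the starting point and learning rate are illustrative:

```python
def grad(w):
    # Gradient of the toy cost C(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1          # illustrative start and learning rate
for step in range(50):
    w -= lr * grad(w)     # a positive gradient decreases w; a negative one increases it
print(w)                  # converges near the minimum at w = 3
```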