Deep Learning

The document provides an overview of deep learning and neural networks, detailing their structure, types, and functions such as forward and backward propagation. It discusses various activation functions, loss functions, optimizers, and regularization methods, highlighting their importance in training neural networks. Additionally, it addresses the limitations of artificial neural networks, including computational complexity and the need for large datasets.


DEEP LEARNING VS MACHINE LEARNING:

DEEP LEARNING?
NEURAL NETWORKS?
A neural network is a computational model inspired by the human brain
that is used in machine learning and artificial intelligence. It consists of
layers of interconnected neurons (also called nodes), which process and
transmit information.

Basic Structure of a Neural Network:

1. Input Layer – Takes in data.
2. Hidden Layers – Perform computations using weights, biases, and activation functions.
3. Output Layer – Produces the final result.

Neural networks learn by adjusting weights through a process called backpropagation, using optimization techniques like gradient descent. They are widely used in tasks like image recognition, natural language processing, and predictive analytics.

ANN (artificial neural network): the basic feedforward neural network
CNN (convolutional neural network): image classification, object detection
RNN (recurrent neural network): speech/text/audio, sequential data
GAN (generative adversarial network): generating text/images

ANN >>
Artificial Neural Networks contain artificial neurons which are called units.
Artificial Neural Network has an input layer, an output layer as well as
hidden layers. The input layer receives data from the outside world which
the neural network needs to analyze or learn about. Then this data passes
through one or multiple hidden layers that transform the input into data that
is valuable for the output layer. Finally, the output layer produces the network's response to the input data provided. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.

STRUCTURE OF ANN

PERCEPTRON :
A perceptron is a neural network unit that does a precise computation to
detect features in the input data. Perceptron is mainly used to classify the
data into two parts. Therefore, it is also known as Linear Binary Classifier.
Perceptron uses the step function as its activation function. The activation function is used to map the output into the required range, such as (0, 1) or (-1, 1).
o Input value or One input layer: The input layer of the perceptron is made
of artificial input neurons and takes the initial data into the system for
further processing.
o Weights and Bias:
Weight: It represents the strength of the connection between units. If the weight from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional parameter whose task is to shift the output along with the weighted sum of the inputs to the next neuron.
o Net sum: It calculates the weighted sum of the inputs plus the bias.
o Activation Function: Whether a neuron is activated or not is determined by the activation function, which is applied to the net sum to produce the result (see the sketch below).
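A minimal NumPy sketch of a single perceptron with a step activation; the weights and bias below (an AND-gate configuration) are illustrative values, not taken from the notes:

```python
import numpy as np

def step(z):
    # Step activation: fires 1 if the net sum is non-negative, else 0
    return np.where(z >= 0, 1, 0)

def perceptron(x, w, b):
    # Net sum: weighted sum of inputs plus bias, then the step activation
    return step(np.dot(w, x) + b)

# Illustrative weights/bias that make the perceptron behave like a logical AND gate
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
```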
LIMITATIONS OF ANN:
1. Computational Complexity

● ANNs require high computational power, especially for deep networks.
● Training large models is time-consuming and demands GPUs/TPUs.

2. Large Training Data Requirement

● ANNs need a large amount of labeled data for effective learning.
● Insufficient data can lead to overfitting or poor generalization.

3. Black Box Nature

● Neural networks work like a black box: it is difficult to interpret how decisions are made.
● Lack of explainability makes them hard to debug or trust in critical applications.

4. Overfitting and Underfitting

● Overfitting: the ANN learns too much from the training data and fails on new data.
● Underfitting: the ANN fails to capture complex patterns due to insufficient training.

5. Local Minima in Optimization


● The training process relies on gradient descent, which may get stuck in local minima, affecting model performance.

6. Hyperparameter Sensitivity

● ANN performance depends on choosing optimal hyperparameters (learning rate, number of layers, neurons, etc.).
● Tuning hyperparameters is complex and time-consuming.

7. Lack of Standardization

● No single architecture fits all problems.
● Model design and tuning require trial-and-error approaches.

8. High Energy Consumption

● Training deep ANNs consumes a lot of electricity, making them less sustainable.

9. Poor Performance on Small Datasets

● Unlike traditional machine learning models, ANNs struggle when data is limited.
● Small datasets can lead to poor generalization and unreliable predictions.

10. Sensitivity to Noisy Data

● ANNs can misinterpret noisy or irrelevant data, leading to incorrect predictions.
● Data preprocessing and feature engineering are crucial to avoid this issue.

FORWARD PROPAGATION :
Forward propagation (or forward pass) refers to the calculation and storage of
intermediate variables (including outputs) for a neural network in order from the input
layer to the output layer.
The forward pass applies the same computation at every layer as a normal neural network: weights, bias, and an activation function applied to the previous layer's output (see the sketch below).
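A small NumPy sketch of a forward pass that stores the intermediate variables in order from the input layer to the output layer; the layer sizes and the choice of sigmoid here are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs, 4 hidden units, 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    # Store every intermediate value; backpropagation reuses them
    z1 = W1 @ x + b1        # hidden pre-activation
    a1 = sigmoid(z1)        # hidden activation
    z2 = W2 @ a1 + b2       # output pre-activation
    y_hat = sigmoid(z2)     # network output
    return {"z1": z1, "a1": a1, "z2": z2, "y_hat": y_hat}

print(forward(np.array([0.5, -1.0, 2.0]))["y_hat"])
```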

BACKWARD PROPAGATION :
Backpropagation (Backward Propagation of Errors) is a supervised learning
algorithm used to train artificial neural networks. It minimizes the error by
adjusting weights and biases through gradient descent.
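A hedged sketch of one backpropagation step for a single sigmoid neuron trained with squared-error loss and gradient descent; the input, target, initial weights, and learning rate are made-up numbers for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: one input vector, one target value
x = np.array([0.5, 1.0])
t = 1.0                      # target output
w = np.array([0.2, -0.4])    # initial weights
b = 0.1                      # initial bias
lr = 0.5                     # learning rate

# Forward pass
z = w @ x + b
y = sigmoid(z)
loss = 0.5 * (y - t) ** 2

# Backward pass: chain rule, dL/dw = (y - t) * y * (1 - y) * x
delta = (y - t) * y * (1 - y)
grad_w = delta * x
grad_b = delta

# Gradient-descent update of weights and bias
w -= lr * grad_w
b -= lr * grad_b
print("loss:", loss, "updated w:", w, "updated b:", b)
```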

NUMERICALS

ACTIVATION FUNCTION :
If we do not use this function, our neural network will not be able to capture non-linear patterns in the data; it will only capture linear relationships.
An activation function determines the output of a neuron in a neural network
by adding non-linearity, enabling the network to learn complex patterns from
the data.
Types of Activation Functions

1. Linear Activation Function

The linear activation function is a straight line defined by A(x) = x.
The range of the output spans from (−∞, +∞).

2. Non-Linear Activation Functions

(i) Sigmoid Function

The output ranges between 0 and 1, hence useful for binary classification.

(ii) Tanh Activation Function

Outputs values from -1 to +1.

(iii) ReLU (Rectified Linear Unit) Function

It is defined by A(x) = max(0, x): if the input x is positive, ReLU returns x; if the input is negative, it returns 0.
Value range: [0, ∞), meaning the function only outputs non-negative values.
ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.

3. Exponential Linear Units

(i) Softmax Function

to handle multi-class classification problems.

It transforms raw output scores from a neural network into probabilities. It works by squashing the output values of each class into the range of 0 to 1, while ensuring that the sum of all probabilities equals 1.
Softmax is a non-linear activation function.
ReLU outputs positive values directly
and zero for negatives, while Tanh maps inputs between -1 and 1.
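A short NumPy sketch of the activation functions listed above, so their output ranges can be checked directly:

```python
import numpy as np

def linear(x):   return x                          # range (-inf, +inf)
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)
def tanh(x):     return np.tanh(x)                 # range (-1, 1)
def relu(x):     return np.maximum(0.0, x)         # range [0, inf)

def softmax(x):
    # Subtract the max for numerical stability; outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.5])
print("sigmoid:", sigmoid(x))
print("tanh:   ", tanh(x))
print("relu:   ", relu(x))
print("softmax:", softmax(x), "sum =", softmax(x).sum())
```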
LOSS FUNCTION :
Loss quantifies how well a model performs during a training phase.
Loss functions measure the difference between predicted and actual values.
Mean absolute error (MAE)​
Also known as L1 loss, this loss function calculates the average absolute
difference between predicted and actual values.

Mean squared error (MSE)
Also known as L2 loss, this loss function calculates the average of the squared differences between predicted and actual values.

Huber loss (combination of MSE and MAE)
Also known as smooth L1 loss, this loss function combines the strengths of MAE and MSE.

Hinge loss​
This loss function works well for classification problems when target values
are in the set of {-1,1}.
Binary cross entropy​
This loss function measures the difference between predicted binary
outcomes and actual values.

Categorical cross entropy
This loss function measures the difference between predicted and actual output in categorical form.
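A minimal NumPy sketch of a few of the loss functions above (MAE, MSE, Huber, binary cross-entropy); the sample predictions are made-up values for illustration:

```python
import numpy as np

def mae(y_true, y_pred):
    # L1 loss: average absolute difference
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # L2 loss: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def huber(y_true, y_pred, delta=1.0):
    # Quadratic near zero (like MSE), linear for large errors (like MAE)
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.mean(np.where(small, 0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip predicted probabilities to avoid log(0)
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mae(y_true, y_pred), mse(y_true, y_pred),
      huber(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```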

OPTIMISER:
An optimizer improves the speed of training and minimizes the loss function, e.g., via gradient descent. There are three types of gradient descent (see the sketch after this list):
Batch: the entire dataset is used for each parameter update.
Stochastic: the parameters are updated after every single data point.
Mini-batch: the parameters are updated after each batch of a fixed batch size.
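A minimal sketch of the three gradient descent variants expressed as one update loop; `linreg_grad` is a hypothetical gradient function used only to make the demo runnable:

```python
import numpy as np

def gradient_descent(X, y, params, grad_fn, lr=0.01, batch_size=None, epochs=200):
    # batch_size=None -> batch GD (whole dataset per update)
    # batch_size=1    -> stochastic GD (one sample per update)
    # batch_size=k    -> mini-batch GD (k samples per update)
    n = len(X)
    size = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = np.random.permutation(n)       # shuffle each epoch
        for start in range(0, n, size):
            batch = idx[start:start + size]
            params -= lr * grad_fn(X[batch], y[batch], params)
    return params

# Hypothetical gradient for linear regression, used only to demo the loop
def linreg_grad(Xb, yb, w):
    return 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)

X = np.random.default_rng(0).normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(gradient_descent(X, y, np.zeros(3), linreg_grad, batch_size=16))
```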

Challenges with these were:

● Deciding the learning rate was difficult.
● Learning rate schedulers were introduced to solve this, but the schedule has to be predefined before training even starts.
● Finding the minimum in a multidimensional loss surface is tough.
● There is little chance of escaping a local minimum.

SGD: Stochastic Gradient Descent (SGD) is a basic optimization algorithm used to minimize the loss function by updating model parameters based on the gradient of the loss function. Unlike Batch Gradient Descent, which uses the entire dataset, SGD updates the parameters using a single data point per iteration, making it more efficient for large datasets.
ADAM: Adam (Adaptive Moment Estimation) is an optimization algorithm that combines the advantages of SGD (Stochastic Gradient Descent) with Momentum and RMSprop. It is widely used in deep learning because it adapts the learning rate for each parameter dynamically, leading to faster and more stable convergence.

● Merges the momentum and learning-rate-decay concepts.
● Formula: maintains exponentially weighted averages of the gradient (first moment) and of the squared gradient (second moment); see the sketch below.
● Bias correction: both moment estimates are corrected because they are initialized at zero.
● Typical learning rate: around 0.001 to 0.01.
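A hedged sketch of the Adam update with bias correction; the defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and the toy objective f(w) = w² are standard/illustrative assumptions, not taken from the notes:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment (momentum) and second moment (RMSprop-style) estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with a per-parameter adaptive step size
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(w) = w^2 starting from w = 5
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print("w after 500 Adam steps:", w)
```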

Regularization methods :
It is a technique used to reduce errors by fitting the function appropriately
on the given training set and avoiding overfitting.
L1: (Lasso) Least absolute shrinkage and selection operator. It adds the absolute value of the magnitude of the coefficients as a penalty term to the loss function. It also helps achieve feature selection by driving the weights of features that do not serve any purpose in the model toward zero.
Best for: Sparse models where feature selection is important.

L2: (Ridge) It adds the squared magnitude of the coefficients as a penalty term to the loss function. It encourages small weight values instead of zeroing them out and helps reduce model complexity without eliminating features.
Best for: Preventing overfitting in deep networks.
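A minimal sketch of how L1 and L2 penalty terms are added to a base loss; the coefficient values are illustrative:

```python
import numpy as np

def regularized_loss(base_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    # L1 (Lasso): sum of absolute weights -> pushes some weights toward exactly zero
    l1_penalty = l1_lambda * np.sum(np.abs(weights))
    # L2 (Ridge): sum of squared weights -> keeps weights small but non-zero
    l2_penalty = l2_lambda * np.sum(weights ** 2)
    return base_loss + l1_penalty + l2_penalty

w = np.array([0.5, -1.2, 0.0, 3.0])
print(regularized_loss(0.8, w, l1_lambda=0.01))   # base loss plus an L1 penalty
print(regularized_loss(0.8, w, l2_lambda=0.01))   # base loss plus an L2 penalty
```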

Dropout: Dropout is a regularization technique that randomly drops neurons during training to prevent overfitting.

During Training:
● Each neuron is randomly deactivated (set to 0) with probability p (e.g., p = 0.2 means 20% of neurons are dropped).
● The remaining active neurons are scaled up by 1/(1 − p) to maintain the overall scale of activations.

During Testing:

● No neurons are dropped.
● The full network is used, ensuring that learned representations are stable (see the sketch below).
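A minimal NumPy sketch of (inverted) dropout as described above: neurons are dropped with probability p and the survivors are scaled by 1/(1 − p) during training, while nothing is dropped at test time:

```python
import numpy as np

def dropout(activations, p=0.2, training=True):
    # p is the probability of dropping a neuron (e.g. 0.2 -> 20% dropped)
    if not training:
        return activations          # at test time the full network is used
    mask = (np.random.rand(*activations.shape) >= p).astype(activations.dtype)
    # Inverted dropout: scale the survivors by 1/(1-p) so the expected
    # activation scale matches test time
    return activations * mask / (1.0 - p)

a = np.ones(10)
print(dropout(a, p=0.2, training=True))    # some entries zeroed, rest scaled to 1.25
print(dropout(a, p=0.2, training=False))   # unchanged at test time
```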

When to Use Dropout?

✅ Deep Neural Networks (DNNs) with many parameters.
✅ Overfitting is observed (training accuracy >> validation accuracy).
✅ Fully connected layers in CNNs (not always in convolutional layers).

❌ When NOT to use Dropout?
❌ If the dataset is small, dropout may remove too much information.
❌ Not usually needed with Batch Normalization, as BN already stabilizes activations.

Batch Normalisation :

Batch Normalization (BatchNorm) normalizes activations (between hidden layers) in a neural network to improve training speed and stability. It reduces internal covariate shift, allowing for higher learning rates and faster convergence. The main idea is to normalize the input of each layer across a mini-batch of data, which reduces the covariate shift between the layers.

● It speeds up training.
● It decreases the importance of the initial weights.
● It regularizes the model (see the sketch below).
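A minimal NumPy sketch of batch normalization over a mini-batch (training-time behaviour only; the running statistics used at inference are omitted). The learnable scale gamma and shift beta are set to 1 and 0 here for illustration:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalise each feature across the mini-batch (axis 0), then
    # rescale and shift with the learnable parameters gamma and beta
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4))
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(6))  # ~0 mean, ~1 std per feature
```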


Hyperparameter tuning :
Hyperparameters are external settings in a neural network (not learned from
data) that affect model performance. Hyperparameter tuning is the process of
finding the best combination of these values to optimize performance.

1. Learning Rate (η)

● Controls the step size of weight updates during training.
● Too high → Convergence is unstable.
● Too low → Training is slow.

Tuning Strategy:

● Start with 0.001 for Adam, 0.01 for SGD.
● Use a learning rate schedule (e.g., decay over time).

2. Batch Size

● Number of training samples processed before updating weights.
● Small batch size (e.g., 32) → Noisy but better generalization.
● Large batch size (e.g., 512) → Faster but may overfit.
Tuning Strategy:

● Use 32, 64, or 128 (common choices).
● Try small batches for better generalization.

3. Number of Epochs

● Too few epochs → Underfitting.
● Too many epochs → Overfitting.

Tuning Strategy:

●​ Use early stopping to stop when validation loss stops improving.
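A minimal sketch of the early-stopping rule: stop once the validation loss has not improved for a given number of epochs (the patience value and the loss sequence below are illustrative):

```python
def early_stopping(val_losses, patience=3):
    # Return the epoch index at which training should stop, given the
    # per-epoch validation losses (illustrative helper, not a full trainer)
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0      # improvement: reset the patience counter
        else:
            waited += 1                 # no improvement this epoch
        if waited >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss improves, then plateaus -> stop 3 epochs after the best one
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.7, 0.72]
print(early_stopping(losses, patience=3))  # -> 6
```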

4. Optimizer (SGD vs. Adam)

● SGD → Works well for large datasets, better generalization.
● Adam → Faster convergence, good for deep networks.

Tuning Strategy:

● Default: Adam (start with it).
● Try SGD with momentum if generalization is poor.

5. Number of Hidden Layers and Neurons

● Too few layers/neurons → Underfitting.
● Too many layers/neurons → Overfitting, slow training.

Tuning Strategy:

● Start with 2-3 hidden layers (for most tasks).
● Use powers of 2 for the number of neurons (e.g., 64, 128, 256).
● Use dropout & regularization to prevent overfitting.
6. Activation Functions

● ReLU → Best for hidden layers.
● Sigmoid/Tanh → Only for the output layer in binary tasks.
● Softmax → For multi-class classification.
