ml

The document explains key concepts in machine learning, including the differences between linearity and non-linearity, the roles of weights and biases in models, the function of loss functions and gradient descent in optimizing model performance, and the technique of batch normalization to stabilize training. It also covers dropout as a regularization method, L1 and L2 regularization to prevent overfitting, and the importance of hyperparameters in model tuning. Each concept is illustrated with examples to enhance understanding.

Uploaded by

bloodlife.24by7

Q1
You said:
Difference between linearity and non-linearity
ChatGPT said:

The concepts of linearity and non-linearity are essential in many fields like mathematics,
physics, and machine learning. Here's a breakdown of the difference between the two:

Linearity
A system or function is said to be linear if it satisfies two main properties:

1. Additivity (Superposition Principle): The output of the system for a sum of inputs is
the sum of the outputs for each individual input.

Mathematically:

$f(x_1 + x_2) = f(x_1) + f(x_2)$
2. Homogeneity (Scaling): The output of the system for a scaled input is the scaled
output of the system for the original input.

Mathematically:

$f(a \cdot x) = a \cdot f(x)$
A linear system or function is one where both conditions hold. In simpler terms, the output
scales in proportion to the input, and the response to a sum of inputs is the sum of the
individual responses.

Examples of linear systems/functions:

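• Linear equations: $y = mx + b$ (where $m$ and $b$ are constants). Strictly speaking, this is affine: it satisfies additivity and homogeneity only when $b = 0$, although it is commonly called linear.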

• Electrical circuits with resistors (Ohm’s Law: $V = IR$)
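As a rough illustration (not a proof), the two properties can be spot-checked numerically. The helper `satisfies_linearity` and its sample points are hypothetical choices for this sketch: passing the check on a few points does not prove a function is linear, but failing it does prove the function is non-linear.

```python
import math


def satisfies_linearity(f, samples=((1.0, 2.0), (-3.0, 0.5), (4.0, -1.5)),
                        scales=(2.0, -0.5, 3.0), tol=1e-9):
    """Spot-check additivity and homogeneity on a few hypothetical sample points.

    Passing does not prove linearity; failing proves non-linearity.
    """
    for x1, x2 in samples:
        # Additivity: f(x1 + x2) == f(x1) + f(x2)
        if not math.isclose(f(x1 + x2), f(x1) + f(x2), abs_tol=tol):
            return False
    for a in scales:
        for x, _ in samples:
            # Homogeneity: f(a * x) == a * f(x)
            if not math.isclose(f(a * x), a * f(x), abs_tol=tol):
                return False
    return True


print(satisfies_linearity(lambda x: 3 * x))       # True
print(satisfies_linearity(lambda x: 2 * x + 1))   # False: the intercept breaks additivity
print(satisfies_linearity(lambda x: x ** 2))      # False
print(satisfies_linearity(math.exp))              # False
```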

Non-Linearity
A system or function is non-linear if it does not satisfy the properties of linearity. Non-linear
systems are more complex because their behavior does not follow the simple principles of
additivity or homogeneity. In non-linear systems, small changes in inputs can lead to
disproportionately large or unpredictable changes in outputs.

Examples of non-linear systems/functions:

• Quadratic equations: $y = ax^2 + bx + c$
• Exponential functions: $y = e^x$
• Complex physical systems like weather patterns, chemical reactions, or neural
networks.

Key Differences:
• Linearity implies a proportional relationship between inputs and outputs, while
non-linearity can produce disproportionate responses, and some non-linear systems are
highly sensitive to small changes in their inputs.
• Linear systems are simpler to analyze and model, while non-linear systems often
require more advanced techniques and are more complex in their behavior.

In machine learning, for example, linear models like linear regression assume a straight-line
relationship between inputs and outputs, whereas non-linear models (e.g., decision trees,
neural networks) can capture more complex patterns in data.
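One consequence worth seeing concretely: stacking linear layers without an activation function still gives a linear model, because the layers collapse into a single matrix; a non-linear activation such as ReLU is what lets a neural network go beyond that. This is a minimal NumPy sketch with small hand-picked weights (hypothetical values, chosen so the results are easy to check by hand).

```python
import numpy as np

# Hypothetical 2-layer weights, chosen by hand so the results are easy to verify
W1 = np.array([[1.0, -1.0],
               [2.0,  0.5]])
W2 = np.array([[1.0, 1.0]])


def linear_stack(x):
    # Two linear layers with no activation in between
    return W2 @ (W1 @ x)


def relu_stack(x):
    # Same layers, with a ReLU non-linearity in between
    return W2 @ np.maximum(0.0, W1 @ x)


x = np.array([1.0, 2.0])

# Without an activation, the stack collapses into the single matrix W2 @ W1
print(np.allclose(linear_stack(x), (W2 @ W1) @ x))        # True
print(np.allclose(linear_stack(-x), -linear_stack(x)))    # True: homogeneity holds
print(np.allclose(relu_stack(-x), -relu_stack(x)))        # False: ReLU breaks it
```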

Does this distinction make sense for your needs?

You said:
Q2
Explain weights and biases, loss functions, and gradient descent
ChatGPT said:

Let’s break down weights, biases, loss functions, and gradient descent to understand how
they work together, especially in the context of machine learning (ML) and neural networks.

1. Weights and Biases

In machine learning, particularly in neural networks, weights and biases are parameters that
the model uses to make predictions.

• Weights: These are the values that are multiplied by the input features in the model.
They control the strength and direction of the relationship between the input features
and the output. In simple terms, weights determine how much influence each input
feature has on the predicted output.
• Biases: Biases allow the model to adjust its predictions regardless of the input.
Essentially, biases shift the model’s output and help it learn patterns better. They act
like an offset or a constant added to the weighted sum of inputs, enabling the model to
make predictions even when all the input values are zero.

Together, the weights and biases determine how the model transforms the input data into a
prediction.
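In code, a single neuron's output before any activation is just the weighted sum of its inputs plus the bias. This sketch assumes NumPy and uses made-up feature values, weights, and bias purely for illustration.

```python
import numpy as np


def neuron_output(x, w, b):
    """Weighted sum of the inputs plus the bias (no activation function)."""
    return np.dot(w, x) + b


# Hypothetical values: 3 input features
x = np.array([2.0, 0.0, 1.0])
w = np.array([0.5, -1.0, 2.0])   # weights: per-feature influence (sign gives direction)
b = 0.25                          # bias: offset added regardless of the input

print(neuron_output(x, w, b))            # 3.25
print(neuron_output(np.zeros(3), w, b))  # 0.25: with all-zero input, only the bias remains
```

With an all-zero input the weighted sum vanishes and only the bias is left, which is the offset role described above.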

2. Loss Function
The loss function (or cost function) is a way to measure how well the model’s predictions
match the true values. In other words, it calculates the "error" or "loss" between the predicted
output and the actual output. The goal of training a model is to minimize this loss.

• Common loss functions:

• Mean Squared Error (MSE): Used for regression tasks; it averages the squared
differences between predicted and actual values.
$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

where $y_i$ is the true value and $\hat{y}_i$ is the predicted value.

• Cross-Entropy Loss: Commonly used for classification tasks; it measures the
performance of a classification model whose output is a probability value
between 0 and 1.
$\text{Cross-Entropy} = -\sum_i y_i \log(\hat{y}_i)$

where $y_i$ is the actual label (1 for the true class, 0 otherwise) and $\hat{y}_i$ is the predicted probability.
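Both losses take only a few lines of NumPy. Averaging the cross-entropy over samples and clipping probabilities with a small `eps` to avoid `log(0)` are common conventions assumed in this sketch, not part of the formulas above.

```python
import numpy as np


def mse(y_true, y_pred):
    # Mean of the squared differences between true and predicted values
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def cross_entropy(y_true, y_prob, eps=1e-12):
    # y_true: one-hot labels, y_prob: predicted probabilities (each row sums to 1).
    # Clipping avoids log(0); averaging over samples is an assumed convention.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=-1))


print(mse([3.0, 5.0], [2.5, 5.5]))                    # 0.25
print(cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))  # about 0.357, i.e. -ln(0.7)
```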

3. Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function by
adjusting the weights and biases of the model. The idea is to move in the direction that
decreases the loss function the most, iteratively improving the model’s parameters.

Here’s how it works:

1. Calculate the Gradient: The gradient is the derivative (or slope) of the loss function
with respect to each parameter (weight or bias). The gradient tells you how much the
loss function will change if you change the parameters a little bit. For each weight and
bias, you compute the gradient to see which direction you need to move to minimize
the loss.
2. Update the Parameters: Once you have the gradients, you update the weights and
biases in the opposite direction of the gradient. This is because you want to move
towards the minimum of the loss function (i.e., the point where the loss is as low as
possible).

The update rule is:

$\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$
where:

• $\theta$ represents the model’s parameters (weights and biases),
• $\eta$ is the learning rate (how big a step to take in each iteration),
• $\nabla_\theta J(\theta)$ is the gradient of the loss function with respect to $\theta$.
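3. Repeat: This process of calculating gradients and updating parameters is repeated for
many iterations (grouped into epochs, where one epoch is one full pass over the training
data), with the model gradually converging toward the weights and biases that minimize the
loss function. For non-convex losses, such as those of neural networks, this is typically a
local minimum rather than a guaranteed global one.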
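The three steps can be sketched on a toy one-parameter loss, $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$ and whose minimum is at $\theta = 3$. The loss, starting point, learning rates, and step counts are hypothetical choices for illustration.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeat the update theta <- theta - lr * grad(theta) for a fixed number of steps."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)   # step against the gradient
    return theta


def grad_J(theta):
    # Gradient of the toy loss J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)


theta_star = gradient_descent(grad_J, theta0=0.0, lr=0.1, steps=100)
print(round(theta_star, 6))   # 3.0

# A learning rate that is too large makes the iterates diverge instead
theta_bad = gradient_descent(grad_J, theta0=0.0, lr=1.1, steps=50)
print(abs(theta_bad) > 1e3)   # True
```

With `lr=0.1`, each step multiplies the distance to the minimum by 0.8; with `lr=1.1` the factor becomes -1.2, so every step overshoots further, which is why the learning rate has to be chosen with care.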

Putting it All Together

In a neural network or any ML model:

• The weights start off with small random values (biases are often initialized to zero).

• The loss function measures how far the model's predictions are from the actual
values.
• Gradient descent is used to update the weights and biases iteratively, moving
towards the minimum of the loss function, thereby improving the model's accuracy.

Example:

Imagine you're training a simple linear regression model to predict house prices based on
square footage. The model’s equation might be:

$y = w \cdot x + b$
where:

• $y$ is the predicted price,
• $w$ is the weight (slope),
• $x$ is the square footage of the house,
• $b$ is the bias (intercept).

The loss function (mean squared error, for example) will tell you how far off your
predictions are from the actual house prices.

Gradient descent will then adjust $w$ (weight) and $b$ (bias) by calculating the gradient of
the loss function and making small updates to reduce the error.
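The house-price example can be sketched end to end in NumPy. The data below are hypothetical: they are generated from price = 100 × square footage (in thousands) + 50, with no noise, so gradient descent should recover $w \approx 100$ and $b \approx 50$. The gradients are those of the MSE loss: $\partial J / \partial w = \frac{2}{n}\sum_i (\hat{y}_i - y_i)\,x_i$ and $\partial J / \partial b = \frac{2}{n}\sum_i (\hat{y}_i - y_i)$.

```python
import numpy as np

# Hypothetical toy data: square footage (in thousands) and price (in thousands of dollars),
# generated from price = 100 * sqft + 50 with no noise, so the correct answer is known.
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = 100.0 * x + 50.0

w, b = 0.0, 0.0   # initial weight (slope) and bias (intercept)
lr = 0.1          # learning rate (eta)

for _ in range(5000):
    y_pred = w * x + b               # model: y = w * x + b
    error = y_pred - y
    # Gradients of the MSE loss with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Gradient descent update: step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = np.mean((w * x + b - y) ** 2)
print(round(float(w), 3), round(float(b), 3))   # 100.0 50.0
```

Expressing square footage in thousands keeps the input values small; with raw square footage (values in the thousands), the same learning rate of 0.1 would make the updates diverge.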

Key Points:
• Weights control the input’s influence on the output.
• Biases shift the output to make the model more flexible.
• Loss function measures how well the model is performing.
• Gradient Descent adjusts weights and biases to minimize the loss and improve the
model’s performance.
Does that make sense? Would you like a more detailed explanation on any of these
components?

You said:
Q3
Explain batch normalisation with suitable example
ChatGPT said:

Batch Normalization (BatchNorm)

Batch Normalization is a technique used to improve the training of deep neural networks by
normalizing the inputs to each layer. It helps to accelerate the training, improve stability, and
sometimes even allow for higher learning rates. It was introduced by Sergey Ioffe and
Christian Szegedy in 2015.

The main goal of Batch Normalization is to address the internal covariate shift, which
refers to the problem of the distribution of activations changing during training as the
parameters of the network change.

Here’s how it works and why it’s useful:

What is Batch Normalization?

• Normalizing activations: BatchNorm normalizes the input to each layer by shifting and
scaling the activations using the statistics (mean and variance) of the current
mini-batch.
• Reduces dependence on initialization: It makes the learning process more stable and
reduces the need for careful weight initialization.
• Helps with faster training: BatchNorm allows the network to use higher learning rates
with less risk of diverging.
• Improves convergence: By stabilizing the distribution of activations, it helps the
network converge faster.

How Does Batch Normalization Work?

For each mini-batch during training, BatchNorm performs the following steps:

1. Compute the Mean and Variance:

• For each feature (or neuron in the layer), compute the mean and variance over the mini-batch.
• Let $X = [x_1, x_2, \ldots, x_N]$ be a mini-batch, where $N$ is the number of samples in the mini-batch.
• The mean $\mu$ and variance $\sigma^2$ are computed as:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
2. Normalize the Inputs:
• Subtract the mean from each input and divide by the standard deviation
(square root of variance) to normalize the data:
$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$

where $\epsilon$ is a small constant added to avoid division by zero.

3. Scale and Shift:

• After normalizing, we apply a scale factor $\gamma$ and a shift factor $\beta$ so that
the model can still express more complex transformations. These parameters are
learned during training.
$y_i = \gamma \cdot \hat{x}_i + \beta$
• $\gamma$ and $\beta$ are parameters that help the model maintain the representational
capacity of the network, despite normalization.
4. Use Running Statistics During Inference:
• During training, batch statistics (mean and variance) are computed for each
mini-batch.
• During inference (testing or prediction), instead of using batch statistics, we
use running averages of the mean and variance computed during training. This
is to ensure that predictions are not dependent on mini-batches and are
consistent across different test data.
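The four steps can be combined in a minimal NumPy sketch for a fully connected layer's activations, shaped (batch, features). The class name `BatchNorm1D`, the `momentum` value used for the running averages (the name mirrors common framework conventions and is unrelated to optimizer momentum), and the use of the biased (1/N) variance for the running statistics are assumptions for illustration. $\gamma$ and $\beta$ are initialized to 1 and 0 but not trained here.

```python
import numpy as np


class BatchNorm1D:
    """Minimal batch normalization for activations shaped (batch, features)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.gamma = np.ones(num_features)    # learnable scale (kept fixed in this sketch)
        self.beta = np.zeros(num_features)    # learnable shift (kept fixed in this sketch)
        self.eps = eps
        self.momentum = momentum              # weight of the newest batch in the running averages
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def __call__(self, x, training=True):
        if training:
            # Step 1: per-feature mean and (biased, 1/N) variance over the mini-batch
            mu = x.mean(axis=0)
            var = x.var(axis=0)
            # Bookkeeping for step 4: update running averages for later use at inference
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mu
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Step 4: at inference, use the running statistics instead of batch statistics
            mu, var = self.running_mean, self.running_var
        # Step 2: normalize
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        # Step 3: scale and shift
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
bn = BatchNorm1D(num_features=3)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 3))   # hypothetical activations
out = bn(batch, training=True)
print(out.mean(axis=0))   # close to 0 for every feature
print(out.std(axis=0))    # close to 1 for every feature

# Uses the running statistics, which after a single batch are still far from the batch's own
eval_out = bn(batch, training=False)
```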

Why Batch Normalization Helps

• Stabilizes learning: By ensuring that activations do not vary too much across layers,
BatchNorm helps the network train more stably.
• Accelerates convergence: It allows the use of higher learning rates because it
mitigates the risk of diverging due to large gradients.
• Reduces dependence on weight initialization: Normally, weights need to be
initialized very carefully to prevent exploding or vanishing gradients. BatchNorm
reduces the sensitivity to this.
• Regularization effect: BatchNorm has a slight regularizing effect because of the
noise introduced by the mini-batch statistics, which can help prevent overfitting.

Example of Batch Normalization

Let’s say you are training a neural network for image classification using a dataset like
CIFAR-10. This dataset contains 60,000 32x32 pixel images, with 10 possible classes. You
might build a simple CNN with several layers, such as:

1. Convolutional layer → 32 filters, 3x3 kernel
2. ReLU activation
3. Batch Normalization
4. MaxPooling layer → 2x2
5. Fully connected layer → 128 neurons
6. Batch Normalization
7. Output layer → Softmax for classification

In this example:

• After the first convolutional layer and ReLU activation, the activations might be very
large or very small, making the training process unstable or slow.
• Applying Batch Normalization after the ReLU activation normalizes the activations,
which makes them more stable and easier to train.
• During training, the mini-batch statistics (mean and variance) will be used to
normalize the activations for each mini-batch. During inference, the running averages
computed during training are used.

Without BatchNorm:

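• The distribution of each layer's inputs can shift substantially as the weights of earlier
layers change during training, which can make training slower and more sensitive to the
learning rate and weight initialization.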

With BatchNorm:

• The activations are normalized during training, so the network can learn faster and
perform more stably both during training and inference.

Summary of Steps with Example

1. Input Layer: You pass an image through the network.
2. First Convolutional Layer: It produces a feature map.
3. Batch Normalization:
• Calculate the mean and variance of the feature map across the mini-batch.
• Normalize each feature map element using the calculated mean and variance.
• Scale and shift the normalized values using learnable parameters γγ and ββ.
4. Activation (e.g., ReLU) is applied.
5. The network continues through other layers and eventually makes a prediction.

Key Points:
• Normalization of activations across mini-batches.
• Scaling and shifting after normalization with learnable parameters.
• Helps with faster convergence and better generalization.

BatchNorm can be placed either before the activation function (as in the original paper and
in the step-by-step summary above) or after it (as in the CIFAR-10 layer list above); both
placements are used in practice. It is particularly effective in deep networks, where training
can be very slow without normalization.
Would you like a more specific mathematical example or a deeper dive into how BatchNorm
is used in a particular type of model?

You said:

Q4
Explain the following in detail: 1) dropout, 2) L1 and L2 regularization, 3) momentum tuning,
4) hyperparameters
ChatGPT said:

Let's go through each of these concepts in detail:

1. Dropout
Dropout is a regularization technique used to prevent overfitting in neural networks by
randomly "dropping out" (setting to zero) a fraction of the neurons during training. The idea
is to reduce reliance on specific neurons and encourage the network to learn more robust and
generalized features.

How Dropout Works:

• During training, each neuron in the layer is independently set to zero with probability
$p$ (the dropout rate).
• The neurons that remain active (non-zero) contribute to the forward pass and
backpropagation as usual.
• A new random set of neurons is dropped in the next training iteration. Because any neuron
may be missing at any step, the model cannot rely on a particular neuron, which forces it
to generalize better.
• At inference time (during testing or prediction), dropout is turned off and the full
network is used. Because more neurons are active than during training, the activations must
be rescaled so that their expected values match. In the widely used "inverted dropout"
formulation, the surviving neurons are scaled by $\frac{1}{1-p}$ during training, so no scaling
is needed at inference; the original formulation instead multiplies the weights by $1 - p$ at
test time. Here $p$ is the dropout rate (probability of dropping a neuron).

Benefits:

• Prevents overfitting by reducing the reliance on any particular neuron or feature.

• Improves generalization by forcing the model to learn distributed representations of
data.

Example:
If you have a fully connected layer with 10 neurons, and you set a dropout rate of 0.5, then on
any given forward pass, only half of the neurons (on average) will be active. The other half
will be dropped out randomly.
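A minimal NumPy sketch of dropout in its common "inverted" form: a random mask keeps each neuron with probability $1 - p$, and the survivors are scaled by $1/(1-p)$ during training, so the layer passes its input through unchanged at inference. The function name and signature are illustrative, not a library API.

```python
import numpy as np


def dropout(x, p, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x                          # inference: dropout is off, no scaling needed
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= p  # True with probability 1 - p
    return x * keep_mask / (1.0 - p)


rng = np.random.default_rng(42)
activations = np.ones(10)                 # a layer with 10 neurons, dropout rate 0.5
out = dropout(activations, p=0.5, rng=rng)
print(out)                                           # each entry is 0.0 (dropped) or 2.0 (kept, scaled)
print(dropout(activations, p=0.5, training=False))   # unchanged at inference
```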

2. L1 and L2 Regularization
Regularization techniques like L1 and L2 regularization are used to penalize large weights,
which helps to prevent overfitting by keeping the model from becoming too complex.

L1 Regularization:

• L1 regularization adds a penalty proportional to the sum of the absolute values of the
weights. It encourages sparsity: unlike the L2 penalty, whose pull weakens as a weight
shrinks, the L1 penalty pulls toward zero with constant strength $\lambda$, so at the optimum,
weights whose contribution to reducing the loss is smaller than that pull end up exactly zero.
• The regularization term is $\lambda \sum_i |w_i|$, where $\lambda$ is the regularization strength
and $w_i$ are the model's weights.
• Effect: L1 regularization tends to produce sparse models with many weights being
exactly zero, which can be useful for feature selection.

L2 Regularization:

• L2 regularization adds a penalty proportional to the sum of the squared weights. It
discourages large weights and tries to keep all weights small, but it does not tend to
produce exact zeros, because its pull toward zero ($2\lambda w_i$) weakens as a weight shrinks.
• The regularization term is $\lambda \sum_i w_i^2$.
• Effect: L2 regularization helps to prevent overfitting by discouraging the model from
assigning too much importance to any single feature. It generally leads to small but
non-zero weights.

Combining L1 and L2 (Elastic Net):

• In many cases, a combination of L1 and L2 regularization is used, which is known
as Elastic Net regularization. This combines the benefits of both regularization
techniques: sparsity from L1 and weight shrinkage from L2.
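The penalties themselves are one-liners in NumPy. To make the sparsity difference concrete, the sketch also uses a toy per-weight problem, minimizing $\frac{1}{2}(w - a)^2 + \text{penalty}(w)$, where $a$ stands in for an unregularized weight. This toy objective and its values are illustrative assumptions, chosen because both minimizers have closed forms: soft-thresholding, $\operatorname{sign}(a)\max(|a| - \lambda, 0)$, for L1, and $a / (1 + 2\lambda)$ for L2.

```python
import numpy as np


def l1_penalty(w, lam):
    return lam * np.sum(np.abs(w))


def l2_penalty(w, lam):
    return lam * np.sum(w ** 2)


def elastic_net_penalty(w, lam1, lam2):
    # Elastic Net: sum of an L1 term and an L2 term
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)


# Toy problem (illustrative): per weight, minimize 0.5 * (w - a)^2 + penalty(w)
def l1_solution(a, lam):
    # Soft-thresholding: weights with |a| <= lam become exactly zero
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def l2_solution(a, lam):
    # Uniform shrinkage toward zero; never exactly zero unless a is zero
    return a / (1.0 + 2.0 * lam)


a = np.array([3.0, 0.4, -0.2, -2.0])   # hypothetical unregularized weights
lam = 0.5
print(l1_solution(a, lam))                    # values 2.5, 0, 0, -1.5
print(l2_solution(a, lam))                    # values 1.5, 0.2, -0.1, -1.0
print(np.count_nonzero(l1_solution(a, lam)))  # 2: the two small weights are zeroed
print(np.count_nonzero(l2_solution(a, lam)))  # 4: every weight shrinks, none is zeroed
```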

3. Momentum
Momentum is a technique used to accelerate gradient descent, which helps the model
converge faster and avoid getting stuck in local minima or saddle points. Momentum helps
smooth out the updates to the weights by considering past gradients, effectively adding a
"memory" of previous updates.

How Momentum Works:

• In standard gradient descent, the update for each parameter $\theta$ is a step along the
negative gradient of the loss function with respect to that parameter:
$\theta = \theta - \eta \cdot \nabla_\theta L$

where $\eta$ is the learning rate and $\nabla_\theta L$ is the gradient of the loss function.
• With momentum, instead of using the gradient directly, we maintain a "velocity"
term $v$, which is an exponentially weighted running average of previous gradients:
$v_t = \beta \cdot v_{t-1} + (1 - \beta) \cdot \nabla_\theta L$
where:

• $v_t$ is the velocity at time step $t$,
• $\beta$ is the momentum term (typically close to 1, e.g., 0.9),
• $\nabla_\theta L$ is the current gradient.

The weight update then becomes:

$\theta = \theta - \eta \cdot v_t$

Benefits of Momentum:

• Helps accelerate convergence in the relevant direction.

• Dampens oscillations along steep directions, where the gradient's sign keeps flipping from
step to step and the averaged velocity partly cancels out.
• Helps escape shallow local minima and saddle points by using past gradients.
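The velocity and weight updates above can be compared with plain gradient descent on a hypothetical loss that is steep along one axis and flat along the other, $J(\theta) = \frac{1}{2}(50\,\theta_1^2 + \theta_2^2)$. This sketch uses the same $(1 - \beta)$-weighted velocity as the formula above (some libraries instead use $v_t = \beta v_{t-1} + \nabla_\theta L$, which changes the effective learning rate); the loss, learning rate, and step count are illustrative choices. Plain gradient descent must keep $\eta < 2/50 = 0.04$ to stay stable along the steep axis, so it moves slowly along the flat one.

```python
import numpy as np

# Hypothetical ill-conditioned loss: J(theta) = 0.5 * (50 * t1^2 + 1 * t2^2),
# steep along the first axis and flat along the second; minimum at (0, 0).
curvature = np.array([50.0, 1.0])


def grad(theta):
    return curvature * theta


def plain_gd(theta, lr, steps):
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def momentum_gd(theta, lr, beta, steps):
    v = np.zeros_like(theta)                        # velocity starts at zero
    for _ in range(steps):
        v = beta * v + (1.0 - beta) * grad(theta)   # v_t = beta * v_{t-1} + (1 - beta) * grad
        theta = theta - lr * v                      # theta = theta - eta * v_t
    return theta


theta0 = np.array([1.0, 1.0])
plain = plain_gd(theta0, lr=0.03, steps=100)
mom = momentum_gd(theta0, lr=0.03, beta=0.9, steps=100)

print("plain GD distance to minimum:", np.linalg.norm(plain))
print("momentum distance to minimum:", np.linalg.norm(mom))
```

With these settings the momentum run ends closer to the minimum after the same 100 steps; the size of the gap depends on the learning rate and $\beta$.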

Example:
Suppose you're training a neural network whose loss surface is very steep in one direction and
flat in another. With momentum, the updates will continue
