
Lecture 1.3

• Normalization and Standardization


• Overfitting and Underfitting

Dr. Mainak Biswas


Normalization and Standardization
• Both Normalization and Standardization are
techniques used to adjust the scale of features
in a dataset
• They are crucial in machine learning to ensure
that all features contribute equally to the
model and prevent any feature from
dominating due to its scale



Normalization
• Normalization (also called Min-Max Scaling) is the
process of transforming features such that they lie within
a specific range, typically [0, 1] or [-1, 1]
• This is done by scaling the data to a fixed range based on
the minimum and maximum values of the feature
• Formula:
      x' = (x − min(x)) / (max(x) − min(x))
  where x is the original value, min(x) is the minimum value,
  and max(x) is the maximum value of the feature in the dataset
• Usage: Algorithms like k-Nearest Neighbors (k-NN) and
  Neural Networks, which are sensitive to the scale of
  features.
Normalization Example

SL | Value | x' = (x − min(x)) / (max(x) − min(x)) | Normalized Value
---|-------|---------------------------------------|-----------------
 1 |  10   | (10 − 10) / (50 − 10)                 | 0.00
 2 |  20   | (20 − 10) / (50 − 10)                 | 0.25
 3 |  30   | (30 − 10) / (50 − 10)                 | 0.50
 4 |  40   | (40 − 10) / (50 − 10)                 | 0.75
 5 |  50   | (50 − 10) / (50 − 10)                 | 1.00
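The worked example above can be reproduced with a short sketch in plain Python (the helper name `min_max_normalize` and the optional `feature_range` argument are ours, not part of the lecture):

```python
def min_max_normalize(values, feature_range=(0.0, 1.0)):
    """Scale values into feature_range using min-max normalization."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    # x' = (x - min(x)) / (max(x) - min(x)), then rescaled to [a, b]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]

print(min_max_normalize([10, 20, 30, 40, 50]))
# [0.0, 0.25, 0.5, 0.75, 1.0]
```

Passing `feature_range=(-1.0, 1.0)` gives the alternative [-1, 1] range mentioned in the definition.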



Standardization
• Standardization: Transforming data to have a mean of 0 and
a standard deviation of 1 (also known as Z-Score Scaling)
• It centers the data and scales it based on standard
deviation
• Formula:
      x' = (x − μ) / σ
  where μ is the mean and σ is the standard deviation of the
  dataset
• Usage: Algorithms like Support Vector Machines (SVM),
  Logistic Regression, and Principal Component Analysis
  (PCA), which assume a normal distribution or work better
  with data centered around 0.

Standardization Example

Mean: μ = (10 + 20 + 30 + 40 + 50) / 5 = 30
Standard deviation: σ = √(((10−30)² + (20−30)² + (30−30)² + (40−30)² + (50−30)²) / 5) = √200 = 14.14

SL | Value | x' = (x − μ) / σ   | Standardized Value
---|-------|--------------------|-------------------
 1 |  10   | (10 − 30) / 14.14  | −1.41
 2 |  20   | (20 − 30) / 14.14  | −0.71
 3 |  30   | (30 − 30) / 14.14  |  0.00
 4 |  40   | (40 − 30) / 14.14  |  0.71
 5 |  50   | (50 − 30) / 14.14  |  1.41
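The same numbers can be computed with a minimal sketch (the helper name `standardize` is ours; it uses the population standard deviation, dividing by n, as the example does):

```python
import math

def standardize(values):
    """Transform values to mean 0 and (population) standard deviation 1."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
    # x' = (x - mu) / sigma
    return [(x - mu) / sigma for x in values]

scaled = standardize([10, 20, 30, 40, 50])
print([round(z, 2) for z in scaled])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Note that the result always has mean 0 by construction, which is the "centering" the slide refers to.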



Overfitting and Underfitting
• Overfitting and Underfitting are concepts in
machine learning that describe how well a
model generalizes to new data
• They are often indicators of how effectively a
model has learned patterns from the training
data



Overfitting
• Overfitting occurs when a model learns not only the underlying
patterns in the training data but also the noise and details that do
not generalize to unseen data
• Symptoms
– High accuracy on training data
– Poor performance on validation or test data
• Causes
– Model is too complex (e.g., too many parameters or layers)
– Insufficient training data
– Training for too many epochs without regularization
• Prevention
– Use regularization techniques
– Reduce the model's complexity
– Use more training data or data augmentation
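One of the regularization techniques mentioned above is an L2 (ridge) penalty, which discourages large weights. A minimal sketch, assuming a one-feature linear model fit by gradient descent (the function `ridge_fit`, the toy data, and the hyperparameters are ours for illustration):

```python
def ridge_fit(xs, ys, lam, lr=0.01, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on MSE plus an L2 penalty lam * w**2."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # d/dw of mean((w*x + b - y)^2) + lam*w^2
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs, ys = [0, 1, 2, 3], [0, 1, 2, 3]   # y = x exactly
w0, _ = ridge_fit(xs, ys, lam=0.0)    # no penalty: slope approaches 1
w1, _ = ridge_fit(xs, ys, lam=1.0)    # L2 penalty shrinks the slope toward 0
print(round(w0, 2), w1 < w0)
```

The shrunken slope is the mechanism by which regularization limits model complexity and so combats overfitting.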



Underfitting
• Underfitting occurs when a model is too simple to capture the
underlying patterns in the data
• Symptoms
– Poor performance on both training and validation/test data
– Model fails to capture the complexity of the data
• Causes
– Model is too simple
– Insufficient training time
– Features used in the model are not relevant or sufficient
• Prevention
– Use a more complex model
– Train the model for more epochs
– Provide better or more features to the model



Differences

Aspect                       | Overfitting   | Underfitting
-----------------------------|---------------|-------------
Performance on Training Data | High accuracy | Low accuracy
Performance on Test Data     | Poor          | Poor
Model Complexity             | Too complex   | Too simple
Generalization               | Poor          | Poor
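The symptoms in the table can be turned into a rough diagnostic heuristic (the function `diagnose` and the thresholds `gap` and `low` are illustrative choices of ours, not standard values):

```python
def diagnose(train_acc, test_acc, gap=0.10, low=0.70):
    """Rough heuristic mapping train/test accuracy onto the table above."""
    if train_acc < low and test_acc < low:
        return "underfitting"      # poor performance on both sets
    if train_acc - test_acc > gap:
        return "overfitting"       # high on training, poor on test
    return "reasonable fit"

print(diagnose(0.99, 0.70))  # overfitting
print(diagnose(0.60, 0.58))  # underfitting
print(diagnose(0.90, 0.88))  # reasonable fit
```

In practice the train/test gap is judged relative to the task and dataset size rather than fixed cutoffs.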
