The document discusses normalization and standardization techniques used in machine learning to adjust feature scales, ensuring equal contribution to models. It also covers overfitting and underfitting, which describe how well a model generalizes to new data, along with their symptoms, causes, and prevention strategies. Normalization scales data to a specific range, while standardization transforms data to have a mean of 0 and a standard deviation of 1.
Normalization and Standardization
• Both Normalization and Standardization are techniques used to adjust the scale of features in a dataset
• They are crucial in machine learning to ensure that all features contribute equally to the model and prevent any feature from dominating due to its scale
Normalization
• Normalization (also called Min-Max Scaling) is the process of transforming features such that they lie within a specific range, typically [0, 1] or [-1, 1]
• This is done by scaling the data to a fixed range based on the minimum and maximum values of the feature
• Formula:
  x′ = (x − min(x)) / (max(x) − min(x))
  where x is the original value, min(x) is the minimum value, and max(x) is the maximum value in the dataset
• Usage: algorithms like k-Nearest Neighbors (k-NN) and Neural Networks, which are sensitive to the scale of features

Normalization Example

SL | Value | (x − min(x)) / (max(x) − min(x)) | Normalized Value
 1 |  10   | (10 − 10) / (50 − 10)            | 0.00
 2 |  20   | (20 − 10) / (50 − 10)            | 0.25
 3 |  30   | (30 − 10) / (50 − 10)            | 0.50
 4 |  40   | (40 − 10) / (50 − 10)            | 0.75
 5 |  50   | (50 − 10) / (50 − 10)            | 1.00
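As a quick check of the arithmetic in the table above, here is a minimal sketch of min-max normalization in NumPy (the values are taken from the example; variable names are illustrative):

```python
# Minimal sketch: min-max normalization of the example values with NumPy.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# x' = (x - min(x)) / (max(x) - min(x))
x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)  # [0.   0.25 0.5  0.75 1.  ]
```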
Standardization
• Standardization (also known as Z-Score Scaling) transforms data to have a mean of 0 and a standard deviation of 1
• It centers the data and scales it based on the standard deviation
• Formula:
  x′ = (x − μ) / σ
  where μ is the mean and σ is the standard deviation of the dataset
• Usage: algorithms like Support Vector Machines (SVM), Logistic Regression, and Principal Component Analysis (PCA), which assume a normal distribution or work better with data centered around 0
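For comparison with the normalization sketch above, here is a minimal z-score standardization of the same illustrative values in NumPy:

```python
# Minimal sketch: z-score standardization of the same example values.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# x' = (x - mu) / sigma, using the population standard deviation (ddof=0)
x_std = (x - x.mean()) / x.std()

print(x_std.mean())  # ~0.0
print(x_std.std())   # 1.0
```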
Overfitting and Underfitting
• Overfitting and Underfitting are concepts in machine learning that describe how well a model generalizes to new data
• They are often indicators of how effectively a model has learned patterns from the training data
Overfitting
• Overfitting occurs when a model learns not only the underlying patterns in the training data but also the noise and details that do not generalize to unseen data
• Symptoms
  – High accuracy on training data
  – Poor performance on validation or test data
• Causes
  – Model is too complex (e.g., too many parameters or layers)
  – Insufficient training data
  – Training for too many epochs without regularization
• Prevention (see the sketch after this list)
  – Use regularization techniques
  – Reduce the model's complexity
  – Use more training data or data augmentation
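A minimal sketch of the first prevention strategy, assuming scikit-learn is available: the same high-degree polynomial model is fit with and without L2 regularization (Ridge). The dataset and hyperparameters here are illustrative, not from the slides.

```python
# Minimal sketch: curbing overfitting with L2 regularization (Ridge).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=20)

# A degree-15 polynomial has enough parameters to memorize 20 noisy points.
overfit = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)

# The Ridge penalty shrinks the coefficients, so the fitted curve follows
# the underlying sine pattern instead of the noise in the training points.
regularized = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1e-3)).fit(X, y)
```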
Underfitting
• Underfitting occurs when a model is too simple to capture the underlying patterns in the data
• Symptoms
  – Poor performance on both training and validation/test data
  – Model fails to capture the complexity of the data
• Causes
  – Model is too simple
  – Insufficient training time
  – Features used in the model are not relevant or sufficient
• Prevention (see the sketch after this list)
  – Use a more complex model
  – Train the model for more epochs
  – Provide better or more features to the model
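A minimal sketch of the first and third prevention strategies, again assuming scikit-learn: a plain linear model underfits a quadratic relationship, while adding polynomial features gives it enough capacity (the data is illustrative).

```python
# Minimal sketch: fixing underfitting by adding model capacity/features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2  # a quadratic relationship

# A straight line cannot represent a parabola: low R^2 even on training data.
underfit = LinearRegression().fit(X, y)
print(underfit.score(X, y))  # close to 0

# Adding squared features lets the linear model capture the pattern.
better = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, y)
print(better.score(X, y))    # ~1.0
```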
Differences
Aspect                       | Overfitting   | Underfitting
-----------------------------|---------------|--------------
Performance on training data | High accuracy | Low accuracy
Performance on test data     | Poor          | Poor
Model complexity             | Too complex   | Too simple
Generalization               | Poor          | Poor