Feature Scaling (Standardization & Normalization)
Feature scaling is a data preprocessing technique used to standardize or normalize the range of independent variables (features) in a dataset. It ensures that all features
contribute equally to the analysis, preventing features with larger magnitudes from dominating those
with smaller magnitudes.
This is particularly important for algorithms that rely on distance calculations (e.g., k-nearest
neighbors, support vector machines) or gradient-based optimization (e.g., linear regression, neural
networks).
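For intuition, here is a small Python sketch with made-up numbers showing how an unscaled, large-magnitude feature (salary in dollars) can dominate a Euclidean distance over a small-magnitude feature (age in years); the means and standard deviations used for scaling are illustrative, not from any real dataset.

# Illustration (hypothetical values): why unscaled features distort distances.
import numpy as np

# Two people described by [age in years, salary in dollars]
a = np.array([25, 50_000])
b = np.array([45, 52_000])

# The Euclidean distance is dominated by the salary difference (2000),
# while the age difference (20) barely matters.
print(np.linalg.norm(a - b))           # ~2000.1

# After standardizing each feature (using illustrative means/stds),
# both features contribute on a comparable scale.
mean = np.array([35, 51_000])
std = np.array([10, 1_000])
a_std, b_std = (a - mean) / std, (b - mean) / std
print(np.linalg.norm(a_std - b_std))   # ~2.83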
Two common techniques for feature scaling are standardization and normalization.
Standardization (also called Z-score normalization) is a feature scaling technique used in data
preprocessing to transform features in a dataset so that they have a mean of 0 and a standard
deviation of 1.
This process ensures that the features are centered around zero and have a consistent scale, making
them suitable for machine learning algorithms that are sensitive to the magnitude of input data.
Mathematical Definition
The formula for standardization is:
Z = (X - μ) / σ
Where:
•X: Original feature value.
•μ: Mean of the feature.
•σ: Standard deviation of the feature.
After standardization:
•The data has zero mean and unit variance: the mean (μ) of the feature becomes 0 and its standard deviation (σ) becomes 1.
•Values can be negative after scaling.
•It works well with Gaussian (normally distributed) data but is also commonly used for other distributions.
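A minimal NumPy sketch of the transformation above, using hypothetical feature values: after applying Z = (X - μ) / σ the values have mean ≈ 0 and standard deviation ≈ 1.

import numpy as np

# Hypothetical feature values
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standardization: Z = (X - mean) / std
Z = (X - X.mean()) / X.std()

print(Z)                      # centered values on a unit scale
print(Z.mean(), Z.std())      # ~0.0 and 1.0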
Key Characteristics of Standardization:
1.Centers Data Around Zero: Subtracting the mean (μ) shifts the distribution so that the mean of the feature becomes 0.
2.Scales to Unit Variance: Dividing by the standard deviation (σ) ensures that the feature has a standard deviation of 1.
3.Preserves the Shape of the Distribution: Standardization does not change the shape of the data distribution (e.g., skewness, kurtosis); it only changes the scale.
4.Handles Outliers: While standardization is sensitive to outliers (since the mean and standard deviation are influenced by extreme values), it is less sensitive than normalization (min-max scaling).
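As a quick check of point 3, the sketch below (using hypothetical right-skewed data and assuming scipy is available for scipy.stats.skew) shows that standardization leaves the skewness unchanged while bringing the mean to 0 and the standard deviation to 1.

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # right-skewed sample

z = (x - x.mean()) / x.std()                  # standardize

# Skewness is unchanged (up to floating-point noise); only the scale changes.
print(skew(x), skew(z))
print(z.mean().round(6), z.std().round(6))    # ~0.0 and ~1.0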
Steps to Perform Standardization:
1.Calculate the Mean (μ): Compute the mean of the feature.
2.Calculate the Standard Deviation (σ): Compute the standard deviation of the feature.
3.Apply the Formula: Replace each value X with Z = (X - μ) / σ.
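In scikit-learn these steps correspond to StandardScaler, whose fit_transform() learns μ and σ per feature and applies the formula in one call; the data below is hypothetical.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales (age, salary)
X = np.array([[25, 50_000],
              [32, 64_000],
              [47, 120_000],
              [51, 98_000]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # steps 1-3 in one call

print(scaler.mean_)                  # learned mean per feature (step 1)
print(scaler.scale_)                 # learned standard deviation per feature (step 2)
print(X_scaled.mean(axis=0))         # ~[0, 0]
print(X_scaled.std(axis=0))          # ~[1, 1]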
When to Use Standardization:
✅ Machine Learning Models That Assume Normally Distributed Data and Algorithms Sensitive to Feature
Scales:
•Logistic Regression
•Support Vector Machines (SVM)
•k-Nearest Neighbors (KNN)
•Principal Component Analysis (PCA)
•Linear Regression (if features vary in scale)
•Clustering algorithms like K-Means, DBSCAN
✅ Distance-Based Algorithms: Algorithms that rely on distance metrics (e.g., k-nearest neighbors
(KNN), k-means clustering) require standardized data to ensure that all features contribute equally to the
distance calculations.
✅ Gradient-Based Optimization Works Better: Algorithms that use gradient descent (e.g., linear regression, neural networks) converge faster when features are standardized.
✅ Features Have Different Scales & Units: Example: Age (in years) and Salary (in dollars) have different
ranges; standardization makes them comparable.
✅ When Data is Normally Distributed: Standardization is ideal when data follows a Gaussian (normal)
distribution.
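A common way to apply standardization for these scale-sensitive models is a scikit-learn Pipeline, so that the μ and σ learned on the training split are reused on the test split; the sketch below uses synthetic data purely for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaling is fit on the training split only, then applied to both splits
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))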
When Not to Use Standardization:
❌ Tree-Based Models (Decision Trees, Random Forest, XGBoost, etc.)
•These models are not affected by feature scaling.
❌ When Data is Not Normally Distributed & Needs Different Scaling:
•Use MinMaxScaler if data is uniformly distributed (e.g., between 0 and 1).
•Use RobustScaler if data has many outliers.
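A hedged sketch contrasting the three scikit-learn scalers mentioned above on the same made-up column containing one extreme outlier.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical feature with one large outlier (1000)
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    scaled = scaler.fit_transform(X).ravel()
    # RobustScaler (median/IQR) keeps the non-outlier values well spread,
    # while MinMaxScaler squeezes them near 0 because of the outlier.
    print(type(scaler).__name__, np.round(scaled, 3))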
Advantages of Standardization:
1.Improves Algorithm Performance: Many machine learning algorithms perform better when features are standardized.
2.Faster Convergence: Gradient-based optimization algorithms converge faster when features are on a similar scale.
3.Interpretability: Standardized features are easier to interpret, as they are centered around zero and have a consistent scale.
4.Handles Features with Different Units: Standardization ensures that features with different units are treated equally.
Limitations of Standardization:
1.Sensitive to Outliers: Since standardization uses the mean and standard deviation, it can be influenced by outliers. In such cases, robust scaling (using the median and IQR) may be a better alternative.
2.Not Suitable for All Algorithms: Tree-based algorithms (e.g., decision trees, random forests) do not require standardization, as they are not sensitive to feature scales.
3.Assumes Gaussian Distribution: Standardization works best when the data is approximately normally distributed. For non-Gaussian distributions, other scaling methods (e.g., normalization) may be more appropriate.
Normalization in Machine Learning:
What is Normalization?
Normalization is a feature scaling technique that transforms data into a fixed range, typically [0,1] or
[-1,1]. It ensures that all features contribute equally to a model, preventing features with larger values
from dominating those with smaller values.
The formula for min-max normalization is:
Xnorm = (X - Xmin) / (Xmax - Xmin)
Where:
•X = Original feature value
•Xmin = Minimum value of the feature
•Xmax = Maximum value of the feature
🔹 This scales all values into the [0,1] range.
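A minimal sketch of the formula with hypothetical values, shown both manually in NumPy and with scikit-learn's MinMaxScaler (whose default feature_range is (0, 1)).

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature values
X = np.array([[5.0], [10.0], [15.0], [20.0], [25.0]])

# Manual min-max scaling: (X - Xmin) / (Xmax - Xmin)
X_manual = (X - X.min()) / (X.max() - X.min())

# Equivalent with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual.ravel())    # [0.   0.25 0.5  0.75 1.  ]
print(X_sklearn.ravel())   # same values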
✅ Use standardization instead of normalization when:
•The data follows a normal (Gaussian) distribution.
•Outliers exist, since standardization is less sensitive to outliers than min-max scaling.
•The algorithm assumes a Gaussian distribution or is sensitive to feature scales:
• Linear Regression
• Logistic Regression
• Principal Component Analysis (PCA)
• Support Vector Machines (SVM)
• K-Means Clustering
• Gradient Descent-based models (e.g., Linear Models, Neural Networks)
⚠ Avoid standardization (prefer normalization) if:
•The data is not Gaussian and needs to be in a fixed range.
•The model doesn’t rely on normally distributed input features.
Difference Between Normalization and Standardization:
•Formula: normalization uses (X - Xmin) / (Xmax - Xmin); standardization uses (X - μ) / σ.
•Output range: normalization maps values to a fixed range such as [0, 1]; standardization has no fixed range, with mean 0, standard deviation 1, and possibly negative values.
•Outliers: normalization is highly sensitive to outliers, since they define Xmin and Xmax; standardization is less sensitive.
•Typical use: normalization for bounded or non-Gaussian data; standardization for approximately Gaussian data and for scale-sensitive or gradient-based models.
Why is Standardization Better Than Normalization for Normally Distributed
Data?
1. Standardization Transforms Data to Standard Normal Distribution (𝜇 = 0, 𝜎 = 1)
•Standardization centers the data by subtracting the mean (𝜇) and scales it by the standard deviation
(𝜎).
•This results in a standard normal distribution (bell-shaped curve) with:
• Mean = 0
• Standard Deviation = 1
•This is useful for models like Linear Regression, Logistic Regression, PCA, and SVM, which assume
that data is normally distributed.
2. Normalization (Min-Max Scaling) Doesn’t Produce the Standard Normal Form
•Normalization scales data between 0 and 1 (or -1 to 1).
•If the data is already normally distributed, min-max scaling rescales it based on the observed minimum and maximum, so the result is squeezed into [0, 1] and its scale depends heavily on the most extreme values (including outliers).
•The rescaled data no longer has the mean of 0 and standard deviation of 1 that Gaussian-assuming models expect, so the convenient statistical properties of the standardized form are lost.
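To make the comparison concrete, the sketch below (on synthetic Gaussian data) shows that StandardScaler yields mean ≈ 0 and standard deviation ≈ 1, while MinMaxScaler bounds the values to [0, 1] but leaves the mean and standard deviation dependent on the observed extremes.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=100, scale=15, size=(1_000, 1))   # synthetic Gaussian feature

X_std = StandardScaler().fit_transform(X)
X_mm = MinMaxScaler().fit_transform(X)

# Standardization: mean ~0, std ~1 (what Gaussian-assuming models expect)
print(X_std.mean().round(3), X_std.std().round(3))

# Min-max scaling: bounded to [0, 1], but the mean and std depend on the
# observed minimum and maximum, so they are not 0 and 1.
print(X_mm.min().round(3), X_mm.max().round(3),
      X_mm.mean().round(3), X_mm.std().round(3))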