Feature Scaling (Standardization & Normalization)

Feature scaling is a preprocessing technique in machine learning that standardizes or normalizes independent variables to ensure equal contribution to analysis, especially for algorithms sensitive to feature scales. Standardization transforms features to have a mean of 0 and a standard deviation of 1, while normalization scales features to a fixed range, typically [0,1]. The choice between standardization and normalization depends on data distribution, algorithm requirements, and the presence of outliers.


Feature scaling is a preprocessing technique used in machine learning and data analysis to standardize or normalize the range of independent variables (features) in a dataset. It ensures that all features contribute equally to the analysis, preventing features with larger magnitudes from dominating those with smaller magnitudes.
This is particularly important for algorithms that rely on distance calculations (e.g., k-nearest neighbors, support vector machines) or gradient-based optimization (e.g., linear regression, neural networks).
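As a quick, hedged illustration of how an unscaled feature can dominate a distance calculation, the sketch below uses a tiny made-up age/salary table and scikit-learn's StandardScaler; the numbers are purely illustrative:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical samples: [age in years, salary in dollars]
X = np.array([[25.0, 50_000.0],
              [45.0, 52_000.0],
              [30.0, 90_000.0]])

# On the raw data, the salary column dominates the Euclidean distance,
# so sample 0 looks "close" to sample 1 almost entirely because of salary.
raw_dist = np.linalg.norm(X[0] - X[1:], axis=1)
print("raw distances:   ", raw_dist)

# After standardization, both features contribute on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
scaled_dist = np.linalg.norm(X_scaled[0] - X_scaled[1:], axis=1)
print("scaled distances:", scaled_dist)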
Two common techniques for feature scaling are standardization and normalization.
Standardization (also called Z-score normalization) is a feature scaling technique used in data
preprocessing to transform features in a dataset so that they have a mean of 0 and a standard
deviation of 1.
This process ensures that the features are centered around zero and have a consistent scale, making
them suitable for machine learning algorithms that are sensitive to the magnitude of input data.
Mathematical Definition
The formula for standardization is:

Z = (X - μ) / σ
Where:
•X: Original feature value.
•μ: Mean of the feature.
•σ: Standard deviation of the feature.

After standardization:
•The data has zero mean and unit variance: the mean (μ) of the feature becomes 0 and the standard deviation (σ) becomes 1.
•Values can be negative after scaling.
•It works well with Gaussian (normally distributed) data but is also commonly used for other distributions.
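A minimal scikit-learn sketch of this behaviour, assuming a small illustrative NumPy array as input:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)      # applies Z = (X - μ) / σ per feature

print(scaler.mean_, scaler.scale_)   # fitted μ and σ of the column
print(X_std.ravel())                 # note the negative values below the mean
print(X_std.mean(), X_std.std())     # 0.0 and 1.0 after scaling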
Key Characteristics of Standardization:
1. Centers Data Around Zero:
•Subtracting the mean (μ) shifts the distribution so that the mean of the feature becomes 0.
2. Scales Data to Unit Variance:
•Dividing by the standard deviation (σ) ensures that the feature has a standard deviation of 1.
3. Preserves the Shape of the Distribution:
•Standardization does not change the shape of the data distribution (e.g., skewness, kurtosis); it only changes the scale.
4. Handles Outliers:
•While standardization is sensitive to outliers (since the mean and standard deviation are influenced by extreme values), it is less sensitive than normalization (min-max scaling).
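A small numerical check of point 3 (a sketch assuming SciPy is installed; the right-skewed sample is synthetic):

import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # deliberately skewed data

z = (x - x.mean()) / x.std()                  # standardization

# The location and scale change to 0 and 1 ...
print(round(z.mean(), 3), round(z.std(), 3))

# ... but shape statistics such as skewness and kurtosis stay the same.
print(round(skew(x), 3), round(skew(z), 3))
print(round(kurtosis(x), 3), round(kurtosis(z), 3))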
Steps to Perform Standardization:
1. Calculate the Mean (μ): compute the mean of each feature.
2. Calculate the Standard Deviation (σ): compute the standard deviation of each feature.
3. Apply the Standardization Formula: transform each feature value using Z = (X - μ) / σ.

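These three steps map directly onto a few lines of NumPy (a sketch; it assumes samples are rows and features are columns, hence axis=0):

import numpy as np

# Hypothetical feature matrix: [age, salary]
X = np.array([[25.0, 50_000.0],
              [45.0, 52_000.0],
              [30.0, 90_000.0]])

mu = X.mean(axis=0)       # Step 1: mean of each feature
sigma = X.std(axis=0)     # Step 2: standard deviation of each feature
Z = (X - mu) / sigma      # Step 3: Z = (X - μ) / σ

print(Z.mean(axis=0))     # ≈ 0 for every column
print(Z.std(axis=0))      # 1 for every column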
When to Use Standardization (StandardScaler):
Standardization is used when your dataset has features with different units and scales, and you want to transform them to have:
1. Zero mean (μ = 0)
2. Unit variance (σ = 1)

✅ Machine Learning Models That Assume Normally Distributed Data or Are Sensitive to Feature Scales:
•Logistic Regression
•Support Vector Machines (SVM)
•k-Nearest Neighbors (KNN)
•Principal Component Analysis (PCA)
•Linear Regression (if features vary in scale)
•Clustering algorithms like K-Means, DBSCAN

✅ Distance-Based Algorithms: Algorithms that rely on distance metrics (e.g., k-nearest neighbors
(KNN), k-means clustering) require standardized data to ensure that all features contribute equally to the
distance calculations.
✅ Gradient-Based Optimization Works Better: Algorithms that use gradient descent (e.g., linear regression, neural networks) converge faster when features are standardized.
✅ Features Have Different Scales & Units: Example: Age (in years) and Salary (in dollars) have different
ranges; standardization makes them comparable.
✅ When Data is Normally Distributed: Standardization is ideal when data follows a Gaussian (normal)
distribution.
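In practice, the scaler is usually fitted on the training split only and then reused on the test split, for example inside a scikit-learn Pipeline; the sketch below uses the built-in iris dataset and a KNN classifier purely for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline fits the scaler on the training split only, so no
# information from the test split leaks into μ and σ.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))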
When Not to Use Standardization:
❌ Tree-Based Models (Decision Trees, Random Forest, XGBoost, etc.)
•These models are not affected by feature scaling.
❌ When Data is Not Normally Distributed & Needs Different Scaling:
•Use MinMaxScaler if data is uniformly distributed (e.g., between 0 and 1).
•Use RobustScaler if data has many outliers.
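Both alternatives are available in scikit-learn; the sketch below uses a made-up column with one large outlier to show the difference:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100 is an outlier

# MinMaxScaler maps min -> 0 and max -> 1, so the outlier squeezes
# the regular values into a tiny sliver near 0.
print(MinMaxScaler().fit_transform(X).ravel())

# RobustScaler centers on the median and scales by the IQR,
# so the regular values keep a usable spread.
print(RobustScaler().fit_transform(X).ravel())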
Advantages of Standardization:
1. Improves Algorithm Performance: many machine learning algorithms perform better when features are standardized.
2. Faster Convergence: gradient-based optimization algorithms converge faster when features are on a similar scale.
3. Interpretability: standardized features are easier to interpret, as they are centered around zero and have a consistent scale.
4. Handles Features with Different Units: standardization ensures that features with different units are treated equally.
Limitations of Standardization:
1. Sensitive to Outliers: since standardization uses the mean and standard deviation, it can be influenced by outliers. In such cases, robust scaling (using the median and IQR) may be a better alternative.
2. Not Suitable for All Algorithms: tree-based algorithms (e.g., decision trees, random forests) do not require standardization, as they are not sensitive to feature scales.
3. Assumes Gaussian Distribution: standardization works best when the data is approximately normally distributed. For non-Gaussian distributions, other scaling methods (e.g., normalization) may be more appropriate.
Normalization in Machine Learning
What is Normalization?
Normalization is a feature scaling technique that transforms data into a fixed range, typically [0,1] or
[-1,1]. It ensures that all features contribute equally to a model, preventing features with larger values
from dominating those with smaller values.

Formula for Normalization
The most commonly used method for normalization is Min-Max Scaling, defined as:

Xnorm = (X - Xmin) / (Xmax - Xmin)
Where:
•X = Original feature value
•Xmin = Minimum value of the feature
•Xmax = Maximum value of the feature
🔹 This scales all values into the [0,1] range.

🔹 If we want to scale between [-1,1], the formula is:

Xscaled = 2 × (X - Xmin) / (Xmax - Xmin) - 1
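Both ranges are available through scikit-learn's MinMaxScaler via its feature_range argument (a sketch on a made-up single-column array):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Default feature_range=(0, 1): (X - Xmin) / (Xmax - Xmin)
print(MinMaxScaler().fit_transform(X).ravel())

# feature_range=(-1, 1): 2 * (X - Xmin) / (Xmax - Xmin) - 1
print(MinMaxScaler(feature_range=(-1, 1)).fit_transform(X).ravel())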

Why is Normalization Important?
✅ 1. Handles Different Scales of Data
•Features in a dataset may have different ranges (e.g., Age: 0-100, Salary: 10,000-100,000).
Normalization makes them comparable.
✅ 2. Improves Model Performance
•Many machine learning algorithms, especially those using distance-based calculations (like
KNN, SVM), perform better when features are normalized.
✅ 3. Helps with Faster Convergence
•Algorithms like Neural Networks and Gradient Descent-based models converge faster when
inputs are normalized.
✅ 4. Avoids Dominance of Large-Scale Features
•Prevents features with larger numerical ranges from overshadowing smaller-scale features
in the learning process.
When to Use Normalization?
Use Normalization When:
✔ Data has features with very different scales.
✔ Algorithms that use distance-based calculations (e.g., KNN, SVM, PCA, K-Means) require it.
✔ Neural networks (deep learning models) perform better with normalized input.
✔ The data does not follow a normal distribution.
Do NOT use Normalization When:
❌ The dataset follows a normal (Gaussian) distribution → Use Standardization instead.
❌ Algorithms like Decision Trees, Random Forests, XGBoost are used → They do not require feature
scaling.
Choosing Between Normalization & Standardization:
Choosing Normalization (Min-Max Scaling) or Standardization (Z-score scaling) depends on:
1. Data Distribution (Gaussian vs. non-Gaussian)
2. Machine Learning Algorithm (sensitive vs. insensitive to scaling)
3. Outliers (present vs. absent)
When to Choose Normalization (Min-Max Scaling)?
✅ Use when:
•The data is not normally distributed (e.g., skewed or uniform).
•You need fixed range scaling (e.g., [0,1] or [-1,1]).
•Used in algorithms that do not assume Gaussian distribution:
• Deep Learning (Neural Networks)
• K-Nearest Neighbors (KNN)
• Support Vector Machines (SVM)
• K-Means Clustering
• Decision Trees / Random Forests (not sensitive but can benefit)
⚠ Avoid if:
•There are outliers, as Min-Max Scaling is sensitive to extreme values.
•The dataset follows a normal distribution.
When to Choose Standardization (Z-Score Scaling)?

✅ Use when:
•The data follows a normal (Gaussian) distribution.
•Outliers exist, as standardization is less sensitive to outliers than Min-Max Scaling.
•Used in algorithms that assume Gaussian distribution:
• Linear Regression
• Logistic Regression
• Principal Component Analysis (PCA)
• Support Vector Machines (SVM)
• K-Means Clustering
• Gradient Descent-based models (e.g., Linear Models, Neural Networks)

⚠ Avoid if:
•The data is not Gaussian and needs a fixed range.
•The model doesn’t rely on normally distributed input features.
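One way to read these criteria is as a rough decision rule. The helper below is only a hedged sketch of that heuristic; the function name and boolean flags are invented for illustration and are not part of any library API:

from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

def choose_scaler(is_gaussian: bool, has_outliers: bool, needs_fixed_range: bool):
    """Map the three criteria above to a scaler (heuristic, not a hard rule)."""
    if has_outliers:
        # Min-max is the most affected by extreme values; RobustScaler
        # (median/IQR) or StandardScaler are the safer choices here.
        return RobustScaler()
    if needs_fixed_range or not is_gaussian:
        return MinMaxScaler()        # fixed-range normalization
    return StandardScaler()          # zero mean, unit variance

# Example: roughly Gaussian feature, no outliers, no fixed-range requirement.
print(choose_scaler(is_gaussian=True, has_outliers=False, needs_fixed_range=False))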
Difference Between Normalization and Standardization:
Why is Standardization Better Than Normalization for Normally Distributed Data?
1. Standardization Transforms Data to a Standard Normal Form (μ = 0, σ = 1)
•Standardization centers the data by subtracting the mean (μ) and scales it by the standard deviation (σ).
•If the data is already Gaussian, the result is approximately a standard normal distribution (bell-shaped curve) with:
• Mean = 0
• Standard Deviation = 1
•This is useful for models like Linear Regression, Logistic Regression, PCA, and SVM, which assume that data is normally distributed.
2. Normalization (Min-Max Scaling) Does Not Produce a Standard Normal Form
•Normalization scales data between 0 and 1 (or -1 to 1).
•Because Min-Max Scaling is a linear transformation, it does not change the shape of the distribution, but the resulting mean and standard deviation depend on the sample minimum and maximum rather than being fixed at 0 and 1.
•With Gaussian data the extremes lie far out in the tails, so most values end up compressed into a narrow band around the middle of [0,1], and the convenient "one unit = one standard deviation" interpretation is lost.
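A small numerical check of this point (a sketch on synthetic Gaussian data; results will vary slightly with the random seed):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(42)
x = rng.normal(loc=50.0, scale=10.0, size=(10_000, 1))   # Gaussian feature

x_std = StandardScaler().fit_transform(x)
x_mm = MinMaxScaler().fit_transform(x)

# Standardization gives the fixed (0, 1) moments many models expect.
print("standardized:", x_std.mean().round(3), x_std.std().round(3))

# Min-max output's mean and spread depend on the sample min and max,
# so most values sit in a narrow band around 0.5.
print("min-max:     ", x_mm.mean().round(3), x_mm.std().round(3))

In short, for approximately Gaussian features, standardization keeps the zero-mean, unit-variance form these models expect, which Min-Max Scaling does not guarantee.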
