Feature Scaling (Standardization & Normalization)
Feature scaling is a data preprocessing technique used to standardize or normalize the range of independent variables (features) in a dataset. It ensures that all features
contribute equally to the analysis, preventing features with larger magnitudes from dominating those
with smaller magnitudes.
This is particularly important for algorithms that rely on distance calculations (e.g., k-nearest
neighbors, support vector machines) or gradient-based optimization (e.g., linear regression, neural
networks).
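For intuition, here is a small Python sketch with made-up numbers showing how an unscaled, large-magnitude feature (salary in dollars) can dominate a Euclidean distance over a small-magnitude feature (age in years); the means and standard deviations used for scaling are illustrative, not from any real dataset.

# Illustration (hypothetical values): why unscaled features distort distances.
import numpy as np

# Two people described by [age in years, salary in dollars]
a = np.array([25, 50_000])
b = np.array([45, 52_000])

# The Euclidean distance is dominated by the salary difference (2000),
# while the age difference (20) barely matters.
print(np.linalg.norm(a - b))           # ~2000.1

# After standardizing each feature (using illustrative means/stds),
# both features contribute on a comparable scale.
mean = np.array([35, 51_000])
std = np.array([10, 1_000])
a_std, b_std = (a - mean) / std, (b - mean) / std
print(np.linalg.norm(a_std - b_std))   # ~2.83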
Two common techniques for feature scaling are standardization and normalization.
Standardization (also called Z-score normalization) is a feature scaling technique used in data
preprocessing to transform features in a dataset so that they have a mean of 0 and a standard
deviation of 1.
This process ensures that the features are centered around zero and have a consistent scale, making
them suitable for machine learning algorithms that are sensitive to the magnitude of input data.
Mathematical Definition
The formula for standardization is:
Z = (X - μ) / σ
Where:
•X: Original feature value.
•μ: Mean of the feature.
•σ: Standard deviation of the feature.
After standardization:
•The data has zero mean and unit variance: the mean (μ) of the feature becomes 0 and its standard deviation (σ) becomes 1.
•Values can be negative after scaling.
•It works well with Gaussian (normally distributed) data but is also commonly used for other distributions.
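A minimal NumPy sketch of the transformation above, using hypothetical feature values: after applying Z = (X - μ) / σ the values have mean ≈ 0 and standard deviation ≈ 1.

import numpy as np

# Hypothetical feature values
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standardization: Z = (X - mean) / std
Z = (X - X.mean()) / X.std()

print(Z)                      # centered values on a unit scale
print(Z.mean(), Z.std())      # ~0.0 and 1.0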
Key Characteristics of Standardization:
1.Centers Data Around Zero: Subtracting the mean (μ) shifts the distribution so that the mean of the feature becomes 0.
2.Scales to Unit Variance: Dividing by the standard deviation (σ) ensures that the feature has a standard deviation of 1.
3.Preserves the Shape of the Distribution: Standardization does not change the shape of the data distribution (e.g., skewness, kurtosis); it only changes the scale.
4.Handles Outliers: While standardization is sensitive to outliers (since the mean and standard deviation are influenced by extreme values), it is less sensitive than normalization (min-max scaling).
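As a quick check of point 3, the sketch below (using hypothetical right-skewed data and assuming scipy is available for scipy.stats.skew) shows that standardization leaves the skewness unchanged while bringing the mean to 0 and the standard deviation to 1.

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # right-skewed sample

z = (x - x.mean()) / x.std()                  # standardize

# Skewness is unchanged (up to floating-point noise); only the scale changes.
print(skew(x), skew(z))
print(z.mean().round(6), z.std().round(6))    # ~0.0 and ~1.0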
Steps to Perform Standardization:
1.Calculate the Mean (μ): Compute the mean of the feature.
2.Calculate the Standard Deviation (σ): Compute the standard deviation of the feature.
3.Apply the Formula: Replace each value X with Z = (X - μ) / σ.
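In scikit-learn these steps correspond to StandardScaler, whose fit_transform() learns μ and σ per feature and applies the formula in one call; the data below is hypothetical.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales (age, salary)
X = np.array([[25, 50_000],
              [32, 64_000],
              [47, 120_000],
              [51, 98_000]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # steps 1-3 in one call

print(scaler.mean_)                  # learned mean per feature (step 1)
print(scaler.scale_)                 # learned standard deviation per feature (step 2)
print(X_scaled.mean(axis=0))         # ~[0, 0]
print(X_scaled.std(axis=0))          # ~[1, 1]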
When to Use Standardization:
✅ Machine Learning Models That Assume Normally Distributed Data and Algorithms Sensitive to Feature
Scales:
•Logistic Regression
•Support Vector Machines (SVM)
•k-Nearest Neighbors (KNN)
•Principal Component Analysis (PCA)
•Linear Regression (if features vary in scale)
•Clustering algorithms like K-Means, DBSCAN
✅ Distance-Based Algorithms: Algorithms that rely on distance metrics (e.g., k-nearest neighbors
(KNN), k-means clustering) require standardized data to ensure that all features contribute equally to the
distance calculations.
✅ Gradient-Based Optimization Works Better: Algorithms that use gradient descent (e.g., linear regression, neural networks) converge faster when features are standardized.
✅ Features Have Different Scales & Units: Example: Age (in years) and Salary (in dollars) have different
ranges; standardization makes them comparable.
✅ When Data is Normally Distributed: Standardization is ideal when data follows a Gaussian (normal)
distribution.
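A common way to apply standardization for these scale-sensitive models is a scikit-learn Pipeline, so that the μ and σ learned on the training split are reused on the test split; the sketch below uses synthetic data purely for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaling is fit on the training split only, then applied to both splits
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))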
When Not to Use Standardization:
❌ Tree-Based Models (Decision Trees, Random Forest, XGBoost, etc.)
•These models are not affected by feature scaling.
❌ When Data is Not Normally Distributed & Needs Different Scaling:
•Use MinMaxScaler if data is uniformly distributed (e.g., between 0 and 1).
•Use RobustScaler if data has many outliers.
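A hedged sketch contrasting the three scikit-learn scalers mentioned above on the same made-up column containing one extreme outlier.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical feature with one large outlier (1000)
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    scaled = scaler.fit_transform(X).ravel()
    # RobustScaler (median/IQR) keeps the non-outlier values well spread,
    # while MinMaxScaler squeezes them near 0 because of the outlier.
    print(type(scaler).__name__, np.round(scaled, 3))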
Advantages of Standardization:
1.Improves Algorithm Performance: Many machine learning algorithms perform better when features are standardized.
2.Faster Convergence: Gradient-based optimization algorithms converge faster when features are on a similar scale.
3.Interpretability: Standardized features are easier to interpret, as they are centered around zero and have a consistent scale.
4.Handles Features with Different Units: Standardization ensures that features with different units are treated equally.
Limitations of Standardization:
1.Sensitive to Outliers: Since standardization uses the mean and standard deviation, it can be influenced by outliers. In such cases, robust scaling (using the median and IQR) may be a better alternative.
2.Not Suitable for All Algorithms: Tree-based algorithms (e.g., decision trees, random forests) do not require standardization, as they are not sensitive to feature scales.
3.Assumes Gaussian Distribution: Standardization works best when the data is approximately normally distributed. For non-Gaussian distributions, other scaling methods (e.g., normalization) may be more appropriate.
Normalization in Machine Learning:
What is Normalization?
Normalization is a feature scaling technique that transforms data into a fixed range, typically [0,1] or
[-1,1]. It ensures that all features contribute equally to a model, preventing features with larger values
from dominating those with smaller values.
The formula for min-max normalization is:
Xnorm = (X - Xmin) / (Xmax - Xmin)
Where:
•X = Original feature value
•Xmin = Minimum value of the feature
•Xmax = Maximum value of the feature
🔹 This scales all values into the [0,1] range.
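A minimal sketch of the formula with hypothetical values, shown both manually in NumPy and with scikit-learn's MinMaxScaler (whose default feature_range is (0, 1)).

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature values
X = np.array([[5.0], [10.0], [15.0], [20.0], [25.0]])

# Manual min-max scaling: (X - Xmin) / (Xmax - Xmin)
X_manual = (X - X.min()) / (X.max() - X.min())

# Equivalent with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual.ravel())    # [0.   0.25 0.5  0.75 1.  ]
print(X_sklearn.ravel())   # same values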
✅ Use standardization instead of normalization when:
•The data follows a normal (Gaussian) distribution.
•Outliers exist, since standardization is less sensitive to outliers than min-max scaling.
•The algorithm assumes a Gaussian distribution or is sensitive to feature scales:
• Linear Regression
• Logistic Regression
• Principal Component Analysis (PCA)
• Support Vector Machines (SVM)
• K-Means Clustering
• Gradient Descent-based models (e.g., Linear Models, Neural Networks)
⚠ Avoid standardization (prefer normalization) if:
•The data is not Gaussian and needs to be in a fixed range.
•The model doesn’t rely on normally distributed input features.
Difference Between Normalization and Standardization:
•Formula: normalization uses (X - Xmin) / (Xmax - Xmin); standardization uses (X - μ) / σ.
•Output range: normalization maps values to a fixed range such as [0, 1]; standardization has no fixed range, with mean 0, standard deviation 1, and possibly negative values.
•Outliers: normalization is highly sensitive to outliers, since they define Xmin and Xmax; standardization is less sensitive.
•Typical use: normalization for bounded or non-Gaussian data; standardization for approximately Gaussian data and for scale-sensitive or gradient-based models.
Why is Standardization Better Than Normalization for Normally Distributed
Data?
1. Standardization Transforms Data to Standard Normal Distribution (𝜇 = 0, 𝜎 = 1)
•Standardization centers the data by subtracting the mean (𝜇) and scales it by the standard deviation
(𝜎).
•This results in a standard normal distribution (bell-shaped curve) with:
• Mean = 0
• Standard Deviation = 1
•This is useful for models like Linear Regression, Logistic Regression, PCA, and SVM, which assume
that data is normally distributed.
2. Normalization (Min-Max Scaling) Doesn’t Produce the Standard Normal Form
•Normalization scales data between 0 and 1 (or -1 to 1).
•If the data is already normally distributed, min-max scaling rescales it based on the observed minimum and maximum, so the result is squeezed into [0, 1] and its scale depends heavily on the most extreme values (including outliers).
•The rescaled data no longer has the mean of 0 and standard deviation of 1 that Gaussian-assuming models expect, so the convenient statistical properties of the standardized form are lost.
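To make the comparison concrete, the sketch below (on synthetic Gaussian data) shows that StandardScaler yields mean ≈ 0 and standard deviation ≈ 1, while MinMaxScaler bounds the values to [0, 1] but leaves the mean and standard deviation dependent on the observed extremes.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=100, scale=15, size=(1_000, 1))   # synthetic Gaussian feature

X_std = StandardScaler().fit_transform(X)
X_mm = MinMaxScaler().fit_transform(X)

# Standardization: mean ~0, std ~1 (what Gaussian-assuming models expect)
print(X_std.mean().round(3), X_std.std().round(3))

# Min-max scaling: bounded to [0, 1], but the mean and std depend on the
# observed minimum and maximum, so they are not 0 and 1.
print(X_mm.min().round(3), X_mm.max().round(3),
      X_mm.mean().round(3), X_mm.std().round(3))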