Feature Scaling in Machine Learning
Feature scaling is a technique often applied as part of data preparation for machine learning. The
goal of scaling (sometimes also referred to as normalization) is to change the values of numeric
columns in the dataset to a common scale, without distorting differences in the ranges of
values or losing information.
If one feature (e.g. human height) varies less than another (e.g. weight) purely because of their
respective units (metres vs. kilograms), an algorithm such as PCA may determine that the direction
of maximal variance corresponds more closely to the 'weight' axis when the features are not
scaled, leading to incorrect results.
Scaling avoids these problems by creating new values that maintain the general distribution and
ratios of the source data, while keeping the values within a common scale applied across all
numeric columns used in the model. This prevents the results from being heavily influenced by
high-magnitude features, as the sketch below illustrates.
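As a minimal sketch of this effect (the height and weight figures below are made up purely for illustration):

import numpy as np

# Hypothetical data: height in metres, weight in kilograms
rng = np.random.default_rng(0)
height = rng.normal(1.7, 0.1, 1000)    # spread of roughly 0.1 m
weight = rng.normal(70.0, 15.0, 1000)  # spread of roughly 15 kg

# Before scaling, the variance of weight dwarfs that of height, so a
# variance-based method such as PCA would follow the weight axis
print(np.var(height), np.var(weight))  # ~0.01 vs. ~225

# After z-score scaling, both features contribute comparably
height_z = (height - height.mean()) / height.std()
weight_z = (weight - weight.mean()) / weight.std()
print(np.var(height_z), np.var(weight_z))  # both ~1.0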
Types of Scaling –
The two most commonly used scaling techniques are Z-score and Min-Max Scaling.
1. Z-Score Scaling – A method in which all the values are converted to z-scores.
Z-score scaling removes the mean (i.e. brings the mean to zero) and scales the data to unit
variance. The scaling shrinks the range of the feature values; for example, if the original
range was 500 to 50,000 with a mean of 20,000, the new range could be shrunk to roughly
-5 to +5 with a mean approximately equal to 0.
Z-scores are calculated using the following formula:

z = (x - μ) / σ

where μ is the mean of the feature and σ is its standard deviation.
One thing to note at this point is that the z-score expression makes use of the mean and
standard deviation of the distribution, both of which are affected by the presence of outlier
values. This can lead to imbalanced feature scales when significant outliers are present.
Being aware of outliers and treating them appropriately for the business context therefore
becomes very crucial. A minimal sketch of z-score scaling follows below.
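The sketch below uses scikit-learn's StandardScaler; the salary figures are made up to mirror the 500-to-50,000 range described above:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data mirroring the range discussed above
salaries = np.array([[500.0], [12000.0], [20000.0], [28000.0], [50000.0]])

scaler = StandardScaler()
salaries_z = scaler.fit_transform(salaries)

print(salaries_z.ravel())                   # small values centred on 0
print(salaries_z.mean(), salaries_z.std())  # ~0.0 and 1.0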
2. Min-Max Scaling: The Min-Max method linearly rescales every feature to the [0, 1]
interval. A consequence of this bounded range, in contrast to z-score scaling, is that the
scaled features end up with smaller standard deviations, which can suppress the effect of
outliers.
The scaled values are obtained using the following formula:

x_scaled = (x - min(x)) / (max(x) - min(x))
Similar to z-score scaling, Min-Max scaling is also affected by outliers, since the minimum
and maximum values appear directly in the expression (an outlier will typically be the min or
max of a feature). This reinforces the need to inspect for outliers with both scaling methods.
Min-Max scaling can be a better option when you are able to accurately estimate the minimum
and maximum observable values. For example, in image processing, pixel intensities are
known to lie in the range 0 to 255 for each RGB channel, as in the sketch below.
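A quick sketch of Min-Max scaling using scikit-learn's MinMaxScaler (the pixel values are made up, with 0 and 255 as the known bounds):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical pixel intensities; 0 and 255 are the known observable bounds
pixels = np.array([[0.0], [64.0], [128.0], [255.0]])

minmax = MinMaxScaler()  # default feature_range is (0, 1)
pixels_01 = minmax.fit_transform(pixels)

print(pixels_01.ravel())  # approximately [0.  0.251  0.502  1.]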
Inverse Scaling –
There are certain scenarios where the interpretability of the features is very important. For
instance, suppose we have trained machine learning models on scaled data and we want to
present our results/findings to the stakeholders. The stakeholders will be more comfortable
with the original scales (units) than with the bounded scaled values. Inverse scaling can be
used to bring the values back to their original scales and avoid these interpretability issues.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Splitting the dataset into train and test sets prior to scaling to avoid data leakage
# (only applicable for supervised learning algorithms)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

# X_train_cont / X_test_cont are assumed to hold the continuous numeric columns, e.g.:
# X_train_cont = X_train.select_dtypes(include="number")
# X_test_cont = X_test.select_dtypes(include="number")

# Fitting the StandardScaler object on the train set only
zscore = StandardScaler()
zscore.fit(X_train_cont)

# Scaling the train and test sets using the fitted StandardScaler object
X_train_scaled = pd.DataFrame(zscore.transform(X_train_cont), columns=X_train_cont.columns)
X_test_scaled = pd.DataFrame(zscore.transform(X_test_cont), columns=X_test_cont.columns)
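Continuing the snippet above, inverse scaling is a single call on the fitted scaler; this sketch reuses the zscore object and the scaled frame from the code above:

# Recovering the original units for reporting (see Inverse Scaling above)
X_train_original = pd.DataFrame(zscore.inverse_transform(X_train_scaled),
                                columns=X_train_scaled.columns)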