
Normalization

The goal of normalization is to transform features to be on a similar scale. This improves the performance and training stability of the model.

Normalization Techniques at a Glance

Four common normalization techniques may be useful:

scaling to a range

clipping

log scaling

z-score

The following charts show the effect of each normalization technique on the distribution of the raw feature (price) on the left. The charts are based on the data set from the 1985 Ward's Automotive Yearbook that is part of the UCI Machine Learning Repository under Automobile Data Set (https://archive.ics.uci.edu/ml/datasets/automobile).

Figure 1. Summary of normalization techniques.

Scaling to a range

Recall from MLCC (/machine-learning/crash-course/representation/cleaning-data) that scaling (/machine-learning/glossary#scaling) means converting floating-point feature values from their natural range (for example, 100 to 900) into a standard range, usually 0 to 1 (or sometimes -1 to +1). Use the following simple formula to scale to a range:

\[ x' = (x - x_{min}) / (x_{max} - x_{min}) \]


Scaling to a range is a good choice when both of the following conditions are met:

You know the approximate upper and lower bounds on your data with few or no outliers.

Your data is approximately uniformly distributed across that range.

A good example is age. Most age values fall between 0 and 90, and every part of the range has a substantial number of people.

In contrast, you would not use scaling on income, because only a few people have very high incomes. The upper bound of the linear scale for income would be very high, and most people would be squeezed into a small part of the scale.
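
As a rough sketch of the scaling formula above in NumPy (the age values here are made up for illustration, and a library scaler such as scikit-learn's MinMaxScaler is a common alternative):

```python
import numpy as np

def scale_to_range(x: np.ndarray) -> np.ndarray:
    """Linearly rescale values to [0, 1]: x' = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Hypothetical "age" feature: roughly uniform between 0 and 90.
ages = np.array([2, 15, 27, 33, 48, 61, 74, 90], dtype=float)
print(scale_to_range(ages))  # all values now fall within [0.0, 1.0]
```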

Feature Clipping

If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain value to a fixed value. For example, you could clip all temperature values above 40 to be exactly 40.

You may apply feature clipping before or after other normalizations.

Formula: Set min/max values to avoid outliers.

Figure 2. Comparing a raw distribution and its clipped version.

Another simple clipping strategy is to clip by z-score to ±Nσ (for example, limit to ±3σ). Note that σ is the standard deviation.
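
A minimal sketch of both clipping strategies using np.clip; the temperature readings are hypothetical, the 40-degree cap follows the example above, and the ±3σ variant is shown alongside it:

```python
import numpy as np

# Hypothetical temperature readings with a few high values.
temps = np.array([12.0, 25.0, 31.0, 38.0, 47.0, 55.0])

# Fixed-cap clipping: any value above 40 becomes exactly 40.
capped = np.clip(temps, a_min=None, a_max=40.0)

# Z-score clipping: limit values to within +-3 standard deviations of the mean.
mu, sigma = temps.mean(), temps.std()
z_clipped = np.clip(temps, mu - 3 * sigma, mu + 3 * sigma)

print(capped)     # [12. 25. 31. 38. 40. 40.]
print(z_clipped)  # unchanged here, since no value is more than 3 sigma from the mean
```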

Log Scaling

Log scaling computes the log of your values to compress a wide range to a narrow range.

\[ x' = \log(x) \]

Log scaling is helpful when a handful of your values have many points, while most other values have few points. This data distribution is known as the power law distribution. Movie ratings are a good example. In the chart below, most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). Log scaling changes the distribution, helping to improve linear model performance.

Figure 3. Comparing a raw distribution to its log.
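
A small sketch of log scaling on a hypothetical ratings-count feature; np.log1p (log of 1 + x) is used here so that zero counts stay defined, which is an assumption about handling zeros rather than part of the formula above:

```python
import numpy as np

# Hypothetical number-of-ratings feature: a long tail of small counts, a few huge ones.
rating_counts = np.array([0, 3, 5, 12, 40, 250, 10_000, 2_000_000], dtype=float)

# log1p(x) = log(1 + x) compresses the wide range while keeping zero counts finite.
log_scaled = np.log1p(rating_counts)
print(log_scaled.round(2))  # spans roughly 0 to 14.5 instead of 0 to 2,000,000
```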

Z-Score

Z-score is a variation of scaling that represents the number of standard deviations away from the mean. You would use z-score to ensure your feature distributions have mean = 0 and std = 1. It's useful when there are a few outliers, but not so extreme that you need clipping.

The formula for calculating the z-score of a point, x, is as follows:

\[ x' = (x - \mu) / \sigma \]

μ is the mean and σ is the standard deviation.

Figure 4. Comparing a raw distribution to its z-score distribution.

Notice that z-score squeezes raw values that have a range of ~40000 down into a range from roughly -1 to +4.

Suppose you're not sure whether the outliers truly are extreme. In this case, start with z-score unless you have feature values that you don't want the model to learn; for example, the values are the result of measurement error or a quirk.
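
A minimal sketch of the z-score formula above; the price values are hypothetical, and in a real pipeline you would typically compute μ and σ on the training data and reuse them at serving time (an assumption about the surrounding pipeline, not something stated here):

```python
import numpy as np

def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize values to mean 0 and standard deviation 1: x' = (x - mu) / sigma."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma

# Hypothetical car prices, loosely echoing the "price" feature in the charts above.
prices = np.array([5_000, 9_000, 12_500, 18_000, 22_000, 41_000], dtype=float)
standardized = z_score(prices)
print(standardized.mean().round(6), standardized.std().round(6))  # ~0.0 and ~1.0
```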

Summary

The best normalization technique is one that empirically works well, so try new ideas if you think they'll work well on your feature distribution.

Normalization Technique | Formula | When to Use
Linear Scaling | x' = (x - x_min) / (x_max - x_min) | When the feature is more-or-less uniformly distributed across a fixed range.
Clipping | if x > max, then x' = max; if x < min, then x' = min | When the feature contains some extreme outliers.
Log Scaling | x' = log(x) | When the feature conforms to the power law.
Z-score | x' = (x - μ) / σ | When the feature distribution does not contain extreme outliers.

Key terms: scaling (/machine-learning/glossary#scaling), normalization (/machine-learning/glossary#normalization)
