Normalization: Normalization Techniques at a Glance
Four common normalization techniques may be useful:

- scaling to a range
- clipping
- log scaling
- z-score
The following charts show the effect of each normalization technique on the distribution of the raw feature (price) on the left. The charts are based on the data set from the 1985 Ward's Automotive Yearbook that is part of the UCI Machine Learning Repository under Automobile Data Set (https://archive.ics.uci.edu/ml/datasets/automobile).
Scaling to a range
Scaling to a range is a good choice when you know the approximate upper and lower bounds on your data, with few or no outliers, and the data is roughly uniform across that range. A good example is age: most age values fall between 0 and 90, and every part of the range has a substantial number of people.

In contrast, you would not use scaling on income, because only a few people have very high incomes. The upper bound of the linear scale for income would be very high, and most people would be squeezed into a small part of the scale.
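As a concrete sketch of the arithmetic, here is one way to implement linear scaling in NumPy. The scale_to_range helper and the sample ages are illustrative, not part of the original lesson.

```python
import numpy as np

def scale_to_range(values, new_min=0.0, new_max=1.0):
    """Linearly map values from their observed [min, max] into [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    return new_min + (values - old_min) * (new_max - new_min) / (old_max - old_min)

ages = np.array([0, 18, 35, 62, 90])
print(scale_to_range(ages))  # approximately [0, 0.2, 0.39, 0.69, 1]
```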
Feature Clipping
If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain value to a fixed value. For example, you could clip all temperature values above 40 to be exactly 40.

Another simple clipping strategy is to clip by z-score to ±Nσ (for example, limit to ±3σ), where σ is the standard deviation.
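A minimal sketch of both clipping strategies, assuming NumPy; the helper names and the sample temperatures are hypothetical.

```python
import numpy as np

def clip_feature(values, min_value=None, max_value=None):
    """Cap feature values at fixed bounds (np.clip passes a None bound through)."""
    return np.clip(np.asarray(values, dtype=float), min_value, max_value)

def clip_by_zscore(values, n_sigma=3.0):
    """Clip values to within ±n_sigma standard deviations of the mean."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    return np.clip(values, mu - n_sigma * sigma, mu + n_sigma * sigma)

temps = np.array([12.0, 25.0, 38.0, 41.5, 47.0])
print(clip_feature(temps, max_value=40.0))  # [12. 25. 38. 40. 40.]
```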
Log Scaling
Log scaling is helpful when a handful of your values have many points, while most other values have few points. This data distribution is known as a power law distribution. Movie ratings are a good example: most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). Log scaling changes the distribution, which can improve linear model performance.
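As a short illustration, assuming NumPy: log1p computes log(1 + x), which stays finite when a count is zero. The rating counts below are made-up values shaped like a power law.

```python
import numpy as np

def log_scale(values):
    """Compress a heavy-tailed feature with log(1 + x)."""
    return np.log1p(np.asarray(values, dtype=float))

# Most movies have few ratings (the tail); a handful have very many (the head).
rating_counts = np.array([2, 5, 9, 40, 12_000, 850_000])
print(log_scale(rating_counts).round(2))  # [ 1.1   1.79  2.3   3.71  9.39 13.65]
```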
Z-Score
Z-score is a variation of scaling that represents the number of standard deviations a value lies away from the mean; it maps a feature to a distribution with mean 0 and standard deviation 1:

$$ x' = \frac{x - \mu}{\sigma} $$
Figure 4. Comparing a raw distribution to its z-score distribution.
Notice that z-score squeezes raw values spanning a range of roughly 40,000 down into a range from roughly -1 to +4.
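A minimal sketch of the formula above, again assuming NumPy; the z_score helper and the sample prices are hypothetical.

```python
import numpy as np

def z_score(values):
    """Standardize a feature: subtract the mean, divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

prices = np.array([5_000.0, 9_000.0, 13_500.0, 22_000.0, 45_000.0])
scaled = z_score(prices)
print(scaled.round(2))              # roughly [-0.98 -0.7  -0.38  0.22  1.84]
print(scaled.mean(), scaled.std())  # ~0.0 and 1.0 after standardization
```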
Suppose you're not sure whether the outliers truly are extreme. In this case, start with z-score, unless you have feature values that you don't want the model to learn, for example because the values are the result of measurement error or a quirk.
Summary
| Normalization Technique | Formula | When to Use |
|---|---|---|
| Linear Scaling | $$ x' = \frac{x - x_{min}}{x_{max} - x_{min}} $$ | When the feature is more-or-less uniformly distributed across a fixed range. |
| Clipping | If x > max, then x' = max; if x < min, then x' = min. | When the feature contains some extreme outliers. |
| Log Scaling | $$ x' = \log(x) $$ | When the feature conforms to a power law. |
| Z-score | $$ x' = \frac{x - \mu}{\sigma} $$ | When the feature distribution does not contain extreme outliers. |
Key terms: scaling (/machine-learning/glossary#scaling), normalization (/machine-learning/glossary#normalization)