Machine Learning Technical Reports (2014) 1(1): 1-6

Data Normalization and Standardization: A Technical Report
Peshawa Jamal Muhammad Ali*, and Rezhna Hassan Faraj
The Machine Learning Lab. at Koya University
Koya, Erbil, Iraq.
[email protected]

Cite the technical report:

Peshawa J. Muhammad Ali, Rezhna H. Faraj; “Data Normalization and Standardization: A Technical
Report” , Machine Learning Technical Reports, 2014, 1(1), pp 1-6.
https://docs.google.com/document/d/1x0A1nUz1WWtMCZb5oVzF0SVMY7a_58KQulqQVT8LaVA/edit#

Information about the publisher:

Machine Learning Technical Reports is a periodical technical report published by the Machine Learning Lab. at Koya
University
Koya University, Building of Faculty of Engineering, KOY45
Koya 44023, Erbil, F.R. of Iraq
Contact: +9647707578801/+9647501138655, email: [email protected]

Abstract
This paper aims to clarify how and why data are normalized or standardized. These two
processes are used in the data preprocessing stage, in which the data are prepared for later
processing by a data mining or machine learning technique such as a support vector machine
or a neural network. Both methods rescale the data set. They are helpful in some cases and
necessary in others, and most data mining and machine learning tools, such as Weka and
Matlab, include them. This paper defines the two techniques and presents their use.

Normalization
Normalization is the process of casting the data to a specific range, such as between 0 and 1
or between -1 and +1. Normalization is required when there are big differences in the ranges
of different features. This scaling method is useful when the data set does not contain
outliers. The theoretical background of normalization can be easily understood from
Figure (1). If the data must be cast to the range [0, 1], then:

$$\frac{\text{valueAfterNormalization} - 0}{1 - 0} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\frac{\text{valueAfterNormalization}}{1} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\text{valueAfterNormalization} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

or

$$x' = \frac{x - \min}{\max - \min}$$
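
The [0, 1] formula above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_01` and the sample values are assumptions made for the example, and the guard for a constant feature (max equal to min) is an addition that the report does not discuss.

```python
def normalize_01(values):
    """Rescale a sequence of numbers to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has no range, so the formula is undefined.
        raise ValueError("max equals min; the feature cannot be rescaled")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical sample data, used only for illustration.
data = [-20, -6, 0, 40, 70, 120]
scaled = normalize_01(data)
print(scaled)  # the smallest value maps to 0.0 and the largest to 1.0
```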

Denormalization
Denormalization should be carried out if normalization was applied and the values are needed
on their original scale. For example, to denormalize data from the range [0, 1], the following
equation can be used:

$$x = [x' \cdot (\max - \min)] + \min$$

where x′ is the normalized value and x is the denormalized value; min and max are the same
values used previously in the normalization process.
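
As a rough check of the denormalization equation, the sketch below normalizes a small list to [0, 1] and then maps it back. The function name `denormalize_01` and the sample data are hypothetical; the important point, taken from the text, is that the min and max stored at normalization time are reused.

```python
def denormalize_01(scaled, lo, hi):
    """Invert [0, 1] min-max scaling using x = x' * (max - min) + min."""
    return [x * (hi - lo) + lo for x in scaled]


# Hypothetical sample data, used only for illustration.
original = [-20, -6, 0, 40, 70, 120]
lo, hi = min(original), max(original)  # must be kept for the inverse step

scaled = [(x - lo) / (hi - lo) for x in original]
restored = denormalize_01(scaled, lo, hi)

# The round trip should reproduce the original values up to floating-point error.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(max_error)
```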

To normalize the data to the range -1, +1 see Fig(2):

$$\frac{\text{valueAfterNormalization} - (-1)}{1 - (-1)} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\frac{\text{valueAfterNormalization} + 1}{2} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\text{valueAfterNormalization} = 2 \cdot \left(\frac{\text{valueBeforeNormalization} - \min}{\max - \min}\right) - 1$$

or

$$x' = 2 \cdot \left(\frac{x - \min}{\max - \min}\right) - 1$$

Denormalization from range -1, +1

$$x = \left[\left(\frac{x' + 1}{2}\right)(\max - \min)\right] + \min$$
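
The [-1, +1] mapping and its inverse can be sketched together as follows. The function names `normalize_pm1` and `denormalize_pm1` and the sample data are assumptions made for this illustration, as is the guard against a constant feature.

```python
def normalize_pm1(values):
    """Rescale to [-1, +1] using x' = 2 * (x - min) / (max - min) - 1.

    Returns the scaled values together with min and max so the step can be inverted.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has no range, so the formula is undefined.
        raise ValueError("max equals min; the feature cannot be rescaled")
    return [2 * (x - lo) / (hi - lo) - 1 for x in values], lo, hi


def denormalize_pm1(scaled, lo, hi):
    """Invert the [-1, +1] scaling using x = ((x' + 1) / 2) * (max - min) + min."""
    return [(x + 1) / 2 * (hi - lo) + lo for x in scaled]


# Hypothetical sample data, used only for illustration.
original = [-20, -6, 0, 40, 70, 120]
scaled, lo, hi = normalize_pm1(original)
restored = denormalize_pm1(scaled, lo, hi)
print(scaled[0], scaled[-1])  # the minimum maps to -1.0 and the maximum to 1.0
```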

In WEKA, for the range [-1, +1], the formula is organized as follows:

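$$x' = 2 \cdot \left(\frac{x - \min}{\max - \min}\right) - 1$$

$$x' = \frac{x - \min}{\frac{\max - \min}{2}} - 1 = \left[\frac{x - \min - \left(\frac{\max - \min}{2}\right)}{\frac{\max - \min}{2}}\right]$$

$$x' = \left[\frac{x - \min - \frac{\max}{2} + \frac{\min}{2}}{\frac{\max - \min}{2}}\right] = \left[\frac{x - \frac{\max}{2} - \frac{\min}{2}}{\frac{\max - \min}{2}}\right]$$

$$x' = \left[\frac{x - \left(\frac{\max}{2} + \frac{\min}{2}\right)}{\frac{\max - \min}{2}}\right]$$

$$x' = \frac{x - \left(\frac{\max + \min}{2}\right)}{\frac{\max - \min}{2}}$$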
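
The rearrangement above says that scaling to [-1, +1] is the same as subtracting the midpoint of the range and dividing by half the range. A quick numerical check of that algebra is sketched below; the function names and sample data are hypothetical, and the code is not taken from WEKA itself.

```python
def scale_direct(x, lo, hi):
    """First form: x' = 2 * (x - min) / (max - min) - 1."""
    return 2 * (x - lo) / (hi - lo) - 1


def scale_midpoint(x, lo, hi):
    """Rearranged form: x' = (x - (max + min) / 2) / ((max - min) / 2)."""
    midpoint = (hi + lo) / 2
    half_range = (hi - lo) / 2
    return (x - midpoint) / half_range


# Hypothetical sample data, used only for illustration.
data = [-20, -6, 0, 40, 70, 120]
lo, hi = min(data), max(data)

# Largest disagreement between the two forms; expected to be floating-point noise.
largest_gap = max(abs(scale_direct(x, lo, hi) - scale_midpoint(x, lo, hi)) for x in data)
print(largest_gap)
```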

Z-score standardization
Standardization produces a data set with mean = 0 and standard deviation = 1. This
scaling method is useful when the data follow a normal (Gaussian) distribution; if the
data do not follow a normal distribution, standardization can cause problems.

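Example: -20, -6, 0, 40, 70, 120

$$\text{Mean} = \frac{-20 - 6 + 0 + 40 + 70 + 120}{6} = 34$$

$$sd = \sqrt{\frac{(-20-34)^2 + (-6-34)^2 + (0-34)^2 + (40-34)^2 + (70-34)^2 + (120-34)^2}{6}}$$

$$sd = 48.98979$$

Applying z-score standardization to the first value:

$$x' = \frac{x - \text{mean}}{sd} = \frac{-20 - 34}{48.98979} = -1.10227$$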

The other values are transformed in the same way, so the data set becomes:

-1.10227

-0.8165

-0.69402

0.122474

0.734847

1.755468

If you now calculate the mean and sd of these new values, you will find that the mean
is zero and sd = 1.
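
The worked example can be reproduced with Python's standard `statistics` module. This is a minimal sketch: it assumes the population standard deviation (`pstdev`, dividing by n), which is the version that matches the report's value of 48.98979; the sample standard deviation (dividing by n - 1) would give a different figure.

```python
import statistics

data = [-20, -6, 0, 40, 70, 120]

mean = statistics.mean(data)    # 34
sd = statistics.pstdev(data)    # population sd, about 48.98979

# z-score standardization: x' = (x - mean) / sd
z_scores = [(x - mean) / sd for x in data]
print([round(z, 5) for z in z_scores])

# After standardization the mean should be about 0 and the sd about 1.
z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)
print(z_mean, z_sd)
```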

Important note:

Normalization and standardization (N/S) are not appropriate where the raw measurement
itself is needed, or where the N/S is irreversible and therefore loses much of the
information in the raw measurement. This point is based on a note made by Kevin Hankins
([email protected]).

