Data Transformation by Andy Field

This document discusses different types of data transformations and their uses. It provides a table that lists common data transformations including log transformations, square root transformations, and reciprocal transformations. Each transformation is described in terms of its ability to correct for positive skew and unequal variances in data. The document also discusses using reverse score transformations to correct for negatively skewed data. It notes some issues to consider with transformations, such as not being able to take the log of zero or negative values.

Uploaded by

Gon Mart


CHAPTER 5 EXPLORING ASSUMPTIONS 155

TABLE 5.1 Data transformations and their uses

Data Transformation / Can Correct For

Log transformation (log(Xi)): Taking the logarithm of a set of numbers squashes the right tail of the distribution. As such it’s a good way to reduce positive skew. However, you can’t get a log value of zero or negative numbers, so if your data tend to zero or produce negative numbers you need to add a constant to all of the data before you do the transformation. For example, if you have zeros in the data then do log(Xi + 1), or if you have negative numbers add whatever value makes the smallest number in the data set positive.
Can correct for: positive skew, unequal variances

Square root transformation (√Xi): Taking the square root of large values has more of an effect than taking the square root of small values. Consequently, taking the square root of each of your scores will bring any large scores closer to the centre – rather like the log transformation. As such, this can be a useful way to reduce positive skew; however, you still have the same problem with negative numbers (negative numbers don’t have a square root).
Can correct for: positive skew, unequal variances

Reciprocal transformation (1/Xi): Dividing 1 by each score also reduces the impact of large scores. The transformed variable will have a lower limit of 0 (very large numbers will become close to 0). One thing to bear in mind with this transformation is that it reverses the scores: scores that were originally large in the data set become small (close to zero) after the transformation, but scores that were originally small become big after the transformation. For example, imagine two scores of 1 and 10; after the transformation they become 1/1 = 1, and 1/10 = 0.1: the small score becomes bigger than the large score after the transformation. However, you can avoid this by reversing the scores before the transformation, by finding the highest score and changing each score to the highest score minus the score you’re looking at. So, you do a transformation 1/(XHighest − Xi).
Can correct for: positive skew, unequal variances

Reverse score transformations: Any one of the above transformations can be used to correct negatively skewed data, but first you have to reverse the scores. To do this, subtract each score from the highest score obtained, or the highest score + 1 (depending on whether you want your lowest score to be 0 or 1). If you do this, don’t forget to reverse the scores back afterwards, or to remember that the interpretation of the variable is reversed: big scores have become small and small scores have become big!
Can correct for: negative skew
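
The four transformations in Table 5.1 can be sketched in a few lines of Python. This is a minimal illustration and not code from the book: the function names and the lowest_reversed parameter are hypothetical, the natural logarithm is an assumption (the table does not name a base), and the size of the shifting constant (enough to make the smallest score 1 for the log, 0 for the square root) is one reasonable reading of "add a constant". The order-preserving reciprocal reverses against the highest score + 1 rather than the highest score, because 1/(XHighest − Xi) is undefined for the highest score itself.

```python
import math


def log_transform(scores):
    """log(Xi + c); c is 0 unless zeros or negative scores are present."""
    lowest = min(scores)
    # Assumption: shift so that the smallest score becomes exactly 1.
    shift = 1 - lowest if lowest <= 0 else 0
    return [math.log(x + shift) for x in scores]


def sqrt_transform(scores):
    """sqrt(Xi + c); c is 0 unless negative scores are present."""
    lowest = min(scores)
    # Assumption: shift so that the smallest score becomes exactly 0.
    shift = -lowest if lowest < 0 else 0
    return [math.sqrt(x + shift) for x in scores]


def reverse_scores(scores, lowest_reversed=0):
    """Subtract each score from the highest score (or the highest + 1).

    lowest_reversed is the value the original highest score maps to:
    0 for 'highest - Xi', 1 for 'highest + 1 - Xi'.
    """
    top = max(scores) + lowest_reversed
    return [top - x for x in scores]


def reciprocal_transform(scores, keep_order=False):
    """1/Xi, optionally reversing first so large scores stay large."""
    if keep_order:
        # Reverse against highest + 1 so no reversed score is 0.
        scores = reverse_scores(scores, lowest_reversed=1)
    # A score of 0 at this point raises ZeroDivisionError.
    return [1 / x for x in scores]


if __name__ == "__main__":
    print(reciprocal_transform([1, 10]))   # [1.0, 0.1]
    print(reverse_scores([1, 2, 5]))       # [4, 3, 0]
    print(sqrt_transform([4, 9]))          # [2.0, 3.0]
    # Negatively skewed data: reverse first, then transform.
    print(log_transform(reverse_scores([1, 8, 9, 10], lowest_reversed=1)))
```

The first printed line reproduces the table's own example: scores of 1 and 10 become 1 and 0.1, so their order flips. Calling reciprocal_transform([1, 10], keep_order=True) reverses the scores first and so keeps the originally larger score larger.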

JANE SUPERBRAIN 5.1
To transform or not to transform, that is the question

Not everyone agrees that transforming data is a good idea; for example, Glass, Peckham, and Sanders (1972) in a very extensive review commented that ‘the payoff of normalizing transformations in terms of more valid probability statements is low, and they are seldom considered to be worth the effort’ (p. 241). In which case, should we bother?

The issue is quite complicated (especially for this early in the book), but essentially we need to know whether the statistical models we apply perform better on transformed data than they do when applied to data that violate the assumption that the transformation corrects. If a statistical model is still accurate even when its assumptions are broken it is said to be a robust test (section 5.7.4). I’m not going to discuss whether particular tests are robust here, but I will discuss the issue for particular tests in their respective chapters. The question of whether to transform is linked to this issue of robustness (which in turn is linked to what test you are performing on your data).

A good case in point is the F-test in ANOVA (see Chapter 10), which is often claimed to be robust (Glass et al., 1972). Early findings suggested that F performed as it should in skewed distributions and that transforming the data helped as often as it hindered the accuracy of F (Games & Lucas, 1966). However, in a lively
