
Dealing with the Normality requirement

Generally the 'Rolls-Royce' inferential statistics - the most powerful and sophisticated
ones - require DVs in the form of interval numerical scores and the assumption that
the population sampled has a particular distribution, the so-called 'normal' distribution
(and often other prerequisites as well). I.e. the mathematics of these tests assumes
that the population sampled has the scores of all the individual cases spread round the
average in the particular bell-shaped pattern that you see illustrated in all the books.
Given that distribution of scores in the population, the statistician then has formulae
that predict how often you would get samples of a given size with a particular average
if you picked them randomly from that population over and over again. This is what is
used to decide if your sample(s) is unusual or not, and hence if your sample suggests a
significant difference or whatever. Statistical tests etc. with this assumption are often
referred to as 'parametric' statistics; they generally come on the left of my flow charts.
Mainly they are the t tests and ANOVA/GLM.
What do we mean by 'distribution'? We mean the shape of the histogram when you
plot the possible scores on the bottom scale and the number of cases scoring each
score on the side scale. The shape of the distribution of a set of scores is not captured
either by the mean or the SD of the set: in practice you have to look at the histogram.
What does 'normal' mean? It just refers to a particular shape, also called the Gaussian
curve, with a known formula. It does not mean you 'normally' find this shape.
Why are such tests called 'parametric'? A parameter is a limit within which something
can operate. E.g. the police operate within parameters defined by the law - so for
example they are not relevant in disputes which involve civil rather than criminal law.
Similarly parametric tests work within limits defined by the distribution shape of the
populations. In principle they are not relevant where the populations sampled do not
have this distribution. But non-parametric tests work on populations with any
distribution shape (it is just a bit unfortunate that there do not exist non-parametric
tests that are as versatile and powerful as the parametric ones, otherwise it would be
simpler to use those all the time!).

How do we know if the distribution has the normal shape?


The problem of course is how to know what the distribution of scores in a population
is like, since usually nobody has measured the entire population and drawn a
histogram of scores for the whole population, and seen what shape it is. The normal
distribution is known to arise commonly in the natural world, but certainly can't be
assumed to arise everywhere.
Two common ways of checking are:

* Make histograms of the scores for each group on each variable. Get SPSS to
insert the normal curve shape (Elements → Show distribution curve). Decide
by inspection if the histograms fit the normal shape. BUT remember it is the
population that has to have this shape, not the samples and common sense
says we can't take it that the distribution in the population is exactly that in the
sample. The usual common sense about sampling tells us that one might get
samples whose distribution looks quite different from that of the population
they are from.

* Do a side investigation to answer the question 'If the population is normally
distributed, how often would you get the distribution in my sample coming up
in random samples taken from it?'. If the answer is "often", we'll take it that
the population is normally distributed. If the answer is "rarely" (taking perhaps
5% or less of the time as = "rarely"), then we will say not. Put more formally,
you test the hypothesis that there is no difference between the 'observed'
distribution in the sample and an 'expected' normal distribution. There are
several inferential and other techniques that can be used to tackle this
question. The best and simplest I find is the one sample Kolmogorov-Smirnov
(K-S) test, using the SPSS program (look in Analyze → Nonparametric). Use
it on each separate subgroup of subjects if you have several. You end up with a
probability (between 0 and 1) that a sample is from a normally distributed
population. In interpreting this probability, unlike in the use of many
inferential tests, you are usually "wanting" a p greater than .05 (or 5%) rather
than less. It is usually helpful for you if the null hypothesis is confirmed, so
you conclude that your sample's distribution is no different from the normal
distribution, but for departures just due to sampling. Then you can feel
justified in using parametric inferential stats to compare groups etc..
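For readers who work outside SPSS, here is a minimal sketch of the same check in Python with scipy; the response times are invented purely for illustration, and the sample is compared against a normal curve built from the sample's own mean and SD.

```python
# A minimal sketch (not SPSS): one-sample K-S check for normality using scipy.
# The response time values below are invented for illustration only.
import numpy as np
from scipy import stats

scores = np.array([512, 640, 583, 720, 455, 610, 698, 530, 575, 660])  # hypothetical times in ms

# Compare the sample against a normal distribution whose mean and SD
# are taken from the sample itself.
mean, sd = scores.mean(), scores.std(ddof=1)
result = stats.kstest(scores, 'norm', args=(mean, sd))

# p > .05 -> no significant departure from normality detected,
# so parametric tests are usually considered acceptable.
print(f"K-S statistic = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```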

What to do if the shape is not normal


In practice, researchers take a variety of lines. Choosing parametric tests or not is
often a matter for judgment, not maths.
* Some say, "If previous people working in the same area have assumed the normal
distribution, I will too without further ado".
* Others say "It may be a false assumption, but I'll assume it anyway since in practical
use of statistics one is always violating some assumptions anyway, and there is some
research that shows that you don't go far wrong if you use parametric stats where the
population distribution is not exactly normal".
* Others (e.g. Langley) again say, "Let's be cautious and never assume it without good
evidence, and there is research that shows that you don't lose much in power by using
non-parametric stats instead". If in doubt choose 'No'! The problem is, what if there is
no nonparametric test to use instead?
One of the above is the easiest choice for the beginner. The next requires work!
* Sometimes the distribution of the population is known from a lot of past work to be
of some specific shape that is definitely not the 'normal' one. Or you feel able to
assume from the distribution of your sample(s) that your population has a non-normal
shape for some reason. Often known shapes that are not normal can be "made into"
the normal shape by transforming the scores of each case. Some researchers do such
mathematical conversions of all the scores in their sample(s) to achieve this and then
use parametric stats on the transformed scores. The commonest conversions of data to
make them more normal are:
Percentage scores (e.g. subjects each get a % correct score for use of
the -s third singular), and other scores on scales with two fixed ends
(like 5 point rating scales): use the arcsine transformation.
Response time scores, which are of course on a scale fixed at one end
(0 milliseconds) but extending indefinitely at the other end: use the
logarithmic (log) transformation.
What is a transformation? A 'transformation' mathematically is any conversion of
scores to a related set of scores, following some mathematical rule. So for instance if

you make a new set of scores that is the original ones multiplied by two, or with one
subtracted from all of them, those are simple transformations - but not one that
changes the distribution shape. Re-expressing a set of scores as percent out of some
total is another. Arcsine and logarithmic transformations simply involve more esoteric
mathematical functions which have special effects on distribution shape.
In SPSS you can create new columns of figures from existing ones using
Transform...Compute then enter a name for the new target variable and in numeric
expression put a suitable formula including the variable to be transformed.
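For comparison, the same idea written as code rather than menus might look like the sketch below (Python with pandas; the column names and scores are purely illustrative): a new column is computed from an existing one by applying a formula, just as Transform...Compute does.

```python
# Illustrative only: creating transformed columns from an existing one,
# paralleling SPSS Transform...Compute (data and names are invented).
import pandas as pd

df = pd.DataFrame({"score": [3, 7, 12, 9, 15]})    # invented raw scores out of 15

df["doubled"] = df["score"] * 2           # simple linear transformation (shape unchanged)
df["percent"] = df["score"] / 15 * 100    # re-expressed as a percentage of the maximum
print(df)
```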
Scores on a scale fixed at one end: the log transformation
Sometimes where a histogram shape of a sample (i.e. the 'distribution' of scores) is not
very obviously 'normal', it may have a recognised non-normal shape which can be
transformed into 'normal'. This then makes it more appropriate to use the stats
that assume the normal distribution shape (so-called 'parametric' tests like t tests and
ANOVA).
For example, response times in psycholinguistic research are often positively 'skewed':
the main heap of scores sits towards the left of the scale with a long tail stretching to
the right. They are usually recorded in milliseconds, and also incidentally often
found referred to as reaction times, or 'latencies'. For example in resptim.sav there are
a whole lot of response times gained from a single subject called Ettore in a
psycholinguistic experiment by a student working on the processing of regular and
irregular verb forms in Italian. Subjects had to say as fast as possible whether the
forms existed or not as they appeared on the computer screen.
Load the file and see the shape of the histogram.
Why do you think it is that the main 'heap' is not centred on the scale
with symmetrical tails on both sides? Think of the nature of response
time data.
A heap on the left of the distribution of scores is also called 'positive
skew', a heap on the right 'negative skew'. Which have we here?
Check with the one sample K-S test to see if the distribution could be
from a population with the so-called 'normal' shape.
Now try using the log transformation:
Log transformed score = logarithm of raw score
Obtain the logs of the original scores in a new column. From the top menu use
Transform.... Compute... and choose either the LN(numexpr) or LG10(numexpr)
function from the list. Click it into the numeric expression box and replace the word
numexpr in the function with the resptime variable. In the target variable box enter
a name for the new column (e.g. logrt). Then OK.
Look at the histogram of the transformed set of scores. Is it more
normal looking?
Try the K-S test. Does the new distribution pass the test for being a
likely one to be obtained from a population with a normal distribution?
LN stands for 'natural logarithms' and LG10 for logarithms to base ten.
Try both and see if it makes any difference to the histogram or K-S test
result.
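If you prefer to see these steps as code rather than SPSS menus, a sketch along the following lines (Python; the response times and the variable name resptime are just illustrative) does the same job and re-runs the normality check on the logged scores.

```python
# A sketch of the log transformation and a re-check of normality (illustrative data only).
import numpy as np
from scipy import stats

resptime = np.array([480, 520, 555, 610, 690, 750, 820, 980, 1250, 1900])  # ms, positively skewed

logrt_ln = np.log(resptime)      # natural log, the counterpart of SPSS LN(numexpr)
logrt_10 = np.log10(resptime)    # log to base ten, the counterpart of SPSS LG10(numexpr)

# One-sample K-S check on the transformed scores, against a normal curve
# with the transformed sample's own mean and SD.
mean, sd = logrt_ln.mean(), logrt_ln.std(ddof=1)
print(stats.kstest(logrt_ln, 'norm', args=(mean, sd)))
```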

For those who don't know what logarithms are..... they convert numbers in the
following way. Logs to base ten (LG10 or log10) re-express a number as the power to
which ten would have to be raised to be the same value. So LG10 of 100 is 2 (because
10^2 = 100); LG10 of 1000 is 3 (because 10^3 = 1000). Other logs are intervening
figures: e.g. LG10 of 200 = 2.301.
On that system, LG10 of 10 would be.... what? LG10 of 1 would be...?
Can you see how this conversion makes the differences between larger
figures on the original scale smaller on the new scale, so it contracts
the right hand side of a histogram and moves a heap on the left hand
side of a distribution more towards the middle?
Natural (or Napierian) logarithms (LN) work exactly the same except that the base
figure is not 10, but a figure that has special mathematical properties, usually referred
to as e, which = approximately 2.7183. So the original numbers get re-expressed as
the power to which 2.7183 would have to be raised to be equivalent. E.g. LN of 100 =
4.605 because 2.7183^4.605 ≈ 100. (This peculiar figure e is as important in maths as
another peculiar figure you possibly know better, pi. While pi has all sorts of uses
when calculating things to do with circles, e, the exponential constant, has great
importance in areas of maths where one calculates growth. E.g. it helps you calculate
the compound interest on an investment. But that is irrelevant here). In principle one
can make logs to any base you like, but these two types are the commonest.
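The figures above are easy to verify yourself, for example with Python's standard math module:

```python
# Quick check of the logarithm examples given above.
import math

print(math.log10(100))    # 2.0, because 10^2 = 100
print(math.log10(1000))   # 3.0, because 10^3 = 1000
print(math.log10(200))    # about 2.301
print(math.log(100))      # about 4.605, because 2.7183^4.605 is roughly 100
```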
Percent scores: the arcsine transformation
The arcsine transformation is a trigonometric function concerned with the
relationships between the lengths of sides of right-angled triangles and the angles
between them. If you have never done trigonometry, then skip to the next paragraph.
If you have, then you will have come across sines. The sine of an angle in a right
angled triangle is the length of the side opposite the angle divided by the long side of
the triangle (hypotenuse). Maybe you remembered that with the mnemonic OHMS
(opposite over hypotenuse means sine). A little thought shows that with an angle of 0
degrees (or 0 radians) the sine will be 0. With an angle of 90 degrees
(or π/2 radians = 3.1416/2 ≈ 1.57 radians) the sine of the angle will be 1. And
with a negative angle of 90 degrees (or −1.57 radians) the sine will be −1. The
arcsine is the reverse function of the sine. So since sine 0 = 0, then arcsine 0 = 0; as
sine 1.57 radians = 1, then arcsine 1 = 1.57; and so on.
However, for our purposes, all we need to know is that this transformation stretches
the ends of a fixed-end scale but leaves the middle relatively untouched. So it tends to
get rid of ceiling or floor effects, i.e. where your distribution of scores is not normal
because it is bunched in a skewed way at the top or bottom of the scale.
Things are complicated by the fact that the arcsine transformation requires input in the
form of a scale running within the limits -1 to +1, so any fixed-ended scale that you
want transformed has to be converted first to that. A commonly used form of the
arcsine transformation that gets over such difficulties (Rietveld and van Hout 1993
p126) is as follows:
Arcsine transformed score =
2 x arcsine of (square root of (raw score/max possible score))
In SPSS you get this via Transform → Compute. You will find ARSIN(numexpr)
and SQRT(numexpr) available in a parallel way to LN(numexpr) described above.

If your scores to be transformed are in a column called rscore and all out of 15, then
you create a formula of the type: 2*ARSIN(SQRT(rscore/15)). If the scores are
percentage scores, then the formula is 2*ARSIN(SQRT(rscore/100)), and so on.
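Outside SPSS, the same formula can be sketched as follows (Python; the scores and the maximum of 15 are invented):

```python
# A sketch of the arcsine transformation, mirroring 2*ARSIN(SQRT(rscore/15)).
import numpy as np

rscore = np.array([2, 5, 9, 12, 15])              # invented scores, each out of 15
arcscor = 2 * np.arcsin(np.sqrt(rscore / 15))     # transformed scores run from 0 to pi (about 3.14)

# For percentage scores the divisor would be 100 instead of 15.
print(arcscor)
```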
The new scale always runs between 0 and 3.14. To see the effect, I here show what
some percent scores come out as:
Original % score (rscore)    Arcsine transformed score (arcscor)
  0                           0.00
 10                           0.64
 20                           0.93
 30                           1.16
 40                           1.37
 50                           1.57
 60                           1.77
 70                           1.98
 80                           2.21
 90                           2.50
100                           3.14
See how the original scale is stretched at the ends and so, if scores were bunched at
one end (floor or ceiling effect), the shape of the distribution would get stretched on
the side nearest the end of the scale and so end up more normal-like.
For data where scores at the exact ends of the scale are not observed, the log
probability ratio transformation has a similar effect. For %correct scores it would take
the form:
Transformed score = Log10 (%correct/%incorrect)
It stretches the ends rather more than the arcsine transformation, but cannot cope with
scores of 100% or 0%.
Original % score (rscore)    Log probability ratio (lgpscor)
   .00001                    -7
  10                         -.95
  20                         -.6
  30                         -.37
  40                         -.18
  50                          0
  60                          .18
  70                          .37
  80                          .6
  90                          .95
  99.99999                    7

The two transformations thus have clearly different effects: the log-based transformation
has a much more drastic stretching effect near the ends of the scale than the arcsine
transformation does.
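A quick computation like the following sketch (Python; the grid of percent scores is arbitrary and the values are computed here, not taken from the tables above) shows the contrast directly.

```python
# Comparing the arcsine and log probability ratio transformations across the percent scale.
import numpy as np

pct = np.arange(5, 100, 5, dtype=float)             # 5%, 10%, ..., 95%
arcsine = 2 * np.arcsin(np.sqrt(pct / 100))
logratio = np.log10(pct / (100 - pct))

for p, a, l in zip(pct, arcsine, logratio):
    print(f"{p:5.0f}%   arcsine = {a:4.2f}   log ratio = {l:6.2f}")
```

The printout shows both scales changing gently in the middle, but the log ratio values racing away near 0% and 100%, which is the 'more drastic stretching' referred to above.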
