Statistics Normality
Transformations
Definition – (Non)Parametric
Parametric statistics assume that
data come from a normal
distribution, and make inferences
about the parameters of that
distribution. These statistical
tests compare the means
(central tendency) of the
distributions, relative to their
variability (spread).
Non-parametric statistics do not depend on fitting a
parameterized distribution such as the normal. These
statistical tests compare the medians (the value below
which 50 % of the data fall) and the ranks of the
observations across the samples.
The Normal Distribution
X ~ N (µ, σ)
In a normal distribution:
• ~ 68% of observations within 1 standard deviation of the mean
• ~ 95% within 2 standard deviations
• ~ 99.7% within 3 standard deviations
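The coverage figures for a normal distribution can be checked numerically. The sketch below is in Python rather than R, purely for illustration; the function name `coverage` is hypothetical. It uses the identity that the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def coverage(k: float) -> float:
    """Probability that a normal observation lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.4f}")
```

The result does not depend on µ or σ, which is why the rule applies to any normal distribution.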
Assessing Normality
Three ways to assess the normality of the data
• 1) Graphical Displays
– Histogram, Density plot, Boxplot, Q-Q Plot
• 2) Skewness / Kurtosis
– Are they different from 0 (the value expected for a normal distribution)?
• 3) Normality Test
– Shapiro-Wilk
OPTIONS tab:
Select the type and the
parameters of the theoretical
distribution to compare against.
Default: “Normal”
Assessing Normality
Q-Q Plot: quantile / quantile plot
- Numerical summaries
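A Q-Q plot pairs each sorted observation with the quantile expected under the theoretical distribution; points close to a straight line suggest the data follow that distribution. The following is a minimal Python sketch of how those pairs could be computed for a normal reference. The function name `qq_points`, the sample values, and the plotting positions (i - 0.5) / n are assumptions for illustration, and Rcmdr may use a different convention.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n: one common convention among several.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]  # hypothetical values
for t, o in qq_points(sample):
    print(f"{t:6.2f}  {o:6.2f}")
```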
Normality Test with Rcmdr
Test of Normality
Select data
Use Shapiro-Wilk
Case 2.
The two samples
are different. They
are not from the
same population
Summary - Parametric Statistics
Benefits and Costs:
B. Skewness:
Measures the asymmetry of the distribution. A value of
zero indicates symmetry. As a rule of thumb, an absolute
skewness > 1 indicates a non-normal, skewed distribution.
C. Kurtosis:
Measures how heavy the tails of the distribution are
relative to a normal distribution. An (excess) kurtosis of
zero matches a normal distribution. As a rule of thumb, an
absolute value > 1 indicates a non-normal distribution
(tails too heavy or too light).
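To make the two measures concrete, here is a small Python sketch using the simple moment-based formulas. The function names and the data are hypothetical, and R packages may apply sample-size corrections, so their values can differ slightly from these.

```python
def _central_moment(xs, order):
    """Average of (x - mean) ** order over the data."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: 0 for a symmetric sample."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical data
skewed = [1, 1, 1, 2, 2, 3, 5, 14]     # hypothetical data, long right tail

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, round(skewness(data), 2), round(excess_kurtosis(data), 2))
```

The long right tail of the second sample pushes its skewness above the rule-of-thumb threshold of 1.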
Summary – Approach
Suggested Approach:
Figure: distribution of x (before) and of f(x) (after) transformation
Square Root Transformations
MONOTONIC TRANSFORMATIONS: x → f(x)
Power exponents:
½ power (square root)
Figure: data before and after the square root transformation
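The effect of a square root transformation on positively skewed data can be sketched as follows. This is an illustrative Python example with made-up count data; the skewness function uses the simple moment-based formula, which is an assumption and may differ from what Rcmdr reports.

```python
import math

def skewness(xs):
    """Moment-based skewness of a sample."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

before = [0, 1, 1, 2, 2, 3, 4, 6, 9, 16, 25]   # hypothetical right-skewed counts
after = [math.sqrt(x) for x in before]          # 1/2 power transformation

print("skewness before:", round(skewness(before), 2))
print("skewness after: ", round(skewness(after), 2))
```

The transformation compresses large values more than small ones, which pulls in the right tail while keeping the observations in the same order.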
Data Transformations – For Proportions
Arcsine / Arcsine-squareroot transformation
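The arcsine-squareroot transformation takes a proportion p between 0 and 1 and computes asin(sqrt(p)). A minimal Python sketch follows; the function name `arcsine_sqrt` and the example proportions are hypothetical. The result is in radians, ranging from 0 to π/2.

```python
import math

def arcsine_sqrt(p: float) -> float:
    """Arcsine-squareroot transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))

proportions = [0.0, 0.05, 0.5, 0.95, 1.0]  # hypothetical data
transformed = [arcsine_sqrt(p) for p in proportions]
print([round(t, 3) for t in transformed])
```

Percentages would need to be divided by 100 first, since the transformation is only defined for values between 0 and 1.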
Data > Manage variables in active data set > Compute New Variable
Numeracy: positive skew → Log (Numeracy)
Summary: Min = 1, Max = 14
NOTE: In R:
log() = natural log (Ln)
log10() = base-10 log (Log)
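The same naming applies in Python's `math` module, which makes it convenient for a quick check: `math.log` is the natural logarithm and `math.log10` is the base-10 logarithm. The sketch below applies both to the minimum and maximum reported above (1 and 14); the variable names are hypothetical.

```python
import math

numeracy_range = (1, 14)  # Min and Max from the summary above

natural = [math.log(x) for x in numeracy_range]    # ln, like log() in R
base10 = [math.log10(x) for x in numeracy_range]   # like log10() in R

print("ln:   ", [round(v, 3) for v in natural])
print("log10:", [round(v, 3) for v in base10])
```

Both versions require strictly positive values; a minimum of 1 means the log can be taken directly, without adding a constant first.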
Hints for Computing New Variables
asin = arcsine
Most Important Rule: Do not Reverse the Order of the Values (larger remains
larger… smaller remains smaller)
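One way to check this rule for a candidate transformation is to apply it to the sorted distinct values and confirm the results are still increasing. The Python sketch below is illustrative only; the helper name `preserves_order` and the example values are hypothetical.

```python
import math

def preserves_order(values, transform):
    """True if the transform keeps the distinct values in the same order."""
    ordered = sorted(set(values))
    transformed = [transform(v) for v in ordered]
    return all(a < b for a, b in zip(transformed, transformed[1:]))

values = [1, 2, 4, 9, 14]  # hypothetical positive data

print(preserves_order(values, math.sqrt))         # square root keeps the order
print(preserves_order(values, math.log))          # log keeps the order
print(preserves_order(values, lambda x: 1 / x))   # reciprocal reverses it
```

Squaring is a further caution: it preserves order for positive data but not when the values include negatives, since -2 and 2 map to the same result.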