Normal Distribution PPT With Assignment 1 Without Answers

The document discusses the normal distribution and how to determine whether sample data are approximately normally distributed. It provides guidelines on using statistics like skewness, kurtosis, and z-scores to assess normality and identify outliers. According to the central limit theorem, the sampling distribution tends to be normal for samples larger than 30. The document also cautions that normality tests are not reliable for large samples.

The Normal Distribution

Assumption of Normality
• Many statistical tests (t, ANOVA) assume that the sampling
distribution is normally distributed.
• This is a problem because we don’t have access to the sampling
distribution.
• But if the sample data are approximately normal, we can expect
the sampling distribution to be normal as well.
• Also, according to the central limit theorem, in large samples
(n > 30) the sampling distribution tends to be normal, regardless
of the shape of the data in our sample.
• Our task is to decide when a distribution is approximately
normal.
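The large-sample claim can be illustrated with a small simulation. The sketch below is not part of the original slides: it assumes a hypothetical, strongly skewed population (exponential with mean 1) and compares the skew of the raw scores with the skew of the means of many samples of size 30. The helper name `skewness` and the seed are illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: third central moment divided by sd cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential scores with mean 1.
raw_scores = [rng.expovariate(1.0) for _ in range(20000)]

# Approximate the sampling distribution of the mean for n = 30
# by drawing 5000 independent samples and keeping each sample mean.
n = 30
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(5000)
]

raw_skew = skewness(raw_scores)
mean_skew = skewness(sample_means)
print(f"skew of raw scores:   {raw_skew:.2f}")
print(f"skew of sample means: {mean_skew:.2f}")
```

The sample means should come out far less skewed than the raw scores, although with n = 30 and a population this skewed some skew typically remains.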
A Z Score is a Standardized Statistic

95.0% of the scores fall between a Z of -1.96 to +1.96
99.7% of the scores fall between a Z of -3.00 to +3.00
99.9% of the scores fall between a Z of -3.30 to +3.30
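These percentages can be recomputed from the standard normal distribution. A minimal sketch using Python's standard library follows; the function name `coverage` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(z):
    """Proportion of a normal distribution lying between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Print the share of scores inside each of the three Z ranges.
for z in (1.96, 3.00, 3.30):
    print(f"Z between -{z:.2f} and +{z:.2f}: {coverage(z) * 100:.1f}%")
```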
How do we decide if a distribution is approximately normal?

99.9% of the scores fall between a Z of -3.30 to +3.30
Normality Statistics are not Reliable for Large Samples

|Z| > 1.96 is significant at p < .05 (two-tailed)
|Z| > 2.58 is significant at p < .01 (two-tailed)
|Z| > 3.29 is significant at p < .001 (two-tailed)
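The cutoffs are the two-tailed critical values of the standard normal distribution; the .05 cutoff comes out at about 1.96. The sketch below shows one way to recover them, and to go the other way from a Z to its p-value. The function names are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()

def two_tailed_critical_z(alpha):
    """Z cutoff that leaves alpha / 2 in each tail of the normal curve."""
    return std_normal.inv_cdf(1 - alpha / 2)

def two_tailed_p(z):
    """Two-tailed p-value for an observed Z statistic."""
    return 2 * (1 - std_normal.cdf(abs(z)))

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: |Z| > {two_tailed_critical_z(alpha):.2f}")
```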

Significance tests for normality (Kolmogorov-Smirnov, Shapiro-Wilk,
skew, kurtosis) should not be used in large samples, because they
are likely to be significant even when skew and kurtosis are not
too different from normal.
Small Data Sets (n<20) and Normality?
-.91
-.04
-.28
-.36
-1.86
-2.99
-.32
1.63
-.19
-.32

Small data sets are adversely affected by occasional
extreme scores, even when the extreme score is
less than a Z of 3.3.
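One reason a fixed Z cutoff is a weak safeguard in small samples is that the largest Z a single score can reach is limited by the sample size. When Z is computed with the sample standard deviation (n - 1 denominator), no score can exceed (n - 1) / sqrt(n). The sketch below checks this on hypothetical data; the numbers are invented for illustration.

```python
import math
import statistics

def max_possible_z(n):
    """Largest |Z| any single score can reach in a sample of size n,
    when Z uses the sample standard deviation (n - 1 denominator)."""
    return (n - 1) / math.sqrt(n)

# Hypothetical sample: nine identical scores plus one very extreme score.
scores = [50.0] * 9 + [500.0]
mean = statistics.fmean(scores)
sd = statistics.stdev(scores)
z_extreme = (scores[-1] - mean) / sd

# Smallest sample size at which a Z above 3.3 becomes possible at all.
min_n = next(n for n in range(2, 100) if max_possible_z(n) > 3.3)

print(f"Z of the extreme score (n = 10): {z_extreme:.3f}")
print(f"upper bound for n = 10:          {max_possible_z(10):.3f}")
print(f"smallest n allowing |Z| > 3.3:   {min_n}")
```

With ten scores, even a value ten times larger than all the others stays below the 3.3 cutoff, so the plots and the skew and kurtosis values carry more weight than the Z rule in small data sets.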
Approximately Normal?

We will use the following to determine if a distribution
is approximately normal:

1. Q-Q Plot values should lie close to the 45° line.
2. Distribution should be similar in shape to the
normal curve.
3. Skew & Kurtosis should be reasonably close to 0.
4. Data points with a Z score > +3.3 or < -3.3 will be
considered as outliers and removed.
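Criteria 3 and 4 are numeric, so they can also be checked outside SPSS. The sketch below is an illustration only: it uses the bias-adjusted sample formulas for skew and excess kurtosis (the ones behind Excel's SKEW and KURT, assumed here to match what SPSS reports), and the data and function names are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Bias-adjusted sample skewness (0 for a symmetric sample)."""
    n = len(values)
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (close to 0 for normal data)."""
    n = len(values)
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    z4 = sum(((v - m) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def outlier_indices(values, cutoff=3.3):
    """Positions of scores whose |Z| exceeds the cutoff."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs((v - m) / s) > cutoff]

# Hypothetical scores: 30 evenly spread values plus one extreme score.
scores = list(range(40, 70)) + [400]
print("skew:           ", round(sample_skewness(scores), 3))
print("excess kurtosis:", round(sample_excess_kurtosis(scores), 3))
print("outlier indices:", outlier_indices(scores))
```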
Settings to Superimpose Normal Curve on Histogram

Histogram with Normal Curve

Distribution may be leptokurtic (peaked).

Positively skewed?

Need to run Explore.

Explore Settings
We are 95% sure that the
true mean lies between
58.721 and 61.7085.

The distribution is
positively skewed.

The distribution is
leptokurtic.

This may be an outlier. We can use the Descriptives
command to generate Z scores.
Don’t use the K-S test for
normality.

If the Shapiro-Wilk p-value is less than 0.05, the data may
not be normal. But it is not a perfect test.

For small data sets (n < 100), if S-W has a p < 0.001
the data may not be normal.

The test is unreliable for large data sets (n > 100).
The circles should fall on the 45 degree line. For this data set
the ends are deviating from the line, again suggesting a
problem with normality.
Case number 282 has a star, indicating that it is an extreme score.

An extreme value, E, is defined as a value that lies more than 3 box-lengths
below (or above) the box. We need to convert the data to Z scores to examine
the Z for case 282.
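The box-length rule can be sketched in a few lines. This is an approximation rather than a reproduction of the SPSS output: it takes the box length to be the interquartile range from Python's `statistics.quantiles`, and SPSS may compute the box edges (hinges) slightly differently. The data are hypothetical.

```python
import statistics

def extreme_values(values, box_lengths=3.0):
    """Scores lying more than `box_lengths` box lengths beyond the box."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    box = q3 - q1                                  # length of the box (IQR)
    low = q1 - box_lengths * box
    high = q3 + box_lengths * box
    return [v for v in values if v < low or v > high]

# Hypothetical reaction times with one very slow response.
scores = list(range(40, 70)) + [400]
print(extreme_values(scores))
```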
Check this box to generate Z
scores for any variables in the
Variable(s) box.

The output is shown on the next slide. It places the Z score
in your Data Sheet by adding a Z in front of the variable(s)
selected.
The new column
ZReactionTime
contains the Z scores
for ReactionTime.
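For readers without SPSS, the same standardized column can be sketched directly. The values below are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed to be what SPSS uses when it saves Z scores.

```python
import statistics

def z_scores(values):
    """Standardize scores: subtract the mean, divide by the sample sd."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical ReactionTime values, not the data from the slides.
reaction_time = [52.0, 58.0, 60.0, 61.0, 63.0, 66.0]
z_reaction_time = z_scores(reaction_time)

for raw, z in zip(reaction_time, z_reaction_time):
    print(f"{raw:6.1f} -> Z = {z:+.3f}")
```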

The Z score for case 282 is 4.944. Since the value is
much greater than our arbitrary cutoff of 3.3, we will
delete this data point and then re-run Explore.
Copy the column, rename
the new variable
RTimeTrimmed, then
delete case 282.
The top descriptive
output is the original
data set.

Here is the data set with case 282 removed. Look
at the changes in the 95% CI, Skew and Kurtosis.
The Shapiro-Wilk is
significant,
indicating there may
be a problem with
normality.

Looks like cases 84 and 100 may be the cause.

We need to
generate Z scores
again.
The Q-Q plot and the box plot both suggest a problem. We need to run Z
scores to look at case 100 & 84.
Compute Z scores for the
RTimeTrimmed variable.

The new variable will be in the data sheet labeled
ZRTimeTrimmed.
Both cases 84 & 100 have a Z score above 3.3, our
arbitrary cutpoint.

We can delete them and then run Explore.

Copy column RTimeTrimmed, then make a new variable
RTimeTrimmed2.
The original data set is
on the top.

RTimeTrimmed2
now has 3 data
points that have
been deleted.
We still may
have
normality
problems.
Looking better, run Z scores again on
RTimeTrimmed2, check cases 235 & 290.
The Z score for case 235 is 3.17, and for case 290 is 2.92. They are both
below our arbitrary cutoff of 3.3 for a Z score. The distribution is now
approximately normally distributed.
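The walk-through above repeats one cycle: compute Z scores, delete any score beyond 3.3, then recompute. A minimal sketch of that loop follows. It is an automated stand-in for the manual SPSS steps, not the procedure from the slides itself; the data, the function name `trim_outliers`, and the `max_rounds` safety limit are all illustrative assumptions.

```python
import statistics

def trim_outliers(values, cutoff=3.3, max_rounds=10):
    """Repeatedly drop scores with |Z| > cutoff, recomputing Z each round.

    Returns the kept scores and the removed scores in order of removal.
    """
    kept = list(values)
    removed = []
    for _ in range(max_rounds):
        if len(kept) < 3:
            break
        m = statistics.fmean(kept)
        s = statistics.stdev(kept)
        if s == 0:
            break
        flagged = [v for v in kept if abs((v - m) / s) > cutoff]
        if not flagged:
            break  # every remaining score is inside the cutoff
        removed.extend(flagged)
        kept = [v for v in kept if abs((v - m) / s) <= cutoff]
    return kept, removed

# Hypothetical scores: one huge value hides two milder outliers at first.
scores = list(range(40, 70)) * 3 + [120, 122, 400]
kept, removed = trim_outliers(scores)
print("removed, in order:", removed)
print("scores kept:", len(kept))
```

As in the slides, the most extreme score inflates the standard deviation and masks the others, so they only cross the cutoff once it has been removed. Each deletion should still be a judgment call, since the 3.3 cutoff is arbitrary.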
Assignment: Using the data found in slide 27, test
with the SPSS program whether the distribution of
Math test scores is approximately normal. As
your guide, follow the instructions found in slides
7-13. Be sure to print the results of the 4 test
outputs required in slide 7 and explain the
results. (Use short bond paper.)

Deadline of submission: 3rd Saturday of July 2018
Math Test Result

67 45 68 70
72 85 90 99
50 73 77 78
52 66 89 75
Interpretation: Skewness
• If skewness is positive, the data are positively skewed or
skewed right, meaning that the right tail of the
distribution is longer than the left. If skewness is
negative, the data are negatively skewed or skewed left,
meaning that the left tail is longer.

• If skewness = 0, the data are perfectly symmetrical.


Interpretation: Skewness
• But a skewness of exactly zero is quite unlikely for real-
world data, so how can you interpret the skewness
number? Bulmer (1979) — a classic — suggests this
rule of thumb:
• If skewness is less than −1 or greater than +1, the
distribution is highly skewed.
• If skewness is between −1 and −½ or between +½ and
+1, the distribution is moderately skewed.
• If skewness is between −½ and +½, the distribution is
approximately symmetric.
• With a skewness of −0.1098, the sample data for student
heights are approximately symmetric.
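Bulmer's rule of thumb translates directly into a small lookup. In this sketch the function name is illustrative, and the handling of values exactly at ±½ or ±1 is an assumption, since the rule as quoted does not settle which band they belong to.

```python
def describe_skewness(skew):
    """Label a skewness value using Bulmer's rule of thumb."""
    size = abs(skew)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:  # assumption: exactly 0.5 counts as moderate
        return "moderately skewed"
    return "approximately symmetric"

for value in (-0.1098, 0.7, -1.4):
    print(f"skewness {value:+.4f}: {describe_skewness(value)}")
```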
Kurtosis
• Traditionally, kurtosis has been explained in
terms of the central peak. You’ll see statements
like this one: Higher values indicate a higher,
sharper peak; lower values indicate a lower, less
distinct peak. Balanda and MacGillivray (1988)
also mention the tails: increasing kurtosis is
associated with the “movement of probability
mass from the shoulders of a distribution into its
center and tails.”
Kurtosis
• The reference standard is a normal distribution, which has
a kurtosis of 3. For this reason, the excess kurtosis is often
presented instead: excess kurtosis is simply kurtosis−3. For
example, the “kurtosis” reported by Excel is actually the
excess kurtosis.

• A normal distribution has kurtosis exactly 3 (excess
kurtosis exactly 0). Any distribution with kurtosis ≈3
(excess ≈0) is called mesokurtic.
Kurtosis
• A distribution with kurtosis <3 (excess kurtosis <0) is called
platykurtic. Compared to a normal distribution, its tails are
shorter and thinner, and often its central peak is lower and
broader.

• A distribution with kurtosis >3 (excess kurtosis >0) is called
leptokurtic. Compared to a normal distribution, its tails are
longer and fatter, and often its central peak is higher and
sharper.
Kurtosis

• Uniform(min=−√3, max=√3): kurtosis = 1.8, excess = −1.2
• Normal(μ=0, σ=1): kurtosis = 3, excess = 0
• Logistic(α=0, β=0.55153): kurtosis = 4.2, excess = 1.2
• Discrete: equally likely values: kurtosis = 1, excess = −2
• Student’s t (df=4): kurtosis = ∞, excess = ∞
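Two of these values are easy to check numerically with the moment definition of kurtosis, m4 / m2². The sketch below makes two assumptions: the discrete case is read as two equally likely values, and the uniform distribution is approximated by a fine, evenly spaced grid of points.

```python
import math

def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (a normal distribution scores 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis measured relative to the normal distribution."""
    return kurtosis(values) - 3

# Two equally likely values (assumed reading of the discrete example).
two_point = [-1.0, 1.0]

# Fine grid standing in for Uniform(min=-sqrt(3), max=sqrt(3)).
root3 = math.sqrt(3)
steps = 20000
uniform_grid = [-root3 + 2 * root3 * i / steps for i in range(steps + 1)]

print(f"two-point: kurtosis = {kurtosis(two_point):.2f}")
print(f"uniform:   kurtosis = {kurtosis(uniform_grid):.2f}, "
      f"excess = {excess_kurtosis(uniform_grid):.2f}")
```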
