4. Normal Distribution (Haomin 2021)
Previously on Medical statistics
A basic summary of data types
[Figure: summary of data types, including nominal and ordinal]
Several distinct samples may have the same
mean or median, but completely different levels
of variability.
[Figure: values for samples A, B and C plotted on a common scale (420 to 580), with similar centres but different spread]
Measures of dispersion
■ Range (极差)
■ Inter-quartile range (四分位数间距)
■ Variance and standard deviation (方差和标准差)
■ Coefficient of variation (变异系数)
(2) Inferential statistics:
It involves making inferences that go
beyond the actual data.
It usually involves inductive reasoning
(i.e., generalizing to a population after
having observed only a sample).
In inferential statistics, conclusions about
the population are therefore drawn from the sample.
Normal Distribution
正态分布
Haomin Yang
School of Public Health
Fujian Medical University
Normal Distribution
"Bell curve"
Most data points fall in the middle.
Distribution skewed to the right
• The normal distribution is the most
important concept in statistics.
Almost everything we do in
inferential statistics is, to some degree,
based on the normal distribution.
• One such example is the histogram (直方图)
(Figure 1).
Tall around the centre,
low on both sides,
and symmetric.
• This curve is called a probability
density curve.
• Probability density curves are used to
calculate the probability that different
values will occur.
Probability
• Probability measures how likely the
outcome of a random event is, that is, the
extent to which the event is expected
to happen.
Probability
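• $P(E) = \dfrac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$ (when all outcomes are equally likely)
• $0 \le P(E) \le 1$
• $\sum_i P(E_i) = 1$, summing over all mutually exclusive outcomes $E_i$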
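The counting rule and the two constraints above can be checked on a small case. This is a minimal sketch assuming one roll of a fair six-sided die, a hypothetical example not taken from the lecture; `classical_probability` is an illustrative helper name, and exact fractions are used so the sum comes out to exactly 1.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die (all outcomes equally likely).
sample_space = [1, 2, 3, 4, 5, 6]

def classical_probability(event, space):
    """P(E) = number of favourable outcomes / total number of outcomes.
    Only valid when every outcome in `space` is equally likely."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))

p_even = classical_probability(lambda x: x % 2 == 0, sample_space)
p_six = classical_probability(lambda x: x == 6, sample_space)

# Each single outcome E_i has probability 1/6; over all outcomes they sum to 1.
p_each = [classical_probability(lambda x, k=k: x == k, sample_space) for k in sample_space]

print(p_even)       # 1/2
print(p_six)        # 1/6
print(sum(p_each))  # 1
```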
Probability
• Three ways to obtain a probability
– Repeated experimentation (relative frequency)
– Equally likely outcomes in the sample space
– Probability distribution
• Discrete probability distribution
– Binomial
– Poisson
• Continuous probability distribution
– Normal probability distribution
The Normal Probability Density Function
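The normal density with mean μ and standard deviation σ is f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)). The sketch below codes that formula directly and cross-checks it against Python's `statistics.NormalDist` (Python 3.8+). The helper name `normal_pdf` and the parameter values μ = 5, σ = 2 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical parameters (mu = 5, sigma = 2) used only for the cross-check.
for x in (1.0, 5.0, 7.5):
    ours = normal_pdf(x, mu=5.0, sigma=2.0)
    lib = NormalDist(mu=5.0, sigma=2.0).pdf(x)
    print(f"x = {x}: {ours:.6f} (NormalDist gives {lib:.6f})")
```

Note that f(x) is a density, not a probability: probabilities come from areas under the curve.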
• Populations with a small standard
deviation have a distribution concentrated
close to the centre; those with a large
standard deviation have a distribution
spread widely along the measurement
axis (Figure 3).
Figure 3
Here the means are different while
the standard deviations are the same.
(a) Effect of changing the mean (μ1 < μ2 < μ3)
Figure 4 Probability density functions of Normal
distributions with different means.
Here the means are the same while the
standard deviations are different.
(b) Effect of changing the standard deviation (σ1 < σ2 < σ3)
Figure 5 Probability density functions of Normal
distributions with different standard deviations
So,
• Changing the mean simply moves the curve
along the horizontal axis, while changing
the standard deviation alters the height and
width of the curve.
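A quick numerical check of this statement, as a sketch using Python's `statistics.NormalDist` with hypothetical parameter values: the peak of each curve sits at x = μ, its height 1/(σ√(2π)) is unchanged when only μ moves, and it halves while the spread doubles when σ doubles.

```python
from statistics import NormalDist

# Hypothetical parameter values, chosen only to illustrate the two effects.
curves = {
    "mu=0, sigma=1": NormalDist(0, 1),
    "mu=2, sigma=1": NormalDist(2, 1),  # same spread, shifted to the right
    "mu=0, sigma=2": NormalDist(0, 2),  # same centre, lower and wider
}

for label, dist in curves.items():
    peak_height = dist.pdf(dist.mean)           # the density is highest at x = mu
    q3_offset = dist.inv_cdf(0.75) - dist.mean  # distance from mu to the 75th percentile
    print(f"{label}: peak at x={dist.mean}, height={peak_height:.4f}, Q3 - mu={q3_offset:.4f}")
```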
Properties of the normal distribution
• The normal curve is symmetric and is
defined by its mean and its standard
deviation.
Properties of the normal distribution
• The shape of the normal curve is often
illustrated as a bell-shaped curve.
3) About 68% (68.27%) of the area (data) under the curve is
within one standard deviation of the mean, i.e. between μ − σ and μ + σ.
4) 95% of the area (data) under the curve is
within 1.96 standard deviations of the mean,
leaving 2.5% in each tail (between μ − 1.96σ and μ + 1.96σ).
5) 99% of the area (data) under the curve is
within 2.58 standard deviations of the mean,
leaving 0.5% in each tail (between μ − 2.58σ and μ + 2.58σ).
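Properties 3) to 5) can be verified numerically. This is a sketch with Python's `statistics.NormalDist`; `central_area` is an illustrative helper, and N(50, 10²) is a hypothetical second distribution used to show that the areas do not depend on μ or σ.

```python
from statistics import NormalDist

def central_area(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma^2) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 1.96, 2.58):
    # The result does not depend on mu or sigma; compare N(0, 1) with a hypothetical N(50, 10^2).
    print(f"within {k} SD: {central_area(k):.4f} vs {central_area(k, 50, 10):.4f}")
```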
True or False
For any normal distribution, the mean, median, and mode will have
the same value.
The percentile rank for the mean is 50% for any normal
distribution.
The red distribution has more area underneath the curve than the
blue distribution does
Standard normal distribution
The Standardized Normal
Translation to the Standardized Normal Distribution
• Translate from X to the standardized
normal (the “Z” distribution) by
subtracting the mean of X and
dividing by its standard deviation:
$$Z = \frac{X - \mu}{\sigma}$$
The Z distribution always has mean = 0 and
standard deviation = 1.
For any normal variable X ~ N(μ, σ²), the
standardization transformation
$$Z = \frac{X - \mu}{\sigma}$$
gives Z, which is called the standardized normal deviate,
Z-value, or Z-score.
Z is a random variable that has a standard
Normal distribution:
Z ~ N(0, 1)
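A minimal sketch of the standardization step, assuming a hypothetical variable X ~ N(100, 15²) chosen only for illustration; `z_score` is an illustrative helper name. It also checks that standardizing preserves probabilities, i.e. P(X ≤ x) under N(μ, σ²) equals P(Z ≤ z) under N(0, 1).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical variable X ~ N(100, 15^2), used only for illustration.
mu, sigma = 100, 15
xs = [70, 85, 100, 115, 130]
zs = [z_score(x, mu, sigma) for x in xs]
print(zs)  # [-2.0, -1.0, 0.0, 1.0, 2.0]

# Standardizing preserves probabilities: P(X <= 120) equals P(Z <= (120 - 100) / 15).
p_x = NormalDist(mu, sigma).cdf(120)
p_z = NormalDist().cdf(z_score(120, mu, sigma))
print(round(p_x, 4), round(p_z, 4))
```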
Z ~ N (0,1)
The Standard Normal distribution has a mean
of zero and a variance of one. The formula is
given as Equation 2.
$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^{2}}{2}\right), \quad -\infty < z < \infty \qquad (2)$$
X ~ N(μ, σ²)  →  Z = (X − μ)/σ ~ N(0, 1)
$$\Phi(z) = P(Z \le z)$$
Standardized Normal Probability Table
[Table: lower-tail probabilities P = Φ(z); example lookup at Z = −2.62]
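Values in the table can be reproduced from the identity Φ(z) = ½[1 + erf(z/√2)], using `math.erf` from the Python standard library. This is a sketch for checking table lookups, not a replacement for the table; `phi_cdf` is an illustrative helper name.

```python
import math

def phi_cdf(z):
    """Standard normal cumulative probability Phi(z) = P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-2.62, -1.96, 0.0, 1.64, 1.96):
    print(f"Phi({z:+.2f}) = {phi_cdf(z):.4f}")
```

By symmetry, Φ(−z) = 1 − Φ(z), which is why a table only needs one half of the z axis.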
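• The area within (−1.64, 1.64)
$$\Phi(-1.64) \approx 0.0500$$
$$\Phi(1.64) = 1 - 0.0500 = 0.9500$$
$$\Phi(1.64) - \Phi(-1.64) = 0.9500 - 0.0500 = 0.9000$$
[Figure: 90% central area between −1.64 and 1.64, with 5% in each tail]
Corresponding to 1.64 (more precisely 1.645),
the area of one tail is 0.05, the area of two tails is 0.10.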
• The area within (−1.96, 1.96)
$$\Phi(-1.96) = 0.02500$$
$$\Phi(1.96) = 1 - 0.02500 = 0.97500$$
$$\Phi(1.96) - \Phi(-1.96) = 0.97500 - 0.02500 = 0.9500$$
[Figure: 95% central area between −1.96 and 1.96, with 2.5% in each tail]
Corresponding to 1.96,
the area of one tail is 0.025, the area of two tails is 0.05.
• The area within (−2.58, 2.58)
$$\Phi(-2.58) = 0.004940$$
$$\Phi(2.58) = 1 - 0.004940 = 0.995060$$
$$\Phi(2.58) - \Phi(-2.58) = 0.995060 - 0.004940 = 0.99012$$
[Figure: 99% central area between −2.58 and 2.58, with 0.5% in each tail]
Corresponding to 2.58,
the area of one tail is 0.005, the area of two tails is 0.010.
• The area within (−1, 1) is about 68%.
Corresponding to 1,
the area of one tail is 0.159, the area of two tails is 0.317.
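The tail areas quoted above can be recomputed in one loop. This is a sketch using Python's `statistics.NormalDist`; the dictionary `tails` is an illustrative name. Note that z = 1.64 gives a one-tail area of about 0.0505; the rounded value 0.05 corresponds to z = 1.645.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)
tails = {}

for z in (1, 1.64, 1.96, 2.58):
    one_tail = 1 - std_normal.cdf(z)  # P(Z > z), equal to P(Z < -z) by symmetry
    two_tails = 2 * one_tail          # P(|Z| > z)
    central = 1 - two_tails           # P(-z < Z < z)
    tails[z] = (one_tail, two_tails, central)
    print(f"z = {z}: one tail {one_tail:.4f}, two tails {two_tails:.4f}, central {central:.4f}")
```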
Critical value
$z_{\alpha/2}$: two-sided critical value (cuts off α/2 in each tail)
$z_{\alpha}$: one-sided critical value (cuts off α in one tail)
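Critical values are the inverse of the table lookup: given a tail area, find z. This sketch uses `NormalDist.inv_cdf` from Python's `statistics` module; `z_two_sided` and `z_one_sided` are illustrative helper names, and the α values are the conventional 0.10, 0.05 and 0.01.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_two_sided(alpha):
    """z_{alpha/2}: leaves alpha/2 in each tail."""
    return std_normal.inv_cdf(1 - alpha / 2)

def z_one_sided(alpha):
    """z_alpha: leaves alpha in the upper tail."""
    return std_normal.inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: two-sided {z_two_sided(alpha):.3f}, one-sided {z_one_sided(alpha):.3f}")
```

For α = 0.05 this gives 1.960 (two-sided) and 1.645 (one-sided), matching the values used in the table examples.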
Exercise
Use the following information to answer the next two exercises: The patient
recovery time from a particular surgical procedure is normally distributed with a
mean of 5.3 days and a standard deviation of 2.1 days.
What is the z-score for a patient who takes ten days to recover?
A. 1.5
B. 0.2
C. 2.2
D. 7.3
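A sketch of the calculation for this exercise: the z-score is the distance from the mean in units of standard deviation. `z_score` is an illustrative helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Recovery time ~ N(5.3, 2.1^2) days; the patient takes 10 days to recover.
z = z_score(10, mu=5.3, sigma=2.1)
print(round(z, 2))  # 2.24, so the closest option is C (2.2)
```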
Exercise
Kyle’s doctor told him that the z-score for his systolic blood pressure is 1.75. Which
of the following is the best interpretation of this standardized score? The systolic
blood pressure (given in millimeters) of males has an approximately normal
distribution with mean μ=125 and standard deviation σ=14 . If X= a systolic
blood pressure score then X∼N(125,14) .
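A sketch for interpreting Kyle's score (the answer options themselves are not included here): reversing the standardization gives his blood pressure, and Φ(1.75) gives the proportion of males with a lower value.

```python
from statistics import NormalDist

mu, sigma = 125, 14  # systolic blood pressure of males, mmHg
z = 1.75             # Kyle's z-score

x = mu + z * sigma                # undo the standardization: x = mu + z * sigma
percentile = NormalDist().cdf(z)  # proportion of males with a lower value

print(x)                     # 149.5
print(round(percentile, 3))  # 0.96
```

So a z-score of 1.75 means his systolic blood pressure is 1.75 standard deviations above the mean, about 149.5 mmHg, higher than roughly 96% of males.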
• The Normal probability distribution can be
used to calculate the probability of different
values occurring.
• We could be interested in, for example, the
probability of being within 1 standard
deviation of the mean (or outside it).
• We can use a standard normal distribution
table which tells us the probability of being
outside this value.
Example:
• Using the birthweight data from the
O’Cathain et al (2002) study, let us
assume that the birthweight of newborn
babies has a Normal distribution with a
mean of 3.4 kg and a standard deviation of
0.6 kg.
• So what is the probability of giving birth
to a baby with a birthweight of 4.5 kg or
higher?
Answers
• Since birthweight is assumed to follow a
Normal distribution, with mean of 3.4 kg
and SD of 0.6 kg, we therefore know that
approximately 68% of birthweights will lie
between 2.8 and 4.0 kg and about 95% of
birthweights will lie between 2.2 and 4.6 kg.
• Using Figure 6 we can see that a birthweight
of 4.5 kg is between 1 and 2 standard
deviations away from the mean.
Figure 6 Normal distribution curve for birthweight with a
mean of 3.4 kg and SD of 0.6 kg
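• First calculate z, the number of standard
deviations 4.5 kg is away from the mean of
3.4 kg, that is,
$$z = \frac{4.5 - 3.4}{0.6} = \frac{1.1}{0.6} \approx 1.83$$
• Then use the standard normal table:
P(Z ≥ 1.83) = 1 − Φ(1.83) ≈ 1 − 0.9664 = 0.0336,
so about 3.4% of babies would weigh 4.5 kg or more.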
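The same calculation as a short sketch with Python's `statistics.NormalDist`, using the assumed mean of 3.4 kg and SD of 0.6 kg. Because z is not rounded to 1.83 here, the result (about 0.033) differs slightly from the table-based value.

```python
from statistics import NormalDist

mu, sigma = 3.4, 0.6  # birthweight in kg, as assumed in the example
x = 4.5

z = (x - mu) / sigma
p_upper = 1 - NormalDist().cdf(z)  # P(Z >= z) = P(X >= 4.5 kg)
print(f"z = {z:.2f}, P(X >= {x}) = {p_upper:.4f}")

# Equivalent without standardizing by hand:
print(f"{1 - NormalDist(mu, sigma).cdf(x):.4f}")
```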
Exercise:
The women aged 18 and over in HANES 2005
averaged 63 inches tall, with an SD of 3 inches. The
histogram was roughly unimodal and
symmetric. For these women:
(a) Converting scales
[Figure: density histograms of heights of women aged 18 and over (Ave 63, SD 3), in inches (54 to 72) and in standard units (−3 to 3)]
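Converting between the two scales is the standardization formula and its inverse. A minimal sketch, using the HANES 2005 figures above (average 63 inches, SD 3 inches); the function names are illustrative.

```python
MEAN_HEIGHT, SD_HEIGHT = 63, 3  # inches; women aged 18 and over, HANES 2005

def to_standard_units(height_in):
    """Inches -> standard units (z-score)."""
    return (height_in - MEAN_HEIGHT) / SD_HEIGHT

def to_inches(z):
    """Standard units -> inches."""
    return MEAN_HEIGHT + z * SD_HEIGHT

print([to_standard_units(h) for h in (57, 60, 63, 69)])  # [-2.0, -1.0, 0.0, 2.0]
print(to_inches(1.5))                                     # 67.5
```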
Reference interval
95% reference interval = mean ± 1.96 × SD:
(84.8 − 1.96 × 3.79, 84.8 + 1.96 × 3.79)
= (77.4 cm, 92.2 cm)
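A sketch of the normal-theory 95% reference interval, assuming the data are approximately normal. The inputs 84.8 cm (mean) and 3.79 cm (SD) are the values that reproduce the interval (77.4 cm, 92.2 cm); `reference_interval_95` is an illustrative helper name.

```python
def reference_interval_95(mean, sd):
    """Normal-theory 95% reference interval: mean ± 1.96 * SD."""
    half_width = 1.96 * sd
    return mean - half_width, mean + half_width

low, high = reference_interval_95(84.8, 3.79)
print(f"({low:.1f} cm, {high:.1f} cm)")  # (77.4 cm, 92.2 cm)
```

The interval uses the SD, not the standard error: it describes where about 95% of individual values fall, not where the population mean lies.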
Exercises
1. For data to be normally distributed, which of
the following characteristics should it have?
A. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be platykurtic.
B. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be leptokurtic.
C. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be bell shaped.
D. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be positively skewed.
Exercise 2
Exercise 3
Answer to Exercise 3
Exercise 4
The patient recovery time from a particular surgical procedure is
normally distributed with a mean of 5.3 days and a standard deviation
of 2.1 days.
According to a study done by De Anza students, the height for Asian adult
males is normally distributed with an average of 66 inches and a standard
deviation of 2.5 inches. Suppose one Asian adult male is randomly chosen.
Let X=height of the individual.
1. X ~ _____(_____, _____)
2. Find the probability that the person is between 65 and 69 inches. Include
a sketch of the graph, and write a probability statement.
3. Would you expect to meet many Asian adult males over 72 inches? Explain
why or why not, and justify your answer numerically.
4. The middle 40% of heights fall between what two values? Sketch the
graph, and write the probability statement.
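One way to check answers 2 to 4 numerically is sketched below with Python's `statistics.NormalDist`, taking X ~ N(66, 2.5²) from the stated mean and SD. The variable names are illustrative; the sketches and probability statements the exercise asks for still have to be written by hand.

```python
from statistics import NormalDist

X = NormalDist(mu=66, sigma=2.5)  # height in inches

# 2. P(65 < X < 69)
p_65_69 = X.cdf(69) - X.cdf(65)

# 3. P(X > 72): how common are heights over 72 inches?
p_over_72 = 1 - X.cdf(72)

# 4. Middle 40%: leave 30% in each tail
lower, upper = X.inv_cdf(0.30), X.inv_cdf(0.70)

print(f"P(65 < X < 69) = {p_65_69:.4f}")
print(f"P(X > 72) = {p_over_72:.4f}")
print(f"middle 40%: {lower:.2f} to {upper:.2f} inches")
```

A height of 72 inches is 2.4 SD above the mean, so fewer than 1% of these men would be that tall.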
The percent of fat calories that a person in America consumes
each day is normally distributed with a mean of about 36 and a
standard deviation of 10. Suppose that one individual is
randomly chosen. Let X=percent of fat calories.
1. X ~ _____(_____, _____)
Statistical tests, including ANOVA, t-tests
and regression, require the normality
assumption.