4.normal Distribution Haomin2021

Uploaded by

yanghm669

1

Previously on Medical Statistics

The field of statistics can be broken into two


subcategories:

(1) Descriptive statistics

(2) Inferential statistics

2
Previously on Medical statistics

(1) Descriptive statistics:


These merely describe, organize, or summarize data; they refer only to the actual data available.

3
A basic summary of data Types

Nominal Ordinal

4
Previously on Medical statistics

• Measures of central tendency are used to estimate the "normal" (typical) values of a dataset.
• Measures of dispersion are important for describing the spread of the data, or its variation around a central value.

5
Several distinct samples may have the same
mean or median, but completely different levels
of variability.
[Figure: samples A, B and C plotted on a vertical axis running from 420 to 580]
Measures of dispersion

■ Range
■ Inter-quartile range
■ Variance and standard deviation
■ Coefficient of variation

7
(2) Inferential statistics:
It involves making inferences that go beyond the actual data.
These inferences usually involve inductive reasoning (i.e., generalizing to a population after having observed only a sample).
So in inferential statistics, conclusions about the population are based on the sample.

8
Normal Distribution
Haomin Yang
School of Public Health
Fujian Medical University

Normal Distribution

"Bell curve"
Most data points fall in the middle.
Skewed to the right Distribution

A bell-like curve with a longer tail on the right and the mound pushed somewhat to the left.
Skewed to the left Distribution

A bell-like curve with a longer tail on the left and the mound pushed somewhat to the right.
History
• Abraham de Moivre had a scientific interest in gambling and often acted as a consultant to gamblers to determine probabilities.

• Adrain and Gauss developed the formula for the normal distribution, which is therefore also called the Gaussian distribution.
• The normal distribution is the most important concept in statistics.

Almost everything we do in inferential statistics is, to some degree, based on the normal distribution.
• One such example is the histogram of the birthweight (in kilograms) of the 98 newborn babies shown in Figure 1.

20
Taller around center,
shorter on two sides
and symmetric.

Figure 1 Empirical relative frequency distribution of the birthweight of 98 babies (data from Simpson, 2004).
• Imagine that, for the birthweight data in Figure 2, we have a very large sample and classify the birthweights into smaller and smaller intervals. The histogram will then start to look like a smooth curve, which is also shown in Figure 2.

The histogram of the sample data is an estimate of the


population distribution of birth weights in newborn babies.
23
• This symmetric "bell-shaped" distribution is known as the Normal distribution and is one of the most important distributions in statistics.

24
• This curve is called a probability
density curve.
• Probability density curves are used to
calculate the probability that different
values will occur.

25
Probability
• Probability denotes how likely the outcome of a random event is; it measures the extent to which an event is likely to happen.

• Probability is all about chance, whereas statistics is more about how we handle data using different techniques.

26
Probability
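• $P(E) = \dfrac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$ (when all outcomes are equally likely)

• $0 \le P(E) \le 1$

• $\sum_i P(E_i) = 1$ (summed over all mutually exclusive outcomes $E_i$)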
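As a small illustration of the three rules above, the sketch below uses one roll of a fair six-sided die. The die, the `sample_space` list and the `probability` helper are assumptions made for this example and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die,
# so every outcome in the sample space is equally likely.
sample_space = [1, 2, 3, 4, 5, 6]

def probability(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Probability of rolling an even number.
p_even = probability(lambda outcome: outcome % 2 == 0)

# Probability of each single face; these events are mutually exclusive.
p_each = [probability(lambda outcome, face=face: outcome == face)
          for face in sample_space]

print(p_even)       # a value between 0 and 1
print(sum(p_each))  # the probabilities of all faces add up to 1
```

Exact fractions are used here only so that the sum comes out as exactly 1 rather than a rounded floating-point value.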

27
Probability
• Three ways to obtain a probability
– Repeated experimentation
– Equally likely outcomes in the sample space
– Probability distribution
• Discrete probability distribution
– Binomial
– Poisson
• Continuous probability distribution
– Normal probability distribution

28
The Normal Probability Density Function

• The formula of the normal probability density function is

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[(X-\mu)/\sigma\right]^2}$$
Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
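The density formula can be evaluated directly. The following is a minimal sketch, written term by term from the definitions above; the function name `normal_pdf` is hypothetical, and the values 3.4 and 0.6 are borrowed from the birthweight example later in these slides purely for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, following the formula term by term."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustration values (assumed): mean 3.4 kg, standard deviation 0.6 kg.
mu, sigma = 3.4, 0.6
peak = normal_pdf(mu, mu, sigma)            # height at the mean
left = normal_pdf(mu - sigma, mu, sigma)    # one SD below the mean
right = normal_pdf(mu + sigma, mu, sigma)   # one SD above the mean

print(peak, left, right)
```

The curve is highest at the mean and takes the same value one standard deviation to either side, which is the symmetry described in the slides that follow.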
29
Two parameters:
• Population mean: $\mu$
• Population standard deviation: $\sigma$

Normal distribution denoted by: $N(\mu, \sigma^2)$

30
• Populations with small values of the standard deviation $\sigma$ have a distribution concentrated close to the centre, $\mu$; those with large standard deviation have a distribution widely spread along the measurement axis (Figure 3).
Figure 3
• Here the means are different while the standard deviations are the same.
(a) Effect of changing the mean ($\mu_1 < \mu_2 < \mu_3$)
Figure 4 Probability density functions of Normal distributions with different means.
• Here the means are the same while the standard deviations are different.
(b) Effect of changing $\sigma$ ($\sigma_1 < \sigma_2 < \sigma_3$).

Figure 5 Probability density functions of Normal distributions with different standard deviations.
So,
• Changing the mean simply moves the curve
along the horizontal axis, while changing
the standard deviation alters the height and
width of the curve.
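A quick numerical check of this summary, as a sketch only: the three parameter choices below are hypothetical and were picked just to make the comparison easy to read. `NormalDist` is the normal distribution class in Python's standard `statistics` module.

```python
from statistics import NormalDist

# Hypothetical parameter choices, purely for illustration.
narrow = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=2, sigma=1)  # same SD, larger mean
wide = NormalDist(mu=0, sigma=2)     # same mean, larger SD

# Same SD: the peak has the same height, only its location moves.
print(narrow.pdf(narrow.mean), shifted.pdf(shifted.mean))

# Larger SD: the peak is lower, so the curve must be wider
# to keep the total area equal to 1.
print(narrow.pdf(narrow.mean), wide.pdf(wide.mean))
```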

36
Properties of the normal distribution
• The normal curve is symmetric and is
defined by its mean and its standard
deviation.

• The normal curve is not just one curve


but a family of curves.

37
Properties of the normal distribution
• The shape of the normal curve is often
illustrated as a bell-shaped curve.

• The central values (mean, median and mode) are identical.

• All normal curves are symmetrical


about the mean.
38
Properties of the normal distribution
• The sum of the positive deviations from the mean is equal in magnitude to the sum of the negative deviations.

• The standard deviation determines the


width of the curve.

• The height of the normal curve is at its


maximum at the mean value.
39
Properties of the normal distribution
• The height of the curve declines as we
go in either direction from the mean, but
never touches the base, so that the tails
of the curve on both sides of the mean
extend indefinitely.
• The total area under the curve, as for any other probability distribution, is 1.
40
Area under the normal distribution curve
1) Total area under the curve = 1 (or 100%).
2) Bell shaped and symmetrical about its mean. That is, 50% of the area (data) under the curve lies to the left of the mean and 50% of the area (data) under the curve lies to the right of the mean.
[Figure: the Normal probability distribution, f(X) plotted against X, with total area = 1]

41
3) About 68% (68.27%) of the area (data) under the curve is within one standard deviation of the mean, that is, between $\mu - \sigma$ and $\mu + \sigma$.
4) 95% of the area (data) under the curve is within 1.96 standard deviations of the mean, that is, between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$, leaving 2.5% in each tail.
5) 99% of the area (data) under the curve is within 2.58 standard deviations of the mean, that is, between $\mu - 2.58\sigma$ and $\mu + 2.58\sigma$, leaving 0.5% in each tail.
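The three percentages can be checked numerically. This sketch relies on the standard identity that, for any normal distribution, the area within k standard deviations of the mean equals erf(k/√2); the helper name `area_within` is an assumption introduced here.

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# The three multipliers quoted on the slides.
for k in (1, 1.96, 2.58):
    print(k, round(area_within(k), 4))
```

Because the result depends only on k and not on the mean or standard deviation, the same three percentages apply to every member of the normal family.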
True or False
For any normal distribution, the mean, median, and mode will have
the same value.

The percentile rank for the mean is 50% for any normal
distribution.

Abraham de Moivre, a consultant to gamblers, discovered the normal distribution when trying to approximate the binomial distribution to make his computations easier.

The red distribution has more area underneath the curve than the blue distribution does.

46
Standard normal distribution

47
The Standardized Normal

• Any normal distribution (with any mean


and standard deviation combination)
can be transformed into the
standardized normal distribution (Z)

• Need to transform X units into Z units

48
Translation to the Standardized
Normal Distribution
• Translate from X to the standardized normal (the "Z" distribution) by subtracting the mean of X and dividing by its standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

The Z distribution always has mean = 0 and standard deviation = 1.
For any normal variable $X \sim N(\mu, \sigma^2)$, apply the standardization transformation:

$$Z = \frac{X - \mu}{\sigma}$$

Z is called the standardized normal deviate, the Z-value, or the Z-score.
Z is a random variable that has a standard Normal distribution:
$Z \sim N(0, 1)$
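The standardization step is a one-line calculation. The sketch below applies it to a tiny set of hypothetical values (not data from the slides), treated as a whole population, to show that the resulting Z-values have mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical data, treated here as a whole population.
values = [40.0, 50.0, 60.0]
mu, sigma = mean(values), pstdev(values)
z_values = [z_score(x, mu, sigma) for x in values]

print(z_values)
print(mean(z_values), pstdev(z_values))  # close to 0 and 1
```

Standardizing changes only the scale and location of the values; whether Z follows a normal distribution still depends on X being normal in the first place.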
$Z \sim N(0, 1)$
The Standard Normal distribution has a mean of zero and a variance of one ($\mu = 0$, $\sigma^2 = 1$). The formula is given as Equation 2.

$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) \qquad (2)$$
[Figure: the Z transformation maps the normal distribution $N(\mu, \sigma^2)$ to the standard normal distribution $N(0, 1)$]


Area under the normal probability density curve

A table for the standard normal distribution is usually included in most statistics textbooks. It gives the cumulative probability

$$\Phi(z) = P(Z \le z)$$
Standardized Normal Probability Table

Z      0.00    0.01    0.02
-3.0   0.0013  0.0013  0.0013
-2.9   0.0019  0.0018  0.0018
-2.8   0.0026  0.0025  0.0024
-2.7   0.0035  0.0034  0.0033
-2.6   0.0047  0.0045  0.0044
-2.5   0.0062  0.0060  0.0059

Example: for Z = -2.62, read the row for -2.6 and the column for 0.02, which gives P = 0.0044.
• The area within (−1.64, 1.64)
$\Phi(-1.64) \approx 0.0500$
$\Phi(1.64) \approx 1 - 0.0500 = 0.9500$
$\Phi(1.64) - \Phi(-1.64) \approx 0.9500 - 0.0500 = 0.9000$

[Figure: standard normal curve with 90% of the area between −1.64 and 1.64 and 5% in each tail]
Corresponding to 1.64,
the area of one tail is 0.05, the area of two tails is 0.10
• The area within (−1.96, 1.96)
$\Phi(-1.96) = 0.0250$
$\Phi(1.96) = 1 - 0.0250 = 0.9750$
$\Phi(1.96) - \Phi(-1.96) = 0.9750 - 0.0250 = 0.9500$

[Figure: standard normal curve with 95% of the area between −1.96 and 1.96 and 2.5% in each tail]
Corresponding to 1.96,
the area of one tail is 0.025, the area of two tails is 0.05
• The area within (−2.58, 2.58)
$\Phi(-2.58) = 0.004940$
$\Phi(2.58) = 1 - 0.004940 = 0.995060$
$\Phi(2.58) - \Phi(-2.58) = 0.995060 - 0.004940 = 0.990120$

[Figure: standard normal curve with 99% of the area between −2.58 and 2.58 and 0.5% in each tail]
Corresponding to 2.58,
the area of one tail is 0.005, the area of two tails is 0.010
68%

-1 0 1
Corresponding to 1,
the area of one tail is 0.159, the area of two tails is 0.317
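The tail and central areas above can be reproduced without a printed table. This is a sketch using `NormalDist` from Python's standard `statistics` module; the helper name `central_area` is an assumption. The computed values are slightly more precise than the rounded figures on the slides (for example, the one-tail area for 1.64 comes out near 0.0505 rather than 0.0500).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, N(0, 1)

def central_area(z):
    """Area between -z and z, i.e. Phi(z) - Phi(-z)."""
    return Z.cdf(z) - Z.cdf(-z)

for z in (1, 1.64, 1.96, 2.58):
    one_tail = Z.cdf(-z)  # Phi(-z): area in the lower tail
    print(z, round(one_tail, 4), round(2 * one_tail, 4), round(central_area(z), 4))
```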
60
Critical value
$Z_{\alpha/2}$: two-sided critical value
$Z_{\alpha}$: one-sided critical value

Critical value   Area of one tail   Area of two tails
1.645            0.05               0.10
1.960            0.025              0.05
2.576            0.005              0.01
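Critical values are the inverse lookup: start from a tail area and find the matching Z. A minimal sketch, assuming the helper names `one_sided_critical` and `two_sided_critical` (they are not from the slides) and using `NormalDist.inv_cdf` from the standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, N(0, 1)

def one_sided_critical(alpha):
    """Z_alpha: the value with area alpha in the upper tail."""
    return Z.inv_cdf(1 - alpha)

def two_sided_critical(alpha):
    """Z_(alpha/2): the value with area alpha/2 in each tail."""
    return Z.inv_cdf(1 - alpha / 2)

print(round(one_sided_critical(0.05), 3))
print(round(two_sided_critical(0.05), 3))
print(round(two_sided_critical(0.01), 3))
```

Note that the same number can play both roles: 1.645 is the one-sided critical value for a tail area of 0.05 and the two-sided critical value for a total tail area of 0.10.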
Exercise
Use the following information to answer the next two exercises: The patient
recovery time from a particular surgical procedure is normally distributed with a
mean of 5.3 days and a standard deviation of 2.1 days.

What is the median recovery time?


A. 2.7
B. 5.3
C. 7.4
D. 2.1

What is the z-score for a patient who takes ten days to recover?
A. 1.5
B. 0.2
C. 2.2
D. 7.3

63
Exercise
Kyle’s doctor told him that the z-score for his systolic blood pressure is 1.75. Which of the following is the best interpretation of this standardized score? The systolic blood pressure (given in millimeters of mercury) of males has an approximately normal distribution with mean μ = 125 and standard deviation σ = 14. If X = a systolic blood pressure score, then X ∼ N(125, 14²).

1. Which answer(s) is/are correct?


A. Kyle’s systolic blood pressure is 175.
B. Kyle’s systolic blood pressure is 1.75 times the average blood pressure of men his
age.
C. Kyle’s systolic blood pressure is 1.75 above the average systolic blood pressure
of men his age.
D. Kyle’s systolic blood pressure is 1.75 standard deviations above the average systolic blood pressure for men.

2. Calculate Kyle’s blood pressure.


64
How do we use the Normal
distribution?

65
• The Normal probability distribution can be
used to calculate the probability of different
values occurring.
• We could be interested in: what is the probability of being within 1 standard deviation of the mean (or outside it)?
• We can use a standard normal distribution
table which tells us the probability of being
outside this value.

66
Example:
• Using the birthweight data from the O’Cathain et al. (2002) study, let us assume that the birthweight of newborn babies has a Normal distribution with a mean of 3.4 kg and a standard deviation of 0.6 kg.
• So what is the probability of giving birth to a baby with a birthweight of 4.5 kg or higher?
Answers
• Since birthweight is assumed to follow a
Normal distribution, with mean of 3.4 kg
and SD of 0.6 kg, we therefore know that
approximately 68% of birthweights will lie
between 2.8 and 4.0 kg and about 95% of
birthweights will lie between 2.2 and 4.6 kg.
• Using Figure 6 we can see that a birthweight
of 4.5 kg is between 1 and 2 standard
deviations away from the mean.
68
Figure 6 Normal distribution curve for birthweight with a
mean of 3.4 kg and SD of 0.6 kg 69
• First calculate z, the number of standard deviations 4.5 kg is away from the mean of 3.4 kg, that is,

$$z = \frac{4.5 - 3.4}{0.6} = 1.83$$

Then look up z = 1.83 in Table A1 (the Normal distribution table), which gives the probability of a value above the mean + 1.83 SD as 0.0336.
Therefore the probability of having a birthweight of 4.5 kg or higher is 0.0336, or about 3.4%.
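The same calculation can be done without a table. This sketch uses `NormalDist` from Python's standard `statistics` module with the mean and SD assumed in the example; the variable names are introduced here for illustration. The exact answer differs very slightly from the table value because the table rounds z to 1.83 first.

```python
from statistics import NormalDist

# Birthweight model assumed in the example: mean 3.4 kg, SD 0.6 kg.
birthweight = NormalDist(mu=3.4, sigma=0.6)

# Number of standard deviations that 4.5 kg lies above the mean.
z = (4.5 - birthweight.mean) / birthweight.stdev

# Upper-tail probability, computed exactly and with z rounded as in the table.
p_exact = 1 - birthweight.cdf(4.5)
p_table = 1 - NormalDist().cdf(round(z, 2))

print(round(z, 2), round(p_exact, 4), round(p_table, 4))
```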
Online calculation
• https://www.hackmath.net/en/calculator/normal-distribution
Exercise:
Women aged 18 and over in HANES 2005 averaged 63 inches tall with an SD of 3 inches. The histogram was roughly unimodal and symmetric. For these women:

(a) Convert the following to standard units
(i) 60 (ii) 63 (iii) 64.5
(b) Find the height which is -1.5 in standard units
(c) Approximately what percentage of women are between 59 and 60 inches tall?

72
(a) Converting scales
[Figure: density curve of the heights of women aged 18 and over (Ave 63, SD 3), drawn once with a horizontal axis in inches (54 to 72) and once in standard units (-3 to 3)]

Height in inches converted to standard units:
(1) 63 inches is 0 standard units;
(2) 60 inches is -1 standard unit;
(3) 64.5 inches is +0.5 standard units.
(b) Standard units or z-scores
Standard units:
z = (Height – Ave)/SD
Original units:
Height = Ave + z × SD
The height which is -1.5 in standard units: 63 − 1.5 × 3 = 58.5 inches
(c) Areas between points…

[Figure: normal curve with the area between 59 and 60 inches shaded]

What proportion of women in HANES are between 59 and 60 inches tall?
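One way to work through the three parts of this exercise is sketched below. It assumes the normal curve is an adequate approximation to the height histogram (the exercise only says it is roughly unimodal and symmetric), and the helper names are hypothetical.

```python
from statistics import NormalDist

AVE, SD = 63, 3  # inches, as given in the exercise

def to_standard_units(height):
    """z = (Height - Ave) / SD"""
    return (height - AVE) / SD

def to_inches(z):
    """Height = Ave + z * SD"""
    return AVE + z * SD

# (a) and (b): conversions between inches and standard units
standard_units = [to_standard_units(h) for h in (60, 63, 64.5)]
height_at_minus_1_5 = to_inches(-1.5)

# (c): area under the normal curve between 59 and 60 inches
heights = NormalDist(mu=AVE, sigma=SD)
share_59_to_60 = heights.cdf(60) - heights.cdf(59)

print(standard_units, height_at_minus_1_5, round(share_59_to_60, 3))
```

Under the normal approximation, roughly 7% of the women fall between 59 and 60 inches.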
Reference interval

• A widely used tool for interpreting individual patients' laboratory test results

• The prediction interval within which 95% of the values of a reference group fall

• Most have a lower and an upper limit, but some have only an upper limit (urine lead) or only a lower limit (lung capacity)

77
Reference interval

• Use the normal distribution to calculate the reference interval:
If the mean height of the boys in a region is known (84.8 cm), as well as the standard deviation (3.79 cm), what do you suggest for the reference interval?

(84.8 − 1.96 × 3.79, 84.8 + 1.96 × 3.79)
= (77.4 cm, 92.2 cm)
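The reference interval calculation can be sketched as follows, assuming the heights are approximately normally distributed. The function name `reference_interval` and its `coverage` parameter are hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist

def reference_interval(mean, sd, coverage=0.95):
    """Two-sided interval: mean +/- z * sd, with z chosen for the given coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mean - z * sd, mean + z * sd

# Boys' height example from the slide: mean 84.8 cm, SD 3.79 cm.
lower, upper = reference_interval(84.8, 3.79)
print(round(lower, 1), round(upper, 1))
```

For a measurement where only high values are abnormal, such as urine lead, a one-sided upper limit would use the one-sided critical value 1.645 in place of 1.96.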
Exercises

80
1. For data to be normally distributed, which of
the following characteristics should it have?
A. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be platykurtic.
B. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be leptokurtic.
C. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be bell shaped.
D. To be defined as ‘normal’ a distribution should be
symmetrical about the mean, it should meet the x axis at
infinity and it should be positively skewed.
81
Exercise 2

82
Exercise 3

83
Answer to Exercise 3

84
85
Exercise 4
The patient recovery time from a particular surgical procedure is
normally distributed with a mean of 5.3 days and a standard deviation
of 2.1 days.

What is the probability of spending more than two days in recovery?


A. 0.0580
B. 0.8447
C. 0.0553
D. 0.9420

The 90th percentile for recovery times is?


A. 8.89
B. 7.07
C. 7.99
D. 4.32

86
According to a study done by De Anza students, the height for Asian adult
males is normally distributed with an average of 66 inches and a standard
deviation of 2.5 inches. Suppose one Asian adult male is randomly chosen.
Let X=height of the individual.

1.X∼ _____(_____,_____)

2.Find the probability that the person is between 65 and 69 inches. Include
a sketch of the graph, and write a probability statement.

3.Would you expect to meet many Asian adult males over 72 inches? Explain
why or why not, and justify your answer numerically.

4.The middle 40% of heights fall between what two values? Sketch the
graph, and write the probability statement.

87
The percent of fat calories that a person in America consumes
each day is normally distributed with a mean of about 36 and a
standard deviation of 10. Suppose that one individual is
randomly chosen. Let X=percent of fat calories.

1.X∼ _____(_____,_____)

2.Find the probability that the percent of fat calories a person


consumes is more than 40. Graph the situation. Shade in the
area to be determined.

3.Find the maximum number for the lower quarter of percent of


fat calories. Sketch the graph and write the probability
statement 88
89
SPSS: test normality

1. Click Analyze -> Descriptive Statistics -> Explore…


2. Move the variable of interest from the left box into the
Dependent List box on the right.
3. Click the Plots button, and tick the Normality plots with
tests option.
4. Click Continue, and then click OK.
5. Your result will pop up – check out the Tests of Normality
section.

Statistical tests, including ANOVA, t-tests and regression, require the normality assumption.

If sample sizes are reasonable, normality tests are often pointless.
