Lecture 3
and Economics
Module 1: Probability Theory and Statistical Inference
Spring 2010
Lecture 3: Continuous probability distributions
Priyantha Wijayatunga, Department of Statistics, Umeå University
These materials are altered from copyrighted lecture slides (© 2009 W.H. Freeman and Company) from the homepage of the book:
The Practice of Business Statistics: Using Data for Decisions, Second Edition,
by Moore, McCabe, Duckworth and Alwan.
Continuous probability distributions
Probability density
Sampling distributions
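If X is the number of days a student comes to the class and
$$
P(X = x) = p(x) =
\begin{cases}
0.1 & \text{if } x = 1 \\
0.2 & \text{if } x = 2 \\
0.2 & \text{if } x = 3 \\
0.3 & \text{if } x = 4 \\
0.2 & \text{if } x = 5
\end{cases}
$$
then
1) What is the probability that a student comes to the class more than 3 days?
2) What is the probability that a student comes to the class 2 or 3 days?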
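Both questions reduce to adding the probabilities of the outcomes that make up the event. The following is a minimal Python sketch, assuming the listed probabilities 0.1, 0.2, 0.2, 0.3, 0.2 pair with x = 1, ..., 5 in order; the names `pmf` and `prob` are illustrative and not part of the original slides.

```python
# Probability mass function of X = number of days a student comes to class.
# Assumption: the listed probabilities pair with x = 1..5 in order.
pmf = {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.3, 5: 0.2}


def prob(event):
    """Sum p(x) over the outcomes x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


more_than_3 = prob(lambda x: x > 3)          # p(4) + p(5)
two_or_three = prob(lambda x: x in (2, 3))   # p(2) + p(3)

print(f"P(X > 3)          = {more_than_3:.1f}")
print(f"P(X = 2 or X = 3) = {two_or_three:.1f}")
```

For a discrete distribution the probabilities of individual outcomes are positive, so an event's probability is a plain sum.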
Continuous Probability Distributions
A continuous random variable X takes all values in an interval.
Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).
f(x) = 1 for 0 ≤ x ≤ 1
f(x) = 0 for x < 0 or x > 1
Intervals
All continuous probability distributions assign probability 0 to every
individual outcome. Only intervals can have a positive probability, represented
by the area under the density curve for that interval.
P(X < 0.5 or X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 − P(0.5 < X < 0.8) = 0.7
$$
f(x) =
\begin{cases}
\dfrac{1}{b - a} & \text{if } a \le x \le b \\
0 & \text{otherwise}
\end{cases}
$$
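For a uniform density on [a, b], the area to the left of x is a rectangle of width x − a and height 1/(b − a). The sketch below uses that fact to check the interval calculation for the uniform density on [0, 1]; the helper name `uniform_cdf` is an illustrative choice, not something from the slides.

```python
def uniform_cdf(x, a=0.0, b=1.0):
    """P(X <= x) for X uniform on [a, b]: area of the rectangle left of x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


p_below = uniform_cdf(0.5)                       # P(X < 0.5)
p_above = 1.0 - uniform_cdf(0.8)                 # P(X > 0.8)
p_middle = uniform_cdf(0.8) - uniform_cdf(0.5)   # P(0.5 < X < 0.8)

# Two routes to the same probability: add the two tails,
# or subtract the middle interval from the total area 1.
print(round(p_below + p_above, 4))
print(round(1.0 - p_middle, 4))
```

Because single points have probability 0, strict and non-strict inequalities give the same answer here.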
% individuals with X
such that x1 < X < x2
Normal probability distributions
The probability distribution of many random variables is a normal
distribution. It shows what values the random variable can take and is
used to assign probabilities to those values.
Example: Probability distribution of women's heights.
Here since we chose a woman
randomly, her height, X, is a
random variable.
Normal distributions
Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped
density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}
$$
e = 2.71828... The base of the natural logarithm
π = pi = 3.14159...
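The normal density f(x) = 1/(σ√(2π)) · exp(−½((x − μ)/σ)²) can be evaluated term by term. The sketch below does so and compares the result with `statistics.NormalDist.pdf` from the Python standard library. The values μ = 64.5 and σ = 2.5 are borrowed from the women's heights example purely for illustration, and `normal_pdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density written out term by term from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coeff * math.exp(exponent)


mu, sigma = 64.5, 2.5  # illustrative values only
for x in (62.0, 64.5, 67.0):
    by_hand = normal_pdf(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: formula = {by_hand:.6f}, library = {library:.6f}")
```

The values at 62.0 and 67.0 coincide because both points lie one standard deviation from the mean and the curve is symmetric.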
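[Figure: normal density curve with an inflection point marked]
Standardizing: calculating z-scores
A z-score measures the number of standard deviations that a data
value x is from the mean μ.
$$
z = \frac{x - \mu}{\sigma}
$$
For x = μ + σ: z = (μ + σ − μ)/σ = σ/σ = 1.
For x = μ + 2σ: z = (μ + 2σ − μ)/σ = 2σ/σ = 2.
Standardizing: N(μ, σ) => N(0,1)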
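N(μ, σ) = N(64.5, 2.5)
[Figure: normal curve with μ = 64.5 (z = 0) and x = 67 (z = 1) marked; the areas on either side of x = 67 are labeled "Area = ???"]
$$
z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1
$$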
Because of the 68-95-99.7 rule, we can conclude that the percent of women
shorter than 67 inches should be approximately 0.68 + half of (1 − 0.68) = 0.84, or 84%.
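The rule-of-thumb figure can be compared with the exact area under N(64.5, 2.5) to the left of x = 67. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # women's heights in inches, from the example

rule_of_thumb = 0.68 + (1 - 0.68) / 2     # 68-95-99.7 rule approximation
exact = heights.cdf(67)                   # area under N(64.5, 2.5) left of x = 67

print(f"rule of thumb: {rule_of_thumb:.2f}")
print(f"exact area:    {exact:.4f}")
```

The two agree to two decimals because the "68" in the rule is itself a rounded value.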
What is the probability, if we pick one woman at random, that her height will be
some value X? For instance, between 68 and 70 inches P(68 < X < 70)?
Because the woman is selected at random, X is a random variable.
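N(μ, σ) = N(64.5, 2.5)
$$
z = \frac{x - \mu}{\sigma}
$$
For x = 68", z = (68 − 64.5)/2.5 = 1.4; the area to the left of z = 1.4 is 0.9192.
For x = 70", z = (70 − 64.5)/2.5 = 2.2; the area to the left of z = 2.2 is 0.9861.
The area under the curve for the interval [68" to 70"] is 0.9861 − 0.9192 = 0.0669.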
Thus, the probability that a randomly chosen woman falls into this range is 6.69%.
P(68 < X < 70) = 6.69%
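The same interval probability can be reproduced by standardizing both endpoints and subtracting the two left-tail areas. This is a sketch with the Python standard library, where `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()   # N(0, 1)
mu, sigma = 64.5, 2.5       # women's heights, from the example


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z_low, z_high = z_score(68), z_score(70)
area = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z for 68 in: {z_low:.1f}, z for 70 in: {z_high:.1f}")
print(f"P(68 < X < 70) = {area:.4f}")
```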
Using Table A
Table A gives the area under the standard Normal curve to the left of any z value.
.0082 is the area under N(0,1) left of z = -2.40
N(μ, σ) = N(64.5, 2.5)
[Figure: normal curve with μ = 64.5 and x = 67 (z = 1) marked; area ≈ 0.84 to the left of x = 67, area ≈ 0.16 to the right]
Conclusion:
84.13% of women are shorter than 67".
Area = 0.0099
area right of z = 1 − area left of z
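x = 820, μ = 1026, σ = 209
$$
z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99
$$
Table A: the area under N(0,1) to the left of z = −0.99 is 0.1611, or approx. 16%.
area right of 820 = total area − area left of 820 = 1 − 0.16 = 0.84, or 84%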
The NCAA defines a partial qualifier, eligible to practice and receive an athletic
scholarship but not to compete, as a student whose combined SAT score is at least 720.
What proportion of all students who take the SAT would be partial
qualifiers? That is, what proportion have scores between 720 and 820?
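x = 720, μ = 1026, σ = 209
$$
z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46
$$
Table A: the area under N(0,1) to the left of z = −1.46 is 0.0721, or approx. 7%.
area between 720 and 820 = area left of 820 − area left of 720 ≈ 16% − 7% = 9%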
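The SAT proportions can be checked without a table. The sketch below assumes combined SAT scores follow N(1026, 209), as in the calculations above, and uses `statistics.NormalDist` from the standard library. Software does not round z to two decimals, so the results differ slightly from the Table A values.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)   # combined SAT scores, from the example

below_820 = sat.cdf(820)               # area left of 820
below_720 = sat.cdf(720)               # area left of 720

at_least_820 = 1 - below_820           # total area minus area left of 820
partial = below_820 - below_720        # scores between 720 and 820

print(f"area left of 820:     {below_820:.4f}")
print(f"area left of 720:     {below_720:.4f}")
print(f"area right of 820:    {at_least_820:.4f}")
print(f"between 720 and 820:  {partial:.4f}")
```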
$$
z = \frac{x - \mu}{\sigma}
$$
For N(0,1): find the area/proportion in the body of the table, then read the
corresponding z-value from the left column and top row.
For an area to the left of 1.25% (0.0125),
the z-value is −2.24.
2. Unstandardize
$$
\frac{x - 25.7}{5.88} = 1.28
$$
Solving for x gives x = 25.7 + 1.28 × 5.88 ≈ 33.2 miles per gallon.
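Reading Table A backward corresponds to the inverse of the standard normal cumulative distribution, and unstandardizing is the rearrangement x = μ + zσ. The sketch below uses `NormalDist.inv_cdf` from the Python standard library, with μ = 25.7, σ = 5.88 and z = 1.28 taken from the calculation above; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()              # N(0, 1)

# Step 1: area -> z (the backward table lookup).
z_for_area = std_normal.inv_cdf(0.0125)
print(f"z with area 0.0125 to its left: {z_for_area:.2f}")

# Step 2: unstandardize, x = mu + z * sigma.
mu, sigma, z = 25.7, 5.88, 1.28        # values from the miles-per-gallon calculation
x = mu + z * sigma
print(f"x = {x:.1f} miles per gallon")

# Round trip: area to the left of that x under N(25.7, 5.88).
print(f"area left of x: {NormalDist(mu, sigma).cdf(x):.4f}")
```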
[Figure: density curve; vertical axis: density]
$$
P(X > 11) = P\left(\frac{X - 10}{\sqrt{0.3}} > \frac{11.025 - 10}{\sqrt{0.3}}\right) = P(Z > 1.87) = 1 - P(Z \le 1.87) = 1 - 0.9693 = 0.0307
$$
If the distribution is indeed normal the plot will show a straight line,
indicating a good match between the data and a normal distribution.
Sampling distribution of the sample mean
We take many random samples of a given size n from a population
with mean μ and standard deviation σ.
Some sample means will be above the population mean and some
will be below, making up the sampling distribution.
[Figure: sampling distribution of x̄; histogram of some sample averages]
If the population is N(μ, σ),
then the sampling distribution of the sample mean is N(μ, σ/√n).
[Figure panels: population with strongly skewed distribution; sampling distribution of x̄ for n = 2 observations; sampling distribution of x̄ for n = 10 observations; sampling distribution of x̄ for n = 25 observations]
[Figure: density plots for a Bin(5, 0.7) population; horizontal axis: sample mean; vertical axis: density]
σ/√n = 1.024695/√50 = 0.1449138
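A short simulation can illustrate the value 1.024695/√50. It assumes, based on the figure label and the arithmetic, that the population is Bin(5, 0.7), whose standard deviation is √(5 × 0.7 × 0.3) ≈ 1.024695, and that each sample has n = 50 observations. The number of repeated samples (2000), the seed, and all names are illustrative choices rather than details from the slides.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def draw_binomial(trials=5, p=0.7):
    """One draw from Bin(trials, p): the count of successes in `trials` tries."""
    return sum(random.random() < p for _ in range(trials))


n = 50               # sample size (assumed from the sqrt(50) in the calculation)
num_samples = 2000   # number of repeated samples (illustrative choice)

sample_means = [
    statistics.fmean([draw_binomial() for _ in range(n)])
    for _ in range(num_samples)
]

pop_mean = 5 * 0.7                   # mean of Bin(5, 0.7)
pop_sd = math.sqrt(5 * 0.7 * 0.3)    # standard deviation of Bin(5, 0.7)
theoretical_sd = pop_sd / math.sqrt(n)

print(f"population mean {pop_mean:.2f}, mean of sample means {statistics.fmean(sample_means):.3f}")
print(f"sigma/sqrt(n) = {theoretical_sd:.7f}, sd of sample means = {statistics.stdev(sample_means):.4f}")
```

The simulated spread of the sample means should land close to σ/√n, though it will not match exactly because only a finite number of samples is drawn.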
Application
Hypokalemia is diagnosed when blood potassium levels are low, below
3.5 mEq/dl. Let's assume that we know a patient whose measured potassium
levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2).
If only one measurement is made, what is the probability that this patient will be
misdiagnosed as hypokalemic?
$$
z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5
$$
For the mean of n = 4 measurements:
$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.5 - 3.8}{0.2 / \sqrt{4}} = -3
$$
Note: Make sure to standardize (z) using the standard deviation for the sampling
distribution.
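The effect of averaging can be seen by computing the misdiagnosis probability for a single measurement and for the mean of four. The sketch assumes the four measurements are independent draws from N(3.8, 0.2), so that their mean has standard deviation 0.2/√4; `prob_mean_below` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

mu, sigma = 3.8, 0.2   # patient's daily potassium level, from the example
threshold = 3.5        # below this level hypokalemia is diagnosed


def prob_mean_below(n):
    """P(mean of n independent measurements < threshold)."""
    standard_error = sigma / math.sqrt(n)   # sd of the sampling distribution
    return NormalDist(mu, standard_error).cdf(threshold)


p_single = prob_mean_below(1)   # corresponds to z = -1.5
p_four = prob_mean_below(4)     # corresponds to z = -3

print(f"one measurement:   P = {p_single:.4f}")
print(f"mean of four:      P = {p_four:.4f}")
```

Using σ instead of σ/√n for the four-measurement case would overstate the misdiagnosis probability.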
Income distribution
Let's consider the very large database of individual incomes from the Bureau of
Labor Statistics as our population. It is strongly right-skewed.
We take 1000 SRSs of 100 incomes, calculate the sample mean for
each, and make a histogram of these 1000 means.
We also take 1000 SRSs of 25 incomes, calculate the sample mean for
each, and make a histogram of these 1000 means.
Which histogram corresponds to the samples of size 100? 25?