NSOWAH-NUAMAH

ADVANCED TOPICS IN INTRODUCTORY PROBABILITY
A FIRST COURSE IN PROBABILITY THEORY – VOLUME III

Advanced Topics in Introductory Probability: A First Course in Probability Theory – Volume III
2nd edition
© 2018 Nicholas N.N. Nsowah-Nuamah & bookboon.com
ISBN 978-87-403-2238-5
CONTENTS

Part 1 Bivariate Probability Distributions
Bibliography
PART 1
BIVARIATE PROBABILITY DISTRIBUTIONS
I salute the discovery of a single even insignificant truth more highly than
all the argumentation on the highest questions which fails to reach a truth
GALILEO (1564–1642)
Chapter 1

DENSITY AND DISTRIBUTION FUNCTIONS OF BIVARIATE DISTRIBUTIONS
1.1 INTRODUCTION
There are cases where one variable is discrete and the other continuous
but this will not be considered here.
Suppose that X and Y are discrete random variables, where X takes the values xi, i = 1, 2, ..., n, and Y takes the values yj, j = 1, 2, ..., m. Most often, such a joint distribution is given in table form. Table 1.1 is an n-by-m array which displays the number of occurrences of the various combinations of values of X and Y. We may observe that each row represents values of X and each column represents values of Y. The row and column totals are called marginal totals. Such a table is called the joint frequency distribution.
Table 1.1

X \ Y              y1             y2          ···          ym           Row Totals
x1             #(x1, y1)      #(x1, y2)       ···      #(x1, ym)      ∑_j #(x1, yj)
x2             #(x2, y1)      #(x2, y2)       ···      #(x2, ym)      ∑_j #(x2, yj)
 .                 .              .                         .               .
xn             #(xn, y1)      #(xn, y2)       ···      #(xn, ym)      ∑_j #(xn, yj)
Column       ∑_i #(xi, y1)  ∑_i #(xi, y2)     ···    ∑_i #(xi, ym)   ∑_i ∑_j #(xi, yj) = N
Totals
For example, suppose X and Y are discrete random variables, and X takes values 0, 1, 2, 3, and Y takes values 1, 2, 3. Each of the nm row-column intersections in Table 1.2 represents the frequency that belongs to the ordered pair (X, Y).

Table 1.2

X \ Y            1     2     3    Row Totals
0                1     0     0        1
1                0     2     1        3
2                0     2     1        3
3                1     0     0        1
Column Totals    2     4     2        8
p(xi, yj) = P({X = xi} ∩ {Y = yj})

The function p(xi, yj) is sometimes referred to as the joint probability mass function (p.m.f.) or the joint probability function (p.f.) of X and Y. This function gives the probability that X will assume a particular value x while at the same time Y assumes a particular value y.

Note
(a) The notation p(x, y) for all (x, y) is the same as writing p(xi, yj) for i = 1, 2, ..., n and j = 1, 2, ..., m. Sometimes, when there is no ambiguity, we shall simply use p(x, y).
Definition 1.5
If X and Y are discrete random variables with joint probability mass function p(xi, yj), then
(i) p(xi, yj) ≥ 0 for all i, j;
(ii) ∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = 1
Once the joint probability mass function is determined for discrete random
variables X and Y , calculation of joint probabilities involving X and Y is
straightforward.
Let the value that the random variables X and Y jointly take be denoted by the ordered pair (xi, yj). The joint probability p(xi, yj) is obtained by counting the number of occurrences of that combination of values of X and Y and dividing the count by the total number of all the sample points. Thus,

P({X = xi} ∩ {Y = yj}) = #({X = xi} ∩ {Y = yj}) / ∑_{i=1}^{n} ∑_{j=1}^{m} #({X = xi} ∩ {Y = yj})
                       = #(xi, yj) / ∑_{i=1}^{n} ∑_{j=1}^{m} #(xi, yj)

where
#(xi, yj) is the number of occurrences in the cell of the ordered pair (xi, yj);
∑_{i=1}^{n} ∑_{j=1}^{m} #(xi, yj) is the total number of all sample points (cells) of the ordered pairs (xi, yj), denoted by N.
Table 1.3

X \ Y              y1            y2         ···         ym          Row Totals
x1             p(x1, y1)     p(x1, y2)      ···      p(x1, ym)        p(x1)
x2             p(x2, y1)     p(x2, y2)      ···      p(x2, ym)        p(x2)
 .                 .             .                        .              .
xn             p(xn, y1)     p(xn, y2)      ···      p(xn, ym)        p(xn)
Column           p(y1)          p(y2)       ···        p(ym)      ∑_i ∑_j p(xi, yj) = 1
Totals

Note
The marginal probabilities for X are simply the probabilities that X = xi summed over the values of yj, where j assumes a value from 1 to m. Similarly, the marginal probabilities for Y are the probabilities that Y = yj summed over the values of xi, where i assumes a value from 1 to n.
Example 1.1
(a) For the data in Table 1.2, calculate the joint probabilities of X and Y.
(b) Does this distribution satisfy the properties of a joint probability function?

Solution
(a) From Table 1.2, the cell ({X = 0} ∩ {Y = 1}) = (0, 1) contains one element, and the total number of elements in all cells is 8. Hence

P({X = 0} ∩ {Y = 1}) = p(0, 1) = #({X = 0} ∩ {Y = 1}) / ∑_{i} ∑_{j} #({X = xi} ∩ {Y = yj}) = 1/8

Similarly,

P({X = 0} ∩ {Y = 2}) = p(0, 2) = 0/8 = 0
P({X = 0} ∩ {Y = 3}) = p(0, 3) = 0/8 = 0
P({X = 1} ∩ {Y = 1}) = p(1, 1) = 0/8 = 0
P({X = 1} ∩ {Y = 2}) = p(1, 2) = 2/8 = 1/4
P({X = 1} ∩ {Y = 3}) = p(1, 3) = 1/8
P({X = 2} ∩ {Y = 1}) = p(2, 1) = 0/8 = 0
P({X = 2} ∩ {Y = 2}) = p(2, 2) = 2/8 = 1/4
P({X = 2} ∩ {Y = 3}) = p(2, 3) = 1/8
P({X = 3} ∩ {Y = 1}) = p(3, 1) = 1/8
P({X = 3} ∩ {Y = 2}) = p(3, 2) = 0/8 = 0
P({X = 3} ∩ {Y = 3}) = p(3, 3) = 0/8 = 0

(b) (i) p(xi, yj) ≥ 0 for all i = 0, 1, 2, 3; j = 1, 2, 3.
    (ii) ∑_{i=0}^{3} ∑_{j=1}^{3} p(xi, yj) = 1/8 + 1/8 + 2/8 + 2/8 + 1/8 + 1/8 = 1

Hence this distribution is a joint probability function.
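To make the counting procedure concrete, here is a small illustrative sketch (ours, not part of the original text; Python and the variable names are our own) that rebuilds the joint probabilities of Example 1.1 from the cell counts of Table 1.2 and checks the two properties of Definition 1.5.

from fractions import Fraction

# Cell counts of Table 1.2: counts[x][y] for x = 0..3, y = 1..3
counts = {0: {1: 1, 2: 0, 3: 0},
          1: {1: 0, 2: 2, 3: 1},
          2: {1: 0, 2: 2, 3: 1},
          3: {1: 1, 2: 0, 3: 0}}

N = sum(c for row in counts.values() for c in row.values())   # total number of sample points (8)
p = {(x, y): Fraction(c, N) for x, row in counts.items() for y, c in row.items()}

print(p[(0, 1)])                        # 1/8, as in the example
print(all(v >= 0 for v in p.values()))  # property (i)
print(sum(p.values()) == 1)             # property (ii)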
Example 1.2
The joint probability function of the discrete random variables X and Y is given by

p(x, y) = k(3x + 2y),   x = 0, 1;  y = 0, 1, 2

where k is a constant.
(a) Find the value of k.
(b) Find the joint probabilities and present them in a table.

Solution
(a) For p(x, y) to be a joint probability function we must have

∑_{x=0}^{1} ∑_{y=0}^{2} k(3x + 2y) = 21k = 1

from which

k = 1/21

(b) For the sample point {X = 0, Y = 0} = (0, 0),

p(0, 0) = (1/21)[3(0) + 2(0)] = 0

Similarly,

p(0, 1) = (1/21)[3(0) + 2(1)] = 2/21
p(0, 2) = (1/21)[3(0) + 2(2)] = 4/21
p(1, 0) = (1/21)[3(1) + 2(0)] = 3/21
p(1, 1) = (1/21)[3(1) + 2(1)] = 5/21
p(1, 2) = (1/21)[3(1) + 2(2)] = 7/21

These results are presented in the following table. Recollect that the row and column totals are the marginal probabilities.

X \ Y            0       1       2      Row Totals
0                0      2/21    4/21      6/21
1               3/21    5/21    7/21     15/21
Column Totals   3/21    7/21   11/21       1
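As a quick check of Example 1.2 (an illustrative sketch of our own, not from the book), the normalising constant and the marginal totals can be recomputed exactly with rational arithmetic:

from fractions import Fraction

total = sum(3*x + 2*y for x in (0, 1) for y in (0, 1, 2))   # 21, so k = 1/21
k = Fraction(1, total)
p = {(x, y): k*(3*x + 2*y) for x in (0, 1) for y in (0, 1, 2)}

row_totals = {x: sum(p[(x, y)] for y in (0, 1, 2)) for x in (0, 1)}   # 6/21, 15/21
col_totals = {y: sum(p[(x, y)] for x in (0, 1)) for y in (0, 1, 2)}   # 3/21, 7/21, 11/21
print(k, row_totals, col_totals, sum(p.values()))                     # the grand total is 1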
Fig. 1.2 depicts the case of the continuous bivariate random variables (X, Y) which assume all the values in the rectangle x1 ≤ X ≤ x2 and y1 ≤ Y ≤ y2.
{x1 ≤ X ≤ x2 } ∩ {y1 ≤ Y ≤ y2 }
P ({x1 ≤ X ≤ x2 } ∩ {y1 ≤ Y ≤ y2 })
This probability can be found by subtracting from the probability that the event will fall in the (semi-infinite) rectangle having the upper-right corner (x2, y2) the probabilities that it will fall in the semi-infinite rectangles having the upper-right corners (x1, y2) and (x2, y1) respectively, and then adding back the probability that it will fall in the semi-infinite rectangle with the upper-right corner at (x1, y1). That is,
P (x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ) = P (X ≤ x2 , Y ≤ y2 ) − P (X ≤ x2 , Y ≤ y1 )
−P (X ≤ x1 , Y ≤ y2 ) + P (X ≤ x1 , Y ≤ y1 )
Definition 1.7
Let (X, Y) be a continuous bivariate random variable assuming all values in the region R. The joint probability density function f is a function satisfying the following properties:
(1) f(x, y) ≥ 0 for all (x, y) in R;
(2) ∫∫_R f(x, y) dx dy = 1
Property 2 states that the total volume bounded by the surface given by
equation z = f (x, y) and the region R on the xy-plane is equal to 1.
Example 1.3
Given the following function of a two-dimensional continuous random variable (X, Y):

f(x, y) = { x² + xy/k,   0 ≤ x ≤ 1,  0 ≤ y ≤ 2
          { 0,           elsewhere

where k is a constant.
(a) Find the value of k > 0 such that f(x, y) is a probability density function.
(b) Find P(0 < X < 1, 1 < Y < 2).

Solution
(a) For f(x, y) to be a p.d.f., it should satisfy the two conditions of Theorem 1.2. Obviously,

f(x, y) ≥ 0

since x ≥ 0, y ≥ 0, and k > 0. Also,

∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1

Now,

∫_{y=0}^{2} ∫_{x=0}^{1} (x² + xy/k) dx dy = ∫_{y=0}^{2} [x³/3 + x²y/(2k)]_{x=0}^{1} dy
                                          = ∫_{y=0}^{2} (1/3 + y/(2k)) dy
                                          = [y/3 + y²/(4k)]_{y=0}^{2}
                                          = 2/3 + 1/k = 1

giving k = 3.

(b) Using the value of k found in (a), we have

P(0 < X < 1, 1 < Y < 2) = ∫_{y=1}^{2} ∫_{x=0}^{1} (x² + xy/3) dx dy
                        = ∫_{y=1}^{2} [x³/3 + x²y/6]_{x=0}^{1} dy
                        = ∫_{y=1}^{2} (1/3 + y/6) dy
                        = [y/3 + y²/12]_{y=1}^{2}
                        = 7/12
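The value of k and the probability in (b) can also be confirmed numerically. The sketch below is ours and uses a plain midpoint Riemann sum, so no integration library is assumed:

def double_riemann(f, x0, x1, y0, y1, n=400):
    # Midpoint rule on an n-by-n grid over the rectangle [x0, x1] x [y0, y1]
    hx, hy = (x1 - x0)/n, (y1 - y0)/n
    s = 0.0
    for i in range(n):
        x = x0 + (i + 0.5)*hx
        for j in range(n):
            y = y0 + (j + 0.5)*hy
            s += f(x, y)
    return s*hx*hy

k = 3
f = lambda x, y: x**2 + x*y/k
print(double_riemann(f, 0, 1, 0, 2))   # close to 1, confirming k = 3
print(double_riemann(f, 0, 1, 1, 2))   # close to 7/12 = 0.5833...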
The joint cumulative distribution function of random variables X and Y gives the probability that X takes on a value less than or equal to xi, i = 1, 2, ..., n, and that Y takes on a value less than or equal to yj, j = 1, 2, ..., m.
Example 1.4
Refer to the table of Example 1.1. Calculate
(a) the joint probability P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2);
(b) the joint cumulative probability P(X ≤ 1, Y ≤ 2).

Solution
(a) The joint probability P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2) is obtained as follows:

P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2)
                        = 0 + 2/8 + 1/8 + 0 = 3/8

(b) The joint cumulative probability P(X ≤ 1, Y ≤ 2) is as follows:

P(X ≤ 1, Y ≤ 2) = F(1, 2)
                = p(0, 1) + p(0, 2) + p(1, 1) + p(1, 2)
                = 1/8 + 0 + 0 + 2/8 = 3/8
Example 1.5
Refer to Example 1.2. Calculate
(a) the joint probability P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2);
(b) the joint cumulative distribution function P(X ≤ 1, Y ≤ 1);
(c) the joint cumulative distribution function P(X ≤ 1, Y ≤ 2).

Solution
(a) P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2) = ∑_{x=0}^{1} ∑_{y=1}^{2} (1/21)(3x + 2y)
                            = (1/21) ∑_{x=0}^{1} [(3x + 2) + (3x + 4)]
                            = (1/21) ∑_{x=0}^{1} (6x + 6)
                            = (1/21)[(0 + 6) + (6 + 6)] = 18/21 = 6/7

(b) P(X ≤ 1, Y ≤ 1) = F(1, 1)
                    = ∑_{x=0}^{1} ∑_{y=0}^{1} (1/21)(3x + 2y)
                    = (1/21) ∑_{x=0}^{1} [(3x + 0) + (3x + 2)]
                    = (1/21) ∑_{x=0}^{1} (6x + 2)
                    = (1/21)[(0 + 2) + (6 + 2)] = 10/21
The reader is asked in Exercise 1.5 to solve part (c) of this example.
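Joint and cumulative probabilities of discrete bivariate distributions are simple sums over the relevant cells. The following sketch (ours; the helper name prob is our own) reproduces the values 3/8, 3/8 from Example 1.4 and 6/7, 10/21 from Example 1.5:

from fractions import Fraction

p_ex11 = {(0,1): Fraction(1,8), (1,2): Fraction(2,8), (1,3): Fraction(1,8),
          (2,2): Fraction(2,8), (2,3): Fraction(1,8), (3,1): Fraction(1,8)}
p_ex12 = {(x, y): Fraction(3*x + 2*y, 21) for x in (0, 1) for y in (0, 1, 2)}

def prob(p, xcond, ycond):
    # Sum p(x, y) over all pairs (x, y) satisfying both conditions
    return sum(v for (x, y), v in p.items() if xcond(x) and ycond(y))

print(prob(p_ex11, lambda x: 2 <= x <= 3, lambda y: 1 <= y <= 2))   # 3/8
print(prob(p_ex11, lambda x: x <= 1, lambda y: y <= 2))             # 3/8
print(prob(p_ex12, lambda x: 0 <= x <= 1, lambda y: 1 <= y <= 2))   # 6/7
print(prob(p_ex12, lambda x: x <= 1, lambda y: y <= 1))             # 10/21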
The joint distribution function gives the probability that the point (X, Y) belongs to a semi-infinite rectangle in the plane, as shown in Fig. 1.3:

F(x, y) = P({X ≤ x} ∩ {Y ≤ y}) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt

where f(s, t) is the value of the joint probability density of X and Y at (s, t).
Example 1.6
Refer to Example 1.3. Calculate P(X ≤ 1, Y < 1).

Solution

P(X ≤ 1, Y < 1) = ∫_{y=0}^{1} ∫_{x=0}^{1} (x² + xy/3) dx dy
                = ∫_{y=0}^{1} [x³/3 + x²y/6]_{x=0}^{1} dy
                = ∫_{y=0}^{1} (1/3 + y/6) dy
                = [y/3 + y²/12]_{y=0}^{1}
                = 5/12
Theorem 1.1
If F is the cumulative distribution function of a two-dimensional random variable with joint probability density function f(x, y), then

∂²F(x, y)/∂x∂y = f(x, y)

wherever F is differentiable.
Example 1.7
Let

F(x, y) = (1 − e^{−x})(1 − e^{−y}),   x ≥ 0, y ≥ 0

Find the joint probability density function f(x, y).

Solution

∂F(x, y)/∂x = e^{−x}(1 − e^{−y})

∂²F(x, y)/∂x∂y = e^{−x} e^{−y} = e^{−(x+y)},   x ≥ 0, y ≥ 0

Hence

f(x, y) = e^{−(x+y)},   x ≥ 0, y ≥ 0

Note
∂²F(x, y)/∂x∂y = ∂²F(x, y)/∂y∂x
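Theorem 1.1 and Example 1.7 can be verified with a computer algebra system. The sketch below is ours and assumes SymPy is available; it recovers f(x, y) = e^{−(x+y)} and confirms that the two orders of differentiation agree:

import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
F = (1 - sp.exp(-x))*(1 - sp.exp(-y))

f_xy = sp.simplify(sp.diff(F, x, y))   # differentiate with respect to x, then y
f_yx = sp.simplify(sp.diff(F, y, x))   # reverse order
print(f_xy)                            # prints exp(-x - y), i.e. e^{-(x+y)}
print(sp.simplify(f_xy - f_yx) == 0)   # the mixed partials agree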
The joint c.d.f. of a bivariate random variable has properties which are
analogous to those of the univariate random variable.
Property 1
The function F (x, y) is a probability, hence
0 ≤ F (x, y) ≤ 1
Property 2
The bivariate distribution function F(x, y) is monotonic increasing, in a wider sense, for both variables; that is,

if x1 ≤ x2, then F(x1, y) ≤ F(x2, y), for y fixed;
if y1 ≤ y2, then F(x, y1) ≤ F(x, y2), for x fixed.

Property 3
The following relations are also true:
(a) F(−∞, y) = 0
(b) F(x, −∞) = 0
(c) F(+∞, +∞) = 1
Property 4
At points of continuity of f(x, y),

∂²F(x, y)/∂x∂y = f(x, y)
The row totals of Table 1.3 provide us with the probability distribution of X. Similarly, the column totals provide the probability distribution of Y. These are typically called marginal probability mass functions because they are found on the margins of tables. Thus,

g(xi) = P(X = xi) = ∑_{j=1}^{m} p(xi, yj),   i = 1, 2, ..., n

and

h(yj) = P(Y = yj) = ∑_{i=1}^{n} p(xi, yj),   j = 1, 2, ..., m
Example 1.8
For the data in Table 1.2, find the marginal probability distribution for (a) X and (b) Y.

Solution
To calculate the marginal probabilities the joint probabilities are required. The joint probabilities for this problem have been calculated in Example 1.1.

(a) From the table obtained in Example 1.1 we shall calculate the marginal probabilities for each xi by fixing i and summing all the joint probabilities across j. Thus:

P(X = 0) = P(X = 0, Y = 1) + P(X = 0, Y = 2) + P(X = 0, Y = 3)
         = p(0, 1) + p(0, 2) + p(0, 3) = 1/8 + 0 + 0 = 1/8

P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3)
         = p(1, 1) + p(1, 2) + p(1, 3) = 0 + 2/8 + 1/8 = 3/8

P(X = 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) + P(X = 2, Y = 3)
         = p(2, 1) + p(2, 2) + p(2, 3) = 0 + 2/8 + 1/8 = 3/8

P(X = 3) = P(X = 3, Y = 1) + P(X = 3, Y = 2) + P(X = 3, Y = 3)
         = p(3, 1) + p(3, 2) + p(3, 3) = 1/8 + 0 + 0 = 1/8

The results are summarised in the table below:

xi       0     1     2     3
g(xi)   1/8   3/8   3/8   1/8
(b) Similarly to (a), we fix j and sum all the joint probabilities across i. Hence,

P(Y = 1) = P(Y = 1, X = 0) + P(Y = 1, X = 1) + P(Y = 1, X = 2) + P(Y = 1, X = 3)
         = p(0, 1) + p(1, 1) + p(2, 1) + p(3, 1) = 1/8 + 0 + 0 + 1/8 = 2/8

P(Y = 2) = P(Y = 2, X = 0) + P(Y = 2, X = 1) + P(Y = 2, X = 2) + P(Y = 2, X = 3)
         = p(0, 2) + p(1, 2) + p(2, 2) + p(3, 2) = 0 + 2/8 + 2/8 + 0 = 4/8

P(Y = 3) = P(Y = 3, X = 0) + P(Y = 3, X = 1) + P(Y = 3, X = 2) + P(Y = 3, X = 3)
         = p(0, 3) + p(1, 3) + p(2, 3) + p(3, 3) = 0 + 1/8 + 1/8 + 0 = 2/8

The results are summarised in the table below:

yj       1     2     3
h(yj)   2/8   4/8   2/8
The results of Examples 1.1 and 1.8 (that is, the joint and marginal proba-
bilities) are usually presented in a single table such as in Table 1.4.
Note
(a) The marginal distributions of X and Y are the ordinary probability distribution functions of X and Y, but when derived from the joint distribution function the adjective "marginal" is added.
(b) In marginal distribution, the probability of different values of a random variable in a subset of random variables is determined without reference to any possible values of the other variables.
(c) From the table,

∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = ∑_{i=1}^{n} g(xi) = ∑_{j=1}^{m} h(yj) = 1

Table 1.4

X \ Y            1      2      3     Row Totals
0               1/8     0      0        1/8
1                0     2/8    1/8       3/8
2                0     2/8    1/8       3/8
3               1/8     0      0        1/8
Column Totals   2/8    4/8    2/8        1
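The marginal totals of Table 1.4 can be recomputed mechanically by summing the joint probabilities across each index, mirroring Example 1.8. A minimal sketch (ours):

from fractions import Fraction

p = {(0,1): Fraction(1,8), (1,2): Fraction(2,8), (1,3): Fraction(1,8),
     (2,2): Fraction(2,8), (2,3): Fraction(1,8), (3,1): Fraction(1,8)}

xs, ys = (0, 1, 2, 3), (1, 2, 3)
g = {x: sum(p.get((x, y), 0) for y in ys) for x in xs}   # marginal p.m.f. of X
h = {y: sum(p.get((x, y), 0) for x in xs) for y in ys}   # marginal p.m.f. of Y
print(g)   # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(h)   # {1: 1/4, 2: 1/2, 3: 1/4}, i.e. 2/8, 4/8, 2/8 in lowest terms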
Example 1.9
Refer to Example 1.2. Find the marginal distributions of (a) X, (b) Y.

Solution
(a) g(x) = ∑_{y=0}^{2} p(x, y) = (1/21) ∑_{y=0}^{2} (3x + 2y) = (1/21)(9x + 6) = (1/7)(3x + 2),   x = 0, 1

(b) h(y) = ∑_{x=0}^{1} p(x, y) = (1/21) ∑_{x=0}^{1} (3x + 2y) = (1/21)(3 + 4y),   y = 0, 1, 2

Definition 1.12
Suppose f(x, y) is the joint probability density function of the continuous two-dimensional random variable (X, Y). We define g(x) and h(y), the marginal probability density functions of X and Y, respectively, by

g(x) = ∫_{−∞}^{∞} f(x, y) dy

and

h(y) = ∫_{−∞}^{∞} f(x, y) dx
Example 1.10
Refer to Example 1.3. Find the marginal probability distribution of X and Y.

Solution
Marginal probability distribution of X:

g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{2} (x² + xy/3) dy,   0 < x < 1
     = [x²y + xy²/6]_{y=0}^{2}
     = 2x² + (2/3)x

That is,

g(x) = { 2x² + (2/3)x,   0 < x < 1
       { 0,              elsewhere
Marginal probability distribution of Y:

h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{1} (x² + xy/3) dx,   0 < y < 2
     = [x³/3 + x²y/6]_{x=0}^{1}
     = 1/3 + y/6

That is,

h(y) = { 1/3 + y/6,   0 < y < 2
       { 0,           elsewhere
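Each marginal density just obtained should integrate to 1 over its own range. The following check is an illustrative sketch of our own, using a crude midpoint rule:

def riemann(f, a, b, n=2000):
    # Midpoint rule for a one-dimensional integral
    h = (b - a)/n
    return sum(f(a + (i + 0.5)*h) for i in range(n))*h

g = lambda x: 2*x**2 + 2*x/3   # marginal density of X on (0, 1)
h = lambda y: 1/3 + y/6        # marginal density of Y on (0, 2)

print(riemann(g, 0, 1))        # close to 1
print(riemann(h, 0, 2))        # close to 1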
The marginal probability distribution function of X, denoted by FX(x), is

FX(x) = P(X ≤ x)

and the marginal probability distribution function of Y, denoted by FY(y), is

FY(y) = P(Y ≤ y)

Thus,

FX(x) = P(X ≤ x) = lim_{y→∞} F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, y) dy du

Similarly,

FY(y) = P(Y ≤ y) = lim_{x→∞} F(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{∞} f(x, v) dx dv

From this, it follows that the probability density function of X alone, known as the marginal density of X, is

g(x) = fX(x) = F′X(x) = ∫_{−∞}^{∞} f(x, y) dy

Similarly,

h(y) = fY(y) = F′Y(y) = ∫_{−∞}^{∞} f(x, y) dx
Note
The marginal probability density functions g(x) and h(y) can easily be de-
termined from the knowledge of the joint density function f (x, y). However,
the knowledge of the marginal probability density functions does not, in gen-
eral, uniquely determine the joint density function. The exception occurs
when the two random variables are independent.
Recall that for two events A and B,

P(A ∩ B) = P(A)P(B|A)

or

P(B|A) = P(A ∩ B)/P(A),   P(A) > 0
The probabilities g(xi ) and h(yj ) are those associated with the marginal
probability distributions for X and Y , respectively. The probability p(x|y)
is the probability that the random variable X takes a specific value x given
that Y takes on the value y written in full as P (X = x|Y = y). Thus,
P (X = 2|Y = 1) is the conditional probability that X = 2 given that Y =
1. A similar interpretation can be attached to the conditional probability
p(y|x).
Note
Conditional distribution is the opposite of marginal distribution, in which the probability of a value of a random variable is determined without reference to the possible values of the other variables.

P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y),   P(Y = y) > 0
That is, given the joint probability distribution p(x, y) and marginal prob-
ability functions g(x) and h(y), respectively, the conditional discrete proba-
bility function of X given Y is
p(x|y) = p(x, y)/h(y),   h(y) > 0

and similarly, the conditional discrete probability function of Y given X is

p(y|x) = p(x, y)/g(x),   g(x) > 0

This definition shows that if we have the joint probability function of two random variables and desire the conditional distribution for one of them when the other is held fixed, it is merely necessary to divide the joint probability function by the probability function of the fixed variable.

Example 1.11
Refer to the table in Example 1.1. Find
(a) P(Y = 2|X = 1)
(b) P(X = 1|Y = 2)
(c) P(X = 1|Y = 1)
Solution
(a) P(Y = 2|X = 1) = P(X = 1, Y = 2)/P(X = 1) = p(1, 2)/g(1) = (2/8)/(3/8) = 2/3

(b) P(X = 1|Y = 2) = P(X = 1, Y = 2)/P(Y = 2) = p(1, 2)/h(2) = (2/8)/(4/8) = 1/2

(c) P(X = 1|Y = 1) = P(X = 1, Y = 1)/P(Y = 1) = p(1, 1)/h(1) = 0/(2/8) = 0
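A conditional probability mass function is simply a renormalised slice of the joint table. The sketch below (ours; the helper names are our own) reproduces the three values of Example 1.11:

from fractions import Fraction

p = {(0,1): Fraction(1,8), (1,2): Fraction(2,8), (1,3): Fraction(1,8),
     (2,2): Fraction(2,8), (2,3): Fraction(1,8), (3,1): Fraction(1,8)}

def cond_x_given_y(x, y):
    # p(x | y) = p(x, y) / h(y)
    h_y = sum(v for (xi, yi), v in p.items() if yi == y)
    return p.get((x, y), Fraction(0)) / h_y

def cond_y_given_x(y, x):
    # p(y | x) = p(x, y) / g(x)
    g_x = sum(v for (xi, yi), v in p.items() if xi == x)
    return p.get((x, y), Fraction(0)) / g_x

print(cond_y_given_x(2, 1))   # P(Y = 2 | X = 1) = 2/3
print(cond_x_given_y(1, 2))   # P(X = 1 | Y = 2) = 1/2
print(cond_x_given_y(1, 1))   # P(X = 1 | Y = 1) = 0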
Example 1.12
Refer to Example 1.2. Find the conditional probability function
(a) p(x|y), (b) p(y|x)
Solution
(a) p(x|y) = p(x, y)/h(y)

From Example 1.2,

p(x, y) = (1/21)(3x + 2y)

From Example 1.9,

h(y) = (1/21)(3 + 4y)

Hence

p(x|y) = [(1/21)(3x + 2y)] / [(1/21)(3 + 4y)] = (3x + 2y)/(3 + 4y)

(b) p(y|x) = p(x, y)/g(x)

From Example 1.9,

g(x) = (1/7)(3x + 2)

Hence

p(y|x) = [(1/21)(3x + 2y)] / [(1/7)(3x + 2)] = 7(3x + 2y)/[21(3x + 2)] = (3x + 2y)/[3(3x + 2)]
Definition 1.15
Let (X, Y) be a continuous two-dimensional random variable with joint probability density function f(x, y). Let g and h be the marginal probability density functions of X and Y respectively. The conditional probability of X for given Y = y is defined by

f(x|y) = f(x, y)/h(y),   h(y) > 0

and that of Y for given X = x by

f(y|x) = f(x, y)/g(x),   g(x) > 0
Example 1.13
Refer to Example 1.3. Find (a) f (x|y), (b) f (y|x).
Solution
From Example 1.10,

h(y) = 1/3 + y/6

(a) f(x|y) = f(x, y)/h(y)

Therefore

f(x|y) = (x² + xy/3) / (1/3 + y/6)
       = (6x² + 2xy)/(2 + y),   0 ≤ x ≤ 1;  0 ≤ y ≤ 2

(b) The conditional probability of Y for given X = x is

f(y|x) = f(x, y)/g(x)

From Example 1.10,

g(x) = 2x² + (2/3)x

Therefore

f(y|x) = (x² + xy/3) / (2x² + (2/3)x)
       = (3x² + xy)/(6x² + 2x)
       = (3x + y)/(6x + 2),   0 ≤ y ≤ 2;  0 ≤ x ≤ 1
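For any fixed value of the conditioning variable, a conditional density must integrate to 1 over the other variable. The following sketch (ours) checks this numerically for the two conditional densities of Example 1.13 at arbitrarily chosen values y = 1.5 and x = 0.4:

def riemann(f, a, b, n=2000):
    # Midpoint rule for a one-dimensional integral
    h = (b - a)/n
    return sum(f(a + (i + 0.5)*h) for i in range(n))*h

f_x_given_y = lambda x, y: (6*x**2 + 2*x*y)/(2 + y)   # from part (a)
f_y_given_x = lambda y, x: (3*x + y)/(6*x + 2)        # from part (b)

print(riemann(lambda x: f_x_given_y(x, 1.5), 0, 1))   # close to 1 for y = 1.5
print(riemann(lambda y: f_y_given_x(y, 0.4), 0, 2))   # close to 1 for x = 0.4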
Example 1.14
Consider the joint probability density function

f(x, y) = { 6(1 − y),   0 ≤ x ≤ y ≤ 1
          { 0,          elsewhere

Find P(Y < 1/2 | X < 3/4).

Solution

P(Y < 1/2 | X < 3/4) = P(Y < 1/2, X < 3/4) / P(X < 3/4)

We first evaluate the numerator; the region of integration is the part of the support with 0 ≤ x ≤ y < 1/2. Now,

P(Y < 1/2, X < 3/4) = ∫_{y=0}^{1/2} ∫_{x=0}^{y} 6(1 − y) dx dy
                    = ∫_{y=0}^{1/2} 6y(1 − y) dy
                    = [3y² − 2y³]_{y=0}^{1/2}
                    = 1/2

The denominator is evaluated as follows. First we find the marginal p.d.f. of X:

g(x) = ∫_{y=x}^{1} 6(1 − y) dy = 3(1 − x)²,   0 ≤ x ≤ 1

so that

P(X < 3/4) = ∫_{x=0}^{3/4} 3(1 − x)² dx = [−(1 − x)³]_{0}^{3/4} = 1 − 1/64 = 63/64

Hence,

P(Y < 1/2 | X < 3/4) = (1/2)/(63/64) = 32/63
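The conditional probability 32/63 ≈ 0.5079 can be confirmed numerically. The sketch below is ours; it approximates the numerator and denominator by double Riemann sums over the support 0 ≤ x ≤ y ≤ 1:

def double_riemann(f, n=800):
    # Midpoint rule over the unit square; f returns 0 outside its support
    h = 1.0/n
    s = 0.0
    for i in range(n):
        x = (i + 0.5)*h
        for j in range(n):
            y = (j + 0.5)*h
            s += f(x, y)
    return s*h*h

f = lambda x, y: 6*(1 - y) if 0 <= x <= y <= 1 else 0.0

num = double_riemann(lambda x, y: f(x, y) if (y < 0.5 and x < 0.75) else 0.0)
den = double_riemann(lambda x, y: f(x, y) if x < 0.75 else 0.0)
print(num/den)   # close to 32/63 = 0.5079...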
Thus, the independence of the random variables X and Y implies that their joint distribution function factors into the product of their marginal distribution functions. This definition applies whether the random variables are discrete or continuous.

If X and Y are not independent, they are said to be dependent. It is usually more convenient to verify independence or otherwise with the help of the p.m.f. (in the discrete case) or p.d.f. (in the continuous case).
Definition 1.17
If X and Y are discrete random variables with joint probability func-
tion p(x, y) and marginal probability function g(x) and h(y) respec-
tively, then X and Y are independent if and only if
p(x, y) = g(x)h(y)
Example 1.15
Refer to the table in Example 1.1, verify whether or not X and Y are
independent.
Solution
Consider the ordered pair (0, 1). From Example 1.1,

P(X = 0, Y = 1) = 1/8

But from the marginal distributions,

P(X = 0) = 1/8
P(Y = 1) = 2/8

so that

P(X = 0)P(Y = 1) = (1/8)(2/8) = 1/32

which does not equal 1/8. Therefore X and Y are not independent.

Example 1.16
Refer to Example 1.2. Are X and Y independent?

Solution
From Example 1.2,

p(x, y) = (1/21)(3x + 2y)

From Example 1.9,

g(x) = (1/7)(3x + 2)
h(y) = (1/21)(3 + 4y)

Now

g(x)h(y) = (1/7)(3x + 2) · (1/21)(3 + 4y) = (1/147)(9x + 12xy + 8y + 6)

which is not equal to p(x, y); hence X and Y are not independent.
Definition 1.18
If X and Y are continuous random variables with joint density function f(x, y) and marginal density functions g(x) and h(y), respectively, then X and Y are independent if and only if

f(x, y) = g(x)h(y)
Example 1.17
Refer to Example 1.3. Verify whether X and Y are independent.
Solution
From Example 1.10,

g(x) = 2x² + (2/3)x
h(y) = 1/3 + y/6

Now

g(x)h(y) = (2x² + (2/3)x)(1/3 + y/6)
         = (2/3)x² + (1/3)x²y + (2/9)x + (1/9)xy

which is not equal to

f(x, y) = x² + xy/3

Hence X and Y are not independent.
Example 1.18
Suppose the joint p.d.f. of X and Y is given by

f(x, y) = { 4xy,   0 < x < 1, 0 < y < 1
          { 0,     elsewhere

Verify whether or not X and Y are independent.

Solution
We have

g(x) = ∫_{0}^{1} f(x, y) dy = ∫_{0}^{1} 4xy dy = 4x [y²/2]_{0}^{1} = 2x,   0 < x < 1

Similarly,

h(y) = ∫_{0}^{1} f(x, y) dx = 2y,   0 < y < 1

Hence,

f(x, y) = g(x)h(y)

for all real numbers (x, y), and therefore X and Y are independent.
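The factorisation tests of Definitions 1.17 and 1.18 are easy to automate. The sketch below (ours) applies them to the discrete table of Example 1.1 (dependent) and to the density 4xy of Example 1.18 (independent):

from fractions import Fraction

# Discrete case (Example 1.15): compare p(x, y) with g(x)h(y)
p = {(0,1): Fraction(1,8), (1,2): Fraction(2,8), (1,3): Fraction(1,8),
     (2,2): Fraction(2,8), (2,3): Fraction(1,8), (3,1): Fraction(1,8)}
xs, ys = (0, 1, 2, 3), (1, 2, 3)
g = {x: sum(p.get((x, y), 0) for y in ys) for x in xs}
h = {y: sum(p.get((x, y), 0) for x in xs) for y in ys}
print(all(p.get((x, y), Fraction(0)) == g[x]*h[y] for x in xs for y in ys))   # False: dependent

# Continuous case (Example 1.18): compare f(x, y) with g(x)h(y) on a grid
f  = lambda x, y: 4*x*y
gx = lambda x: 2*x
hy = lambda y: 2*y
grid = [k/10 for k in range(1, 10)]
print(all(abs(f(x, y) - gx(x)*hy(y)) < 1e-12 for x in grid for y in grid))    # True: independent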
Obtaining the distributions of X ± Y, XY, and X/Y (Y ≠ 0) is not straightforward, and this is the subject of the next chapter.

EXERCISES

1.1 Given the following joint frequency distribution of X and Y

1.3 Refer to Exercise 1.1. Calculate
    (a) P(X < 2, Y ≤ 3),        (b) P(2 ≤ X < 3, 1 ≤ Y ≤ 2),
    (c) P(1 < X < 2, Y < 3),    (d) P(X < 2, 0 ≤ Y < 2)

1.4 Refer to Exercise 1.1. Find
    (a) P(Y = 3|X = 1),    (b) P(Y = 2|X = 3),
    (c) P(X = 2|Y = 2),    (d) P(X = 1|Y = 2),
    (e) P(X = 3|Y = 1),    (f) P(X = 3|Y = 2)

1.5 Refer to Example 1.5. Solve part (c).
1.6 The joint probability mass function of X and Y is given by

    p(1, 1) = 1/8,   p(1, 2) = 1/4,   p(2, 1) = 1/8,   p(2, 2) = 1/2

    (a) Compute the conditional probability mass function of X, given Y = i, i = 1, 2.
    (b) Are X and Y independent?
1.7 Consider the bivariate discrete random variables X and Y with probability function

    p(x, y) = (1/27)(2x + y),   x = 0, 1, 2;  y = 0, 1, 2.

    (a) Verify that p(x, y) is a legitimate probability mass function.
    (b) Find the joint probability P(1 ≤ X ≤ 2, 0 ≤ Y ≤ 1).
P(X = x, Y = y) = θ^x (1 − θ)^y,
1.9 A bivariate discrete random variable (X, Y) has the following probability mass function:

    p(x, y) = C(x + y, x) p^x (1 − p)^y e^{−λ} λ^{x+y} / (x + y)!,   x, y = 0, 1, 2, ...

    where C(x + y, x) denotes the binomial coefficient.

    (a) Show that the marginal probability function of X is

        g(x) = e^{−λp} (λp)^x / x!,   x = 0, 1, 2, ...

    (b) Find the marginal probability mass function of Y.
    (c) Examine whether X and Y are independent.
    where k is a constant.
    (a) Find the value of k.
    (b) Find the marginal probability distributions of X and Y.
    (c) Find the conditional probability of
        (i) X given Y = y,
        (ii) Y given X = x.
    (d) Verify whether X and Y are independent or not.
    (e) Find P(X > Y).

1.16 The number of people that enter a church in a given hour is a Poisson random variable with λ = 10. Compute the conditional probability that at most 2 men entered the church in a given hour, given that 10 women entered in that hour. What assumptions have you made?
1.20 Show that the conditional densities f (x|y) and f (y|x) are legitimate
density functions.
Chapter 2

SUMS, DIFFERENCES, PRODUCTS AND QUOTIENTS OF BIVARIATE DISTRIBUTIONS
2.1 INTRODUCTION
In this chapter, we shall take up the distribution of sums, differences, products and quotients of X and Y. The numerical characterisation of the joint distribution of X and Y will be discussed in the next chapter. As has been the practice in this book, discussions will be done separately for discrete and continuous cases.

We shall introduce the concept of the sum of discrete random variables with an example.

Example 2.1
Consider a sequence of independent experiments in each of which a fixed event is of interest. Suppose X1, X2, · · · , Xn are discrete random variables defined by

Xi = { 1, if the fixed event happens in the ith experiment;
     { 0, if the fixed event does not happen in the ith experiment.
Let
Zn = X1 + X2 + · · · + Xn
The random variable Zn is a sum of discrete random variables Xi , i =
1, 2, ..., n, and denotes the number of times the fixed event happens in the
n trials.
For instance, the President of Regent University receives visitors com-
ing every day from the two surrounding communities. The number of visitors
on a particular day from the communities are X1 and X2 , respectively. Let
Z2 = X1 + X2 . The random variable Z2 is a sum of two random variables
and gives the total number of visitors coming to the President of Regent
University on a particular day.
Case 1
Suppose that the joint distribution of X and Y is in the form of the table as
presented in Table 1.3. Such a table can be referred to as the parent joint
probability distribution table. Then the various sums of the discrete
random variables X and Y, denoted as X + Y, and their corresponding
probabilities may be presented as in Table 2.1. This can be referred to as
the derived joint probability distribution table.
Table 2.1

xi + yj      x1 + y1      x1 + y2     ···     x1 + ym      x2 + y1      x2 + y2     ···     xn + ym
p(xi, yj)   p(x1, y1)    p(x1, y2)    ···    p(x1, ym)    p(x2, y1)    p(x2, y2)    ···    p(xn, ym)

To find the distribution of the sum, we apply the principle of the probabilities of equivalent events.¹ By this principle, the probabilities of equivalent events are equal. That is, if A ⊂ S and B ⊂ RX are equivalent events, then we define the probability of the event B, P(B), to be equal to P(A). This principle is illustrated in some examples below, after the following theorem.

¹ The events A ⊂ S and B ⊂ RX are called equivalent events if A = {s ∈ S | X(s) ∈ B}, where RX is the range space of the random variable X.

Theorem 2.1
The probability of the sum of X = xi and Y = yj being equal to k is the sum of the probabilities that correspond to all indices i and j for which xi + yj = k.

Example 2.2
Given the table below, find the distribution of the sum X + Y.

X \ Y             −1       0       1     Row Totals
−1               0.10    0.20    0.11       0.41
0                0.08    0.02    0.26       0.36
2                0.03    0.17    0.03       0.23
Column Totals    0.21    0.39    0.40       1.00

Solution
We shall list the values of x + y in each cell within parentheses as in the table below:

X \ Y              −1           0           1
−1             0.10 (−2)    0.20 (−1)   0.11 (0)
0              0.08 (−1)    0.02 (0)    0.26 (1)
2              0.03 (1)     0.17 (2)    0.03 (3)
Step 1
Indicate the various summands with their corresponding joint probabilities:

x + y       −2     −1      0     −1      0      1      1      2      3
p(x, y)    0.10   0.20   0.11   0.08   0.02   0.26   0.03   0.17   0.03

Step 2
Sum the various values of X and Y. By the principle of equality of the probabilities of equivalent events, we shall perform the operation in Step 3.

Step 3
Sum the various probabilities corresponding to a particular value of the sum x + y, denoted by p(x + y) (by the addition rule of probability).³ For example, the probability of the sum x + y = 1 is given by

P[(X + Y) = 1] = p(1) = 0.26 + 0.03 = 0.29

Thus, we summarise the table above as follows:

x + y       −2         −1            1             0          2      3
p(x + y)   0.10   0.20 + 0.08   0.26 + 0.03   0.11 + 0.02   0.17   0.03

Step 4
Present the final result as in the table below:

x + y       −2     −1      0      1      2      3
p(x + y)   0.10   0.28   0.13   0.29   0.17   0.03

We can verify that this distribution is a probability distribution. That is,

0 ≤ p(x + y) ≤ 1

and

∑ p(x + y) = 1

³ That is, if a particular value of the sum appears more than once, their probabilities are added together for the purpose of constructing the probability distribution.

Case 2
Suppose that the joint distribution of X and Y is in the form of Table 1.3. Suppose also that X and Y are independent, so that p(xi, yj) = g(xi)h(yj). Then the various sums X + Y and their corresponding probabilities may be presented in the form of the following table.

xi + yj       x1 + y1       x2 + y2      ···      xn + ym
p(xi, yj)   p(x1)p(y1)    p(x2)p(y2)     ···    p(xn)p(ym)
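Steps 1–4 amount to grouping the joint probabilities by the value of the sum. The following sketch (ours) does this for the table of Example 2.2 and reproduces the Step 4 result exactly:

from collections import defaultdict
from fractions import Fraction

p = {(-1,-1): "0.10", (-1,0): "0.20", (-1,1): "0.11",
     ( 0,-1): "0.08", ( 0,0): "0.02", ( 0,1): "0.26",
     ( 2,-1): "0.03", ( 2,0): "0.17", ( 2,1): "0.03"}

dist = defaultdict(Fraction)
for (x, y), prob in p.items():
    dist[x + y] += Fraction(prob)   # group probabilities of equal sums (equivalent events)

print({s: float(v) for s, v in sorted(dist.items())})
# {-2: 0.1, -1: 0.28, 0: 0.13, 1: 0.29, 2: 0.17, 3: 0.03}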
Theorem 2.2
Suppose that X and Y are independent discrete random variables with marginal probability distributions g(x) and h(y) respectively. Then P(X + Y = k) is the sum of the products g(x)h(y) over all pairs of values (x, y) for which x + y = k.
Example 2.3
Suppose the joint probabilities of two random variables X and Y are given in the table below.

X \ Y            3       6     Row Total
−2              0.28    0.12      0.4
4               0.42    0.18      0.6
Column Total    0.7     0.3       1.0

(a) Are X and Y independent?
(b) Find the distribution of the sum X + Y.

Solution
(a) The marginal probability distributions of X and Y are

x        −2     4
g(x)    0.4    0.6

y        3      6
h(y)    0.7    0.3

so that we have

p(−2, 3) = 0.28 = g(−2)h(3) = 0.4(0.7) = 0.28
p(−2, 6) = 0.12 = g(−2)h(6) = 0.4(0.3) = 0.12
p(4, 3) = 0.42 = g(4)h(3) = 0.6(0.7) = 0.42
p(4, 6) = 0.18 = g(4)h(6) = 0.6(0.3) = 0.18

Hence, X and Y are independent.

(b) It has been shown that X and Y are independent. Hence, we obtain the following table:

xi + yj       −2 + 3     −2 + 6     4 + 3      4 + 6
p(xi + yj)   0.4(0.7)   0.4(0.3)   0.6(0.7)   0.6(0.3)

Note
Even though the values of the random variables X and Y are added together, their corresponding probabilities g(xi) and h(yj) are multiplied, if X and Y are independent.
In most cases the distribution of X and Y may not be presented in a table. Suppose that all that we know are the values of X and Y and their corresponding probabilities. In such a case, to obtain the distribution of X + Y we may use the following theorem.

Theorem 2.3
Suppose X and Y are nonnegative discrete random variables with joint probability function p(x, y). Let Z = X + Y. Then the probability distribution of Z is

P(Z = z) = ∑_{x=0}^{z} p(x, z − x)

or equivalently,

P(Z = z) = ∑_{y=0}^{z} p(z − y, y)

Proof
P(Z = z) = P(X + Y = z)
         = ∑_{x=0}^{z} P(X = x, X + Y = z)
         = ∑_{x=0}^{z} P(X = x, Y = z − x)      (i)

from which the result follows. Similarly,

P(Z = z) = ∑_{y=0}^{z} p(z − y, y)
Note
It is important to note the limits of summation. Beyond these limits, one
of the two component mass functions is zero. When dealing with densities
that are nonzero only on some subset of values, we must always be careful.
In case X and Y are allowed to take negative values as well, the lower index
of summation is changed from 0 to −∞.
Example 2.4
Refer to Example 1.2. Find
(a) the probability distribution of Z = X + Y,
    (i) from the first principle (that is, using the joint distribution of X and Y directly);
    (ii) using Theorem 2.3;
(b) P(Z = 2),
    (i) using the distribution obtained in (a)(i);
    (ii) using the distribution obtained in (a)(ii).

Solution
Recall from the solution of Example 1.2 that

p(x, y) = (1/21)(3x + 2y),   for x = 0, 1;  y = 0, 1, 2

The various pairs of X and Y that we require are

(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)

(a) (i) Summing the joint probabilities that correspond to each value of the sum, we have

P(Z = 0) = p(0, 0) = (1/21)[3(0) + 2(0)] = 0

P(Z = 1) = p(0, 1) + p(1, 0) = (1/21)[3(0) + 2(1)] + (1/21)[3(1) + 2(0)] = 5/21

P(Z = 2) = p(0, 2) + p(1, 1) = (1/21)[3(0) + 2(2)] + (1/21)[3(1) + 2(1)] = 9/21

P(Z = 3) = p(1, 2) = (1/21)[3(1) + 2(2)] = 7/21

x + y        0      1       2       3
p(x + y)     0    5/21    9/21    7/21

(ii) By Theorem 2.3, P(Z = z) is obtained from

P(Z = z) = p(0, z − 0),   z = 0, 1, 2, 3
         + p(1, z − 1),   z = 1, 2, 3
         + p(2, z − 2),   z = 2, 3

where each term is included only for the indicated values of z; this gives the same distribution as in (i).
(b) (i) From the table obtained in (a), that is, calculating the probability
from first principles, we have

    P(Z = 2) = 3/7

(ii) By Theorem 2.3 (that is, using the results in (a)(ii)) we have

    P(Z = 2) = Σ_{x=0}^{1} p(x, 2 − x)
             = (1/21) Σ_{x=0}^{1} [3x + 2(2 − x)]
             = (1/21) Σ_{x=0}^{1} (x + 4)
             = 3/7

Aliter
Using the alternative formula in Theorem 2.3,

    P(Z = 2) = Σ_{y=0}^{2} p(2 − y, y)
             = Σ_{y=1}^{2} p(2 − y, y),   since p(2, 0) = 0
             = (1/21) Σ_{y=1}^{2} [3(2 − y) + 2y]
             = (1/21) Σ_{y=1}^{2} (6 − y)
             = 3/7
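The computation in Example 2.4 can be checked mechanically. The following is a minimal Python sketch (the helper names are illustrative only) that applies Theorem 2.3 to the joint mass function of Example 1.2 and reproduces the distribution of Z = X + Y.

```python
from fractions import Fraction

# Joint pmf of Example 1.2: p(x, y) = (3x + 2y)/21 for x = 0,1 and y = 0,1,2
def p(x, y):
    if x in (0, 1) and y in (0, 1, 2):
        return Fraction(3 * x + 2 * y, 21)
    return Fraction(0)

# Theorem 2.3: P(Z = z) = sum_{x=0}^{z} p(x, z - x)
def pmf_of_sum(z):
    return sum(p(x, z - x) for x in range(z + 1))

for z in range(4):
    print(z, pmf_of_sum(z))
# Prints 0, 5/21, 3/7 and 1/3, i.e. 0, 5/21, 9/21, 7/21 as in Example 2.4
```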
Theorem 2.4
Suppose X and Y are independent nonnegative discrete random variables
with marginal probability mass functions g(x) and h(y) respectively, and
let Z = X + Y . Then

    P(Z = z) = Σ_{x=0}^{z} g(x)h(z − x),   z ≥ 0

or equivalently,

    P(Z = z) = Σ_{y=0}^{z} g(z − y)h(y),   z ≥ 0

Proof
Since X and Y are independent, the proof follows by writing

    P(X = x, Y = z − x)

as

    P(X = x)P(Y = z − x)

in Theorem 2.3.

Theorem 2.4 suggests a method of obtaining the probability mass function
of Z = X + Y when each of g and h is a one-dimensional probability mass
function. Thus the probability function of the sum of two independent
random variables is the convolution⁴ of the individual probability
functions, called the convolution of g and h.

Note
(a) Theorem 2.3 also expresses a convolution;
(b) The convolution of g and h is denoted as g ∗ h.

Since

    X + Y = Y + X

it is easy to verify that

    P(Z = z) = Σ_x g(x)h(z − x) = Σ_y g(z − y)h(y)

that is, g ∗ h = h ∗ g.

⁴ Some writers prefer to use the French word Composition or even the German
equivalent Faltung.

Example 2.5
Suppose that X and Y are independent discrete random variables having
probability distributions

    P(X = x) = e^{−λ} λ^x / x!,   x = 0, 1, 2, . . .

and

    P(Y = y) = e^{−θ} θ^y / y!,   y = 0, 1, 2, . . .

(a) Find the probability distribution of Z = X + Y .
(b) If λ = 1 and θ = 2, find P(Z = 4).
Solution
The joint probability distribution of X and Y is given by

    p(x, y) = (e^{−λ} λ^x / x!) · (e^{−θ} θ^y / y!)

(a) By Theorem 2.4,

    P(Z = z) = Σ_{x=0}^{z} (e^{−λ} λ^x / x!) · (e^{−θ} θ^{z−x} / (z − x)!)
             = (e^{−(λ+θ)} / z!) Σ_{x=0}^{z} [z! / (x!(z − x)!)] λ^x θ^{z−x}

Identifying the sum as the binomial expansion of (λ + θ)^z we obtain

    P(Z = z) = e^{−(λ+θ)} (λ + θ)^z / z!,   z = 0, 1, 2, . . .

(b) When λ = 1 and θ = 2,

    P(Z = 4) = e^{−(1+2)} (1 + 2)^4 / 4!
             = e^{−3} (3)^4 / 4!
             ≈ 0.1680
Example 2.5 shows that, if X and Y are independent random variables having
Poisson distributions with parameters λ and θ respectively, then their sum
has a Poisson distribution with parameter λ + θ. That is, the sum of two
independent Poisson random variables is again a Poisson random variable.
The ideas here could be used to prove more generally, by induction, that
the sum of a finite number of independent Poisson random variables also
has a Poisson distribution, a result we proved earlier using the moment
generating function in my book “Introductory Probability Theory”
(Nsowah-Nuamah, 2017).
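As a numerical illustration of the convolution formula (a minimal sketch; the parameter values λ = 1 and θ = 2 are simply those of Example 2.5(b)), the snippet below convolves two Poisson mass functions and compares the result with the Poisson(λ + θ) mass function.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    return exp(-mu) * mu ** k / factorial(k)

lam, theta = 1.0, 2.0

# Convolution (Theorem 2.4): P(Z = z) = sum_{x=0}^{z} g(x) h(z - x)
def pmf_sum(z):
    return sum(poisson_pmf(x, lam) * poisson_pmf(z - x, theta) for x in range(z + 1))

for z in range(6):
    print(z, round(pmf_sum(z), 6), round(poisson_pmf(z, lam + theta), 6))
# The two columns agree; e.g. P(Z = 4) = e^{-3} 3^4 / 4! ≈ 0.168031
```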
Note
In general, if two independent random variables follow the same type of
distribution, it does not necessarily follow that their sum has a
distribution of the same type.
The formulas for finding the distribution of a sum of continuous random
variables are similar to those in the discrete case, with the sum replaced
by an integral. However, the actual computation is more involved in the
continuous case.
Theorem 2.5
Suppose that X and Y are continuous random variables having joint
density function f (x, y). Let Z = X + Y and denote the density
function of Z by s(z). Then

    s(z) = ∫_{−∞}^{∞} f (x, z − x) dx

or equivalently,

    s(z) = ∫_{−∞}^{∞} f (z − y, y) dy
Proof
We shall use the cumulative distribution function technique. Let S denote
the c.d.f. of the random variable Z. Then

    S(z) = P(Z ≤ z) = P(X + Y ≤ z) = ∫∫_R f (x, y) dx dy

where R is the part of the region over which x + y ≤ z, shown shaded in
Figure 2.1.

To represent this integral as an iterated integral, we fix x and integrate
with respect to y from −∞ to z − x. Next we integrate with respect to x
from −∞ to ∞. Thus

    S(z) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{z−x} f (x, y) dy dx

Differentiating S(z) with respect to z then gives the density

    s(z) = ∫_{−∞}^{∞} f (x, z − x) dx
Example 2.6
Refer to Example 1.3.

(a) Find the distribution of Z = X + Y ;

(b) Calculate P(Z ≤ 2) using
    (i) the distribution of Z;
    (ii) the joint distribution of (X, Y ), that is, from first principles.

Solution
(a) Z = X + Y =⇒ Y = Z − X. Therefore

    0 < Y < 2 =⇒ 0 ≤ Z − X ≤ 2 =⇒ X ≤ Z ≤ X + 2
It can be seen from the sketch that the region of integration is
partitioned into three, namely,

    0 ≤ X ≤ Z        when 0 < Z ≤ 1
    0 ≤ X ≤ 1        when 1 < Z ≤ 2
    Z − 2 ≤ X ≤ 1    when 2 < Z ≤ 3

For 0 < Z ≤ 1,

    s(z) = ∫_{x=0}^{z} [x² + x(z − x)/3] dx = (7/18) z³

For 1 < Z ≤ 2,

    s(z) = ∫_{x=0}^{1} [x² + x(z − x)/3] dx = 2/9 + z/6

Finally, for 2 < Z ≤ 3,

    s(z) = ∫_{x=z−2}^{1} [x² + x(z − x)/3] dx = (1/18)(−7z³ + 36z² − 57z + 36)

Therefore, the distribution of Z is

    s(z) =  (7/18) z³,                           0 < z ≤ 1
            2/9 + z/6,                           1 < z ≤ 2
            (1/18)(−7z³ + 36z² − 57z + 36),      2 < z ≤ 3
            0,                                   otherwise
(b) (i) Using the distribution of Z we calculate the required probability
as follows:

    P(Z ≤ 2) = P(0 < Z ≤ 1) + P(1 < Z ≤ 2)
             = ∫_{z=0}^{1} (7/18) z³ dz + ∫_{z=1}^{2} (2/9 + z/6) dz
             = [(7/72) z⁴]_{0}^{1} + [(2/9) z + z²/12]_{1}^{2}
             = 41/72
(ii) Now, using the joint distribution of (X, Y ),

    P(Z ≤ 2) = ∫∫_{x+y≤2} f (x, y) dy dx
             = ∫_{x=0}^{1} ∫_{y=0}^{2−x} (x² + xy/3) dy dx
             = ∫_{x=0}^{1} [x² y + xy²/6]_{y=0}^{2−x} dx
             = ∫_{x=0}^{1} { x²(2 − x) + (x/6)(2 − x)² } dx
             = ∫_{x=0}^{1} { (2x² − x³) + (1/6)(4x − 4x² + x³) } dx
             = [ 2x³/3 − x⁴/4 + (1/6)(2x² − 4x³/3 + x⁴/4) ]_{0}^{1}
             = 41/72
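A quick numerical cross-check of Example 2.6(b) (a rough sketch using a midpoint Riemann sum; the grid size is arbitrary): integrating the joint density over the region x + y ≤ 2 should give approximately 41/72 ≈ 0.5694.

```python
# Midpoint Riemann sum of f(x, y) = x^2 + x*y/3 over {0<x<1, 0<y<2, x+y<=2}
def f(x, y):
    return x * x + x * y / 3.0

n = 1000
hx, hy = 1.0 / n, 2.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        if x + y <= 2.0:
            total += f(x, y) * hx * hy

print(total, 41 / 72)   # both approximately 0.5694
```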
Theorem 2.6
Suppose that X and Y are independent continuous random variables with
marginal density functions g(x) and h(y) respectively, and let Z = X + Y .
Then the density function of Z is

    s(z) = ∫_{−∞}^{∞} g(x)h(z − x) dx

or equivalently,

    s(z) = ∫_{−∞}^{∞} g(z − y)h(y) dy

Proof
The theorem follows from Theorem 2.5 by noting that, since X and Y are
independent,

    f (x, z − x) = g(x)h(z − x)
    f (z − y, y) = g(z − y)h(y)

The function s is called the convolution of the functions g and h, and the
expression in the Theorem is called the convolution formula.

If X and Y are nonnegative random variables, then the convolution formula
reduces to

    s(z) = ∫_{0}^{z} g(x)h(z − x) dx

This is because each of g and h is equal to 0 for a negative argument.
Example 2.7
Suppose that X and Y are independent and identically distributed random
variables having p.d.f.’s

    fX (x) = λ e^{−λ x},   x ≥ 0

and

    fY (y) = λ e^{−λ y},   y ≥ 0

respectively. Find the distribution of the sum Z = X + Y .

Solution

    fZ (z) = ∫_{0}^{z} λ e^{−λ t} · λ e^{−λ(z−t)} dt
           = λ² e^{−λ z} ∫_{0}^{z} dt
           = λ² z e^{−λ z},   z ≥ 0
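The convolution in Example 2.7 can be verified numerically (a minimal sketch; the value λ = 2 is an arbitrary choice made only for this check): the numerical convolution of two Exp(λ) densities should agree with λ² z e^{−λz}.

```python
from math import exp

lam = 2.0

def f_exp(t):
    return lam * exp(-lam * t) if t >= 0 else 0.0

# Convolution integral f_Z(z) = integral_0^z f_X(t) f_Y(z - t) dt, midpoint rule
def f_sum(z, n=10000):
    h = z / n
    return sum(f_exp((i + 0.5) * h) * f_exp(z - (i + 0.5) * h) for i in range(n)) * h

for z in (0.5, 1.0, 2.0):
    print(z, round(f_sum(z), 6), round(lam ** 2 * z * exp(-lam * z), 6))
# The numerical convolution agrees with the closed form λ² z e^{−λz}
```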
The probability distribution of the difference X − Y of two discrete random
variables whose joint probability function is in tabular form can be
calculated using the principle of the equality of the probabilities of
equivalent events, as illustrated below:

    xi − yj       x1 − y1       x1 − y2       · · ·   x1 − ym       x2 − y1       · · ·   xn − ym
    p(xi , yj )   p(x1 , y1 )   p(x1 , y2 )   · · ·   p(x1 , ym )   p(x2 , y1 )   · · ·   p(xn , ym )
Example 2.8
For the data in Example 2.2, find the distribution of the difference of the
random variables X and Y .

Solution
Step 1
Indicate the various differences with their corresponding joint
probabilities:

    xi − yj       −1 − (−1)   −1 − 0   −1 − 1   0 − (−1)   0 − 0   0 − 1   2 − (−1)   2 − 0   2 − 1
    p(xi , yj )     0.1         0.2      0.11     0.08       0.02    0.26    0.03       0.17    0.03

Step 2
Subtract the various values for Y from the various values for X:

    x − y        0      −1     −2     1      0      −1     3      2      1
    p(x, y)      0.1    0.2    0.11   0.08   0.02   0.26   0.03   0.17   0.03

Step 3
By the principle of equality of the probabilities of equivalent events we
obtain the table below:

    x − y        −2     −1            0            1             2      3
    p(x − y)     0.11   0.2 + 0.26    0.1 + 0.02   0.08 + 0.03   0.17   0.03

Hence the distribution of X − Y is

    x − y        −2     −1     0      1      2      3
    p(x − y)     0.11   0.46   0.12   0.11   0.17   0.03

We can verify that this distribution of the difference is a probability
distribution. That is,

    0 ≤ p(x − y) ≤ 1

and

    Σ p(x − y) = 1
Example 2.9
For the data in Example 2.3, find the distribution of the difference of the
random variables X and Y .

Solution
It has been shown in Example 2.3 that X and Y are independent. Therefore,
the various values of the sum {x + (−y)} and their corresponding
probabilities are given in the following tables:

    xi − yj        −2 − 3      −2 − 6      4 − 3       4 − 6
    p(xi , yj )    0.4(0.7)    0.4(0.3)    0.6(0.7)    0.6(0.3)

    xi − yj        −5      −8      1       −2
    p(xi − yj )    0.28    0.12    0.42    0.18
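The tabular "equality of equivalent events" procedure in Examples 2.8 and 2.9 amounts to grouping equal values of x − y and adding their probabilities. The following minimal Python sketch (the dictionary layout is illustrative only) applies it to the joint table of Example 2.2.

```python
from collections import defaultdict

# Joint pmf of Example 2.2, written as {(x, y): probability}
joint = {(-1, -1): 0.10, (-1, 0): 0.20, (-1, 1): 0.11,
         ( 0, -1): 0.08, ( 0, 0): 0.02, ( 0, 1): 0.26,
         ( 2, -1): 0.03, ( 2, 0): 0.17, ( 2, 1): 0.03}

# Group equal differences x - y and add their probabilities
diff_pmf = defaultdict(float)
for (x, y), prob in joint.items():
    diff_pmf[x - y] += prob

for d in sorted(diff_pmf):
    print(d, round(diff_pmf[d], 2))
# Reproduces Example 2.8: -2:0.11, -1:0.46, 0:0.12, 1:0.11, 2:0.17, 3:0.03
```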
Theorem 2.7
Suppose that X and Y are continuous random variables having joint
density function f (x, y) and let U = X − Y . Then

    fU (u) = ∫_{−∞}^{∞} f (x, x − u) dx

or equivalently,

    fU (u) = ∫_{−∞}^{∞} f (u + y, y) dy
Corollary 2.1
Suppose that X and Y are independent nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively, and let U = X − Y . Then

    fU (u) = ∫_{0}^{∞} g(x) h(x − u) dx

or equivalently,

    fU (u) = ∫_{0}^{∞} g(u + y) h(y) dy
Example 2.10
Refer to Example 1.3.

(a) Find the distribution of U = X − Y ;

(b) Using the distribution in (a), calculate (i) P(U ≤ 1); (ii) P(U < 1/2).

Solution
(a) X − Y = U =⇒ X = U + Y . Therefore

    0 < X < 1 =⇒ 0 ≤ U + Y ≤ 1 =⇒ −Y ≤ U ≤ 1 − Y

It can be seen from the sketch that the region of integration is
partitioned into three, namely,

    −U ≤ Y ≤ 2         when −2 ≤ U < −1
    −U ≤ Y ≤ 1 − U     when −1 ≤ U < 0
    0 ≤ Y ≤ 1 − U      when 0 ≤ U ≤ 1
For −2 ≤ U < −1,

    fU (u) = ∫_{−u}^{2} [u² + (7/3)u y + (4/3)y²] dy
           = (5/18)u³ + 2u² + (14/3)u + 32/9

For −1 ≤ U < 0,

    fU (u) = ∫_{−u}^{1−u} [u² + (7/3)u y + (4/3)y²] dy
           = 4/9 − u/6

For 0 ≤ U ≤ 1,

    fU (u) = ∫_{0}^{1−u} [u² + (7/3)u y + (4/3)y²] dy
           = 4/9 − u/6 − (5/18)u³
Therefore

    fU (u) =  (5/18)u³ + 2u² + (14/3)u + 32/9,   −2 ≤ u < −1
              4/9 − u/6,                          −1 ≤ u < 0
              4/9 − u/6 − (5/18)u³,                0 ≤ u ≤ 1
              0,                                   elsewhere
(b) (i)

    P(U ≤ 1) = P(−2 ≤ U < −1) + P(−1 ≤ U < 0) + P(0 ≤ U ≤ 1)
             = ∫_{−2}^{−1} [(5/18)u³ + 2u² + (14/3)u + 32/9] du
               + ∫_{−1}^{0} (4/9 − u/6) du + ∫_{0}^{1} [4/9 − u/6 − (5/18)u³] du
             = 13/72 + 19/36 + 21/72
             = 1

(ii)

    P(U ≤ 1/2) = P(−2 ≤ U < −1) + P(−1 ≤ U < 0) + P(0 ≤ U ≤ 1/2)
               = ∫_{−2}^{−1} [(5/18)u³ + 2u² + (14/3)u + 32/9] du
                 + ∫_{−1}^{0} (4/9 − u/6) du + ∫_{0}^{1/2} [4/9 − u/6 − (5/18)u³] du
               = 13/72 + 19/36 + 227/1152
               = 1043/1152
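As a cross-check of Example 2.10(b)(ii) (a rough sketch using a midpoint Riemann sum over the joint density; the grid size is arbitrary), P(X − Y ≤ 1/2) should be approximately 1043/1152 ≈ 0.9054.

```python
# Numerical check of P(X - Y <= 1/2) for f(x, y) = x^2 + x*y/3 on 0<x<1, 0<y<2
def f(x, y):
    return x * x + x * y / 3.0

n = 1000
hx, hy = 1.0 / n, 2.0 / n
prob = 0.0
for i in range(n):
    x = (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        if x - y <= 0.5:
            prob += f(x, y) * hx * hy

print(prob, 1043 / 1152)   # both approximately 0.9054
```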
The probability distribution of the product of two discrete random
variables can be obtained in the same way, by listing the various products
xi yj with their corresponding joint probabilities:

    xi yj         x1 y1         x1 y2         · · ·   x1 ym         x2 y1         · · ·   xn ym
    p(xi , yj )   p(x1 , y1 )   p(x1 , y2 )   · · ·   p(x1 , ym )   p(x2 , y1 )   · · ·   p(xn , ym )
Example 2.11
For the data in Example 2.2, find the distribution of the product of the
random variables X and Y .

Solution
Step 1
Indicate the various products with their corresponding joint probabilities:

    xi yj         −1(−1)   −1(0)   −1(1)   0(−1)   0(0)   0(1)   2(−1)   2(0)   2(1)
    p(xi , yj )   0.1      0.2     0.11    0.08    0.02   0.26   0.03    0.17   0.03

Step 2
Multiply the various values for X by the various values for Y :

    xy           1      0      −1     0      0      0      −2     0      2
    p(x, y)      0.1    0.2    0.11   0.08   0.02   0.26   0.03   0.17   0.03

Step 3
By the principle of equality of the probabilities of equivalent events we
obtain the table below:

    xy        −2     −1     0                                    1      2
    p(xy)     0.03   0.11   0.2 + 0.08 + 0.02 + 0.26 + 0.17      0.1    0.03

Step 4
Present the final result as in the table below:

    xy        −2     −1     0      1      2
    p(xy)     0.03   0.11   0.73   0.1    0.03

We can verify that this distribution of the product is a probability
distribution. That is,

    0 ≤ p(xy) ≤ 1

and

    Σ p(xy) = 1
Example 2.12
For the data in Example 2.3, find the distribution of the product of X and Y .

Solution
It has been shown in Example 2.3 that X and Y are independent. Therefore,
the various values of the product xy and their corresponding probabilities
are given in the following table:

    xi yj          −2(3)       −2(6)       4(3)        4(6)
    p(xi , yj )    0.4(0.7)    0.4(0.3)    0.6(0.7)    0.6(0.3)
By the principle of the equality of probabilities of equivalent events we
obtain the result summarised in the following table:

    xi yj        −6      −12     12      24
    p(xi yj )    0.28    0.12    0.42    0.18
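For independent discrete random variables, as in Example 2.12, the joint probabilities are simply products of the marginal probabilities, so the same grouping idea applies directly to the marginals. A minimal Python sketch (the data layout is illustrative only):

```python
from collections import defaultdict

# Marginals of Example 2.3: X takes -2, 4 and Y takes 3, 6, independently
g = {-2: 0.4, 4: 0.6}
h = {3: 0.7, 6: 0.3}

prod_pmf = defaultdict(float)
for x, px in g.items():
    for y, py in h.items():
        prod_pmf[x * y] += px * py   # independence: p(x, y) = g(x) h(y)

for v in sorted(prod_pmf):
    print(v, round(prod_pmf[v], 2))
# Reproduces Example 2.12: -12:0.12, -6:0.28, 12:0.42, 24:0.18
```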
Example 2.13
Suppose there are two independent distributions:

    x       −2     4
    g(x)    0.4    0.6

    y       −2     4
    h(y)    0.4    0.6

Find the distributions of (a) X² and (b) XY .

Solution
(a)

    x²        4      16
    p(x²)     0.4    0.6

(b)

    xy         4      −8     −8     16
    p(x, y)    0.16   0.24   0.24   0.36

so that

    xy       4      −8     16
    p(xy)    0.16   0.48   0.36
2.4.2 Product of Continuous Bivariate Random Variables

Theorem 2.8
Suppose that X and Y are continuous random variables having joint
density function f (x, y) and let V = XY . Then

    fV (v) = ∫_{−∞}^{∞} (1/|x|) f (x, v/x) dx

or equivalently,

    fV (v) = ∫_{−∞}^{∞} (1/|y|) f (v/y, y) dy

Corollary 2.2
Suppose that X and Y are independent, nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively and let V = XY . Then

    fV (v) = ∫_{0}^{∞} (1/x) h(v/x) g(x) dx,   v > 0

or equivalently,

    fV (v) = ∫_{0}^{∞} (1/y) g(v/y) h(y) dy,   v > 0
We can calculate the probability P(XY ≤ t) for independent, nonnegative
random variables X and Y as

    P(XY ≤ t) = ∫∫_{0<x, 0<y, xy≤t} g(x) h(y) dx dy = ∫_{0}^{t} fV (v) dv

The integration is restricted to the first quadrant, since g(x)h(y), the
joint density function of X and Y , is zero elsewhere.
Example 2.14
Refer to Example 1.3.

(a) Find the distribution of V = XY ;

(b) Using the distribution in (a), calculate P(1/2 ≤ V ≤ 1).

Solution
(a) XY = V =⇒ X = V /Y . Therefore

    0 < X < 1 =⇒ 0 ≤ V /Y ≤ 1 =⇒ 0 ≤ V ≤ Y

The region defined by the preceding inequality and 0 < Y < 2 is sketched
in the diagram below. Therefore

    fV (v) = ∫_{y=v}^{2} [v²/y³ + (1/3)(v/y)] dy
    fV (v) = [−v²/(2y²) + (v/3) ln y]_{y=v}^{2}
           = (v/3) ln(2/v) + 1/2 − v²/8,   0 < v ≤ 2

(b) Now,

    P(1/2 ≤ V ≤ 1) = ∫_{1/2}^{1} [(v/3) ln(2/v) + 1/2 − v²/8] dv
                   = [ v/2 − v³/24 + v²/12 + (v²/6) ln(2/v) ]_{1/2}^{1}
                   = 0.3338
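The value in Example 2.14(b) can be checked numerically from the density obtained in part (a) (a minimal sketch using a midpoint rule; the subdivision count is arbitrary):

```python
from math import log

# Density of V = XY from Example 2.14(a)
def f_V(v):
    return (v / 3.0) * log(2.0 / v) + 0.5 - v * v / 8.0 if 0 < v <= 2 else 0.0

# Midpoint rule for P(1/2 <= V <= 1)
n = 100000
h = 0.5 / n
prob = sum(f_V(0.5 + (i + 0.5) * h) for i in range(n)) * h
print(round(prob, 4))   # approximately 0.3338, agreeing with part (b)
```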
The probability distribution of the quotient of two discrete random
variables X/Y whose joint probability functions are in tabular form can be
calculated using the principle of the equality of the probabilities of two
equivalent events. This is demonstrated in Table 2.4.

Table 2.4  Various Quotients X/Y and their Corresponding Probabilities

    xi /yj        x1 /y1        x1 /y2        · · ·   x1 /ym        x2 /y1        · · ·   xn /ym
    p(xi , yj )   p(x1 , y1 )   p(x1 , y2 )   · · ·   p(x1 , ym )   p(x2 , y1 )   · · ·   p(xn , ym )
Example 2.15
Consider the distributions of two independent random variables X and Y .

    x       2      5
    g(x)    0.2    0.8

    y       4      8
    h(y)    0.7    0.3

Find the distributions of X/Y and Y /X.

Solution
Proceeding as before, the distribution of X/Y is:

    x/y       2/8 = 0.25   2/4 = 0.5   5/8 = 0.625   5/4 = 1.25
    p(x/y)    0.06         0.14        0.24          0.56

The distribution of Y /X follows similarly and is given in the table below:

    y/x       4/5 = 0.8   8/5 = 1.6   4/2 = 2   8/2 = 4
    p(y/x)    0.56        0.24        0.14      0.06
Theorem 2.9
Suppose that X and Y are continuous random variables having joint
density function f (x, y) and let W = X/Y . Then

    fW (w) = ∫_{−∞}^{∞} |y| f (wy, y) dy

Corollary 2.3
Suppose that X and Y are independent, nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively, and let W = X/Y . Then

    fW (w) = ∫_{0}^{∞} y g(wy) h(y) dy

We can calculate the cumulative distribution function of X/Y for
independent, nonnegative random variables X and Y by

    P(X/Y ≤ t) = ∫∫_{0 < x/y ≤ t, 0 < y < ∞} g(x) h(y) dx dy
               = ∫_{0}^{t} ∫_{0}^{∞} y g(wy) h(y) dy dw
               = ∫_{0}^{t} fW (w) dw
Example 2.16
Refer to Example 1.3.

(a) Find the distribution of W = X/Y ;

(b) Using the distribution in (a), calculate P(W ≤ 1/4).

Solution
(a) Distribution of W = X/Y :

    X/Y = W =⇒ X = Y W

Therefore

    0 < X < 1 =⇒ 0 ≤ Y W ≤ 1 =⇒ 0 ≤ W ≤ 1/Y

The region defined by the preceding inequality and 0 < Y < 2 is sketched
in the diagram below.
It can be seen from the sketch that the region of integration is
partitioned into two, namely,

    0 < Y ≤ 2        when 0 < W ≤ 1/2
    0 < Y ≤ 1/W      when 1/2 ≤ W < ∞

Hence

    fW (w) = ∫_{0}^{2} (w² + w/3) y³ dy,       0 ≤ w < 1/2

    fW (w) = ∫_{0}^{1/w} (w² + w/3) y³ dy,     1/2 ≤ w < ∞

Therefore

    fW (w) =  4(w² + w/3),             0 ≤ w < 1/2
              (1/(4w³))(w + 1/3),      1/2 ≤ w < ∞
1
(b) Calculation of P W ≤ .
4
X 1 1
P ≤ = P W ≤
Y 4 4
1
4 w
= 4 w2 + dw
0 3
1
=
16
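A numerical cross-check of Example 2.16(b) directly from the joint density (a rough sketch with a midpoint Riemann sum; the grid size is arbitrary): P(X/Y ≤ 1/4) should be approximately 1/16 = 0.0625.

```python
# Numerical check of P(X/Y <= 1/4) for f(x, y) = x^2 + x*y/3 on 0<x<1, 0<y<2
def f(x, y):
    return x * x + x * y / 3.0

n = 1000
hx, hy = 1.0 / n, 2.0 / n
prob = 0.0
for i in range(n):
    x = (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        if x / y <= 0.25:
            prob += f(x, y) * hx * hy

print(prob, 1 / 16)   # both approximately 0.0625
```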
EXERCISES

2.15 Suppose X and Y are independent Normal random variables, where X is
     N(µ1 , σ1²) and Y is N(µ2 , σ2²). Use the convolution theorem to show
     that X + Y is N(µ1 + µ2 , σ1² + σ2²).

2.16 Refer to Example 1.15. Compute P(X + Y > 2).

     … and that of Y by

         MY (t) = [(3e^t + 1)/4]^{10}
2.23 Refer to Exercise 1.10. Find
     (a) the distribution of the quotient W = Y /X;
     (b) (i) P(X/Y = 1); (ii) P(W = 2); (iii) P(X/Y = 3).

2.24 Refer to Exercise 1.11. Find
     (a) the distribution of the quotient W = X/Y ;
     (b) P(W = 5).

2.25 Refer to Exercise 1.12, where k = 2/3. Find
     (a) the distribution of X − Y ;
     (b) P(−1/2 ≤ X − Y ≤ 1).

2.26 Refer to Exercise 1.12, where k = 2/3. Find
     (a) the distribution of XY ;
     (b) P(XY ≤ 4/5).

2.27 Refer to Exercise 1.12, where k = 2/3. Find
     (a) the distribution of X/Y ;
     (b) P(X/Y < 10).
Chapter 3
3.1 INTRODUCTION

In the earlier text⁵, we discussed the numerical characterisation of a
single random variable. Specifically, we discussed in Chapter 6 the
concepts of expectation and variance. In this chapter, we shall extend the
concepts of expectation and variance to the case of bivariate distributions
and discuss a few more properties. We shall also realise that in this case
there arises the concept of covariance.

⁵ Nsowah-Nuamah, 2017

3.2 EXPECTATION OF BIVARIATE RANDOM VARIABLES

Example 3.1
For the data in Example 2.3, find E(X) and E(Y ).

Solution
For convenience we present the data here:

    xi        −2     4
    p(xi )    0.4    0.6

    E(X) = −2(0.4) + 4(0.6) = 1.6

    yj        3      6
    p(yj )    0.7    0.3

    E(Y ) = 3(0.7) + 6(0.3) = 3.9
Definition 3.1
Let (X, Y ) be a two-dimensional discrete random variable and let
Z = H(X, Y ). Then

    E(Z) = Σ_{i=1}^{∞} Σ_{j=1}^{∞} H(xi , yj ) p(xi , yj )

For the finite case,

    E(Z) = Σ_{i=1}^{n} Σ_{j=1}^{m} H(xi , yj ) p(xi , yj )

That is,

    E[H(X, Y )] = H(x1 , y1 ) p(x1 , y1 ) + H(x1 , y2 ) p(x1 , y2 ) + · · ·
                  + H(x1 , ym ) p(x1 , ym ) + H(x2 , y1 ) p(x2 , y1 )
                  + H(x2 , y2 ) p(x2 , y2 ) + · · · + H(x2 , ym ) p(x2 , ym )
                  + · · · + H(xn , y1 ) p(xn , y1 ) + H(xn , y2 ) p(xn , y2 )
                  + · · · + H(xn , ym ) p(xn , ym )
Example 3.2
For the data in Example 2.3, find the expectation of the function

    H(X, Y ) = X² Y

Solution

    E(X² Y ) = {(x1 )²(y1 )} p(x1 , y1 ) + {(x1 )²(y2 )} p(x1 , y2 )
               + {(x2 )²(y1 )} p(x2 , y1 ) + {(x2 )²(y2 )} p(x2 , y2 )
             = (−2)²(3)(0.28) + (−2)²(6)(0.12) + (4)²(3)(0.42) + (4)²(6)(0.18)
             = 43.68
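Definition 3.1 is just a double sum over the joint table, which is easy to mirror in code. The following minimal Python sketch (the data layout and helper name are illustrative only) reproduces the result of Example 3.2.

```python
# Joint pmf of Example 2.3 (independent marginals): X in {-2, 4}, Y in {3, 6}
joint = {(-2, 3): 0.28, (-2, 6): 0.12, (4, 3): 0.42, (4, 6): 0.18}

def expectation(H):
    """E[H(X, Y)] = sum_i sum_j H(x_i, y_j) p(x_i, y_j)  (Definition 3.1)."""
    return sum(H(x, y) * p for (x, y), p in joint.items())

print(round(expectation(lambda x, y: x * x * y), 2))   # 43.68, as in Example 3.2
print(round(expectation(lambda x, y: x * y), 2))       # E(XY) = 6.24
```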
Theorem 3.1
Let (X, Y ) be a two-dimensional discrete random variable and let
H(X, Y ) = XY . Then

    E(XY ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj p(xi , yj )
Example 3.3
Refer to Example 2.2. Find E(XY ) from the parent joint probability
distribution.

Solution
The parent distribution of XY is given in the table below.

                          Y
    X              −1        0         1        Row Totals
    −1             0.10      0.20      0.11     0.41
                   (1)       (0)       (−1)
    0              0.08      0.02      0.26     0.36
                   (0)       (0)       (0)
    2              0.03      0.17      0.03     0.23
                   (−2)      (0)       (2)
    Column Totals  0.21      0.39      0.40     1.00

The values in parentheses are for xy. Multiplying these values by their
corresponding probabilities, we obtain the following table:

                Y
    X       −1       0        1
    −1       0.10    0.00    −0.11
    0        0.00    0.00     0.00
    2       −0.06    0.00     0.06

Hence,

    E(XY ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj p(xi , yj )
           = 0.10 + 0.00 + (−0.11) + 0.00 + · · · + 0.00 + 0.06
           = −0.01

Corollary 3.1
Let (X, Y ) be a two-dimensional discrete random variable and let
H(X, Y ) = X and H(X, Y ) = Y in turn. Then

    E(X) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi p(xi , yj )

and

    E(Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} yj p(xi , yj )
Example 3.4
For the data in Example 2.2, find the expectation of the functions
(a) H(X, Y ) = X;  (b) H(X, Y ) = Y .

Solution
(a) From the table in Example 3.3, we calculate the values of the cells
with the xi values. Thus, for x1 = −1,

    Σ_{j=1}^{3} x1 p(x1 , yj ) = (−0.1) + (−0.2) + (−0.11) = −0.41

Similarly, for x2 = 0,

    Σ_{j=1}^{3} x2 p(x2 , yj ) = 0 + 0 + 0 = 0

and for x3 = 2,

    Σ_{j=1}^{3} x3 p(x3 , yj ) = 0.06 + 0.34 + 0.06 = 0.46

The results are presented in the table below:

                Y
    X       −1       0        1        Row Totals
    −1      −0.1     −0.2     −0.11    −0.41
    0        0        0        0        0
    2        0.06     0.34     0.06     0.46

Hence

    E(X) = Σ_{i=1}^{3} Σ_{j=1}^{3} xi p(xi , yj ) = −0.41 + 0 + 0.46 = 0.05

(b) Similarly, for y1 = −1,

    Σ_{i=1}^{3} y1 p(xi , y1 ) = (−0.10) + (−0.08) + (−0.03) = −0.21

for y2 = 0,

    Σ_{i=1}^{3} y2 p(xi , y2 ) = 0 + 0 + 0 = 0

and for y3 = 1,

    Σ_{i=1}^{3} y3 p(xi , y3 ) = 0.11 + 0.26 + 0.03 = 0.40

                Y
    X        −1       0       1
    −1       −0.1     0       0.11
    0        −0.08    0       0.26
    2        −0.03    0       0.03
    Total    −0.21    0       0.40

Hence

    E(Y ) = Σ_{j=1}^{3} Σ_{i=1}^{3} yj p(xi , yj ) = −0.21 + 0 + 0.40 = 0.19
Theorem 3.2
Let (X, Y ) be a two-dimensional discrete random variable and let
H(X, Y ) = XY . Then

    E(XY ) = Σ_{xy} xy p(xy)
Example 3.5
Refer to Example 2.2. Find E(XY ) from the derived joint probability
distribution.

Solution
From Example 2.11, we have the following:

    xy            −2      −1      0      1      2
    p(xy)          0.03    0.11    0.73   0.1    0.03
    xy p(xy)      −0.06   −0.11    0      0.1    0.06

Hence

    E(XY ) = Σ_{xy} xy p(xy)
           = −0.06 + (−0.11) + 0 + 0.1 + 0.06
           = −0.01
Theorem 3.3
Let (X, Y ) be a two-dimensional random variable with probability
mass function p(x, y) and let H(X, Y ) = X. Then the function

    E(X) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi p(xi , yj )

simplifies to

    E(X) = Σ_{i=1}^{n} xi p(xi )

Proof

    E(X) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi p(xi , yj )
         = Σ_{i=1}^{n} xi Σ_{j=1}^{m} p(xi , yj )

(since the x values do not depend on y and can therefore be taken outside
the summation over y)

         = Σ_{i=1}^{n} xi p(xi )

(from the definition of the marginal probability mass function of X in
Definition 1.11)

Similarly,

    E(Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} yj p(xi , yj )

which also reduces to

    E(Y ) = Σ_{j=1}^{m} yj p(yj )

Example 3.6
Refer to Example 2.2. Using the marginal distributions, find
(a) E(X);  (b) E(Y ).
Solution
(a) From Example 2.11, the marginal distribution of X is:

    x          −1      0       2
    p(x)        0.41    0.36    0.23
    x p(x)     −0.41    0       0.46

Hence

    E(X) = Σ_{i=1}^{n} xi p(xi ) = −0.41 + 0 + 0.46 = 0.05

(b) Similarly, the marginal distribution of Y is:

    y          −1      0       1
    p(y)        0.21    0.39    0.4
    y p(y)     −0.21    0       0.4

Hence

    E(Y ) = Σ_{j=1}^{m} yj p(yj ) = −0.21 + 0 + 0.4 = 0.19
Definition 3.2
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let Z = H(X, Y ). Then

    E(Z) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) f (x, y) dx dy

Theorem 3.4
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let H(X, Y ) = XY . Then the expectation
of the product of X and Y is

    E(XY ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f (x, y) dx dy
Example 3.7
Refer to Example 1.3. Find E(XY ).

Solution

    E(XY ) = ∫_{0}^{2} ∫_{0}^{1} xy (x² + xy/3) dx dy
           = ∫_{0}^{2} ∫_{0}^{1} (x³ y + x² y²/3) dx dy
           = ∫_{0}^{2} (y/4 + y²/9) dy
           = 4/8 + 8/27
           = 43/54
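A quick numerical cross-check of Example 3.7 (a rough sketch using a midpoint Riemann sum; the grid size is arbitrary): E(XY) should be approximately 43/54 ≈ 0.7963.

```python
# Numerical check of E(XY) for f(x, y) = x^2 + x*y/3 on 0<x<1, 0<y<2
def f(x, y):
    return x * x + x * y / 3.0

n = 1000
hx, hy = 1.0 / n, 2.0 / n
exy = 0.0
for i in range(n):
    x = (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        exy += x * y * f(x, y) * hx * hy

print(exy, 43 / 54)   # both approximately 0.7963
```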
Definition 3.3
Let (X, Y ) be a two-dimensional continuous random variable with
probability density function f (x, y). Then the expectations of the
marginal distributions of X and Y are given by

    E(X) = ∫_{−∞}^{∞} x g(x) dx

    E(Y ) = ∫_{−∞}^{∞} y h(y) dy

where g(x) and h(y) are the marginal probability density functions of
X and Y respectively.

Example 3.8
Refer to Example 1.3. Find (a) E(X);  (b) E(Y ).

Solution
To find the expectation of a random variable from a bivariate distribution
of X and Y we require the marginal probability distributions of X and Y .
The marginal probability distribution of X for Example 1.3 was found in
Example 1.10 to be

    g(x) =  2x² + (2/3)x,   0 < x < 1
            0,              elsewhere

and that of Y to be

    h(y) =  1/3 + y/6,   0 < y < 2
            0,           elsewhere

Hence

    E(X) = ∫_{0}^{1} x (2x² + (2/3)x) dx = 1/2 + 2/9 = 13/18

and

    E(Y ) = ∫_{0}^{2} y (1/3 + y/6) dy = 2/3 + 4/9 = 10/9
Similar to the univariate case, we shall assume in our discussions of the
properties of the expectation of bivariate random variables that both
variables have finite expectations.
Theorem 3.5
If X1 , X2 , ..., Xn have the same distribution, then they possess a com-
mon expectation E(Xi ) = µ
Note
The expression “have the same distribution” is equivalent to the expression
“are identically distributed”. Identical distribution means that the probabil-
ity density function or the probability mass function of the random variable
remains the same from trial to trial.
For example, X and Y are identically distributed if their probability
distribution functions are
Theorem 3.6
Let X and Y be any two random variables with expectations E(X)
and E(Y ) respectively. The expectation of their sum is the sum
of their expectations. That is,

    E(X + Y ) = E(X) + E(Y )

Proof
We shall first prove the discrete case.

    E(X + Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi + yj ) p(xi , yj )
              = Σ_{i=1}^{n} Σ_{j=1}^{m} xi p(xi , yj ) + Σ_{i=1}^{n} Σ_{j=1}^{m} yj p(xi , yj )
              = Σ_{i=1}^{n} xi Σ_{j=1}^{m} p(xi , yj ) + Σ_{j=1}^{m} yj Σ_{i=1}^{n} p(xi , yj )
              = Σ_{i=1}^{n} xi p(xi ) + Σ_{j=1}^{m} yj p(yj )

(following from the logic of the proof of Theorem 3.3)

              = E(X) + E(Y )
Example 3.9
For the data in Example 2.2, find E(X + Y ),

(a) using the parent joint probability distribution table;

(b) using the derived joint probability distribution table.

Solution
(a) The distribution of X + Y from the parent joint probability
distribution table is reproduced from the solution for Example 2.2, taking
note that the values for x + y are those in parentheses:

                          Y
    X              −1       0        1        Row Totals
    −1             0.10     0.20     0.11     0.41
                   (−2)     (−1)     (0)
    0              0.08     0.02     0.26     0.36
                   (−1)     (0)      (1)
    2              0.03     0.17     0.03     0.23
                   (1)      (2)      (3)
    Column Totals  0.21     0.39     0.40     1.00

To obtain E(X + Y ), we multiply the values for x + y by their
corresponding probabilities and obtain the following table:

                Y
    X        −1       0        1
    −1       −0.20    −0.20    0.00
    0        −0.08     0.00    0.26
    2         0.03     0.34    0.09

Hence,

    E(X + Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi + yj ) p(xi , yj )
              = −0.20 + (−0.20) + 0.00 + (−0.08) + · · · + 0.34 + 0.09
              = 0.24

(b) Using the derived distribution of X + Y , we have the following table:

    (x + y)               −2      −1      0      1      2      3
    p(x + y)               0.1     0.28    0.13   0.29   0.17   0.03
    (x + y) p(x + y)      −0.2    −0.28    0      0.29   0.34   0.09

Hence,

    E(X + Y ) = Σ (xi + yj ) p(xi + yj )
              = −0.2 + (−0.28) + 0 + 0.29 + 0.34 + 0.09
              = 0.24
Corollary 3.2
Let X1 , X2 , ..., Xn be n random variables, where n is finite. Then the
expectation of the sum Sn = X1 + X2 + ... + Xn is the sum of the
expectations of the individual random variables. Thus,

    E(Sn ) = E(X1 ) + E(X2 ) + ... + E(Xn )

Corollary 3.3
If X1 , X2 , ..., Xn have the same distribution with E(Xi ) = µ for all
1 ≤ i ≤ n, then the expectation of their sum Sn = X1 + X2 + ... + Xn is

    E(Sn ) = nµ

Proof
Since X1 , X2 , ..., Xn have the same distribution, from Theorem 3.5 they
have a common expectation µ, so that

    E(Sn ) = E(X1 ) + E(X2 ) + ... + E(Xn ) = nµ

Example 3.10
Refer to Example 2.13. Find E(X + Y ).

Solution
Since X and Y have the same distribution with common expectation
E(X) = E(Y ) = −2(0.4) + 4(0.6) = 1.6, Theorem 3.6 gives

    E(X + Y ) = E(X) + E(Y ) = 1.6 + 1.6 = 3.2

Theorem 3.7
Let X and Y be any two random variables with expectations E(X)
and E(Y ) respectively. The expectation of their difference is the
difference of their expectations. That is,

    E(X − Y ) = E(X) − E(Y )
Example 3.11
For the table in Example 2.2, verify that

    E(X − Y ) = E(X) − E(Y )

Solution
The distribution of X − Y is given in Example 2.8. Hence, to obtain
E(X − Y ), we have the following table:

    x − y                 −2       −1      0      1       2       3
    p(x − y)               0.11     0.46    0.12   0.11    0.17    0.03
    (x − y) p(x − y)      −0.22    −0.46    0      0.11    0.34    0.09

Hence,

    E(X − Y ) = Σ (x − y) p(x − y)
              = −0.22 + (−0.46) + 0 + 0.11 + 0.34 + 0.09
              = −0.14

which agrees with

    E(X) − E(Y ) = 0.05 − 0.19 = −0.14

from Example 3.4.

Property 4
Theorem 3.8
If a and b are constants, then for any random variables X and Y

    E(aX + bY ) = a E(X) + b E(Y )

Corollary 3.4
Let X1 , X2 , ..., Xn be n random variables. Then

    E(c1 X1 + c2 X2 + ... + cn Xn ) = Σ_{i=1}^{n} ci E(Xi )

The proofs of Theorem 3.8 and its corollary are similar to the proofs of
Theorem 3.6 and its corollaries.

Note
Theorems 3.6, 3.7 and 3.8 and their corollaries hold whether or not the
random variables involved are independent.

Example 3.12
For the data of Example 3.1, if a = 3 and b = 4, verify whether Property 4
is valid.

Solution
From Example 3.1, E(X) = 1.6 and E(Y ) = 3.9, so that

    3E(X) + 4E(Y ) = 3(1.6) + 4(3.9) = 20.4

Now, the distribution of 3X + 4Y is

    3x + 4y        6       18      24      36
    p(3x + 4y)     0.28    0.12    0.42    0.18

so that

    E(3X + 4Y ) = 6(0.28) + 18(0.12) + 24(0.42) + 36(0.18) = 20.4

Hence E(3X + 4Y ) = 3E(X) + 4E(Y ), and Property 4 is valid.
The last term appears sufficiently often to deserve a separate name and
treatment. It is called the covariance of X and Y and denoted as
Cov(X, Y ). The concept of covariance is discussed in detail in the next
chapter.

Example 3.13
For the table in Example 2.2, obtain E{[X − E(X)][Y − E(Y )]}.

Solution
It has been proved in Theorem 4.1 in the sequel that

    E{[X − E(X)][Y − E(Y )]} = E(XY ) − E(X)E(Y )

From Example 3.3,

    E(XY ) = −0.01

From Example 3.4,

    E(X) = 0.05   and   E(Y ) = 0.19

Hence,

    E{[X − E(X)][Y − E(Y )]} = E(XY ) − E(X)E(Y )
                             = −0.01 − 0.05(0.19)
                             = −0.0195
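The value obtained in Example 3.13 can be confirmed directly from the joint table. A minimal Python sketch (the dictionary layout is illustrative only): computing E(XY) − E(X)E(Y) and the defining expectation E{[X − E(X)][Y − E(Y)]} give the same number.

```python
# Joint pmf of Example 2.2
joint = {(-1, -1): 0.10, (-1, 0): 0.20, (-1, 1): 0.11,
         ( 0, -1): 0.08, ( 0, 0): 0.02, ( 0, 1): 0.26,
         ( 2, -1): 0.03, ( 2, 0): 0.17, ( 2, 1): 0.03}

ex  = sum(x * p for (x, y), p in joint.items())          # E(X)  = 0.05
ey  = sum(y * p for (x, y), p in joint.items())          # E(Y)  = 0.19
exy = sum(x * y * p for (x, y), p in joint.items())      # E(XY) = -0.01

cov_formula = exy - ex * ey
cov_direct  = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

print(round(cov_formula, 4), round(cov_direct, 4))   # both -0.0195
```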
Theorem 3.10
Let X and Y be two independent random variables with expectations
E(X) and E(Y ) respectively. Then the expectation of their product
is equal to the product of their expectations:

    E(XY ) = E(X)E(Y )

Proof
For the discrete case, since X and Y are independent, p(xi , yj ) = g(xi )h(yj ),
so that

    E(XY ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj g(xi ) h(yj )
           = Σ_{i=1}^{n} xi g(xi ) Σ_{j=1}^{m} yj h(yj )
           = E(X)E(Y )

The proof of the bivariate continuous case is similar.

Note
The converse of Theorem 3.10 does not hold, that is, the random variables
X and Y may satisfy the relation E(XY ) = E(X)E(Y ) without being
independent.

Example 3.14
For the data in Example 2.3, verify whether

    E(XY ) = E(X)E(Y )

Solution
The distribution of XY is given in Example 2.12. Hence, to obtain E(XY ),
we have the following table:

    xy            −6       −12      12      24
    p(xy)          0.28     0.12     0.42    0.18
    xy p(xy)      −1.68    −1.44     5.04    4.32

Hence,

    E(XY ) = −1.68 + (−1.44) + 5.04 + 4.32 = 6.24

It has been shown in Example 2.3 that X and Y are independent. Hence,

    E(X)E(Y ) = 1.6 × 3.9 = 6.24

so that E(XY ) = E(X)E(Y ), as required.
Corollary 3.5
Let X1 , X2 , · · · , Xn be independent random variables. Then

    E(X1 X2 · · · Xn ) = E(X1 )E(X2 ) · · · E(Xn )

Proof
We shall demonstrate the case of three random variables X, Y and Z. The
product XY Z may be written as (XY )Z. Using Theorem 3.10, we obtain

    E(XY Z) = E[(XY )Z] = E(XY )E(Z) = E(X)E(Y )E(Z)
Theorem 3.11
Let X and Y be two random variables with expectations E(X) and
E(Y ) respectively. Then the expectation of their quotient is
approximately

    E(X/Y ) ≈ E(X)/E(Y ) − Cov(X, Y )/[E(Y )]² + E(X) Var(Y )/[E(Y )]³

Note
The expectation of the quotient X/Y may not exist even though the moments
of X and Y exist.
Theorem 3.12
Suppose X and Y are two random variables with expectations E(X)
and E(Y ) respectively. If X ≤ Y , then

    E(X) ≤ E(Y )

Proof
We write

    Y = X + (Y − X)

where Y − X ≥ 0. By Theorem 3.6, we have

    E(Y ) = E(X) + E(Y − X) ≥ E(X)
Theorem 3.13
Suppose X and Y are bivariate random variables with finite pth and
qth order moments respectively, where p and q are positive numbers
satisfying 1/p + 1/q = 1. Then XY has a finite first moment and

    E(|XY |) ≤ [E(|X|^p)]^{1/p} [E(|Y |^q)]^{1/q}

The equality holds if and only if E(|Y |^q) = c E(|X|^p) for some constant
c, or X = 0.
Property 10   Cauchy-Schwartz’ Inequality of Expectation
Theorem 3.14
Suppose X and Y are two random variables with second moments
E(X²) and E(Y ²) respectively. Then E(XY ) exists and

    [E(XY )]² ≤ E(X²)E(Y ²)
Proof
Case 1
If E(X 2 ) or E(Y 2 ) is infinite, the theorem is valid.
Case 2
If E(X 2 ) = 0 then we must have P (X = 0) = 1 so that P (XY = 0) = 1
and E(XY ) = 0, hence again the theorem is valid. The same argument
applies if E(Y 2 ) = 0, so that we may assume that 0 < E(X 2 ) < ∞ and
0 < E(Y 2 ) < ∞.
Case 3
Suppose we have a random variable

    Zt = tX + Y
Let us consider the real-valued function

    E(Zt ²) = E[(tX + Y )²]

For every t we have

    E[(tX + Y )²] ≥ 0

since (tX + Y )² ≥ 0, so that

    E(t² X² + 2tXY + Y ²) ≥ 0          (i)
The term on the left hand side of (i) is a quadratic function in t, which is
always nonnegative for all t. Hence, it does not have more than one real
root. Its discriminant must, therefore, be less than or equal to zero. The
discriminant “b2 − 4ac” for the quadratic equation (i) in t is
    [2E(XY )]² − 4E(X²)E(Y ²) ≤ 0

or

    4[E(XY )]² ≤ 4E(X²)E(Y ²)
from which the result follows.
Note
(a) Equality in Theorem 3.14 holds if and only if one of the random
    variables equals a constant multiple of the other, say, Y = tX for
    some constant t, or at least one of them is zero, say, X = 0;
(b) This inequality is sometimes simply called Schwartz’ inequality;
(c) Cauchy-Schwartz’ inequality is a special case of Hölder’s inequality
    when p = q = 2.
Theorem 3.15
Suppose X and Y are two random variables with finite pth order
moments. Then for p ≥ 1
    [E(|X + Y|^p)]^(1/p) ≤ [E(|X|^p)]^(1/p) + [E(|Y|^p)]^(1/p)
Corollary 3.6
Suppose X and Y are two random variables with expectations E(X) and E(Y) respectively. Then
    E(|X + Y|) ≤ E(|X|) + E(|Y|)
The proof follows immediately from Theorem 3.15 for the case when p = 1.
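A quick numerical check of these inequalities (Hölder with p = q = 2, i.e. Cauchy-Schwartz, and Minkowski with p = 2) can be written in a few lines of Python; the sample values below are arbitrary and serve only as an illustration.

    import random

    random.seed(2)
    n = 10000
    xs = [random.uniform(-1, 2) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]

    def mean(v):
        return sum(v) / len(v)

    # Cauchy-Schwartz (Hoelder with p = q = 2): E|XY| <= sqrt(E X^2) * sqrt(E Y^2)
    lhs = mean([abs(x * y) for x, y in zip(xs, ys)])
    rhs = mean([x * x for x in xs]) ** 0.5 * mean([y * y for y in ys]) ** 0.5
    print(lhs <= rhs + 1e-12)

    # Minkowski with p = 2: (E|X+Y|^2)^(1/2) <= (E|X|^2)^(1/2) + (E|Y|^2)^(1/2)
    lhs2 = mean([(x + y) ** 2 for x, y in zip(xs, ys)]) ** 0.5
    rhs2 = mean([x * x for x in xs]) ** 0.5 + mean([y * y for y in ys]) ** 0.5
    print(lhs2 <= rhs2 + 1e-12)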
Theorem 3.16
Suppose the random variable M is the frequency of occurrence of the
event A (number of successes) in n independent and identical trials.
Then
E(M ) = np
Proof
Let X1 , X2 , · · · , Xn be n independent random variables which are identically
distributed. Suppose
    Xi = 1, if event A occurs in the ith trial
    Xi = 0, otherwise
so that
    E(Xi) = pi = p
Letting
M = X1 + X2 + · · · + Xn
we have
    E(M) = E(X1) + E(X2) + · · · + E(Xn) = np

Example 3.15
A box contains 20 black, 20 red and 10 green balls. If 25 balls are selected at random with replacement, find the expectation of the frequency of occurrence of the red balls.

Solution
Let A = {the occurrence of red balls}. Then
    p = P(A) = 20/50 = 2/5
Hence the expectation of the frequency of occurrence of the red balls is
    E(M) = np = 25 (2/5) = 10
That is, if we select 25 balls from a box containing 20 black, 20 red
and 10 green balls at random with replacement, we are likely to have the
red ball appearing 10 times.
Theorem 3.17
Let the random variable M/n be the relative frequency of the event A (or proportion of success) among the n independent and identical trials of experiment E. Then the expectation of the relative frequency equals the probability of the event:
    E(M/n) = p
Proof
Using Property 4 of expectation (Theorem 3.7),
    E(M/n) = (1/n) E(M)
           = (1/n)(np)        (from Theorem 3.16)
           = p
The above theorem says that the expected relative frequency of the event A
is p, where p = P (A). This is intuitively clear and it establishes a connection
between the relative frequency of an event and the probability of that event.
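This connection can also be seen by simulation. The short Python sketch below (an illustration, not part of the text) repeats the experiment of drawing n = 25 balls with replacement from a box in which 20 of the 50 balls are red, and averages the observed frequency and relative frequency of red.

    import random

    random.seed(3)
    n, p, trials = 25, 20 / 50, 10000

    freqs = []
    for _ in range(trials):
        m = sum(1 for _ in range(n) if random.random() < p)  # number of red balls drawn
        freqs.append(m)

    print(sum(freqs) / trials)                 # close to E(M) = np = 10
    print(sum(f / n for f in freqs) / trials)  # close to E(M/n) = p = 0.4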
Example 3.16
Refer to Example 3.15. Find the expectation of the relative frequency of the
occurrence of the red balls.
Solution
From Example 3.15,
    p = 2/5
so that
    q = 1 − p = 3/5
Hence, the expectation of the relative frequency of occurrence of the red balls is
    E(M/n) = p = 2/5

3.3 VARIANCE OF BIVARIATE RANDOM VARIABLES
Definition 3.4
Let (X, Y ) be a two-dimensional discrete random variable with prob-
ability mass function p(xi , yj ).
If H(X, Y) = (X − µX)², then
    Var(X) = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi − µX)² p(xi, yj)
If H(X, Y) = (Y − µY)², then
    Var(Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} (yj − µY)² p(xi, yj)
where µX = E(X) and µY = E(Y).

Example 3.17
For the data in Example 2.3, find the variance of (a) X (b) Y.

Solution
(a) From Example 3.1, E(X) = 1.6. Hence,
    Var(X) = (x1 − µX)² p(x1, y1) + (x1 − µX)² p(x1, y2) + (x2 − µX)² p(x2, y1) + (x2 − µX)² p(x2, y2)
           = (−2 − 1.6)²(0.28) + (−2 − 1.6)²(0.12) + (4 − 1.6)²(0.42) + (4 − 1.6)²(0.18)
           = 8.64
(b) From Example 3.1, E(Y) = 3.9. Hence,
    Var(Y) = (y1 − µY)² p(x1, y1) + (y2 − µY)² p(x1, y2) + (y1 − µY)² p(x2, y1) + (y2 − µY)² p(x2, y2)
           = (3 − 3.9)²(0.28) + (6 − 3.9)²(0.12) + (3 − 3.9)²(0.42) + (6 − 3.9)²(0.18)
           = 1.89
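For readers who want to reproduce such calculations, the following Python sketch (an illustration only) computes Var(X) and Var(Y) directly from the joint probability table of Example 2.3.

    # Joint pmf of Example 2.3: x in {-2, 4}, y in {3, 6}
    pmf = {(-2, 3): 0.28, (-2, 6): 0.12, (4, 3): 0.42, (4, 6): 0.18}

    EX = sum(x * p for (x, y), p in pmf.items())
    EY = sum(y * p for (x, y), p in pmf.items())
    VarX = sum((x - EX) ** 2 * p for (x, y), p in pmf.items())
    VarY = sum((y - EY) ** 2 * p for (x, y), p in pmf.items())

    print(EX, EY)                        # 1.6, 3.9
    print(round(VarX, 2), round(VarY, 2))  # 8.64, 1.89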
Definition 3.5
Let (X, Y) be a two-dimensional continuous random variable with probability density function f(x, y).
If H(X, Y) = (X − µX)², then
    Var(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX)² f(x, y) dy dx
If H(X, Y) = (Y − µY)², then
    Var(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − µY)² f(x, y) dy dx

Definition 3.6
Let (X, Y) be a two-dimensional discrete random variable with probability mass function p(x, y). Then the variances of the marginal distributions of X and Y are given by
    Var(X) = Σ_{all x} (x − µX)² p(x)
    Var(Y) = Σ_{all y} (y − µY)² p(y)
where p(x) and p(y) are the marginal probability mass functions of X and Y, respectively.

Example 3.18
For the data in Example 2.3, find (a) Var(X) (b) Var(Y).

Solution
(a) Using the marginal distribution of X,
    Var(X) = Σ_{all x} (x − µX)² p(x) = (−2 − 1.6)²(0.4) + (4 − 1.6)²(0.6) = 8.64
(b) Using the marginal distribution of Y,
    Var(Y) = Σ_{all y} (y − µY)² p(y) = (3 − 3.9)²(0.7) + (6 − 3.9)²(0.3) = 1.89
These agree with the values obtained in Example 3.17.
Definition 3.7
Let (X, Y ) be a two-dimensional continuous random variable with
probability density function f (x, y). Then the variance of the
marginal distributions of X and Y are given by
    Var(X) = ∫_{−∞}^{∞} (x − µX)² g(x) dx
    Var(Y) = ∫_{−∞}^{∞} (y − µY)² h(y) dy
where g(x) and h(y) are the marginal probability density functions of X and Y respectively
Example 3.19
Refer to Example 1.3. Find (a) Var(X) (b) Var(Y ).
Solution
(a) From Example 3.6, E(X) = 13/18.
Now
    E(X²) = ∫₀¹ x² (2x² + 2x/3) dx
          = ∫₀¹ (2x⁴ + 2x³/3) dx
          = [2x⁵/5 + 2x⁴/12]₀¹
          = 2/5 + 2/12
          = 17/30
Therefore
    Var(X) = 17/30 − (13/18)² = 0.04506

(b) From Example 3.5, E(Y) = 10/9.
Now
    E(Y²) = ∫₀² y² (1/3 + y/6) dy
          = ∫₀² (y²/3 + y³/6) dy
          = [y³/9 + y⁴/24]₀²
          = 8/9 + 16/24
          = 14/9
Hence
    Var(Y) = 14/9 − (10/9)² = 0.32099
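A simple numerical check of these two variances can be carried out in Python by integrating the marginal densities used above, g(x) = 2x² + 2x/3 on (0, 1) and h(y) = 1/3 + y/6 on (0, 2), with a midpoint rule; this sketch is an illustration only.

    def midpoint(f, a, b, n=100000):
        dx = (b - a) / n
        return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

    g = lambda x: 2 * x**2 + 2 * x / 3   # marginal density of X on (0, 1)
    h = lambda y: 1 / 3 + y / 6          # marginal density of Y on (0, 2)

    EX  = midpoint(lambda x: x * g(x), 0, 1)
    EX2 = midpoint(lambda x: x**2 * g(x), 0, 1)
    EY  = midpoint(lambda y: y * h(y), 0, 2)
    EY2 = midpoint(lambda y: y**2 * h(y), 0, 2)

    print(round(EX2 - EX**2, 5))   # ~0.04506
    print(round(EY2 - EY**2, 5))   # ~0.32099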
Theorem 3.18
If the random variables X1 , X2 , ..., Xn have the same distribution,
then they all have the same variance σ², that is,
    Var(Xi) = σ²,   1 ≤ i ≤ n
Theorem 3.19
If (X, Y) is a two-dimensional random variable, then
    Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y)
Proof
We shall give the proof for the case X + Y . The proof for the case X − Y
should be tried by the reader (see Exercise 3.13).
Var(X + Y ) = E{[X + Y − E(X + Y )]2 }
= E{[X + Y − E(X) − E(Y )]2 }
= E{([X − E(X)] + [Y − E(Y )])2 }
= E{(X − E(X))2 + (Y − E(Y ))2
+2(X − E(X))(Y − E(Y ))}
= E{X − E(X)}2 + E{Y − E(Y )}2
+2{E(X − E(X))(Y − E(Y ))}
= Var(X) + Var(Y ) + 2 Cov(X, Y )
Example 3.20
If Var(X) = 8, Var(Y) = 5 and Cov(X, Y) = 3, find
(a) Var(X + Y),    (b) Var(X − Y)

Solution
(a) Var(X + Y) = 8 + 5 + 2(3) = 19
(b) Var(X − Y) = 8 + 5 − 2(3) = 7

Unlike the expected value, the variance is not additive in general. However, with the additional assumption of independence, we obtain Theorem 3.20.

Corollary 3.7
If X1, X2, ..., Xn are n-dimensional random variables, then
    Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj),
or equivalently,
    Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi) + Σ_{i≠j} Cov(Xi, Xj),
or equivalently,
    Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(Xi, Xj)

Property 3 Sum and Difference of Independent Random Variables

Theorem 3.20
If (X, Y) is a two-dimensional random variable and X and Y are independent, then
    Var(X ± Y) = Var(X) + Var(Y)
Proof
The theorem follows from Theorem 3.19 by noting that Cov(X, Y ) = 0 for
the case when X and Y are independent (see Theorem 4.7 in the sequel.)
Note
It is important to observe that
    E(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} E(Xi)
whether or not the Xi's are independent, but it is generally not the case that
    Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi)

Example 3.21
For the data in Example 2.3, verify that
(a) Var(X + Y) = Var(X) + Var(Y);    (b) Var(X − Y) = Var(X) + Var(Y).

Solution
It has been shown in Example 2.3 that X and Y are independent. Now

    (xi + yj)²      (−2 + 3)²   (−2 + 6)²   (4 + 3)²   (4 + 6)²
    p(xi)p(yj)      0.4(0.7)    0.4(0.3)    0.6(0.7)   0.6(0.3)

that is,

    (xi + yj)²      1      16     49     100
    p(xi)p(yj)      0.28   0.12   0.42   0.18

    E[(X + Y)²] = Σ (x + y)² p(x)p(y)
                = 1(0.28) + 16(0.12) + 49(0.42) + 100(0.18)
                = 0.28 + 1.92 + 20.58 + 18
                = 40.78
Also, E(X + Y) = E(X) + E(Y) = 1.6 + 3.9 = 5.5, so that
    Var(X + Y) = E[(X + Y)²] − [E(X + Y)]² = 40.78 − (5.5)² = 10.53
Now
    Var(X) = E(X²) − [E(X)]²
From Example 3.1, E(X) = 1.6. From Example 2.3, we find E(X²):
    E(X²) = Σ x² p(x) = (−2)²(0.4) + (4)²(0.6) = 1.6 + 9.6 = 11.2
Therefore
    Var(X) = 11.2 − (1.6)² = 8.64
Again
    Var(Y) = E(Y²) − [E(Y)]²
    E(Y²) = Σ y² h(y) = (3)²(0.7) + (6)²(0.3) = 6.3 + 10.8 = 17.1
Therefore
    Var(Y) = 17.1 − (3.9)² = 1.89
Hence,
    Var(X + Y) = 10.53 = Var(X) + Var(Y)

(b)
    Var(X − Y) = E[(X − Y)²] − [E(X − Y)]²
Now

    (xi − yj)²      (−2 − 3)²   (−2 − 6)²   (4 − 3)²   (4 − 6)²
    p(xi)p(yj)      0.4(0.7)    0.4(0.3)    0.6(0.7)   0.6(0.3)

that is,

    (xi − yj)²      25     64     1      4
    p(xi)p(yj)      0.28   0.12   0.42   0.18

    E[(X − Y)²] = Σ (x − y)² p(x)p(y)
                = 25(0.28) + 64(0.12) + 1(0.42) + 4(0.18)
                = 7 + 7.68 + 0.42 + 0.72
                = 15.82
Also,

    xi − yj         −2 − 3     −2 − 6     4 − 3      4 − 6
    p(xi)p(yj)      0.4(0.7)   0.4(0.3)   0.6(0.7)   0.6(0.3)

that is,

    xi − yj         −5     −8     1      −2
    p(xi)p(yj)      0.28   0.12   0.42   0.18

    E(X − Y) = Σ (x − y) p(x)p(y)
             = −5(0.28) + (−8)(0.12) + 1(0.42) + (−2)(0.18)
             = −1.4 − 0.96 + 0.42 − 0.36
             = −2.3
Therefore,
    [E(X − Y)]² = (−2.30)² = 5.29
Hence,
    Var(X − Y) = 15.82 − 5.29 = 10.53
But
    Var(X) + Var(Y) = 8.64 + 1.89 = 10.53
Hence,
    Var(X − Y) = Var(X) + Var(Y)
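The verification above can also be automated. The Python sketch below (an illustration only) builds the joint distribution of Example 2.3 from its independent marginals and checks both identities.

    px = {-2: 0.4, 4: 0.6}   # marginal pmf of X
    py = {3: 0.7, 6: 0.3}    # marginal pmf of Y

    def var(pmf):
        m = sum(v * p for v, p in pmf.items())
        return sum((v - m) ** 2 * p for v, p in pmf.items())

    def var_of(func):
        # variance of func(X, Y) under the independent joint pmf p(x)p(y)
        vals = {}
        for x, p in px.items():
            for y, q in py.items():
                vals[func(x, y)] = vals.get(func(x, y), 0) + p * q
        return var(vals)

    print(round(var_of(lambda x, y: x + y), 2))   # 10.53
    print(round(var_of(lambda x, y: x - y), 2))   # 10.53
    print(round(var(px) + var(py), 2))            # 10.53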
Property 4
Theorem 3.21
If the random variables Xi (i = 1, 2, ..., n) are independent and have
the same variance σ², that is, Var(Xi) = σ², then
    Var(Σ_{i=1}^{n} Xi) = n σ²

This theorem follows immediately from Corollary 3.8.
Property 5
Theorem 3.22
If (X, Y) is a two-dimensional random variable, then
    Var(aX ± bY) = a² Var(X) + b² Var(Y) ± 2 a b Cov(X, Y)
where a and b are constants

Proof
We shall prove it for the sum. The proof for the difference is similar.
    Var(aX + bY) = E{[(aX + bY) − E(aX + bY)]²}
                 = E{(a[X − E(X)] + b[Y − E(Y)])²}
                 = a² E[X − E(X)]² + b² E[Y − E(Y)]² + 2 a b E{[X − E(X)][Y − E(Y)]}
                 = a² Var(X) + b² Var(Y) + 2 a b Cov(X, Y)

More generally, we have the following result for the variance of a linear combination of random variables.
Corollary 3.9
If Xi (i = 1, 2, ..., n) are n random variables and ci is a constant associated with the ith random variable, then
    Var(Σ_{i=1}^{n} ci Xi) = Σ_{i=1}^{n} ci² Var(Xi) + 2 Σ_{i<j} ci cj Cov(Xi, Xj),
or equivalently,
    Var(Σ_{i=1}^{n} ci Xi) = Σ_{i=1}^{n} ci² Var(Xi) + Σ_{i≠j} ci cj Cov(Xi, Xj),
or equivalently,
    Var(Σ_{i=1}^{n} ci Xi) = Σ_{i=1}^{n} Σ_{j=1}^{n} ci cj Cov(Xi, Xj)
Example 3.22
If Var(X) = 5, Var(Y) = 3 and Cov(X, Y) = 2, find
(a) Var(4X + 6Y);
(b) Var(4X − 6Y).

Solution
(a) Var(4X + 6Y) = 4² Var(X) + 6² Var(Y) + 2(4)(6) Cov(X, Y)
                 = 16(5) + 36(3) + 48(2)
                 = 284
(b) Var(4X − 6Y) = 4² Var(X) + 6² Var(Y) − 2(4)(6) Cov(X, Y)
                 = 16(5) + 36(3) − 48(2)
                 = 92
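The same result can be obtained by treating Corollary 3.9 as a quadratic form in the coefficients. The Python sketch below (an illustration only) evaluates Var(c1X + c2Y) = c1² Var(X) + c2² Var(Y) + 2 c1 c2 Cov(X, Y) for the numbers of Example 3.22.

    def var_lin_comb(coeffs, variances, cov):
        """Variance of c1*X + c2*Y for two variables with given variances and covariance."""
        c1, c2 = coeffs
        v1, v2 = variances
        return c1**2 * v1 + c2**2 * v2 + 2 * c1 * c2 * cov

    print(var_lin_comb((4, 6), (5, 3), 2))    # 284
    print(var_lin_comb((4, -6), (5, 3), 2))   # 92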
Property 6

Theorem 3.23
Let X and Y be two independent random variables. Then
    Var(aX ± bY) = a² Var(X) + b² Var(Y)
where a and b are constants

Corollary 3.10
Let Xi (i = 1, 2, ..., n) be independent random variables and ci a constant associated with the ith random variable, then
    Var(Σ_{i=1}^{n} ci Xi) = Σ_{i=1}^{n} ci² Var(Xi)
Example 3.23
If X and Y are independent random variables with Var(X) = 10 and Var(Y) = 6, find
(a) Var(3X + 2Y)    (b) Var(2X + 2Y)

Solution
(a) Var(3X + 2Y) = 3² Var(X) + 2² Var(Y)
                 = 9(10) + 4(6)
                 = 112
(b) Var(2X + 2Y) = Var[2(X + Y)]
                 = 2² Var(X + Y)
                 = 2² [Var(X) + Var(Y)]
                 = 4(10 + 6)
                 = 64
The reader should solve (c) and (d).

Theorem 3.24
Let the random variable M be the frequency of success in n independent trials, then
    Var(M) = npq

Proof
We found in the proof of Theorem 3.16 that E(Xi) = p. To find the variance, we also need E(Xi²):

    xi²      0   1
    p(xi)    q   p

    E(Xi²) = 1(p) + 0(q) = p
so that
    Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = pq
Hence
    Var(M) = Var(X1 + X2 + ... + Xn)
           = Var(X1) + Var(X2) + · · · + Var(Xn)
           = npq
Example 3.24
Refer to Example 3.15, find Var(M).

Solution
    n = 25, p = 2/5, q = 3/5
    Var(M) = npq = 25 (2/5)(3/5) = 6

Theorem 3.25
Let the random variable M/n be the relative frequency of success in n independent trials, then
    Var(M/n) = pq/n

Proof
    Var(M/n) = (1/n²) Var(M)
             = npq/n²        (by Theorem 3.24)
             = pq/n

Example 3.25
Refer to Example 3.15, find Var(M/n).

Solution
    n = 25, p = 2/5, q = 3/5
    Var(M/n) = pq/n = (2/5)(3/5)/25 = 0.0096
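These two values can be checked by simulation; the Python sketch below (an illustration only) repeats the 25-draw experiment of Example 3.15 and estimates Var(M) and Var(M/n).

    import random

    random.seed(4)
    n, p, trials = 25, 2 / 5, 20000

    ms = [sum(1 for _ in range(n) if random.random() < p) for _ in range(trials)]
    mean_m = sum(ms) / trials
    var_m = sum((m - mean_m) ** 2 for m in ms) / trials

    print(round(var_m, 2))           # close to npq = 6
    print(round(var_m / n**2, 4))    # close to pq/n = 0.0096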
EXERCISES
3.2 Refer to Exercise 1.1. Suppose X and Y are not assumed to be independent. Find
(a) Var(X + Y)            (b) Var(X − Y)
(c) Var(XY)               (d) Var(X/Y)
(e) Var(2X + 3Y)          (f) Var(5X − Y)
(g) Var[(3X)(4Y)]         (h) Var(3X/7Y)
(i) Var(X² + 5Y)          (j) Var(3X)
(k) Var((1/7)(3X − 2Y))   (l) Var(Y²)
(m) Var(Y² − 2Y + 3X)     (n) Var[3(X + Y)]
3.3 Refer to Exercise 3.2 and rework, assuming that X and Y are inde-
pendent.
3.4 A box contains 10 green balls and 15 red balls. 5 balls are randomly
picked from the box with replacement. Find the expectation and vari-
ance of the frequency and relative frequency of the occurrence of green
balls.
3.5 Given a pair of continuous random variables having the joint density
    f(x, y) = 24xy,   for x > 0, y > 0 and x + y ≤ 1
    f(x, y) = 0,      elsewhere
Find
(a) E(X) (b) E(Y ) (c) E(XY )
(d) Var(X) (e) Var(Y ) (f ) E(X + Y )
(g) E(X − Y ) (h) Var(X + Y ) (i) Var(X − Y )
assuming independence.
3.6 Given the joint density
    f(x, y) = 2,   for 0 < x < 1, 0 < y < 1, x + y < 1
    f(x, y) = 0,   elsewhere
find
(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )
if independence is assumed.
3.7 Refer to Exercise 1.7. Find
(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )
Chapter 4
MEASURES OF RELATIONSHIP
OF BIVARIATE DISTRIBUTIONS
4.1 INTRODUCTION
In Chapter 3, we calculated the mean and variance of bivariate random variables and also discussed their properties; higher moments are calculated similarly. However, we shall realise in this chapter that in the multivariate case, other types of moments can be calculated.
Let g(X1, ..., Xn) be any function of the random variables X1, ..., Xn whose density function is f(x1, ..., xn). Then the kth moment of g(X1, X2, ..., Xn) is defined by
    E[g^k(X1, X2, ..., Xn)] = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g^k(x1, x2, · · ·, xn) f(x1, ..., xn) dx1 dx2 ... dxn
In the case of a bivariate distribution a special type of moment that has been found very useful is the product moment. We shall define the product moment for both discrete and continuous random variables.
The moments described in Definitions 4.1 and 4.2 for the case when
both powers are equal to unity are known as the first product moment about
the origin.
The moments described in Definitions 4.3 and 4.4 for the case when both powers are equal to unity shall conveniently be referred to as the first product moment about the mean or the first central product moment. See detailed discussion of this in Section 4.3.

Example 4.1
Refer to Example 1.3. Calculate the first product moment about the mean of X and Y.

Solution
The first product moment about the mean (that is, p = q = 1), which we are required to calculate, is given by
    µ11 = E[(X − µX)(Y − µY)] = ∫₀¹ ∫₀² (x − µX)(y − µY) f(x, y) dy dx
From Examples 3.5 and 3.6,
    µX = E(X) = 13/18,    µY = E(Y) = 10/9
Hence
    µ11 = ∫₀¹ ∫₀² (x − 13/18)(y − 10/9)(x² + xy/3) dy dx
        = (1/243) ∫₀¹ (18x − 13)(2x − 3x²) dx
        = (1/243) ∫₀¹ (75x² − 54x³ − 26x) dx
        = −0.00617
Note
M11∗ = Cov(X, Y ) and hence the covariance is sometimes referred to as the
first product moment about the mean or simply the product moment.
Theorem 4.1
Let X and Y be random variables with a joint distribution function,
then
Cov(X, Y ) = E(XY ) − E(X)E(Y )
Proof
    Cov(X, Y) = E[(X − µX)(Y − µY)]
              = E(XY) − µX E(Y) − µY E(X) + µX µY
              = E(XY) − E(X)E(Y)
Note
(b) The covariance need not be finite, or even exist. However, it is finite
if both random variables have finite variances.
Example 4.2
For the data in Example 2.3, find the covariance between X and Y .
Solution
It has been shown in Example 3.14,
E(XY ) = 6.24
E(X)E(Y ) = 6.24
Hence
    Cov(X, Y) = E(XY) − E(X)E(Y) = 6.24 − 6.24 = 0
Example 4.3
Refer to Example 1.3. Find Cov(X, Y ).
Solution
From Examples 3.7 and 3.8,
    E(X) = 13/18;    E(Y) = 10/9;    E(XY) = 43/54
Hence
    Cov(X, Y) = E(XY) − E(X)E(Y) = 43/54 − (13/18)(10/9) = −0.00617
which is the same as that obtained in Example 4.1.
Distributions 133
Property 1 Symmetry
Theorem 4.2
Suppose X and Y are two random variables, then
Cov(X, Y ) = Cov(Y, X)
Property 2
Theorem 4.3
Suppose X and Y are two random variables125
and a is a constant, then
Theorem
Suppose
Suppose X X 4.2
and
and Y Y are two
two random
are Cov(X, Y ) = variables,
random Cov(Y, X)then
variables, then
Suppose X and Y are Cov(X, Y ) = variables,
two random Cov(Y, X)then
Cov(X,
Cov(X, Y )) = Cov(Y, X)
ADVANCED TOPICS IN INTRODUCTORY
Cov(X, Y Y)= = Cov(Y,
Cov(Y, X)X)
PROBABILITY: A FIRST COURSE
Property 2 Cov(X,
IN Y ) = Cov(Y, X)
PROBABILITY
Property 2 THEORY – VOLUME III Measures of Relationship of Bivariate Distributions
Property 2
Property
Property 2
2
Theorem
Property 2 4.3
Theorem
Property
Suppose
2 4.3
and Y are two random variables and a is a constant, then
X 4.3
Theorem
Suppose
Theorem and Y are two random variables and a is a constant, then
X 4.3
Suppose
Theorem
Theorem and Y are two random variables and a is a constant, then
X 4.3
4.3
Suppose
Theorem X and Y are Cov(a
two + X, Y )variables
random = Cov(X, Y )a is a constant, then
and
Suppose
Suppose X X 4.3
and
and YY are
are two
two random
Cov(a + X, Y )variables
random = Cov(X,
variables and
Y )a
and a is
is aa constant,
constant, then
then
Suppose X and Y are Cov(a + X, Y )variables
two random = Cov(X, Y )a is a constant, then
and
Cov(a
Cov(a + X, Y )) = Cov(X, Y )
Cov(a ++ X,
X, Y Y)= = Cov(X,
Cov(X, Y Y ))
Proof Cov(a + X, Y ) = Cov(X, Y )
Proof
Proof
Proof Cov(a + X, Y ) = E{[a + X − E(a + X)][Y − E(Y )]}
Proof
Proof Cov(a + X, Y ) = E{[a + X − E(a + X)][Y − E(Y )]}
Proof Cov(a + X, Y ) = E{[a E{[X+−XE(X)][Y − E(a +−X)][Y
E(Y )]} − E(Y )]}
Cov(a +
Cov(a + X,
X, YY )) == E{[a
E{[a +
E{[X +−X − E(a
XE(X)][Y
E(a + +−X)][Y
E(Y )]}
X)][Y E(Y )]}
− E(Y
Cov(a + X, Y ) = E{[X Cov(X,
E{[a +−X − E(Y )]}
)− E(a +−X)][Y
YE(X)][Y − E(Y )]}
− )]}
Cov(a + X, Y ) = = Cov(X,
E{[X+−
E{[a
E{[X −X )− E(a +−
E(X)][Y
YE(X)][Y −X)][Y
E(Y )]}
E(Y )]}
− E(Y )]}
= Cov(X,
E{[X −YE(X)][Y) − E(Y )]}
Corollary 4.1 = E{[X
= Cov(X,
Cov(X, −Y )
YE(X)][Y
) − E(Y )]}
Corollary 4.1 = Cov(X, Y )
Suppose and Y are two random
CorollaryX4.1 = Cov(X, variables
Y ) and a and b are constants. Then
Suppose X4.1
Corollary and Y are two random variables and a and b are constants. Then
Suppose
Corollary
CorollaryX and Y are two random variables and a and b are constants. Then
4.1
4.1
Suppose X and Y are Cov(a
two + X, bvariables
random + Y ) = Cov(X,
and a Y ) b are constants. Then
and
Corollary
Suppose
Suppose XX4.1
and
and YY are two
two random
Cov(a
are Cov(a + X, bvariables
random and
+ Y ) = Cov(X,
variables and a
a and
Y ) b are
and are constants.
constants. Then
Then
Suppose X and Y are Cov(a + X, bvariables
two random + Y ) = Cov(X,
and a Y ) bb are
and constants. Then
Cov(a + X, b + Y ) = Cov(X, Y )
Cov(a + X, bb + Y )) =
= Cov(X, Y ))
Property 3
Property 3 + X, +Y Cov(X, Y
Property 3 Cov(a + X, b + Y ) = Cov(X, Y )
Property
Property 3
3
Theorem
Property 3 4.4
Theorem
Property 3 4.4
Suppose X 4.4
and Y are two random variables, then
Theorem
Suppose X 4.4
Theorem and Y are two random variables, then
Suppose
Theorem
Theorem and Y are two random variables, then
X 4.4
4.4
Suppose
Theorem Cov(X,
and
X 4.4 Y + Z)random
areY two = Cov(X, Y ) + Cov(X,
variables, then Z)
Suppose
Suppose X X and
and Y are
Y
Cov(X, areY two
+ Z)random
two = Cov(X,
random variables, then
Y ) + Cov(X,
variables, then Z)
Suppose X and Cov(X, + Z)random
Y areY two = Cov(X, Y ) + Cov(X,
variables, then Z)
Cov(X,
Cov(X, Y + Z) = Cov(X, Y ) + Cov(X, Z)
Cov(X, Y Y + Z) =
+ Z) = Cov(X,
Cov(X, Y Y )) +
+ Cov(X,
Cov(X, Z) Z)
Cov(X, Y + Z) = Cov(X, Y ) + Cov(X, Z)
Proof
    Cov(X, Y + Z) = E{[X − E(X)][(Y + Z) − E(Y + Z)]}
                  = E{[X − E(X)][Y − E(Y)]} + E{[X − E(X)][Z − E(Z)]}
                  = Cov(X, Y) + Cov(X, Z)

Property 4

Theorem 4.5
Suppose X and Y are two random variables and a and b are constants, then
    Cov(aX, bY) = a b Cov(X, Y)

Proof
    Cov(aX, bY) = E{[aX − E(aX)][bY − E(bY)]}
                = a b E{[X − E(X)][Y − E(Y)]}
                = a b Cov(X, Y)

Corollary 4.2
Suppose X and Y are two random variables and a, b, c, and d are constants, then
    Cov(a + cX, b + dY) = c d Cov(X, Y)

In general, the same kind of argument gives the important linear property of covariance.
Theorem 4.6
Let U = a + Σ_{i=1}^{n} ci Xi and V = b + Σ_{j=1}^{m} dj Yj. Then
    Cov(U, V) = Σ_{i=1}^{n} Σ_{j=1}^{m} ci dj Cov(Xi, Yj)

In particular,
    Cov(X + Y, X + Y) = Var(X + Y)
                      = Var(X) + Var(Y) + 2 Cov(X, Y)
Theorem 4.7
If X and Y are independent random variables then Cov(X, Y ) = 0
Proof
From Theorem 4.1,
    Cov(X, Y) = E(XY) − E(X)E(Y)
Since X and Y are independent, E(XY) = E(X)E(Y), hence Cov(X, Y) = 0.

Note
If Cov(X, Y) = 0, X and Y need not be independent.

Property 7

Theorem 4.8
Suppose X and Y are random variables having second moments; then Cov(X, Y) is a well-defined finite number and
    [Cov(X, Y)]² ≤ Var(X) Var(Y)

Note
The equality holds if and only if Y and X have a perfect linear relation.
Cov(X, Y) is often used as a measure of the linear dependence of X and Y, and the reason for this is that Cov(X, Y) is a single number (rather than a complicated object such as a joint density function) which contains some useful information about the joint behaviour of X and Y:
(a) Positive values indicate that X increases as Y increases;
(b) Negative values indicate that X decreases as Y increases or vice versa;
(c) A zero value of the covariance would indicate no linear dependence
between X and Y .
In this definition, 0/0 = 0. The correlation coefficient is also referred to as Pearson's correlation coefficient or the product-moment correlation coefficient. The product-moment correlation coefficient ρ can also be expressed as
    ρ = [E(XY) − E(X)E(Y)] / (σX σY)        (i)
The reader will be asked in Exercise 4.21 to show that the numerator of expression (i) above is defined by Cov(X, Y). This expression may also be used to establish a very important property of the correlation coefficient (see Theorem 4.10 in the sequel).
Property 1 Scale-invariance

Theorem 4.9
For any a, b, c, d ∈ R such that ac ≠ 0,
    ρ(aX + b, cY + d) = IXY ρ(X, Y)
where
    IXY = +1, if ac > 0;    IXY = −1, if ac < 0

Proof
    ρ(aX + b, cY + d) = Cov(aX + b, cY + d) / √[Var(aX + b) Var(cY + d)]
By Corollary 4.2,
    Cov(aX + b, cY + d) = a c Cov(X, Y)
and by Theorem 3.22,
    Var(aX + b) = a² Var(X)   and   Var(cY + d) = c² Var(Y)
Hence
    ρ(aX + b, cY + d) = a c Cov(X, Y) / √[a² Var(X) c² Var(Y)]
                      = (ac/|ac|) · ρ(X, Y)
from which the result follows.

Theorem 4.10
The correlation coefficient lies between −1 and +1, that is,
    −1 ≤ ρ ≤ 1

Note
(i) If ρ(X, Y) = +1, Y is an increasing perfect linear function of X;
(ii) If ρ(X, Y) = −1, Y is a decreasing perfect linear function of X.

Note
When ρ = 0 or near 0 it does not indicate the absence of relationship between X and Y. It only indicates no linear relationship and it does not preclude the possibility of some nonlinear relationship.
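Both properties are easy to observe numerically. The Python sketch below (an illustration only, with an arbitrarily chosen data-generating model) simulates a pair of dependent variables, estimates ρ(X, Y), and checks that a linear change of scale only affects the sign of the correlation.

    import random

    random.seed(5)
    n = 50000
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0, 1)
        y = 0.6 * x + random.gauss(0, 1)   # Y depends linearly on X plus noise
        xs.append(x)
        ys.append(y)

    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
        sa = (sum((u - ma) ** 2 for u in a) / len(a)) ** 0.5
        sb = (sum((v - mb) ** 2 for v in b) / len(b)) ** 0.5
        return cov / (sa * sb)

    r = corr(xs, ys)
    r2 = corr([2 * x + 3 for x in xs], [-4 * y + 1 for y in ys])
    print(round(r, 3), round(r2, 3))   # r2 is approximately -r, and both lie in [-1, 1]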
For the data of Example 1.3,
    ρ = [E(XY) − E(X)E(Y)] / √[Var(X) Var(Y)]
      = [0.79630 − 0.72222(1.11111)] / √[(0.04506)(0.32099)]
      = −0.05127

4.4.4 Variance of Sum of Random Variables with Common Variance and Common Correlation Coefficient

We now establish the variance of a sum of random variables with a common correlation coefficient.

Theorem 4.11
Suppose X1, X2, ..., Xn are random variables with common variance σ² and common correlation Corr(Xi, Xj) = ρ, i ≠ j (i, j = 1, 2, ..., n). Let
    Sn = X1 + X2 + ... + Xn
Then
    (a) Var(Sn) = nσ²[1 + (n − 1)ρ]
    (b) Var(Sn/n) = (σ²/n)[1 + (n − 1)ρ]
where Sn/n is the sample mean
Proof
(a) From Corollary 3.9,
    Var(Sn) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)
            = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} √[Var(Xi)Var(Xj)] · Cov(Xi, Xj)/√[Var(Xi)Var(Xj)]
            = Σ_{i=1}^{n} σ² + 2 Σ_{i<j} √(σ² σ²) ρ
            = nσ² + 2 ρ σ² (n² − n)/2
            = nσ² + ρ σ² n(n − 1)
            = nσ²[1 + (n − 1) ρ]

(b) Var(Sn/n) = (1/n²) Var(Sn)
             = (σ²/n)[1 + (n − 1) ρ]

When the X's are independent, ρ = 0 and
    Var(Sn/n) = Var(X)/n = σ²/n
We now discuss conditional expectation, which is simply the expected value of a variable, given that a set of prior conditions has taken place.
From the foregoing two definitions we notice that the definition of condi-
tional expectation is almost the same as the definition of expectation, except
that instead of a probability (marginal) distribution it uses a conditional
probability (marginal) distribution.
Example 4.6
Refer to Example 1.3. Determine: (a) E(Y |x), (b) E(X|y)
Solution
(a) E(Y|x) = ∫_{−∞}^{∞} y f(y|x) dy, where
    f(y|x) = f(x, y)/g(x) = (x² + xy/3)/(2x² + 2x/3) = (3x + y)/(6x + 2),   0 < x < 1, 0 < y < 2
Hence
    E(Y|x) = ∫₀² y (3x + y)/(6x + 2) dy
           = [1/(6x + 2)] [3xy²/2 + y³/3]₀²
           = (6x + 8/3)/(6x + 2)
           = (9x + 4)/(9x + 3)

(b) E(X|y) = ∫_{−∞}^{∞} x f(x|y) dx, where
    f(x|y) = (6x² + 2xy)/(2 + y),   0 < x < 1, 0 < y < 2
Hence
    E(X|y) = ∫₀¹ x (6x² + 2xy)/(2 + y) dx
           = ∫₀¹ (6x³ + 2x²y)/(2 + y) dx
           = [1/(2 + y)] [6x⁴/4 + 2x³y/3]₀¹
           = [1/(2 + y)] (6/4 + 2y/3)
           = (18 + 8y)/[12(2 + y)]
4.5.2 Properties of Conditional Expectation

Property 1

Theorem 4.12
Suppose X and Y are random variables. If Y has a finite expectation and Y ≥ 0, then
    E(Y|X) ≥ 0

Property 2

Theorem 4.13
Suppose X and Y1, Y2, ..., Yn are random variables having finite expectations and ai are constants, 1 ≤ i ≤ n, then
    E(Σ_{i=1}^{n} ai Yi | X) = Σ_{i=1}^{n} ai E(Yi | X)
In particular, if ai = a then
    E(Σ_{i=1}^{n} a Yi | X) = a Σ_{i=1}^{n} E(Yi | X)
Property 3
Theorem 4.14
Suppose Y1 and Y2 are random variables with finite expectations. If
Y1 ≤ Y2 then
E(Y1 |X) ≤ E(Y2 |X)
Property 4

Theorem 4.15
Suppose X and Y are two independent random variables. If Y has a finite expectation, then
    E(Y|X) = E(Y)

Property 5

Theorem 4.16
Suppose X and Y are two random variables. If Y has a moment of order r ≥ 1, then
    |E(Y|X)|^r ≤ [E(|Y| |X)]^r ≤ E(|Y|^r |X)

As pointed out earlier, E(Y|X) is a random variable and hence we may find its expectation.

Property 6

Theorem 4.17
Suppose that X and Y are independent random variables. Then the expectation of the conditional expectation is given by
    E[E(Y|X)] = E(Y)
Proof
We will prove this for the continuous case. The discrete case is proved similarly on replacing integrals by summations.
By definition,
    E(Y|x) = ∫_{−∞}^{+∞} y h(y|x) dy
           = ∫_{−∞}^{∞} y [f(x, y)/g(x)] dy
where f(x, y) is the joint probability density function of (X, Y) and g(x) is the marginal probability density function of X.
Multiplying both sides of the equation by g(x):
    g(x) E(Y|x) = ∫_{−∞}^{∞} y [f(x, y)/g(x)] dy · g(x)
Taking the integral of both sides with respect to x gives an expectation of E(Y|x), namely E[E(Y|X)]. That is,
    E[E(Y|X)] = ∫_{−∞}^{∞} E(Y|x) g(x) dx
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y [f(x, y)/g(x)] dy g(x) dx
If all the expectations exist, it is permissible to write the above iterated integral with the order of integration reversed. Thus
    E[E(Y|X)] = ∫_{−∞}^{∞} y ∫_{−∞}^{∞} f(x, y) dx dy
              = ∫_{−∞}^{∞} y h(y) dy
              = E(Y)
Theorem 4.17 gives what might be called the law of total expectation: the expectation of a random variable Y can be calculated by weighting the conditional expectations appropriately and summing or integrating.
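The law of total expectation is easy to verify on a small joint table; the Python sketch below (an illustration only, using a hypothetical joint pmf in which X and Y are dependent, not a table from the text) computes E(Y|X = x) for each x and then averages these conditional expectations with the marginal probabilities of X.

    # Hypothetical dependent joint pmf, chosen only for this illustration
    pmf = {(0, 1): 0.2, (0, 3): 0.1, (1, 1): 0.1, (1, 3): 0.6}

    def p_x(x):
        return sum(p for (u, y), p in pmf.items() if u == x)

    def e_y_given_x(x):
        return sum(y * p for (u, y), p in pmf.items() if u == x) / p_x(x)

    xs = sorted({x for x, y in pmf})
    tower = sum(e_y_given_x(x) * p_x(x) for x in xs)   # E[E(Y|X)]
    ey = sum(y * p for (x, y), p in pmf.items())       # E(Y)
    print(round(tower, 4), round(ey, 4))               # both 2.4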
Property 7

Theorem 4.18
Let Ŷ = E(Y|x) be the conditional expectation of Y given X. Then
    E[(Y − Ŷ)²] ≤ E[(Y − π)²]
where π = π(x) is any other function of X
Proof
    E[(Y − π)²] = E[{(Y − Ŷ) + (Ŷ − π)}²]
                = E[(Y − Ŷ)²] + 2 E[(Y − Ŷ)(Ŷ − π)] + E[(Ŷ − π)²]
                = E[(Y − Ŷ)²] + E[(Ŷ − π)²]        (i)
because we can show that the second term is zero. Thus,
    E[(Y − Ŷ)(Ŷ − π)] = ∫_x ∫_y (y − ŷ)(ŷ − π) f(x, y) dy dx
                      = ∫_x [∫_y (y − ŷ) f(y|x) dy] (ŷ − π) g(x) dx
                      = 0
since the inner integral is ∫_y (y − ŷ) f(y|x) dy = E(Y|x) − ŷ = 0.
Theorem 4.20
Suppose X and Y are random variables. Then
    E[Var(Y|X)] = Var(Y) − Var[E(Y|X)]

Proof
Just as E(Y|X), Var(Y|X) is also a random variable and has expectation E[Var(Y|X)]. Now,
    E[Var(Y|X)] = E{E(Y²|X) − [E(Y|X)]²}
                = E[E(Y²|X)] − E{[E(Y|X)]²}
                = E[E(Y²|X)] − E{[E(Y|X)]²} + {E[E(Y|X)]}² − {E[E(Y|X)]}²
                = E(Y²) − [E(Y)]² − Var[E(Y|X)]        (from Theorem 4.17)
                = Var(Y) − Var[E(Y|X)]

Recall that
    Var(Y|X) = E(Y²|X) − [E(Y|X)]²
provided that E(Y|X) exists.
Thus for the discrete case, if pX(x) > 0 and if Y has a second moment, then the conditional variance of Y given X = x is given by
    Var(Y|X = x) = Σ_y y² p_{Y|X}(y|x) − [Σ_y y p_{Y|X}(y|x)]²
Property 1
Theorem 4.21
Suppose that X and Y are independent random variables. Then the
expectation of conditional variance
Property 2
Theorem 4.22
Suppose that X and Y are independent random variables. Then the
expectation of conditional variance is given by
Proof
This follows from the proof of Theorem 4.20.
Theorem 4.23
Suppose that X and Y are independent random variables. Then
    Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
Proof
This follows from the proof of Theorem 4.20.
Aliter
Var(Y ) = E[Y − E(Y )]2
= E{[Y − E(Y |X)] + [E(Y |X) − E(Y )]}2
= E[Y − E(Y |X)]2 + E[E(Y |X) − E(Y )]2
since the cross-product term vanishes. Conditioning on X the two terms of
the expression on the right side, we obtain for the first term:
    E{E{[Y − E(Y|X)]² | X}} = E[Var(Y|X)]        (by Definition 4.9a)
and the second term:
    E{[E(Y|X) − E(Y)]²} = Var[E(Y|X)]
Hence
Var(Y ) = E[Var(Y |X)] + Var[E(Y |X)]
The linear regression function was encountered in Theorem 4.19 as the expected value of Y given X. In this section, we shall discuss in detail its special characteristics.
(X, Y) is said to have linear regression of Y on X if
    E(Y|x) = α + βx
where α and β are constants. That is, the regression curve of Y on X is a straight line, called the regression line of Y on X. The constants α and β are the parameters of the linear regression equation. The constant α is called the intercept, which is the point where the regression line cuts the y-axis. The constant β is called the slope of the regression equation or the regression coefficient of Y on X, which measures the change in Y per unit change in X.
The constants α and β are unknown parameters and have to be estimated. We discuss here two estimation methods, namely, the method of moments and the method of least squares.
The method of moments for estimating the linear regression function attempts to find expressions for α and β that are in terms of the first- and second-order moments of the joint distribution, namely E(X), E(Y), Var(X) and Cov(X, Y).
Theorem 4.24
If (X, Y ) has linear regression of Y on X, then
    α = E(Y) − β E(X)
    β = Cov(X, Y)/Var(X)
Proof
Proof for Estimating α:
From Definition 4.10, the regression function is given by
so that
E(XY ) − E(X)E(Y )
β =
E(X 2 ) − E[(X)]2
Measures of Relationship of Bivariate Distributions 155
Cov(X, Y )
= (from Theorem 4.1)
Var(X)
or
E[X − E(X)][(Y − E(Y )]
β=
E{[(X − E[(X)]2 }
or Theorem 4.25
E[X − E(X)][(Y − E(Y )]
=
If (X, Y ) has linearβ regression of Y on X, then
E{[(X − E[(X)]2 }
α = µY − β µX
Theorem 4.25
If (X, Y) has linear regression of Y on X, then

α = µY − β µX

β = σXY / σX²

where µY = E(Y), µX = E(X), σX² = Var(X) and σXY = Cov(X, Y).

4.7.3 Least Squares Method of Estimating Linear Regression Function

Let the values (x1, y1), (x2, y2), · · ·, (xn, yn) be plotted as points in the x, y plane. Then the problem of estimating a linear regression function can be treated as a problem of fitting a straight line to this set of points. The best known method is the least squares approach.
The least squares method states that the sum of the squares of the differences between the observed values of Y and the corresponding fitted values of Y, which we denote by Ŷ, must be a minimum. The values of the parameters obtained by this minimisation determine what is known as the best fitting curve in the sense of least squares.

Proof
Let us find the best function of the form h(x) = α + β x. This merely requires optimising over the two parameters α and β. Now, we can write (recollect that Var(X) = E(X²) − [E(X)]², by Theorem 3.22, and also that Var(α) = 0, a property of variance):

Var(Y − α − β X) = E[(Y − α − β X)²] − [E(Y − α − β X)]²

so that

E[(Y − α − β X)²] = Var(Y − α − β X) + [E(Y − α − β X)]²
                  = (σY² + β² σX² − 2β σXY) + [E(Y − α − β X)]²

The first term of the expression on the right hand side does not depend on α, so α can be chosen so as to minimize the second term. Recall that to minimize a function we set the derivative of the function to zero. Thus

d/dα [E(Y − α − β X)]² = −2 E(Y − α − β X) = 0

giving

µY − α − β µX = 0
α = µY − β µX

Now to minimize the first term we set the derivative with respect to β equal to zero:

d/dβ (σY² + β² σX² − 2β σXY) = 2β σX² − 2σXY = 0

giving

β = σXY / σX²
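The following short Python sketch (illustrative data only, not taken from the text) fits the least squares line to a few points and confirms that the minimising slope and intercept agree with β = σXY/σX² and α = µY − β µX computed from the sample moments.

```python
from statistics import mean

# Illustrative (hypothetical) data points
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 3.7, 5.2, 5.9]

mx, my = mean(xs), mean(ys)
sxx = sum((x - mx) ** 2 for x in xs)                     # proportional to Var(X)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # proportional to Cov(X, Y)

beta = sxy / sxx          # slope: Cov(X, Y) / Var(X)
alpha = my - beta * mx    # intercept: mean(Y) - beta * mean(X)

# Least squares criterion: sum of squared residuals for the fitted line
sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
print(alpha, beta, sse)
```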
Corollary 4.3
If β is the coefficient of the linear regression function of Y on X, then

β = ρ (σY / σX)

Proof
It is sufficient to show that

σXY / σX² = ρ (σY / σX)

Thus

β = σXY / σX²
  = [σXY / (σX σX)] · (σY / σY)
  = [σXY / (σX σY)] · (σY / σX)
  = ρ (σY / σX)
Similarly, (X, Y) is said to have linear regression of X on Y if

E(X|y) = τ + θ y          (i)

with τ and θ as constants. That is, the regression curve of X on Y is a straight line, which is called the regression line of X on Y. The number θ is called the regression coefficient of X on Y and is defined as

θ = σXY / σY² = Cov(X, Y) / Var(Y)          (ii)
Theorem 4.26
Let us define Var(Ŷ) = Var(Y − β X) as the mean squared prediction error. Then

Var(Ŷ) = σY² (1 − ρ²)

Proof
Var(Y − β X) = Var(Y) + Var(β X) − 2 Cov(Y, β X)
             = σY² + β² σX² − 2β σXY
             = σY² + (σXY² / σX⁴) σX² − 2 (σXY / σX²) σXY
             = σY² − σXY² / σX²
             = σY² − [σXY / (σX σY)]² σY²
             = σY² − ρ² σY²
             = σY² (1 − ρ²)
Example 4.7
Refer to Example 1.3. (a) Find the linear regression equation of Y on X. (b) Calculate the mean squared prediction error.

Solution
(a) From Example 3.19,

σX² = Var(X) = 0.04506
σY² = Var(Y) = 0.32099
σXY = Cov(X, Y) = −0.00617

Hence

β = σXY / σX² = −0.00617 / 0.04506 = −0.13693
From Example 3.8,

µX = E(X) = 13/18 = 0.72222
µY = E(Y) = 10/9 = 1.11111

Hence

α = µY − β µX = 1.11111 − (−0.13693)(0.72222) = 1.21000

Finally,

E(Y|x) = 1.21000 − 0.13693 x

(b) Calculation of the mean squared prediction error:

Var(Ŷ) = σY² (1 − ρ²) = 0.32099[1 − (−0.05127)²] = 0.32015
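For readers who wish to reproduce the arithmetic, the following short Python sketch recomputes β, α, ρ and the mean squared prediction error from the moment values quoted above from Examples 3.8 and 3.19.

```python
# Moments quoted above (Examples 3.8 and 3.19)
var_x, var_y, cov_xy = 0.04506, 0.32099, -0.00617
mu_x, mu_y = 13 / 18, 10 / 9

beta = cov_xy / var_x                          # regression coefficient of Y on X
alpha = mu_y - beta * mu_x                     # intercept
rho = cov_xy / (var_x ** 0.5 * var_y ** 0.5)   # correlation coefficient
mspe = var_y * (1 - rho ** 2)                  # Var(Y-hat), mean squared prediction error

print(round(beta, 5), round(alpha, 5), round(rho, 5), round(mspe, 5))
```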
Theorem 4.27
Let (X, Y) be a two-dimensional random variable with linear regression of Y on X. Then

E(Y|x) = µY + ρ (σY / σX)(x − µX)

Proof
Substituting the expressions for α and β from Theorem 4.25 into Definition 4.10, we get

E(Y|x) = (µY − β µX) + (σXY / σX²) x
       = µY − (σXY / σX²) µX + (σXY / σX²) x
       = µY + (σXY / σX²)(x − µX)
       = µY + ρ (σY / σX)(x − µX)          (from Corollary 4.3)

Similarly, it can be proved that if the regression of X on Y is linear, then

E(X|y) = µX + ρ (σX / σY)(y − µY)

It follows that if a regression is linear and ρ = 0, then E(Y|x) does not depend on x and E(X|y) does not depend on y.
Example 4.8
Refer to Example 1.3. Write the regression equation of Y on X.

Solution
From Example 3.8,

E(X) = 13/18 = 0.72222
E(Y) = 10/9 = 1.11111     (to 5 decimal places)

From Example 3.19,

σX = √0.04506 = 0.21227
σY = √0.32099 = 0.56656
From Example 4.5, ρ = −0.05127. Hence

E(Y|x) = 1.11111 − 0.05127 (0.56656 / 0.21227)(x − 0.72222)
       = 1.11111 − 0.13684 (x − 0.72222)
Theorem 4.28
The product of the regression coefficients β and θ in the linear regressions E(Y|x) and E(X|y), respectively, is equal to the square of the correlation coefficient of X and Y:

ρ² = β θ

Proof
In Corollary 4.3 and (ii) above,

β = ρ (σY / σX)     and     θ = ρ (σX / σY)

Hence

β θ = ρ (σY / σX) · ρ (σX / σY) = ρ²

from which the result follows.
Note
(a) The sign of the regression coefficient is determined by ρ, since σX > 0 and σY > 0.

(b) The two linear regression equations, E(Y|x) and E(X|y), have the same sign of slope.

(c) The regression line passes through the point [E(X), E(Y)], which is the expected value of the joint distribution.

(d) The point of intersection of the linear regression curves E(Y|x) and E(X|y) is [E(X), E(Y)].
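As a brief numerical check of Theorem 4.28 (a sketch only), the following lines use the moments of Example 4.7 and confirm that the product βθ coincides with ρ².

```python
var_x, var_y, cov_xy = 0.04506, 0.32099, -0.00617   # moments from Example 4.7

beta = cov_xy / var_x        # regression coefficient of Y on X
theta = cov_xy / var_y       # regression coefficient of X on Y
rho = cov_xy / (var_x * var_y) ** 0.5

print(beta * theta, rho ** 2)   # both are approximately 0.00263
```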
EXERCISES
4.1 For the data of Exercise 1.1, find the covariance between X and Y.
4.2 For the data of Exercise 1.1, find the correlation coefficient of X and
Y.
4.5 A die is rolled twice. Let X be the sum of the outcomes, and Y , the
first outcome minus the second. Compute Cov(X, Y )
4.6 Prove Theorem 4.5.
4.7 Prove Theorem 4.7.
4.11 Suppose the random variables X and Y have the joint p.d.f.

     f(x, y) = x + y,   0 < x < 1, 0 < y < 1
     f(x, y) = 0,       elsewhere
(a) Use the least squares method to obtain the linear regression
equation of (i) Y on X; (ii) X on Y.
(b) (i) Find the product of the regression coefficients in a(i) and
a(ii);
(ii) Take the square root of your results in b(i);
(iii) Compare the result in b(ii) with that in Exercise 4.9.
4.20 Show that the point of intersection of Y = E(Y |x) and X = E(X|y)
is (µX , µY )
Cov(X, Y ) = XY − XY
PART 2
STATISTICAL INEQUALITIES, LIMIT LAWS
AND SAMPLING DISTRIBUTIONS
Chapter 5
STATISTICAL INEQUALITIES AND LIMIT LAWS

5.1 INTRODUCTION

5.1.1 Statistical Inequalities
probability distribution of X. This is because to describe a probability distribution completely, we need to know the probability function of the random variable. However, Markov's and Chebyshev's inequalities enable us to derive lower (or upper) bounds on such probabilities when only the mean, or both the mean and the variance, of the distribution are known.

Theorem 5.1(a)
Let X be a non-negative random variable with finite expectation E(X). Then for any ε > 0,

P(X ≥ ε) ≤ E(X) / ε
Proof
This theorem is valid for both the continuous and the discrete case. We shall first prove it for the discrete case. Suppose X has the probability distribution

X:          x1      x2      · · ·   xk      · · ·   xn
P(X = x):   p(x1)   p(x2)   · · ·   p(xk)   · · ·   p(xn)

Suppose also that the values of the random variable are arranged in ascending order

0 ≤ x1 < x2 < · · · < xn
Note that E(X) ≥ 0, since X ≥ 0. We shall consider three cases.

Case 1
If X takes only zero values, then E(X) = 0 and for any constant ε > 0,

P(X ≥ ε) = 0 ≤ E(X) / ε

That is, the theorem is valid.

For the remaining cases suppose E(X) > 0.

Case 2
If ε > xn, then {X < ε} is a 'certain event', so that

P(X ≥ ε) = 1 − P(X < ε) = 0 ≤ E(X) / ε

that is, the theorem is again valid in this case.

Case 3
Let ε ≤ xn and let xk, xk+1, · · ·, xn be all the values of X greater than or equal to ε (if in a special case ε ≤ x1, then k = 1). By definition,

E(X) = x1 p1 + x2 p2 + · · · + xk pk + · · · + xn pn
     ≥ xk pk + · · · + xn pn
     ≥ ε (pk + · · · + pn)          (i)

But

pk + pk+1 + · · · + pn = P(X ≥ ε)

Then from (i),

E(X) ≥ ε P(X ≥ ε)

or

P(X ≥ ε) ≤ E(X) / ε          (ii)

Since

P(X < ε) = 1 − P(X ≥ ε)

then from (ii) it follows that

P(X < ε) ≥ 1 − E(X) / ε          (iii)

Therefore

P(X ≥ ε) ≤ E(X) / ε,
and hence

P(X < ε) ≥ 1 − E(X) / ε
Aliter
Let

Y = 0   if X < ε
Y = ε   if X ≥ ε

Then

P(Y = 0) = P(X < ε)     and     P(Y = ε) = P(X ≥ ε)

Hence

E(Y) = 0 · P(Y = 0) + ε · P(Y = ε) = ε P(X ≥ ε)

Clearly, X ≥ Y. Hence

E(X) ≥ E(Y) = ε P(X ≥ ε)

Therefore

P(X ≥ ε) ≤ E(X) / ε
Note
(1) The Markov inequality may be equivalently stated as in Theorem 5.1(b):

P(X < ε) ≥ 1 − E(X) / ε

Proof
Since {X ≥ ε} and {X < ε} are complementary events, Theorem 5.1(b) follows.

(2) The Markov inequality had appeared earlier in the work of Pafnuty Chebyshev, and for this reason it is sometimes referred to in other books as the first Chebyshev inequality. Such books refer to the Chebyshev inequality, discussed in the sequel, as the second Chebyshev inequality.
Example 5.1
A textile factory produces on the average 150 bales of suiting material a month. Suppose the number of bales of suiting material produced each month is a random variable. Find the bounds for the probability that a particular month's production will be (a) at least 200 bales; (b) less than 200 bales.
Solution
Let X be the number of bales of the suiting material produced in a month. Then E(X) = 150.

(a) We are required to find P(X ≥ 200). By Markov's inequality,

P(X ≥ 200) ≤ 150/200 = 3/4

(b) We are required to find P(X < 200). Using the second formulation we have

P(X < 200) ≥ 1 − 150/200 = 1/4
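The two bounds of this example take only a few lines of Python (a sketch; Markov's inequality uses only the mean, so no distributional assumption is needed).

```python
mean_bales = 150        # E(X), the only information Markov's inequality requires
c = 200                 # threshold from the example

upper = mean_bales / c          # P(X >= 200) <= E(X)/200
lower = 1 - mean_bales / c      # P(X < 200)  >= 1 - E(X)/200
print(upper, lower)             # 0.75 and 0.25
```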
Theorem 5.2(a)
Let X be a random variable with finite expectation µ = E(X) and finite variance Var(X). Then for any ε > 0,

P(|X − µ| ≥ ε) ≤ Var(X) / ε²

Proof
The inequality |X − µ| ≥ ε holds if and only if

(X − µ)² ≥ ε²

Applying Markov's inequality (Theorem 5.1) to the non-negative random variable (X − µ)² therefore gives

P(|X − µ| ≥ ε) = P[(X − µ)² ≥ ε²] ≤ E[(X − µ)²] / ε² = Var(X) / ε²
Note
(1) The Chebyshev inequality may be equivalently stated as in Theorem 5.2(b):

P(|X − µ| < ε) ≥ 1 − Var(X) / ε²

Proof
Since {|X − µ| ≥ ε} and {|X − µ| < ε} are complementary events, Theorem 5.2(b) follows.

(2) As indicated earlier, Theorem 5.2 is referred to in some books as the second Chebyshev inequality.
Example 5.2
An electric station services an area with 12,000 bulbs. The probability of switching on each of these bulbs every evening is 0.9. What are the bounds for the probability that the number of bulbs switched on in the area in one particular evening is different from its expected value in absolute terms by (a) less than 100? (b) at least 120?

Solution
Let X be the number of bulbs switched on in the area in the evening. Then

µ = E(X) = np = 12,000(0.9) = 10,800     and     Var(X) = npq = 12,000(0.9)(0.1) = 1,080

(a) We are required to calculate P(|X − µ| < 100). By the second Chebyshev inequality,

P(|X − 10800| < 100) ≥ 1 − 1080/(100)² = 1 − 0.108 = 0.892

(b) We are required to calculate P(|X − µ| ≥ 120). We shall have to use the first Chebyshev inequality:

P(|X − 10800| ≥ 120) ≤ 1080/(120)² = 0.075
Example 5.3
Suppose that a random variable has mean 5 and standard deviation 1.5. Use Chebyshev's inequality to estimate the probability that an outcome lies between 3 and 7.

Solution
µ = 5, σ = 1.5

Since we wish to estimate the probability of an outcome lying between 3 and 7, we have

P(3 − 5 < X − µ < 7 − 5) = P(−2 < X − µ < 2) = P(|X − µ| < 2)

That is, ε = 2. By Chebyshev's inequality,

P(|X − µ| < 2) ≥ 1 − Var(X)/ε² = 1 − (1.5)²/2² = 1 − 0.5625 = 0.4375

The desired probability is at least 0.4375. That is, if the experiment is repeated a large number of times, we expect at least 43.75% of the outcomes to lie between 3 and 7.

The following two theorems are other forms in which the Chebyshev inequality is expressed.
Theorem 5.3(a)
Let X be a random variable with finite expectation E(X) = µ and finite variance Var(X). Then for any positive number ε,

P(|X − µ| ≥ cσ) ≤ 1/c²

where ε = cσ and σ = √Var(X).

In words, the above theorem states that the probability that X assumes a value outside the interval from µ − cσ to µ + cσ is never more than 1/c².
Theorem 5.3(b)
Let X be a random variable with finite expectation E(X) = µ and finite variance Var(X). Then for any real number ε > 0,

P(|X − µ| < cσ) ≥ 1 − 1/c²

where ε = cσ and σ = √Var(X).

This states that the probability of the event that X takes on a value x which is within c standard deviations of its expectation is at least 1 − 1/c², no matter what c happens to be. That is, the probability that X assumes a value within the interval from µ − cσ to µ + cσ is never less than 1 − 1/c².

Note
Theorems 5.3(a) and 5.3(b) are used only when c > 1. Now, if c < 1, then 1 − 1/c² < 0 or 1/c² > 1, but we know that the probability of any event ranges from zero to one. Thus, the Chebyshev inequality of Theorems 5.3(a) and 5.3(b) is trivially true when c < 1.
Example 5.4
Suppose a random variable X has an expectation µ = 4.6 and a variance σ² = 2.25. Find the bounds for the following probabilities:

(a) P(|X − µ| < 2σ)     and     (b) P(|X − µ| < 3σ)

Solution
µ = 4.6, σ = √2.25 = 1.5

(a) P[|X − 4.6| < 2(1.5)] ≥ 1 − 1/2² = 3/4 = 0.75

Thus,

P(|X − 4.6| < 3) ≥ 0.75

(b) P[|X − 4.6| < 3(1.5)] ≥ 1 − 1/3² = 8/9 = 0.8889

Thus,

P(|X − 4.6| < 4.5) ≥ 0.8889

From Example 5.4, we can say that the probability that the random variable X will take on a value within two standard deviations from the mean is at least 3/4, and the probability that X will take on a value within three standard deviations from the mean is at least 8/9. It is in this sense that the standard deviation σ controls the spread or dispersion of the distribution of a random variable.
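The bounds of Example 5.4 can be reproduced with a short sketch of Theorem 5.3(b); the bound is distribution-free.

```python
mu, sigma = 4.6, 2.25 ** 0.5     # mean and standard deviation from Example 5.4

def chebyshev_lower_bound(c):
    """Lower bound for P(|X - mu| < c*sigma), valid for any distribution with c > 1."""
    return 1 - 1 / c ** 2

for c in (2, 3):
    print(c, mu - c * sigma, mu + c * sigma, chebyshev_lower_bound(c))
# c = 2: P(1.6 < X < 7.6) >= 0.75;  c = 3: P(0.1 < X < 9.1) >= 0.8889
```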
Theorem 5.4
Let X̄n be the sample mean based on a random sample of size n on a random variable X with expectation µ and finite variance σ². Then for any real number ε > 0,

P(|X̄n − µ| ≥ ε) ≤ σ² / (n ε²)

or equivalently,

P(|X̄n − µ| < ε) ≥ 1 − σ² / (n ε²)

where n is the sample size.
Example 5.5
Let X̄ denote the mean of random variables Xi with expectation µ = 100 and variance σ² = 2.5. Find the bound for the probability that in 120 trials the mean differs from the expected value in absolute terms by less than 0.8.

Solution
ε = 0.8, n = 120, σ² = 2.5

P(|X̄n − µ| < 0.8) ≥ 1 − 2.5 / [120(0.8)²] = 0.967
Example 5.6
Referring to the X̄ of Example 5.5, determine the size of n such that

P(|X̄n − 100| < 0.8) ≥ 0.99

Solution
Comparing this with Theorem 5.4 we have:

1 − σ² / (n ε²) = 0.99
⇒ σ² / (n ε²) = 0.01

Rearranging we obtain

n ≥ σ² / (0.01 ε²) = 2.5 / [0.01(0.8)²] = 390.6

That is, we need at least 391 trials in order that the probability will be at least 0.99 that the sample mean X̄n will lie within 0.8 of the expectation.
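A sketch of the sample-size calculation in Example 5.6: n is the smallest integer for which 1 − σ²/(nε²) is at least the required probability.

```python
import math

sigma2 = 2.5      # population variance from Example 5.5
eps = 0.8         # allowed deviation of the sample mean from mu
p = 0.99          # required probability level, so q = 1 - p

n = sigma2 / ((1 - p) * eps ** 2)   # Chebyshev sample-size formula
print(n, math.ceil(n))              # 390.625 trials, so at least 391
```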
Going through the solution of Example 5.6, we may write the formula for determining the sample size n by Chebyshev's inequality as

n = σ² / (q ε²)

where q = 1 − p.

Another form of Chebyshev's inequality which is of great importance is the Bernoulli form, which is an application of the De Moivre-Laplace Integral Theorem.

Theorem 5.5
Let Mn be the number of successes in n independent Bernoulli trials, with probability of success p in each trial, and let q = 1 − p. Then for any ε > 0,

P(|Mn − np| ≥ ε) ≤ npq / ε²

or equivalently,

P(|Mn − np| < ε) ≥ 1 − npq / ε²

Proof
The theorem follows from Chebyshev's inequality (Theorem 5.2) by remembering that E(Mn) = np and Var(Mn) = npq.

The reader should attempt Exercise 5.26 (a). It is a typical example of Theorem 5.5.

Theorem 5.5 is true for any value we give to ε > 0. In particular, if we replace ε by nδ, where δ is thought of as "small", then we have the following theorem.

Theorem 5.6
Let Mn be the number of successes in n independent Bernoulli trials, with probability of success p in each trial, and let q = 1 − p. Then for any δ > 0,

P(|Mn/n − p| ≥ δ) ≤ pq / (n δ²)

or equivalently,

P(|Mn/n − p| < δ) ≥ 1 − pq / (n δ²)
For any value of δ, the right side of the second part of Theorem 5.6
converges to 1 as n → ∞. What it means is that for any value of δ, no
matter how small, the probability that the proportion of successes in n
trials differs from the theoretical probability p by less than δ tends to 1 as
the number of trials increases without bound. That is, it is guaranteed that
the observed relative frequency of successes will converge to the theoretical
relative frequency (as measured by p) as the number of trials tends to infinity
(see Section 5.4).
Example 5.7
The probability of the occurrence of an event A in each trial of an experiment is 2/3. Using Chebyshev's inequality, find a lower bound for the probability that in 10,000 trials the deviation of the relative frequency of the event A from the true probability of A will be less than 0.01.

Solution
The relative frequency in n independent trials is a random variable. Its expectation equals p = 2/3.

We are required to calculate

P(|Mn/10,000 − 2/3| < 0.01)

Then,

P(|Mn/10,000 − 2/3| < 0.01) ≥ 1 − (2/3)(1/3) / [(10,000)(0.01)²]
                            = 1 − 2/9 = 7/9 ≈ 0.778

Thus, with probability not less than 0.778 we may expect that in 10,000 trials the relative frequency of event A will deviate from its probability by less than 0.01.
Example 5.8
How many times should a fair die be tossed in order to be at least 95% sure that the relative frequency of having a four come up is within 0.01 of the theoretical probability 1/6?

Solution
p = 1/6,   1 − p = 5/6   and   ε = 0.01

Let Mn/n be the relative frequency. Then we require

P(|Mn/n − p| < 0.01) ≥ 0.95

By the second Bernoulli Theorem,

P(|Sn/n − p| < ε) ≥ 1 − pq / (n ε²)

Thus

1 − pq / (n ε²) = 0.95
⇒ pq / (n ε²) = 0.05
⇒ n = pq / (0.05 ε²) = (1/6)(5/6) / [0.05(0.01)²] = 27777.778 ≈ 27,778
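The calculations of Examples 5.7 and 5.8 in a short sketch, using the bound 1 − pq/(nδ²) for the relative frequency.

```python
import math

# Example 5.7: lower bound for P(|M_n/n - p| < 0.01) with p = 2/3, n = 10000
p, n, delta = 2 / 3, 10_000, 0.01
bound = 1 - p * (1 - p) / (n * delta ** 2)
print(bound)                               # about 0.778

# Example 5.8: smallest n with 1 - pq/(n*delta^2) >= 0.95 for a fair die (p = 1/6)
p, delta, level = 1 / 6, 0.01, 0.95
n_required = p * (1 - p) / ((1 - level) * delta ** 2)
print(n_required, math.ceil(n_required))   # 27777.8, so about 27,778 tosses
```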
The Chebyshev inequality provides only crude and general results. This limitation arises because of its complete universality; it is valid under any distribution provided both the expectation and the variance are finite. But this condition will always be satisfied when the number of values of the random variable X is finite. In the process of achieving such a general result, however, this inequality is not particularly tight in terms of the bound achieved on particular distributions. For most distributions that arise in practice, there are far sharper bounds for P(|X − µ| < ε) than that given by Chebyshev's inequality. For instance, if the random variable X is normally distributed, then the following formula derived in Chapter 11 of Volume I is more exact:

P(|X − µ| < ε) = 2Φ(ε/σ) − 1
Example 5.9
Referring to Example 5.4, use the Normal distribution to find the probabilities and compare the results.

Solution
µ = 4.6, σ = √2.25 = 1.5

(a) ε = 2σ = 2(1.5) = 3

P(|X − 4.6| < 3) = 2Φ(3/1.5) − 1 = 2Φ(2) − 1 = 0.9545

(b) ε = 3σ = 3(1.5) = 4.5

P(|X − 4.6| < 4.5) = 2Φ(4.5/1.5) − 1 = 2Φ(3) − 1 = 0.9973

In both cases the probabilities are greater than those obtained by Chebyshev's inequality in Example 5.4.
Note that the lower bound in Theorem 5.2(b) is informative only when

1 − Var(X)/ε² > 0,   or   Var(X)/ε² < 1,   or   Var(X) < ε²

If, on the other hand, Var(X) > ε², then the right side of the inequality of Theorem 5.2 will become negative and the Chebyshev inequality will give

P(|X − µ| < ε) ≥ −a

where a > 0.

Thus, Chebyshev's inequality is trivial for the case when Var(X) > ε². This of course reduces the role of Chebyshev's inequality in its application to practical problems; however, its theoretical importance is great. It is the starting point for several theoretical developments. It provides us with a convenient interpretation of the concept of variance (or standard deviation). It can also be used to provide a simple proof for the law of large numbers in the next section.
Note
The inequality that helps us derive bounds for sums of independent random variables is Kolmogorov's inequality. Suppose that X1, X2, · · ·, Xn are independent random variables with mean zero and finite variances, and let Sk = X1 + X2 + · · · + Xk. Then for ε > 0,

P( max(1 ≤ k ≤ n) |Sk| ≥ ε ) ≤ Var(Sn) / ε²

For the proof see page 248 of Billingsley, P. (1979), listed in the bibliography.
A sequence of random variables {Xn} is said to converge in probability to θ if, for every ε > 0,

lim (n → ∞) P(|Xn − θ| > ε) = 0

in which case we write

Xn →p θ

Synonyms for convergence in probability are stochastic convergence, convergence in measure, or weak convergence.

There are two forms of the law of large numbers, namely, the weak and the strong laws of large numbers. For our purposes, which are statistical, the weak law of large numbers is the central concept, and when the "law of large numbers" is referred to without qualification, this one is implied. We shall discuss it in much detail later, but now we state, without proof, the strong law of large numbers.(10)

5.4.2 Strong Law of Large Numbers

Even though the strong law of large numbers may not be realistic, what might not be realised in the real-world situation may sometimes be achieved in a purely theoretical sense. Such a possibility was unravelled by Borel in 1909, who established the strong law of large numbers in the case of independent Bernoulli trials.

Let Mn be a random variable of the number of successes in n Bernoulli trials, so that Mn/n is the proportion of successes. Then

P( lim (n → ∞) Mn/n = p ) = 1

where p is the probability of success.

(10) The other type of convergence that plays an important role in probability theory is convergence in distribution, also called complete convergence or weak convergence. Suppose {Xn} is a sequence of random variables (n = 1, 2, · · ·) and let Fn(t) be their c.d.f.'s. If, for every t at which F0(t) is continuous, lim (n → ∞) Fn(t) = F0(t), then Xn is said to converge in distribution to X0, denoted by Fn(t) →d F0(t).
The proof of this theorem goes beyond this book. To see it refer to page
250 of Billingsley, P. (1979), listed in the bibliography.
The strong law of large numbers makes better sense than the weak law
and it is indispensable for certain theoretical investigations. It is indeed the
foundation of a mathematical theory of probability based on the concept of
relative frequency. It is also known that the sample mean, X n , converges
with probability 1 to the mean µ, provided that the latter exists. The
strong law of large numbers is usually not given much attention in most
statistical textbooks, including this one partly because it is not realistic to
assert this almost sure convergence, for in any experiment we can neither be
100 percent sure nor 100 percent accurate, otherwise the phenomenon will
not be a random one. Secondly, in general, it is easier to prove the weak
law of large numbers than the strong law of large numbers.
The weak law of large numbers is one of the earliest and most famous of
the limit laws of probability. We shall consider two of its forms, namely,
Bernoulli law (which relates to proportions) and Khinchin law (which relates
to the means).
The first formulation of the law of large numbers, known as the Bernoulli law of large numbers, was given and proved by Jakob Bernoulli and published posthumously in his book "Ars Conjectandi" in 1713 as a crowning achievement. It states that if Mn represents the number of successes in n identical Bernoulli trials, with probability of success p in each trial, then the relative frequency Mn/n is very likely to be close to p when n is a sufficiently large and fixed integer. The law in a sense justifies the use of the frequency definition of probability discussed in Chapter 3 of Volume I and it brings the theory of probability into contact with practice.

Mathematically, the Bernoulli law of large numbers may be expressed in the following theorem: if Mn is the number of successes in n Bernoulli trials with probability of success p in each trial, then for any ε > 0,

lim (n → ∞) P(|Mn/n − p| < ε) = 1
That is, for n large, Mn/n is very close to p. In other words, the law of large numbers may be stated as follows: in n trials, the probability that the relative number of successes Mn/n deviates numerically from the true probability p by not more than ε (ε > 0) approaches 1 as n approaches infinity.
The Khinchin form of the weak law of large numbers states that

lim (n → ∞) P(|X̄n − µ| < ε) = 1

or equivalently,

lim (n → ∞) P(|Sn/n − µ| < ε) = 1

where Sn = X1 + X2 + · · · + Xn.

Proof
By Theorem 5.4,

P(|X̄n − µ| < ε) ≥ 1 − σ² / (n ε²)

so that

1 − σ² / (n ε²) ≤ P(|X̄n − µ| < ε) ≤ 1

Taking limits as n → ∞,

lim [1 − σ² / (n ε²)] ≤ lim P(|X̄n − µ| < ε) ≤ 1
Aliter
We first find E(X̄n) and Var(X̄n):

E(X̄n) = (1/n) Σ E(Xi) = µ     (see Chapter 7)

It then follows that

lim (n → ∞) P(|X̄n − µ| ≥ ε) = 0

or equivalently,

lim (n → ∞) P(|Sn/n − µ| ≥ ε) = 0

where Sn = X1 + X2 + · · · + Xn.
We shall observe in Theorem 5.17 in the sequel that Polya's Theorem is the central limit theorem of means.
Then

P(Sn* ≤ s*) → Φ(s*)     as n → ∞

where Φ is the c.d.f. of the Standard Normal distribution.

For the proof see page 316 of Dudewicz and Mishra (1988), listed in the bibliography.

The Central Limit Theorem is concerned with the distribution of the "sum of random variables". If X1, X2, . . . is a sequence of independent random variables, then we know from the law of large numbers that Sn/n converges to E(X) in probability. The Central Limit Theorem is concerned not with the fact that the ratio Sn/n converges to E(X) but with how it fluctuates around E(X). To analyse these fluctuations, we standardise the sum:

Tn = [Sn − E(Sn)] / √Var(Sn)          (i)
As usual we shall discuss two forms of CLT, namely, the Khinchin form
(continuous case) and the Bernoulli form (discrete case.)
Continuous Case

Substituting these expressions in (i) above, the result follows immediately.

Note
These theorems will also hold when X1, · · ·, Xn are independent random variables with the same mean and the same finite variance but not necessarily identically distributed.

Discrete Case
The De Moivre-Laplace Theorem discussed in Chapter 12 of Volume I is actually a special case of the Central Limit Theorem, namely, the Bernoulli form of CLT of frequency.
The Bernoulli and other forms of the central limit theorem arise out of Theorem 5.15. For instance, if the random variables Xi are independent Bernoulli random variables with sum

Mn = Σ (i = 1 to n) Xi

then

Yn = (Mn − np) / √(npq)

tends to the standard Normal distribution as n → ∞.
The computational significance of the CLT is that for large n, we can express the cumulative distribution function of Tn in terms of N(0, 1) as follows. In the case of frequencies,

P(Yn ≤ y) ≈ Φ[(y − np) / √(npq)]

where y is a particular value of Yn. In the case of continuous random variables,

P(Tn ≤ t) ≈ Φ[(t − nµ) / (σ√n)]

where t is a particular value of Tn.

The logical question that arises at this point is: how large should n be to enable us to apply the Central Limit Theorem? There is no single answer, since this depends on the closeness of approximation required and on the actual distribution forms of the Xi's. If the random variables Xi are normally distributed, then no matter how small n is, their sum is also normally distributed and P(Tn ≤ t) provides exact probabilities. If nothing is known about the distribution patterns of the Xi's, or if the distribution of the Xi's differs greatly from normality, then n must be large enough to guarantee approximate normality for Sn. One rule of thumb states that, in most practical situations, n equal to at least 30 is satisfactory. In general, the approximation of the sum of random variables to normality becomes better and better as the sample size increases.
Example 5.10
A die is thrown ninety times. At a given throw the expected number of points is 7/2 and the variance is 32/12. Find

(a) the expectation of the sum of points;
(b) the variance of the sum of points;
(c) the probability that the sum of points will be at most 300.

Solution
n = 90,  µ = 7/2,  σ² = 32/12

(a) E(Sn) = nµ = 90(7/2) = 315

(b) Var(Sn) = nσ² = 90(32/12) = 240

(c) s = 300

P(Sn ≤ s) = P[(Sn − nµ)/(σ√n) ≤ (s − nµ)/(σ√n)] = P[Tn ≤ (s − nµ)/(σ√n)]

Thus,

P(Sn ≤ 300) = P[Tn ≤ (300 − 315)/(√(32/12) √90)]
            = P[Tn ≤ −15/((1.6330)(9.4868))]
            = P(Tn ≤ −0.968)
            = 1 − Φ(0.97)
            = 1 − 0.8340 = 0.166
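A sketch of the normal-approximation step of Example 5.10, with the standard normal c.d.f. Φ computed from the error function; it reproduces the probability found above (the per-throw mean and variance are used exactly as given in the example).

```python
import math

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, mu, var = 90, 7 / 2, 32 / 12      # per-throw mean and variance as given in the example
e_sn = n * mu                        # E(S_n) = 315
sd_sn = math.sqrt(n * var)           # sqrt(Var(S_n)) = sqrt(240)
print(e_sn, n * var, phi((300 - e_sn) / sd_sn))   # P(S_n <= 300), about 0.166
```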
Zn = (X̄n − µ) / (σ/√n)

tends to the Standard Normal distribution as n → ∞.

Note
Zn is the standardised mean of the random variables X1, · · ·, Xn and it is equal to Tn in Theorem 5.16.

Proof
The proof is quite sophisticated. An outline of a proof is given in Hoel (1971), and Freund and Walpole (1971), listed among the references.
Example 5.11
Suppose that I.Q. is a random variable with mean µ = 100 and standard deviation σ = 25. What is the probability that in a class of 40 students (a) the average I.Q. exceeds 110?

Solution
n = 40,  µ = 100,  σ = 25,  σX̄ = 25/√40 = 3.9528

By the Central Limit Theorem, the mean I.Q. of the 40 students is approximately normally distributed. Hence

(a) The probability that the average I.Q. exceeds 110 is

P(X̄ ≥ 110) = P[Z ≥ (110 − 100)/3.9528]
            = P(Z ≥ 2.53)
            = 1 − Φ(2.53)
            = 1 − 0.9943
            = 0.0057
Theorem 5.18
Suppose that X1, X2, · · ·, Xn are independent and identically distributed random variables, each having mean µ and Var(X) = σ². Suppose also that the average of the measurements, X̄, is used as an estimate of µ. Then

P(|X̄ − µ| < ε) ≈ Φ(ε√n / σ) − Φ(−ε√n / σ)

Proof
Suppose that we wish to find

P(|X̄ − µ| < ε)

for some constant ε > 0. To use the Central Limit Theorem to approximate this probability, we standardise the mean, using E(X̄) = µ and Var(X̄) = σ²/n:

P(|X̄ − µ| < ε) = P(−ε < X̄ − µ < ε)
              = P[−ε/(σ/√n) < (X̄ − µ)/(σ/√n) < ε/(σ/√n)]
              ≈ Φ(ε√n / σ) − Φ(−ε√n / σ)

If we use the Standard Normal Table I in the Appendix, then we may write this formula as

P(|X̄ − µ| < ε) ≈ 2Φ(ε√n / σ) − 1

On the other hand,

P(|X̄ − µ| < ε) ≈ 2Ψ(ε√n / σ)

if we use Table II in the Appendix.
Example 5.12
Suppose that 25 measurements are taken with σ = 1.2. Find the probability that the sample mean X̄ deviates from µ by less than 0.3.

Solution
n = 25,  σ = 1.2,  ε = 0.3

P(|X̄ − µ| < 0.3) ≈ Φ(0.3√25 / 1.2) − Φ(−0.3√25 / 1.2)
               = 2Φ(0.3√25 / 1.2) − 1
               = 2Φ(1.25) − 1
               = 0.7888
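A sketch of the computation in Example 5.12, using P(|X̄ − µ| < ε) ≈ 2Φ(ε√n/σ) − 1.

```python
import math

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, sigma, eps = 25, 1.2, 0.3
z = eps * math.sqrt(n) / sigma          # 1.25
print(2 * phi(z) - 1)                   # about 0.789
```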
This sort of reasoning can be turned around. That is, given ε and θ, n can be found such that

P(|X̄n − µ| < ε) ≥ θ

With the Central Limit Theorem we can derive a formula for calculating the value of n such that the probability that the mean deviates from µ by less than ε is θ:

n = [(σ/ε) Φ⁻¹((1 + θ)/2)]²

if we use the Full Normal Table (Table I in the Appendix), or equivalently,

n = [(σ/ε) Ψ⁻¹(θ/2)]²

using the Half Normal Table (Table II in the Appendix).
Example 5.13
Refer to Example 5.12. Suppose σ = 1.2 and ε = 0.3. Find n such that the probability that the mean deviates from µ by less than 0.3 is 0.75.

Solution
σ = 1.2,  ε = 0.3,  θ = 0.75

Therefore

n = [(σ/ε) Φ⁻¹((1 + θ)/2)]² = [(1.2/0.3) Φ⁻¹(0.875)]² = [4(1.15)]² = 21.16 ≈ 22
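A sketch of the sample-size formula of this section, evaluated with the σ, ε and θ quoted in the example statement; statistics.NormalDist().inv_cdf plays the role of Φ⁻¹.

```python
from math import ceil
from statistics import NormalDist

def clt_sample_size(sigma, eps, theta):
    """n such that P(|X_bar - mu| < eps) is approximately theta (CLT formula)."""
    z = NormalDist().inv_cdf((1 + theta) / 2)
    return (sigma / eps * z) ** 2

n = clt_sample_size(sigma=1.2, eps=0.3, theta=0.75)   # values from the example statement
print(n, ceil(n))    # about 21.2, so take n = 22
```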
EXERCISES
5.1 Refer to Example 5.1. Find the bounds for the probability that the
month’s production will be
5.2 The mean lifetime of a certain electrical device is 4 years. Find the
lower bounds for the probability that a randomly selected device from
a consignment of such devices will not exceed 20 years.
5.3 The amount of savings in a certain Rural Bank is ten million cedis.
Suppose the probability that an amount of at most two hundred thou-
sand cedis drawn by a customer selected at random is 0.8. What can
we say about the number of customers.
5.4 Refer to Example 5.2. Find the bounds for the probability that the
number of bulbs switched on in the area in that evening is different
from its expected value in absolute terms by
xi 1 2 3 4 5 6
p(xi ) 0.05 0.10 0.25 0.30 0.20 0.10
5.6 Suppose that a random variable has mean 35 and standard deviation
5. Use the Chebyshev’s inequality to estimate the probability that an
outcome will lie between
(a) 24 and 46 (b) 18 and 52 (c) 31 and 39
5.8 Suppose that a random variable has mean 80 and standard deviation 6. Use Chebyshev's inequality to find the value of ε for which the probability that the outcome lies between 80 − ε and 80 + ε is at least 5/12.

5.9 Suppose that a random variable has mean 25 and standard deviation 0.67. Use Chebyshev's inequality to find the value of ε for which the probability that the outcome lies between 25 − ε and 25 + ε is at most 10/13.
5.10 Refer to Example 5.4. Find the bounds for the following probabilities:
(a) P (|X − µ| ≤ σ)
(b) P (|X − µ| ≥ 3.3σ)
(c) P (|X − µ| ≤ 2.5σ)
(d) P (|X − µ| ≥ 1.65σ)
(e) P (|X − µ| ≥ 2.2σ)
5.11 Suppose that the number of hours a certain type of light bulb will burn before requiring replacement has a mean of 2,000 hours and a standard deviation of 150 hours. If 1,000 such bulbs are installed in a new house, estimate the number that will require replacement between 1,200 and 2,800 hours from the time of installation.
5.12 Refer to Example 5.7. Find an upper bound for the probability that
in 5,000 trials the deviation of the relative frequency of the event A
from its probability will not exceed 0.08.
5.15 The final scores of the students in Diploma class over a period of four years have a mean 60 and a variance 64. In a particular year there
5.18 Suppose that the number of students that enrol in the Basic Statistics
course in the Faculty of Social Studies at the University of Ghana is
a Poisson random variable with mean 500. The Co-ordinator of the
course has decided that if the number enrolling is 350 or more he will
split the group into two, otherwise they will all be in the same class.
What is the probability that the class will be split?
5.19 The mean and standard deviation of the ages of statistics students of
the University of Ghana are 20 years and 5.8 years respectively. What
is the probability that a random sample of 50 students will have a
mean age of between 18 and 23 years?
5.20 The mean age and the standard deviation of Statistics students are
18 years and 1.8 years respectively. What is the probability that a
random sample of 50 students will have a mean of 16 and 20 years?
Let S̄n = (1/n) Σ (i = 1 to n) Xi. Use Chebyshev's inequality to estimate the minimum possible value of n, given that

For this value of n, calculate the upper bound to the given probability obtained by applying Chebyshev's inequality.
5.25 Twenty numbers are rounded off to the nearest integer and then added. Assume the individual round-off errors are independent and uniformly distributed over (−1/2, 1/2). Find the probability that the given sum will differ from the sum of the original twenty numbers by 3 or more using

(a) Chebyshev's inequality (b) Central Limit Theorem.
5.26 Suppose a coin is tossed 26 times. Estimate the probability that the
number of heads will deviate from 12 by less than 4, using
(a) Chebyshev’s inequality (b) Central Limit Theorem.
Chapter 6
SAMPLING DISTRIBUTIONS I
Basic Concepts

6.1 INTRODUCTION
In Volume II (Nsowah-Nuamah, 2018) we considered some special probability
distributions. These included the Bernoulli distribution, Binomial distribu-
ADVANCED TOPICS IN INTRODUCTORY
tion, Geometric distribution, Negative binomial distribution, Poisson dis-
PROBABILITY: A FIRST COURSE IN
tribution, Hypergeometric
PROBABILITY THEORY – VOLUMEdistribution,
III Multinomial SAMPLING
distribution, Uniform I Basic Concepts
DISTRIBUTIONS
distribution, Exponential distribution, Gamma distribution, Beta distribu-
tion and Normal distribution.
In Chapters 1 to 4 of this book, we extended the concept of probability
distributions to bivariate distributions. In Chapter 5 we discussed the basic
Inequalities in Statistics (Markov’s and Chebyshev’s Inequalities) and the
two main Limit Laws (the Central Limit Theorem and the Law of Large
Numbers) which have wider and practical applications in Statistics.
We can now conclude the book with some analytical statistics con-
cerned with inferential procedures. The present chapter and the subse-
quent ones serve as a bridge between the topics discussed in this book and
statistical inference.
6.2.1 Population
The elements in an infinite population cannot be counted completely no matter how long the counting process is carried on. An example of an infinite population is an experiment of tossing a coin to determine whether or not the coin is biased. Theoretically, this experiment could be carried out an infinite number of times.

Philosophically, no truly infinite population of physical objects exists. After all, given unlimited resources and time, we could enumerate even the grains of sand on a particular seashore. As a practical matter, then, we will use the term infinite population when we are talking about a population that could not be enumerated in a reasonable period of time.
6.2.2 Sample
It is obvious that in an infinite population complete enumeration cannot be carried out.
Population Parameter
Examples of statistic include the sample mean, the sample proportion, the
sample variance, the sample range.
As a rule, small populations, for obvious reasons, are not sampled. Instead,
the entire population is examined. A sample that contains all the members in
the population is called an exhaustive sampling, or a 100 per cent sampling,
which of course, are only other names for census.
There are basically two types of sampling: probability sampling and
non-probability sampling. In this book we shall consider only probability
sampling because it is only for probability sampling that there are sound
statistical procedures for drawing conclusions concerning the population of
interest based on the sample drawn.
Probability sampling is the scientific method by which the units of the sam-
ple are chosen based on some definite pre-assigned probability. The various
types of probability sampling include the cases where:
(a) each member of the population has an equal chance of being selected;
For a probability sample neither the sampler nor the member of the popu-
lation can decide which member will be included in the sample. The selec-
tion is achieved by the operation of chance alone. Synonym for probability
sampling is random sampling. Random sampling methods include simple
random sampling, systematic sampling, and stratified sampling. Samples
obtained by taking every k th name in a list are called systematic samples.
Sometimes a population P is divided into groups, P1 , P2 , · · · , Pr called
strata and random samples are taken from these strata and combined to
get a stratified random sample. This is often done when we want to be sure
that we shall have specified numbers of subjects from each stratum. Other
sampling techniques include cluster sampling and multistage sampling. For
details about sampling techniques see Nuamah (1994).
The simple random sampling is the only one that can be considered a
true random sampling and in this book unless otherwise stated, by random
sampling we mean simple random sampling.
The values in the i-th column may be considered the values of a random variable $X_i$ corresponding to the i-th trial of the sample, having probability density function $f_i(x_i)$.
The probability density function f (xi ) has to fulfill two conditions in order
to describe the process of simple random sampling. These are given in
Definition 6.9.
(a) the successive selections of the elements of the population are independent, and
(b) the density function of the random variable remains the same from selection to selection as that of the population.
In other words, in simple random sampling (a) each member of the population has an equal probability of being included in the sample, and (b) different members of the population are selected independently (that is, the selection of one member has no effect on the selection of another).
Simple random sampling is any technique designed to draw sample members from a population in such a way that each member in the population has an equal chance of being selected.

A table of random numbers has two properties:
(a) in any position in the table, each of the numbers 0 through 9 has an equal probability of 1/10 of occurring;
(b) the occurrence of any number in one part of the table is independent of the occurrence of any number in any other part of the table (i.e. knowing the numbers in one part of the table tells us nothing about the numbers in another part of the table).
Table X of Appendix A presents a table of random numbers.
We shall illustrate in the following example how random numbers are used
in the selection of a sample.
Example 6.1
Select 15 students from a student population of 250 using a table of random
numbers.
Solution
Suppose the first number read from the table is 46201. The first three digits of this number are 462. This number is thrown out because it is larger than the population size (N = 250); there is no student whose number is 462. Continuing in this way, the following 15 student numbers are selected:
045 211 006 194 214 212 029 276 044 165 154 080 076 043 244
Note
If, by chance, the same number occurs more than once, we ignore it after it has appeared for the first time.
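For readers who prefer to carry out the selection by computer rather than with a printed table, the following Python sketch mimics the procedure of Example 6.1 (the population size 250 and sample size 15 come from the example; the random seed is an arbitrary illustrative choice).

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Select distinct serial numbers 1..population_size by reading random
    numbers and discarding out-of-range values and repeats, as with a
    table of random numbers."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, 999)               # a three-digit random number
        if number <= population_size and number not in chosen:
            chosen.append(number)                  # keep it; otherwise throw it out
    return chosen

print(simple_random_sample(250, 15, seed=1))       # 15 distinct student numbers
```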
When a simple random sample of size n is drawn without replacement from a finite population of size N, the number of possible samples is
$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
The probability for each subset of n of the N objects of the finite population is
$$\frac{1}{\binom{N}{n}}$$
We can also determine the joint probability distribution of the random variables from a random sample of size n from a finite population by
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{N(N-1)\cdots(N-n+1)}$$
To generalise: when a simple random sample of size n is drawn from a finite population with replacement, or from an infinite population with or without replacement, we have n identically distributed random variables $X_1, X_2, \ldots, X_n$, all possessing the same distribution as the parent population. Also, if the population is finite but large, then even when sampling is without replacement the sample observations $x_i$ can in practice be treated as if they were independent and identically distributed.
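These counting results are easy to check numerically. The sketch below is illustrative only (the population {4, 5, 7, 8} anticipates Example 7.1): it enumerates the unordered samples of size n drawn without replacement and the ordered ones, confirming that each subset has probability 1/C(N, n) and each ordered sample has probability 1/[N(N-1)...(N-n+1)].

```python
from itertools import combinations, permutations
from math import comb

population = [4, 5, 7, 8]                       # small illustrative population, N = 4
n = 2

subsets = list(combinations(population, n))     # unordered samples
ordered = list(permutations(population, n))     # ordered samples

print(len(subsets), comb(len(population), n))   # 6 and C(4, 2) = 6
print(1 / comb(len(population), n))             # probability of each subset
print(len(ordered), 1 / len(ordered))           # 12 and 1/(4*3)
```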
EXERCISES
Chapter 7

SAMPLING DISTRIBUTIONS II
Sampling Distribution of Statistics

7.1 INTRODUCTION
7.1.1 Definition of Statistic

Having determined in Chapter 6 what a random sample is, we can now examine the distributions of statistics calculated from random samples. Though we have already defined what a “statistic” is, we shall now present its mathematical definition.
Definition 7.1 STATISTIC
Let $X_1, X_2, \ldots, X_n$ be a random sample from a random variable and let $x_1, x_2, \ldots, x_n$ be the values assumed by the sample. Then the real-valued function
$$Y = G(X_1, X_2, \ldots, X_n)$$
is a statistic, which assumes the value
$$y = G(x_1, x_2, \ldots, x_n)$$
Empirically, the sampling distribution of a statistic may be constructed from a finite population as follows:
(a) Select all possible samples of size n from the finite population of size N.
(b) Compute the value of the statistic for each of these samples.
(c) List the different distinct observed values of the statistic together with the corresponding frequency of occurrence of each distinct observed value of the statistic.
For infinite or large finite populations, one could approximate the sampling distribution of a statistic by drawing a large number of independent simple random samples and proceeding in the manner just described. In fact, the actual construction of a sampling distribution according to the steps given above is a tedious task. Fortunately, there are theorems that simplify things for us.
In the subsequent sections we shall introduce sampling distributions for the most frequently encountered statistics: the mean, proportion and variance. We shall now introduce an example that will be used most often in this chapter.
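For a small population the three steps above can be carried out by brute force. The sketch below is an illustration only (it uses the population of Example 7.1 and samples of size two drawn with replacement): it lists all N^n samples, computes the mean of each, and tabulates the distinct values with their relative frequencies.

```python
from itertools import product
from collections import Counter
from statistics import mean

population = [4, 5, 7, 8]                         # Example 7.1
n = 2

samples = list(product(population, repeat=n))     # step (a): all N**n samples
means = [mean(s) for s in samples]                # step (b): the statistic
table = Counter(means)                            # step (c): distinct values

for value in sorted(table):
    print(value, table[value] / len(samples))     # value and its relative frequency
print("mean of sampling distribution:", mean(means))   # equals the population mean, 6
```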
Example 7.1
Suppose a population consists of four numbers: 4, 5, 7, 8.
Find (a) the mean and (b) the variance of the population.

Solution
(a) $\mu = \dfrac{4 + 5 + 7 + 8}{4} = 6$
(b) $\sigma^2 = \dfrac{(4-6)^2 + (5-6)^2 + (7-6)^2 + (8-6)^2}{4} = \dfrac{10}{4} = 2.5$
Example 7.2
Refer to Example 7.1.
(a) List all possible samples of size two that can be drawn from the population with replacement.
(b) Find
(i) the mean of the distribution of the means;
(ii) the variance and standard deviation of the distribution of the
means.
Solution
(a) N = 4, n = 2
There are $N^n = 4^2 = 16$ possible samples of size two that can be drawn
from a population of size four (where sampling is with replacement).
These are
(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)
(7, 4) (7, 5) (7, 7) (7, 8) (8, 4) (8, 5) (8, 7) (8, 8)
Note
Here the notation (a, b) is an ordered pair. For example, (4, 5) denotes “first a 4 and then a 5” and is different from (5, 4), which denotes “first a 5 and then a 4.”
(b) (i) The corresponding sample means $\bar{X}_i$ are
4.0 4.5 5.5 6.0 4.5 5.0 6.0 6.5
5.5 6.0 7.0 7.5 6.0 6.5 7.5 8.0
The sample means and their associated probabilities (relative frequencies) are as follows:
$\bar{x}$: 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0
$P(\bar{X}=\bar{x})$: 1/16, 2/16, 1/16, 2/16, 4/16, 2/16, 1/16, 2/16, 1/16
The mean of the sampling distribution of means is therefore
$$\mu_{\bar{X}} = \sum \bar{x}\,P(\bar{X}=\bar{x}) = 6$$
Note
Without calculating the probabilities we could have obtained the mean of the sampling distribution of means from the raw data (the sample means themselves) as:
$$\mu_{\bar{X}} = \frac{\sum \bar{X}}{N^n} = \frac{4.0 + 4.5 + 5.5 + \cdots + 6.5 + 7.5 + 8.0}{4^2} = \frac{96}{16} = 6$$
(ii) The variance of the sampling distribution of means is
$$\sigma_{\bar{X}}^2 = \sum (\bar{x} - \mu_{\bar{X}})^2\,P(\bar{X}=\bar{x}) = 1.25$$
and its standard deviation is $\sigma_{\bar{X}} = \sqrt{1.25} = 1.118$.

Note
The variance of the sampling distribution of means is not equal to the population variance.
As has been pointed out earlier, it is burdensome to construct the
sampling distribution of means in the way it has been done. In practice, we
shall employ the following two theorems to obtain the mean and variance of
the sample mean.
Theorem 7.1
Let X be a random variable with expectation E(X) = µ and variance Var(X) = σ². Let $\bar{X}$ be the sample mean of a random sample of size n. Then the mean of the sampling distribution of means is given by
$$\mu_{\bar{X}} = E(\bar{X}) = \mu$$
The theorem above states that the expected value of the sample mean is the population mean.

Proof
Since the sample mean is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
we have
$$E(\bar{X}) = \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] = \frac{1}{n}(n\mu) = \mu$$

Theorem 7.2
If a population is infinite or if sampling is with replacement, then the variance of the sampling distribution of means, denoted by $\sigma_{\bar{X}}^2$, is given by
$$\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
That is, the variance of the sampling distribution is equal to the population
variance divided by the size of the sample used to obtain the sampling
distribution.
Proof
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{X_1}{n} + \frac{X_2}{n} + \cdots + \frac{X_n}{n}\right) = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$
Note
Theorems 7.1 and 7.2 do not assume Normality of the “parent” population.
Definition 7.4 STANDARD ERROR OF MEAN
The positive square root of the variance of the mean is referred to as the standard error of the sample mean

The standard error of the mean measures chance variations of the sample mean from sample to sample. Denoting the standard error by $\sigma_{\bar{X}}$, it is given by
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
This formula shows that the standard error of the mean decreases when n,
the sample size, is increased. This means that when n is sufficiently large
and we actually have more information, sample means can be expected to
be closer to µ, the quantity which X is usually supposed to estimate.
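A small simulation illustrates this behaviour of the standard error. The sketch below is illustrative only (the normal population with µ = 17 and σ² = 5 anticipates Example 7.6; the number of replications is arbitrary): it estimates the standard deviation of the sample mean for several sample sizes and compares it with σ/√n.

```python
import random
from statistics import mean, pstdev
from math import sqrt

mu, sigma = 17, sqrt(5)
random.seed(0)

for n in (5, 20, 80):
    # 5000 simulated sample means, each based on a sample of size n
    sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                    for _ in range(5000)]
    print(n, round(pstdev(sample_means), 3), round(sigma / sqrt(n), 3))
```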
Example 7.3
Refer to Example 7.1. Employ Theorems 7.1 and 7.2 to obtain the variance
of the sampling distribution of means if a sample of size two is drawn with
replacement from the population.
Solution
From Example 7.2, N = 4, σ 2 = 2.5.
Also, the sample size n = 2. Hence
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{2.5}{2} = 1.25$$
Although Theorems 7.1 and 7.2 give us some characteristics of the sampling distribution, they do not permit us to calculate probabilities, because we do not know the form of the sampling distribution. To be able to do this we need to use the Central Limit Theorem, as will be seen later.
Example 7.4
Refer to Example 7.1
(a) List all possible samples of size two that can be drawn from the pop-
ulation without replacement.
(b) Find
(i) the mean of the distribution of the means;
(ii) the variance and standard deviation of the distribution of the means.

Solution
N = 4, n = 2
(a) There are $\binom{4}{2} = 6$ samples of size two which can be drawn without replacement, namely,
(4, 5) (4, 7) (4, 8) (5, 7) (5, 8) (7, 8)
Note
The sample (4, 5), for example, is considered the same as (5, 4).
(b) (i) The corresponding sample means are 4.5, 5.5, 6.0, 6.0, 6.5 and 7.5, so the mean of the sampling distribution of means is
$$\mu_{\bar{X}} = \frac{4.5 + 5.5 + 6.0 + 6.0 + 6.5 + 7.5}{6} = 6$$
Note
Once again the mean of the sampling distribution of means is equal to the population mean.

(ii) The variance of this sampling distribution is
$$\sigma_{\bar{X}}^2 = \frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{\binom{N}{n}} = \frac{(4.5-6)^2 + (5.5-6)^2 + \cdots + (7.5-6)^2}{\binom{4}{2}} = \frac{2.25 + 0.25 + 0 + 0 + 0.25 + 2.25}{6} = \frac{5}{6} = 0.83333$$
$$\sigma_{\bar{X}} = \sqrt{\frac{5}{6}} = 0.9129$$

Note
As in the case of sampling with replacement, the variance of the sampling distribution is not equal to the population variance. Moreover, it is not equal to the population variance divided by the sample size ($\sigma_{\bar{X}}^2 = \frac{5}{6} \neq \frac{2.5}{2}$). The formula for obtaining the variance of the sampling distribution of means in the case of sampling without replacement is given in Theorem 7.3.
Theorem 7.3
If $\bar{X}$ is the mean of a random sample of size n from a finite population of size N (that is, if sampling is without replacement), whose mean is µ and whose variance is σ², then
$$E(\bar{X}) = \mu$$
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$
The formula for obtaining $\mathrm{Var}(\bar{X})$ in Theorem 7.2, which applies to values assumed by independent random variables (or sampling with replacement from a finite population), and the one in Theorem 7.3, which applies to sampling without replacement from a finite population, differ by the factor $\frac{N-n}{N-1}$. If N, the size of the population, is large compared to n, the size of the sample, the difference between the two formulas becomes negligible. Indeed, the formula in Theorem 7.2 is frequently used as an approximation for the variance of the distribution of $\bar{X}$ for samples obtained without replacement from sufficiently large finite populations.
Example 7.5
For the data of Example 7.1, if a sample of size two is drawn without replacement, calculate the variance and hence the standard deviation of the sampling distribution of means.
Solution
Since sampling is without replacement, we use the formula in Theorem 7.3, with σ² = 2.5, N = 4 and n = 2:
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{2.5}{2}\cdot\frac{4-2}{4-1} = 0.83333$$
Hence the standard error of the mean is
$$\sigma_{\bar{X}} = \sqrt{0.83333} = 0.9129$$
Note
These results are equal to the ones obtained in Example 7.4
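Theorem 7.3 can also be checked by brute force for this small population, exactly as in Examples 7.4 and 7.5. The sketch below is such a check (illustrative only).

```python
from itertools import combinations
from statistics import mean, pvariance

population = [4, 5, 7, 8]                         # Example 7.1: mu = 6, sigma^2 = 2.5
N, n = len(population), 2
sigma2 = pvariance(population)

means = [mean(s) for s in combinations(population, n)]   # the 6 samples without replacement
print(mean(means))                                # 6, the population mean
print(pvariance(means))                           # 0.8333...
print(sigma2 / n * (N - n) / (N - 1))             # the same value from Theorem 7.3
```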
Theorem 7.4
If the population from which random samples are taken is normally
distributed with mean µ and variance σ 2 then the sample mean X
σ2
is normally distributed with mean µ and variance
n
Proof
From Theorem 6.22 of Volume I it follows that
$$M_{\bar{X}}(t) = M_{\frac{1}{n}(X_1 + X_2 + \cdots + X_n)}(t) = M_{X_1 + \cdots + X_n}\!\left(\frac{t}{n}\right)$$
Since the sampling is random (that is, simple random), the variables $X_1, X_2, \ldots, X_n$ are independent, and therefore Theorem 6.28 of Volume I may be applied to give
$$M_{\bar{X}}(t) = M_{X_1}\!\left(\frac{t}{n}\right) M_{X_2}\!\left(\frac{t}{n}\right) \cdots M_{X_n}\!\left(\frac{t}{n}\right) \qquad (i)$$
From Definition 7.2 all the random variables $X_1, X_2, \ldots, X_n$ have the same probability density function, namely that of X, and hence the same moment generating function. Consequently, all the moment generating functions on the right-hand side of (i) are the same function, namely the moment generating function of the random variable X. Thus
$$M_{\bar{X}}(t) = \left[M_X\!\left(\frac{t}{n}\right)\right]^n$$
Since X is normally distributed, $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$; replacing t by t/n in this formula yields
$$M_{\bar{X}}(t) = \left[e^{\mu \frac{t}{n} + \frac{1}{2}\sigma^2 \frac{t^2}{n^2}}\right]^n = e^{\mu t + \frac{1}{2}t^2\frac{\sigma^2}{n}}$$
which is the moment generating function of a Normal distribution with mean µ and variance σ²/n.
Example 7.6
A random sample of size 20 is drawn from a normally distributed population with mean 17 and variance 5.
(a) Find (i) the expectation and (ii) the variance and standard error of the sample mean.
(b) What is the probability that the mean of this sample will fall between 16 and 23?

Solution
(a) (i) $E(\bar{X}) = \mu = 17$
(ii) $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} = \dfrac{5}{20} = 0.25$, so $\sigma_{\bar{X}} = \sqrt{0.25} = 0.5$
(b) Though n = 20 is small, we still apply the Normal distribution because the population from which the sample is drawn is normally distributed.
$$P(16 < \bar{X} < 23) = P\left(\frac{16-17}{0.5} < \frac{\bar{X}-17}{0.5} < \frac{23-17}{0.5}\right) = P(-2 < Z < 12) = \Phi(12) - \Phi(-2) = 1 - \{1 - \Phi(2)\} = \Phi(2) = 0.9772$$
We are often faced with the problem of sampling from non-normally distributed populations or from populations whose distributions are not known. Under these two conditions we shall have to take large samples, since when the sample size is sufficiently large, by virtue of the Central Limit Theorem (Theorem 5.16), the means of random samples from any distribution will tend to be normally distributed with mean µ and variance $\frac{\sigma^2}{n}$. This permits the use of the Normal distribution approximation as though sampling were from normally distributed populations. Thus inference procedures based on the sample mean can often use the Normal distribution. But we must be careful not to impute normality to the original observations.
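The following sketch illustrates this use of the Central Limit Theorem; the exponential population and the sample size are arbitrary illustrative choices. Even though the individual observations are far from normal, the standardised sample mean behaves very nearly like a standard normal variable for moderate n.

```python
import random
from statistics import mean
from math import sqrt, erf

def phi(z):                                  # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

random.seed(0)
mu = sigma = 1.0                             # exponential(1) population: mean 1, sd 1
n, reps = 40, 10000

z_values = [(mean(random.expovariate(1.0) for _ in range(n)) - mu) / (sigma / sqrt(n))
            for _ in range(reps)]

# compare the simulated probability P(Z <= 1) with the normal value Phi(1)
print(sum(z <= 1 for z in z_values) / reps, phi(1.0))
```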
Example 7.7
A random sample of size 100 is taken from a population with mean 60 and
variance 300.
(b) What is the probability that the sample mean will be less than 56?
Solution
n = 100, µ = 60, σ² = 300
(b) The sampling distribution of the mean is not known, but the sample size is large, so by the Central Limit Theorem we approximate it by the Normal distribution, with $\mathrm{Var}(\bar{X}) = \frac{300}{100} = 3$.
$$P(\bar{X} < 56) = P\left(\frac{\bar{X} - 60}{\sqrt{3}} < \frac{56 - 60}{\sqrt{3}}\right) = P(Z < -2.31) = \Phi(-2.31) = 1 - \Phi(2.31) = 1 - 0.9896 = 0.0104$$
Theorem 7.5
If sampling is with replacement, then
(a) $E(\hat{p}) = p$
(b) $\mathrm{Var}(\hat{p}) = \dfrac{p(1-p)}{n}$
where $\hat{p}$ is the sample proportion
Example 7.8
A consignment of bulbs has 20 percent defective. If a random sample of 250 is drawn with replacement from this consignment, what is the variance and hence the standard error of the sample proportion of defectives?
Solution
p = 0.2, n = 250
$$\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n} = \frac{0.2(1-0.2)}{250} = 0.00064$$
$$\sigma_{\hat{p}} = \sqrt{0.00064} = 0.0253$$
Theorem 7.6
If sampling is without replacement, then
(a) $E(\hat{p}) = p$
(b) $\mathrm{Var}(\hat{p}) = \dfrac{p(1-p)}{n}\cdot\dfrac{N-n}{N-1}$
where $\hat{p}$ is the sample proportion

That is, $E(\hat{p})$ is still identical with the population proportion p, but the variance is adjusted by a finite population correction factor $\frac{N-n}{N-1}$.
Example 7.9
In a class of 50 students, twenty are females and thirty are males. A random sample of twelve students is drawn from this class without replacement. Determine the variance and hence the standard error of the proportion of female students in the sample of twelve.
Solution
Let k represent the number of female students in the class. Hence
N = 50, k = 20, n = 12
The population proportion of females in the class is therefore
$$p = \frac{k}{N} = \frac{20}{50} = 0.4$$
Since this is sampling without replacement, we have
$$\sigma_{\hat{p}}^2 = \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}\cdot\frac{N-n}{N-1} = \frac{0.4(1-0.4)}{12}\cdot\frac{50-12}{50-1} = 0.01551$$
$$\sigma_{\hat{p}} = \sqrt{0.01551} = 0.1245$$
Theorem 7.7
The variable
$$\frac{\hat{p} - p}{\sigma_{\hat{p}}}$$
approaches the Standard Normal distribution when n becomes infinite, where $\hat{p}$ is a sample proportion that occurs in a sample of size n
The question that now arises is how large the sample size has to be for the use of the Normal approximation to be valid. A widely used criterion is that both np and n(1 − p) must be greater than 5.
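As an illustration of this criterion, the sketch below (using the figures of Example 7.8, so that np = 50 and n(1 − p) = 200 are both well above 5) compares the exact binomial probability P(p̂ ≤ 0.22) with the Normal approximation based on Theorem 7.7.

```python
from math import comb, sqrt, erf

def phi(z):                                       # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 250, 0.2
x = int(0.22 * n)                                 # 55 defectives corresponds to p-hat = 0.22

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
approx = phi((x / n - p) / sqrt(p * (1 - p) / n))

print(round(exact, 4), round(approx, 4))          # exact binomial vs Normal approximation
```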
Example 7.10
Refer to Example 7.8; what is the probability that at most 22 percent of the sample will be defective?
Solution
n = 250, p = 0.2
A specific value of p̂, denoted by p0 is p0 = 0.22. We are required to calculate
P (p̂ ≤ p0 ).
$$P(\hat{p} \le 0.22) = P\left(Z \le \frac{0.22 - 0.2}{\sqrt{\dfrac{0.2(1-0.2)}{250}}}\right) = P\left(Z \le \frac{0.02}{0.0253}\right) = P(Z \le 0.7906) = \Phi(0.7906) = 0.7854$$
The Normal approximation may be improved by the continuity correction factor, a device that makes an adjustment for the fact that a discrete distribution is being approximated by a continuous distribution. The correction factor is more important if n is small. With the continuity correction factor,
$$P(\hat{p} \le p_0) = P\left(Z \le \frac{p_0 + \frac{0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}\right) = \Phi\left(\frac{p_0 + \frac{0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}\right), \qquad x < np$$
or
$$P(\hat{p} \le p_0) = P\left(Z \le \frac{p_0 - \frac{0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}\right) = \Phi\left(\frac{p_0 - \frac{0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}\right), \qquad x > np$$
Example 7.11
Refer to Example 7.9. Calculate the probability that out of the selected
twelve students, five or less of them will be females.
Solution
n = 12, p = 0.4, $p_0 = \frac{5}{12} = 0.42$. Since x = 5 > np = 4.8, the correction $p_0 - \frac{0.5}{n}$ is used:
$$P(\hat{p} \le 0.42) = P\left(Z \le \frac{0.42 - \frac{0.5}{12} - 0.4}{\sqrt{\dfrac{0.4(1-0.4)}{12}}}\right) = P\left(Z \le \frac{-0.022}{0.1414}\right) = P(Z \le -0.15) = 1 - \Phi(0.15) = 0.4404$$
There are many problems in applied statistics where our interest is in two
populations, such as knowing something about the difference between two
population means or the difference between two population proportions. In
one situation we may wish to know if it is reasonable to conclude that the
two population means (or proportions) are different. In another situation we
may wish to know the magnitude of the difference between two population
means (or proportions). A knowledge of the sampling distribution of the
difference between two means (or proportions) is useful in investigations of this type.
in investigations of
this type.
To empirically construct the sampling distribution of the difference
between two sample means (or proportions) we would adopt the following
procedure.
Suppose there are two populations, Population 1 and Population 2. We would draw all possible random samples of size $n_1$ from Population 1 of size $N_1$. There would be $\binom{N_1}{n_1}$ such samples. For each set of sample data, the sample statistic (mean or proportion) would be computed. From Population 2 of size $N_2$ we would draw separately and independently all possible random samples of size $n_2$, $\binom{N_2}{n_2}$ in all. The sample mean (or proportion) of each sample is computed and the difference between all possible pairs of the means (or proportions) is taken. The sampling distribution of the difference between sample means (or proportions) would consist of all such distinct differences, accompanied by their frequencies or relative frequencies of occurrence.
7.4.2 Sampling Distribution of Difference between Means
Theorem 7.8
Let $X_{11}, X_{12}, \ldots, X_{1n_1}, X_{21}, X_{22}, \ldots, X_{2n_2}$ be $n_1 + n_2$ independent random variables, the first $n_1$ having identical distributions with mean $\mu_1$ and variance $\sigma_1^2$, and the remaining $n_2$ having identical distributions with mean $\mu_2$ and variance $\sigma_2^2$. Then
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$
and
$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
where $\bar{X}_1$ and $\bar{X}_2$ are the sample means of Populations 1 and 2 respectively
Example 7.12
Two brands of tyres are being compared by a motor firm. Brand A has a mean life of 24,000 km and a standard deviation of 1,200 km, while Brand B has a mean life of 22,000 km and a standard deviation of 1,080 km. If 100 tyres of Brand A and 90 tyres of Brand B are tested, find the mean and the standard error of the difference between the sample mean lives.
Solution
n1 = 100 n2 = 90
µ1 = 24, 000 µ2 = 22, 000
σ1 = 1, 200 σ2 = 1, 080
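The remaining computation follows directly from Theorem 7.8. The sketch below is illustrative only; it simply applies the theorem to the figures listed above to give the mean and standard error of the difference between the two sample mean lifetimes.

```python
from math import sqrt

n1, n2 = 100, 90
mu1, mu2 = 24_000, 22_000
sigma1, sigma2 = 1_200, 1_080

mean_diff = mu1 - mu2                              # E(X1bar - X2bar)
var_diff = sigma1**2 / n1 + sigma2**2 / n2         # Theorem 7.8
print(mean_diff, var_diff, round(sqrt(var_diff), 1))   # 2000, 27360, 165.4
```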
Suppose two random samples of different sizes are drawn from binomial populations. If we need to compare the number of successes in the samples, we have to work with their proportions.
Theorem 7.9
If independent random samples of different sizes $n_1$ and $n_2$ are drawn from two binomial populations with proportions $p_1$ and $p_2$ respectively, then the distribution of the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$, has mean
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
and variance
$$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$
Example 7.13
In a national election, 60 percent of voters in Community A are in favour of a certain candidate and 50 percent in Community B are in favour of the candidate. If a sample of 210 voters from Community A and 160 voters from Community B are drawn, find
(a) the expectation, and
(b) the variance and hence the standard error,
of the difference between the two sample proportions.
Solution
$p_1 = 0.6$, $p_2 = 0.5$, $n_1 = 210$, $n_2 = 160$
(a) $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2 = 0.6 - 0.5 = 0.1$
(b)
$$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} = \frac{0.6(0.4)}{210} + \frac{0.5(0.5)}{160} = 0.00114 + 0.00156 = 0.00271$$
$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{0.00271} = 0.0520$$
Example 7.14
In Example 7.13, if in the samples of voters from Community A and Community B there were 30 and 20 voters respectively in favour of the candidate, what is the probability of obtaining this or a smaller difference in the sample proportions if the belief about the population parameters is correct?

Solution
$x_1 = 30$, $x_2 = 20$
$$p_{01} = \frac{x_1}{n_1} = \frac{30}{210} = 0.143 \qquad p_{02} = \frac{x_2}{n_2} = \frac{20}{160} = 0.125$$
$$p_{01} - p_{02} = 0.143 - 0.125 = 0.018 \qquad p_1 - p_2 = 0.6 - 0.5 = 0.1$$
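The probability asked for can now be obtained from the Normal approximation to the sampling distribution of p̂1 − p̂2, with mean 0.1 and the standard error found in Example 7.13. The sketch below carries out this step (an illustration of the approach; the standard error is recomputed from Theorem 7.9).

```python
from math import sqrt, erf

def phi(z):                                          # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

p1, p2, n1, n2 = 0.6, 0.5, 210, 160
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # Theorem 7.9
observed = 30 / 210 - 20 / 160                       # 0.018

z = (observed - (p1 - p2)) / se
print(round(z, 2), round(phi(z), 4))                 # probability of this or a smaller difference
```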
Example 7.15
With reference to Example 7.1, if a sample of size two is drawn with replacement from the population, find
(a) the mean of the sampling distribution of variances;
(b) the variance of the sampling distribution of the variances.

Solution
The sample values and means have been obtained in Example 7.2. For the sake of convenience, we reproduce them here. The sample values $X_{ij}$ for each j-th sample of size two are:
(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)
(7, 4) (7, 5) (7, 7) (7, 8) (8, 4) (8, 5) (8, 7) (8, 8)
Theorem 7.10
Let $X_1, \ldots, X_n$ be a random sample of size n from a random variable X with expectation µ and variance σ². Let s² be the sample variance defined as
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
If sampling is from an infinite population or with replacement from a finite population, then
$$E(s^2) = \frac{n-1}{n}\,\sigma^2$$
Proof
Writing $X_i - \bar{X} = (X_i - \mu) - (\bar{X} - \mu)$,
$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}\left[(X_i - \mu)^2 - 2(\bar{X} - \mu)(X_i - \mu) + (\bar{X} - \mu)^2\right] = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2$$
Taking expectations and using $E(X_i - \mu)^2 = \sigma^2$ and $E(\bar{X} - \mu)^2 = \sigma^2/n$,
$$E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2$$
so that
$$E(s^2) = \frac{1}{n}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n-1}{n}\,\sigma^2$$
It follows that the corrected sample variance $\hat{s}^2 = \frac{n}{n-1}s^2$ is unbiased, since
$$E(\hat{s}^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2$$
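Theorem 7.10 can be verified by brute force for the small population of Example 7.1. The sketch below (illustrative only) averages s² over all samples of size two drawn with replacement, compares the result with ((n−1)/n)σ², and checks that the corrected variance is unbiased.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [4, 5, 7, 8]                    # sigma^2 = 2.5
n = 2
samples = list(product(population, repeat=n))

s2 = [pvariance(s) for s in samples]         # divisor n
s2_hat = [variance(s) for s in samples]      # divisor n - 1 (corrected variance)

print(mean(s2), (n - 1) / n * pvariance(population))   # 1.25 and 1.25
print(mean(s2_hat), pvariance(population))              # 2.5 and 2.5
```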
Example 7.16
Refer to Example 7.1. List all possible samples of size two that can be drawn
from the population with replacement and calculate
Solution
From Example 7.2, N = 4, σ² = 2.5. From Example 7.3, n = 2 and $N^n = 16$. The mean of the sample means is 6.
(a) To find the variance we subtract the mean of the sample means from
each of the sample means, square it and divide it by the number of
the sample means.
Thus,
Example 7.17
With reference to Examples 7.1, if a sample of size two is drawn without
replacement, find
(a) the mean, (b) the variance
of the sampling distribution of variances.
Solution
The sample values and means have been obtained in Example 7.5. For the
sake of convenience, we reproduce them here.
(4, 5) (4, 7) (4, 8) (5, 7) (5, 8) (7, 8)
(a) The sample variances $s_j^2$ of the six samples are 0.25, 2.25, 4.00, 1.00, 2.25 and 0.25. Hence
$$E(s_j^2) = \mu_{s^2} = \frac{\sum_j s_j^2}{\binom{N}{n}} = \frac{0.25 + 2.25 + 4.00 + 1.00 + 2.25 + 0.25}{6} = \frac{10}{6} = 1.6667$$
(b)
$$\mathrm{Var}(s_j^2) = \frac{\sum_{j=1}^{k}(s_j^2 - \mu_{s^2})^2}{\binom{N}{n}} = \frac{(0.25-1.6667)^2 + (2.25-1.6667)^2 + \cdots + (0.25-1.6667)^2}{6} = \frac{2.0070 + 0.3404 + 5.4444 + 0.4444 + 0.3404 + 2.0070}{6} = \frac{10.5836}{6} = 1.7639$$
Theorem 7.11
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n drawn without replacement from a finite population of size N with expectation µ and variance σ². Let s² be the sample variance defined as
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Then
$$E(s^2) = \frac{N}{N-1}\cdot\frac{n-1}{n}\,\sigma^2$$

Note
As N → ∞ this result reduces to that in Theorem 7.10.
Example 7.18
With reference to Examples 7.17 and 7.5, find the expected value of the sample variance s² if sampling is without replacement.

Solution
N = 4, n = 2, σ² = 2.5 (from Example 7.2). Hence, by Theorem 7.11,
$$E(s^2) = \frac{4}{4-1}\cdot\frac{2-1}{2}\,(2.5) = 1.6667$$
This agrees with the value obtained by direct enumeration in Example 7.17(a).
Theorem 7.12
If random samples of size n are taken from a population having a Normal distribution, and if
$$s^2 = \frac{\sum_i (X_i - \bar{X})^2}{n-1}$$
then
$$\frac{(n-1)s^2}{\sigma^2}$$
has a Chi-square distribution with n − 1 degrees of freedom
Note
Chi-square distribution is discussed in Chapter 8.
Theorem 7.13
If X is normally distributed with mean µ and variance σ², and $X_1, X_2, \ldots, X_n$ is a random sample of size n of X, then the random variable
$$V = \frac{\sum_i (X_i - \mu)^2}{\sigma^2}$$
will possess a Chi-square distribution with n degrees of freedom
EXERCISES
7.2 Rework Exercise 7.1 for the case when a random sample of size 2 is
drawn without replacement.
7.5 The weights of the products of a certain factory have a mean of 140 kilograms and a standard deviation of 20 kilograms. If 180 of the products are selected at random, find
(a) (i) the expectation of the sample mean;
(ii) the variance and standard deviation of the sample mean;
(b) the probability that a product selected at random will weigh
(i) at most 150 kilograms; (ii) at least 120 kilograms;
(iii) between 110 and 155 kilograms.
7.6 The time that a cashier spends in processing each person's order is an independent random variable with a mean of 45 minutes and a standard deviation of 30 minutes. What is the approximate probability that the orders of 100 persons can be processed in less than 4000 minutes?
7.8 Steel wires produced by a certain factory have a mean tensile strength
of 1, 000 kilograms and a variance of 900 kilograms. If a random sample
of 250 wires is drawn from the production line during a certain month
with a total output of 100, 000 units, find
(a) (i) the expectation and (ii) the variance of the sample mean
tensile strength;
(b) the probability that the sample mean will
(i) be more than 1, 010 kilograms;
(ii) be less than 1, 003 kilograms;
(iii) be between 995 and 1, 006 kilograms;
(iv) differ from 1, 000 by 5 kilograms or more;
(v) differ from 1, 000 by at most 7 kilograms.
7.9 Repeat Exercise 7.7 for the case when sampling is with replacement.
7.10 A group of children consists of twenty males and twenty-five females. A random sample of twenty-three children is drawn from this group with replacement.
(a) Find
(i) the expectation of the proportion of females;
(ii) the variance and hence the standard error of the sample proportion of females.
(b) What is the probability that the sample proportion of females is at least 0.6?

7.11 Rework Exercise 7.10 if sampling was made without replacement.

7.12 Thirty-six percent of University staff are against a strike action. If a sample of 120 of them are drawn at random with replacement, find
(a) (i) the expectation of the sample proportion,
(ii) the variance and hence the standard error of the sample pro-
portion,
of the University staff who are against a strike action.
(b) What is the probability that the proportion of the University staff
who are against the strike will be between 0.5 and 0.7?
7.13 Rework Exercise 7.12 if sampling is without replacement and if there
are 400 university staff members.
7.14 A box contains 80 black balls and 60 white balls. Two samples of 40 balls each are randomly selected with replacement from the box and their colours noted. Suppose that the first and second samples contained 30 and 25 black balls respectively. Find
(a) the variance of the difference in proportion of black balls in the two samples;
(b) the probability that the difference between the two samples does not exceed 10 balls.
7.15 Rework Exercise 7.14 if sampling is without replacement.
7.16 A random sample of size 5 is drawn from a population which is nor-
mally distributed with mean 49 and variance 9. A second random
sample of size 4 is drawn from a different population which is also nor-
mally distributed with mean 39 and variance 4. The two samples are
independent with means X 1 and X 2 respectively. Let W = X 1 − X 2 .
Calculate (a) the mean and variance of W ; (b) P (W > 8.2).
7.17 With reference to Exercise 7.2, find
(a) (i) the mean; (ii) the variance;
of the sampling distribution of variances;
(b) the corrected variance for sampling.
7.18 With reference to Exercise 7.1, find
(a) (i) the mean (ii) the variance
of the sampling distribution of variances;
(b) the corrected variance for sampling.
Chapter 8

DISTRIBUTIONS DERIVED FROM THE NORMAL DISTRIBUTION

8.1 INTRODUCTION

In this chapter we consider three probability distributions which are byproducts of the Normal distribution; a statistic such as the sampling distribution of the sample variance can be described by one of them under certain conditions. These are the Chi-square (χ²) distribution, the Student's t (briefly t) distribution, and the Fisher's variance ratio (briefly F) distribution.

8.2 χ² DISTRIBUTION

8.2.1 Definition of Chi-square Distribution

A continuous random variable X has a Chi-square (χ²) distribution with parameter v, called the degrees of freedom, if its probability density function is of the form
$$f(x) = \frac{1}{2^{\frac{v}{2}}\,\Gamma\left(\frac{v}{2}\right)}\,x^{\frac{v}{2}-1}e^{-\frac{x}{2}}, \qquad x > 0$$
Property 1

Theorem 8.1
Suppose X has a χ² distribution with parameter v. Then the moment-generating function of X is
$$M_X(t) = (1-2t)^{-\frac{v}{2}}, \qquad t < \frac{1}{2}$$

The proof of this theorem is left as an exercise for the reader (see Exercise 8.1).
Example 8.1
Suppose X has the moment generating function (m.g.f.)
$$M_X(t) = (1-2t)^{-10}, \qquad t < \frac{1}{2}$$
What is the distribution of X?

Solution
The Chi-square distribution with v degrees of freedom has m.g.f. exponent $-\frac{v}{2}$. The exponent of the given m.g.f. equals −10; hence $-10 = -\frac{v}{2}$, which gives v = 20.
Example 8.2
Suppose X has a Chi-square distribution with 5 degrees of freedom. Find its m.g.f.

Solution
From Theorem 8.1 with v = 5, the moment generating function of X is
$$M_X(t) = (1-2t)^{-\frac{5}{2}}, \qquad t < \frac{1}{2}$$
Property 2

Theorem 8.2
Suppose X has a χ² distribution with parameter v. Then
(a) $E(X) = v$    (b) $\mathrm{Var}(X) = 2v$

Proof
Since the Chi-square distribution is a special case of the Gamma distribution with α = v/2 and β = 2, its expectation and variance are obtained simply by substituting these values in the formulas of Theorem 10.18. Thus
$$E(X) = \alpha\beta = \frac{v}{2}\cdot 2 = v \qquad \mathrm{Var}(X) = \alpha\beta^2 = \frac{v}{2}\cdot 4 = 2v$$

Example 8.3
Refer to Example 8.2. Find the (a) mean (b) variance.

Solution
(a) $E(X) = v = 5$    (b) $\mathrm{Var}(X) = 2v = 10$
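These values are easy to confirm by simulation, since (as Theorem 8.7 below makes precise) a χ² variable with 5 degrees of freedom can be generated as the sum of the squares of 5 independent standard normal variables. The sketch is purely illustrative.

```python
import random
from statistics import mean, pvariance

random.seed(0)
v, reps = 5, 200_000

draws = [sum(random.gauss(0, 1) ** 2 for _ in range(v)) for _ in range(reps)]
print(round(mean(draws), 2), round(pvariance(draws), 2))   # close to v = 5 and 2v = 10
```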
Property 3

Theorem 8.3
The cumulative distribution function of the Chi-square distribution is given by
$$F(x) = \frac{1}{2^{\frac{v}{2}}\,\Gamma\left(\frac{v}{2}\right)} \int_0^x t^{\frac{v}{2}-1} e^{-\frac{t}{2}}\,dt, \qquad x > 0$$
Property 4
Theorem 8.4
If X1 , X2 , ..., Xn are independent random variables having Chi-square
distributions with v1 , v2 , ... , vn degrees of freedom respectively,
then the distribution of Y = X1 + X2 + ... + Xn will possess a
Chi-square distribution with v1 + v2 + ... + vn degrees of freedom
Corollary 8.1
If X1 , X2 , ..., Xn are independent random variables having Chi-square
distributions with 1 degree of freedom each, then the distribution of
Y = X1 + X2 + ... + Xn will possess a Chi-square distribution with n
degrees of freedom.
Corollary 8.2
If X and Y are independent and $X \sim \chi^2_v$ and $Y \sim \chi^2_u$, then $Z = X + Y \sim \chi^2_{v+u}$.
Property 5
Theorem 8.5
If X1 , and X2 are independent random variables, X1 has a Chi-
square distribution with v1 degrees of freedom and X1 + X2 has a
Chi-square distribution with v (> v1 ) degrees of freedom, then X2
has a Chi-square distribution with v − v1 degrees of freedom.
Table I in Appendix A gives percentile values of the Chi-square distribution for α = 0.005, 0.01, 0.025, 0.05, 0.95, 0.975, 0.99, 0.995 and v = 1, 2, ..., 30. It is a convention in statistics to use the same symbol χ² for both the random variable and a value of that random variable. Thus percentile values of the Chi-square distribution with v degrees of freedom are denoted by $\chi^2_{\alpha,v}$, or simply $\chi^2_{\alpha}$ if v is understood, where the suffix α is used from now on, and in applications of probability and statistics, to denote the “lower percentile” (100α% point).
Example 8.4
Find from Table I the following:
(a) $\chi^2_{0.025,\,13}$  (b) $\chi^2_{0.99,\,5}$  (c) $\chi^2_{0.99,\,28}$  (d) $\chi^2_{0.95,\,3}$  (e) $\chi^2_{0.975,\,10}$
Solution
(a) This is a Chi-square value for probability 0.025 and 13 degrees of freedom. Referring to Table I, we proceed downward under the column labelled d.f. until we reach entry 13, then proceed right to the column headed $\chi^2_{0.025}$. The value at that meeting point, 24.736, is the required value of $\chi^2_{0.025,13}$. Therefore the probability that a χ²-distributed random variable with 13 degrees of freedom exceeds 24.736 is 0.025.
Example 8.5
Suppose a certain Chi-square distribution value for a probability of 0.975 is 30.2. What are the degrees of freedom?
Solution
Refer to Table I and proceed downward under column 0.975 until you locate
the value 30.2 in the body of the table. Then proceed left to meet column
headed v. The value of v = 17 is the required degrees of freedom.
Theorem 8.6
Let Z be a Standard Normal variable. Then U = Z² has a Chi-square distribution with 1 degree of freedom
Proof
In Theorem 10.28 we proved that
$$P(U \le u) = \frac{1}{\sqrt{2\pi}} \int_0^u t^{-\frac{1}{2}} e^{-\frac{t}{2}}\,dt \qquad (i)$$
If we put v = 1 in the Chi-square cumulative distribution function (Theorem 8.3), that is, if we consider the Chi-square distribution with 1 degree of freedom, and recall also that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$, we obtain expression (i).
1
Note
If X ∼ N (µ, σ), then
2
X −µ
∼ χ21
σ
Theorem 8.7
Let $Z_1, Z_2, \ldots, Z_v$ be v independent Standard Normal random variables. Then the sum of the squares of these variables is a Chi-square variable, χ², with v degrees of freedom. That is,
$$\sum_{i=1}^{v} Z_i^2 \sim \chi^2_v$$
Example 8.6
Suppose $Z_1, Z_2, \ldots, Z_{22}$ is a random sample from the Standard Normal distribution. Find the number c such that
$$P\left(\sum_{i=1}^{22} Z_i^2 > c\right) = 0.10$$

Solution
By Theorem 8.7, $\sum_{i=1}^{22} Z_i^2$ has a χ² distribution with 22 degrees of freedom. Reading from Table I in Appendix A the row labelled 22 d.f. and the column headed by an upper-tail area of 0.10, we get the number 30.813. Therefore
$$P\left(\sum_{i=1}^{22} Z_i^2 > 30.813\right) = 0.10$$
Thus c = 30.813.
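If a table is not to hand, the same value can be obtained from a statistical library. The sketch below (it assumes scipy is available in the working environment) asks for the point with upper-tail area 0.10 under a χ² distribution with 22 degrees of freedom.

```python
from scipy.stats import chi2

c = chi2.ppf(0.90, df=22)            # P(chi2_22 <= c) = 0.90, i.e. upper-tail area 0.10
print(round(c, 3))                   # approximately 30.813
print(round(chi2.sf(c, df=22), 3))   # the upper-tail probability, 0.10
```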
Theorem 8.8
The Chi-square distribution approaches the Normal distribution
N (v, 2v) as the degrees of freedom v gets large
Theorem 8.9
Suppose $X_1, X_2, \ldots, X_s$ has a joint multinomial distribution with parameters $n, p_1, p_2, \ldots, p_s$. Then the sum
$$\sum_{i=1}^{s} \frac{(X_i - np_i)^2}{np_i}$$
has approximately a Chi-square distribution with s − 1 degrees of freedom when n is large.
8.3 t DISTRIBUTION
The t-distribution was first introduced by W. Gosset who published his work
under the name of ”Student”.
Property 1
The t distribution is symmetric; for any α, $t_{(1-\alpha),v} = -t_{\alpha,v}$.

Property 2

Theorem 8.10
Suppose a random variable X has the Student's t distribution with parameter v. Then
(a) $E(X) = 0$, for v > 1
(b) $\mathrm{Var}(X) = \dfrac{v}{v-2}$, for v > 2

Thus a t distribution possesses no mean when v = 1 and the variance does not exist for v ≤ 2.

Property 3
Theorem 8.11
If $X_1, X_2, \ldots, X_n$ is a random sample from a Normal population having mean µ and variance σ², and $\bar{X}$ is the sample mean, then
$$T = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$$
has the Student's t distribution with v = n − 1 degrees of freedom, where
$$\hat{s}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Percentile values $t_{\alpha,v}$ of the t distribution are defined by
$$P(T \le t_{\alpha,v}) = \int_{-\infty}^{t_{\alpha,v}} f(x)\,dx = 1 - \alpha$$
Example 8.7
If a t distribution has 25 degrees of freedom, find
(a) $t_{0.975,25}$  (b) $t_{0.05,25}$

Solution
(a) By the symmetry of the distribution, $t_{0.975,25} = -t_{0.025,25} = -2.060$.
(b) $t_{0.05,25} = 1.708$. Therefore the probability that a t-distributed random variable with 25 degrees of freedom does not exceed 1.708 is 0.95.
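The same quantities can be read from a statistical library instead of a printed table. The sketch below (it assumes scipy is available) prints the 97.5th, 95th and 2.5th percentiles of the t distribution with 25 degrees of freedom, which is one way to keep the tail conventions straight.

```python
from scipy.stats import t

df = 25
print(round(t.ppf(0.975, df), 3))   # 2.060  (97.5th percentile)
print(round(t.ppf(0.95, df), 3))    # 1.708  (95th percentile)
print(round(t.ppf(0.025, df), 3))   # -2.060 (2.5th percentile, by symmetry)
```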
Theorem 8.12
If Z is a Standard Normal variable, χ² is a Chi-square variable with v degrees of freedom, and Z and χ² are independently distributed, then
$$T = \frac{Z}{\sqrt{\chi^2 / v}}$$
has a Student's t distribution with v degrees of freedom
Proof
We simply express the given ratio in a different form as follows:
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{s^2}{\sigma^2}}} = \frac{Z}{\sqrt{\dfrac{(n-1)s^2}{(n-1)\sigma^2}}} = \frac{Z}{\sqrt{\dfrac{\chi^2_{n-1}}{n-1}}}$$
since $\dfrac{(n-1)s^2}{\sigma^2} = \chi^2_{n-1}$ by Theorem 7.12.
By Definition 8.11, this is a t distribution with n − 1 degrees of freedom.
Note
This quantity is usually called 'Student's t' and the corresponding distribution is called the 'Student's t distribution'.
Theorem 8.13
The t curve approaches a Normal curve as n approaches infinity
8.4 F DISTRIBUTION

A continuous random variable X has an F distribution with parameters (degrees of freedom) $v_1$ and $v_2$ if its probability density function is of the form
$$f(x) = \frac{\Gamma\left(\frac{v_1+v_2}{2}\right)}{\Gamma\left(\frac{v_1}{2}\right)\Gamma\left(\frac{v_2}{2}\right)}\left(\frac{v_1}{v_2}\right)^{\frac{v_1}{2}} \frac{x^{\frac{v_1}{2}-1}}{\left(1+\frac{v_1}{v_2}x\right)^{\frac{v_1+v_2}{2}}}$$
where x > 0.
Property 1
As a ratio of two non-negative values ($\chi^2_v \ge 0$), an F random variable ranges in value from 0 to +∞.

Property 2

Theorem 8.14
Suppose a random variable X has the F distribution with parameters $v_1$ and $v_2$. Then
(a) $E(X) = \dfrac{v_2}{v_2 - 2}$, for $v_2 > 2$
(b) $\mathrm{Var}(X) = \dfrac{2v_2^2(v_1 + v_2 - 2)}{v_1(v_2-2)^2(v_2-4)}$, for $v_2 > 4$

Theorem 8.14 implies that an F variable has no mean when $v_2 \le 2$ and has no variance when $v_2 \le 4$.
Property 3

Theorem 8.15
Suppose X has the distribution $F_{v_1,v_2}$; then $Y = \dfrac{1}{X}$ has the distribution $F_{v_2,v_1}$, that is,
$$F_{(1-\alpha,\,v_1,\,v_2)} = \frac{1}{F_{(\alpha,\,v_2,\,v_1)}}$$
Property 4
F is a positively skewed distribution; however, its skewness decreases with
increasing v1 and v2 .
The F distribution has been tabulated for α = 0.50, 0.90, 0.95, 0.975, 0.99,
0.995, 0.999. Table III in Appendix A gives the percentage points for the
right tail of several F distributions at 1, 5 and 10 percent levels. To say
that $P(F > F_{\alpha;v_1,v_2}) = \alpha$ is the same as saying that
$$P(0 \le F \le F_{\alpha;v_1,v_2}) = 1 - \alpha$$
Example 8.8
If the F distribution has degrees of freedom $v_1 = 30$ and $v_2 = 24$, find the table value of
(a) F0.90;30,24 (b) F0.95;12,8 (c) F0.05;12,8
Solution
(a) Table III in Appendix A gives the percentage points of the distribution
at the 90 percent level. Read the table value at the intersection of the
degrees of freedom of the numerator, v1 = 30, and the degrees of
freedom of the denominator, v2 = 24. The value is 1.67. That is, the
probability that an F-distributed random variable with 30 numerator
degrees of freedom and 24 denominator degrees of freedom does not
exceed 1.67 is 0.90.
(b) From Table III in Appendix A, $F_{0.95;12,8} = 3.28$.

(c) By Theorem 8.15,
$$F_{0.05;12,8} = \frac{1}{F_{0.95;8,12}} = \frac{1}{2.85} \approx 0.35$$
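The readings in Example 8.8 can be reproduced numerically; a minimal sketch, assuming SciPy is available, with $F_{\alpha;v_1,v_2}$ read as the α-quantile exactly as the table is read above:

```python
from scipy import stats

print(stats.f.ppf(0.90, dfn=30, dfd=24))     # F(0.90; 30, 24) ~ 1.67
print(stats.f.ppf(0.95, dfn=12, dfd=8))      # F(0.95; 12, 8)  ~ 3.28
print(stats.f.ppf(0.05, dfn=12, dfd=8))      # F(0.05; 12, 8)  ~ 0.35
print(1 / stats.f.ppf(0.95, dfn=8, dfd=12))  # same value via Theorem 8.15
```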
Theorem 8.16
Let $\chi^2_{(v_1)}$ and $\chi^2_{(v_2)}$ be two independent Chi-square random variables with parameters $v_1$ and $v_2$ respectively. Then the F distribution is given by
$$F_{(v_1,v_2)} = \frac{\chi^2_{(v_1)}/v_1}{\chi^2_{(v_2)}/v_2}$$
where $v_1$ is the degrees of freedom of the numerator and $v_2$ is the degrees of freedom of the denominator.
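Theorem 8.16 can be illustrated by simulation in the same way as Theorem 8.12: the ratio of two independent Chi-square variables, each divided by its own degrees of freedom, behaves like an F variable. A minimal sketch, with the same assumptions as the earlier sketches:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
v1, v2, n = 6, 10, 200_000

num = rng.chisquare(v1, size=n) / v1        # chi^2_{v1} / v1
den = rng.chisquare(v2, size=n) / v2        # chi^2_{v2} / v2, independent of the numerator
f_sim = num / den                           # F with (v1, v2) degrees of freedom by Theorem 8.16

for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(f_sim, q), stats.f.ppf(q, dfn=v1, dfd=v2))
```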
Theorem 8.17
Let $\chi^2_{\alpha,v}$ be the percentage point of a Chi-square random variable with $v$ degrees of freedom. Then
$$F_{\alpha,v,\infty} = \frac{\chi^2_{\alpha,v}}{v}$$
Theorem 8.18
Let $F_{1,v}$ be an F distribution with degrees of freedom 1 and v. Then
$$F_{1-\alpha;\,1,v} = t^2_{1-\alpha/2,\,v}$$
Proof
From Theorem 8.12,
$$T^2 = \frac{Z^2}{\chi^2/v}$$
But from Theorem 12.6, $Z^2 \sim \chi^2$ with 1 degree of freedom. Hence
$$T^2 = \frac{\chi^2_{(1)}/1}{\chi^2_{(v)}/v}$$
which, by Theorem 8.16, has the F distribution with 1 and v degrees of freedom; squaring the corresponding t percentage point therefore gives the F percentage point stated in the theorem.
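Theorems 8.17 and 8.18 relate F percentage points to Chi-square and t percentage points; both relations can be checked numerically. A minimal sketch, assuming SciPy is available; a very large denominator degrees of freedom stands in for infinity:

```python
from scipy import stats

alpha, v = 0.05, 20

# Theorem 8.17: F(alpha; v, infinity) = chi^2(alpha; v) / v
print(stats.f.ppf(alpha, dfn=v, dfd=1e7))    # large dfd approximates infinite degrees of freedom
print(stats.chi2.ppf(alpha, df=v) / v)

# Theorem 8.18: F(1 - alpha; 1, v) = t(1 - alpha/2; v)^2
print(stats.f.ppf(1 - alpha, dfn=1, dfd=v))
print(stats.t.ppf(1 - alpha / 2, df=v) ** 2)
```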
EXERCISES
8.5 Referring to Exercise 8.3, find the mean and variance of the χ² variable.
8.8 If X has the χ² distribution with 12 degrees of freedom, find x such that
8.9 If X has the t distribution, find from Table II in Appendix A the value
for
(a) t0.95,28 (b) t0.975,15 (c) t0.90,22
8.11 If X is $F_{v_1,v_2}$, find from Table III in Appendix A the value for (a) $F_{0.95,20,30}$ (b) $F_{0.90,20,24}$
STATISTICAL TABLES
Table I
Standard Normal Distribution (Full Table)
$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt$$
Table II
Standard Normal Distribution (Half Table)
$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-t^2/2}\,dt$$
Table III
Normal Density Function
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
Table IV
Percentiles of Chi Square Distribution
Table V
Percentiles of t Distribution
Table VI
Percentiles of F Distribution (F0.90, F0.95, F0.99)

Table VII
Random Numbers
ANSWERS TO ODD-NUMBERED EXERCISES
Chapter 1
1.1
          Y
 X        1      2      3
 1      3/34   5/34   1/34
 2      2/34   4/34   6/34
 3      1/34   2/34     0
 4      4/34   1/34   5/34
1.3 (a) 9/34 (c) 7/34   1.5 211/219   1.7 (c) (2x + 1)   1.9 (a) e^{−λp}(λp)^x/x!, x = 0, 1, 2, ...
1.11 (c)(i) 2(x + y − 2xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1   1.13 12/7 (c) (12/7) · x(2 + x)/2, 0 ≤ x ≤ 1
1.15 (a) λe^{−λx}, x ≥ 0 (c) λe^{−λ(y−x)}, y ≥ x   1.17 (a)(i) xe^{−x}, y ≥ 0
Chapter 2
2.1 (a)
 X + Y           2      3      4      5      6      7
 P(X + Y = k)  3/34   7/34   6/34  12/34   1/34   5/34

(c)
 XY            1      2      3      4      6      8      9     12
 P(XY = k)   3/34   7/34   2/34   8/34   8/34   1/34    0    5/34

(e)
 2X + 3Y            5      7      8      9     10     11     12     13     14     15     17
 P(2X + 3Y = k)   3/34   2/34   5/34   1/34   4/34   5/34   2/34   6/34   1/34    0    5/34

(j)
 3X            3      6      9     12
 P(3X = k)   9/34  12/34   3/34  10/34

2.3
 X + Y           2     3     4
 P(X + Y = k)   1/8   3/8   1/2
2.5 (a) λ^z e^{−λ}/z!, z = 0, 1, 2, ...   2.7 (a) s(z) = (2/3)z²(3 − z), 0 ≤ z ≤ 1;  (2/3)(4 − 3z² + z³), 1 ≤ z ≤ 2
2.9 (a) s(z) = (5/14)z³, 0 < z ≤ 1;  (3/7)(z/2 + 1/3), 1 < z ≤ 2;  (3/7)(3 − (11/2)z + 4z² − (5/6)z³), 2 < z ≤ 3

2.17 6.027 × 10⁻⁵   2.19 (a) h(u) = (2/3)(u³ + 1), −1 ≤ u < 0;  (2/3)(1 − u³), 0 ≤ u ≤ 1
2.21 4[1 − u(1 − ln u)], 0 ≤ u ≤ 1

2.23
 X/Y          1/2    2/3     …      3
 P(X/Y = k)  2/20   7/20   3/20   8/20

2.25 h(u) = 1 + (2/3)u − u²/3, −1 ≤ u < 0;  1 − (4/3)u + u²/3, 0 ≤ u ≤ 1

2.27 h(u) = (2/9)(u + 2), 0 ≤ u < 1;  2(u + 2)/(9u³), 1 ≤ u < ∞

2.29 (3/7)(1 − u²/4 + u ln(2/u)), 0 < u < 2
Chapter 3
3.1 (a) 4.47   3.3 (a) 2.01 (c) 10.39 (e) 11.24 (g) 1496.32 (i) 52.80 (k) 2.11 (m) 3.21
3.5 (a) 5/2 (c) 2/15 (e) 0.04 (g) 0 (i) 0.08   3.15 6.46
Chapter 4
4.1 0.064   4.3 (a) 2.3 (c) 2.68 (e) 1.2   4.5 0   4.9 −0.1832   4.11 −1/11
4.13 (a)(i) −0.21x + 1.52 (ii) −0.16y + 1.63   4.15 (24x² + 24x + 2)/(9(2x + 1)²)   4.19 0.05y + 2.68
Chapter 5
5.1(a) 12 5.3 n ≤ 250 5.5(a) 0.85 5.7 0.84 5.9 1.3947 5.11 0.9648 5.13(a) 0.3409 (c) 0.8185
5.15 0.0062 5.17 1600 5.19 0.9927 5.21 n ≈ 25 5.23 n ≈ 85, = 0.43 5.25(a) 0.1852
5.29 0.2328
Chapter 7
7.1 (a) 3.6, 6.8 (c) E(X) = 6.8, Var(X) = 8.48   7.3 (a) 6.8   7.5 (a)(i) 140 (ii) 1.49, 0.85   7.7 (a)(i) 1,000 (ii)
7.9 s.d. = 0.657   7.11 (a)(i) 0.56 (ii) 0.4855   7.15 (a) 0.004   7.17 (a)(ii) 458.8
Chapter 8
8.3 12 8.5(a) E(X) = 12, Var = 24 8.7(a) 22.367 (c) 10.285 8.9(a) 1.701 (c) 1.321 8.11 1.93