STAT 330 Course Notes Fall 2024 Edition
Cyntha A. Struthers
Department of Statistics and Actuarial Science, University of Waterloo
1. Preview
1.1 Example
1.2 Example
Preface
In order to provide improved versions of these Course Notes for students in subsequent
terms, please email corrections, sections that are confusing, or comments/suggestions to
[email protected].
1. Preview
The following examples will illustrate the ideas and concepts discussed in these Course
Notes. They also indicate how these ideas and concepts are connected to each other.
1.1 Example
The number of service interruptions in a communications system over 200 separate days is
summarized in the following frequency table:
It is believed that a Poisson model will fit these data well. Why might this be a reasonable
assumption? (PROBABILITY MODELS)
If we let the random variable X = number of interruptions in a day and assume that
the Poisson model is reasonable then the probability function of X is given by
    P(X = x) = λ^x e^(−λ) / x!    for x = 0, 1, . . .
where λ is a parameter of the model which represents the mean number of service inter-
ruptions in a day. (RANDOM VARIABLES, PROBABILITY FUNCTIONS, EXPEC-
TATION, MODEL PARAMETERS) Since λ is unknown we might estimate it using the
sample mean
    x̄ = [64(0) + 71(1) + · · · + 1(5)] / 200 = 230/200 = 1.15
where the likelihood function based on the observed frequencies is L(λ) = c λ^230 e^(−200λ) with constant

    c = [200! / (64! 71! · · · 1!)] (1/0!)^64 (1/1!)^71 · · · (1/5!)^1
The maximum likelihood estimate of λ can be found by solving dL/dλ = 0, or equivalently
d log L/dλ = 0, and verifying that it corresponds to a maximum.
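For this example the calculation is short. Assuming the likelihood L(λ) = c λ^230 e^(−200λ) given above, a sketch of the maximization is

    log L(λ) = log c + 230 log λ − 200λ
    d log L/dλ = 230/λ − 200 = 0    ⟹    λ̂ = 230/200 = 1.15

so the maximum likelihood estimate coincides with the sample mean x̄ = 1.15. (The second derivative −230/λ² < 0 confirms a maximum.)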
If we want an interval of values for λ which are reasonable given the data then we
could construct a confidence interval for λ. (INTERVAL ESTIMATION) To construct
confidence intervals we need to find the sampling distribution of the estimator. In this
example we would need to find the distribution of the estimator
    X̄ = (X₁ + X₂ + · · · + Xₙ) / n
where Xᵢ = number of interruptions on day i, i = 1, 2, . . . , 200. (FUNCTIONS OF
RANDOM VARIABLES: cumulative distribution function technique, one-to-one transfor-
mations, moment generating function technique) Since Xᵢ ∼ Poisson(λ) with E(Xᵢ) = λ
and Var(Xᵢ) = λ, the distribution of X̄ for large n is approximately N(λ, λ/n) by the
Central Limit Theorem. (LIMITING DISTRIBUTIONS)
Suppose the manufacturer of the communications system claimed that the mean number
of interruptions was 1. Then we would like to test the hypothesis H: λ = 1. (TESTS OF
HYPOTHESIS) A test of hypothesis uses a test statistic to measure the evidence, based on
the observed data, against the hypothesis. A test statistic with good properties for testing
H: λ = λ₀ is the likelihood ratio statistic, −2 log[L(λ₀)/L(λ̂)]. (LIKELIHOOD RATIO
STATISTIC) For large n the distribution of the likelihood ratio statistic is approximately
χ²(1) if the hypothesis H: λ = λ₀ is true.
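As an illustration, and again assuming L(λ) = c λ^230 e^(−200λ), the observed value of the likelihood ratio statistic for testing H: λ = 1 would be

    −2 log[L(1)/L(1.15)] = −2 [230 log(1/1.15) − 200(1 − 1.15)] ≈ −2(−32.15 + 30) ≈ 4.29

which would be compared to the χ²(1) distribution.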
1.2 Example
The following are relief times in hours for 20 patients receiving a pain killer:
    1.1  1.4  1.3  1.7  1.9  1.8  1.6  2.2  1.7  2.7
    4.1  1.8  1.5  1.2  1.4  3.0  1.7  2.3  1.6  2.0
It is believed that the Weibull distribution with probability density function

    f(x) = (β/θ^β) x^(β−1) e^(−(x/θ)^β)    for x > 0, β > 0, θ > 0

will fit these data well, with likelihood function L(θ, β) = ∏ᵢ f(xᵢ; θ, β),
where xᵢ is the observed relief time for the ith patient. (MULTIPARAMETER LIKELI-
HOODS) The maximum likelihood estimates θ̂ and β̂ are found by simultaneously solving
    ∂ log L/∂θ = 0    and    ∂ log L/∂β = 0
Since an explicit solution to these equations cannot be obtained, a numerical solution must
be found using an iterative method. (NEWTON'S METHOD) Also, since the maximum
likelihood estimators cannot be given explicitly, approximate confidence intervals and tests
of hypotheses must be based on the asymptotic distributions of the maximum likelihood
estimators. (LIMITING OR ASYMPTOTIC DISTRIBUTIONS OF MAXIMUM LIKE-
LIHOOD ESTIMATORS)
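The numerical maximization is easy to sketch in code. The following is a minimal illustration (not part of the original notes): it assumes the Weibull parameterization written above with shape β and scale θ, and uses scipy's general-purpose optimizer in place of a hand-coded Newton iteration.

    import numpy as np
    from scipy.optimize import minimize

    # the 20 observed relief times from the example
    x = np.array([1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
                  4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0])

    def neg_log_lik(par):
        # par = (theta, beta): negative Weibull log-likelihood under the assumed form
        theta, beta = par
        if theta <= 0 or beta <= 0:
            return np.inf
        return -np.sum(np.log(beta) - beta * np.log(theta)
                       + (beta - 1) * np.log(x) - (x / theta) ** beta)

    # numerical maximization (stands in for Newton's method)
    fit = minimize(neg_log_lik, x0=[2.0, 2.0], method="Nelder-Mead")
    theta_hat, beta_hat = fit.x
    print(theta_hat, beta_hat)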
2. Univariate Random Variables
In this chapter we review concepts that were introduced in a previous probability course
such as STAT 220/230/240 as well as introducing new concepts. In Section 2.1 the concepts
of random experiments, sample spaces, probability models, and rules of probability are
reviewed. The concepts of a sigma algebra and probability set function are also introduced.
In Section 2.2 we define a random variable and its cumulative distribution function. It
is important to note that the definition of a cumulative distribution function is the same
for all types of random variables. In Section 2.3 we define a discrete random variable and
review the named discrete distributions (Hypergeometric, Binomial, Geometric, Negative
Binomial, Poisson). In Section 2.4 we define a continuous random variable and review
the named continuous distributions (Uniform, Exponential, Normal, and Chi-squared). We
also introduce new continuous distributions (Gamma, Two Parameter Exponential, Weibull,
Cauchy, Pareto). A summary of the named discrete and continuous distributions
that are used in these Course Notes can be found in Chapter 11. In Section 2.5
we review the cumulative distribution function technique for finding a function of a random
variable and prove a theorem which can be used in the case of a monotone function. In
Section 2.6 we review expectations of functions of random variables. In Sections 2.8 to 2.10
we introduce new material related to expectation such as inequalities, variance stabilizing
transformations and moment generating functions. Section 2.11 contains a number of
useful calculus results which will be used throughout these Course Notes.
2.1 Probability
To model real life phenomena for which we cannot predict exactly what will happen we
assign numbers, called probabilities, to outcomes of interest which reflect the likelihood of
such outcomes. To do this it is useful to introduce the concepts of an experiment and its
associated sample space. Consider some phenomenon or process which is repeatable, at
least in theory. We call the phenomenon or process a random experiment and refer to a
single repetition of the experiment as a trial. For such an experiment we consider the set
of all possible outcomes.
To assign probabilities to the events of interest for a given experiment we begin by defining
a collection of subsets of a sample space S which is rich enough to define all the events of
interest for the experiment. We call such a collection of subsets a sigma algebra.
Suppose A1 ; A2 ; : : : are subsets of the sample space S which correspond to events of interest
for the experiment. To complete the probability model for the experiment we need to assign
real numbers P (Ai ) ; i = 1; 2; : : :, where P (Ai ) is called the probability of Ai . To develop
the theory of probability these probabilities must satisfy certain properties. The following
Axioms of Probability are a set of axioms which allow a mathematical structure to be
developed.
Note: The probabilities P (A1 ) ; P (A2 ) ; : : : can be assigned in any way as long they satisfy
these three axioms. However, if we wish to model real life phenomena we would assign the
probabilities such that they correspond to the relative frequencies of events in a repeatable
experiment.
2.1.4 Example
Let B be a sigma algebra associated with the sample space S and let P be a probability
set function with domain B. If A; B 2 B then prove the following:
(a) P(∅) = 0
(b) If A, B ∈ B and A and B are mutually exclusive events then P(A ∪ B) = P(A) + P(B).
(c) P(Ā) = 1 − P(A)
(d) If A ⊆ B then P(A) ≤ P(B). Note: A ⊆ B means a ∈ A implies a ∈ B.
Solution
(a) Let A₁ = S and Aᵢ = ∅ for i = 2, 3, . . .. Since ∪_{i=1}^∞ Aᵢ = S then by Definition 2.1.3 (A3)
it follows that

    P(S) = P(S) + Σ_{i=2}^∞ P(∅)

and by (A2) we have

    1 = 1 + Σ_{i=2}^∞ P(∅)

By (A1) the right side is a series of non-negative numbers which must converge to the left
side, which is 1 and therefore finite. This results in a contradiction unless P(∅) = 0, as required.
(b) Let A₁ = A, A₂ = B, and Aᵢ = ∅ for i = 3, 4, . . .. Since ∪_{i=1}^∞ Aᵢ = A ∪ B then by (A3)

    P(A ∪ B) = P(A) + P(B) + Σ_{i=3}^∞ P(∅)

and since P(∅) = 0 by the result in (a) it follows that

    P(A ∪ B) = P(A) + P(B)

(c) Since S = A ∪ Ā and A ∩ Ā = ∅ then by (A2) and the result proved in (b) it follows
that

    1 = P(S) = P(A ∪ Ā) = P(A) + P(Ā)

or

    P(Ā) = 1 − P(A)

(d) Since B = (A ∩ B) ∪ (Ā ∩ B) = A ∪ (Ā ∩ B) and A ∩ (Ā ∩ B) = ∅ then by (b)
P(B) = P(A) + P(Ā ∩ B). But by (A1), P(Ā ∩ B) ≥ 0 so it follows that P(B) ≥ P(A).
2.1.5 Exercise
Let B be a sigma algebra associated with the sample space S and let P be a probability
set function with domain B. If A; B 2 B then prove the following:
(a) 0 ≤ P(A) ≤ 1
(b) P(A ∩ B̄) = P(A) − P(A ∩ B)
(c) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
For a given experiment we are sometimes interested in the probability of an event given
that we know that the event of interest has occurred in a certain subset of S. For example,
the experiment might involve people of different ages and we may be interested in an event
only for a given age group. This leads us to define conditional probability.
    P(A|B) = P(A ∩ B) / P(B)    provided P(B) > 0
2.1.7 Example
The following table of probabilities is based on data from the 2011 Canadian census. The
probabilities are for Canadians aged 25 to 34.

                                                     Employed    Unemployed
    No certificate, diploma or degree                  0.066        0.010
    High school diploma or equivalent                  0.185        0.016
    Postsecondary certificate, diploma or degree       0.683        0.040
Solution
(a) Let E be the event “employed”, A1 be the event “no certi…cate, diploma or degree”,
A2 be the event “high school diploma or equivalent”, and A3 be the event “postsecondary
certi…cate, diploma or degree”.
    P(E) = P(E ∩ A₁) + P(E ∩ A₂) + P(E ∩ A₃)
         = 0.066 + 0.185 + 0.683
         = 0.934
(b)
    P(A₁) = P(E ∩ A₁) + P(Ē ∩ A₁)
          = 0.066 + 0.010
          = 0.076
(c)
(d)
    P(A₂ ∪ A₃ | Ē) = P(Ē ∩ (A₂ ∪ A₃)) / P(Ē)
                   = 0.056 / 0.066
                   ≈ 0.848
If the occurrence of event B does not affect the probability of the event A, then the events
are called independent events.

    P(A ∩ B) = P(A) P(B)
2.1.9 Example
In Example 2.1.7 are the events, “unemployed” and “no certi…cate, diploma or degree”,
independent events?
Solution
The events “unemployed” and “no certificate, diploma or degree” are not independent since

    0.010 = P(Ē ∩ A₁) ≠ P(Ē) P(A₁) = (0.066)(0.076)

    X : S → ℝ
2.2.2 Example
Three friends Ali, Benita and Chen are enrolled in STAT 330. Suppose we are interested in
whether these friends earn a grade of 70 or more. If we let A represent the event “Ali earns
a grade of 70 or more”, B represent the event “Benita earns a grade of 70 or more”, and C
represent the event “Chen earns a grade of 70 or more” then a suitable sample space is
Suppose we are mostly interested in how many of these friends earn a grade of 70 or
more. We can de…ne the random variable X = “number of friends who earn a grade of 70
or more”. The range of X is {0, 1, 2, 3} with associated mapping

    X(ABC) = 3
    X(ĀBC) = X(AB̄C) = X(ABC̄) = 2
    X(ĀB̄C) = X(ĀBC̄) = X(AB̄C̄) = 1
    X(ĀB̄C̄) = 0
Note: The cumulative distribution function is de…ned for all real numbers.
(2)
    lim_{x→−∞} F(x) = 0    and    lim_{x→∞} F(x) = 1
2.2.5 Example
Suppose X is a random variable with cumulative distribution function
    F(x) = P(X ≤ x) =
        0      x < 0
        0.1    0 ≤ x < 1
        0.3    1 ≤ x < 2
        0.6    2 ≤ x < 3
        1      x ≥ 3

[Figure 2.1: Graph of the cumulative distribution function F(x) for Example 2.2.5]

(a) Sketch the cumulative distribution function F(x).
(b) Find: (i) P(X ≤ 1), (ii) P(X ≤ 2), (iii) P(X ≤ 2.4), (iv) P(X = 2), (v) P(0 < X ≤ 2),
(vi) P(0 ≤ X ≤ 2).
Solution
(a) See Figure 2.1.
(b) (i)
    P(X ≤ 1) = F(1) = 0.3
(ii)
    P(X ≤ 2) = F(2) = 0.6
(iii)
    P(X ≤ 2.4) = P(X ≤ 2) = F(2) = 0.6
(iv)
    P(X = 2) = F(2) − lim_{x→2⁻} F(x) = 0.6 − 0.3 = 0.3
or
    P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = F(2) − F(1) = 0.6 − 0.3 = 0.3
(v)
    P(0 < X ≤ 2) = P(X ≤ 2) − P(X ≤ 0) = F(2) − F(0) = 0.6 − 0.1 = 0.5
(vi)
    P(0 ≤ X ≤ 2) = P(X ≤ 2) − P(X < 0) = F(2) − 0 = 0.6
2.2.6 Example
Suppose X is a random variable with cumulative distribution function
    F(x) = P(X ≤ x) =
        0                          x ≤ 0
        (2/5) x³                   0 < x ≤ 1
        (1/5)(12x − 3x² − 7)       1 < x < 2
        1                          x ≥ 2
Solution
[Figure: Graph of the cumulative distribution function F(x) for Example 2.2.6]
(b) (i)
    P(X ≤ 1) = F(1) = (2/5)(1)³ = 2/5 = 0.4
(ii)
    P(X ≤ 2) = F(2) = 1
(iii)
    P(X ≤ 2.4) = F(2.4) = 1
(iv)
P (X = 0:5)
(v)
P (X = b)
(vi)
(vii)
    f(x) = P(X = x) = F(x) − lim_{ε→0⁺} F(x − ε)    for x ∈ ℝ
(2)
    Σ_{x∈A} f(x) = 1
2.3.4 Example
In Example 2.2.5 …nd the support set A, show that X is a discrete random variable and
determine its probability function.
Solution
The support set of X is A = {0, 1, 2, 3} which is a countable set. Its probability function is

    f(x) = P(X = x) =
        0.1                                       if x = 0
        P(X ≤ 1) − P(X ≤ 0) = 0.3 − 0.1 = 0.2     if x = 1
        P(X ≤ 2) − P(X ≤ 1) = 0.6 − 0.3 = 0.3     if x = 2
        P(X ≤ 3) − P(X ≤ 2) = 1 − 0.6 = 0.4       if x = 3

or

    x                   0     1     2     3     Total
    f(x) = P(X = x)    0.1   0.2   0.3   0.4      1

Since P(X ∈ A) = Σ_{x=0}^{3} P(X = x) = 1, X is a discrete random variable.
In the next example we review four of the named distributions which were introduced in a
previous probability course.
2.3.5 Example
Suppose a box contains a red balls and b black balls. For each of the following find
the probability function of the random variable X and show that Σ_{x∈A} f(x) = 1 where
A = {x : f(x) > 0} is the support set of X.
(a) X = number of red balls among n balls drawn at random without replacement.
(b) X = number of red balls among n balls drawn at random with replacement.
(c) X = number of black balls selected before obtaining the …rst red ball if sampling is
done at random with replacement.
(d) X = number of black balls selected before obtaining the kth red ball if sampling is done
at random with replacement.
Solution
(a) If n balls are selected at random without replacement from a box of a red balls and
b black balls then the random variable X = number of red balls has a Hypergeometric
distribution with probability function

    f(x) = P(X = x) = \binom{a}{x}\binom{b}{n−x} / \binom{a+b}{n}    for x = max(0, n−b), . . . , min(a, n)

Then

    Σ_{x∈A} f(x) = Σ_{x=max(0,n−b)}^{min(a,n)} \binom{a}{x}\binom{b}{n−x} / \binom{a+b}{n}
                 = [1 / \binom{a+b}{n}] Σ_x \binom{a}{x}\binom{b}{n−x}
                 = \binom{a+b}{n} / \binom{a+b}{n}    by the Hypergeometric identity 2.11.6
                 = 1

(b) If n balls are selected at random with replacement from a box of a red balls and b black
balls then we have a sequence of Bernoulli trials and the random variable X = number of
red balls has a Binomial distribution with probability function

    f(x) = P(X = x) = \binom{n}{x} p^x (1 − p)^(n−x)    for x = 0, 1, . . . , n

where p = a/(a + b). By the Binomial series 2.11.3(1)

    Σ_{x∈A} f(x) = Σ_{x=0}^{n} \binom{n}{x} p^x (1 − p)^(n−x)
                 = (p + 1 − p)^n
                 = 1

(c) If sampling is done with replacement then we have a sequence of Bernoulli trials and
the random variable X = number of black balls selected before obtaining the first red ball
has a Geometric distribution with probability function f(x) = P(X = x) = p(1 − p)^x for
x = 0, 1, . . .. Then

    Σ_{x∈A} f(x) = Σ_{x=0}^{∞} p(1 − p)^x
                 = p / [1 − (1 − p)]
                 = 1

(d) If sampling is done with replacement then we have a sequence of Bernoulli trials and
the random variable X = number of black balls selected before obtaining the kth red ball
has a Negative Binomial distribution with probability function

    f(x) = P(X = x) = \binom{x+k−1}{x} p^k (1 − p)^x    for x = 0, 1, . . .

Since \binom{x+k−1}{x} = (−1)^x \binom{−k}{x}, the probability function can also be written as

    f(x) = P(X = x) = \binom{−k}{x} p^k (p − 1)^x    for x = 0, 1, . . .
2.3.6 Example
If X is a random variable with probability function
    f(x) = λ^x e^(−λ) / x!    for x = 0, 1, . . . ;  λ > 0        (2.1)
show that
P
1
f (x) = 1
x=0
Solution
By the Exponential series 2.11.7

    Σ_{x=0}^{∞} f(x) = Σ_{x=0}^{∞} λ^x e^(−λ) / x!
                     = e^(−λ) Σ_{x=0}^{∞} λ^x / x!
                     = e^(−λ) e^λ
                     = 1
2.3.7 Exercise
If X is a random variable with probability function
    f(x) = −(1 − p)^x / (x log p)    for x = 1, 2, . . . ;  0 < p < 1

show that

    Σ_{x=1}^{∞} f(x) = 1
Hint: Use the Logarithmic series 2.11.8.
Important Note: A summary of the named distributions used in these Course Notes can
be found in Chapter 11.
Note: The de…nition (2.2.3) and properties (2.2.4) of the cumulative distribution function
hold for the random variable X regardless of whether X is discrete or continuous.
2.4.2 Example
Suppose X is a random variable with cumulative distribution function
    F(x) = P(X ≤ x) =
        0                           x ≤ −1
        −(1/2)(x + 1)² + x + 1      −1 < x ≤ 0
        (1/2)(x − 1)² + x           0 < x < 1
        1                           x ≥ 1
Solution
The cumulative distribution function F is a continuous function for all x ∈ ℝ since it is a
piecewise function composed of continuous functions, and

    lim_{h→0⁻} [F(−1 + h) − F(−1)]/h = 0  ≠  lim_{h→0⁺} [F(−1 + h) − F(−1)]/h = 1

    lim_{h→0⁻} [F(0 + h) − F(0)]/h = 0  =  lim_{h→0⁺} [F(0 + h) − F(0)]/h

    lim_{h→0⁻} [F(1 + h) − F(1)]/h = 1  ≠  lim_{h→0⁺} [F(1 + h) − F(1)]/h = 0
2.4.5 Example
Find and sketch the probability density function of the random variable X with the cumu-
lative distribution function in Example 2.2.6.
Solution
By taking the derivative of F (x) we obtain
    F′(x) =
        0                                                x < 0
        d/dx [(2/5) x³] = (6/5) x²                       0 < x < 1
        d/dx [(1/5)(12x − 3x² − 7)] = (1/5)(12 − 6x)     1 < x < 2
        0                                                x > 2

We can assign any values to f(0), f(1), and f(2). For convenience we choose
f(0) = f(2) = 0 and f(1) = 6/5.
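As a quick check (not part of the original notes), the two nonzero pieces of this density integrate to 1; the short sympy computation below verifies it.

    import sympy as sp

    x = sp.symbols('x')
    total = sp.integrate(sp.Rational(6, 5) * x**2, (x, 0, 1)) \
          + sp.integrate(sp.Rational(1, 5) * (12 - 6 * x), (x, 1, 2))
    print(total)   # 1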
[Figure: Graph of the probability density function f(x) for Example 2.4.5]
Note: See Table 2.1 in Section 2.7 for a summary of the di¤erences between the properties
of a discrete and continuous random variable.
2.4.6 Example
Suppose X is a random variable with cumulative distribution function
    F(x) =
        0                   x ≤ a
        (x − a)/(b − a)     a < x ≤ b
        1                   x > b
where b > a.
(a) Sketch F (x) the cumulative distribution function of X.
(b) Find f (x) the probability density function of X and sketch it.
(c) Is it possible for f (x) to take on values greater than one?
Solution
(a) See Figure 2.4 for a sketch of the cumulative distribution function.
[Figure 2.4: Graph of the cumulative distribution function of a Uniform(a, b) random variable]

The derivative of F(x) does not exist for x = a or x = b. For convenience we define
f(a) = f(b) = 1/(b − a) so that

    f(x) = 1/(b − a)    for a ≤ x ≤ b

and f(x) = 0 otherwise.
See Figure 2.5 for a sketch of the probability density function. Note that we could de…ne
f (a) and f (b) to be any values and the cumulative distribution function would remain the
same since x = a or x = b are countably many points.
The random variable X is said to have a Uniform(a, b) distribution. We write this as
X ∼ Uniform(a, b).
(c) If a = 0 and b = 0.5 then f(x) = 2 for 0 ≤ x ≤ 0.5. This example illustrates that the
probability density function is not a probability and that the probability density function
can take on values greater than one.
The important restriction for continuous random variables is

    ∫_{−∞}^{∞} f(x) dx = 1
Figure 2.5: Graph of the probability density function of a Uniform(a; b) random variable
2.4.7 Example
Consider the function

    f(x) = θ / x^(θ+1)    for x ≥ 1

and f(x) = 0 otherwise. For what values of θ is this function a probability density function?
Solution
Using the result (2.8) from Section 2.11

    ∫_{−∞}^{∞} f(x) dx = ∫_{1}^{∞} θ / x^(θ+1) dx
                       = lim_{b→∞} ∫_{1}^{b} θ x^(−θ−1) dx
                       = lim_{b→∞} [−x^(−θ)]₁ᵇ
                       = 1 − lim_{b→∞} 1/b^θ
                       = 1    if θ > 0

Also f(x) ≥ 0 if θ > 0. Therefore f(x) is a probability density function for all θ > 0.
X is said to have a Pareto(1, θ) distribution.
A useful function for evaluating integrals associated with several named random variables
is the Gamma function.
2.4.10 Example
Suppose X is a random variable with probability density function
    f(x) = x^(α−1) e^(−x/β) / [Γ(α) β^α]    for x > 0, α > 0, β > 0

and 0 otherwise.
X is said to have a Gamma distribution with parameters α and β and we write
X ∼ Gamma(α, β).
(a) Verify that
Z1
f (x)dx = 1
1
Note: See Chapter 11 - Summary of Named Distributions. Note that the notation for
parameters used for named distributions is not necessarily the same in all textbooks. This
is especially true for distributions with two or more parameters.
Solution
(a)

    ∫_{−∞}^{∞} f(x) dx = ∫_{0}^{∞} x^(α−1) e^(−x/β) / [Γ(α) β^α] dx        let y = x/β
                       = [1 / (Γ(α) β^α)] ∫_{0}^{∞} (yβ)^(α−1) e^(−y) β dy
                       = [1 / Γ(α)] ∫_{0}^{∞} y^(α−1) e^(−y) dy
                       = Γ(α) / Γ(α)
                       = 1

Note that for α = 1 the Gamma(1, β) probability density function is

    f(x) = (1/β) e^(−x/β)    for x > 0, β > 0

which is the Exponential(β) probability density function.
[Figure: Gamma(α, β) probability density functions for (α, β) = (1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]
2.4.11 Exercise
Suppose X is a random variable with probability density function
    f(x) = (β/θ^β) x^(β−1) e^(−(x/θ)^β)    for x > 0, β > 0, θ > 0

and 0 otherwise.
X is said to have a Weibull distribution with parameters β and θ and we write
X ∼ Weibull(β, θ).
(a) Verify that
Z1
f (x)dx = 1
1
2.4.12 Exercise
Suppose X is a random variable with probability density function
    f(x) = β α^β / x^(β+1)    for x > α, α > 0, β > 0

and 0 otherwise.
X is said to have a Pareto distribution with parameters α and β and we write
X ∼ Pareto(α, β).
(a) Verify that

    ∫_{−∞}^{∞} f(x) dx = 1

or equivalently

    f(x; θ) = f₀(x − θ)    for θ ∈ ℝ

or equivalently

    f(x; θ) = (1/θ) f₁(x/θ)    for θ > 0
2.5.3 Example
Suppose X is a continuous random variable with probability density function

    f(x) = (1/θ) e^(−(x−μ)/θ)    for x ≥ μ, μ ∈ ℝ, θ > 0

and 0 otherwise.
X is said to have a Two Parameter Exponential distribution and we write
X ∼ Two Parameter Exponential(μ, θ).
(a) If X ∼ Two Parameter Exponential(μ, 1) show that μ is a location parameter for this
distribution. Sketch the probability density function for μ = −1, 0, 1 on the same graph.
(b) If X ∼ Two Parameter Exponential(0, θ) show that θ is a scale parameter for this
distribution. Sketch the probability density function for θ = 0.5, 1, 2 on the same graph.
Solution
(a) For X ∼ Two Parameter Exponential(μ, 1) the probability density function is

    f(x; μ) = e^(−(x−μ))    for x ≥ μ, μ ∈ ℝ

and 0 otherwise.
Let

    f₀(x) = f(x; μ = 0) = e^(−x)    for x > 0

and 0 otherwise. Then

    f(x; μ) = e^(−(x−μ)) = f₀(x − μ)    for all μ ∈ ℝ

[Figure: Two Parameter Exponential(μ, 1) probability density functions for μ = −1, 0, 1]
Let

    f₁(x) = f(x; θ = 1) = e^(−x)    for x > 0

and 0 otherwise. Then

    f(x; θ) = (1/θ) e^(−x/θ) = (1/θ) f₁(x/θ)    for all θ > 0

[Figure: Two Parameter Exponential(0, θ) probability density functions for θ = 0.5, 1, 2]
2.5.4 Exercise
Suppose X is a continuous random variable with probability density function

    f(x) = 1 / {πθ [1 + ((x − μ)/θ)²]}    for x ∈ ℝ, μ ∈ ℝ, θ > 0

and 0 otherwise.
X is said to have a two parameter Cauchy distribution and we write
X ∼ Cauchy(μ, θ).
(a) If X ∼ Cauchy(μ, 1) then show that μ is a location parameter for the distribution. Graph
the Cauchy(μ, 1) probability density function for μ = −1, 0 and 1 on the same graph.
(b) If X ∼ Cauchy(0, θ) then show that θ is a scale parameter for the distribution. Graph
the Cauchy(0, θ) probability density function for θ = 0.5, 1 and 2 on the same graph.
2.6.2 Example
If Z ∼ N(0, 1), find the distribution of Y = Z².
Solution
The probability density function of Z is

    f(z) = (1/√(2π)) e^(−z²/2)    for z ∈ ℝ
For y ≥ 0 the cumulative distribution function of Y = Z² is

    G(y) = P(Y ≤ y)
         = P(Z² ≤ y)
         = P(−√y ≤ Z ≤ √y)
         = ∫_{−√y}^{√y} (1/√(2π)) e^(−z²/2) dz
         = 2 ∫_{0}^{√y} (1/√(2π)) e^(−z²/2) dz    since f(z) is an even function
by 2.11.10.
We recognize (2.3) as the probability density function of a Chi-squared(1) random vari-
able so Y = Z 2 2 (1).
2.6.3 Theorem
If Z ∼ N(0, 1) then Z² ∼ χ²(1).
2.6.4 Example
Suppose X ∼ Exponential(θ). The cumulative distribution function of X is

    F(x) = P(X ≤ x) =  0                 x ≤ 0
                       1 − e^(−x/θ)      x > 0

Find the distribution of Y = F(X) = 1 − e^(−X/θ).
Solution
For 0 < y < 1 the cumulative distribution function of Y is

    G(y) = P(Y ≤ y) = P(1 − e^(−X/θ) ≤ y)
         = P(X ≤ −θ log(1 − y))
         = F(−θ log(1 − y))
         = 1 − e^(log(1−y))
         = 1 − (1 − y)
         = y

which is the cumulative distribution function of a Uniform(0, 1) random variable.
This is an example of a result which holds more generally as summarized in the following
theorem.
Proof
Suppose the continuous random variable X has support set A = fx : f (x) > 0g. For all
x 2 A, F is an increasing function since F is a cumulative distribution function. Therefore
for all x 2 A the function F has an inverse function F 1 .
For 0 < y < 1, the cumulative distribution function of Y = F(X) is

    G(y) = P(Y ≤ y) = P(F(X) ≤ y)
         = P(X ≤ F⁻¹(y))
         = F(F⁻¹(y))
         = y
Note: Because of the form of the function (transformation) Y = F (X) in (2.4), this
transformation is called the probability integral transformation. This result holds for any
cumulative distribution function F corresponding to a continuous random variable.
2.6.6 Theorem
Suppose F is a cumulative distribution function for a continuous random variable. If
U ∼ Uniform(0, 1) then the random variable X = F⁻¹(U) also has cumulative distribution
function F.
Proof
Suppose that the support set of the random variable X = F⁻¹(U) is A. For x ∈ A, the
cumulative distribution function of X = F⁻¹(U) is

    P(X ≤ x) = P(F⁻¹(U) ≤ x)
             = P(U ≤ F(x))
             = F(x)
Note: The result of the previous theorem is important because it provides a method for
generating observations from a continuous distribution. Let u be an observation generated
from a Uniform(0; 1) distribution using a random number generator. Then by Theorem
2.6.6, x = F 1 (u) is an observation from the distribution with cumulative distribution
function F .
2.6.7 Example
Explain how Theorem 2.6.6 can be used to generate observations from a Weibull(β, 1)
distribution.
Solution
If X has a Weibull(β, 1) distribution then the cumulative distribution function is

    F(x; β) = ∫₀ˣ β y^(β−1) e^(−y^β) dy
            = 1 − e^(−x^β)    for x > 0

Solving u = F(x; β) for x gives

    F⁻¹(u) = [−log(1 − u)]^(1/β)    for 0 < u < 1
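The following is a small illustration (not from the original notes) of this sampling method in code; it assumes the Weibull(β, 1) form F(x) = 1 − e^(−x^β) derived above.

    import numpy as np
    from math import gamma

    def rweibull_shape_beta(n, beta, rng=None):
        # inverse-CDF sampling: X = F^{-1}(U) = [-log(1 - U)]**(1/beta)
        rng = np.random.default_rng() if rng is None else rng
        u = rng.uniform(size=n)  # U ~ Uniform(0, 1)
        return (-np.log(1.0 - u)) ** (1.0 / beta)

    # quick check: for beta = 2 the theoretical mean is Gamma(1/2 + 1), about 0.886
    x = rweibull_shape_beta(100_000, beta=2.0)
    print(x.mean(), gamma(0.5 + 1))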
If we wish to …nd the distribution of the random variable Y = h(X) and h is a one-to-one
real-valued function then the following theorem can be used.
    g(y) = f(h⁻¹(y)) |d/dy h⁻¹(y)|    for y ∈ B
Proof
We prove this theorem using the cumulative distribution function technique.
(1) Suppose h is an increasing function and (d/dx) h(x) is continuous for x ∈ A. Then h⁻¹(y)
is also an increasing function and (d/dy) h⁻¹(y) > 0 for y ∈ B. The cumulative distribution
function of Y = h(X) is

    G(y) = P(Y ≤ y) = P(h(X) ≤ y)
         = P(X ≤ h⁻¹(y))    since h is an increasing function
         = F(h⁻¹(y))

Therefore

    g(y) = (d/dy) G(y) = (d/dy) F(h⁻¹(y))
         = F′(h⁻¹(y)) (d/dy) h⁻¹(y)    by the Chain Rule
         = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|    for y ∈ B, since (d/dy) h⁻¹(y) > 0

(2) Suppose h is a decreasing function and (d/dx) h(x) is continuous for x ∈ A. Then h⁻¹(y)
is also a decreasing function and (d/dy) h⁻¹(y) < 0 for y ∈ B. The cumulative distribution
function of Y = h(X) is

    G(y) = P(Y ≤ y) = P(h(X) ≤ y)
         = P(X ≥ h⁻¹(y))    since h is a decreasing function
         = 1 − F(h⁻¹(y))

Therefore

    g(y) = (d/dy) G(y) = (d/dy) [1 − F(h⁻¹(y))]
         = −F′(h⁻¹(y)) (d/dy) h⁻¹(y)    by the Chain Rule
         = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|    for y ∈ B, since (d/dy) h⁻¹(y) < 0
These two cases give the desired result.
2.6.9 Example
The following two results were used extensively in your previous probability and statistics
courses.
(a) If Z ∼ N(0, 1) then Y = μ + σZ ∼ N(μ, σ²).
(b) If X ∼ N(μ, σ²) then Z = (X − μ)/σ ∼ N(0, 1).
Prove these results using Theorem 2.6.8.
Solution
(a) If Z ∼ N(0, 1) then the probability density function of Z is

    f(z) = (1/√(2π)) e^(−z²/2)    for z ∈ ℝ

Y = μ + σZ = h(Z) is an increasing function with inverse function Z = h⁻¹(Y) = (Y − μ)/σ.
Since the support set of Z is A = ℝ, the support set of Y is B = ℝ.
Since

    (d/dy) h⁻¹(y) = (d/dy) [(y − μ)/σ] = 1/σ

then by Theorem 2.6.8 the probability density function of Y is

    g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|
         = [1/(σ√(2π))] e^(−(y−μ)²/(2σ²))    for y ∈ ℝ

which is the probability density function of an N(μ, σ²) random variable.
Therefore if Z ∼ N(0, 1) then Y = μ + σZ ∼ N(μ, σ²).
(b) If X ∼ N(μ, σ²) then the probability density function of X is

    f(x) = [1/(σ√(2π))] e^(−(x−μ)²/(2σ²))    for x ∈ ℝ

Z = (X − μ)/σ = h(X) is an increasing function with inverse function X = h⁻¹(Z) = μ + σZ.
Since the support set of X is A = ℝ, the support set of Z is B = ℝ.
Since

    (d/dz) h⁻¹(z) = (d/dz) (μ + σz) = σ

then by Theorem 2.6.8 the probability density function of Z is

    g(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|
         = (1/√(2π)) e^(−z²/2)    for z ∈ ℝ

which is the probability density function of an N(0, 1) random variable.
Therefore if X ∼ N(μ, σ²) then Z = (X − μ)/σ ∼ N(0, 1).
2.6.10 Example
Use Theorem 2.6.8 to prove the following relationship between the Pareto distribution and
the Exponential distribution.
If X ∼ Pareto(1, θ) then Y = log(X) ∼ Exponential(1/θ).
Solution
If X ∼ Pareto(1, θ) then the probability density function of X is

    f(x) = θ / x^(θ+1)    for x ≥ 1, θ > 0

Y = log(X) = h(X) is an increasing function with inverse function X = e^Y = h⁻¹(Y).
Since the support set of X is A = {x : x ≥ 1}, the support set of Y is B = {y : y > 0}.
Since

    (d/dy) h⁻¹(y) = (d/dy) e^y = e^y

then by Theorem 2.6.8 the probability density function of Y is

    g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|
         = [θ / (e^y)^(θ+1)] e^y = θ e^(−θy)    for y > 0

which is the probability density function of an Exponential(1/θ) random variable as required.
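A quick simulation check of this result (not part of the original notes): it uses the Pareto(1, θ) cumulative distribution function F(x) = 1 − x^(−θ) for x ≥ 1, so X can be generated as X = (1 − U)^(−1/θ) with U ∼ Uniform(0, 1).

    import numpy as np

    theta = 2.5
    rng = np.random.default_rng(1)
    u = rng.uniform(size=200_000)
    x = (1.0 - u) ** (-1.0 / theta)   # X ~ Pareto(1, theta) by inverse-CDF sampling
    y = np.log(x)                     # should behave like Exponential with mean 1/theta

    print(y.mean(), 1 / theta)        # both approximately 0.4
    print(y.var(), 1 / theta**2)      # both approximately 0.16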
2.6.11 Exercise
Use Theorem 2.6.8 to prove the following relationship between the Exponential distribution
and the Weibull distribution.
2.6.12 Exercise
Suppose X is a random variable with probability density function

    f(x) = θ x^(θ−1)    for 0 < x < 1, θ > 0

and 0 otherwise.
Use Theorem 2.6.8 to prove that Y = −log X ∼ Exponential(1/θ).
2.7 Expectation
In this section we define the expectation operator E which maps random variables to real
numbers. These numbers have an interpretation in terms of long run averages for repeated
independent trials of an experiment associated with the random variable. Much of this
section is a review of material covered in a previous probability course.
If X is a continuous random variable with probability density function f (x) then the ex-
pectation of the random variable h (X) is de…ned by
    E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx
2.7.2 Example
Find E(X) if X Geometric(p).
Solution
If X ∼ Geometric(p) then f(x) = p qˣ for x = 0, 1, . . . where q = 1 − p, and

    E(X) = Σ_{x∈A} x f(x) = Σ_{x=0}^{∞} x p qˣ = Σ_{x=1}^{∞} x p qˣ
         = pq Σ_{x=1}^{∞} x q^(x−1)    which converges if 0 < q < 1
         = pq / (1 − q)²    by 2.11.2(2)
         = q/p    if 0 < q < 1
2.7.3 Example
Suppose X ∼ Pareto(1, θ) with probability density function

    f(x) = θ / x^(θ+1)    for x ≥ 1, θ > 0

and 0 otherwise. Find E(X). For what values of θ does E(X) exist?
Solution

    E(X) = ∫_{1}^{∞} x f(x) dx = ∫_{1}^{∞} x θ / x^(θ+1) dx
         = ∫_{1}^{∞} θ / x^θ dx    which converges for θ > 1 by 2.8
         = lim_{b→∞} ∫_{1}^{b} θ x^(−θ) dx = lim_{b→∞} [θ/(1 − θ)] x^(−θ+1) |₁ᵇ = [θ/(θ − 1)] (1 − lim_{b→∞} b^(1−θ))
         = θ/(θ − 1)    for θ > 1
2.7.4 Exercise
Suppose X is a nonnegative continuous random variable with cumulative distribution func-
tion F (x) and E(X) < 1. Show that
    E(X) = ∫_{0}^{∞} [1 − F(x)] dx
E (aX + b) = aE (X) + b
E [ag(X) + bh(X)] = aE [g(X)] + bE [h(X)]
    E[ag(X) + bh(X)] = ∫_{−∞}^{∞} [a g(x) + b h(x)] f(x) dx
                     = a ∫_{−∞}^{∞} g(x) f(x) dx + b ∫_{−∞}^{∞} h(x) f(x) dx    by properties of integrals
                     = a E[g(X)] + b E[h(X)]    by Definition 2.7.1
as required.
E(Xᵏ)

    Var(X) = E[(X − μ)²] = σ²    where μ = E(X)

    σ² = Var(X)
       = E(X²) − μ²
       = E[X(X − 1)] + μ − μ²

    Var(aX + b) = a² Var(X)

and

    E(X²) = σ² + μ²
    Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²]
           = E(X²) − 2μ E(X) + μ²    by Theorem 2.7.5
           = E(X²) − 2μ² + μ²
           = E(X²) − μ²

Also

    Var(X) = E(X²) − μ²
           = E[X(X − 1) + X] − μ²
           = E[X(X − 1)] + E(X) − μ²    by Theorem 2.7.5
           = E[X(X − 1)] + μ − μ²

    Var(aX + b) = E{[aX + b − (aμ + b)]²}
                = E[(aX − aμ)²] = E[a²(X − μ)²]
                = a² E[(X − μ)²]    by Theorem 2.7.5
                = a² Var(X)    by definition

and

    E(X²) = σ² + μ²

2.7.8 Example
If X ∼ Binomial(n, p) find E(X^(k)), the kth factorial moment of X, and use it to find E(X) and Var(X).
Solution
    E(X^(k)) = Σ_{x=k}^{n} x^(k) \binom{n}{x} pˣ (1 − p)^(n−x)
             = n^(k) Σ_{x=k}^{n} \binom{n−k}{x−k} pˣ (1 − p)^(n−x)    by 2.11.4(1)
             = n^(k) Σ_{y=0}^{n−k} \binom{n−k}{y} p^(y+k) (1 − p)^(n−y−k)    letting y = x − k
             = n^(k) pᵏ Σ_{y=0}^{n−k} \binom{n−k}{y} p^y (1 − p)^((n−k)−y)
             = n^(k) pᵏ (p + 1 − p)^(n−k)    by 2.11.3(1)
             = n^(k) pᵏ    for k = 1, 2, . . .

For k = 1 we obtain

    E(X^(1)) = E(X) = n^(1) p¹ = np

For k = 2 we obtain

    E(X^(2)) = E[X(X − 1)] = n^(2) p² = n(n − 1) p²

Therefore

    Var(X) = E[X(X − 1)] + μ − μ²
           = n(n − 1)p² + np − n²p²
           = −np² + np
           = np(1 − p)
2.7.9 Exercise
Show the following:
(a) If X ∼ Poisson(λ) then E(X^(k)) = λᵏ for k = 1, 2, . . .
(b) If X ∼ Negative Binomial(k, p) then E(X^(j)) = (−k)^(j) [(p − 1)/p]^j for j = 1, 2, . . .
(c) If X ∼ Gamma(α, β) then E(X^p) = β^p Γ(α + p)/Γ(α) for p > −α.
(d) If X ∼ Weibull(β, θ) then E(Xᵏ) = θᵏ Γ(k/β + 1) for k = 1, 2, . . .
In each case …nd E (X) and V ar(X).
Table 2.1 summarizes the di¤erences between the properties of a discrete and continuous
random variable.
                   Discrete                                      Continuous

 c.d.f.            F(x) = P(X ≤ x) = Σ_{t≤x} P(X = t)            F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
                   F is a right continuous                       F is a continuous
                   function for all x ∈ ℝ                        function for all x ∈ ℝ

 Probability       P(X ∈ E) = Σ_{x∈E} P(X = x)                   P(a < X ≤ b) = F(b) − F(a)
 of an event                = Σ_{x∈E} f(x)                                    = ∫_{a}^{b} f(x) dx

 Total             Σ_{x∈A} P(X = x) = Σ_{x∈A} f(x) = 1           ∫_{−∞}^{∞} f(x) dx = 1
 Probability       where A = support set of X

 Expectation       E[g(X)] = Σ_{x∈A} g(x) f(x)                   E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
                   where A = support set of X
2.8 Inequalities
In Chapter 5 we consider limiting distributions of a sequence of random variables. The
following inequalities which involve the moments of a distribution are useful for proving
limit theorems.
    P(|X| ≥ c) ≤ E(|X|ᵏ) / cᵏ    for all k, c > 0
Proof (Continuous Case)
Suppose X is a continuous random variable with probability density function f(x). Let

    A = {x : |x/c|ᵏ ≥ 1} = {x : |x| ≥ c}    since c > 0

Then

    E(|X|ᵏ)/cᵏ = E(|X/c|ᵏ) = ∫_{−∞}^{∞} |x/c|ᵏ f(x) dx
               = ∫_A |x/c|ᵏ f(x) dx + ∫_Ā |x/c|ᵏ f(x) dx
               ≥ ∫_A |x/c|ᵏ f(x) dx    since ∫_Ā |x/c|ᵏ f(x) dx ≥ 0
               ≥ ∫_A f(x) dx    since |x/c|ᵏ ≥ 1 for x ∈ A
               = P(|X| ≥ c)
as required. (The proof of the discrete case follows by replacing integrals with sums.)
2.8.3 Exercise
Use Markov’s Inequality to prove Chebyshev’s Inequality.
    Y = g(X) ≈ g(μ) + g′(μ)(X − μ)

Therefore

    E(Y) ≈ E[g(μ) + g′(μ)(X − μ)] = g(μ)

since

    E[g′(μ)(X − μ)] = g′(μ) E[(X − μ)] = 0

Also

    Var(Y) ≈ Var[g′(μ)(X − μ)] = [g′(μ)]² Var(X) = [g′(μ)]² σ²(μ)        (2.5)

If we want Var(Y) constant with respect to μ then we should choose g such that

    [g′(μ)]² Var(X) = [g′(μ)]² σ²(μ) = constant
2.9.1 Example
If X ∼ Poisson(μ) then show that the random variable Y = g(X) = √X has approximately
constant variance.
Solution
If X ∼ Poisson(μ) then Var(X) = σ²(μ) = μ. For g(X) = √X, g′(X) = (1/2) X^(−1/2).
Therefore by (2.5), the variance of Y = g(X) = √X is approximately

    [g′(μ)]² σ²(μ) = [(1/2) μ^(−1/2)]² μ = 1/4

which is a constant.
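A small simulation (not part of the original notes) makes the point concrete: the variance of √X stays near 1/4 as the Poisson mean changes, while the variance of X itself grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for mu in [2, 5, 10, 50]:
        x = rng.poisson(mu, size=200_000)
        print(mu, x.var(), np.sqrt(x).var())   # Var(X) close to mu, Var(sqrt(X)) close to 0.25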
2.9.2 Exercise
If X Exponential( ) then show that the random variable Y = g (X) = log X has approx-
imately constant variance.
Important: When determining the moment generating function M (t) of a random variable
the values of t for which the expectation exists should always be stated.
2.10.2 Example
(a) Find the moment generating function of the random variable X Gamma( ; ).
(b) Find the moment generating function of the random variable X Negative Binomial(k; p).
Solution
(a) If X ∼ Gamma(α, β) then

    M(t) = E(e^(tX)) = ∫_{0}^{∞} e^(tx) x^(α−1) e^(−x/β) / [Γ(α) β^α] dx
         = [1/(Γ(α) β^α)] ∫_{0}^{∞} x^(α−1) e^(−x(1/β − t)) dx    which converges for t < 1/β
         = [1/(Γ(α) β^α)] ∫_{0}^{∞} x^(α−1) e^(−x(1−βt)/β) dx        let y = x(1 − βt)/β
         = [1/(Γ(α) β^α)] ∫_{0}^{∞} [βy/(1 − βt)]^(α−1) e^(−y) β/(1 − βt) dy
         = (1 − βt)^(−α) [1/Γ(α)] ∫_{0}^{∞} y^(α−1) e^(−y) dy = (1 − βt)^(−α) Γ(α)/Γ(α)
         = [1/(1 − βt)]^α    for t < 1/β

(b) If X ∼ Negative Binomial(k, p) then

    M(t) = Σ_{x=0}^{∞} e^(tx) \binom{−k}{x} pᵏ (−q)ˣ    where q = 1 − p
         = pᵏ Σ_{x=0}^{∞} \binom{−k}{x} (−q eᵗ)ˣ    which converges for q eᵗ < 1
         = pᵏ (1 − q eᵗ)^(−k)    by 2.11.3(2), for eᵗ < q⁻¹
         = [p/(1 − q eᵗ)]ᵏ    for t < −log(q)
2.10.3 Exercise
(a) Show that the moment generating function of the random variable X ∼ Binomial(n, p)
is M(t) = (q + p eᵗ)ⁿ for t ∈ ℝ.
(b) Show that the moment generating function of the random variable X ∼ Poisson(λ) is
M(t) = e^(λ(eᵗ − 1)) for t ∈ ℝ.
If the moment generating function of random variable X exists then the following theo-
rem gives us a method for determining the distribution of the random variable Y = aX + b
which is a linear function of X.
Proof
as required.
2.10.5 Example
(a) Find the moment generating function of Z ∼ N(0, 1).
(b) Use (a) and the fact that X = μ + σZ ∼ N(μ, σ²) to find the moment generating function
of a N(μ, σ²) random variable.
Solution
(a) The moment generating function of Z is

    M_Z(t) = ∫_{−∞}^{∞} e^(tz) (1/√(2π)) e^(−z²/2) dz
           = ∫_{−∞}^{∞} (1/√(2π)) e^(−(z² − 2tz)/2) dz
           = e^(t²/2) ∫_{−∞}^{∞} (1/√(2π)) e^(−(z−t)²/2) dz
           = e^(t²/2)    for t ∈ ℝ

by 2.4.4(2) since

    (1/√(2π)) e^(−(z−t)²/2)

is the probability density function of a N(t, 1) random variable.
(b) By Theorem 2.10.4 the moment generating function of X = μ + σZ is

    M_X(t) = e^(μt) M_Z(σt)
           = e^(μt) e^((σt)²/2)
           = e^(μt + σ²t²/2)    for t ∈ ℝ
2.10.6 Exercise
If X Negative Binomial(k; p) then …nd the moment generating function of Y = X + k,
k = 1; 2; : : :
where
dk
M (k) (t) = M (t)
dtk
is the kth derivative of M (t).
    M^(k)(t) = (dᵏ/dtᵏ) E(e^(tX))
             = (dᵏ/dtᵏ) ∫_{−∞}^{∞} e^(tx) f(x) dx
             = ∫_{−∞}^{∞} (∂ᵏ/∂tᵏ) e^(tx) f(x) dx    k = 1, 2, . . .
assuming the operations of di¤erentiation and integration can be exchanged. (This inter-
change of operations cannot always be done but for the moment generating functions of
interest in this course the result does hold.)
Using (2.6) we have

    M^(k)(t) = ∫_{−∞}^{∞} xᵏ e^(tx) f(x) dx

Letting t = 0 we obtain

    M^(k)(0) = E(Xᵏ)    k = 1, 2, . . .
as required.
2.10.8 Example
If X ∼ Gamma(α, β) then M(t) = (1 − βt)^(−α), t < 1/β. Find E(Xᵏ), k = 1, 2, . . . using
Theorem 2.10.7.
Solution
    M′(t) = (d/dt)(1 − βt)^(−α) = αβ(1 − βt)^(−α−1)

so

    E(X) = M′(0) = αβ

    M″(t) = (d²/dt²)(1 − βt)^(−α) = α(α + 1)β²(1 − βt)^(−α−2)

so

    E(X²) = M″(0) = α(α + 1)β²

In general

    M^(k)(t) = (dᵏ/dtᵏ)(1 − βt)^(−α) = α(α + 1)···(α + k − 1)βᵏ(1 − βt)^(−α−k)

so

    E(Xᵏ) = M^(k)(0)
          = α(α + 1)···(α + k − 1)βᵏ
          = (α + k − 1)^(k) βᵏ    for k = 1, 2, . . .
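As an illustration (not part of the original notes), these moments can be checked symbolically; the sketch below assumes sympy is available.

    import sympy as sp

    t, alpha, beta = sp.symbols('t alpha beta', positive=True)
    M = (1 - beta * t) ** (-alpha)          # Gamma(alpha, beta) moment generating function

    for k in (1, 2, 3):
        moment = sp.diff(M, t, k).subs(t, 0)   # E(X^k) = M^(k)(0)
        print(k, sp.simplify(moment))
    # k = 1, 2, 3 give alpha*beta, alpha*(alpha+1)*beta^2, alpha*(alpha+1)*(alpha+2)*beta^3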
Suppose M^(k)(t), k = 1, 2, . . . exists for t ∈ (−h, h) for some h > 0. Then M(t) has a
Maclaurin series given by

    M(t) = Σ_{k=0}^{∞} [M^(k)(0)/k!] tᵏ

where

    M^(0)(0) = M(0) = 1

Therefore if we can obtain a Maclaurin series for M(t), for example by using the Binomial
series or the Exponential series, then we can find E(Xᵏ) using the fact that the coefficient of
tᵏ in the series equals M^(k)(0)/k! = E(Xᵏ)/k!.        (2.7)
2.10.10 Example
Suppose X ∼ Gamma(α, β). Find E(Xᵏ) by using the Binomial series expansion for
M(t) = (1 − βt)^(−α), t < 1/β.
Solution

    M(t) = (1 − βt)^(−α)
         = Σ_{k=0}^{∞} \binom{−α}{k} (−βt)ᵏ    for |βt| < 1, by 2.11.3(2)
         = Σ_{k=0}^{∞} \binom{−α}{k} (−β)ᵏ tᵏ    for |t| < 1/β

By (2.7), E(Xᵏ) is k! times the coefficient of tᵏ, so

    E(Xᵏ) = k! \binom{−α}{k} (−β)ᵏ
          = k! [(−α)^(k)/k!] (−β)ᵏ
          = (−α)(−α − 1)···(−α − k + 2)(−α − k + 1)(−β)ᵏ
          = (α + k − 1)(α + k − 2)···(α + 1)α βᵏ
          = (α + k − 1)^(k) βᵏ    for k = 1, 2, . . .
Proof
See Problem 18 for the proof of this result in the discrete case.
2.10.12 Example
If X Exponential(1) then …nd the distribution of Y = + X where > 0 and 2 <.
Solution
From Example 2.4.10 we know that if X Exponential(1) then X Gamma(1; 1) so
1
MX (t) = for t < 1
1 t
By Theorem 2.10.4
2.10.13 Example
If X ∼ Gamma(α, β), where α is a positive integer, then show 2X/β ∼ χ²(2α).
Solution
From Example 2.10.2 the moment generating function of X is

    M(t) = [1/(1 − βt)]^α    for t < 1/β

By Theorem 2.10.4 the moment generating function of Y = 2X/β is

    M_Y(t) = M_X(2t/β)    for |t| < 1/2
           = [1/(1 − β(2t/β))]^α    for |t| < 1/2
           = [1/(1 − 2t)]^α    for |t| < 1/2

By examining the list of moment generating functions in Chapter 11 we see that this is
the moment generating function of a χ²(2α) random variable if α is a positive integer.
Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a χ²(2α)
distribution if α is a positive integer.
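A quick numerical sanity check (not from the original notes): simulate Gamma(α, β) values, transform, and compare the first two moments with those of χ²(2α), which are 2α and 4α. The code assumes numpy's gamma generator uses the same shape/scale convention as above.

    import numpy as np

    alpha, beta = 3, 1.7
    rng = np.random.default_rng(2)
    x = rng.gamma(shape=alpha, scale=beta, size=300_000)
    y = 2 * x / beta

    print(y.mean(), 2 * alpha)   # both approximately 6
    print(y.var(), 4 * alpha)    # both approximately 12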
2.10.14 Exercise
Suppose the random variable X has moment generating function

    M(t) = e^(t²/2)    for t ∈ ℝ

(a) Use (2.7) and the Exponential series 2.11.7 to find E(X) and Var(X).
(b) Find the moment generating function of Y = 2X − 1. What is the distribution of Y?
(2)
    Σ_{x=1}^{∞} x t^(x−1) = 1/(1 − t)²    for |t| < 1

(1)
    (a + b)ⁿ = Σ_{x=0}^{n} \binom{n}{x} aˣ b^(n−x)
where
    \binom{n}{x} = n! / [x!(n − x)!] = n^(x)/x!
(2)
    (1 + t)ⁿ = Σ_{x=0}^{∞} \binom{n}{x} tˣ
where
    \binom{n}{x} = n^(x)/x! = n(n − 1)···(n − x + 1)/x!

(1)
    x^(k) \binom{n}{x} = n^(k) \binom{n−k}{x−k}
(2)
    \binom{x+k−1}{x} = \binom{x+k−1}{k−1} = (−1)ˣ \binom{−k}{x}

    (a₁ + a₂ + · · · + a_k)ⁿ = Σ Σ · · · Σ [n! / (x₁! x₂! · · · x_k!)] a₁^(x₁) a₂^(x₂) · · · a_k^(x_k)

    Σ_{x=0}^{∞} \binom{a}{x} \binom{b}{n−x} = \binom{a+b}{n}

    eˣ = 1 + x/1! + x²/2! + · · · = Σ_{n=0}^{∞} xⁿ/n!    for x ∈ ℝ

    ln(1 + x) = log(1 + x) = x − x²/2 + x³/3 − · · ·    for −1 < x ≤ 1
Zx
g (x) = f (t) dt for a x b
a
h(x)
Z
G (x) = f (t) dt for a x b
a
Zu
g (u) = f (t) dt
a
Z1 Zb
f (x) dx = lim f (x) dx
b!1
a a
provided this limit exists. If the limit exists we say the improper integral converges otherwise
we say the improper integral diverges.
Rb
(b) If f (x) dx exists for every number a b then
a
Zb Zb
f (x) dx = lim f (x) dx
a! 1
1 a
Z1 Za Z1
f (x) dx = f (x) dx + f (x) dx
1 1 a
(b)
Z1 Z1
If g (x) dx is divergent then f (x) dx is divergent.
a a
5. The Geometric and Exponential distributions both have a property referred to as the
memoryless property.
6. Suppose that f1 (x) ; f2 (x) ; : : : ; fk (x) are probability density functions with support
sets A1 ; A2 ; : : : ; Ak ; means 1 ; 2 ; : : : ; k ; and …nite variances 21 ; 22 ; : : : ; 2k respec-
Pk
tively. Suppose that 0 < p1 ; p2 ; : : : ; pk < 1 and pi = 1.
i=1
P
k
(a) Show that g (x) = pi fi (x) is a probability density function.
i=1
(b) Let X be a random variable with probability density function g (x). Find the
support set of X, E (X) and V ar (X).
7.
8. Suppose T t(n).
(a) Find E X k for k = 1; 2; : : :. Use this result to …nd E (X) and V ar (X).
(b) Graph the probability density function for (i) a = 0:7, b = 0:7, (ii) a = 1, b = 3,
(iii) a = 2, b = 2, (iv) a = 2, b = 4, and (v) a = 3, b = 1 on the same graph.
(c) What special probability density function is obtained for a = b = 1?
10. If E(jXjk ) exists for some integer k > 1, then show that E(jXjj ) exists for
j = 1; 2; : : : ; k 1.
11. If X Binomial(n; ), …nd the variance stabilizing transformation g (X) such that
V ar [g (X)] is approximately constant.
1 1
E X4 P X2
4 2
13. For each of the following probability (density) functions derive the moment generating
function M (t). State the values for which M (t) exists and use the moment generating
function to …nd the mean and variance.
n x
(a) f (x) = x p (1 p)n x for x = 0; 1; : : : ; n; 0 < p < 1
(b) f (x) = xe =x! for x = 0; 1; : : :; >0
(c) f (x) = 1 e (x )= for x > ; 2 <, >0
(d) f (x) = 21 e jx j for x 2 <; 2<
(e) f (x) = 2x for 0 < x < 1
8
>
<x
> 0 x 1
(f) f (x) = 2 x 1<x 2
>
>
:0 otherwise
14. Suppose X is a random variable with moment generating function M (t) = E(etX )
which exists for t 2 ( h; h) for some h > 0. Then K(t) = log M (t) is called the
cumulant generating function of X.
(b) If X Negative Binomial(k; p) then use (a) to …nd E(X) and V ar(X).
15. For each of the following …nd the Maclaurin series for M (t) using known series. Thus
determine all the moments of X if X is a random variable with moment generating
function M (t):
17. Suppose X 2 and Z N(0; 1). Use the properties of moment generating func-
(1)
tions to compute E X and E Z k for k = 1; 2; : : : How are these two related? Is
k
18. Suppose X and Y are discrete random variables such that P (X = j) = pj and
P (Y = j) = qj for j = 0; 1; : : :. Suppose also that MX (t) = MY (t) for t 2 ( h; h),
h > 0. Show that X and Y have the same distribution. (Hint: Compare MX (log s)
and MY (log s) and recall that if two power series are equal then their coe¢ cients are
equal.)
19. Suppose X is a random variable with moment generating function M (t) = et =(1 t2 )
for jtj < 1.
Models for real phenomena usually involve more than a single random variable. When there
are multiple random variables associated with an experiment or process we usually denote
them as X; Y; : : : or as X1 ; X2 ; : : : . For example, your …nal mark in a course might involve
X1 = your assignment mark, X2 = your midterm test mark, and X3 = your exam mark.
We need to extend the ideas introduced in Chapter 2 for univariate random variables to
deal with multivariate random variables.
In Section 3.1 we begin by defining the joint and marginal cumulative distribution
functions since these definitions hold regardless of what type of random variable we have.
We define these functions in the case of two random variables X and Y. More than two
random variables will be considered in specific examples in later sections. In Section 3.2 we
briefly review discrete joint probability functions and marginal probability functions that
were introduced in a previous probability course. In Section 3.3 we introduce the ideas
needed for two continuous random variables and look at detailed examples since this is new
material. In Section 3.4 we define independence for two random variables and show how the
Factorization Theorem for Independence can be used. When two random variables are not
independent then we are interested in conditional distributions. In Section 3.5 we review
the definition of a conditional probability function for discrete random variables and define
a conditional probability density function for continuous random variables which is new
material. In Section 3.6 we review expectations of functions of discrete random variables.
We also define expectations of functions of continuous random variables which is new ma-
terial except for the case of Normal random variables. In Section 3.7 we define conditional
expectations which arise from the conditional distributions discussed in Section 3.5. In
Section 3.8 we discuss moment generating functions for two or more random variables, and
show how the Factorization Theorem for Moment Generating Functions can be used to
prove that random variables are independent. In Section 3.9 we review the Multinomial
distribution and its properties. In Section 3.10 we introduce the very important Bivariate
Normal distribution and its properties. Section 3.11 contains some useful results related to
evaluating double integrals.
Note: The de…nitions and properties of the joint cumulative distribution function and the
marginal cumulative distribution functions hold for both (X; Y ) discrete random variables
and for (X; Y ) continuous random variables.
Joint and marginal cumulative distribution functions for discrete random variables are not
very convenient for determining probabilities. Joint and marginal probability functions,
which are de…ned in Section 3.2, are more frequently used for discrete random variables.
In Section 3.3 we look at speci…c examples of joint and marginal cumulative distribution
functions for continuous random variables. In Chapter 5 we will see the important role of
cumulative distribution functions in determining asymptotic distributions.
The set A = {(x, y) : f(x, y) > 0} is called the support set of (X, Y).

    f₁(x) = P(X = x) = Σ_{all y} f(x, y)    for x ∈ ℝ

    f₂(y) = P(Y = y) = Σ_{all x} f(x, y)    for y ∈ ℝ
3.2.4 Example
In a fourth year statistics course there are 10 actuarial science students, 9 statistics students
and 6 math business students. Five students are selected at random without replacement.
Let X be the number of actuarial science students selected, and let Y be the number of
statistics students selected.
Find
(d) P (X > Y )
Solution
(a) The joint probability function of X and Y is

    f(x, y) = P(X = x, Y = y) = \binom{10}{x}\binom{9}{y}\binom{6}{5−x−y} / \binom{25}{5}

for x = 0, 1, . . . , 5, y = 0, 1, . . . , 5, x + y ≤ 5.
(b) The marginal probability function of X is

    f₁(x) = P(X = x) = Σ_{y} \binom{10}{x}\binom{9}{y}\binom{6}{5−x−y} / \binom{25}{5}
          = [\binom{10}{x}\binom{15}{5−x} / \binom{25}{5}] Σ_{y} \binom{9}{y}\binom{6}{5−x−y} / \binom{15}{5−x}
          = \binom{10}{x}\binom{15}{5−x} / \binom{25}{5}    for x = 0, 1, . . . , 5

by the Hypergeometric identity 2.11.6. Note that the marginal probability function of X
is Hypergeometric(25, 10, 5). This makes sense because, when we are only interested in the
number of actuarial science students, we only have two types of objects (actuarial science
students and non-actuarial science students) and we are sampling without replacement
which gives us the familiar Hypergeometric probability function.
Similarly, the marginal probability function of Y is

    f₂(y) = P(Y = y) = Σ_{x} \binom{10}{x}\binom{9}{y}\binom{6}{5−x−y} / \binom{25}{5}
          = [\binom{9}{y}\binom{16}{5−y} / \binom{25}{5}] Σ_{x} \binom{10}{x}\binom{6}{5−x−y} / \binom{16}{5−y}
          = \binom{9}{y}\binom{16}{5−y} / \binom{25}{5}    for y = 0, 1, . . . , 5
3.2.5 Exercise
The Hardy-Weinberg law of genetics states that, under certain conditions, the relative
frequencies with which three genotypes AA; Aa and aa occur in the population will be 2 ;
2 (1 ) and (1 )2 respectively where 0 < < 1. Suppose n members of a very large
population are selected at random.
Let X be the number of AA types selected and let Y be the number of Aa types selected.
Find
(a) the joint probability function of X and Y
(b) the marginal probability function of X
(c) the marginal probability function of Y
(d) P (X + Y = t) for t = 0; 1; : : :.
    f(x, y) = ∂²F(x, y) / ∂x∂y

exists and is a continuous function except possibly along a finite number of curves. Suppose
also that

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Then X and Y are said to be continuous random variables with joint probability density
function f . The set A = f(x; y) : f (x; y) > 0g is called the support set of (X; Y ).
@2
Note: We will arbitrarily de…ne f (x; y) to be equal to 0 when @x@y F (x; y) does not exist
although we could de…ne it to be any real number.
3.3.3 Example
Suppose X and Y are continuous random variables with joint probability density function

    f(x, y) = x + y    for 0 < x < 1, 0 < y < 1

and 0 otherwise. The support set of (X, Y) is A = {(x, y) : 0 < x < 1, 0 < y < 1}.
The joint probability function for (x; y) 2 A is graphed in Figure 3.1. We notice that the
surface is the portion of the plane z = x + y lying above the region A.
Figure 3.1: Graph of joint probability density function for Example 3.3.3
(b) Find
(i) P(X ≤ 1/3, Y ≤ 1/2)
(ii) P(X ≤ Y)
(iii) P(X + Y ≤ 1/2)
(iv) P(XY ≤ 1/2)
Solution
(a) A graph of the support set for (X; Y ) is given in Figure 3.2. Such a graph is useful for
determining the limits of integration of the double integral.
y A
0 x 1
Figure 3.2: Graph of the support set of (X; Y ) for Example 3.3.3
    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫∫_{(x,y)∈A} (x + y) dx dy
        = ∫₀¹ ∫₀¹ (x + y) dx dy
        = ∫₀¹ [x²/2 + xy]₀¹ dy
        = ∫₀¹ (1/2 + y) dy
        = [y/2 + y²/2]₀¹
        = 1/2 + 1/2
        = 1

[Figure: Region of integration B for part (b)(i)]
(i)

    P(X ≤ 1/3, Y ≤ 1/2) = ∫∫_{(x,y)∈B} (x + y) dx dy
        = ∫₀^{1/2} ∫₀^{1/3} (x + y) dx dy
        = ∫₀^{1/2} [x²/2 + xy]₀^{1/3} dy
        = ∫₀^{1/2} [(1/2)(1/3)² + (1/3) y] dy
        = [y/18 + y²/6]₀^{1/2}
        = (1/18)(1/2) + (1/6)(1/2)²
        = 5/72

(ii) A graph of the region of integration is given in Figure 3.4. Note that when the
region is not rectangular then care must be taken with the limits of integration.

[Figure 3.4: Region of integration C = {(x, y) : 0 ≤ x ≤ y ≤ 1} for part (b)(ii)]
    P(X ≤ Y) = ∫∫_{(x,y)∈C} (x + y) dx dy
             = ∫_{y=0}^{1} ∫_{x=0}^{y} (x + y) dx dy
             = ∫₀¹ [x²/2 + xy]₀ʸ dy
             = ∫₀¹ (y²/2 + y²) dy
             = ∫₀¹ (3/2) y² dy = [y³/2]₀¹
             = 1/2
Alternatively
    P(X ≤ Y) = ∫_{x=0}^{1} ∫_{y=x}^{1} (x + y) dy dx
Why does the answer of 1=2 make sense when you look at Figure 3.1?
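For readers who like a numerical cross-check, the following short script (not part of the original notes) verifies the total probability and P(X ≤ Y) for this joint density using scipy.

    from scipy.integrate import dblquad

    f = lambda y, x: x + y   # dblquad integrates over y first, then x

    total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)      # whole support
    p_x_le_y, _ = dblquad(f, 0, 1, lambda x: x, lambda x: 1)   # region x <= y <= 1

    print(total)     # 1.0
    print(p_x_le_y)  # 0.5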
[Figure: Region of integration for part (b)(iii), bounded by the line x + y = 1/2]
(iii)

    P(X + Y ≤ 1/2) = ∫∫_{(x,y)∈D} (x + y) dx dy
        = ∫_{x=0}^{1/2} ∫_{y=0}^{1/2−x} (x + y) dy dx
        = ∫₀^{1/2} [xy + y²/2]₀^{1/2−x} dx
        = ∫₀^{1/2} [x(1/2 − x) + (1/2)(1/2 − x)²] dx
        = ∫₀^{1/2} (1/8 − x²/2) dx = [x/8 − x³/6]₀^{1/2}
        = 2/48 = 1/24

Alternatively

    P(X + Y ≤ 1/2) = ∫_{y=0}^{1/2} ∫_{x=0}^{1/2−y} (x + y) dx dy
Why does this small probability make sense when you look at Figure 3.1?
(iv) A graph of the region of integration E is given in Figure 3.6. In this example the
integration can be done more easily by integrating over the complementary region F.

[Figure 3.6: Regions E = {(x, y) ∈ A : xy ≤ 1/2} and F = {(x, y) ∈ A : xy > 1/2}, separated by the curve xy = 1/2]
    P(XY ≤ 1/2) = ∫∫_{(x,y)∈E} (x + y) dx dy
        = 1 − ∫∫_{(x,y)∈F} (x + y) dx dy
        = 1 − ∫_{x=1/2}^{1} ∫_{y=1/(2x)}^{1} (x + y) dy dx
        = 1 − ∫_{1/2}^{1} [xy + y²/2]_{1/(2x)}^{1} dx
        = 1 − ∫_{1/2}^{1} {x + 1/2 − [x·1/(2x) + (1/2)(1/(2x))²]} dx
        = 1 − ∫_{1/2}^{1} [x − 1/(8x²)] dx
        = 1 − [x²/2 + 1/(8x)]_{1/2}^{1}
        = 3/4
3.3.5 Example
For the joint probability density function in Example 3.3.3 determine:
(a) the marginal probability density function of X and the marginal probability density
function of Y
(b) the joint cumulative distribution function of X and Y
(c) the marginal cumulative distribution function of X and the marginal cumulative distri-
bution function of Y
Solution
(a) The marginal probability density function of X is

    f₁(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫₀¹ (x + y) dy = [xy + y²/2]₀¹ = x + 1/2    for 0 < x < 1

and 0 otherwise.
Since both the joint probability density function f(x, y) and the support set A are symmetric
in x and y, by symmetry the marginal probability density function of Y is

    f₂(y) = y + 1/2    for 0 < y < 1

and 0 otherwise.
(b) Since

    P(X ≤ x, Y ≤ y) = ∫₀ʸ ∫₀ˣ (s + t) ds dt = ∫₀ʸ [s²/2 + st]₀ˣ dt = ∫₀ʸ (x²/2 + xt) dt
                    = [x²t/2 + xt²/2]₀ʸ
                    = (x²y + xy²)/2    for 0 < x < 1, 0 < y < 1

    P(X ≤ x, Y ≤ y) = ∫₀ˣ ∫₀¹ (s + t) dt ds = (x² + x)/2    for 0 < x < 1, y ≥ 1

    P(X ≤ x, Y ≤ y) = ∫₀ʸ ∫₀¹ (s + t) ds dt = (y² + y)/2    for x ≥ 1, 0 < y < 1

(c) Since the support set of (X, Y) is A = {(x, y) : 0 < x < 1, 0 < y < 1} then

    F₁(x) = P(X ≤ x) = lim_{y→∞} F(x, y) = lim_{y→∞} P(X ≤ x, Y ≤ y)
          = F(x, 1)
          = (x² + x)/2    for 0 < x < 1

Alternatively

    F₁(x) = P(X ≤ x) = ∫_{−∞}^{x} f₁(s) ds = ∫₀ˣ (s + 1/2) ds
          = (x² + x)/2    for 0 < x < 1

In either case the marginal cumulative distribution function of X is

    F₁(x) =  0               x ≤ 0
             (x² + x)/2      0 < x < 1
             1               x ≥ 1
3.3.6 Exercise
Suppose X and Y are continuous random variables with joint probability density function
k
f (x; y) = for x 0; y 0
(1 + x + y)3
and 0 otherwise.
(a) Determine k and sketch f (x; y).
(b) Find
(i) P (X 1; Y 2)
(ii) P (X Y )
(iii) P (X + Y 1)
(c) Determine the marginal probability density function of X and the marginal probability
density function of Y .
(d) Determine the joint cumulative distribution function of X and Y .
(e) Determine the marginal cumulative distribution function of X and the marginal cumu-
lative distribution function of Y .
3.3.7 Exercise
Suppose X and Y are continuous random variables with joint probability density function
x y
f (x; y) = ke for y x 0
and 0 otherwise.
(a) Determine k and sketch f (x; y).
(b) Find
(i) P (X 1; Y 2)
(ii) P (X Y )
(iii) P (X + Y 1)
(c) Determine the marginal probability density function of X and the marginal probability
density function of Y .
(d) Determine the joint cumulative distribution function of X and Y .
(e) Determine the marginal cumulative distribution function of X and the marginal cumu-
lative distribution function of Y .
P (X 2 A and Y 2 B) = P (X 2 A)P (Y 2 B)
De…nition 3.4.1 is not very convenient for determining the independence of two random
variables. The following theorem shows how to use the marginal and joint cumulative
distribution functions or the marginal and joint probability (density) functions to determine
if two random variables are independent.
(2) Suppose X and Y are random variables with joint probability (density) function f(x, y).
Suppose also that f₁(x) is the marginal probability (density) function of X with support
set A₁ = {x : f₁(x) > 0} and f₂(y) is the marginal probability (density) function of Y with
support set A₂ = {y : f₂(y) > 0}. Then X and Y are independent random variables if and
only if

    f(x, y) = f₁(x) f₂(y)    for all (x, y) ∈ A₁ × A₂

where A₁ × A₂ = {(x, y) : x ∈ A₁, y ∈ A₂}.
Proof
(1) For given (x, y), let Aₓ = {s : s ≤ x} and let B_y = {t : t ≤ y}. Then by Definition 3.4.1
X and Y are independent random variables if and only if

    P(X ∈ Aₓ and Y ∈ B_y) = P(X ∈ Aₓ) P(Y ∈ B_y)

Since P(X ∈ Aₓ and Y ∈ B_y) = P(X ≤ x, Y ≤ y) = F(x, y),

    P(X ∈ Aₓ) = P(X ≤ x) = F₁(x)

and

    P(Y ∈ B_y) = F₂(y)

this holds if and only if F(x, y) = F₁(x) F₂(y) for all (x, y) ∈ ℝ², as required.
(2) (Continuous Case) From (1) we have X and Y are independent random variables if
and only if

    F(x, y) = F₁(x) F₂(y)        (3.1)

for all (x, y) ∈ ℝ². Now (∂/∂x) F₁(x) exists for x ∈ A₁ and (∂/∂y) F₂(y) exists for y ∈ A₂. Taking
the partial derivative ∂²/∂x∂y of both sides of (3.1) where the partial derivative exists implies
that X and Y are independent random variables if and only if

    ∂²F(x, y)/∂x∂y = [(∂/∂x) F₁(x)] [(∂/∂y) F₂(y)]    for all (x, y) ∈ A₁ × A₂

or

    f(x, y) = f₁(x) f₂(y)    for all (x, y) ∈ A₁ × A₂

as required.
Note: The discrete case can be proved using an argument similar to the one used for (1).
3.4.3 Example
Determine whether the random variables X and Y are independent in (a) Example 3.2.4 and
(b) Example 3.3.3.
Solution
(a) Since the total number of students is fixed, a larger number of actuarial science students
would imply a smaller number of statistics students and we would guess that the random
variables are not independent. To show this we only need to find one pair of values (x, y)
for which f(x, y) ≠ f₁(x) f₂(y). Since

    P(X = 0, Y = 0) = \binom{10}{0}\binom{9}{0}\binom{6}{5} / \binom{25}{5} = \binom{6}{5} / \binom{25}{5}

    P(X = 0) = \binom{10}{0}\binom{15}{5} / \binom{25}{5} = \binom{15}{5} / \binom{25}{5}

    P(Y = 0) = \binom{9}{0}\binom{16}{5} / \binom{25}{5} = \binom{16}{5} / \binom{25}{5}

and

    P(X = 0, Y = 0) = \binom{6}{5} / \binom{25}{5} ≠ P(X = 0) P(Y = 0) = [\binom{15}{5} / \binom{25}{5}] [\binom{16}{5} / \binom{25}{5}]

therefore by Theorem 3.4.2, X and Y are not independent random variables.
(b) Since

    f(x, y) = x + y    for 0 < x < 1, 0 < y < 1
    f₁(x) = x + 1/2    for 0 < x < 1,    f₂(y) = y + 1/2    for 0 < y < 1

it would appear that X and Y are not independent random variables. To show this we only
need to find one pair of values (x, y) for which f(x, y) ≠ f₁(x) f₂(y). Since

    f(2/3, 1/3) = 1 ≠ f₁(2/3) f₂(1/3) = (7/6)(5/6)

X and Y are not independent random variables.
3.4.4 Exercise
In Exercises 3.2.5, 3.3.6 and 3.3.7 determine if X and Y independent random variables.
In the previous examples we determined whether the random variables were independent
using the joint probability (density) function and the marginal probability (density) func-
tions. The following very useful theorem does not require us to determine the marginal
probability (density) functions.
Notes:
(1) If the Factorization Theorem for Independence holds then the marginal probability
(density) function of X will be proportional to g and the marginal probability (density)
function of Y will be proportional to h.
(2) Whenever the support set A is not rectangular the random variables will not be inde-
pendent. The reason for this is that when the support set is not rectangular it will always
be possible to …nd a point (x; y) such that x 2 A1 with f1 (x) > 0, and y 2 A2 with
f2 (y) > 0 so that f1 (x) f2 ( y) > 0, but (x; y) 2= A so f (x; y) = 0. This means there is a
point (x; y) such that f (x; y) 6= f1 (x) f2 ( y) and therefore X and Y are not independent
random variables.
(3) The above de…nitions and theorems can easily be extended to the random vector
(X1 ; X2 ; : : : ; Xn ).
Letting g(x) = f₁(x) and h(y) = f₂(y) proves there exist g(x) and h(y) such that
f(x, y) = g(x) h(y) for all (x, y) ∈ A₁ × A₂.
Conversely, suppose f(x, y) = g(x) h(y) for all (x, y) ∈ A₁ × A₂. Then

    f₁(x) = ∫_{−∞}^{∞} f(x, y) dy = g(x) ∫_{y∈A₂} h(y) dy = c g(x)    for x ∈ A₁

and

    f₂(y) = ∫_{−∞}^{∞} f(x, y) dx = h(y) ∫_{x∈A₁} g(x) dx = k h(y)    for y ∈ A₂

Now

    1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy
      = ∫_{y∈A₂} ∫_{x∈A₁} g(x) h(y) dx dy
      = [∫_{x∈A₁} g(x) dx] [∫_{y∈A₂} h(y) dy]
      = kc

Since ck = 1, f₁(x) f₂(y) = c g(x) k h(y) = g(x) h(y) = f(x, y) for all (x, y) ∈ A₁ × A₂, and
therefore X and Y are independent random variables.
3.4.6 Example
Suppose X and Y are discrete random variables with joint probability function
x+y
e 2
f (x; y) = for x = 0; 1 : : : ; y = 0; 1; : : :
x!y!
(a) Determine if X and Y independent are random variables.
(b) Determine the marginal probability function of X and the marginal probability function
of Y .
Solution
(a) The support set of (X; Y ) is A = f(x; y) : x = 0; 1; : : : ; y = 0; 1; : : :g which is rec-
tangular. The support set of X is A1 = fx : x = 0; 1; : : :g, and the support set of Y is
A2 = fy : y = 0; 1; : : :g.
Let
x y
e e
g (x) = and h (y) =
x! y!
Then f (x; y) = g(x)h(y) for all (x; y) 2 A1 A2 . Therefore by the Factorization Theorem
for Independence X and Y are independent random variables.
(b) By inspection we can see that g (x) is the probability function for a Poisson( ) random
variable. Therefore the marginal probability function of X is
x
e
f1 (x) = for x = 0; 1 : : :
x!
3.4. INDEPENDENT RANDOM VARIABLES 81
3.4.7 Example
Suppose X and Y are continuous random variables with joint probability density function
3
f (x; y) = y 1 x2 for 1 < x < 1; 0 < y < 1
2
and 0 otherwise.
(a) Determine if X and Y independent are random variables.
(b) Determine the marginal probability function of X and the marginal probability function
of Y .
Solution
(a) The support set of (X; Y ) is A = f(x; y) : 1 < x < 1; 0 < y < 1g which is rectan-
gular. The support set of X is A1 = fx : 1 < x < 1g, and the support set of Y is
A2 = fy : 0 < y < 1g.
Let
3
g (x) = 1 x2 and h (y) = y
2
Then f (x; y) = g(x)h(y) for all (x; y) 2 A1 A2 . Therefore by the Factorization Theorem
for Independence X and Y are independent random variables.
(b) Since the marginal probability density function of Y is proportional to h (y) we know
f2 (y) = kh (y) for 0 < y < 1 where k is determined by
Z1
3 3k 2 1 3k
1=k ydy = y j0 =
2 4 4
0
4
Therefore k = 3 and
f2 (y) = 2y for 0 < y < 1
and 0 otherwise.
Since X and Y are independent random variables f (x; y) = f1 (x) f2 (y) or
f1 (x) = f (x; y)=f2 (y) for x 2 A1 . Therefore the marginal probability density function of
X is
3
f (x; y) y 1 x2
f1 (x) = = 2
f2 (y) 2y
3
= 1 x2 for 1<x<1
4
and 0 otherwise.
82 3. MULTIVARIATE RANDOM VARIABLES
3.4.8 Example
Suppose X and Y are continuous random variables with joint probability density function
2 p
f (x; y) = for 0 < x < 1 y2, 1<y<1
and 0 otherwise.
(a) Determine if X and Y independent are random variables.
(b) Determine the marginal probability function of X and the marginal probability function
of Y .
Solution
(a) The support set of (X; Y ) which is
n p o
A = (x; y) : 0 < x < 1 y2; 1<y<1
y
1
x
0 1
A
-1
Figure 3.7: Graph of the support set of (X; Y ) for Example 3.4.8
The region A is half of a unit circle which has area equal to =2. Since the probability
density function is constant or uniform on this region we can see that f (x; y) must equal
2= on the region A since the volume of the solid must be equal to 1.
3.4. INDEPENDENT RANDOM VARIABLES 83
and 0 otherwise.
To …nd the marginal probability density function of Y which use the description of the
support set in which the range of Y does not depend on x which is
n p o
A = (x; y) : 0 < x < 1 y 2 ; 1 < y < 1
and 0 otherwise.
84 3. MULTIVARIATE RANDOM VARIABLES
Notes:
(1) If X and Y are discrete random variables then
f1 (xjy) = P (X = xjY = y)
P (X = x; Y = y)
=
P (Y = y)
f (x; y)
=
f2 (y)
and
P P f (x; y)
f1 (xjy) =
x x f2 (y)
1 P
= f (x; y)
f2 (y) x
f2 (y)
=
f2 (y)
= 1
d
R Ry
x+h
dh f (u; v) dvdu
x 1
= lim by L’Hôpital’s Rule
h!0
d
R
x+h
dh f1 (u) du
x
Ry
f (x + h; v) dv
1
= lim by the Fundamental Theorem of Calculus
h!0 f1 (x + h)
Ry
limh!0 f (x + h; v) dv
1
=
limh!0 f1 (x + h)
Ry
f (x; v) dv
1
=
f1 (x)
assuming that the limits exist and that integration and the limit operation can be inter-
changed. If we di¤erentiate the last term with respect to y using the Fundamental Theorem
of Calculus we have
d f (x; y)
P (Y yjX = x) =
dy f1 (x)
86 3. MULTIVARIATE RANDOM VARIABLES
f (x; y)
f2 (yjx) =
f1 (x)
3.5.2 Example
In Example 3.4.8 determine the conditional probability density function of X given Y = y
and the conditional probability density function of Y given X = x.
Solution
The conditional probability density function of X given Y = y is
f (x; y)
f1 (xjy) =
f2 (y)
2
= 2
p
1 y2
1 p
= p for 0 < x < 1 y2; 1<y<1
1 y2
Note that for each y 2 ( 1; 1), the conditional probability density function of X given
p
Y = y is Uniform 0; 1 y 2 . This makes sense because the joint probability density
function is constant on its support set.
The conditional probability density function of Y given X = x is
f (x; y)
f2 (yjx) =
f1 (x)
2
= 4
p
1 x2
1 p p
= p for 1 x2 < y < 1 x2 ; 0 < x < 1
2 1 x2
3.5. CONDITIONAL DISTRIBUTIONS 87
Note that for each x 2 (0; 1), the conditional probability density function of Y given X = x
p p
is Uniform 1 x2 ; 1 x2 . This again makes sense because the joint probability
density function is constant on its support set.
3.5.3 Exercise
In Exercise 3.2.5 show that the conditional probability function of Y given X = x is
2 (1 )
Binomial n x; 2
1
Why does this make sense?
3.5.4 Exercise
In Example 3.3.3 and Exercises 3.3.6 and 3.3.7 determine the conditional probability density
function of X given Y = y and the conditional probability density function of Y given
X = x. Be sure to check that
Z1 Z1
f1 (xjy)dx = 1 and f2 (yjx)dy = 1
1 1
When choosing a model for bivariate data it is sometimes easier to specify a conditional
probability (density) function and a marginal probability (density) function. The joint
probability (density) function can then be determined using the Product Rule which is
obtained by rewriting (3.2) and (3.3).
3.5.6 Example
In modeling survival in a certain insect population it is assumed that the number of eggs
laid by a single female follows a Poisson( ) distribution. It is also assumed that each egg
has probability p of surviving independently of any other egg. Determine the probability
function of the number of eggs that survive.
88 3. MULTIVARIATE RANDOM VARIABLES
Solution
Let Y = number of eggs laid and let X = number of eggs that survive. Then Y Poisson( )
and XjY = y Binomial(y; p). We want to determine the marginal probability function
of X.
By the Product Rule the joint probability function of X and Y is
A = f(x; y) : y = x; x + 1; : : : ; x = 0; 1; : : :g (3.4)
Since we are summing over y we need to use the second description of the support set given
in (3.4). So
P1 px e (1 p)y x y
f1 (x) =
y=x x! (y x)!
px e x P
1 (1 p)y x y x
= let u = y x
x! y=x (y x)!
px e x P1 [ (1 p)]u
=
x! u=0 u!
x
p e x
= e (1 p) by the Exponential series 2.11.7
x!
(p )x e p
= for x = 0; 1; : : :
x!
which we recognize as a Poisson(p ) probability function.
3.5.7 Example
Determine the marginal probability function of X if Y Gamma( ; 1 ) and the conditional
distribution of X given Y = y is Weibull(p; y 1=p ).
3.5. CONDITIONAL DISTRIBUTIONS 89
Solution
Since Y Gamma( ; 1 )
y 1e y
f2 (y) = for y > 0
( )
and 0 otherwise.
Since the conditional distribution of X given Y = y is Weibull(p; y 1=p )
yxp
f1 (xjy) = pyxp 1
e for x > 0
for x > 0 and 0 otherwise. This distribution is a member of the Burr family of distributions
which is frequently used by actuaries for modeling household income, crop prices, insurance
risk, and many other …nancial variables.
90 3. MULTIVARIATE RANDOM VARIABLES
The following theorem gives us one more method for determining whether two random
variables are independent.
3.5.8 Theorem
Suppose X and Y are random variables with marginal probability (density) functions f1 (x)
and f2 (y) respectively and conditional probability (density) functions f1 (xjy) and f2 (yjx).
Suppose also that A1 is the support set of X, and A2 is the support set of Y . Then X and
Y are independent random variables if and only if either of the following holds
or
f2 (yjx) = f2 (y) for all y 2 A2
3.5.9 Example
Suppose the conditional distribution of X given Y = y is
e x
f1 (xjy) = y
for 0 < x < y
1 e
and 0 otherwise. Are X and Y independent random variables?
Solution
Since the conditional distribution of X given Y = y depends on y then f1 (xjy) = f1 (x)
cannot hold for all x in the support set of X and therefore X and Y are not independent
random variables.
If X and Y are continuous random variables with joint probability density function f (x; y)
then
Z1 Z1
E[h(X; Y )] = h(x; y)f (x; y)dxdy
1 1
3.6.2 Theorem
Suppose X and Y are random variables with joint probability (density) function f (x; y), a
and b are real constants, and g(x; y) and h(x; y) are real-valued functions. Then
Z1 Z1
E[ag(X; Y ) + bh(X; Y )] = [ag(x; y) + bh(x; y)] f (x; y)dxdy
1 1
Z1 Z1 Z1 Z1
= a g(x; y)f (x; y)dxdy + b h(x; y)f (x; y)dxdy
1 1 1 1
by properties of double integrals
= aE[g(X; Y )] + bE[h(X; Y )] by De…nition 3.6.1
3.6.3 Corollary
(1)
E(aX + bY ) = aE(X) + bE(Y ) = a X +b Y
P
n P
n P
n
E ai Xi = ai E(Xi ) = ai i
i=1 i=1 i=1
where i = E(Xi ).
(3) If X1 ; X2 ; : : : ; Xn are random variables with E(Xi ) = , i = 1; 2; : : : ; n then
1 Pn 1 Pn n
E X = E (Xi ) = = =
n i=1 n i=1 n
92 3. MULTIVARIATE RANDOM VARIABLES
Z1 Z1
E(aX + bY ) = (ax + by) f (x; y)dxdy
1 1
2 3 2 3
Z1 Z1 Z1 Z1
= a x4 f (x; y)dy 5 dx + b y4 f (x; y)dx5 dy
1 1 1 1
Z1 Z1
= a xf1 (x) dx + b yf2 (y) dy
1 1
= aE(X) + bE(Y ) = a X +b Y
Q
n Q
n
E hi (Xi ) = E [hi (Xi )]
i=1 i=1
Z1 Z1
E [g(X)h(Y )] = g(x)h(y)f1 (x) f2 (y) dxdy
1 1
2 3
Z1 Z1
= h(y)f2 (y) 4 g(x)f1 (x) dx5 dy
1 1
Z1
= E [g (X)] h(y)f2 (y) dy
1
= E [g (X)] E [h (Y )]
3.6. JOINT EXPECTATIONS 93
Cov(X; Y ) = E(XY ) X Y
Proof
Cov(X; Y ) = E [(X X ) (Y Y )]
= E (XY XY X Y + X Y)
= E(XY ) X E(Y ) Y E(X) + X Y
= E(XY ) E(X)E(Y ) E(Y )E(X) + E(X)E(Y )
= E(XY ) E(X)E(Y )
Proof of (1)
h i
V ar (aX + bY ) = E (aX + bY a X b Y )2
n o
2
= E [a (X X ) + b (Y Y )]
h i
2 2
= E a2 (X X ) + b (Y
2
Y ) + 2ab (X X ) (Y Y )
h i h i
2 2
= a2 E (X X) + b2 E (Y Y) + 2abE [(X X ) (Y Y )]
= a2 2
X + b2 2
Y + 2abCov (X; Y )
3.6.9 Example
For the joint probability density function in Example 3.3.3 …nd (X; Y ).
Solution
Z1 Z1 Z1 Z1
E (XY ) = xyf (x; y)dxdy = xy (x + y) dxdy
1 1 0 0
Z1 Z1 Z1
1 3 1
= x2 y + xy 2 dxdy = x y + x2 y 2 j10 dy
3 2
0 0 0
Z1
1 1 1 2 1 3 1 1 1
= y + y 2 dy = y + y j0 = +
3 2 6 6 6 6
0
1
=
3
3.6. JOINT EXPECTATIONS 95
Z1 Z1 Z1
1 1
E (X) = xf1 (x)dx = x x+ dx = x2 + x dx
2 2
1 0 0
1 3 1 2 1
= x + x j0
3 4
1 1 7
= + =
3 4 12
Z1 Z1 Z1
2 2 2 1 1
E X = x f1 (x)dx = x x+ dx = x3 + x2 dx
2 2
1 0 0
1 4 1 3 1
= x + x j0
4 6
1 1 3+2 5
= + = =
4 6 12 12
2
5 7
V ar (X) = E X 2 [E (X)]2 =
12 12
60 49 11
= =
144 144
By symmetry
7 11
E (Y ) = and V ar (Y ) =
12 144
Therefore
1 7 7 48 49 1
Cov (X; Y ) = E (XY ) E (X) E (Y ) = = =
3 12 12 144 144
and
Cov(X; Y )
(X; Y ) =
X Y
1
q 144
=
11 11
144 144
1 144
=
144 11
1
=
11
3.6.10 Exercise
For the joint probability density function in Exercise 3.3.7 …nd (X; Y ).
96 3. MULTIVARIATE RANDOM VARIABLES
3.6.11 Theorem
If (X; Y ) is the correlation coe¢ cient of random variables X and Y then
1 (X; Y ) 1
(X; Y ) = 1 if and only if Y = aX + b for some a > 0 and (X; Y ) = 1 if and only if
Y = aX + b for some a < 0.
Proof
Let S = X + tY , where t 2 <. Then E (S) = S and
2
V ar(S) = E (S S)
2
= Ef[(X + tY ) ( X +t Y )] g
2
= Ef[(X X ) + t(Y Y )] g
2
= E (X X ) + 2t(X X )(Y Y) + t2 (Y Y)
2
= t2 2
Y + 2Cov(X; Y )t + 2
X
Now V ar(S) 0 for any t 2 < implies that the quadratic equation V ar(S) = t2 2Y +
2Cov(X; Y )t + 2X in the variable t must have at most one real root. To have at most
one real root the discrimant of this quadratic equation must be less than or equal to zero.
Therefore
[2Cov(X; Y )]2 4 2X 2Y 0
or
[Cov(X; Y )]2 2 2
X Y
or
Cov(X; Y )
j (X; Y )j = 1
X Y
and therefore
1 (X; Y ) 1
To see that (X; Y ) = 1 corresponds to a linear relationship between X and Y , note
that (X; Y ) = 1 implies
jCov(X; Y )j = X Y
and therefore
[2Cov(X; Y )]2 4 2 2
X Y =0
which corresponds to a zero discriminant in the quadratic equation. This means that there
exists one real number t for which
V ar(S) = V ar(X + t Y ) = 0
Since conditional probability (density) functions are also probability (density) function,
expectations can be de…ned in terms of these conditional probability (density) functions as
in the following de…nition.
P
E [g(X)jy] = g(x)f1 (xjy)
x
Z1
E[g(X)jy] = g(x)f1 (xjy)dx
1
3.7.3 Example
For the joint probability density function in Example 3.4.8 …nd E (Y jx) the conditional
mean of Y given X = x, and V ar (Y jx) the conditional variance of Y given X = x.
Solution
From Example 3.5.2 we have
1 p p
f2 (yjx) = p for 1 x2 < y < 1 x2 ; 0 < x < 1
2 1 x2
98 3. MULTIVARIATE RANDOM VARIABLES
Therefore
Z1
E (Y jx) = yf2 (yjx)dy
1
p
Z1 x2
1
= y p dy
p 2 1 x2
1 x2
p
Z1 x2
1
= p ydy
2 1 x2 p
1 x2
p
1 1 x2
= p y2j p 1 x2
1 x2
= 0
Since E (Y jx) = 0
Z1
2
V ar (Y jx) = E Y jx = y 2 f2 (yjx)dy
1
p
Z1 x2
1
= y2 p dy
p 2 1 x2
1 x2
p
Z1 x2
1
= p y 2 dy
2 1 x2 p
1 x2
p
1 1 x2
= p y3j p 1 x2
6 1 x2
1
= 1 x2
3
p p
Recall that the conditional distribution of Y given X = x is Uniform 1 x2 ; 1 x2 .
a+b
The results above can be veri…ed by noting that if U Uniform(a; b) then E (U ) = 2
(b a)2
and V ar (U ) = 12 .
3.7.4 Exercise
In Exercises 3.5.3 and 3.3.7 …nd E (Y jx), V ar (Y jx), E (Xjy) and V ar (Xjy).
3.7.5 Theorem
If X and Y are independent random variables then E [g (X) jy] = E [g (X)] and
E [h (Y ) jx] = E [h (Y )].
3.7. CONDITIONAL EXPECTATION 99
Z1
E[g(X)jy] = g(x)f1 (xjy)dx
1
Z1
= g(x)f1 (x)dx by Theorem 3.5.8
1
= E [g (X)]
as required.
E [h (Y ) jx] = E [h (Y )] follows in a similar manner.
3.7.6 De…nition
E [g (X) jY ] is the function of the random variable Y whose value is E [g (X) jy] when Y = y.
This means of course that E [g (X) jY ] is a random variable.
3.7.7 Example
Solution
Since XjY = y Binomial(y; p)
E (XjY = y) = py
and
E (XjY ) = pY
E [E (XjY )] = E (pY ) = pE (Y ) = p
3.7.8 Theorem
Suppose X and Y are random variables then
E fE [g (X) jY ]g = E [g (X)]
Proof (Continuous Case)
2 3
Z1
E fE [g (X) jY ]g = E 4 g (x) f1 (xjy) dx5
1
2 3
Z1 Z1
= 4 g (x) f1 (xjy) dx5 f2 (y) dy
1 1
Z1 Z1
= g (x) f1 (xjy) f2 (y) dxdy
1 1
2 3
Z1 Z1
= g (x) 4 f (x; y) dy 5 dx
1 1
Z1
= g (x) f1 (x) dx
1
= E [g (X)]
When the joint model is speci…ed in terms of a conditional distribution XjY = y and
a marginal distribution for Y then Theorems 3.7.8 and 3.7.10 give a method for …nding
expectations for functions of X without having to determine the marginal distribution for
X.
3.7.11 Example
Suppose P Uniform(0; 0:1) and Y jP = p Binomial(10; p). Find E(Y ) and V ar(Y ).
Solution
Since P Uniform(0; 0:1)
and
V ar (Y jp) = 10p (1 p) , V ar (Y jP ) = 10P (1 P ) = 10 P P2
Therefore
1 1
E (Y ) = E [E (Y jP )] = E (10P ) = 10E (P ) = 10 =
20 2
and
3.7.12 Exercise
In Example 3.5.7 …nd E (X) and V ar (X) using Corollary 3.7.9 and Theorem 3.7.10.
3.7.13 Exercise
Suppose P Beta(a; b) and Y jP = p Geometric(p). Find E(Y ) and V ar(Y ).
102 3. MULTIVARIATE RANDOM VARIABLES
is called the joint moment generating function of X and Y if this expectation exists (joint
sum/integral converges absolutely) for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all
t2 2 ( h2 ; h2 ) for some h2 > 0.
More generally if X1 ; X2 ; : : : ; Xn are random variables then
P
n
M (t1 ; t2 ; : : : ; tn ) = E exp ti Xi
i=1
If the joint moment generating function is known that it is straightforward to obtain the
moment generating functions of the marginal distributions.
3.8.3 Example
Suppose X and Y are continuous random variables with joint probability density function
y
f (x; y) = e for 0 < x < y < 1
and 0 otherwise.
3.8. JOINT MOMENT GENERATING FUNCTIONS 103
Solution
(a) The joint moment generating function is
Therefore
1 1 1
M (t1 ; t2 ) = lim e (1 t1 t2 )y jb0 + e (1 t2 )y jb0
t1 b!1 1 t1 t2 1 t2
1 1 h i 1 h (1 i
= lim e (1 t1 t2 )b 1 + e t2 )b
1
t1 b!1 1 t1 t2 1 t2
1 1 1
=
t1 1 t1 t2 1 t2
1 (1 t2 ) (1 t1 t2 )
=
t1 (1 t1 t2 ) (1 t2 )
1
= for t1 + t2 < 1 and t2 < 1
(1 t1 t2 ) (1 t2 )
104 3. MULTIVARIATE RANDOM VARIABLES
MX (t) = E(etX )
= M (t; 0)
1
=
(1 t 0) (1
0)
1
= for t < 1
1 t
By examining the list of moment generating functions in Chapter 11 we see that this is
the moment generating function of a Exponential(1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, X has a Exponential(1) distribu-
tion.
(c) The moment generating function of Y is
MY (t) = E(etY )
= M (0; t)
1
=
(1 0 t) (1 t)
1
= for t < 1
(1 t)2
By examining the list of moment generating functions in Chapter 11 we see that this
is the moment generating function of a Gamma(2; 1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, Y has a Gamma(2; 1) distribution.
3.8.4 Example
Suppose X and Y are continuous random variables with joint probability density function
x y
f (x; y) = e for x > 0, y > 0
and 0 otherwise.
(a) Find the joint moment generating function of X and Y .
(b) What is the moment generating function of X and what is the marginal distribution of
X?
(c) What is the moment generating function of Y and what is the marginal distribution of
Y?
3.8. JOINT MOMENT GENERATING FUNCTIONS 105
Solution
(a) The joint moment generating function is
Z1 Z1
t1 X+t2 Y
M (t1 ; t2 ) = E e = et1 x+t2 y f (x; y) dxdy
1 1
Z1 Z1
= et1 x+t2 y e x y
dxdy
0 0
0 10 1 1
Z1 Z
= @ e y(1 t2 ) A @
dy e x(1 t1 )
dxA which converges for t1 < 1, t2 < 1
0 0
! !
e y(1 t2 ) e x(1 t1 )
= lim jb0 lim jb0
b!1 (1 t2 ) b!1 (1 t1 )
1 1
= for t1 < 1, t2 < 1
1 t1 1 t2
By examining the list of moment generating functions in Chapter 11 we see that this is
the moment generating function of a Exponential(1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, X has a Exponential(1) distribu-
tion.
By examining the list of moment generating functions in Chapter 11 we see that this is
the moment generating function of a Exponential(1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, Y has a Exponential(1) distribu-
tion.
106 3. MULTIVARIATE RANDOM VARIABLES
3.8.5 Theorem
If X and Y are random variables with joint moment generating function M (t1 ; t2 ) which
exists for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all t2 2 ( h2 ; h2 ) for some h2 > 0 then
@ j+k
E(X j Y k ) = M (t1 ; t2 )j(t1 ;t2 )=(0;0)
@tj1 @tk2
Proof
See Problem 11(a).
Proof
See Problem 11(b).
3.8.7 Example
Use Theorem 3.8.6 to determine if X and Y are independent random variables in Examples
3.8.3 and 3.8.4.
Solution
For Example 3.8.3
1
M (t1 ; t2 ) = for t1 + t2 < 1 and t2 < 1
(1 t1 t2 ) (1 t2 )
1
MX (t1 ) = for t1 < 1
1 t1
1
MY (t2 ) = for t2 < 1
(1 t2 )2
Since
3
1 1 1 8 1 1 1 1 4
M ; = 1 1 1 = 6= MX MY = 1 =
4 4 1 1 3 4 4 1 1 2 3
4 4 4 4 1 4
3.8.8 Example
Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables each
with moment generating function M (t), t 2 ( h; h) for some h > 0. Find M (t1 ; t2 ; : : : ; tn )
the joint moment generating function of X1 ; X2 ; : : : ; Xn . Find the moment generating
P n
function of T = Xi .
i=1
Solution
Since the Xi ’s are independent random variables each with moment generating function
M (t), t 2 ( h; h) for some h > 0, the joint moment generating function of X1 ; X2 ; : : : ; Xn
is
P
n
M (t1 ; t2 ; : : : ; tn ) = E exp ti Xi
i=1
Q
n
= E eti Xi
i=1
Q
n
= E eti Xi
i=1
Q
n
= M (ti ) for ti 2 ( h; h) ; i = 1; 2; : : : ; n for some h > 0
i=1
P
n
The moment generating function of T = Xi is
i=1
MT (t) = E etT
P
n
= E exp tXi
i=1
= M (t; t; : : : ; t)
Q
n
= M (t)
i=1
= [M (t)]n for t 2 ( h; h) for some h > 0
108 3. MULTIVARIATE RANDOM VARIABLES
n!
f (x1 ; x2 ; : : : ; xk ) = px1 px2 pxk k
x1 !x2 ! xk ! 1 2
P
k P
k
for xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k; xi = n; 0 pi 1; i = 1; 2; : : : ; k; pi = 1
i=1 i=1
P
k
Notes: (1) Since Xi = n, the Multinomial distribution is actually a joint distribution
i=1
for k 1 random variables which can be written as
kP1
kP1 n xi
n! x i=1
f (x1 ; x2 ; : : : ; xk 1) = kP1
px1 1 px2 2 pk k 11 1 pi
i=1
x1 !x2 ! xk 1! n xi !
i=1
kP1 kP1
for xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k 1; xi n; 0 pi 1; i = 1; 2; : : : ; k 1; pi 1
i=1 i=1
n!
f (x1 ) = px1 (1 p1 )n x1
x1 ! (n x1 )! 1
n x1
= p (1 p1 )n x1
x1 1
for x1 = 0; 1; : : : ; n; 0 p1 1
(2) If k = 3 we obtain the Trinomial distribution
n!
f (x1 ; x2 ) = px1 1 px2 2 (1 p1 p2 )n x1 x2
x1 !x2 ! (n x1 x2 )!
for xi = 0; 1; : : : ; n; i = 1; 2, x1 + x2 n and 0 pi 1; i = 1; 2; p1 + p2 1
3.9. MULTINOMIAL DISTRIBUTION 109
n
= p1 et1 + p2 et2 + pk 1e
tk 1
+ pk (3.5)
Xi Binomial(n; pi ) for i = 1; 2; : : : ; k
(3) If T = Xi + Xj ; i 6= j; then
T Binomial (n; pi + pj )
(4)
Cov (Xi ; Xj ) = npi pj for i 6= j
(5) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the remaining of
the coordinates is a Multinomial distribution. In particular the conditional probability
function of Xi given Xj = xj ; i 6= j; is
pi
Xi jXj = xj Binomial n xj ;
1 pj
pi
Xi jXi + Xj = t Binomial t;
pi + pj
3.9.3 Example
Suppose (X1 ; X2 ; : : : ; Xk ) Multinomial(n; p1 ; p2 ; : : : ; pk )
(a) Prove (X1 ; X2 ; : : : ; Xk 1) has joint moment generating function
n
M (t1 ; t2 ; : : : ; tk 1) = p1 et1 + p2 et2 + pk 1e
tk 1
+ pk
Solution
P
k
(a) Let A = (x1 ; x2 ; : : : ; xk ) : xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k; xi = n then
i=1
P P P n!
= et1 x1 +t2 x2 + +tk 1 xk 1
p x1 p x2 pkxk 1 1 pxk k
(x1 ;x2 ; :::;xk ) 2A x1 !x2 ! xk ! 1 2
P P P n! x1 x2 xk
= p1 et1 p2 et2 pk 1e
tk 1 1
pxk k
(x1 ;x2 ; :::;xk ) x !x
2A 1 2 ! xk !
n
= p1 et1 + p2 et2 + pk 1e
tk 1
+ pk for (t1 ; t2 ; : : : ; tk 1) 2 <k
which is of the form 3.5 so by the Uniqueness Theorem for Moment Generating Functions,
(X1 ; X2 ; X3 ) Multinomial(n; p1 ; p1 ; 1 p1 p2 ).
(c) The moment generating function of Xi is
n
M (0; 0; : : : ; t; 0; : : : ; 0) = pi eti + (1 pi ) for ti 2 <
3.9.4 Exercise
Prove property (3) in Theorem 3.9.2.
3.10. BIVARIATE NORMAL DISTRIBUTION 111
1 1 1
f (x1 ; x2 ) = exp (x ) (x )T for (x1 ; x2 ) 2 <2
2 j j1=2 2
where " #
h i h i 2
1 1 2
x= x1 x2 ; = 1 2 ; = 2
1 2 2
(2) X1 N( 2) 2 ).
1; 1 and X2 N( 2; 2
(3) Cov (X1 ; X2 ) = 1 2 and Cor (X1 ; X2 ) = where 1 1.
(4) X1 and X2 are independent random variables if and only if = 0.
(5) If c = (c1 ; c2 ) is a nonzero vector of constants then
c1 X1 + c2 X2 N cT ; c cT
XA + b BVN A + b; AT A
(7)
2 2
X2 jX1 = x1 N 2 + 2 (x1 1 )= 1 ; 2 (1 )
112 3. MULTIVARIATE RANDOM VARIABLES
and
2 2
X1 jX2 = x2 N 1 + 1 (x2 2 )= 2 ; 1 (1 )
(8) (X ) 1 (X )T 2 (2)
Proof
For proofs of properties (1) (4) and (6) (7) see Problem 13.
(5) The moment generating function of c1 X1 + c2 X2 is
E et(c1 X1 +c2 X2 )
The BVN joint probability density function is graphed in Figures 3.8 - 3.10.
0.2
0.15
f(x,y)
0.1
0.05
0
3 3
2 2
1 1
0 0
-1 -1
-2 -2
-3 -3
y x
" #
h i 1 0
Figure 3.8: Graph of BVN p.d.f. with = 0 0 and =
0 1
The graphs all have the same mean vector = [0 0] but di¤erent variance/covariance
matrices . The axes all have the same scale.
3.10. BIVARIATE NORMAL DISTRIBUTION 113
0.2
0.15
f(x,y)
0.1
0.05
0
3 3
2 2
1 1
0 0
-1 -1
-2 -2
-3 -3
y x
" #
h i 1 0:5
Figure 3.9: Graph of BVN p.d.f. with = 0 0 and =
0:5 1
0.2
0.15
f(x,y)
0.1
0.05
0
3 3
2 2
1 1
0 0
-1 -1
-2 -2
-3 -3
y x
" #
h i 0:6 0:5
Figure 3.10: Graph of BVN p.d.f. with = 0 0 and =
0:5 1
114 3. MULTIVARIATE RANDOM VARIABLES
y=g(x)
y R
x=a y=h(x)
x=b
Suppose f (x; y) 0 for all (x; y) 2 <2 . The graph of z = f (x; y) is a surface in 3-space
lying above or touching the xy-plane. The volume of the solid bounded by the surface
z = f (x; y) and the xy-plane above the region R is given by
Zb h(x)
Z
Volume = f (x; y)dydx
x=a y=g(x)
y=b
y
x=g(y) R x=h(y)
y=a
Zd h(y)
Z
Volume = f (x; y)dxdy
y=c x=g(y)
Give an expression for the volume of the solid bounded by the surface z = f (x; y) and
the xy-plane above the region R = R1 [ R2 in Figure 3.13.
b3
y=g(x) x=h(y)
y
b2 R2
R
1
b1
a1 a2 a3
x
2. Suppose X and Y are discrete random variables with joint probability function
e 2
f (x; y) = for x = 0; 1; : : : ; y; y = 0; 1; : : :
x!(y x)!
(a) Find the marginal probability function of X and the marginal probability func-
tion of Y .
(b) Are X and Y independent random variables?
3. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = k(x2 + y) for 0 < y < 1 x2 ; 1<x<1
(a) Determine k:
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (Y X + 1).
4. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = kx2 y for x2 < y < 1
(a) Determine k.
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (X Y ).
(e) Find the conditional probability density function of X given Y = y and the
conditional probability density function of Y given X = x.
3.12. CHAPTER 3 PROBLEMS 117
5. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = kxe y for 0 < x < 1; 0 < y < 1
(a) Determine k.
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (X + Y t).
6. Suppose each of the following functions is a joint probability density function for
continuous random variables X and Y .
8. Suppose X and Y are continuous random variables. Suppose also that the marginal
probability density function of X is
1
f1 (x) = (1 + 4x) for 0 < x < 1
3
and the conditional probability density function of Y given X = x is
2y + 4x
f2 (yjx) = for 0 < x < 1; 0 < y < 1
1 + 4x
Determine:
10. Assume that Y denotes the number of bacteria in a cubic centimeter of liquid and
that Y j Poisson( ). Further assume that varies from location to location and
Gamma( ; ).
11. Suppose X and Y are random variables with joint moment generating function
M (t1 ; t2 ) which exists for all jt1 j < h1 and jt2 j < h2 for some h1 ; h2 > 0.
12. Suppose X and Y are discrete random variables with joint probability function
e 2
f (x; y) = for x = 0; 1; : : : ; y; y = 0; 1; : : :
x!(y x)!
(x ) 1
(x )T 2xtT
= [x ( + t )] 1
[x ( + t )]T 2 tT t tT
Use this identity to show that the joint moment generating function of X1 and
X2 is
(c) Use moment generating functions to show Cov(X1 ; X2 ) = 1 2. Hint: Use the
result in Problem 11(a).
(d) Use moment generating functions to show that X1 and X2 are independent
random variables if and only if = 0.
(e) Let A be a 2 2 nonsingular matrix and b be a 1 2 vector. Use the moment
generating function to show that
XA + b BVN A + b; AT A
14. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 2e x y for 0 < x < y < 1
4.1.1 Example
Suppose X and Y are continuous random variables with joint probability density function
121
122 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Solution
The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions
E and F shown in Figure 4.1
F
x=t/y x=y
E
√t -
1 x
0
Due to the shape of the region E, the double integral over the region E would have to be
written as the sum of two double integrals. It is easier to …nd G (t) using
Z Z
G (t) = 3ydxdy
(x;y) 2 E
Z Z
= 1 3ydxdy
(x;y) 2 F
Z1 Zy Z1
= 1 3ydxdy = 1 3y xjyt=y dy
p p
y= t x=t=y t
Z1 Z1
t
= 1 3y y dy = 1 3y 2 3t dy
p
y p
t t
3
= 1 y 3ty j1p t
=1 1 3t t3=2 3t3=2
= 3t 2t3=2 for 0 < t < 1
4.1. CUMULATIVE DISTRIBUTION FUNCTION TECHNIQUE 123
Now a cumulative distribution function must be a continuous function for all real values.
Therefore as a check we note that
and
lim 3t 2t3=2 = 1 = G (1)
t!1
d d
G (t) = 3t 2t3=2
dt dt
= 3 3t1=2 for 0 < t < 1
and 0 otherwise.
4.1.2 Exercise
Suppose X and Y are continuous random variables with joint probability density function
f (x; y) = 3y for 0 x y 1
4.1.3 Example
Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed continuous random
variables each with probability density function f (x) and cumulative distribution function
F (x). Find the probability density function of U = max (X1 ; X2 ; : : : ; Xn ) = X(n) and
V = min (X1 ; X2 ; : : : ; Xn ) = X(1) .
124 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Solution
For u 2 <, the cumulative distribution function of U is
d d
g (u) = G (u) = [F (u)]n
du du
= n [F (u)]n 1 f (u) for u 2 A
and 0 otherwise.
For v 2 <, the cumulative distribution function of V is
d d
h (v) = H (v) = f1 [1 F (v)]n g
dv dv
= n [1 F (v)]n 1 f (v) for v 2 A
and 0 otherwise.
4.2. ONE-TO-ONE TRANSFORMATIONS 125
u = h1 (x; y)
v = h2 (x; y)
is a one-to-one transformation for all (x; y) 2 RXY and that S maps the region RXY into
the region RU V in the uv plane. Since S : (x; y) ! (u; v) is a one-to-one transformation
there exists a inverse transformation T de…ned by
x = w1 (u; v)
y = w2 (u; v)
such that T = S 1 : (u; v) ! (x; y) for all (u; v) 2 RU V . The Jacobian of the transformation
T is
@x @x 1
@(x; y) @u @v @(u; v)
= @y @y =
@(u; v) @u @v
@(x; y)
@(u;v)
where @(x;y) is the Jacobian of the transformation S.
u = h1 (x; y)
v = h2 (x; y)
@u @u @v @v
Suppose the partial derivatives @x , @y , @x and @y are continuous functions in the neigh-
@(u;v)
bourhood of the point (a; b). Suppose also that @(x;y) 6= 0 at the point (a; b). Then there is
a neighbourhood of the point (a; b) in which S has an inverse.
Note: These are su¢ cient but not necessary conditions for the inverse to exist.
126 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
U = h1 (X; Y )
V = h2 (X; Y )
X = w1 (U; V )
Y = w2 (U; V )
Suppose also that S maps RXY into RU V . Then g(u; v), the joint joint probability density
function of U and V , is given by
@(x; y)
g(u; v) = f (w1 (u; v); w2 (u; v))
@(u; v)
for all (u; v) 2 RU V . (Compare Theorem 2.6.8 for univariate random variables.)
4.2.3 Proof
We want to …nd g(u; v), the joint probability density function of the random variables U
and V . Suppose S 1 maps the region B RU V into the region A RXY then
P [(U; V ) 2 B]
ZZ
= g(u; v)dudv (4.1)
B
= P [(X; Y ) 2 A]
ZZ
= f (x; y)dxdy
A
ZZ
@(x; y)
= f (w1 (u; v); w2 (u; v)) dudv (4.2)
@(u; v)
B
where the last line follows by the Change of Variable Theorem. Since this is true for all
B RU V we have, by comparing (4.1) and (4.2), that the joint probability density function
of U and V is given by
@(x; y)
g(u; v) = f (w1 (u; v); w2 (u; v))
@(u; v)
for all (u; v) 2 RU V .
In the following example we see how Theorem 4.2.2 can be used to show that the sum of
two independent Exponential(1) random variables is a Gamma random variable.
4.2. ONE-TO-ONE TRANSFORMATIONS 127
4.2.4 Example
Suppose X Exponential(1) and Y Exponential(1) independently. Find the joint prob-
ability density function of U = X + Y and V = X. Show that U Gamma(2; 1).
Solution
Since X Exponential(1) and Y Exponential(1) independently, the joint probability
density function of X and Y is
x y
f (x; y) = f1 (x) f2 (y) = e e
x y
= e
with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 4.2.
. .
y . .
. .
...
x
0
The transformation
S : U =X +Y, V =X
has inverse transformation
X =V, Y =U V
Under S the boundaries of RXY are mapped as
and the point (1; 2) is mapped to the point (3; 1). Thus S maps RXY into
v v=u
...
u
0
Note that the transformation S is a linear transformation and so we would expect the
Jacobian of the transformation to be a constant.
The joint probability density function of U and V is given by
@(x; y)
g (u; v) = f (w1 (u; v); w2 (u; v))
@(u; v)
= f (v; u v) j 1j
u
= e for (u; v) 2 RU V
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u. The marginal
probability density function of U is
Z1 Zu
u
g1 (u) = g (u; v) dv = e dv
1 v=0
u
= ue for u > 0
and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable.
Therefore U Gamma(2; 1).
4.2. ONE-TO-ONE TRANSFORMATIONS 129
In the following exercise we see how the sum and di¤erence of two independent Exponential(1)
random variables give a Gamma random variable and a Double Exponential random vari-
able respectively.
4.2.5 Exercise
Suppose X Exponential(1) and Y Exponential(1) independently. Find the joint prob-
ability density function of U = X + Y and V = X Y . Show that U Gamma(2; 1) and
V Double Exponential(0; 1).
In the following example we see how the Gamma and Beta distributions are related.
4.2.6 Example
Suppose X Gamma(a; 1) and Y Gamma(b; 1) independently. Find the joint probability
X
density function of U = X + Y and V = X+Y . Show that U Gamma(a + b; 1) and
X
V Beta(a; b) independently. Find E (V ) by …nding E X+Y .
Solution
Since X Gamma(a; 1) and Y Gamma(b; 1) independently, the joint probability density
function of X and Y is
xa 1e x yb 1e y
f (x; y) = f1 (x) f2 (y) =
(a) (b)
xa 1 y b 1 e x y
=
(a) (b)
with support set RXY = f(x; y) : x > 0; y > 0g which is the same support set as shown in
Figure 4.2.
The transformation
X
S : U =X +Y, V =
X +Y
has inverse transformation
X = U V , Y = U (1 V)
Under S the boundaries of RXY are mapped as
and the point (1; 2) is mapped to the point 3; 13 . Thus S maps RXY into
...
u
0
a+b 1 u va 1 (1 v)b 1
g (u; v) = u
| {z e } (a) (b)
h1 (u) | {z }
h2 (v)
for all (u; v) 2 B1 B2 then, by the Factorization Theorem for Independence, U and V are
independent random variables. Also by the Factorization Theorem for Independence the
probability density function of U must be proportional to h1 (u). By writing
ua+b 1 e u (a + b) a
g (u; v) = v 1
(1 v)b 1
(a + b) (a) (b)
we note that the function in the …rst square bracket is the probability density function of a
Gamma(a + b; 1) random variable and therefore U Gamma(a + b; 1). It follows that the
4.2. ONE-TO-ONE TRANSFORMATIONS 131
function in the second square bracket must be the probability density function of V which
is a Beta(a; b) probability density function. Therefore U Gamma(a + b; 1) independently
of V Beta(a; b).
In Chapter 2, Problem 9 the moments of a Beta random variable were found by inte-
gration. Here is a rather clever way of …nding E (V ) using the mean of a Gamma random
variable. In Exercise 2.7.9 it was shown that the mean of a Gamma( ; ) random variable
is .
Now
X
E (U V ) = E (X + Y ) = E (X) = (a) (1) = a
(X + Y )
since X Gamma(a; 1). But U and V are independent random variables so
a = E (U V ) = E (U ) E (V )
a = E (U ) E (V ) = (a + b) E (V )
4.2.7 Exercise
Suppose X Beta(a; b) and Y Beta(a + b; c) independently. Find the joint probability
density function of U = XY and V = X. Show that U Beta(a; b + c).
In the following example we see how a rather unusual transformation can be used to trans-
form two independent Uniform(0; 1) random variables into two independent N(0; 1) random
variables. This transformation is referred to as the Box-Muller Transformation after the
two statisticians George E. P. Box and Mervin Edgar Muller who published this result in
1958.
Show that U N(0; 1) and V N(0; 1) independently. Explain how you could use this
result to generate independent observations from a N(0; 1) distribution.
132 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Solution
Since X Uniform(0; 1) and Y Uniform(0; 1) independently, the joint probability density
function of X and Y is
with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g.
Consider the transformation
To determine the support set of (U; V ) we note that 0 < y < 1 implies 1 < cos (2 y) < 1.
Also 0 < x < 1 implies 0 < ( 2 log X)1=2 < 1. Therefore u = ( 2 log x)1=2 cos (2 y) takes
on values in the interval ( 1; 1). By a similar argument v = ( 2 log x)1=2 sin (2 y) also
takes on values in the interval ( 1; 1). Therefore the support set of (U; V ) is RU V = <2 .
The inverse of the transformation S can be determined. In particular we note that since
h i2 h i2
U 2 + V 2 = ( 2 log X)1=2 cos (2 Y ) + ( 2 log X)1=2 sin (2 Y )
= ( 2 log X) cos2 (2 Y ) + sin2 (2 Y )
= 2 log X
and
V sin (2 Y )
= = tan (2 Y )
U cos (2 Y )
the inverse transformation is
X=e 2
1
(U 2 +V 2 ) , Y = 1 arctan V
2 U
To determine the Jacobian of the inverse transformation it is simpler to use the result
2 3 1
1 @u @u
@(x; y) @(u; v) @x @y
= = 4 @v @v 5
@(u; v) @(x; y) @x @y
Since
@u @u
@x @y
@v @v
@x @y
1=2
1
x ( 2 log x) cos (2 y) 2 ( 2 log x) 1=2 sin (2 y)
= 1=2
1
x ( 2 log x) sin (2 y) 2 ( 2 log x) 1=2 cos (2 y)
2
= cos2 (2 y) + sin2 (2 y)
x
2
=
x
4.2. ONE-TO-ONE TRANSFORMATIONS 133
Therefore
@(x; y) @(u; v) 1 2 1
= =
@(u; v) @(x; y) x
x
=
2
1 1 2 2
= e 2 (u +v )
2
The joint probability density function of U and V is
@(x; y)
g (u; v) = f (w1 (u; v); w2 (u; v))
@(u; v)
1 1
(u2 +v2 )
= (1) e 2
2
1 1
(u2 +v2 )
= e 2 for (u; v) 2 <2
2
The support set of U is < and the support set of V is <. Since g (u; v) can be written as
1 1 2
u 1 1 2
v
g (u; v) = p e 2 p e 2
2 2
for all (u; v) 2 < < = <2 , therefore by the Factorization Theorem for Independence, U
and V are independent random variables. We also note that the joint probability density
function is the product of two N(0; 1) probability density functions. Therefore U N(0; 1)
and V N(0; 1) independently.
Let x and y be two independent Uniform(0; 1) observations which have been generated
using a random number generator. Then from the result above we have that
The result in the following theorem is one that was used (without proof) in a previous statis-
tics course such as STAT 221/231/241 to construct con…dence intervals and test hypotheses
regarding the mean in a N ; 2 model when the variance 2 is unknown.
Z
T =p t(n)
X=n
134 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Proof
The transformation T = p Z is not a one-to-one transformation. However if we add the
X=n
variable U = X to complete the transformation and consider the transformation
Z
S: T =p , U =X
X=n
with support set RXZ = f(x; z) : x > 0; z 2 <g. The transformation S maps RXZ into
RT U = f(t; u) : t 2 <; u > 0g.
The Jacobian of the inverse transformation is
@x @x
@ (x; z) @t @u 0 1 u 1=2
= @z @z = u 1=2 @z
=
@ (t; u) @t @u n @u
n
u 1=2 u 1=2
g (t; u) = f t ;u
n n
1 2 u 1=2
= p un=2 1 e u=2 e t u=(2n)
2(n+1)=2 (n=2) n
1 2
= p u(n+1)=2 1 e u(1+t =n)=2 for (t; u) 2 RT U
2(n+1)=2 (n=2) n
and 0 otherwise.
To determine the distribution of T we need to …nd the marginal probability density
function for T .
Z1
g1 (t) = g (t; u) du
1
Z1
1 u(1+t2 =n)=2
= p u(n+1)=2 1
e du
2(n+1)=2 (n=2) n
0
4.2. ONE-TO-ONE TRANSFORMATIONS 135
2 2 1 1
t2
Let y = u2 1 + tn so that u = 2y 1 + tn and du = 2 1 + n dy. Note that when
u = 0 then y = 0, and when u ! 1 then y ! 1. Therefore
Z1 " #
1 (n+1)=2 1
" 1
#
1 t2 y t2
g1 (t) = p 2y 1 + e 2 1+ dy
2(n+1)=2 (n=2) n n n
0
" #
1 (n+1)=2 Z1
1 t2
= p 2(n+1)=2 1+ y (n+1)=2 1
e y
dy
2(n+1)=2 (n=2) n n
0
(n+1)=2
1 t2 n+1
= p 1+
(n=2) n n 2
n+1 (n+1)=2
2p t2
= 1+ for t 2 <
(n=2) n n
which is the probability density function of a random variable with a t(n) distribution.
Therefore
Z
T =p t(n)
X=n
as required.
4.2.10 Example
Solution
If X 2 (n) independently of Z N(0; 1) then we know from the previous theorem that
Z
T =p t(n)
X=n
Now
!
Z p 1=2
E (T ) = E p = nE (Z) E X
X=n
since X and Z are independent random variables. Since E (Z) = 0 it follows that E (T ) = 0
136 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Z1
k 1
E X = xk xn=2 1
e x=2
dx
2n=2 (n=2)
0
Z1
1
= xk+n=2 1
e x=2
dx let y = x=2
2n=2 (n=2)
0
Z1
1
= (2y)k+n=2 1
e y
(2) dy
2n=2 (n=2)
0
Z1
2k+n=2
= y k+n=2 1
e y
dy
2n=2 (n=2)
0
2k (n=2 + k)
= (4.3)
(n=2)
which exists for n=2 + k > 0. If k = 1=2 then the integral exists for n=2 > 1=2 or n > 1.
Therefore
E (T ) = 0 for n > 1
Now
V ar (T ) = E T 2 [E (T )]2
= E T2 since E (T ) = 0
and
Z2
E T2 = E = nE Z 2 E X 1
X=n
Since Z N(0; 1) then
E Z2 = V ar (Z) + [E (Z)]2 = 1 + 02
= 1
Also by (4.3)
2 1 (n=2 1) 1
1
E X = =
(n=2) 2 (n=2 1)
1
=
n 2
which exists for n > 2. Therefore
V ar (T ) = E T 2 = nE Z 2 E X 1
1
= n (1)
n 2
n
= for n > 2
n 2
4.3. MOMENT GENERATING FUNCTION TECHNIQUE 137
The following theorem concerns the F distribution which is used in testing hypotheses about
the parameters in a multiple linear regression model.
X=n
U= F(n; m)
Y =m
4.2.12 Exercise
(a) Prove Theorem 4.2.11. Hint: Complete the transformation with V = Y .
(b) Find E(U ) and V ar(U ) and note for what values of n and m that they exist.
Hint: Use the technique and results of Example 4.2.10.
4.3.1 Theorem
Suppose X1 ; X2 ; : : : ; Xn are independent random variables and Xi has moment generating
function Mi (t) which exists for t 2 ( h; h) for some h > 0. The moment generating function
P
n
of Y = Xi is given by
i=1
Q
n
MY (t) = Mi (t)
i=1
for t 2 ( h; h).
If the Xi ’s are independent and identically distributed random variables each with
P
n
moment generating function M (t) then Y = Xi has moment generating function
i=1
MY (t) = [M (t)]n
for t 2 ( h; h).
Proof
P
n
The moment generating function of Y = Xi is
i=1
138 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
MY (t) = E etY
P
n
= E exp t Xi
i=1
Q
n
= E etXi since X1 ; X2 ; : : : ; Xn are independent random variables
i=1
Q
n
= Mi (t) for t 2 ( h; h)
i=1
as required.
Note: This theorem in conjunction with the Uniqueness Theorem for Moment Generating
Functions can be used to …nd the distribution of Y .
Here is a summary of results about sums of random variables for the named distributions.
P
n
2 P
n
Xi ki
i=1 i=1
Proof
(1) Suppose Xi Binomial(ni ; p), i = 1; 2; : : : ; n independently. The moment generating
function of Xi is
ni
Mi (t) = pet + q for t 2 <
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
Q
n
MY (t) = Mi (t)
i=1
Q
n
ni
= pet + q
i=1
P
n
ni
= pet + q i=1 for t 2 <
P
n
which is the moment generating function of a Binomial ni ; p random variable. There-
i=1
P
n P
n
fore by the Uniqueness Theorem for Moment Generating Functions Xi Binomial ni ; p .
i=1 i=1
(2) Suppose Xi Poisson( i ), i = 1; 2; : : : ; n independently. The moment generating func-
tion of Xi is
t
Mi (t) = e i (e 1)
for t 2 <
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
Q
n
MY (t) = Mi (t)
i=1
Q
n t
= e i (e 1)
i=1
!
P
n
i (et 1)
= e i=1
for t 2 <
P
n
which is the moment generating function of a Poisson i random variable. Therefore
i=1
P
n P
n
by the Uniqueness Theorem for Moment Generating Functions Xi Poisson i .
i=1 i=1
140 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Q
n
MY (t) = Mi (t)
i=1
Q
n p ki
=
i=1 1 qet
Pn
ki
p i=1
= for t < log (q)
1 qet
P
n
which is the moment generating function of a Negative Binomial ki ; p random vari-
i=1
able. Therefore by the Uniqueness Theorem for Moment Generating Functions
Pn Pn
Xi Negative Binomial ki ; p .
i=1 i=1
(4) Suppose Xi Exponential( ), i = 1; 2; : : : ; n independently. The moment generating
function of each Xi is
1 1
M (t) = for t <
1 t
Pn
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
n
1 1
MY (t) = [M (t)]n = = for t <
1 t
which is the moment generating function of a Gamma(n; ) random variable. Therefore by
P
n
the Uniqueness Theorem for Moment Generating Functions Xi Gamma(n; ).
i=1
(5) Suppose Xi Gamma( i ; ), i = 1; 2; : : : ; n independently. The moment generating
function of Xi is
i
1 1
Mi (t) = for t <
1 t
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
Q
n Q
n 1 i
MY (t) = Mi (t) =
i=1 i=1 1 t
P
n
i
1 i=1 1
= for t <
1 t
4.3. MOMENT GENERATING FUNCTION TECHNIQUE 141
P
n
which is the moment generating function of a Gamma i; random variable. There-
i=1
P
n P
n
fore by the Uniqueness Theorem for Moment Generating Functions Xi Gamma i; .
i=1 i=1
(6) Suppose Xi 2 (k
i ), i = 1; 2; : : : ; n independently. The moment generating function
of Xi is
ki
1 1
Mi (t) = for t <
1 2t 2
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
Q
n
MY (t) = Mi (t)
i=1
ki
Q
n 1
=
i=1 1 2t
P
n
ki
1 i=1 1
= for t <
1 2t 2
2
P
n
which is the moment generating function of a ki random variable. Therefore by
i=1
P
n
2
P
n
the Uniqueness Theorem for Moment Generating Functions Xi ki .
i=1 i=1
and by (6)
2
P
n Xi 2
(n)
i=1
4.3.3 Exercise
Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables with
moment generating function M (t), E (Xi ) = , and V ar (Xi ) = 2 < 1. Give an ex-
p P
n
pression for the moment generating function of Z = n X = where X = n1 Xi in
i=1
terms of M (t).
The following theorem is one that was used in your previous probability and statistics
courses without proof. The method of moment generating functions now allows us to easily
proof this result.
142 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
P
n P
n P
n
ai Xi N ai i ; a2i 2
i
i=1 i=1 i=1
Proof
Suppose Xi N( i ; 2 ), i = 1; 2; : : : ; n independently. The moment generating function of
i
Xi is
2 2
Mi (t) = e i t+ i t =2 for t 2 <
P
n
for i = 1; 2; : : : ; n. The moment generating function of Y = ai Xi is
i=1
MY (t) = E etY
P
n
= E exp t ai Xi
i=1
Q
n
= E e(ai t)Xi since X1 ; X2 ; : : : ; Xn are independent random variables
i=1
Q
n
= Mi (ai t)
i=1
Q
n 2 2 2
= e i ai t+ i ai t =2
i=1
P
n P
n
= exp ai i t+ a2i 2
i t2 =2 for t 2 <
i=1 i=1
P
n P
n
which is the moment generating function of a N ai i ; a2i 2
i random variable.
i=1 i=1
Therefore by the Uniqueness Theorem for Moment Generating Functions
P
n P
n P
n
ai Xi N ai i ; a2i 2
i
i=1 i=1 i=1
4.3.5 Corollary
If Xi N( ; 2 ), i = 1; 2; : : : ; n independently then
P
n
2
Xi N n ;n
i=1
and
1 Pn 2
X= Xi N ;
n i=1 n
4.3. MOMENT GENERATING FUNCTION TECHNIQUE 143
Proof
To prove that
P
n
2
Xi N n ;n
i=1
P
n P
n P
n
2
Xi N ;
i=1 i=1 i=1
or
P
n
2
Xi N n ;n
i=1
To prove that
1 Pn 2
X= Xi N ;
n i=1 n
we note that
P
n 1
X= Xi
i=1 n
Let ai = n1 , i = , and 2
i = 2 in Theorem 4.3.4 to obtain
!
2
P
n 1 P
n 1 P
n 1 2
X= Xi N ;
i=1 n i=1 n i=1 n
or
2
X N ;
n
4.3.7 Exercise
Prove the identity 4.3.6
4.3.8 Theorem
If Xi N( ; 2 ), i = 1; 2; : : : ; n independently then
2
X N ;
n
independently of
P
n
2
Xi X
(n 1) S 2 i=1 2
2
= 2
(n 1)
where
P
n
2
Xi X
i=1
S2 =
n 1
Proof
For a proof that X and S 2 are independent random variables please see Problem 16.
By identity 4.3.6
P
n P
n
2 2
(Xi )2 = Xi X +n X
i=1 i=1
Since X and S 2 are independent random variables, it follows that U and V are independent
random variables.
By 4.3.2(7)
2
P
n Xi 2
Y = (n)
i=1
n=2 1
MY (t) = (1 2t) for t < (4.4)
2
2
X N ; n was proved in Corollary 4.3.5. By Example 2.6.9
X
p N (0; 1)
= n
1=2 1
MV (t) = (1 2t) for t < (4.5)
2
Since U and V are independent random variables and Y = U + V then
n=2 1=2 1
(1 2t) = MU (t) (1 2t) for t <
2
or
1(n 1)=2
MU (t) = (1 2t) for t <
2
which is the moment generating function of a 2 (n 1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions
(n 1) S 2 2
U= 2
(n 1)
4.3.9 Theorem
If Xi N( ; 2 ), i = 1; 2; : : : ; n independently then
X
p t (n 1)
S= n
Proof
p X
X = n Z
p =r =q
S= n (n 1)S 2 U
2 n 1
n 1
where
X
Z= p N (0; 1)
= n
independently of
(n 1) S 2 2
U= 2
(n 1)
The following theorem is useful for testing the equality of variances in a two sample Normal
model.
146 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
4.3.10 Theorem
Suppose X1 ; X2 ; : : : ; Xn are independent N 1 ; 21 random variables, and independently
Y1 ; Y2 ; : : : ; Ym are independent N 2 ; 22 random variables. Let
P
n
2 P
m
2
Xi X Yi Y
i=1 i=1
S12 = and S22 =
n 1 m 1
Then
S12 = 2
1
F (n 1; m 1)
S22 = 2
2
4.3.11 Exercise
Prove Theorem 4.3.10. Hint: Use Theorems 4.3.8 and 4.2.11.
4.4. CHAPTER 4 PROBLEMS 147
2. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 24xy for 0 < x + y < 1; 0 < x < 1; 0 < y < 1
and 0 otherwise.
3. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = e y for 0 < x < y < 1
and 0 otherwise.
4. Suppose X and Y are nonnegative continuous random variables with joint probability
density function f (x; y). Show that the probability density function of U = X + Y
is given by
Z1
g (u) = f (v; u v) dv
0
5. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 2 (x + y) for 0 < x < y < 1
and 0 otherwise.
6. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the probability density function of T = X + Y using the cumulative distri-
bution function technique.
(b) Find the joint probability density function of S = X and T = X + Y . Find the
marginal probability density function of T and compare your answer to the one
you obtained in (a).
(c) Find the joint probability density function of U = X 2 and V = XY . Be sure to
specify the support set of (U; V ).
(d) Find the marginal probability density function’s of U and V:
(e) Find E(V 3 ): (Hint: Are X and Y independent random variables?)
7. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the joint probability density function of U = X=Y and V = XY: Be sure
to specify the support set of (U; V ).
(b) Are U and V independent random variables?
(c) Find the marginal probability density function’s of U and V . Be sure to specify
their support sets.
8. Suppose X and Y are independent Uniform(0; ) random variables. Find the proba-
bility density function of U = X Y:
(Hint: Complete the transformation with V = X + Y:)
2 1=2
X1 = 1 + 1 Z1 , X2 = 2 + 2[ Z1 + 1 Z2 ]
11. Let X and Y be independent N(0; 1) random variables and let U = X=Y .
13. Let X1 ; X2 ; X3 be independent N(0; 1) random variables. Let the random variables
Y1 ; Y2 ; Y3 be de…ned by
Show that Y1 ; Y2 ; Y3 are independent random variables and …nd their marginal prob-
ability density function’s.
1 P
n
(a) Let ti = si s + s=n; i = 1; 2; : : : ; n where s = n si . Show that
i=1
P
n P
n P
n
2 P
n
E[exp( si Ui + sX)] = E[exp( ti Xi )] = exp ti + t2i =2
i=1 i=1 i=1 i=1
Hint: Since Xi N( ; 2)
2 2
E[exp(ti Xi )] = exp ti + ti =2
P
n P
n P
n
(b) Verify that ti = s and t2i = (si s)2 + s2 =n.
i=1 i=1 i=1
(c) Use (a) and (b) to show that
2 P
n
M (s1 ; s2 ; : : : ; sn ; s) = exp[ s + ( =n)(s2 =2)] exp[ 2
(si s)2 =2]
i=1
(d) Show that the random variable X is independent of the random vector U and
Pn
thus X and (Xi X)2 are independent. Hint: MX (s) = M (0; 0; : : : ; 0; s) and
i=1
MU (s1 ; s2 ; : : : ; sn ) = M (s1 ; s2 ; : : : ; sn ; 0).
5. Limiting or Asymptotic
Distributions
n x (np)x e np
p (1 p)n x
for x = 0; 1; : : : ; n
x x!
n x
P (Xn x) = p (1 p)n x
for x = 0; 1; : : : ; n
x
!
x np
P Z p where Z N (0; 1)
np (1 p)
if n is large and p is close to 1=2 (a special case of the very important Central Limit Theorem)
was used. These are examples of what we will call limiting or asymptotic distributions.
In this chapter we consider a sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : and look
at the de…nitions and theorems related to determining the limiting distribution of such a
sequence. In Section 5:1 we de…ne convergence in distribution and look at several examples
to illustrate its meaning. In Section 5:2 we de…ne convergence in probability and examine
its relationship to convergence in distribution. In Section 5:3 we look at the Weak Law of
Large Numbers which is an important theorem when examining the behaviour of estimators
of unknown parameters (Chapter 6). In Section 5:4 we use the moment generating function
to …nd limiting distributions including a proof of the Central Limit Theorem. The Central
Limit Theorem was used in STAT 221/231/241 to construct an approximate con…dence
interval for an unknown parameter. In Section 5:5 additional limit theorems for …nding
limiting distributions are introduced. These additional theorems allow us to determine new
limiting distributions by combining the limiting distributions which have been determined
from de…nitions, the Weak Law of Large Numbers, and/or the Central Limit Theorem.
151
152 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS
if
lim Fn (x) = F (x)
n!1
at all points x at which F (x) is continuous. We call F the limiting or asymptotic distribution
of Xn .
Note:
(1) Although we say the random variable Xn converges in distribution to the random
variable X, the de…nition of convergence in distribution is de…ned in terms of the pointwise
convergence of the corresponding sequence of cumulative distribution functions.
(2) This de…nition holds for both discrete and continuous random variables.
(3) One way to think about convergence in distribution is that, if Xn !D X, then for large
n
Fn (x) = P (Xn x) F (x) = P (X x)
if x is a point of continuity of F (x). How good the approximation is will depend on the
values of n and x.
The following theorem and corollary will be useful in determining limiting distributions.
5.1. CONVERGENCE IN DISTRIBUTION 153
cn
b (n)
lim 1+ + = ebc
n!1 n n
5.1.3 Corollary
If b and c are real constants then
cn
b
lim 1+ = ebc
n!1 n
5.1.4 Example
Let Yi Exponential(1) ; i = 1; 2; : : : independently. Consider the sequence of random
variables X1 ; X2 ; : : : ; Xn ; : : : where Xn = max (Y1 ; Y2 ; : : : ; Yn ) log n. Find the limiting
distribution of Xn .
Solution
Since Yi Exponential(1)
(
0 y 0
P (Yi y) = y
1 e y>0
As n ! 1, log n ! 1 so
n
( e x)
lim Fn (x) = lim 1+
n!1 n!1 n
e x
= e for x 2 <
by 5.1.3.
154 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS
1
0.9 n =1
n =2
0.8 n =5
n = 10
0.7 n = infinity
0.6
Fn(x)
0.5
0.4
0.3
0.2
0.1
0
-3 -2 -1 0 1 2 3 4
x
x n
e
Figure 5.1: Graphs of Fn (x) = 1 n for n = 1; 2; 5; 10; 1
5.1.5 Example
Let Yi Uniform(0; ), i = 1; 2; : : : independently. Consider the sequence of random vari-
ables X1 ; X2 ; : : : ; Xn ; : : : where Xn = max (Y1 ; Y2 ; : : : ; Yn ). Find the limiting distribution
of Xn .
Solution
Since Yi Uniform(0; ) 8
>
> 0 y 0
<
y
P (Yi y) = 0<y<
>
>
: 1 y
5.1. CONVERGENCE IN DISTRIBUTION 155
Therefore (
0 x<
lim Fn (x) = = F (x)
n!1 1 x
In Figure 5.2 you can see how quickly the curves Fn (x) approach the limiting curve F (x).
1
n =1
0.9 n =2
n =5
0.8
n=10
0.7 n=100
n=infinity
0.6
F n(x)
0.5
0.4
0.3
0.2
0.1
0
-0.5 0 0.5 1 1.5 2 2.5
x
x n
Figure 5.2: Graphs of Fn (x) = for = 2 and n = 1; 2; 5; 10; 100; 1
Therefore
Xn !D X
Since X only takes on one value with probability one, X is called a degenerate random
variable. When Xn converges in distribution to a degenerate random variable we also call
this convergence in probability to a constant as de…ned in the next section.
5.1.6 Comment
Suppose X1 ; X2 ; : : : ; Xn ; : : :is a sequence of random variables such that Xn !D X. Then
for large n we can use the approximation
P (Xn x) P (X x)
If X is degenerate at b then P (X = b) = 1 and this approximation is not very useful.
However, if the limiting distribution is degenerate then we could use this result in another
way. In Example 5.1.5 we showed that if Yi Uniform(0; ), i = 1; 2; : : : ; n indepen-
dently then Xn = max (Y1 ; Y2 ; : : : ; Yn ) converges in distribution to a degenerate random
variable X with P (X = ) = 1. This result is rather useful since, if we have observed data
y1 ; y2 ; : : : ; yn from a Uniform(0; ) distribution and is unknown, then this suggests using
y(n) = max (y1 ; y2 ; : : : ; yn ) as an estimate of if n is reasonably large. We will discuss this
idea in more detail in Chapter 6.
In Example 5.1.5 the limiting distribution was degenerate. When the limiting distribution
is degenerate we say Xn converges in probability to a constant. The following de…nition,
which follows from De…nition 5.2.1, can be used for proving convergence in probability.
5.2.4 Example
Suppose X1 ; X2 ; : : : ; Xn ; : : : is a sequence of random variables with E(Xn ) = n and
V ar(Xn ) = 2n . If lim n = a and lim 2n = 0 then show Xn !p a.
n!1 n!1
Solution
To show Xn !p a we need to show that for all " > 0
or equivalently
lim P (jXn aj ") = 0
n!1
Recall Markov’s Inequality. For all k; c > 0
Xk E
P (jXj c)
ck
Therefore by Markov’s Inequality with k = 2 and c = " we have
E jXn aj2
0 lim P (jXn aj ") lim (5.1)
n!1 n!1 "2
Now
h i
2 2
E jXn aj = E (Xn a)
h i
2
= E (Xn n ) + 2( n a) E (Xn n) +( n a)2
158 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS
Since
2 2
lim E [Xn n] = lim V ar (Xn ) = lim n =0
n!1 n!1 n!1
and
lim ( n a) = 0 since lim n =a
n!1 n!1
therefore
h i
lim E jXn aj2 = lim E (Xn a)2 = 0
n!1 n!1
The proof in Example 5.2.4 used De…nition 5.2.3 to prove convergence in probability. The
reason for this is that the distribution of the Xi ’s was not speci…ed. Only conditions on
E(Xn ) and V ar(Xn ) were speci…ed. This means that the result in Example 5.2.4 holds for
any sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : satisfying the given conditions.
5.2.5 Theorem
Suppose X1 ; X2 ; : : : ; Xn ; : : : is a sequence of random variables such that Xn has cumulative
distribution function Fn (x). If
lim_{n→∞} F_n(x) = lim_{n→∞} P(X_n ≤ x) = 0 for x < b, and 1 for x > b
then X_n →_p b.
Note: We do not need to worry about whether lim_{n→∞} F_n(b) exists since x = b is a point of discontinuity of the limiting distribution (see Definition 5.1.1).
5.2.6 Example
Let Y_i ~ Exponential(θ, 1), i = 1, 2, ... independently. Consider the sequence of random variables X_1, X_2, ..., X_n, ... where X_n = min(Y_1, Y_2, ..., Y_n). Show that X_n →_p θ.
Solution
Since Y_i ~ Exponential(θ, 1),
P(Y_i > y) = e^{−(y−θ)} for y > θ, and 1 for y ≤ θ
so F_n(x) = P(X_n ≤ x) = 1 − [P(Y_i > x)]^n = 1 − e^{−n(x−θ)} for x > θ, and 0 for x ≤ θ. Therefore
lim_{n→∞} F_n(x) = 0 for x ≤ θ, and 1 for x > θ
which we note is not a cumulative distribution function since the function is not right-continuous at x = θ. However
lim_{n→∞} F_n(x) = 0 for x < θ, and 1 for x > θ
and therefore by Theorem 5.2.5, X_n →_p θ. In Figure 5.3 you can see how quickly the limit is approached.
[Figure 5.3: Graphs of F_n(x) for X_n = min(Y_1, ..., Y_n) with Y_i ~ Exponential(θ, 1), for n = 1, 2, 5, 10, 100, ∞]
5.3 Weak Law of Large Numbers

5.3.1 Theorem (Weak Law of Large Numbers)
Suppose X_1, X_2, ..., X_n, ... are independent and identically distributed random variables with E(X_i) = μ and Var(X_i) = σ² < ∞, and let X̄_n = (1/n) Σ_{i=1}^{n} X_i. Then X̄_n →_p μ.
Proof
Using Definition 5.2.3 we need to show lim_{n→∞} P(|X̄_n − μ| ≥ ε) = 0 for all ε > 0. Since E(X̄_n) = μ and Var(X̄_n) = σ²/n, Chebyshev's Inequality gives
P(|X̄_n − μ| ≥ kσ/√n) ≤ 1/k²   for all k > 0
Let k = √n ε/σ. Then
0 ≤ P(|X̄_n − μ| ≥ ε) ≤ σ²/(nε²)   for all ε > 0
Since
lim_{n→∞} σ²/(nε²) = 0
it follows that lim_{n→∞} P(|X̄_n − μ| ≥ ε) = 0 and therefore X̄_n →_p μ.
Notes:
(1) The proof of the Weak Law of Large Numbers does not actually require that the random
variables be identically distributed, only that they all have the same mean and variance.
As well the proof does not require knowing the distribution of these random variables.
(2) In words the Weak Law of Large Numbers says that the sample mean Xn approaches
the population mean as n ! 1.
5.3.2 Example
If X ~ Pareto(1, β) then X has probability density function
f(x) = β/x^{β+1}   for x ≥ 1
and 0 otherwise. X has cumulative distribution function
F(x) = 0 if x < 1, and 1 − 1/x^{β} for x ≥ 1
The points (i; xi ), i = 1; 2; : : : ; 500 for one simulation of 500 observations from a Pareto(1; 5)
distribution are plotted in Figure 5.4.
[Figure 5.4: Scatterplot of (i, x_i), i = 1, ..., 500, for 500 observations simulated from a Pareto(1, 5) distribution]
Figure 5.5 shows a plot of the points (n, x̄_n), n = 1, 2, ..., 500, where x̄_n = (1/n) Σ_{i=1}^{n} x_i is the sample mean. We note that the sample mean x̄_n is approaching the population mean μ = E(X) = 5/(5 − 1) = 1.25 as n increases.
[Figure 5.5: Sample mean x̄_n versus n for the Pareto(1, 5) data; x̄_n approaches μ = 1.25]
If we generate a further 1000 values of xi , and plot (i; xi ) ; i = 1; 2; : : : ; 1500 we obtain the
graph in Figure 5.6.
[Figure 5.6: Scatterplot of (i, x_i), i = 1, ..., 1500, for the extended Pareto(1, 5) simulation]
The corresponding plot of (n, x̄_n), n = 1, 2, ..., 1500, is shown in Figure 5.7. We note that the sample mean x̄_n stays very close to the population mean μ = 1.25 for n > 500.
[Figure 5.7: Sample mean x̄_n versus n, n = 1, ..., 1500; x̄_n stays close to μ = 1.25]
Note that these figures correspond to only one set of simulated data. If we generated another set of data using a random number generator the actual data points would change. However, what would stay the same is that the sample mean for the new data set would still approach the mean value E(X) = 1.25 as n increases.
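As a rough illustration of this behaviour (not part of the original notes), the following R sketch simulates Pareto(1, β) observations by inverting the cumulative distribution function F(x) = 1 − x^{−β} and tracks the running sample mean; the seed and sample size are arbitrary choices.

# Illustration: running sample means for simulated Pareto(1, beta) data
set.seed(123)
rpareto1 <- function(n, beta) (1 - runif(n))^(-1/beta)   # inverse CDF method
x <- rpareto1(500, beta = 5)            # beta = 5: mean exists, E(X) = 1.25
xbar <- cumsum(x) / seq_along(x)        # running sample means
plot(xbar, type = "l", xlab = "n", ylab = "sample mean")
abline(h = 1.25, col = "red")           # population mean

Rerunning the same sketch with beta = 0.5 shows the very different behaviour described next.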
The points (i, x_i), i = 1, 2, ..., 500, for one simulation of 500 observations from a Pareto(1, 0.5) distribution are plotted in Figure 5.8. For this distribution
E(X) = ∫_1^∞ x · (1/2) x^{−3/2} dx = ∫_1^∞ (1/2) x^{−1/2} dx   which diverges to ∞
Note in Figure 5.8 that there are some very large observations. In particular there is one observation which is close to 40,000.
[Figure 5.8: Scatterplot of (i, x_i), i = 1, ..., 500, for 500 observations from a Pareto(1, 0.5) distribution]
The corresponding plot of (n, x̄_n), n = 1, 2, ..., 500, for these data is given in Figure 5.9. Note that the mean x̄_n does not appear to be approaching a fixed value.
[Figure 5.9: Sample mean x̄_n versus n for the Pareto(1, 0.5) data; the sample mean does not settle down]
In Figure 5.10 the points (n, x̄_n) for a set of 50,000 observations generated from a Pareto(1, 0.5) distribution are plotted. Note that the mean x̄_n does not approach a fixed value and in general is getting larger as n gets large. This is consistent with E(X) diverging to ∞.
[Figure 5.10: Sample mean x̄_n versus n for 50,000 Pareto(1, 0.5) observations; the sample mean keeps growing]
5.3.3 Example
If X ~ Cauchy(0, 1) then
f(x) = 1/[π(1 + x²)]   for x ∈ ℝ
The probability density function, shown in Figure 5.11, is symmetric about the y axis and the median of the distribution is equal to 0.
[Figure 5.11: Cauchy(0, 1) probability density function f(x)]
Since
∫_{−∞}^{0} (1/π) x/(1 + x²) dx diverges to −∞        (5.3)
and
∫_{0}^{∞} (1/π) x/(1 + x²) dx diverges to +∞        (5.4)
the mean E(X) does not exist. The cumulative distribution function is
F(x) = 1/2 + (1/π) tan⁻¹(x)   for −∞ < x < ∞
[Figure: sample mean x̄_n versus n for simulated Cauchy(0, 1) data; the sample mean does not converge]
5.4.1 Theorem
Suppose X_1, X_2, ..., X_n, ... is a sequence of random variables such that X_n has moment generating function M_n(t), and X is a random variable with moment generating function M(t). If lim_{n→∞} M_n(t) = M(t) for all t in an open interval containing 0, then
X_n →_D X
Note:
(1) The sequence of random variables X_1, X_2, ..., X_n, ... converges in distribution if the corresponding sequence of moment generating functions M_1(t), M_2(t), ..., M_n(t), ... converges pointwise.
(2) This result holds for both discrete and continuous random variables.
Recall from Definition 5.1.1 that X_n →_D X means lim_{n→∞} F_n(x) = F(x) at all points x at which F(x) is continuous. If X is a discrete random variable then the cumulative distribution function is a right continuous function. The values of x of main interest for a discrete random variable are exactly the points at which F(x) is discontinuous. The following theorem indicates that lim_{n→∞} F_n(x) = F(x) also holds for the values of x at which F(x) is discontinuous if X_n and X are non-negative integer-valued random variables. The named discrete distributions Bernoulli, Binomial, Geometric, Negative Binomial, and Poisson are all non-negative integer-valued random variables.
5.4.2 Theorem
Suppose X_n and X are non-negative integer-valued random variables. If X_n →_D X then
lim_{n→∞} P(X_n ≤ x) = P(X ≤ x) holds for all x, and in particular
lim_{n→∞} P(X_n = x) = P(X = x)   for x = 0, 1, ...
5.4.3 Example
Consider the sequence of random variables X_1, X_2, ..., X_k, ... where X_k ~ Negative Binomial(k, p). Use Theorem 5.4.1 to determine the limiting distribution of X_k as k → ∞, p → 1 such that kq/p = λ remains constant where q = 1 − p. Use this limiting distribution and Theorem 5.4.2 to give an approximation for P(X_k = x).
Solution
If X_k ~ Negative Binomial(k, p) then
M_k(t) = E(e^{tX_k}) = [ p/(1 − qe^t) ]^k   for t < −log q        (5.5)
If λ = kq/p then
p = k/(λ + k)   and   q = λ/(λ + k)        (5.6)
Substituting (5.6) into (5.5) and simplifying gives
M_k(t) = [ (k/(λ + k)) / (1 − (λ/(λ + k)) e^t) ]^k
       = [ k/(λ + k − λe^t) ]^k
       = [ 1 − (λ/k)(e^t − 1) ]^{−k}   for t < log((λ + k)/λ)
Now
lim_{k→∞} [ 1 − λ(e^t − 1)/k ]^{−k} = e^{λ(e^t − 1)}   for t < ∞
by Corollary 5.1.3. Since M(t) = e^{λ(e^t − 1)} for t ∈ ℝ is the moment generating function of a Poisson(λ) random variable, then by Theorem 5.4.1, X_k →_D X ~ Poisson(λ).
By Theorem 5.4.2
P(X_k = x) = \binom{x + k − 1}{x} p^k q^x ≈ (kq/p)^x e^{−kq/p} / x!   for x = 0, 1, ...
In your previous probability and statistics courses you would have used the Central Limit Theorem (without proof!) for approximating Binomial and Poisson probabilities as well as constructing approximate confidence intervals. We now give a proof of this theorem. Suppose X_1, X_2, ... are independent and identically distributed random variables with E(X_i) = μ, Var(X_i) = σ², and moment generating function defined on an open interval containing 0. Then
Z_n = √n (X̄_n − μ)/σ →_D Z ~ N(0, 1)
Proof
We can write Z_n as
Z_n = (1/(σ√n)) Σ_{i=1}^{n} (X_i − μ)
Let M(t) = E[e^{t(X_i − μ)}], |t| < h, be the moment generating function of X_i − μ. Then M(0) = 1,
M'(0) = E(X_i − μ) = 0
and
M''(0) = E[(X_i − μ)²] = Var(X_i) = σ²
By Taylor's Theorem, for some c between 0 and t,
M(t) = M(0) + M'(0) t + (1/2) M''(c) t²
     = 1 + (1/2) M''(c) t²        (5.7)
The moment generating function of Z_n is
M_n(t) = E(e^{tZ_n})
       = E[ exp( (t/(σ√n)) Σ_{i=1}^{n} (X_i − μ) ) ]
       = [ M(t/(σ√n)) ]^n   for |t|/(σ√n) < h        (5.8)
Using (5.7) in (5.8) gives
M_n(t) = [ 1 + (1/2) M''(c_n) (t/(σ√n))² ]^n
       = { 1 + (1/2)(t/(σ√n))² [M''(c_n) − M''(0)] + (1/2)(t/(σ√n))² M''(0) }^n
for some c_n between 0 and t/(σ√n). But M''(0) = σ², so
M_n(t) = { 1 + [ (1/2)t² + t²(M''(c_n) − M''(0))/(2σ²) ] / n }^n
for some c_n between 0 and t/(σ√n).
Since c_n is between 0 and t/(σ√n), c_n → 0 as n → ∞. Since M''(t) is continuous on (−h, h), M''(c_n) → M''(0) and
lim_{n→∞} t²[M''(c_n) − M''(0)]/(2σ²) = 0
Therefore by Theorem 5.1.2, with
ψ(n) = t²[M''(c_n) − M''(0)]/(2σ²)
we have
lim_{n→∞} M_n(t) = lim_{n→∞} { 1 + [ (1/2)t² + t²(M''(c_n) − M''(0))/(2σ²) ] / n }^n
                 = e^{t²/2}   for |t| < ∞
which is the moment generating function of a N(0, 1) random variable. Therefore by Theorem 5.4.1
Z_n →_D Z ~ N(0, 1)
as required.
Note: Although this proof assumes that the moment generating function of X_i, i = 1, 2, ... exists, it does not make any assumptions about the form of the distribution of the X_i's. There are other more general proofs of the Central Limit Theorem which only assume the existence of the variance σ² (which implies the existence of the mean μ).
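To see the Central Limit Theorem numerically (an illustration added here, not part of the notes), the following R sketch simulates standardized means of Exponential(1) samples, for which μ = σ = 1, and compares a few empirical quantiles with N(0, 1) quantiles.

# CLT illustration: standardized means of Exponential(1) samples
set.seed(330)
n <- 50; nsim <- 10000
z <- replicate(nsim, sqrt(n) * (mean(rexp(n, rate = 1)) - 1) / 1)
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(empirical = quantile(z, p), normal = qnorm(p)), 3)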
5.4.6 Example - Normal Approximation to the χ² Distribution
Suppose Y_n ~ χ²(n), n = 1, 2, .... Consider the sequence of random variables Z_1, Z_2, ..., Z_n, ... where Z_n = (Y_n − n)/√(2n). Show that
Z_n = (Y_n − n)/√(2n) →_D Z ~ N(0, 1)
Solution
Let X_i ~ χ²(1), i = 1, 2, ... independently. Since X_1, X_2, ... are independent and identically distributed random variables with E(X_i) = 1 and Var(X_i) = 2, then by the Central Limit Theorem
√n (X̄_n − 1)/√2 →_D Z ~ N(0, 1)
But X̄_n = (1/n) Σ_{i=1}^{n} X_i so
√n ( (1/n) Σ_{i=1}^{n} X_i − 1 )/√2 = (S_n − n)/√(2n)
where S_n = Σ_{i=1}^{n} X_i. Therefore
(S_n − n)/√(2n) →_D Z ~ N(0, 1)
Now by 4.3.2(6), S_n ~ χ²(n) and therefore Y_n and S_n have the same distribution. It follows that
Z_n = (Y_n − n)/√(2n) →_D Z ~ N(0, 1)
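A small numerical check of this approximation (added for illustration; the choice n = 50 is arbitrary) compares exact χ²(n) tail probabilities with the normal approximation P(Y_n > y) ≈ P(Z > (y − n)/√(2n)).

# Normal approximation to chi-square tail probabilities
n <- 50
y <- c(60, 70, 80)
exact  <- pchisq(y, df = n, lower.tail = FALSE)
approx <- pnorm((y - n) / sqrt(2 * n), lower.tail = FALSE)
round(cbind(y, exact, approx), 4)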
Similarly, if Y_n ~ Binomial(n, p), it can be shown that
Z_n = (Y_n − np)/√(np(1 − p)) →_D Z ~ N(0, 1)
Proof of (1)
Since g is continuous at x = a, for every ε > 0 there exists a δ > 0 such that |x − a| < δ implies |g(x) − g(a)| < ε. By Example 2.1.4(d) this implies that
P(|g(X_n) − g(a)| < ε) ≥ P(|X_n − a| < δ)
Since X_n →_p a it follows that for every δ > 0, lim_{n→∞} P(|X_n − a| < δ) = 1. Therefore
lim_{n→∞} P(|g(X_n) − g(a)| < ε) ≥ 1
and since probabilities are at most 1, lim_{n→∞} P(|g(X_n) − g(a)| < ε) = 1, that is, g(X_n) →_p g(a).
5.5.2 Example
If X_n →_p a > 0, Y_n →_p b ≠ 0 and Z_n →_D Z ~ N(0, 1) then find the limiting distributions of each of the following:
(a) √X_n
(b) X_n + Y_n
(c) Y_n + Z_n
(d) X_n Z_n
(e) Z_n²
Solution
(a) Let g(x) = √x, which is a continuous function for all x ∈ ℝ⁺. Since X_n →_p a, then by 5.5.1(1), √X_n = g(X_n) →_p g(a) = √a, or √X_n →_p √a.
(b) Let g(x, y) = x + y, which is a continuous function for all (x, y) ∈ ℝ². Since X_n →_p a and Y_n →_p b, then by 5.5.1(2), X_n + Y_n = g(X_n, Y_n) →_p g(a, b) = a + b, or X_n + Y_n →_p a + b.
(c) Let g(y, z) = y + z, which is a continuous function for all (y, z) ∈ ℝ². Since Y_n →_p b and Z_n →_D Z ~ N(0, 1), then by 5.5.1(3), Y_n + Z_n = g(Y_n, Z_n) →_D g(b, Z) = b + Z, or Y_n + Z_n →_D b + Z where Z ~ N(0, 1). Since b + Z ~ N(b, 1), therefore Y_n + Z_n →_D b + Z ~ N(b, 1).
(d) Let g(x, z) = xz, which is a continuous function for all (x, z) ∈ ℝ². Since X_n →_p a and Z_n →_D Z ~ N(0, 1), then by Slutsky's Theorem, X_n Z_n = g(X_n, Z_n) →_D g(a, Z) = aZ, or X_n Z_n →_D aZ where Z ~ N(0, 1). Since aZ ~ N(0, a²), therefore X_n Z_n →_D aZ ~ N(0, a²).
(e) Let g(x, z) = z², which is a continuous function for all (x, z) ∈ ℝ². Since Z_n →_D Z ~ N(0, 1), then by Slutsky's Theorem, Z_n² = g(X_n, Z_n) →_D g(a, Z) = Z², or Z_n² →_D Z² where Z ~ N(0, 1). Since Z² ~ χ²(1), therefore Z_n² →_D Z² ~ χ²(1).
5.5.3 Exercise
If Xn !p a > 0, Yn !p b 6= 0 and Zn !D Z N(0; 1) then …nd the limiting distributions
of each of the following:
(a) Xn2
(b) Xn Yn
(c) Xn =Yn
(d) Xn 2Zn
(e) 1=Zn
In Example 5.5.2 we identi…ed the function g in each case. As with other limit theorems we
tend not to explicitly identify the function g once we have a good idea of how the theorems
work as illustrated in the next example.
5.5.4 Example
Suppose X_i ~ Poisson(θ), i = 1, 2, ... independently. Consider the sequence of random variables Z_1, Z_2, ..., Z_n, ... where
Z_n = √n (X̄_n − θ)/√X̄_n
Find the limiting distribution of Z_n.
Solution
Since X_1, X_2, ... are independent and identically distributed random variables with E(X_i) = θ and Var(X_i) = θ, then by the Central Limit Theorem
W_n = √n (X̄_n − θ)/√θ →_D Z ~ N(0, 1)        (5.9)
and by the Weak Law of Large Numbers
X̄_n →_p θ        (5.10)
Since g(x) = √(x/θ) is continuous for x > 0, (5.10) and 5.5.1(1) give
U_n = √(X̄_n/θ) →_p 1        (5.11)
Now
Z_n = √n (X̄_n − θ)/√X̄_n = [ √n (X̄_n − θ)/√θ ] / √(X̄_n/θ) = W_n/U_n
By (5.9), (5.11), and Slutsky's Theorem
Z_n = W_n/U_n →_D Z/1 = Z ~ N(0, 1)
5.5.5 Example
Suppose X_i ~ Uniform(0, 1), i = 1, 2, ... independently. Consider the sequence of random variables U_1, U_2, ..., U_n, ... where U_n = max(X_1, X_2, ..., X_n). Show that
(a) U_n →_p 1
(b) e^{U_n} →_p e
(c) sin(1 − U_n) →_p 0
(d) V_n = n(1 − U_n) →_D V ~ Exponential(1)
(e) 1 − e^{−V_n} →_D 1 − e^{−V} ~ Uniform(0, 1)
(f) (U_n + 1)² [n(1 − U_n)] →_D 4V ~ Exponential(4)
Solution
(a) Since X_1, X_2, ..., X_n are Uniform(0, 1) random variables, for i = 1, 2, ...
P(X_i ≤ x) = 0 for x ≤ 0,  x for 0 < x < 1,  1 for x ≥ 1
Therefore
F_n(u) = P(U_n ≤ u) = P(max(X_1, X_2, ..., X_n) ≤ u) = ∏_{i=1}^{n} P(X_i ≤ u)
       = 0 for u ≤ 0,  u^n for 0 < u < 1,  1 for u ≥ 1
and
lim_{n→∞} F_n(u) = 0 for u < 1, and 1 for u ≥ 1
so by Theorem 5.2.5, U_n →_p 1.
(b), (c) Since g(u) = e^u and h(u) = sin(1 − u) are continuous at u = 1, part (a) and 5.5.1(1) give e^{U_n} →_p e and sin(1 − U_n) →_p sin(0) = 0.
(d) For v > 0,
G_n(v) = P(V_n ≤ v)
       = P(n(1 − max(X_1, X_2, ..., X_n)) ≤ v)
       = P(max(X_1, X_2, ..., X_n) ≥ 1 − v/n)
       = 1 − P(max(X_1, X_2, ..., X_n) ≤ 1 − v/n)
       = 1 − ∏_{i=1}^{n} P(X_i ≤ 1 − v/n)
so
G_n(v) = 0 for v ≤ 0,  and  1 − (1 − v/n)^n for v > 0
Therefore
lim_{n→∞} G_n(v) = 0 for v ≤ 0,  and  1 − e^{−v} for v > 0
which is the cumulative distribution function of an Exponential(1) random variable, so
V_n = n(1 − U_n) →_D V ~ Exponential(1)        (5.13)
(e) Since g(v) = 1 − e^{−v} is continuous, (5.13) and 5.5.1(3) give 1 − e^{−V_n} →_D W = 1 − e^{−V}. The probability density function of W = 1 − e^{−V} is
g(w) = e^{log(1 − w)} · (d/dw)[−log(1 − w)] = (1 − w) · 1/(1 − w) = 1   for 0 < w < 1
so W ~ Uniform(0, 1) and 1 − e^{−V_n} →_D 1 − e^{−V} ~ Uniform(0, 1).
(f) Since U_n →_p 1, 5.5.1(1) gives (U_n + 1)² →_p 4. By (5.13) and Slutsky's Theorem
(U_n + 1)² [n(1 − U_n)] →_D 4V
and since V ~ Exponential(1), 4V ~ Exponential(4).
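As an illustration (not from the notes), the following R sketch simulates V_n = n(1 − U_n) for Uniform(0, 1) samples and compares its empirical quantiles with those of an Exponential(1) distribution; n and the number of simulations are arbitrary choices.

# Check that n*(1 - max(X1,...,Xn)) is approximately Exponential(1)
set.seed(42)
n <- 100; nsim <- 10000
v <- replicate(nsim, n * (1 - max(runif(n))))
p <- c(0.25, 0.5, 0.75, 0.9)
round(rbind(empirical = quantile(v, p), exponential = qexp(p, rate = 1)), 3)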
n^b (X_n − a) →_D X        (5.15)
for some b > 0. Suppose the function g(x) is differentiable at a and g'(a) ≠ 0. Then
n^b [g(X_n) − g(a)] →_D g'(a) X
Proof
By Taylor's Theorem (2.11.15) we have
g(X_n) = g(a) + g'(c_n)(X_n − a)
or
g(X_n) − g(a) = g'(c_n)(X_n − a)        (5.16)
where c_n is between a and X_n.
From (5.15) it follows that X_n →_p a. Since c_n is between X_n and a, therefore c_n →_p a and by 5.5.1(1)
g'(c_n) →_p g'(a)        (5.17)
Multiplying (5.16) by n^b gives
n^b [g(X_n) − g(a)] = g'(c_n) · n^b (X_n − a) →_D g'(a) X
by (5.15), (5.17), and Slutsky's Theorem.
5.5.7 Example
Suppose X_i ~ Exponential(θ), i = 1, 2, ... independently. Find the limiting distributions of each of the following:
(a) X̄_n
(b) U_n = √n (X̄_n − θ)
(c) Z_n = √n (X̄_n − θ)/X̄_n
(d) V_n = √n [log(X̄_n) − log θ]
Solution
(a) Since X_1, X_2, ... are independent and identically distributed random variables with E(X_i) = θ and Var(X_i) = θ², then by the Weak Law of Large Numbers
X̄_n →_p θ        (5.19)
(b) Since X_1, X_2, ... are independent and identically distributed random variables with E(X_i) = θ and Var(X_i) = θ², then by the Central Limit Theorem
W_n = √n (X̄_n − θ)/θ →_D Z ~ N(0, 1)        (5.20)
Therefore
U_n = √n (X̄_n − θ) = θ W_n →_D θZ ~ N(0, θ²)        (5.21)
(c) By (5.19) and 5.5.1(1), X̄_n/θ →_p 1, so by (5.20) and Slutsky's Theorem
Z_n = √n (X̄_n − θ)/X̄_n = W_n/(X̄_n/θ) →_D Z/1 = Z ~ N(0, 1)
(d) Let g(x) = log x, a = θ, and b = 1/2. Then g'(x) = 1/x and g'(a) = g'(θ) = 1/θ. By (5.21) and the Delta Method
V_n = n^{1/2} [log(X̄_n) − log θ] →_D (1/θ)(θZ) = Z ~ N(0, 1)
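A quick simulation check of part (d) (illustration only; θ, n and the number of simulations are arbitrary choices): the empirical mean and standard deviation of V_n should be close to 0 and 1.

# Delta Method check: sqrt(n)*(log(xbar) - log(theta)) is approximately N(0, 1)
set.seed(7)
theta <- 3; n <- 200; nsim <- 5000
vn <- replicate(nsim, sqrt(n) * (log(mean(rexp(n, rate = 1/theta))) - log(theta)))
c(mean = round(mean(vn), 3), sd = round(sd(vn), 3))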
5.5.8 Exercise
Suppose X_i ~ Poisson(θ), i = 1, 2, ... independently. Show that
U_n = √n (X̄_n − θ) →_D U ~ N(0, θ)
and
V_n = √n (√X̄_n − √θ) →_D V ~ N(0, 1/4)
5.5.9 Theorem
Let X_1, X_2, ..., X_n, ... be a sequence of random variables such that
√n (X_n − a) →_D X ~ N(0, σ²)        (5.23)
If g(x) is differentiable at a then
√n [g(X_n) − g(a)] →_D W ~ N(0, [g'(a)]² σ²)
provided g'(a) ≠ 0.
Proof
Suppose g(x) is a differentiable function at a and g'(a) ≠ 0. Let b = 1/2. Then by (5.23) and the Delta Method it follows that
√n [g(X_n) − g(a)] →_D W ~ N(0, [g'(a)]² σ²)
Show that
lim_{n→∞} log M_n(t) = (1/2) t²
What is the limiting distribution of Y_n?
is h p
t= n p i n
Mn (t) = e 1 t= n
T_n = Z/√(W_n/n) ~ t(n)
Show that
T_n →_D Y ~ N(0, 1)
X̄_n = (1/n) Σ_{i=1}^{n} X_i
S_n² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄_n)²
and
T_n = √n (X̄_n − μ)/S_n
Show that
S_n →_p σ
and
T_n →_D Z ~ N(0, 1)
(a) T_n = X_n/n
(b) U_n = (X_n/n)(1 − X_n/n)
(c) W_n = √n (X_n/n − θ)
(d) Z_n = W_n/√U_n
(e) V_n = √n [ arcsin(√(X_n/n)) − arcsin(√θ) ]
(f) Compare the variances of the limiting distributions of W_n, Z_n and V_n and comment.
Yn
(a) Xn = n
p (1 )
(b) Wn = n Xn
1
(c) Vn = 1+Xn
p
(d) Zn = p n(V n )
Vn2 (1 Vn )
E(X_n) = θ
and
Var(X_n) = a/n^p   for p > 0
Show that
X_n →_p θ
6. Maximum Likelihood
Estimation - One Parameter
In this chapter we look at the method of maximum likelihood to obtain both point and
interval estimates of one unknown parameter. Some of this material was introduced in a
previous statistics course such as STAT 221/231/241.
In Section 6:2 we review the de…nitions needed for the method of maximum likelihood
estimation, the derivations of the maximum likelihood estimates for the unknown parameter
in the Binomial, Poisson and Exponential models, and the important invariance property of
maximum likelihood estimates. You will notice that we pay more attention to verifying that
the maximum likelihood estimate does correspond to a maximum using the …rst derivative
test. Example 6.2.9 is new and illustrates how the maximum likelihood estimate is found
when the support set of the random variable depends on the unknown parameter.
In Section 6:3 we de…ne the score function, the information function, and the expected
information function. These functions play an important role in the distribution of the
maximum likelihood estimator. These functions are also used in Newton’s Method which
is a method for determining the maximum likelihood estimate in cases where there is no
explicit solution. Although the maximum likelihood estimates in nearly all the examples
you saw previously could be found explicitly, this is not true in general.
In Section 6:6 we review how to …nd a con…dence interval using a pivotal quantity.
Con…dence intervals also give us a way to summarize the uncertainty in an estimate. We
also give a theorem on how to obtain a pivotal quantity using the maximum likelihood
estimator if the parameter is either a scale or location parameter. In Section 6:7 we review
how to …nd an approximate con…dence interval using an asymptotic pivotal quantity. We
then show how to use asymptotic pivotal quantities based on the limiting distribution of
the maximum likelihood estimator to construct approximate con…dence intervals.
6.1 Introduction
Suppose the random variable X (possibly a vector of random variables) has probability function/probability density function f(x; θ). Suppose also that θ is unknown and θ ∈ Ω where Ω is the parameter space or the set of possible values of θ. Let X be the potential data that is to be collected. In your previous statistics course you learned how numerical and graphical summaries as well as goodness of fit tests could be used to check whether the assumed model for an observed set of data x was reasonable. In this course we will assume that the fit of the model has been checked and that the main focus now is to use the model and the data to determine point and interval estimates of θ.
A statistic, T = T (X), is a function of the data X which does not depend on any unknown
parameters.
6.1.2 Example
6.1.4 Example
L(θ) = L(θ; x)
     = P(observing the data x; θ)
     = P(X = x; θ)
     = f(x; θ)   for θ ∈ Ω

L(θ) = L(θ; x)
     = P(observing the data x; θ)
     = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n; θ)
     = ∏_{i=1}^{n} f(x_i; θ)   for θ ∈ Ω
The shape of the likelihood function and the value of θ at which it is maximized are not affected if L(θ) is multiplied by a constant. Indeed it is not the absolute value of the likelihood function that is important but the relative values at two different values of the parameter, for example, L(θ_1)/L(θ_2). This ratio can be interpreted as how much more or less consistent the data are with the parameter θ_1 as compared to θ_2. The ratio L(θ_1)/L(θ_2) is also unaffected if L(θ) is multiplied by a constant. In view of this the likelihood may be defined as P(X = x; θ) or as any constant multiple of it.
Since the log function (Note: log = ln) is an increasing function, the value of θ which maximizes the likelihood L(θ) also maximizes log L(θ), the logarithm of the likelihood function. Since it is usually simpler to find the derivative of a sum of n terms rather than a product, it is often easier to determine the maximum likelihood estimate of θ by solving (d/dθ) log L(θ) = 0.
l(θ) = l(θ; x) = log L(θ)   for θ ∈ Ω
where x are the observed data and log is the natural logarithmic function.
6.2.4 Example
Suppose in a sequence of n Bernoulli trials the probability of success is equal to θ and we have observed x successes. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of θ and the maximum likelihood estimator of θ.
Solution
The likelihood function for θ based on x successes in n trials is
L(θ) = P(X = x; θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}   for 0 ≤ θ ≤ 1
or more simply
L(θ) = θ^x (1 − θ)^{n−x}   for 0 ≤ θ ≤ 1
The log likelihood function is
l(θ) = x log θ + (n − x) log(1 − θ)   for 0 < θ < 1
with derivative
(d/dθ) l(θ) = x/θ − (n − x)/(1 − θ)
            = [x(1 − θ) − θ(n − x)]/[θ(1 − θ)]
            = (x − nθ)/[θ(1 − θ)]   for 0 < θ < 1
The solution to (d/dθ) l(θ) = 0 is θ = x/n and, by the first derivative test, l(θ) has an absolute maximum at θ = x/n. Therefore the maximum likelihood estimate of θ is θ̂ = x/n and the maximum likelihood estimator is θ̃ = X/n.
6.2.5 Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Poisson( ) distribution. Find
the likelihood function, the log likelihood function, the maximum likelihood estimate of
and the maximum likelihood estimator of .
Solution
The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ) = ∏_{i=1}^{n} P(X_i = x_i; θ)
     = ∏_{i=1}^{n} θ^{x_i} e^{−θ}/x_i!
     = ( ∏_{i=1}^{n} 1/x_i! ) θ^{Σ_{i=1}^{n} x_i} e^{−nθ}   for θ ≥ 0
or more simply
L(θ) = θ^{n x̄} e^{−nθ}   for θ ≥ 0
Suppose x̄ ≠ 0. The log likelihood function is
l(θ) = n (x̄ log θ − θ)   for θ > 0
with derivative
(d/dθ) l(θ) = n (x̄/θ − 1)
            = (n/θ)(x̄ − θ)   for θ > 0
The solution to (d/dθ) l(θ) = 0 is θ = x̄ which is the sample mean. Since (d/dθ) l(θ) > 0 if 0 < θ < x̄ and (d/dθ) l(θ) < 0 if θ > x̄ then, by the first derivative test, l(θ) has an absolute maximum at θ = x̄.
If x̄ = 0 then
L(θ) = e^{−nθ}   for θ ≥ 0
which is a decreasing function of θ on the interval [0, ∞). L(θ) is maximized at the endpoint θ = 0 = x̄.
In all cases the value of θ which maximizes the likelihood function is the sample mean θ = x̄. Therefore the maximum likelihood estimate of θ is θ̂ = x̄ and the maximum likelihood estimator is θ̃ = X̄.
assuming the function f(x; θ) is reasonably smooth over the interval. More generally, suppose x_1, x_2, ..., x_n are the observations from a random sample from the distribution with probability density function f(x; θ) which have been rounded to the nearest Δ, which is assumed to be small. Then
P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n; θ) ≈ ∏_{i=1}^{n} Δ f(x_i; θ) = Δ^n ∏_{i=1}^{n} f(x_i; θ)
If we assume that the precision Δ does not depend on the unknown parameter θ, then the term Δ^n can be ignored. This argument leads us to adopt the following definition of the likelihood function for a random sample from a continuous distribution.
6.2.8 Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Exponential( ) distribution.
Find the likelihood function, the log likelihood function, the maximum likelihood estimate
of and the maximum likelihood estimator of .
Solution
The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ) = ∏_{i=1}^{n} (1/θ) e^{−x_i/θ}
     = θ^{−n} exp( −(1/θ) Σ_{i=1}^{n} x_i )
     = θ^{−n} e^{−n x̄/θ}   for θ > 0
The log likelihood function is
l(θ) = −n log θ − n x̄/θ   for θ > 0
with derivative
(d/dθ) l(θ) = n( −1/θ + x̄/θ² )
            = (n/θ²)(x̄ − θ)
Now (d/dθ) l(θ) = 0 for θ = x̄. Since (d/dθ) l(θ) > 0 if 0 < θ < x̄ and (d/dθ) l(θ) < 0 if θ > x̄ then, by the first derivative test, l(θ) has an absolute maximum at θ = x̄. Therefore the maximum likelihood estimate of θ is θ̂ = x̄ and the maximum likelihood estimator is θ̃ = X̄.
6.2.9 Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Uniform(0; ) distribution.
Find the likelihood function, the maximum likelihood estimate of and the maximum
likelihood estimator of .
Solution
The probability density function of a Uniform(0, θ) random variable is
f(x; θ) = 1/θ   for 0 ≤ x ≤ θ
and zero otherwise. The support set of the random variable X is [0, θ] which depends on the unknown parameter θ. In such examples care must be taken in determining the maximum likelihood estimate of θ.
The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ)
     = ∏_{i=1}^{n} (1/θ)   if 0 ≤ x_i ≤ θ, i = 1, 2, ..., n, and θ > 0
     = 1/θ^n   if θ ≥ x_(n)
where x_(n) = max(x_1, x_2, ..., x_n) is the maximum of the sample. To see this remember that in order to observe the sample x_1, x_2, ..., x_n the value of θ must be larger than all the observed x_i's.
L(θ) is a decreasing function of θ on the interval [x_(n), ∞). Therefore L(θ) is maximized at θ = x_(n). The maximum likelihood estimate of θ is θ̂ = x_(n) and the maximum likelihood estimator is θ̃ = X_(n).
One of the reasons the method of maximum likelihood is so widely used is the invariance
property of the maximum likelihood estimate under one-to-one transformations.
Note: The invariance property of the maximum likelihood estimate means that if we know
the maximum likelihood estimate of then we know the maximum likelihood estimate of
any function of .
6.2.11 Example
In Example 6.2.8 find the maximum likelihood estimate of the median of the distribution and the maximum likelihood estimate of Var(θ̃).
Solution
If X has an Exponential(θ) distribution then the median m is found by solving
0.5 = ∫_0^m (1/θ) e^{−x/θ} dx
to obtain
m = −θ log(0.5)
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of m is m̂ = −θ̂ log(0.5) = −x̄ log(0.5).
Since X_i has an Exponential(θ) distribution with Var(X_i) = θ², i = 1, 2, ..., n independently, the variance of the maximum likelihood estimator θ̃ = X̄ is
Var(θ̃) = Var(X̄) = θ²/n
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of Var(θ̃) is (θ̂)²/n = (x̄)²/n.
S(θ) = S(θ; x) = (d/dθ) l(θ) = (d/dθ) log L(θ)   for θ ∈ Ω
Another function which plays an important role in the method of maximum likelihood
is the information function.
In Section 6.7 we will see how the observed information I(θ̂) can be used to construct an approximate confidence interval for the unknown parameter θ. I(θ̂) also tells us about the concavity of the log likelihood function l(θ).
6.3.3 Example
Find the observed information for Example 6.2.5. Suppose the maximum likelihood estimate of θ was θ̂ = 2. Compare I(θ̂) = I(2) if n = 10 and n = 25. Plot the function r(θ) = l(θ) − l(θ̂) for n = 10 and n = 25 on the same graph.
Solution
From Example 6.2.5, the score function is
S(θ) = (d/dθ) l(θ) = (n/θ)(x̄ − θ)   for θ > 0
Therefore the information function is
I(θ) = −(d²/dθ²) l(θ) = −(d/dθ)[ n(x̄/θ − 1) ] = n x̄/θ²
If n = 10 then I(θ̂) = I(2) = n/θ̂ = 10/2 = 5. If n = 25 then I(θ̂) = I(2) = 25/2 = 12.5.
See Figure 6.1. The function r(θ) = l(θ) − l(θ̂) is more concave and symmetric for n = 25 than for n = 10. As the number of observations increases we have more "information" about the unknown parameter θ.
Although we view the likelihood, log likelihood, score and information functions as functions of θ, they are also functions of the observed data x. When it is important to emphasize the dependence on the data x we will write L(θ; x), S(θ; x), and I(θ; x). When we wish to determine the sampling distribution of the corresponding random variables we will write L(θ; X), S(θ; X), and I(θ; X).
[Figure 6.1: Graphs of r(θ) = l(θ) − l(θ̂) for n = 10 and n = 25]
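A sketch of how Figure 6.1 could be reproduced in R (added here for illustration, not original code): for Poisson data with θ̂ = x̄ = 2, the log relative likelihood is r(θ) = n[x̄ log(θ/θ̂) − (θ − θ̂)].

# Plot r(theta) = l(theta) - l(thetahat) for Poisson data with thetahat = xbar = 2
thetahat <- 2
r <- function(theta, n) n * (thetahat * log(theta / thetahat) - (theta - thetahat))
theta <- seq(1, 3.5, by = 0.01)
plot(theta, r(theta, 10), type = "l", xlab = expression(theta), ylab = expression(r(theta)))
lines(theta, r(theta, 25), lty = 2)
legend("bottomright", legend = c("n = 10", "n = 25"), lty = c(1, 2))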
Here is one more function which plays an important role in the method of maximum
likelihood.
J(θ) = E[I(θ; X)]
     = E[ −(d²/dθ²) l(θ; X) ]   for θ ∈ Ω
where X is the potential data.
Note:
If X = (X_1, X_2, ..., X_n) is a random sample from f(x; θ) then
J(θ) = E[ −(d²/dθ²) l(θ; X) ]
     = n E[ −(d²/dθ²) log f(X; θ) ]   for θ ∈ Ω
6.3.5 Example
For each of the following find the observed information I(θ̂) and the expected information J(θ). Compare I(θ̂) and J(θ̂). Determine the mean and variance of the maximum likelihood estimator θ̃. Compare the expected information with the variance of the maximum likelihood estimator.
(a) Example 6.2.4 (Binomial)
(b) Example 6.2.5 (Poisson)
(c) Example 6.2.8 (Exponential)
Solution
(a) From Example 6.2.4, the score function based on x successes in n Bernoulli trials is
S(θ) = (d/dθ) l(θ) = (x − nθ)/[θ(1 − θ)]   for 0 < θ < 1
Therefore the information function is
I(θ) = −(d²/dθ²) l(θ) = −(d/dθ)[ x/θ − (n − x)/(1 − θ) ]
     = x/θ² + (n − x)/(1 − θ)²   for 0 < θ < 1
Since θ̂ = x/n, the observed information simplifies to I(θ̂) = n/[θ̂(1 − θ̂)]. Since E(X) = nθ, the expected information is
J(θ) = E[ X/θ² + (n − X)/(1 − θ)² ] = n/θ + n/(1 − θ) = n/[θ(1 − θ)]   for 0 < θ < 1
so I(θ̂) = J(θ̂). Since θ̃ = X/n,
E(θ̃) = θ   and   Var(θ̃) = θ(1 − θ)/n = 1/J(θ)
(b) From Example 6.3.3 the information function based on Poisson data x_1, x_2, ..., x_n is
I(θ) = n x̄/θ²   for θ > 0
so the observed information is I(θ̂) = n/θ̂ since θ̂ = x̄. The expected information is
J(θ) = E[ I(θ; X_1, X_2, ..., X_n) ] = E[ n X̄/θ² ] = (n/θ²)(θ) = n/θ   for θ > 0
so I(θ̂) = J(θ̂). Since θ̃ = X̄,
E(θ̃) = E(X̄) = θ
and
Var(θ̃) = Var(X̄) = θ/n = 1/J(θ)
(c) From Example 6.2.8 the score function based on Exponential data x_1, x_2, ..., x_n is
S(θ) = (d/dθ) l(θ) = n( −1/θ + x̄/θ² )   for θ > 0
Therefore the information function is
I(θ) = −(d²/dθ²) l(θ) = −n (d/dθ)( −1/θ + x̄/θ² )
     = n( −1/θ² + 2x̄/θ³ )   for θ > 0
so the observed information is I(θ̂) = n/θ̂² since θ̂ = x̄. Since X_i has an Exponential(θ) distribution with E(X_i) = θ and Var(X_i) = θ², then E(X̄) = θ and Var(X̄) = θ²/n. Therefore the expected information is
J(θ) = E[ I(θ; X_1, X_2, ..., X_n) ] = n E( −1/θ² + 2X̄/θ³ ) = n( −1/θ² + 2/θ² ) = n/θ²   for θ > 0
so I(θ̂) = J(θ̂). Since θ̃ = X̄,
E(θ̃) = E(X̄) = θ
and
Var(θ̃) = Var(X̄) = θ²/n = 1/J(θ)
In all three examples we have I(θ̂) = J(θ̂), E(θ̃) = θ, and Var(θ̃) = [J(θ)]^{−1}.
In the three previous examples, we observed that E(θ̃) = θ and therefore θ̃ was an unbiased estimator of θ. This is not always true for maximum likelihood estimators as we see in the next example. However, maximum likelihood estimators usually have other good properties. Suppose θ̃_n = θ̃_n(X_1, X_2, ..., X_n) is the maximum likelihood estimator based on a sample of size n. If lim_{n→∞} E(θ̃_n) = θ then θ̃_n is an asymptotically unbiased estimator of θ. If θ̃_n →_p θ then θ̃_n is called a consistent estimator of θ.
6.3.6 Example
Suppose x_1, x_2, ..., x_n is an observed random sample from the distribution with probability density function
f(x; θ) = θ x^{θ−1}   for 0 ≤ x ≤ 1, θ > 0        (6.1)
(a) Find the score function, the maximum likelihood estimator, the information function, the observed information, and the expected information.
(b) Show that T = −Σ_{i=1}^{n} log X_i ~ Gamma(n, 1/θ)
(c) Use (b) and 2.7.9 to show that θ̃ is not an unbiased estimator of θ. Show however that θ̃ is an asymptotically unbiased estimator of θ.
(d) Show that θ̃ is a consistent estimator of θ.
(e) Use (b) and 2.7.9 to find Var(θ̃). Compare Var(θ̃) with the expected information.
Solution
(a) The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ) = ∏_{i=1}^{n} θ x_i^{θ−1}
     = θ^n ( ∏_{i=1}^{n} x_i )^{θ−1}   for θ > 0
or more simply
L(θ) = θ^n ( ∏_{i=1}^{n} x_i )^{θ}   for θ > 0
The log likelihood function is
l(θ) = n log θ + θ Σ_{i=1}^{n} log x_i
     = n log θ − θ t   for θ > 0
where t = −Σ_{i=1}^{n} log x_i. The score function is
S(θ) = (d/dθ) l(θ)
     = n/θ − t
     = (1/θ)(n − θ t)   for θ > 0
Now (d/dθ) l(θ) = 0 for θ = n/t. Since (d/dθ) l(θ) > 0 if 0 < θ < n/t and (d/dθ) l(θ) < 0 if θ > n/t then, by the first derivative test, l(θ) has an absolute maximum at θ = n/t. Therefore the maximum likelihood estimate of θ is θ̂ = n/t and the maximum likelihood estimator is θ̃ = n/T where T = −Σ_{i=1}^{n} log X_i.
The information function is
I(θ) = −(d²/dθ²) l(θ)
     = n/θ²   for θ > 0
The observed information is I(θ̂) = n/θ̂² and, since I(θ) does not depend on the data, the expected information is J(θ) = n/θ².
(b) From Exercise 2.6.12 we have that if X_i has the probability density function (6.1) then
Y_i = −log X_i ~ Exponential(1/θ)   for i = 1, 2, ..., n        (6.2)
Since Y_1, Y_2, ..., Y_n are independent, by 4.3.2(4)
T = −Σ_{i=1}^{n} log X_i = Σ_{i=1}^{n} Y_i ~ Gamma(n, 1/θ)
(c) From 2.7.9, if T ~ Gamma(n, 1/θ) then
E(T^p) = Γ(n + p)/[θ^p Γ(n)]   for p > −n
Therefore
E(θ̃) = E(n/T) = n E(T^{−1}) = nθ Γ(n − 1)/Γ(n) = nθ/(n − 1) = θ/(1 − 1/n) ≠ θ
so θ̃ is not an unbiased estimator of θ. However
lim_{n→∞} E(θ̃) = lim_{n→∞} θ/(1 − 1/n) = θ
so θ̃ is an asymptotically unbiased estimator of θ.
(d) By the Weak Law of Large Numbers
T/n = (1/n) Σ_{i=1}^{n} Y_i →_p E(Y_i) = 1/θ
and therefore by 5.5.1(1), θ̃ = n/T = 1/(T/n) →_p θ, so θ̃ is a consistent estimator of θ.
(e) Since
E(T^{−2}) = θ² Γ(n − 2)/Γ(n) = θ²/[(n − 1)(n − 2)]
then
Var(θ̃) = n² { E(T^{−2}) − [E(T^{−1})]² }
        = n² [ θ²/((n − 1)(n − 2)) − θ²/(n − 1)² ]
        = n² θ² (n − 1)(n − 2) [ 1/((n − 1)(n − 2)) − 1/(n − 1)² ] / [(n − 1)(n − 2)]
        = n² θ² / [(n − 1)²(n − 2)]
        = θ² / [ (1 − 1/n)² (n − 2) ]
We note that Var(θ̃) ≠ θ²/n = 1/J(θ), however for large n, Var(θ̃) ≈ 1/J(θ).
6.3.7 Example
Suppose x_1, x_2, ..., x_n is an observed random sample from the Weibull(θ, 1) distribution with probability density function
f(x; θ) = θ x^{θ−1} e^{−x^θ}   for x > 0, θ > 0
Find the score function and the information function. How would you find the maximum likelihood estimate of θ?
Solution
The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ) = ∏_{i=1}^{n} θ x_i^{θ−1} e^{−x_i^θ}
     = θ^n ( ∏_{i=1}^{n} x_i )^{θ−1} exp( −Σ_{i=1}^{n} x_i^θ )   for θ > 0
or more simply
L(θ) = θ^n ( ∏_{i=1}^{n} x_i )^{θ} exp( −Σ_{i=1}^{n} x_i^θ )   for θ > 0
The log likelihood function is
l(θ) = n log θ − θ t − Σ_{i=1}^{n} x_i^θ   for θ > 0
where t = −Σ_{i=1}^{n} log x_i, with score function
S(θ) = (d/dθ) l(θ) = n/θ − t − Σ_{i=1}^{n} x_i^θ log x_i   for θ > 0
Notice that S(θ) = 0 cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data x_1, x_2, ..., x_n. Since
(d/dθ) S(θ) = −[ n/θ² + Σ_{i=1}^{n} x_i^θ (log x_i)² ]   for θ > 0
is negative for all values of θ > 0, we know that the function S(θ) is always decreasing and therefore there is only one solution to S(θ) = 0. The solution to S(θ) = 0 gives the maximum likelihood estimate.
The information function is
I(θ) = −(d²/dθ²) l(θ)
     = n/θ² + Σ_{i=1}^{n} x_i^θ (log x_i)²   for θ > 0
To illustrate how to find the maximum likelihood estimate for a given sample of data, we randomly generate 20 observations from the Weibull(θ, 1) distribution. To do this we use the result of Example 2.6.7 in which we showed that if u is an observation from the Uniform(0, 1) distribution then x = [−log(1 − u)]^{1/θ} is an observation from the Weibull(θ, 1) distribution.
The following R code generates the data, plots the likelihood function, finds θ̂ by solving S(θ) = 0 using the R function uniroot, and determines S(θ̂) and the observed information I(θ̂).
0:01 0:01 0:05 0:07 0:10 0:11 0:23 0:28 0:44 0:46
0:64 1:07 1:16 1:25 2:40 3:03 3:65 5:90 6:60 30:07
[Figure: graph of the likelihood function L(θ) for the Weibull data]
Note that the interval [0.25, 0.75] used to graph the likelihood function was determined by trial and error. The values lower=0.4 and upper=0.6 used for uniroot were determined from the graph of the likelihood function. From the graph it is easy to see that the value of θ̂ lies in the interval [0.4, 0.6].
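The R code referred to above is not reproduced here. A sketch of what it might look like is given below; the function names WBLF, WBSF and WBIF match those used in the later examples, but the code is a reconstruction under the model of Example 6.3.7, not the original, and the arbitrary seed will not reproduce the exact data values listed above.

# Sketch (reconstruction): generate Weibull(theta,1) data and find thetahat
set.seed(20086689)                        # arbitrary seed; the original seed is not shown
truetheta <- 0.5
u <- runif(20)
x <- (-log(1 - u))^(1/truetheta)          # inverse CDF method from Example 2.6.7
# likelihood, score and information functions for the Weibull(theta,1) model
WBLF <- function(th, x) th^length(x) * prod(x)^th * exp(-sum(x^th))
WBSF <- function(th, x) length(x)/th + sum(log(x)) - sum(x^th * log(x))
WBIF <- function(th, x) length(x)/th^2 + sum(x^th * (log(x))^2)
# plot the likelihood and solve S(theta) = 0
th <- seq(0.25, 0.75, 0.01)
plot(th, sapply(th, WBLF, x), "l", xlab = expression(theta), ylab = expression(L(theta)))
thetahat <- uniroot(function(th) WBSF(th, x), lower = 0.4, upper = 0.6)$root
c(thetahat = thetahat, score = WBSF(thetahat, x), info = WBIF(thetahat, x))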
Newton’s Method, which is a numerical method for …nding the roots of an equation usually
discussed in …rst year calculus, is a method which can be used for …nding the maximum like-
lihood estimate. Newton’s Method usually works quite well for …nding maximum likelihood
estimates because the likelihood function is often very quadratic in shape.
θ^{(i+1)} = θ^{(i)} + S(θ^{(i)})/I(θ^{(i)})   for i = 0, 1, ...
Notes:
(1) The initial estimate, θ^{(0)}, may be determined by graphing L(θ) or l(θ).
(2) The algorithm is usually run until the value of θ^{(i)} no longer changes to a reasonable number of decimal places. When the algorithm is stopped it is always important to check that the value of θ obtained does indeed maximize L(θ).
(3) This algorithm is also called the Newton-Raphson Method.
(4) I(θ) can be replaced by J(θ) for a similar algorithm which is called the Method of Scoring.
(5) If the support set of X depends on θ (e.g. Uniform(0, θ)) then θ̂ is not found by solving S(θ) = 0.
6.3.9 Example
Use Newton's Method to find the maximum likelihood estimate of θ for the Weibull data in Example 6.3.7.
Solution
Here is R code for Newton’s Method for the Weibull Example
# Newton’s Method for Weibull Example
NewtonWB<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+WBSF(thold,x)/WBIF(thold,x)
print(thnew)}
return(thnew)}
#
thetahat<-NewtonWB(0.2,x)
Newton's Method converges after four iterations and the value of thetahat returned is θ̂ = 0.4951605, which is the same value to six decimal places as was obtained above using the uniroot function.
R(θ) = R(θ; x) = L(θ)/L(θ̂)   for θ ∈ Ω
The relative likelihood function takes on values between 0 and 1 and can be used to rank parameter values according to their plausibilities in light of the observed data. If R(θ_1) = 0.1, for example, then θ_1 is rather an implausible parameter value because the data are ten times more probable when θ = θ̂ than they are when θ = θ_1. However, if R(θ_1) = 0.5, then θ_1 is a fairly plausible value because it gives the data 50% of the maximum possible probability under the model.
The set of θ values for which R(θ) ≥ p is called a 100p% likelihood interval for θ. Values of θ inside a 10% likelihood interval are referred to as plausible values in light of the observed data. Values of θ outside a 10% likelihood interval are referred to as implausible values given the observed data. Values of θ inside a 50% likelihood interval are very plausible and values of θ outside a 1% likelihood interval are very implausible in light of the data.
The log relative likelihood function is the natural logarithm of the relative likelihood function:
r(θ) = r(θ; x) = log[R(θ)]   for θ ∈ Ω
6.4.4 Example
Plot the relative likelihood function for in Example 6.3.7. Find 10% and 50% likelihood
intervals for .
Solution
Here is R code to plot the relative likelihood function for the Weibull Example with lines
for determining 10% and 50% likelihood intervals for as well as code to determine these
intervals using uniroot.
The R function WBRLF uses the R function WBLF from Example 6.3.7.
# function for calculating Weibull relative likelihood function
WBRLF<-function(th,thetahat,x)
{R<-WBLF(th,x)/WBLF(thetahat,x)
return(R)}
#
# plot the Weibull relative likelihood function
th<-seq(0.25,0.75,0.01)
R<-sapply(th,WBRLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
# add lines to determine 10% and 50% likelihood intervals
abline(a=0.10,b=0,col="red",lwd=2)
abline(a=0.50,b=0,col="blue",lwd=2)
#
# use uniroot to determine endpoints of 10%, 15%, and 50% likelihood intervals
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.6,upper=0.7)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.35,upper=0.45)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.55,upper=0.65)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.6,upper=0.7)$root
[Figure 6.3: Relative likelihood function R(θ) for the Weibull data with horizontal lines at 0.10 and 0.50]
6.4.5 Example
Suppose x_1, x_2, ..., x_n is an observed random sample from the Uniform(0, θ) distribution. Plot the relative likelihood function for θ if n = 10 and x_(10) = 0.5. Find 10% and 50% likelihood intervals for θ.
Solution
From Example 6.2.9 the likelihood function for n = 10 and θ̂ = x_(10) = 0.5 is
L(θ) = 0 if 0 < θ < 0.5, and 1/θ^{10} if θ ≥ 0.5
The relative likelihood function, R(θ) = L(θ)/L(θ̂) = (0.5/θ)^{10} for θ ≥ 0.5 and 0 otherwise, is graphed in Figure 6.4 along with lines for determining 10% and 50% likelihood intervals.
To determine the value of θ at which the horizontal line R = p intersects the graph of R(θ) we solve (0.5/θ)^{10} = p to obtain θ = 0.5 p^{−1/10}. Since R(θ) = 0 if 0 < θ < 0.5, a 100p% likelihood interval for θ is of the form [0.5, 0.5 p^{−1/10}].
[Figure 6.4: Relative likelihood function R(θ) for the Uniform(0, θ) example with horizontal lines at 0.10 and 0.50]
More generally, for an observed random sample x_1, x_2, ..., x_n from the Uniform(0, θ) distribution a 100p% likelihood interval for θ will be of the form [x_(n), x_(n) p^{−1/n}].
6.4.6 Exercise
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Two Parameter Exponential(1; )
distribution. Plot the relative likelihood function for if n = 12 and x(1) = 2. Find 10%
and 50% likelihood intervals for .
6.5.1 Theorem
Suppose X_n = (X_1, X_2, ..., X_n) is a random sample from f(x; θ) and θ̃_n = θ̃_n(X_n) is the maximum likelihood estimator of θ. Then, under suitable regularity conditions,
θ̃_n →_p θ        (6.3)
(θ̃_n − θ) √(J(θ)) →_D Z ~ N(0, 1)        (6.4)
and
−2 log R(θ; X_n) = −2 log[ L(θ; X_n)/L(θ̃_n; X_n) ] →_D W ~ χ²(1)        (6.5)
for each θ ∈ Ω.
The proof of this result, which depends on applying Taylor's Theorem to the score function, is beyond the scope of this course. The regularity conditions are a bit complicated but essentially they are a set of conditions which ensure that the error term in Taylor's Theorem goes to zero as n → ∞. One of the conditions is that the support set of f(x; θ) does not depend on θ. Therefore, for example, this theorem cannot be applied to the maximum likelihood estimator in the case of a random sample from the Uniform(0, θ) distribution. This is actually not a problem since the distribution of the maximum likelihood estimator in this case can be determined exactly.
Since (6.3) holds, θ̃_n is called a consistent estimator of θ.
Theorem 6.5.1 implies that for sufficiently large n, θ̃_n has an approximately N(θ, 1/J(θ)) distribution. Therefore for large n
E(θ̃_n) ≈ θ
and the maximum likelihood estimator is an asymptotically unbiased estimator of θ.
Since θ̃_n has an approximately N(θ, 1/J(θ)) distribution this also means that for sufficiently large n
Var(θ̃_n) ≈ 1/J(θ)
1/J(θ) is called the asymptotic variance of θ̃_n. Of course J(θ) is unknown because θ is unknown. By (6.3), (6.4), and Slutsky's Theorem
(θ̃_n − θ) √(J(θ̃_n)) →_D Z ~ N(0, 1)        (6.6)
which implies that the asymptotic variance of θ̃_n can be estimated using 1/J(θ̂_n). Therefore for sufficiently large n we have
Var(θ̃_n) ≈ 1/J(θ̂_n)
By the Weak Law of Large Numbers
(1/n) I(θ; X_n) = (1/n) Σ_{i=1}^{n} [ −(d²/dθ²) l(θ; X_i) ] →_p E[ −(d²/dθ²) l(θ; X_i) ]        (6.7)
Therefore by (6.3), (6.4), (6.7) and the Limit Theorems it follows that
(θ̃_n − θ) √(I(θ̃_n; X_n)) →_D Z ~ N(0, 1)        (6.8)
which implies the asymptotic variance of θ̃_n can also be estimated using 1/I(θ̂_n) where I(θ̂_n) is the observed information. Therefore for sufficiently large n we have
Var(θ̃_n) ≈ 1/I(θ̂_n)
Results (6.6), (6.8) and (6.5) can be used to construct approximate confidence intervals for θ.
In Chapter 8 we will see how result (6.5) can be used in a test of hypothesis.
Although we will not prove Theorem 6.5.1 in general, we can prove the results in a particular case. The following example illustrates how techniques and theorems from previous chapters can be used together to obtain the results of interest. It is also a good review of several ideas covered thus far in these Course Notes.
6.5.2 Example
(a) Suppose X ~ Weibull(2, θ). Show that E(X²) = θ² and Var(X²) = θ⁴.
(b) Suppose X_1, X_2, ..., X_n is a random sample from the Weibull(2, θ) distribution. Find the maximum likelihood estimator θ̃ of θ, the information function I(θ), the observed information I(θ̂), and the expected information J(θ).
(c) Show that
θ̃_n →_p θ
Solution
(a) From Exercise 2.7.9 we have
E(X^k) = θ^k Γ(k/2 + 1)   for k = 1, 2, ...        (6.9)
if X ~ Weibull(2, θ). Therefore
E(X²) = θ² Γ(2/2 + 1) = θ² Γ(2) = θ²
E(X⁴) = θ⁴ Γ(4/2 + 1) = θ⁴ Γ(3) = 2θ⁴
and
Var(X²) = E(X⁴) − [E(X²)]² = 2θ⁴ − θ⁴ = θ⁴
(b) The likelihood function is
L(θ) = ∏_{i=1}^{n} f(x_i; θ) = ∏_{i=1}^{n} (2x_i/θ²) e^{−(x_i/θ)²}
     = θ^{−2n} ( ∏_{i=1}^{n} 2x_i ) exp( −(1/θ²) Σ_{i=1}^{n} x_i² )   for θ > 0
or more simply
L(θ) = θ^{−2n} exp( −t/θ² )   for θ > 0
where t = Σ_{i=1}^{n} x_i². The log likelihood function is
l(θ) = −2n log θ − t/θ²   for θ > 0
with derivative
(d/dθ) l(θ) = −2n/θ + 2t/θ³
Now (d/dθ) l(θ) = 0 for θ = (t/n)^{1/2}. Since (d/dθ) l(θ) > 0 if 0 < θ < (t/n)^{1/2} and (d/dθ) l(θ) < 0 if θ > (t/n)^{1/2} then, by the first derivative test, l(θ) has an absolute maximum at θ = (t/n)^{1/2}. Therefore the maximum likelihood estimate of θ is θ̂ = (t/n)^{1/2}. The maximum likelihood estimator is
θ̃ = (T/n)^{1/2}   where T = Σ_{i=1}^{n} X_i²
The information function is
I(θ) = −(d²/dθ²) l(θ) = 6t/θ⁴ − 2n/θ²   for θ > 0
and the observed information is I(θ̂) = 4n/θ̂². Since
E(T) = E( Σ_{i=1}^{n} X_i² ) = nθ²
the expected information is
J(θ) = E[ 6T/θ⁴ − 2n/θ² ] = 6nθ²/θ⁴ − 2n/θ² = 4n/θ²   for θ > 0
(c) Since X_1², X_2², ..., X_n² are independent and identically distributed random variables with E(X_i²) = θ² and Var(X_i²) = θ⁴ for i = 1, 2, ..., n, then by the Weak Law of Large Numbers
T/n = (1/n) Σ_{i=1}^{n} X_i² →_p θ²
and by 5.5.1(1)
θ̃ = (T/n)^{1/2} →_p θ        (6.10)
as required.
Since X_1², X_2², ..., X_n² are independent and identically distributed random variables with E(X_i²) = θ² and Var(X_i²) = θ⁴ for i = 1, 2, ..., n, then by the Central Limit Theorem
√n (T/n − θ²)/θ² →_D Z ~ N(0, 1)        (6.11)
Let g(x) = √x and a = θ². Then (d/dx) g(x) = 1/(2√x) and g'(a) = 1/(2θ). By (6.11) and the Delta Method we have
√n ( (T/n)^{1/2} − θ ) →_D (1/(2θ)) θ² Z ~ N(0, θ²/4)        (6.12)
or
(2√n/θ) [ (T/n)^{1/2} − θ ] →_D Z ~ N(0, 1)        (6.13)
as required.
6.5.3 Example
Suppose X_1, X_2, ..., X_n is a random sample from the Uniform(0, θ) distribution. Since the support set of X_i depends on θ, Theorem 6.5.1 does not hold. Show however that the maximum likelihood estimator θ̃_n = X_(n) is a consistent estimator of θ.
Solution
In Example 5.1.5 we showed that θ̃_n = max(X_1, X_2, ..., X_n) →_p θ and therefore θ̃_n is a consistent estimator of θ.
then [a(x), b(x)] is called a 100p% confidence interval for θ, where x are the observed data.
Pivotal quantities can be used for constructing confidence intervals in the following way. Since the distribution of Q(X; θ) is known we can write down a probability statement of the form
P(q_1 ≤ Q(X; θ) ≤ q_2) = p
where q_1 and q_2 do not depend on θ. If Q is a monotone function of θ then this statement can be rewritten as
P[A(X) ≤ θ ≤ B(X)] = p
and the interval [a(x), b(x)] is a 100p% confidence interval.
6.6.3 Example
Suppose X = (X_1, X_2, ..., X_n) is a random sample from the Exponential(θ) distribution. Determine the distribution of
Q(X; θ) = 2 Σ_{i=1}^{n} X_i / θ
and thus show Q(X; θ) is a pivotal quantity. Show how Q(X; θ) can be used to construct a 100p% equal tail confidence interval for θ.
Solution
Since X = (X_1, X_2, ..., X_n) is a random sample from the Exponential(θ) distribution then by 4.3.2(4)
Y = Σ_{i=1}^{n} X_i ~ Gamma(n, θ)
and therefore Q(X; θ) = 2Y/θ ~ χ²(2n). Since the χ²(2n) distribution does not depend on θ, Q(X; θ) is a pivotal quantity.
Choose a and b such that P(W ≤ a) = (1 − p)/2 and P(W ≥ b) = (1 − p)/2 where W ~ χ²(2n). Since P(a ≤ W ≤ b) = p then
P( a ≤ 2 Σ_{i=1}^{n} X_i / θ ≤ b ) = p
or
P( 2 Σ_{i=1}^{n} X_i / b ≤ θ ≤ 2 Σ_{i=1}^{n} X_i / a ) = p
Therefore
[ 2 Σ_{i=1}^{n} x_i / b , 2 Σ_{i=1}^{n} x_i / a ]
is a 100p% equal tail confidence interval for θ.
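A numerical sketch of this interval in R (illustration only; the data are simulated here and the values of θ, n and p are arbitrary choices, not from the notes):

# 95% equal tail confidence interval for theta based on 2*sum(x)/theta ~ chi-square(2n)
set.seed(11)
theta <- 2; n <- 15
x <- rexp(n, rate = 1/theta)              # Exponential(theta) data with mean theta
p <- 0.95
a <- qchisq((1 - p)/2, df = 2*n)
b <- qchisq((1 + p)/2, df = 2*n)
c(lower = 2*sum(x)/b, upper = 2*sum(x)/a)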
The following theorem gives the pivotal quantity in the case in which θ is either a location or scale parameter.
6.6.4 Theorem
Let X = (X_1, X_2, ..., X_n) be a random sample from f(x; θ) and let θ̃ = θ̃(X) be the maximum likelihood estimator of the scalar parameter θ based on X.
(1) If θ is a location parameter of the distribution then Q(X; θ) = θ̃ − θ is a pivotal quantity.
(2) If θ is a scale parameter of the distribution then Q(X; θ) = θ̃/θ is a pivotal quantity.
6.6.5 Example
Suppose X = (X_1, X_2, ..., X_n) is a random sample from the Weibull(2, θ) distribution. Use Theorem 6.6.4 to find a pivotal quantity Q(X; θ). Show how the pivotal quantity can be used to construct a 100p% equal tail confidence interval for θ.
Solution
From Chapter 2, Problem 3(a) we know that θ is a scale parameter for the Weibull(2, θ) distribution. From Example 6.5.2 the maximum likelihood estimator of θ is
θ̃ = (T/n)^{1/2} = ( (1/n) Σ_{i=1}^{n} X_i² )^{1/2}
so by Theorem 6.6.4, Q(X; θ) = θ̃/θ is a pivotal quantity. This form suggests looking for the distribution of Σ_{i=1}^{n} X_i², which is a sum of independent and identically distributed random variables X_1², X_2², ..., X_n².
From Chapter 2, Problem 7(h) we have that if X_i ~ Weibull(2, θ), i = 1, 2, ..., n, then
Σ_{i=1}^{n} X_i² ~ Gamma(n, θ²)
Now
Q_1(X; θ) = 2n (θ̃/θ)² = 2n [Q(X; θ)]² = 2 Σ_{i=1}^{n} X_i² / θ² ~ χ²(2n)
is a one-to-one function of Q(X; θ). To see that Q_1(X; θ) and Q(X; θ) generate the same confidence intervals for θ we note that
P(a ≤ Q_1(X; θ) ≤ b) = P( a ≤ 2n (θ̃/θ)² ≤ b )
                     = P( (a/(2n))^{1/2} ≤ θ̃/θ ≤ (b/(2n))^{1/2} )
                     = P( (a/(2n))^{1/2} ≤ Q(X; θ) ≤ (b/(2n))^{1/2} )
To construct a 100p% equal tail confidence interval we choose a and b such that
P(W ≤ a) = (1 − p)/2   and   P(W ≥ b) = (1 − p)/2
where W ~ χ²(2n). Since
p = P( (a/(2n))^{1/2} ≤ θ̃/θ ≤ (b/(2n))^{1/2} )
  = P( θ̃ (2n/b)^{1/2} ≤ θ ≤ θ̃ (2n/a)^{1/2} )
a 100p% equal tail confidence interval for θ is
[ θ̂ (2n/b)^{1/2} , θ̂ (2n/a)^{1/2} ]
where θ̂ = ( (1/n) Σ_{i=1}^{n} x_i² )^{1/2}.
6.6.6 Example
Suppose X = (X_1, X_2, ..., X_n) is a random sample from the Uniform(0, θ) distribution. Use Theorem 6.6.4 to find a pivotal quantity Q(X; θ). Show how the pivotal quantity can be used to construct a 100p% confidence interval for θ of the form [θ̂, aθ̂].
Solution
From Chapter 2, Problem 3(b) we know that θ is a scale parameter for the Uniform(0, θ) distribution. From Example 6.2.9 the maximum likelihood estimator of θ is
θ̃ = X_(n)
so by Theorem 6.6.4, Q(X; θ) = θ̃/θ = X_(n)/θ is a pivotal quantity. For 0 ≤ q ≤ 1,
P(Q(X; θ) ≤ q) = P(X_(n) ≤ qθ)
              = ∏_{i=1}^{n} P(X_i ≤ qθ)
              = ∏_{i=1}^{n} q    since P(X_i ≤ x) = x/θ for 0 ≤ x ≤ θ
              = q^n   for 0 ≤ q ≤ 1
To construct a 100p% confidence interval for θ of the form [θ̂, aθ̂] we need to choose a such that
p = P( θ̃ ≤ θ ≤ a θ̃ )
  = P( 1/a ≤ θ̃/θ ≤ 1 ) = P( 1/a ≤ Q(X; θ) ≤ 1 )
  = P( Q(X; θ) ≤ 1 ) − P( Q(X; θ) ≤ 1/a )
  = 1 − a^{−n}
or a = (1 − p)^{−1/n}. The 100p% confidence interval for θ is
[ θ̂ , (1 − p)^{−1/n} θ̂ ]
where θ̂ = x_(n).
In your previous statistics course, approximate con…dence intervals for the Binomial and
Poisson distribution were justi…ed using a Central Limit Theorem argument. We are now
able to clearly justify the asymptotic pivotal quantity using the theorems of Chapter 5.
6.7.2 Example
Suppose X_n = (X_1, X_2, ..., X_n) is a random sample from the Poisson(θ) distribution. Show that
Q(X_n; θ) = √n (X̄_n − θ)/√X̄_n
is an asymptotic pivotal quantity. Show how Q(X_n; θ) can be used to construct an approximate 100p% equal tail confidence interval for θ.
Solution
In Example 5.5.4, the Weak Law of Large Numbers, the Central Limit Theorem and Slutsky's Theorem were all used to prove
Q(X_n; θ) = √n (X̄_n − θ)/√X̄_n →_D Z ~ N(0, 1)        (6.14)
and thus Q(X_n; θ) is an asymptotic pivotal quantity.
Let a be the value such that P(Z ≤ a) = (1 + p)/2 where Z ~ N(0, 1). Then by (6.14) we have
p ≈ P( −a ≤ √n (X̄_n − θ)/√X̄_n ≤ a )
  = P( X̄_n − a √(X̄_n/n) ≤ θ ≤ X̄_n + a √(X̄_n/n) )
so an approximate 100p% equal tail confidence interval for θ is
[ θ̂_n − a √(θ̂_n/n) , θ̂_n + a √(θ̂_n/n) ]        (6.15)
since θ̂_n = x̄_n.
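This interval is easy to compute in R; the sketch below uses made-up count data purely for illustration (the data are not from the notes).

# Approximate 95% confidence interval for a Poisson mean based on (6.15)
x <- c(0, 2, 1, 3, 0, 1, 2, 1, 0, 4, 1, 2)    # hypothetical Poisson counts
n <- length(x); thetahat <- mean(x)
a <- qnorm((1 + 0.95)/2)                      # a = 1.96
c(lower = thetahat - a*sqrt(thetahat/n), upper = thetahat + a*sqrt(thetahat/n))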
6.7.3 Exercise
Suppose X_n ~ Binomial(n, θ). Show that
Q(X_n; θ) = √n (X_n/n − θ) / √[ (X_n/n)(1 − X_n/n) ]
is an asymptotic pivotal quantity. Show that an approximate 100p% equal tail confidence interval for θ based on Q(X_n; θ) is given by
[ θ̂_n − a √(θ̂_n(1 − θ̂_n)/n) , θ̂_n + a √(θ̂_n(1 − θ̂_n)/n) ]        (6.16)
where θ̂_n = x_n/n.
6.7.5 Example
Use the results from Example 6.3.5 to determine the approximate 100p% confidence intervals based on (6.17) and (6.18) in the case of Binomial data and Poisson data. Compare these intervals with the intervals in (6.16) and (6.15).
Solution
From Example 6.3.5 we have that for Binomial data
I(θ̂_n) = J(θ̂_n) = n/[ θ̂_n (1 − θ̂_n) ]
6.7.6 Example
For Example 6.3.7 construct an approximate 95% confidence interval based on (6.18). Compare this with the 15% likelihood interval determined in Example 6.4.4.
Solution
From Example 6.3.7 we have θ̂ = 0.4951605 and I(θ̂) = 181.8069. Therefore an approximate 95% confidence interval based on (6.18) is
θ̂ ± 1.96 √(1/I(θ̂))
= 0.4951605 ± 1.96 √(1/181.8069)
= 0.4951605 ± 0.145362
= [0.3498, 0.6405]
From Example 6.4.4 the 15% likelihood interval is [0.3550, 0.6401] which is very similar. We expect this to happen since the relative likelihood function in Figure 6.3 is very symmetric.
6.7.8 Theorem
If a is a value such that p = 2P(Z ≤ a) − 1 where Z ~ N(0, 1), then the likelihood interval {θ : R(θ) ≥ e^{−a²/2}} is an approximate 100p% confidence interval.
Proof
By Theorem 6.5.1
−2 log R(θ; X_n) = −2 log[ L(θ; X_n)/L(θ̃_n; X_n) ] →_D W ~ χ²(1)        (6.19)
where X_n = (X_1, X_2, ..., X_n). The confidence coefficient corresponding to the interval {θ : R(θ) ≥ e^{−a²/2}} is
P( L(θ; X_n)/L(θ̃_n; X_n) ≥ e^{−a²/2} ) = P( −2 log R(θ; X_n) ≤ a² )
≈ P(W ≤ a²)   where W ~ χ²(1), by (6.19)
= 2P(Z ≤ a) − 1   where Z ~ N(0, 1)
= p
as required.
6.7.9 Example
Since
0.95 = 2P(Z ≤ 1.96) − 1   where Z ~ N(0, 1)
and
e^{−(1.96)²/2} = e^{−1.9208} ≈ 0.1465 ≈ 0.15
therefore a 15% likelihood interval for θ is also an approximate 95% confidence interval for θ.
6.7.10 Exercise
(a) Show that a 1% likelihood interval is an approximate 99:8% con…dence interval.
(b) Show that a 50% likelihood interval is an approximate 76% con…dence interval.
Note that while the con…dence intervals given by (6.17) or (6.18) are symmetric about the
point estimate ^n , this is not true in general for likelihood intervals.
6.7.11 Example
For Example 6.3.7 compare a 15% likelihood interval with the approximate 95% con…dence
interval in Example 6.7.6.
Solution
From Example 6.3.7 the 15% likelihood interval is
[0:3550; 0:6401]
[0:3498; 0:6405]
These intervals are very close and agree to 2 decimal places. The reason for this is because
the likelihood function (see Figure 6.3) is very symmetric about the maximum likelihood
estimate. The approximate intervals (6.17) or (6.18) will be close to the corresponding
likelihood interval whenever the likelihood function is reasonably symmetric about the
maximum likelihood estimate.
6.7.12 Exercise
Suppose x_1, x_2, ..., x_n is an observed random sample from the Logistic(θ, 1) distribution with probability density function
f(x; θ) = e^{−(x−θ)} / [1 + e^{−(x−θ)}]²   for x ∈ ℝ, θ ∈ ℝ
(a) Find the likelihood function, the score function, and the information function. How would you find the maximum likelihood estimate of θ?
(b) Show that if u is an observation from the Uniform(0, 1) distribution then
x = θ − log(1/u − 1)
(a) Find the score function and the maximum likelihood estimator of .
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of E(Xi ).
P
20
(d) If n = 20 and xi = 40 then …nd the maximum likelihood estimate of and a
i=1
15% likelihood interval for . Is = 0:5 a plausible value of ? Why?
(a) Find the score function and the maximum likelihood estimator of .
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of E(Xi ).
P
20
(d) If n = 20 and log xi = 10 …nd the maximum likelihood estimate of and a
i=1
15% likelihood interval for . Is = 0:1 a plausible value of ? Why?
(e) Show that
P
n
Q(X; ) = 2 log Xi
i=1
6. The following model is proposed for the distribution of family size in a large population:
P(k children in family; θ) = θ^k   for k = 1, 2, ...
P(0 children in family; θ) = (1 − 2θ)/(1 − θ)
The parameter θ is unknown and 0 < θ < 1/2. Fifty families were chosen at random from the population. The observed numbers of children are given in the following table:
No. of children:       0    1    2    3    4    Total
Frequency observed:   17   22    7    3    1       50
(a) Find the likelihood, log likelihood, score and information functions for .
(b) Find the maximum likelihood estimate of and the observed information.
(c) Find a 15% likelihood interval for .
(d) A large study done 20 years earlier indicated that = 0:45. Is this value plausible
for these data?
(e) Calculate estimated expected frequencies. Does the model give a reasonable …t
to the data?
P (~ q) = 1 e nq
for q 0
P
n
2 (n).
(d) Use moment generating functions to show that Q = 2 Xi If n = 20
i=1
P
20
and xi = 6, use the pivotal quantity Q to construct an exact 95% equal tail
i=1
con…dence interval for . Is = 0:7 a plausible value of ?
h i1=2
(e) Verify that (~n ) J(~n ) !D Z N(0; 1). Use this asymptotic pivotal
quantity to construct an approximate 95% con…dence interval for . Compare
this interval with the exact con…dence interval from (d) and a 15% likelihood
interval for . What do the approximate con…dence interval and the likelihood
interval indicate about the plausibility of the value = 0:7?
n
P (Y = y) = (1 e )y (e )n y
for y = 0; 1; : : : ; n
y
In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates for the case in which the unknown parameter is a vector of unknown parameters θ = (θ_1, θ_2, ..., θ_k). In your previous statistics course you would have seen the N(μ, σ²) model with two unknown parameters θ = (μ, σ²) and the simple linear regression model N(α + βx, σ²) with three unknown parameters θ = (α, β, σ²).
Although the case of k parameters is a natural extension of the one parameter case, the k parameter case is more challenging. For example, the maximum likelihood estimates are usually found by solving k nonlinear equations in k unknowns θ_1, θ_2, ..., θ_k. In most cases there are no explicit solutions and the maximum likelihood estimates must be found using a numerical method such as Newton's Method. Another challenging issue is how to summarize the uncertainty in the k estimates. For one parameter it is straightforward to summarize the uncertainty using a likelihood interval or a confidence interval. For k parameters these intervals become regions in ℝ^k which are difficult to visualize and interpret.
In Section 7.1 we give all the definitions related to finding the maximum likelihood estimates for k unknown parameters. These definitions are analogous to the definitions which were given in Chapter 6 for one unknown parameter. We also give the extension of the invariance property of maximum likelihood estimates and Newton's Method for k variables. In Section 7.2 we define likelihood regions and show how to find them for the case k = 2.
In Section 7.3 we introduce the Multivariate Normal distribution which is the natural extension of the Bivariate Normal distribution discussed in Section 3.10. We also give the limiting distribution of the maximum likelihood estimator of θ which is a natural extension of Theorem 6.5.1.
In Section 7.4 we show how to obtain approximate confidence regions for θ based on the limiting distribution of the maximum likelihood estimator of θ and show how to find them for the case k = 2. We also show how to find approximate confidence intervals for individual parameters and indicate that these intervals must be used with care.
L(θ) = L(θ; x) = ∏_{i=1}^{n} f(x_i; θ)   for θ ∈ Ω
As in the case of k = 1 it is frequently easier to work with the log likelihood function which is maximized at the same value of θ as the likelihood function.
l(θ) = l(θ; x) = log L(θ)   for θ ∈ Ω
where x are the observed data and log is the natural logarithmic function.
We will see that, as in the case of one parameter, the information matrix provides information about the variance of the maximum likelihood estimator. The information matrix I(θ) is the k × k symmetric matrix whose (i, j) entry is
−∂²l(θ)/∂θ_i ∂θ_j   for θ ∈ Ω
where x are the observed data. I(θ̂) is called the observed information matrix. The expected information matrix J(θ) is the k × k symmetric matrix whose (i, j) entry is
E[ −∂²l(θ; X)/∂θ_i ∂θ_j ]   for θ ∈ Ω
The invariance property of the maximum likelihood estimator also holds in the multiparameter case.
7.1.8 Example
Suppose x_1, x_2, ..., x_n is an observed random sample from the N(μ, σ²) distribution. Find the score vector, the information matrix, the expected information matrix and the maximum likelihood estimator of θ = (μ, σ²). Find the observed information matrix I(μ̂, σ̂²) and thus verify that (μ̂, σ̂²) is the maximum likelihood estimator of (μ, σ²). What is the maximum likelihood estimator of the parameter ψ = ψ(μ, σ²) = σ/μ, which is called the coefficient of variation?
Solution
The likelihood function is
L(μ, σ²) = ∏_{i=1}^{n} (1/(σ√(2π))) exp[ −(x_i − μ)²/(2σ²) ]
         = (2π)^{−n/2} (σ²)^{−n/2} exp[ −(1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)² ]   for μ ∈ ℝ, σ² > 0
or more simply
L(μ, σ²) = (σ²)^{−n/2} exp[ −(1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)² ]   for μ ∈ ℝ, σ² > 0
The log likelihood function is
l(μ, σ²) = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)²
         = −(n/2) log σ² − (1/(2σ²)) [ Σ_{i=1}^{n} (x_i − x̄)² + n(x̄ − μ)² ]
         = −(n/2) log σ² − (1/(2σ²)) [ (n − 1)s² + n(x̄ − μ)² ]   for μ ∈ ℝ, σ² > 0
where
s² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²
Now
∂l/∂μ = (n/σ²)(x̄ − μ)
and
∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) [ (n − 1)s² + n(x̄ − μ)² ]
The equations
∂l/∂μ = 0,   ∂l/∂σ² = 0
are solved simultaneously for
μ = x̄   and   σ² = (1/n) Σ_{i=1}^{n} (x_i − x̄)² = ((n − 1)/n) s²
Since
∂²l/∂μ² = −n/σ²,   ∂²l/∂μ∂σ² = −n(x̄ − μ)/σ⁴
∂²l/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) [ (n − 1)s² + n(x̄ − μ)² ]
Since
I_11(μ̂, σ̂²) = n/σ̂² > 0   and   det I(μ̂, σ̂²) = n²/(2σ̂⁶) > 0
then by the second derivative test the maximum likelihood estimates of μ and σ² are
μ̂ = x̄   and   σ̂² = (1/n) Σ_{i=1}^{n} (x_i − x̄)² = ((n − 1)/n) s²
and the maximum likelihood estimators are
μ̃ = X̄   and   σ̃² = (1/n) Σ_{i=1}^{n} (X_i − X̄)² = ((n − 1)/n) S²
The observed information matrix is
I(μ̂, σ̂²) = [ n/σ̂²    0 ;   0    n/(2σ̂⁴) ]
Now
E[ n/σ² ] = n/σ²   and   E[ n(X̄ − μ)/σ⁴ ] = 0
Also
E{ −n/(2σ⁴) + (1/σ⁶) [ (n − 1)S² + n(X̄ − μ)² ] }
= −n/(2σ⁴) + (1/σ⁶) { (n − 1) E(S²) + n E[(X̄ − μ)²] }
= −n/(2σ⁴) + (1/σ⁶) [ (n − 1)σ² + σ² ]
= n/(2σ⁴)
since
E(X̄ − μ) = 0,   E[(X̄ − μ)²] = Var(X̄) = σ²/n   and   E(S²) = σ²
Therefore the expected information matrix is
J(μ, σ²) = [ n/σ²    0 ;   0    n/(2σ⁴) ]
Note that
2
V ar X =
n
1 Pn 2(n 1) 4 2 4
2
V ar ^ 2 = V ar Xi X =
n i=1 n2 n
and
1 Pn
2
Cov(X; ^ 2 ) = Cov(X; Xi X )=0
n i=1
P
n
2
since X and Xi X are independent random variables.
i=1
Recall from your previous statistics course that inferences for and 2 are made using
the independent pivotal quantities
X (n 1)S 2 2
p t (n 1) and 2
(n 1)
S= n
0.8
0.6
0.4
0.2
0
8
6 6
4 5.5
5
2 4.5
2 0 4
σ µ
7.1.9 Exercise
Suppose Yi N( + xi ; 2 ), i = 1; 2; : : : ; n independently where the xi are known constants.
Show that the maximum likelihood estimators of , and 2 are given by
~ = Y ^x
Pn
(xi x) Yi Y
~ = i=1
Pn
(xi x)2
i=1
2 1 Pn
~ xi )2
~ = (Yi ~
n i=1
7.1.10 Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Beta(a; b) distribution with
probability density function
(a + b) a
f (x; a; b) = x 1
(1 x)b 1
for 0 < x < 1, a > 0; b > 0
(a) (b)
Find the likelihood function, the score vector, and the information matrix and the expected
information matrix. How would you …nd the maximum likelihood estimates of a and b?
Solution
The likelihood function is
Q
n (a + b) a
L(a; b) = x 1
(1 xi )b 1
for a > 0; b > 0
i=1 (a) (b) i
n a 1 b 1
(a + b) Q
n Q
n
= xi (1 xi )
(a) (b) i=1 i=1
or more simply
n a b
(a + b) Q
n Q
n
L(a; b) = xi (1 xi ) for a > 0; b > 0
(a) (b) i=1 i=1
l(a; b) = n [log (a + b) log (a) log (b) + at1 + bt2 ] for a > 0; b > 0
where
1 Pn 1 Pn
t1 = log xi and t2 = log(1 xi )
n i=1 n i=1
234 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Let
d log (z) 0 (z)
(z) = =
dz (z)
which is called the digamma function.
The score vector is
h i
@l @l
S (a; b) = @a @b
h i
= n (a + b) (a) + t1 (a + b) (b) + t2
for a > 0; b > 0. S (a; b) = (0; 0) must be solved numerically to …nd the maximum likelihood
estimates of a and b.
Let
0 d
(z) = (z)
dz
which is called the trigamma function.
The information matrix is
" #
0 (a) 0 (a + b) 0 (a + b)
I(a; b) = n 0 (a 0 (b) 0 (a
+ b) + b)
7.1.11 Exercise
Note: The initial estimate, (0) , may be determined by calculating L ( ) for a grid of
values to determine the region in which L ( ) obtains a maximum.
7.1. LIKELIHOOD AND RELATED FUNCTIONS 235
7.1.13 Example
Use the following R code to randomly generate 35 observations from a Beta(a; b) distribution
# randomly generate 35 observations from a Beta(a,b)
set.seed(32086689) # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=1,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rbeta(35,truea,trueb),2))
x
a; ^b).
Use Newton’s Method and R to …nd (^
a; ^b) and I(^
What are the values of S(^ a; ^b)?
Solution
The generated data are
0:08 0:19 0:21 0:25 0:28 0:29 0:29 0:30 0:30 0:32
0:34 0:36 0:39 0:45 0:45 0:47 0:48 0:49 0:54 0:54
0:55 0:55 0:56 0:56 0:61 0:63 0:64 0:65 0:69 0:69
0:73 0:77 0:79 0:81 0:85
The maximum likelihood estimates of a and b can be found using Newton’s Method given
by " # " #
a(i+1) a(i) h i 1
(i) (i) (i) (i)
= + S(a ; b ) I(a ; b )
b(i+1) b(i)
for i = 0; 1; ::: until convergence.
Here is R code for Newton’s Method for the Beta Example.
# function for calculating Beta score for a and b and data x
BESF<-function(a,b,x)
{S<-length(x)*c(digamma(a+b)-digamma(a)+mean(log(x)),
digamma(a+b)-digamma(b)+mean(log(1-x))))
return(S)}
#)
# function for calculating Beta information for a and b)
BEIF<-function(a,b)
{I<-length(x)*cbind(c(trigamma(a)-trigamma(a+b),-trigamma(a+b)),
c(-trigamma(a+b),trigamma(b)-trigamma(a+b))))
return(I)}
236 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
which indicates we have obtained a local extrema. The observed information matrix is
" #
8:249382 6:586959
a; ^b) =
I(^
6:586959 7:381967
and
a; ^b)]11 = 8:249382 > 0
[I(^
then by the second derivative test we have found the maximum likelihood estimates.
7.1.14 Exercise
Use the following R code to randomly generate 30 observations from a Gamma( ; ) dis-
tribution
# randomly generate 35 observations from a Gamma(a,b)
set.seed(32067489) # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=1,max=3)
trueb<-runif(1,min=3,max=5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rgamma(30,truea,scale=trueb),2))
x
Use Newton’s Method and R to …nd (^ ; ^ ). What are the values of S(^ ; ^ ) and I(^ ; ^ )?
7.2. LIKELIHOOD REGIONS 237
Therefore
L ( 1; 2)
R ( 1; 2) =
L(^1 ; ^2 )
" #" #
1 h i I^11 I^12 ^1 1
1 ^1 1
^2 2
2L(^1 ; ^2 ) I^12 I^22 ^2 2
h i 1h i
= 1 2L(^1 ; ^2 ) ( 1
^1 )2 I^11 + 2 1
^1 ( 2
^2 )I^12 + ( 2
^2 )2 I^22
( 1
^1 )2 I^11 + 2( 1
^1 )( 2
^2 )I^12 + ( 2
^2 )2 I^22 = 2 (1 p) L(^1 ; ^2 )
7.2.2 Example
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters (a; b)
in Example 7.1.13. Comment on the shapes of the regions.
(b) Is the value (2:5; 3:5) a plausible value of (a; b)?
Solution
(a) The following R code generates the required likelihood regions.
# function for calculating Beta relative likelihood function
# for parameters a,b and data x
BERLF<-function(a,b,that,x)
{t1<-prod(x)
t2<-prod(1-x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-<-((gamma(a+b)*gamma(ah)*gamma(bh))/
(gamma(a)*gamma(b)*gamma(ah+bh)))^n*t1^(a-ah)*t2^(b-bh)
return(L)}
#
a<-seq(0.5,5.5,0.01)
b<-seq(0.5,6,0.01)
R<-outer(a,b,FUN = BERLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for (a; b) are shown in Figure 7.2.
The likelihood contours are approximate ellipses but they are not symmetric about
the maximum likelihood estimates (^ a; ^b) = (2:824775; 2:97317). The likelihood regions are
more stretched for larger values of a and b. The ellipses are also skewed relative to the ab
coordinate axes. The skewness of the likelihood contours relative to the ab coordinate axes
is determined by the value of I^12 . If the value of I^12 is close to zero the skewness will be
small.
(b) Since R (2:5; 3:5) = 0:082 the point (2:5; 3:5) lies outside a 10% likelihood region so it
is not a very plausible value of (a; b).
Note however that a = 2:5 is a plausible value of a for some values of b, for example,
a = 2:5, b = 2:5 lies inside a 50% likelihood region so (2:5; 2:5) is a plausible value of (a; b).
We see that when there is more than one parameter then we need to determine whether a
set of values are jointly plausible given the observed data.
7.2. LIKELIHOOD REGIONS 239
6
5
4
b
3
2
1
1 2 3 4 5
7.2.3 Exercise
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters ( ; )
in Exercise 7.1.14. Comment on the shapes of the regions.
(b) Is the value (3; 2:7) a plausible value of ( ; )?
(c) Use the R code in Exercise 7.1.14 to generate 100 observations from the Gamma( ; )
distribution.
(d) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for the data
generated in (c). Comment on the shapes of these regions as compared to the regions in
(a).
240 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Xn !p X
Xn !D X = (X1 ; X2 ; : : : ; Xk )
To discuss the asymptotic properties of the maximum likelihood estimator in the multipa-
rameter case we also need the de…nition and properties of the Multivariate Normal Dis-
tribution. The Multivariate Normal distribution is the natural extension of the Bivariate
Normal distribution which was discussed in Section 3.10.
1 1 1
f (x1 ; x2 ; : : : ; xk ) = exp (x ) (x )T for x 2 <k
(2 )k=2 j j1=2 2
The following theorem gives some important properties of the Multivariate Normal distri-
bution. These properties are a natural extension of the properties of the Bivariate Normal
distribution found in Theorem 3.10.2.
7.3. LIMITING DISTRIBUTION OF MAXIMUM LIKELIHOOD ESTIMATOR 241
1
M (t) = exp tT + t tT for t = (t1 ; t2 ; : : : ; tk ) 2 <k
2
P
k
XcT = ci Xi N cT ; c cT
i=1
XA N( A; AT A)
(6) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the rest of the co-
ordinates is a MVN distribution. In particular the conditional probability density function
of Xi given Xj = xj ; i 6= j; is
2 2
Xi jXj = xj N( i + ij i (xj j )= j ; (1 ij ) i )
The following theorem gives the asymptotic distribution of the maximum likelihood esti-
mator in the multiparameter case. This theorem looks very similar to Theorem 6.5.1 with
the scalar quantities replaced by the appropriate vectors and matrices.
~n !p (7.2)
2 log R( ; Xn ) = 2[l(~n ; Xn ) l( ; Xn )] !D W 2
(k) (7.4)
242 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
for each 2 .
An approximate 100p% con…dence region for based on the asymptotic pivotal quantity
(~n )J(~n )(~n )T is the set of all vectors in the set
Similarly since
An approximate 100p% con…dence region for based on the asymptotic pivotal quantity
(~n )I(~n ; Xn )(~n )T is the set of all vectors in the set
an approximate 100p% con…dence region for based on this asymptotic pivotal quantity is
the set of all vectors satisfying
f : 2 log R( ; x) cg
f : 2 log R( ; x) cg
n o
= : R( ; x) e c=2
7.4.3 Example
Use R and the results from Examples 7.1.10 and 7.1.13 to graph approximate 90%, 95%,
and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with
the likelihood regions in Example 7.2.2.
Solution
From Example 7.1.10 that for a random sample from the Beta(a; b) distribution the infor-
mation matrix and the expected information matrix are given by
" #
0 (a) 0 (a + b) 0 (a + b)
I(a; b) = n 0 (a + b) 0 (b) 0 (a + b) = J(a; b)
Since " #
h i a
~ a
a
~ a ~b b a; ~b)
J(~ ~b !D W 2
(2)
b
an approximate 100p% con…dence region for (a; b) is given by
" #
h i a
^ a
f(a; b) : a^ a ^b b J(^ ^
a; b) ^ cg
b b
which gives
c= 2 log(1 p)
For p = 0:95, c = 2 log(0:05) = 5:99; an approximate 95% con…dence region is given by
" #
h i a
^ a
f(a; b) : a^ a ^b b J(^ a; ^b) ^ 5:99g
b b
If we let " #
J^11 J^12
a; ^b) =
J(^
J^12 J^22
then the approximate con…dence region can be written as
f(a; b) : (^
a a)2 J^11 + 2 (^
a a) (^b b)J^12 + (^b b)2 J^22 5:99g
We note that the approximate con…dence region is the set of points on and inside the ellipse
(^
a a)2 J^11 + 2 (^
a a) (^b b)J^12 + (^b b)2 J^22 = 5:99
7.4. APPROXIMATE CONFIDENCE REGIONS 245
a; ^b).
which is centred at (^
For the data in Example 7.1.13, a ^ = 2:824775, ^b = 2:97317 and
" #
8:249382 6:586959
I(^a; ^b) = J(^
a; ^b) =
6:586959 7:381967
Approximate 90% ( 2 log(0:1) = 4:61), 95% ( 2 log(0:05) = 5:99), and 99% ( 2 log(0:01) = 9:21)
con…dence regions are shown in Figure 7.3.
The following R code generates the required approximate con…dence regions.
# function for calculating values for determining confidence regions
ConfRegion<-function(a,b,th,info)
{c<-(th[1]-a)^2*info[1,1]+2*(th[1]-a)*
(th[2]-b)*info[1,2]+(th[2]-b)^2*info[2,2]
return(c)}
#
# graph approximate confidence regions
a<-seq(1,5.5,0.01)
b<-seq(1,6,0.01)
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
#
6
5
4
b
3
2
1
1 2 3 4 5
A 10% likelihood region for (a; b) is given by f(a; b) : R(a; b; x) 0:1g. Since
2
2 log R(a; b; Xn ) !D W (2) = Exponential (2)
we have
and therefore a 10% likelihood region corresponds to an approximate 90% con…dence region.
Similarly 1% and 5% likelihood regions correspond to approximate 99% and 95% con…dence
regions respectively.
If we compare the likelihood regions in Figure 7.2 with the approximate con…dence
regions shown in Figure 7.3 we notice that the con…dence regions are exact ellipses centred
at the maximum likelihood estimates whereas the likelihood regions are only approximate
ellipses not centered at the maximum likelihood estimates. We notice that there are values
inside an approximate 99% con…dence region but which are outside a 1% likelihood region.
The point (a; b) = (1; 1:5) is an example. There were only 35 observations in this data
set. The di¤erences between the likelihood regions and the approximate con…dence regions
indicate that the Normal approximation might not be good. In this example the likelihood
regions provide a better summary of the uncertainty in the estimates.
7.4.4 Exercise
Use R and the results from Exercises 7.1.11 and 7.1.14 to graph approximate 90%, 95%,
and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with
the likelihood regions in Exercise 7.2.3.
Since likelihood regions and approximate con…dence regions cannot be graphed or easily
interpreted for more than two parameters, we often construct approximate con…dence inter-
vals for individual parameters. Such con…dence intervals are often referred to as marginal
con…dence intervals. These con…dence intervals must be used with care as we will see in
Example 7.4.6.
Approximate con…dence intervals can also be constructed for a linear combination of para-
meters. An illustration is given in Example 7.4.6.
7.4. APPROXIMATE CONFIDENCE REGIONS 247
where ^i is the ith entry in the vector ^n , v^ii is the (i; i) entry of the matrix [J(^n )] 1, and
a is the value such that P (Z a) = 1+p 2 where Z N(0; 1).
Similarly since
7.4.6 Example
Using the results from Examples 7.1.10 and 7.1.13 determine approximate 95% marginal
con…dence intervals for a, b, and an approximate con…dence interval for a + b.
Solution
Let " #
h i 1 v^11 v^12
a; ^b)
J(^ =
v^12 v^22
Since " #!
h i h i 1 0
a
~ a ~b b a; ~b)]1=2 !D Z
[J(~ BVN 0 0 ;
0 1
Note that a = 2:1 is in the approximate 95% marginal con…dence interval for a and b = 3:8
is in the approximate 95% marginal con…dence interval for b and yet the point (2:1; 3:8)
is not in the approximate 95% joint con…dence region for (a; b). Clearly these marginal
con…dence intervals for a and b must be used with care.
To obtain an approximate 95% marginal con…dence interval for a + b we note that
a + ~b) = V ar(~
V ar(~ a) + V ar(~b) + 2Cov(~
a; ~b)
v^11 + v^22 + 2^
v12 = v^
7.4.7 Exercise
Using the results from Exercises 7.1.11 and 7.1.14 determine approximate 95% marginal
con…dence intervals for a, b, and an approximate con…dence interval for a + b.
7.5. CHAPTER 7 PROBLEMS 249
3. Suppose x11 ; x12 ; : : : ; x1n1 is an observed random sample from the N( 1 ; 2 ) distri-
bution and independently x21 ; x22 ; : : : ; x2n2 is an observed random sample from the
N( 2 ; 2 ) distribution. Find the maximum likelihood estimators of 1 ; 2 ; and 2 .
4. In a large population of males ages 40 50, the proportion who are regular smokers is
where 0 1 and the proportion who have hypertension (high blood pressure) is
where 0 1. Suppose that n men are selected at random from this population
and the observed data are
Category SH SH SH SH
Frequency x11 x12 x21 x22
where S is the event the male is a smoker and H is the event the male has hypertension.
(a) Assuming the events S and H are independent determine the likelihood function,
the score vector, the maximum likelihood estimates, and the information matrix
for and .
(b) Determine the expected information matrix and its inverse matrix. What do
you notice regarding the diagonal entries of the inverse matrix?
(a) Find the likelihood function, the score vector, and the information matrix for
and . How would you …nd the maximum likelihood estimates of and ?
(b) Show that if u is an observation from the Uniform(0; 1) distribution then
1
x= log 1
u
(c) Use the following R code to randomly generate 30 observations from a Logistic( ; )
distribution.
# randomly generate 30 observations from a Logistic(mu,beta)
# using a random mu and beta values
set.seed(21086689) # set the seed so results can be reproduced
truemu<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truemu-truebeta*log(1/runif(30)-1)),2))
x
(d) Use Newton’s Method and R to …nd (^ ; ^ ). Determine S(^ ; ^ ) and I(^ ; ^ ).
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ).
(f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ).
Compare these approximate con…dence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal con…dence intervals for , , and an ap-
proximate con…dence interval for + .
(a) Find the likelihood function, the score vector, and the information matrix for
and . How would you …nd the maximum likelihood estimates of and ?
(b) Show that if u is an observation from the Uniform(0; 1) distribution then
x= [ log (1 u)]1=
(f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ).
Compare these approximate con…dence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal con…dence intervals for , , and an ap-
proximate con…dence interval for + .
xi 1
7. Suppose Yi Binomial(1; pi ), i = 1; 2; : : : ; n independently where pi = 1 + e
and the xi are known constants.
(a) Determine the likelihood function, the score vector, and the expected information
matrix for and .
(b) Explain how you would use Newton’s method to …nd the maximum likelihood
estimates of and .
H0 : 2 0
where 0 is some subset of . H0 is called the null hypothesis. When conducting a test
of hypothesis there is usually another statement of interest which is the statement which
re‡ects what might be true if H0 is not supported by the observed data. This statement is
called the alternative hypothesis and is denoted HA or H1 . In many cases HA may simply
take the form
HA : 2 = 0
In constructing a test of hypothesis it is useful to distinguish between simple and com-
posite hypotheses.
253
254 8. HYPOTHESIS TESTING
If the hypothesis completely speci…es the model including any parameters in the model
then the hypothesis is simple otherwise the hypothesis is composite.
8.1.2 Example
For each of the following indicate whether the null hypothesis is simple or composite. Spec-
ify and 0 and determine the dimension of each.
(a) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a Poisson( ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a
speci…ed value of .
(b) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a Gamma( ; ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a
speci…ed value of .
(c) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from an Exponential( 1 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym )
represent a random sample from an Exponential( 2 ) distribution. The hypothesis of inter-
est is H0 : 1 = 2 .
(d) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a N( 1 ; 21 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym )
represent a random sample from a N( 2 ; 22 ) distribution. The hypothesis of interest is
H0 : 1 = 2 ; 21 = 22 .
Solution
(a) This is a simple hypothesis since the model and the unknown parameter are completely
speci…ed. = f : > 0g which has dimension 1 and 0 = f 0 g which has dimension 0.
(b) This is a composite hypothesis since is not speci…ed by H0 . = f( ; ) : > 0; > 0g
which has dimension 2 and 0 = f( 0 ; ) : > 0g which has dimension 1.
(c) This is a composite hypothesis since 1 and 2 are not speci…ed by H0 .
= f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension 2 and
0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension 1.
To measure the evidence against H0 based on the observed data we use a test statistic or
discrepancy measure.
8.1. TEST OF HYPOTHESIS 255
A test statistic is usually chosen so that a small observed value of the test statistic
indicates close agreement between the observed data and the null hypothesis H0 while a
large observed value of the test statistic indicates poor agreement. The test statistic is
chosen before the data are examined and the choice re‡ects the type of departure from the
null hypothesis H0 that we wish to detect as speci…ed by the alternative hypothesis HA . A
general method for constructing test statistics can be based on the likelihood function as
we will see in the next two sections.
8.1.4 Example
For Example 8.1.2(a) suggest a test statistic which could be used if the alternative hypoth-
esis is HA : 6= 0 . Suggest a test statistic which could be used if the alternative hypothesis
is HA : > 0 and if the alternative hypothesis is HA : < 0 .
Solution
If H0 : = 0 is true then E X = 0 . If the alternative hypothesis is HA : 6= 0 then a
reasonable test statistic which could be used is D = X 0 .
If the alternative hypothesis is HA : > 0 then a reasonable test statistic which could be
used is D = X 0.
If the alternative hypothesis is HA : < 0 then a reasonable test statistic which could be
used is D = 0 X.
After the data have been collected the observed value of the test statistic is calculated.
Assuming the null hypothesis H0 is true we compute the probability of observing a value
of the test statistic at least as great as that observed. This probability is called the p-value
of the data in relation to the null hypothesis H0 .
p-value = P (D d; H0 )
The p-value is the probability of observing such poor agreement using test statistic D
between the null hypothesis H0 and the data if the null hypothesis H0 is true. If the p-value
256 8. HYPOTHESIS TESTING
is very small, then such poor agreement would occur very rarely if the null hypothesis H0 is
true, and we interpret this to mean that the observed data are providing evidence against
the null hypothesis H0 . The smaller the p-value the stronger the evidence against the null
hypothesis H0 based on the observed data. A large p-value does not mean that the null
hypothesis H0 is true but only indicates a lack of evidence against the null hypothesis H0
based on the observed data and using the test statistic D.
The following table gives a rough guideline for interpreting p-values. These are only
guidelines. The interpretation of p-values must always be made in the context of a given
study.
8.1.6 Example
For Example 8.1.4 suppose x = 5:7, n = 25 and 0 = 5. Determine the p-value for both
HA : 6= 0 and HA : > 0 . Give a conclusion in each case.
Solution
For x = 5:7, n = 25; 0 = 5, and HA : 6= 5 the observed value of the test statistic is
d = j5:7 5j = 0:7, and
p-value = P X 5 0:7; H0 : =5
P
25
= P (jT 125j 17:5) where T = Xi Poisson (125)
i=1
= P (T 107:5) + P (T 142:5)
= P (T 107) + P (T 143)
= 0:05605429 + 0:06113746
= 0:1171917
calculated using R. Since p-value > 0:1 there is no evidence against H0 : = 5 based on
the data.
For x = 5:7, n = 25; 0 = 5, and HA : > 5 the observed value of the test statistic is
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 257
p-value = P X 5 0:7; H0 : =5
P
25
= P (T 125 17:5) where T = Xi Poisson (125)
i=1
= P (T 143)
= 0:06113746
Since 0:05 < p-value 0:1 there is weak evidence against H0 : = 5 based on the data.
8.1.7 Exercise
Suppose in a Binomial experiment 42 successes have been observed in 100 trials and the
hypothesis of interest is H0 : = 0:5.
(a) If the alternative hypothesis is HA : 6= 0:5, suggest a suitable test statistic, calculate
the p-value and give a conclusion.
(b) If the alternative hypothesis is HA : < 0:5, suggest a suitable test statistic, calculate
the p-value and give a conclusion.
(X; 0) = 2 log R ( 0 ; X)
" #
L ( 0 ; X)
= 2 log
L(~; X)
h i
= 2 l(~; X) l ( 0 ; X)
where ~ = ~ (X) is the maximum likelihood estimator of . Note that this test statistic
implicitly assumes that the alternative hypothesis is HA : 6= 0 or HA : 2
= 0.
258 8. HYPOTHESIS TESTING
Note that the p-value is calculated assuming H0 : = 0 is true. In general this p-value
is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is
usually intractable. We use the result from Theorem 7.3.4 which says that under certain
(regularity) conditions
2 log R( ; Xn ) = 2[l(~n ; Xn ) l( ; Xn )] !D W 2
(k) (8.1)
2
p-value P [W (x; 0 )] where W (k)
8.2.1 Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the N ; 2 distribution where 2 is
known. Show that, in this special case, the likelihood ratio test statistic for testing
H0 : = 0 has exactly a 2 (1) distribution.
Solution
From Example 7.1.8 we have that the likelihood function of is
n 1 P
n n(x )2
L( ) = exp 2
(xi x)2 exp 2
for 2<
2 i=1 2
or more simply
n(x )2
L( ) = exp 2
for 2<
2
The corresponding log likelihood function is
n(x )2
l( ) = 2
for 2<
2
Solving
dl n(x )
= 2
=0
d
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 259
8.2.2 Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( ) distribution.
(a) Find the likelihood ratio test statistic for testing H0 : = 0. Verify that the likelihood
ratio statistic takes on large values if ~ > 0 or ~ < 0 .
(b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 : = 5.
Compare this with the test in Example 8.1.6.
Solution
(a) From Example 6.2.5 we have the likelihood function
nx n
L( ) = e for 0
and maximum likelihood estimate ^ = x. The relative likelihood function can be written
as
n^
L( ) ^
R( ) = = en( ) for 0
L(^) ^
260 8. HYPOTHESIS TESTING
( 0 ; X) = 2 log R ( 0 ; X)
" ~ #
n
0 n(~ 0 )
= 2 log e
~
~ log 0 ~
= 2n + 0
~
0 0
= 2n~ 1 log (8.2)
~ ~
To verify that the likelihood ratio statistic takes on large values if ~ > 0 or ~ < 0 or
equivalently if ~0 < 1 or ~0 > 1, consider the function
1 t 1
g 0 (t) = a 1 =a for t > 0 and a > 0
t t
Since g 0 (t) < 0 for 0 < t < 1, and g 0 (t) > 0 for t > 1 we can conclude that the function
g (t) is a decreasing function for 0 < t < 1 and an increasing function for t > 1 with an
absolute minimum at t = 1. Since g (1) = 0, g (t) is positive for all t > 0 and t 6= 0.
Therefore if we let t = 0
~ in (8.2) then we see that ( 0 ; X) will be large for small values
of t = 0
~ < 1 or large values of t = 0
~ > 1.
(b) If x = 6, n = 25, and H0 : = 5 then the observed value of the likelihood ratio test
statistic is
(5; x) = 2 log R ( 0 ; X)
" #
5 25(5:6) 25(6 5)
= 2 log e
6
= 4:6965
The parameter space is = f : > 0g which has dimension 1 and thus k = 1. The
approximate p-value is
2
p-value P (W 4:6965) where W (1)
h p i
= 2 1 P Z 4:6965 where Z N (0; 1)
= 0:0302
calculated using R. Since 0:01 < p-value 0:05 there is evidence against H0 : = 5 based
on the data. Compared with the answer in Example 8.1.6 for HA : 6= 5 we note that the
p-values are slightly di¤erent by the conclusion is the same.
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 261
8.2.3 Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( ) distribution.
(a) Find the likelihood ratio test statistic for testing H0 : = 0. Verify that the likelihood
ratio statistic takes on large values if ~ > 0 or ~ < 0 .
(b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 : = 5.
(c) From Example 6.6.3 we have
P
n
2 Xi
i=1 2n~ 2
Q(X; ) = = (2n)
is a pivotal quantity. Explain how this pivotal quantity could be used to test H0 : = 0
if (i) HA : < 0 , (ii) HA : > 0 , and (iii) HA : 6= 0 .
(d) Suppose x = 6 and n = 25. Use the test statistic from (c) for HA : 6= 0 to test
H0 : = 5. Compare the answer with the answer in (b).
Solution
(a) From Example 6.2.8 we have the likelihood function
n nx=
L( ) = e for >0
and maximum likelihood estimate ^ = x. The relative likelihood function can be written
as !n
L( ) ^ ^
R( ) = = en(1 = ) for 0
^
L( )
The likelihood ratio test statistic for H0 : = 0 is
( 0 ; X) = 2 log R ( 0 ; X)
" !n #
~ ~= )
= 2 log en(1
" ! !#
~ ~
= 2n 1 log
0 0
To verify that the likelihood ratio statistic takes on large values if ~ > 0 or ~ < 0 or
~ ~
equivalently if 0 < 1 and 0 > 1 we note that ( 0 ; X) is of the form 8.3 so an argument
~
similar to Example 8.2.2(a) can be used with t = 0
.
(b) If x = 6, n = 25, and H0 : = 5 then the observed value of the likelihood ratio test
statistic is
(5; x) = 2 log R ( 0 ; X)
" #
6 25 25(1 6)=5
= 2 log e
5
= 0:8839222
262 8. HYPOTHESIS TESTING
The parameter space is = f : > 0g which has dimension 1 and thus k = 1. The
approximate p-value is
2
p-value P (W 0:8839222) where W (1)
h p i
= 2 1 P Z 0:8839222 where Z N (0; 1)
= 0:3471
calculated using R. Since p-value > 0:1 there is no evidence against H0 : = 5 based on
the data.
~
(c) (i) If HA : > 0 we could let D = 0
. If H0 : = 0 is true then since E(~) = 0 we
~
would expect observed values of D = 0 to be close to 1. However if HA : > 0 is true
~
then E(~) = > 0 and we would expect observed values of D = 0 to be larger than 1
and therefore large values of D provide evidence against H0 : = 0 . The corresponding
p-value would be
!
~ ^
p-value = P ; H0
0 0
!
2n^ 2
= P W where W (2n)
0
~
(ii) If HA : < 0 we could still let D = 0
. If H0 : = 0 is true then since E(~) = 0 we
~
would expect observed values of D = 0 to be close to 1. However if HA : < 0 is true
~
then E(~) = < 0 and we would expect observed values of D = 0 to be smaller than
1 and therefore small values of D provide evidence against H0 : = 0 . The corresponding
p-value would be
!
~ ^
p-value = P ; H0
0 0
!
2n^ 2
= P W where W (2n)
0
~
(iii) If HA : 6= 0 we could still let D = 0
. If H0 : = 0 is true then since E(~) = 0 we
~
would expect observed values of D = 0 to be close to 1. However if HA : 6= 0 is true
~
then E(~) = 6= 0 and we would expect observed values of D = 0 to be either larger
or smaller than 1 and therefore both large and small values of D provide evidence against
H0 : = 0 . If a large (small) value of D is observed it is not simple to determine exactly
which small (large) values should also be considered. Since we are not that concerned about
the exact p-value, the p-value is usually calculated more simply as
! !!
2n^ 2n^ 2
p-value = min 2P W ; 2P W where W (2n)
0 0
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 263
~ 6
(d) If x = 6, n = 25, and H0 : = 5 then the observed value of the D = 0
is d = 5 with
(50) 6 (50) 6 2
p-value = min 2P W ; 2P W where W (50)
5 5
= min (1:6855; 0:3145)
= 0:3145
calculated using R. Since p-value > 0:1 there is no evidence against H0 : = 5 based on
the data.
h i
~ ~ ~
We notice the test statistic D = 0 and ( 0 ; X) = 2n 0
1 log 0
are both
~
functions of 0
. For this example the p-values are similar and the conclusions are the same.
8.2.4 Example
The following table gives the observed frequencies of the six faces in 100 rolls of a die:
Face: j 1 2 3 4 5 6 Total
Observed Frequency: xj 16 15 14 20 22 13 100
Are these observations consistent with the hypothesis that the die is fair?
Solution
The model for these data is (X1 ; X2 ; : : : ; X6 ) Multinomial(100; 1 ; 2 ; : : : ; 6 ) and the
hypothesis of interest is H0 : 1 = 2 = = 6 = 61 . Since the model and parameters are
P6
completely speci…ed this is a simple hypothesis. Since j = 1 there are really only k = 5
j=1
parameters. The relative likelihood function for ( 1 ; 2; : : : ; 5) is
n! x1 x2 x5 x6
L ( 1; 2; : : : ; 5) = 1 2 5 (1 1 2 5)
x1 !x2 ! x5 !x6 !
or more simply
x1 x2 x5 x6
L ( 1; 2; : : : ; 5) = 1 2 5 (1 1 2 5)
P
5
for 0 j 1 for j = 1; 2; : : : ; 5 and j 1. The log likelihood function is
j=1
P
5
l ( 1; 2; : : : ; 5) = xj log j + x6 log (1 1 2 5)
j=1
Now
@l xj n x1 x2 x5
= for j = 1; 2; : : : ; 5
@ j j 1 1 2 5
xj (1 1 2 5) j (n x1 x2 x5 )
=
j (1 1 2 5)
264 8. HYPOTHESIS TESTING
P
6 1
l( 0 ; X) = Xj log
i=1 6
so the likelihood ratio test statistic is
h i
(X; 0 ) = 2 l(~; X) l( 0 ; X)
P
6 Xj P
6 1
= 2 Xj log Xj log
i=1 n i=1 6
P
6 Xj
= 2 Xj log
i=1 Ej
where Ej = n=6 is the expected frequency for outcome j. This test statistic is the likelihood
ratio Goodness of Fit test statistic introduced in your previous statistics course.
For these data the observed value of the likelihood ratio test statistic is
P
6 xj
(x; 0) = 2 xj log
i=1 100=6
16 15 13
= 2 16 log + 15 log + + 13 log
100=6 100=6 100=6
= 3:699649
calculated using R. Since p-value> 0:1 there is no evidence based on the data against the
hypothesis of a fair die.
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES 265
Note:
(1) In this example the data (X1 ; X2 ; : : : ; X6 ) are not a random sample. The conditions
for (8.1) hold by thinking of the experiment as a sequence of n independent trials with 6
outcomes on each trial.
(2) You may recall from your previous statistics course that the 2 approximation is rea-
sonable if the expected frequency Ej in each category is at least 5.
8.2.5 Exercise
In a long-term study of heart disease in a large group of men, it was noted that 63 men who
had no previous record of heart problems died suddenly of heart attacks. The following
table gives the number of such deaths recorded on each day of the week:
Test the hypothesis of interest that the deaths are equally likely to occur on any day of the
week.
= 2 l(~; X) max l( ; X)
2 0
Note that the p-value is calculated assuming H0 : 2 0 is true. In general this p-value
is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is
266 8. HYPOTHESIS TESTING
usually intractable. Under certain (regularity) conditions it can be shown that, assuming
the hypothesis H0 : 2 0 is true,
8.3.1 Example
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Gamma( ; ) distribution. Find
the likelihood ratio test statistic for testing H0 : = 0 where is unknown. Indicate how
to …nd the approximate p-value.
(b) For the data in Example 7.1.14 test the hypothesis H0 : = 2.
Solution
(a) From Example 8.1.2(b) we have = f( ; ) : > 0; > 0g which has dimension k = 2
and 0 = f( 0 ; ) : > 0g which has dimension q = 1 and the hypothesis is composite.
From Exercise 7.1.11 the likelihood function is
n Q
n 1 P
n
L( ; ) = [ ( ) ] xi exp xi for > 0; >0
i=1 i=1
l ( ; ) = log L ( ; )
P
n 1 P
n
= n log ( ) n log + log xi xi for > 0; >0
i=1 i=1
P
n
xi
d n 0 i=1
l ( 0; ) = + 2
d
d Pn
and d l( 0; ) = 0 for = n1 0 xi = x0 and therefore
i=1
X P
n
max l( ; ; X) = n log ( 0) n 0 log + 0 log Xi n 0
( ; )2 0 0 i=1
The likelihood ratio test statistic is
(X; 0)
= 2 l(~ ; ~ ; X) max l( ; ; X)
( ; )2 0
P
n 1 Pn
= 2[ n log (~ ) n~ log ~ + ~ log Xi X
i=1 ~ i=1 i
X P
n
+n log ( 0) +n 0 log 0 log Xi + n 0]
0 i=1
( 0) (~ 0) P
n X X
= 2n log + log Xi + 0 log ~ log ~ + 0
(~ ) n i=1 0 ~
with corresponding observed value
(x; 0)
( 0) (^ 0) P
n x x
= 2n log + log xi + 0 log ^ log ^ + 0
(^ ) n i=1 0 ^
Since k q=2 1=1
2
p-value P [W (x; 0 )] where W (1)
h p i
= 2 1 P Z (x; 0 ) where Z N (0; 1)
The degrees of freedom can also be determined by noticing that, under the full model two
parameters ( and ) were estimated, and under the null hypothesis H0 : = 0 only one
parameter ( ) was estimated. Therefore 2 1 = 1 are the degrees of freedom.
(b) For H0 : = 2 and the data in Example 7.1.14 we have n = 30, x = 6:824333,
Pn
1
n log xi = 1:794204, ^ = 4:118407, ^ = 1:657032. The observed value of the likelihood
i=1
ratio test statistic is (x; 0) = 6:886146 with
2
p-value P (W 6:886146) where W (1)
h p i
= 2 1 P Z 6:886146 where Z N (0; 1)
= 0:008686636
calculated using R. Since p-value 0:01 there is strong evidence against H0 : = 2 based
on the data.
268 8. HYPOTHESIS TESTING
8.3.2 Example
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( 1 ) distribution and
independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Exponential( 2 ) distribution.
Find the likelihood ratio test statistic for testing H0 : 1 = 2 . Indicate how to …nd the
approximate p-value.
P
10
(b) Find the approximate p-value if the observed data are n = 10, xi = 22, m = 15,
i=1
P
15
yi = 40. What would you conclude?
i=1
Solution
(a) From Example 8.1.2(c) we have = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension
k = 2 and 0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension q = 1 and the
hypothesis is composite.
From Example 6.2.8 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Exponential( 1 ) distribution is
n nx=
L1 ( 1 ) = 1 e 1
for 1 >0
m my=
L2 ( 2 ) = 2 e 2
for 2 >0
with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood
function for ( 1 ; 2 ) is
The independence of the samples also implies the maximum likelihood estimators are still
~1 = X and ~2 = Y . Therefore
(nx + my)
l( ) = (n + m) log for >0
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES 269
d (n + m) (nx + my)
l( ) = + 2
d
d nx+my
and d l ( ) = 0 for = n+m and therefore
nX + mY
max l( 1 ; 2 ; X; Y) = (n + m) log (n + m)
( 1 ; 2 )2 0 n+m
(X; Y; 0)
= 2 l(~1 ; ~2 ; X; Y) max l( 1 ; 2 ; X; Y)
( 1 ; 2 )2 0
nX + mY
= 2 n log X m log Y (n + m) + (n + m) log + (n + m)
n+m
nX + m Y
= 2 (n + m) log n log X m log Y
n+m
nx + my
(x; y; 0) = 2 (n + m) log n log x m log y
n+m
P
10 P
15
(b) For n = 10, xi = 22, m = 15, yi = 40 the observed value of the likelihood ratio
i=1 i=1
test statistic is (x; y; 0) = 0:2189032 and
2
p-value P (W 0:2189032) where W (1)
h p i
= 2 1 P Z 0:2189032 where Z N (0; 1)
= 0:6398769
calculated using R. Since p-value > 0:5 there is no evidence against H0 : 1 = 2 based on
the observed data.
8.3.3 Exercise
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( 1 ) distribution and in-
dependently Y1 ; Y2 ; : : : ; Ym is a random sample from the Poisson( 2 ) distribution. Find the
270 8. HYPOTHESIS TESTING
likelihood ratio test statistic for testing H0 : 1 = 2. Indicate how to …nd the approximate
p-value.
P
10
(b) Find the approximate p-value if the observed data are n = 10, xi = 22, m = 15,
i=1
P
15
yi = 40. What would you conclude?
i=1
8.3.4 Example
In a large population of males ages 40 50, the proportion who are regular smokers is
where 0 1 and the proportion who have hypertension (high blood pressure) is
where 0 1. Suppose that n men are selected at random from this population and
the observed data are
Category SH S H SH S H
Frequency x11 x12 x21 x22
where S is the event the male is a smoker and H is the event the male has hypertension.
Find the likelihood ratio test statistic for testing H0 : events S and H are independent.
Indicate how to …nd the approximate p-value.
Solution
The model for these data is (X1 ; X2 ; : : : ; X6 ) Multinomial(100; 11 ; 12 ; 21 ; 22 ) with
parameter space
( )
P
2 P
2
= ( 11 ; 12 ; 21 ; 22 ) :0 ij 1 for i; j = 1; 2 ij 1
j=1 i=1
0 = f( 11 ; 12 ; 21 ; 22 ) : 11 = ; 12 = (1 );
21 = (1 ) ; 22 = (1 ) (1 ); 0 1; 0 1g
P
2 P
2
l( 11 ; 12 ; 21 ; 22 ) = xij log ij
j=1 i=1
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES 271
xij
and the maximum likelihood estimate of ij is ^ij = n for i; j = 1; 2. Therefore
P
2 P
2 Xij
l(~11 ; ~12 ; ~21 ; ~22 ; X) = Xij log
j=1 i=1 n
If the events S and H are independent events then from Chapter 7, Problem 4 we have
that the likelihood function is
max l( 11 ; 12 ; 21 ; 22 ; X)
( 11 ; 12 ; 21 ; 22 )2 0
(X; 0)
P
2 P
2 Xij
= 2 Xij log
j=1 i=1 Eij
RC
where Eij = in j , Ri = Xi1 + Xi2 , Cj = X1j + X2j for i; j = 1; 2. Eij is the expected
frequency if the hypothesis of independence is true.
The corresponding observed value is
2 P
P 2 xij
(x; 0) =2 xij log
j=1 i=1 eij
ri cj
where eij = n , ri = xi1 + xi2 , cj = x1j + x2j for i; j = 1; 2.
272 8. HYPOTHESIS TESTING
This of course is the usual test of independence in a two-way table which was discussed in
your previous statistics course.
8.4. CHAPTER 8 PROBLEMS 273
(a) Find the likelihood ratio test statistic for testing H0 : = 0. Indicate how to
…nd the approximate p-value.
P
25
(b) For the data n = 25 and log xi = 40 …nd the approximate p-value for testing
i=1
H0 : 0 = 1. What would you conclude?
(a) Find the likelihood ratio test statistic for testing H0 : = 0. Indicate how to
…nd the approximate p-value.
P
20
(b) If n = 20 and x2i = 10 …nd the approximate p-value for testing H : = 0:1.
i=1
What would you conclude?
(a) Find the likelihood ratio test statistic for testing H0 : 1 = 2 = 3. Indicate
how to …nd the approximate p-value.
(b) Find the likelihood ratio test statistic for testing H0 : 1 = 2 ; 2 = 2 (1 );
3 = (1 )2 . Indicate how to …nd the approximate p-value.
274 8. HYPOTHESIS TESTING
275
276 9. SOLUTIONS TO CHAPTER EXERCISES
9.1 Chapter 2
Exercise 2.1.5
(a) P (A) 0 follows from De…nition 2.1.3(A1). From Example 2.1.4(c) we have P A =
1 P (A). But from De…nition 2.1.3(A1) P A 0 and therefore P (A) 1.
(b) Since A = (A \ B) [ A \ B and (A \ B) \ A \ B = ? then by Example 2.1.4(b)
P (A) = P (A \ B) + P A \ B
or
P A \ B = P (A) P (A \ B)
as required.
(c) Since
A [ B = A \ B [ (A \ B) [ A \ B
is the union of three mutually exclusive events then by De…nition 2.1.3(A3) and Example
2.1.4(a) we have
P (A [ B) = P A \ B + P (A \ B) + P A \ B (9.1)
P A \ B = P (A) P (A \ B) (9.2)
and similarly
P A \ B = P (B) P (A \ B) (9.3)
Substituting (9.2) and (9.3) into (9.1) gives
P (A [ B) = P (A) P (A \ B) + P (A \ B) + P (B) P (A \ B)
= P (A) + P (B) P (A \ B)
as required.
9.1. CHAPTER 2 277
Exercise 2.3.7
By 2.11.8
x2 x3
log (1 + x) = x + for 1<x 1
2 3
Let x = p 1 to obtain
(p 1)2 (p 1)3
log p = (p 1) + (9.4)
2 3
which holds for 0 < p 2 and therefore also hold for 0 < p < 1. Now (9.4) can be written
as
(1 p)2 (1 p)3
log p = (1 p)
2 3
P
1 (1 p)x
= for 0 < p < 1
x=1 x
Therefore
P
1 (1 p)x
P
1
f (x) =
x=1 x=1 x log p
1 1 P (1 p)x
=
log p x=1 x
log p
= =1
log p
which holds for 0 < p < 1.
Exercise 2.4.11
(a)
Z1 Z1 1
x (x= )
f (x) dx = e dx
1 0
1
Let y = (x= ) . Then dy = x dx. When x = 0, y = 0 and as x ! 1, y ! 1.
Therefore
Z1 Z1
x 1 (x= )
f (x) dx = e dx
1 0
Z1
y
= e dy = (1) = 0! = 1
0
(b) If = 1 then
1 x=
f (x) = e for x > 0
2
α=1, β=0.5
1.8
1.4
1.2
f(x)
1 α=3, β=1
0.8
0.6
0.4
α=2, β=1
0.2
0
0 0.5 1 1.5 2 2.5 3
x
Exercise 2.4.12
(a)
Z1 Z1 Zb
1
f (x) dx = +1
dx = lim x dx
x b!1
1 1
h i h i
b
= lim x j =1 lim b
b!1 b!1
1
= 1 lim = 1 since >0
b!1 b
3.5
α=0.5, β=2
3
2.5
f(x)
2
α=1, β=2
α=0.5, β=1
1.5
1 α=1, β=1
0.5
0
0 0.5 1 1.5 2 2.5 3
x
Exercise 2.5.4
(a) For X Cauchy( ; 1) the probability density function is
1
f (x; ) = h i for x 2 <; 2<
1 + (x )2
and 0 otherwise. See Figure 9.3 for a sketch of the probability density function for
= 1; 0; 1.
0.35
0.3
θ=0 θ=1
θ=-1
0.25
0.2
f(x)
0.15
0.1
0.05
0
-6 -4 -2 0 2 4 6
x
Let
1
f0 (x) = f (x; = 0) = for x 2 <
(1 + x2 )
and 0 otherwise. Then
1
f (x; ) = h i = f0 (x ) for x 2 <; 2<
2
1 + (x )
and 0 otherwise. See Figure 9.4 for a sketch of the probability density function for
= 0:5; 1; 2.
Let
1
f1 (x) = f (x; = 1) = for x 2 <
(1 + x2 )
280 9. SOLUTIONS TO CHAPTER EXERCISES
0.7
0.6
θ=0.5
0.5
0.4
f (x)
0.3
θ=1
0.2
0.1
θ=2
0
-5 0 5
x
Exercise 2.6.11
If X Exponential(1) then the probability density function of X is
x
f (x) = e for x 0
1 d 1
g(y) = f (h (y)) h (y)
dy
y 1
(y= )
= e for y 0
Exercise 2.6.12
X is a random variable with probability density function
1
f (x) = x for 0 < x < 1; >0
1 d 1
g(y) = f (h (y)) h (y)
dy
y 1 y
= e e
y
= e for y 0
1
which is the probability density function of a Exponential random variable as required.
282 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 2.7.4
Using integration by parts with u = 1 F (x) and dv = dx gives du = f (x) dx, v = x and
Z1 Z1
[1 F (x)] dx = x [1 F (x)] j1
0 + xf (x) dx
0 0
Z1
= lim x f (t) dt + E (X)
x!1
x
Since
Z1 Z1
0 x f (t) dt tf (t) dt
x x
R1 Rx
Since E (X) = xf (x) dx exists then G (x) = tf (t) dt exists for all x > 0 and lim G (x) =
0 0 x!1
E (X).
By the First Fundamental Theorem of Calculus 2.11.9 and the de…nition of an improper
integral 2.11.11
Z1
tf (t) dt = lim G (b) G (x) = E (X) G (x)
b!1
x
Therefore
Z1
lim tf (t) dt = lim [E (X) G (x)] = E (X) lim G (x)
x!1 x!1 x!1
x
= E (X) E (X)
= 0
Exercise 2.7.9
(a) If X Poisson( ) then
P
1 x
e
E(X (k) ) = x(k)
x=0 x!
P
1 x k
k
= e let y = x k
x=k (x k)!
k P1 y
= e
y=0 y!
k
= e e by 2.11.7
k
= for k = 1; 2; : : :
E X (1) = E (X) =
and
E X (2) = E [X (X 1)] = 2
so
so
Z1 1 e x= Z1 +p 1 e x=
p px x x
E(X ) = x dx = dx let y =
( ) ( )
0 0
Z1
1 +p 1 y
= ( y) e dy
( )
0
+p Z1
+p 1 y
= y e dy which converges for +p>0
( )
0
p ( + p)
= for p >
( )
( + 1) ( )
E (X) = =
( ) ( )
=
and
( + 2) ( + 1) ( )
E X2 = E X2 = 2
= 2
( ) ( )
2
= ( + 1)
so
V ar (X) = E X 2 [E (X)]2 = ( + 1) 2
( )2
2
=
9.1. CHAPTER 2 285
k k
= +1 for k = 1; 2; : : :
1
E (X) = +1
and
2
E X2 = E X2 = 2
+1
so
2
2 1
V ar (X) = E X 2 [E (X)]2 = 2
+1 +1
( )
2
2 2 1
= +1 +1
Exercise 2.8.3
From Markov’s inequality we know
E jY jk
P (jY j c) for all k; c > 0 (9.6)
ck
Since we are given that X is a random variable with …nite mean and …nite variance 2
then h i
2
= E (X )2 = E jX j2
E jX j2 2 1
P (jX j k ) 2 = 2 =
(k ) (k ) k2
or
1
P (jX j k )
k2
for all k > 0 as required.
286 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 2.9.2
p
For a Exponential( ) random variable V ar (X) = ( ) = . For g (X) = log X,
g 0 (X) = X1 . Therefore by (2.5), the variance of Y = g (X) = log X is approximately
2
2 1
g0 ( ) ( ) = ( )
= 1
which is a constant.
Exercise 2.10.3
(a) If X Binomial(n; p) then
P
1 n x
M (t) = etx p (1 p)n x
x20 x
P1 n x
= et p (1 p)n x
which converges for t 2 <
x20 x
k
= pet + 1 p by 2.11.3(1)
P
1 x
e
M (t) = etx
x=0 x!
x
P
1 et
= e which converges for t 2 <
x=0 x!
et
= e e by 2.11.7
+ et
= e for t 2 <
Exercise 2.10.6
If X Negative Binomial(k; p) then
k
p
MX (t) = for t < log q
1 qet
Exercise 2.10.14
(a) By the Exponential series 2.11.7
2
t2 =2 t2 =2 t2 =2
M (t) = e =1+ + +
1! 2!
1 1 4
= 1 + t2 + t for t 2 <
2 2!22
Since E(X k ) = k! coe¢ cient of tk in the Maclaurin series for M (t) we have
1
E (X) = 1! (0) = 0 and E X 2 = 2! =1
2
and so
V ar (X) = E X 2 [E (X)]2 = 1
(b) By Theorem 2.10.4 the moment generating function of Y = 2X 1 is
By examining the list of moment generating functions in Chapter 11 we see that this is the
moment generating function of a N( 1; 4) random variable. Therefore by the Uniqueness
Theorem for Moment Generating Functions, Y has a N( 1; 4) distribution.
288 9. SOLUTIONS TO CHAPTER EXERCISES
9.2 Chapter 3
Exercise 3.2.5
(a) The joint probability function of X and Y is
f (x; y) = P (X = x; Y = y)
n! h in x y
2 x
= [2 (1 )]y (1 )2
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n
which is a Multinomial n; 2
; 2 (1 ) ; (1 )2 distribution or the trinomial distribution.
(b) The marginal probability function of X is
n
Xx n! h in x y
2 x
f1 (x) = P (X = x) = [2 (1 )]y (1 )2
x!y! (n x y)!
y=0
n!
1
X (n x)! h in x y
2 x
= [2 (1 )]y (1 )2
x! (n x)! y! (n x y)!
y=0
n h in x
x
= 2
2 (1 ) + (1 )2 by the Binomial Series 2.11.3(1)
x
n 2 x 2 n x
= 1 for x = 0; 1; : : : ; n
x
2
and so X Binomial n; .
(c) In a similar manner to (b) the marginal probability function of Y can be shown to be
Binomial(n; 2 (1 )) since P (Aa) = 2 (1 ).
(d)
P P P
t
P (X + Y = t) = f (x; y) = f (x; t x)
(x;y): x+y=t x=0
t
X n! h in t
2 x
= [2 (1 )]t x
(1 )2
x! (t x)! (n t)!
x=0
n h in t
tX t 2 x
= (1 )2 [2 (1 )]t x
t x
x=0
n h in t t
= (1 )2 2
+ 2 (1 ) by the Binomial Series 2.11.3(1)
t
n h in t
t
= 2
+ 2 (1 ) (1 )2 for t = 0; 1; : : : ; n
t
Exercise 3.3.6
(a)
Z1 Z1 Z1 Z1
1
1 = f (x; y)dxdy = k dxdy
(1 + x + y)3
1 1 0 0
Z1
k 1
= lim ja0 dy
2 a!1 (1 + x + y)2
0
Z1
k 1 1
= lim 2 + dy
2 a!1 (1 + a + y) (1 + y)2
0
Z1
k 1
= dy
2 (1 + y)2
0
k 1 a
= lim j
2 a!1 (1 + y) 0
k 1
= lim +1
2 a!1 (1 + a)
k
=
2
Therefore k = 2. A graph of the joint probability density function is given in Figure 9.5.
1 .5
0 .5
0
0
0
1 1
2 2
3 3
4 4
Figure 9.5: Graph of joint probability density function for Exercise 3.3.6
290 9. SOLUTIONS TO CHAPTER EXERCISES
(b) (i)
Z2 Z1 Z2
2 1
P (X 1; Y 2) = dxdy = j10 dy
(1 + x + y)3 (1 + x + y)2
0 0 0
Z2
1 1 1 1
= 2 + dy = j20
(2 + y) (1 + y)2 (2 + y) (1 + y)
0
1 1 1 1 5
= +1= (3 4 6 + 12) =
4 3 2 12 12
(ii) Since f (x; y) and the support set A = f(x; y) : x 0; y 0g are both symmetric in x
and y, P (X Y ) = 0:5
(iii)
Z1 Z
1 y Z1
2 1
P (X + Y 1) = dxdy = j10 y
dy
(1 + x + y)3 (1 + x + y)2
0 0 0
Z1
1 1 1 1 1
= 2 + dy = yj j1
(2) (1 + y)2 4 0 (1 + y) 0
0
1 1 1
= +0 +1 =
4 2 4
(c) Since
Z1 Z1
2 1
f (x; y)dy = dy = lim ja0
(1 + x + y)3 a!1 (1 + x + y)2
1 0
1
= for x 0
(1 + x)2
(d) Since
Zy Zx Zy
2 1 x
P (X x; Y y) = dsdt = 2 j0 dt
(1 + s + t)3 (1 + s + t)
0 0 0
Zy
1 1
= 2 + dt
(1 + x + t) (1 + t)2
0
1 1
= jy0
1+x+t 1+t
1 1 1
= + 1 for x 0, y 0
1+x+y 1+y 1+x
the joint cumulative distribution function of X and Y is
8
< 0 x < 0 or y < 0
F (x; y) = P (X x; Y y) = 1 1 1
: 1 + 1+x+y 1+y 1+x x 0, y 0
(e) Since
1 1 1
lim F (x; y) = lim 1+
y!1 y!1 1+x+y 1+y 1+x
1
= 1 for x 0
1+x
the marginal cumulative distribution function of X is
(
0 x<0
F1 (x) = P (X x) = 1
1 1+x x 0
Check:
d d 1
F1 (x) = 1
dx dx 1+x
1
=
(1 + x)2
= f1 (x) for x 0
Exercise 3.3.7
(a) Since the support set is A = f(x; y) : y > x 0g
Z1 Z1 Z1 Zy
x y
1 = f (x; y)dxdy = k e dxdy
1 1 0 0
Z1
x y y
= k e j0 dy
0
Z1
2y y
= k e +e dy
0
1 2y y
= k lim e e ja0
a!1 2
1 2a a 1
= k lim e e +1
a!1 2 2
k
=
2
and therefore k = 2. A graph of the joint probability density function is given in Figure 9.6
1.5
0.5
0
0
0 0.5
0.5
1
1
1.5 1.5
Figure 9.6: Graph of joint probability density function for Exercise 3.3.7
9.2. CHAPTER 3 293
Z1 Z2 Z1
x y x y 2
P (X 1; Y 2) = 2 e dydx = 2 e e jx dx
x=0 y=x x=0
Z1 Z1
x 2 x 2 x 2x
= 2 e e +e dx = 2 e e +e dx
x=0 x=0
2 x 1 2x 1 1
= 2 e e e j10 = 2 e 3
e 2
e 2
+
2 2 2
3 2
= 1 + 2e 3e
y=x
(x,x)
1 x
0
(ii) Since the support set A = f(x; y) : 0 < x < y < 1g contains only values for which
x < y then P (X Y ) = 1.
(iii) The region of integration is shown in Figure 9.8
Z1=2 Z
1 x Z1
x y x y 1 x
P (X + Y 1) = 2 e dydx = 2 e e jx dx
x=0 y=x x=0
Z1 Z1
x x 1 x 1 2x
= 2 e e +e dx = 2 e +e dx
x=0 x=0
1 1 2x 1=2 1 1 1 1 1
= 2 e x e j0 =2 e e +0+
2 2 2 2
1
= 1 2e
294 9. SOLUTIONS TO CHAPTER EXERCISES
(x,1-x) y =x
y =1-x
(x,x)
x
0 1/2 1
Z1 Z1
x y x y a 2x
f1 (x) = f (x; y)dy = 2 e dy = 2e lim e jx = 2e for x > 0
a!1
1 x
Z1 Zy
x y y x y y y
f (x; y)dy = 2 e dx = 2e e j0 = 2e 1 e for y > 0
1 0
and 0 otherwise.
(d) Since
Zx Zy Zx
s t s
P (X x; Y y) = 2 e dtds = 2 e e t jys ds
s=0 t=s 0
Zx
s y s
= 2 e e +e ds
0
Zx
y s 2s y s 1 2s
= 2 e e +e ds = 2 e e e jx0
2
0
x y 2x y
= 2e e 2e + 1 for y x 0
9.2. CHAPTER 3 295
Zy Zy
s t
P (X x; Y y) = 2 e dtds
s=0 t=s
2y y
= e 2e + 1 for x > y > 0
and
P (X x; Y y) = 0 for x 0 or y 0
therefore the joint cumulative distribution function of X and Y is
8
x y e 2x 2e y + 1 y x 0
>
> 2e
<
F (x; y) = P (X x; Y y) = e 2y 2e y + 1 x>y>0
>
>
: 0 x 0 or y 0
Check:
Zy Zy
1
P (Y y) = f2 (t) dt = 2 e t
e 2t
dt = 2 e t
+ e 2t
jy0
2
0 0
1 1
= 2 e y+ e 2y
+1 =1+e 2y
2e y
for y > 0
2 2
or
d d 2y y 2y y
F2 (y) = e 2e +1 = 2e + 2e = f2 (y) for y > 0
dy dy
296 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.4.4
From the solution to Exercise 3.2.5 we have
n! h in x y
2 x
f (x; y) = [2 (1 )]y (1 )2
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n
n 2 x 2 n x
f1 (x) = 1 for x = 0; 1; : : : ; n
x
and
n
f2 (y) = [2 (1 )]y [1 2 (1 )]n y
for y = 0; 1; : : : ; n
y
Since
2 n
f (0; 0) = (1 )2n 6= f1 (0) f2 (0) = 1 [1 2 (1 )]n
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.6 we have
2
f (x; y) = for x 0, y 0
(1 + x + y)3
1
f1 (x) = for x 0
(1 + x)2
and
1
f2 (y) = for y 0
(1 + y)2
Since
f (0; 0) = 2 6= f1 (0) f2 (0) = (1) (1)
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.7 we have
x y
f (x; y) = 2e for 0 < x < y < 1
2x
f1 (x) = 2e for x > 0
and
y y
f2 (y) = 2e 1 e for y > 0
Since
3 1 2 2
f (1; 2) = 2e 6= f1 (1) f2 (2) = 2e 2e 1 e
therefore X and Y are not independent random variables.
9.2. CHAPTER 3 297
Exercise 3.5.3
P (X = x; Y = y)
P (Y = yjX = x) =
P (X = x)
h in x y
2 x
n!
x!y!(n x y)! [2 (1 )]y (1 )2
=
n! 2 x 2 n x
x!(n x)! 1
h in x y
y 2
(n x)! [2 (1 )] (1 )
=
y! (n x y)! 2 y 2 n x y
1 1
" #y " #(n x) y
n x 2 (1 ) (1 )2
= 2 2
y 1 1
" #y " #(n x) y
2
n x 2 (1 ) 1 2 +
= 2 2
y 1 1
" #y " #(n x) y
2 2
n x 2 (1 ) 1 2 +2
= 2 2
y 1 1
" #y " #(n x) y
n x 2 (1 ) 2 (1 )
= 2 1 2
y 1 1
(1 )2 (1 )2 2 (1 )
2 = 2 =1 2
2 (1 ) + (1 ) 1 1
Since we have (n x) independent trials with probability of Success (type Aa) equal to
2 (1 )
then it follows that the number of Aa types, given that there are x members of type
(1 2 )
AA, would follow a Binomial n x; 2 1(1 2 ) distribution.
( )
298 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.5.4
For Example 3.3.3 the conditional probability density function of X given Y = y is
f (x; y) x+y
f1 (xjy) = = for 0 < x < 1 for each 0 < y < 1
f2 (y) y + 12
Check:
Z1 Z1
x+y 1 1 2 1 1
f1 (xjy) dx = 1 dx = 1 x + xyj10 = 1 +y 0 =1
y+2 y+ 2
2 y+ 2
2
1 0
and
Z1
f2 (yjx) dy = 1
1
Check:
Z1 Z1
2 (1 + y)2 2 1 1
f1 (xjy) dx = 3 dx = (1 + y) a!1
lim 2 j0
(1 + x + y) (1 + x + y)
1 0
1 1 1
= (1 + y)2 lim 2 + 2 = (1 + y)
2
=1
a!1 (1 + a + y) (1 + y) (1 + y)2
By symmetry the conditional probability density function of Y given X = x is
2 (1 + x)2
f2 (yjx) = for y > 0 for each x > 0
(1 + x + y)3
and
Z1
f2 (yjx) dy = 1
1
Check:
Z1 Zy x y
e 1 x y 1 e
f1 (xjy) dx = y
dx = y
e j0 = y
=1
1 e 1 e 1 e
1 0
f (x; y) 2e x y
f2 (yjx) = = = ex y
for y > x for each x > 0
f1 (x) 2e 2x
Check:
Z1 Z1
f2 (yjx) dy = ex y
dy = ex lim e y a
jx = ex 0 + e x
=1
a!1
1 x
Exercise 3.6.10
Z1 Z1 Z1 Zy
x y
E (XY ) = xyf (x; y)dxdy = 2 xye dxdy
1 1 0 0
0 1
Z1 Zy Z1
= 2 ye y @ xe x
dxA dy = 2 ye y
xe x
e x
jy0 dy
0 0 0
Z1 Z1
y y y
= 2 ye ye e + 1 dy = 2 y2e 2y
+ ye 2y
ye y
dy
0 0
Z1 Z1 Z1
= 2 y2e 2y
dy (2y) e 2y
dy + 2 ye y
dy
0 0 0
Z1 Z1
= 2 y2e 2y
dy (2y) e 2y
dy + 2 (2) but (2) = 1! = 1
0 0
Z1 Z1
2 2y 2y u 1
= 2 2 y e dy (2y) e dy let u = 2y or y = so dy = du
2 2
0 0
Z1 Z1
u 2
u1 u1
= 2 2 e du ue du
2 2 2
0 0
Z1 Z1
1 1 1 1
= 2 u2 e u
du ue u
du = 2 (3) (2) but (3) = 2! = 2
4 2 4 2
0 0
1 1
= 2 =1
2 2
300 9. SOLUTIONS TO CHAPTER EXERCISES
Z1 Z1
2 2
E Y = y f2 (y)dy = 2 y2e y
e y
+ 1 dy
1 0
Z1 Z1 Z1
2 2y 2 y
= 2 y e dy + 2 y e dy = 2 y2e 2y
dy + 2 (3)
0 0 0
Z1
= 4 2 y2e 2y
dy let u = 2y
0
Z1 Z1
u 2
u1 1 1 1 7
= 4 2 e du = 4 u2 e u
du = 4 (3) = 4 =
2 2 4 4 2 2
0 0
2
7 3 14 9 5
V ar (Y ) = E Y 2 [E (Y )]2 = = =
2 2 4 4
Therefore
1 3 3 1
Cov (X; Y ) = E (XY ) E (X) E (Y ) = 1 =1 =
2 2 4 4
and
1
Cov(X; Y ) 4 1
(X; Y ) = =q =p
X Y 1 5 5
4 4
9.2. CHAPTER 3 301
Exercise 3.7.12
Since Y Gamma( ; 1 )
1
E (Y ) = =
2
1
V ar (Y ) = = 2
and
Z1 1 y
k y e
E Y = yk dy let u = y
( )
0
Z1 k Z1
u k+ 1
u1
= e du = uk+ 1
e u
du
( ) ( )
0 0
k
( + k)
= for +k >0
( )
1=p 1 1=p 1
E (Xjy) = y 1+ , E (XjY ) = Y 1+
p p
and
1=p
2 2 2 1
V ar (Xjy) = y 1+ 1+
p p
2=p 2 2 1
V ar (XjY ) = Y 1+ 1+
p p
Therefore
1 1=p
E (X) = E [E (XjY )] = 1+ E Y
p
1=p 1
1 p
= 1+
p ( )
1 1
1+ p p
1=p
=
( )
302 9. SOLUTIONS TO CHAPTER EXERCISES
and
Exercise 3.7.13
Since P Beta(a; b)
a ab
E (P ) = , V ar (P ) =
a+b (a + b + 1) (a + b)2
and
Z1
(a + b) a
E P k
= pk p 1
(1 p)b 1
dp
(a) (b)
0
Z1
(a + b)
= pa+k 1
(1 p)b 1
dp
(a) (b)
0
Z1
(a + b) (a + k) (a + k + b) k+a
= p 1
(1 p)b 1
dp
(a) (a + k + b) (a + k) (b)
0
(a + b) (a + k) (a + b) (a + k)
= (1) =
(a) (a + k + b) (a) (a + k + b)
9.2. CHAPTER 3 303
1 p 1 P 1
E (Y jp) = , E (Y jP ) = = 1
p P P
and
1 p 1 P 1 1
V ar (Y jp) = , V ar (Y jP ) = = 2
p2 P2 P P
Therefore
1 1
E (Y ) = E [E (Y jP )] = E 1 =E P 1
P
(a + b) (a 1)
= 1
(a) (a 1 + b)
(a 1 + b) (a 1 + b) (a 1)
= 1
(a 1) (a 1) (a 1 + b)
a 1+b b
= 1=
a 1 a 1
provided a > 1 and a + b > 1.
Now
V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )]
1 1 1
= E + V ar 1
P2 P P
" #
2 2
1 1 1 1
= E E +E 1 E 1
P2 P P P
2
1 1 1 1 1 1
= E E +E 2E +1 E + 2E 1
P2 P P2 P P P
2 1 1 2
= 2E P E P E P
2
(a + b) (a 2) (a + b) (a 1) (a + b) (a 1)
= 2
(a) (a 2 + b) (a) (a 1 + b) (a) (a 1 + b)
2
(a + b 2) (a + b 1) a + b 1 a+b 1
= 2
(a 2) (a 1) a 1 a 1
a+b 1 a+b 2 a+b 1
= 2 1
a 1 a 2 a 1
ab (a + b 1)
=
(a 1)2 (a 2)
Exercise 3.9.4
If T = Xi + Xj ; i 6= j; then
T Binomial(n; pi + pj )
The moment generating function of T = Xi + Xj is
9.3 Chapter 4
Exercise 4.1.2
The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions
E and F shown in Figure 9.3For s > 1
x=y/s
x=y
1
F E
1 x
0
Y
G (s) = P (S s) = P s = P (Y sX) = P (Y sX 0)
X
Z Z Z1 Zy Z1
= 3ydxdy = 3ydxdy = 3y xjyy=s dy
(x;y) 2 E y=0 x=y=s 0
Z1 Z1
y 1 1 1
= 3y y dy = 1 3y 2 dy = 1 y 3 j10 = 1
s s s s
0 0
1 1
As a check we note that lim 1 s = 0 = G (1) and lim 1 s = 1 so G (s) is a
s!1+ s!1
continuous function for all s 2 <.
For s > 1
d d 1 1
g (s) = G (s) = 1 =
ds ds s s2
The probability density function of S is
1
g (s) = for s > 1
s2
and 0 otherwise.
306 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 4.2.5
Since X Exponential(1) and Y Exponential(1) independently, the joint probability
density function of X and Y is
x y
f (x; y) = f1 (x) f2 (y) = e e
x y
= e
with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 9.9.The transfor-
. .
y . .
. .
...
x
0
mation
S : U =X +Y, V =X Y
has inverse transformation
U +V U V
X= , Y =
2 2
Under S the boundaries of RXY are mapped as
and the point (1; 2) is mapped to and (3; 1). Thus S maps RXY into
v v =u
...
u
0
...
v =-u
u+v u v 1
g (u; v) = f ;
2 2 2
1 u
= e for (u; v) 2 RU V
2
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u.
Z1
g1 (u) = g (u; v) dv
1
Zu
1 u
= e dv
2
v= u
1
= ue u (2)
2
= ue u for u > 0
and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable.
Therefore U Gamma(2; 1).
To find the marginal probability density function for V we need to consider the two cases v ≥ 0 and v < 0. For v ≥ 0,

   g2(v) = ∫_{−∞}^{∞} g(u, v) du = ∫_{u=v}^{∞} (1/2) e^(−u) du

         = (1/2) lim_{b→∞} [−e^(−u)]_{v}^{b}

         = (1/2) lim_{b→∞} (e^(−v) − e^(−b))

         = (1/2) e^(−v)

For v < 0,

   g2(v) = ∫_{−∞}^{∞} g(u, v) du = ∫_{u=−v}^{∞} (1/2) e^(−u) du

         = (1/2) lim_{b→∞} [−e^(−u)]_{−v}^{b}

         = (1/2) lim_{b→∞} (e^(v) − e^(−b))

         = (1/2) e^(v)

Therefore the probability density function of V is

   g2(v) = (1/2) e^(v)    for v < 0
   g2(v) = (1/2) e^(−v)   for v ≥ 0
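As a quick numerical check (an added illustration, not part of the original solution), the derived distributions of U and V can be compared with simulated values of X + Y and X − Y; the sample size 10^5 and the seed are arbitrary choices.

# Sketch: simulate X, Y ~ Exponential(1) independently and compare
# U = X + Y with Gamma(2,1) and V = X - Y with g2(v) = 0.5*exp(-|v|)
set.seed(1)
n <- 1e5
x <- rexp(n); y <- rexp(n)
u <- x + y; v <- x - y
ks.test(u, "pgamma", shape = 2, rate = 1)   # compare U with Gamma(2,1)
hist(v, breaks = 100, freq = FALSE, main = "V = X - Y", xlab = "v")
curve(0.5 * exp(-abs(x)), add = TRUE, lwd = 2, col = "red")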
Exercise 4.2.7
Since X ~ Beta(a, b) and Y ~ Beta(a + b, c) independently, the joint probability density function of X and Y is

   f(x, y) = [Γ(a + b) / (Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1) · [Γ(a + b + c) / (Γ(a + b)Γ(c))] y^(a+b−1) (1 − y)^(c−1)

           = [Γ(a + b + c) / (Γ(a)Γ(b)Γ(c))] x^(a−1) (1 − x)^(b−1) y^(a+b−1) (1 − y)^(c−1)

with support set R_XY = {(x, y) : 0 < x < 1, 0 < y < 1} as shown in Figure 9.11.

The transformation

   S : U = XY,   V = X

has inverse transformation

   X = V,   Y = U/V

Under S the boundaries of R_XY are mapped as

   (k, 0) → (0, k)   0 ≤ k ≤ 1
   (0, k) → (0, 0)   0 ≤ k ≤ 1
   (1, k) → (k, 1)   0 ≤ k ≤ 1
   (k, 1) → (k, k)   0 ≤ k ≤ 1

and the point (1/2, 1/3) is mapped to (1/6, 1/2). Thus S maps R_XY into

   R_UV = {(u, v) : 0 < u < v < 1}
Exercise 4.2.12
(a) Consider the transformation

   S : U = (X/n)/(Y/m),   V = Y

which has inverse transformation

   X = (n/m) UV,   Y = V

Since X ~ χ²(n) independently of Y ~ χ²(m), the joint probability density function of X and Y is

   f(x, y) = f1(x) f2(y) = [1 / (2^(n/2) Γ(n/2))] x^(n/2−1) e^(−x/2) · [1 / (2^(m/2) Γ(m/2))] y^(m/2−1) e^(−y/2)

           = [1 / (2^((n+m)/2) Γ(n/2) Γ(m/2))] x^(n/2−1) e^(−x/2) y^(m/2−1) e^(−y/2)

with support set R_XY = {(x, y) : x > 0, y > 0}. The transformation S maps R_XY into R_UV = {(u, v) : u > 0, v > 0}.
and 0 otherwise.
To determine the distribution of U we still need to find the marginal probability density function for U.

   g1(u) = ∫_{−∞}^{∞} g(u, v) dv

         = [(n/m)^(n/2) u^(n/2−1) / (2^((n+m)/2) Γ(n/2) Γ(m/2))] ∫₀^∞ v^((n+m)/2−1) e^(−v(nu/m + 1)/2) dv

Let y = (v/2)(1 + nu/m) so that v = 2y(1 + nu/m)^(−1) and dv = 2(1 + nu/m)^(−1) dy. Note that when v = 0 then y = 0, and when v → ∞ then y → ∞. Therefore

   g1(u) = [(n/m)^(n/2) u^(n/2−1) / (2^((n+m)/2) Γ(n/2) Γ(m/2))] ∫₀^∞ [2y(1 + nu/m)^(−1)]^((n+m)/2−1) e^(−y) 2(1 + nu/m)^(−1) dy

         = [(n/m)^(n/2) u^(n/2−1) 2^((n+m)/2) / (2^((n+m)/2) Γ(n/2) Γ(m/2))] (1 + nu/m)^(−(n+m)/2) ∫₀^∞ y^((n+m)/2−1) e^(−y) dy

         = [(n/m)^(n/2) u^(n/2−1) / (Γ(n/2) Γ(m/2))] (1 + nu/m)^(−(n+m)/2) Γ((n + m)/2)

         = [(n/m)^(n/2) Γ((n + m)/2) / (Γ(n/2) Γ(m/2))] u^(n/2−1) (1 + nu/m)^(−(n+m)/2)   for u > 0

and 0 otherwise, which is the probability density function of a random variable with a F(n, m) distribution. Therefore U = (X/n)/(Y/m) ~ F(n, m).
(b) To find E(U) we use

   E(U) = E[(X/n)/(Y/m)] = (m/n) E(X) E(Y^(−1))
If W ~ χ²(k) then

   E(W^p) = 2^p Γ(k/2 + p) / Γ(k/2)   for k/2 + p > 0

Therefore

   E(X) = 2 Γ(n/2 + 1) / Γ(n/2) = n

   E(Y^(−1)) = 2^(−1) Γ(m/2 − 1) / Γ(m/2) = 1 / [2(m/2 − 1)] = 1/(m − 2)   if m > 2

and

   E(U) = (m/n)(n) · 1/(m − 2) = m/(m − 2)   if m > 2
To find Var(U) we need

   E(U²) = E[(X/n)² / (Y/m)²] = (m²/n²) E(X²) E(Y^(−2))

Now

   E(X²) = 2² Γ(n/2 + 2) / Γ(n/2) = 4 (n/2 + 1)(n/2) = n(n + 2)

   E(Y^(−2)) = 2^(−2) Γ(m/2 − 2) / Γ(m/2) = 1 / [4 (m/2 − 1)(m/2 − 2)] = 1 / [(m − 2)(m − 4)]   for m > 4

and

   E(U²) = (m²/n²) n(n + 2) / [(m − 2)(m − 4)] = (n + 2) m² / [n (m − 2)(m − 4)]   for m > 4

Therefore

   Var(U) = E(U²) − [E(U)]²

          = (n + 2) m² / [n (m − 2)(m − 4)] − [m/(m − 2)]²

          = [m/(m − 2)]² { (n + 2)(m − 2) / [n(m − 4)] − 1 }

          = [m/(m − 2)]² [(n + 2)(m − 2) − n(m − 4)] / [n(m − 4)]

          = [m/(m − 2)]² 2(n + m − 2) / [n(m − 4)]

          = 2m² (n + m − 2) / [n (m − 2)² (m − 4)]   for m > 4
Exercise 4.3.3
X1, X2, ..., Xn are independent and identically distributed random variables with moment generating function M(t), E(Xi) = μ, and Var(Xi) = σ² < ∞.

The moment generating function of Z = √n (X̄ − μ)/σ is

   M_Z(t) = E(e^(tZ)) = E[e^(t √n (X̄ − μ)/σ)]

          = e^(−√n μ t/σ) E[ exp( (t/σ)(√n/n) Σ_{i=1}^{n} Xi ) ]

          = e^(−√n μ t/σ) E[ exp( (t/(σ√n)) Σ_{i=1}^{n} Xi ) ]

          = e^(−√n μ t/σ) ∏_{i=1}^{n} E[ exp( (t/(σ√n)) Xi ) ]   since X1, X2, ..., Xn are independent

          = e^(−√n μ t/σ) ∏_{i=1}^{n} M( t/(σ√n) )   since X1, X2, ..., Xn are identically distributed

          = e^(−√n μ t/σ) [ M( t/(σ√n) ) ]^n
Exercise 4.3.7
   Σ_{i=1}^{n} (Xi − μ)² = Σ_{i=1}^{n} (Xi − X̄ + X̄ − μ)²

                         = Σ_{i=1}^{n} (Xi − X̄)² + 2(X̄ − μ) Σ_{i=1}^{n} (Xi − X̄) + Σ_{i=1}^{n} (X̄ − μ)²

                         = Σ_{i=1}^{n} (Xi − X̄)² + n(X̄ − μ)²

since

   Σ_{i=1}^{n} (Xi − X̄) = Σ_{i=1}^{n} Xi − Σ_{i=1}^{n} X̄ = Σ_{i=1}^{n} Xi − nX̄

                         = Σ_{i=1}^{n} Xi − n (1/n) Σ_{i=1}^{n} Xi

                         = Σ_{i=1}^{n} Xi − Σ_{i=1}^{n} Xi

                         = 0
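A quick numerical illustration of this identity (added here, not part of the original solution; the simulated data and the value mu = 5 are arbitrary choices):

# Sketch: verify sum((x - mu)^2) = sum((x - xbar)^2) + n*(xbar - mu)^2 numerically
set.seed(2)
x <- rnorm(20, mean = 5, sd = 2)
mu <- 5
lhs <- sum((x - mu)^2)
rhs <- sum((x - mean(x))^2) + length(x) * (mean(x) - mu)^2
all.equal(lhs, rhs)   # TRUE up to floating-point error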
Exercise 4.3.11
Since X1, X2, ..., Xn are independent N(μ1, σ1²) random variables, then by Theorem 4.3.8

   U = (n − 1)S1²/σ1² = Σ_{i=1}^{n} (Xi − X̄)² / σ1² ~ χ²(n − 1)

Similarly, since Y1, Y2, ..., Ym are independent N(μ2, σ2²) random variables,

   V = (m − 1)S2²/σ2² = Σ_{i=1}^{m} (Yi − Ȳ)² / σ2² ~ χ²(m − 1)

Since U and V are independent it follows that

   [U/(n − 1)] / [V/(m − 1)] = (S1²/σ1²) / (S2²/σ2²) ~ F(n − 1, m − 1)
9.4 Chapter 5
Exercise 5.4.4
If Xn ~ Binomial(n, p) then

   Mn(t) = E(e^(tXn)) = (pe^t + q)^n   for t ∈ ℝ                                  (9.7)

If μ = np then

   p = μ/n   and   q = 1 − μ/n                                                    (9.8)

Substituting (9.8) into (9.7) and simplifying gives

   Mn(t) = [(μ/n)e^t + 1 − μ/n]^n = [1 + μ(e^t − 1)/n]^n   for t ∈ ℝ

Now

   lim_{n→∞} [1 + μ(e^t − 1)/n]^n = e^(μ(e^t − 1))   for t ∈ ℝ

by Corollary 5.1.3. Since M(t) = e^(μ(e^t − 1)) for t ∈ ℝ is the moment generating function of a Poisson(μ) random variable, then by Theorem 5.4.1, Xn →_D X ~ Poisson(μ).

By Theorem 5.4.2

   P(Xn = x) = (n choose x) p^x q^(n−x) ≈ (np)^x e^(−np) / x!   for x = 0, 1, ...
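To see the quality of this approximation numerically (an added illustration, not part of the original solution), the Binomial(n, p) and Poisson(np) probabilities can be compared in R; n = 100 and p = 0.02 are arbitrary choices:

# Sketch: compare Binomial(n, p) probabilities with the Poisson(np) approximation
n <- 100; p <- 0.02
x <- 0:10
round(cbind(binomial = dbinom(x, size = n, prob = p),
            poisson  = dpois(x, lambda = n * p)), 5)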
Exercise 5.4.7
Let Xi ~ Binomial(1, p), i = 1, 2, ... independently. Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = p and Var(Xi) = p(1 − p), then

   √n (X̄n − p) / √(p(1 − p)) →_D Z ~ N(0, 1)

by the Central Limit Theorem. Let Sn = Σ_{i=1}^{n} Xi. Then

   (Sn − np) / √(np(1 − p)) = √n [ (1/n) Σ_{i=1}^{n} Xi − p ] / √(p(1 − p)) = √n (X̄n − p) / √(p(1 − p)) →_D Z ~ N(0, 1)

Now by 4.3.2(1), Sn ~ Binomial(n, p) and therefore Yn and Sn have the same distribution. It follows that

   Zn = (Yn − np) / √(np(1 − p)) →_D Z ~ N(0, 1)
Exercise 5.5.3
(a) Let g (x) = x2 which is a continuous function for all x 2 <. Since Xn !p a then by
5.5.1(1), Xn2 = g (Xn ) !p g (a) = a2 or Xn2 !p a2 .
(b) Let g (x; y) = xy which is a continuous function for all (x; y) 2 <2 . Since Xn !p a and
Yn !p b then by 5.5.1(2), Xn Yn = g (Xn ; Yn ) !p g (a; b) = ab or Xn Yn !p ab.
(c) Let g (x; y) = x=y which is a continuous function for all (x; y) 2 <2 ; y 6= 0. Since
Xn !p a and Yn !p b 6= 0 then by 5.5.1(2), Xn =Yn = g (Xn ; Yn ) !p g (a; b) = a=b , b 6= 0
or Xn =Yn !p a=b, b 6= 0.
(d) Let g(x, z) = x − 2z, which is a continuous function for all (x, z) ∈ ℝ². Since Xn →_p a and Zn →_D Z ~ N(0, 1), then by Slutsky's Theorem, Xn − 2Zn = g(Xn, Zn) →_D g(a, Z) = a − 2Z where Z ~ N(0, 1). Since a − 2Z ~ N(a, 4), therefore

   Xn − 2Zn →_D a − 2Z ~ N(a, 4)

(e) Let g(x, z) = 1/z, which is a continuous function for all (x, z) ∈ ℝ², z ≠ 0. Since Zn →_D Z ~ N(0, 1), then by Slutsky's Theorem, 1/Zn = g(Xn, Zn) →_D g(a, Z) = 1/Z where Z ~ N(0, 1). Since h(z) = 1/z is a decreasing function for all z ≠ 0, then by Theorem 2.6.8 the probability density function of W = 1/Z is

   f(w) = [1/(√(2π) w²)] e^(−1/(2w²))   for w ≠ 0
Exercise 5.5.8
By (5.9)

   √n (X̄n − θ) / √θ →_D Z ~ N(0, 1)
9.5 Chapter 6
Exercise 6.4.6
The probability density function of an Exponential(1, θ) random variable is

   f(x; θ) = e^(−(x − θ))   for x ≥ θ and θ ∈ ℝ

and zero otherwise. The support set of the random variable X is [θ, ∞), which depends on the unknown parameter θ.

The likelihood function is

   L(θ) = ∏_{i=1}^{n} f(xi; θ)

        = ∏_{i=1}^{n} e^(−(xi − θ))   if xi ≥ θ, i = 1, 2, ..., n, and θ ∈ ℝ

        = e^(−Σ xi) e^(nθ)   if xi ≥ θ for all i and θ ∈ ℝ

or more simply

   L(θ) = 0        if θ > x(1)
   L(θ) = e^(nθ)   if θ ≤ x(1)

where x(1) = min(x1, x2, ..., xn) is the minimum of the sample. (Note: In order to observe the sample x1, x2, ..., xn, the value of θ can be no larger than the smallest observed xi.) L(θ) is an increasing function of θ on the interval (−∞, x(1)], so L(θ) is maximized at θ = x(1). The maximum likelihood estimate of θ is θ̂ = x(1) and the maximum likelihood estimator is θ̃ = X(1).

Note that in this example there is no solution to d/dθ l(θ) = d/dθ (nθ) = 0, and the maximum likelihood estimate of θ is not found by solving d/dθ l(θ) = 0.

If n = 12 and x(1) = 2,

   L(θ) = 0         if θ > 2
   L(θ) = e^(12θ)   if θ ≤ 2

The relative likelihood function is

   R(θ) = 0             if θ > 2
   R(θ) = e^(12(θ−2))   if θ ≤ 2

which is graphed in Figure 9.13 along with lines for determining 10% and 50% likelihood intervals.

To determine the value of θ at which the horizontal line R = p intersects the graph of R(θ) we solve e^(12(θ−2)) = p to obtain θ = 2 + log(p)/12. Since R(θ) = 0 if θ > 2, a 100p% likelihood interval for θ is of the form [2 + log(p)/12, 2]. For p = 0.1 we obtain the 10% likelihood interval [2 + log(0.1)/12, 2] = [1.8081, 2]. For p = 0.5 we obtain the 50% likelihood interval [2 + log(0.5)/12, 2] = [1.9422, 2].
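A short R sketch that reproduces these likelihood intervals numerically (added for illustration; n = 12 and x(1) = 2 are the values used above):

# Sketch: relative likelihood for Exponential(1, theta) with n = 12, min(x) = 2,
# and the endpoints of the 10% and 50% likelihood intervals
n <- 12; x1 <- 2
R <- function(th) ifelse(th > x1, 0, exp(n * (th - x1)))
curve(R, from = 1.5, to = 2.2, xlab = expression(theta), ylab = expression(R(theta)))
abline(h = c(0.1, 0.5), lty = 2)
c(lower10 = x1 + log(0.1)/n, lower50 = x1 + log(0.5)/n, upper = x1)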
[Figure 9.13: R(θ) for the Exponential(1, θ) example, with horizontal lines for the 10% and 50% likelihood intervals.]
Exercise 6.7.3
By Chapter 5, Problem 7 we have

   Q(Xn; θ) = √n (Xn/n − θ) / √[ (Xn/n)(1 − Xn/n) ] →_D Z ~ N(0, 1)               (9.10)

where θ̂ = xn/n.
Exercise 6.7.12
(a) The likelihood function is

   L(θ) = ∏_{i=1}^{n} f(xi; θ) = ∏_{i=1}^{n} e^(−(xi − θ)) / [1 + e^(−(xi − θ))]²

        = exp(−Σ_{i=1}^{n} xi) e^(nθ) ∏_{i=1}^{n} 1 / [1 + e^(−(xi − θ))]²   for θ ∈ ℝ

or more simply

   L(θ) = e^(nθ) ∏_{i=1}^{n} 1 / [1 + e^(−(xi − θ))]²   for θ ∈ ℝ

The log likelihood function is l(θ) = nθ − 2 Σ_{i=1}^{n} log[1 + e^(θ − xi)] and the score function is

   S(θ) = n − 2 Σ_{i=1}^{n} e^(θ − xi) / [1 + e^(θ − xi)]   for θ ∈ ℝ

Notice that S(θ) = 0 cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data x1, x2, ..., xn. Note that since

   d/dθ S(θ) = −2 Σ_{i=1}^{n} e^(θ − xi) / [1 + e^(θ − xi)]²   for θ ∈ ℝ

is negative for all values of θ, we know that S(θ) is always decreasing so there is only one solution to S(θ) = 0. Therefore the solution to S(θ) = 0 gives the maximum likelihood estimate.

The information function is

   I(θ) = −d²/dθ² l(θ) = 2 Σ_{i=1}^{n} e^(θ − xi) / [1 + e^(θ − xi)]²   for θ ∈ ℝ

Solving

   u = 1 / [1 + e^(−(x − θ))]
gives

   x = θ − log(1/u − 1)

Therefore the inverse cumulative distribution function is

   F^(−1)(u) = θ − log(1/u − 1)   for 0 < u < 1

If u is an observation from the Uniform(0, 1) distribution then θ − log(1/u − 1) is an observation from the Logistic(θ, 1) distribution by Theorem 2.6.6.
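A small R sketch of this inversion method (added for illustration; theta = 2 and the sample size 30 are arbitrary choices):

# Sketch: generate Logistic(theta, 1) observations by inverting the c.d.f.
set.seed(3)
theta <- 2
u <- runif(30)
x <- theta - log(1/u - 1)
# compare with R's built-in generator: rlogis(30, location = theta, scale = 1)
sort(round(x, 2))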
(c) The generated data are

   0.18 0.05 0.32 0.78 1.04 1.11 1.26 1.41 1.50 1.57
   1.58 1.60 1.68 1.68 1.71 1.89 1.93 2.02 2.25 2.40
   2.47 2.59 2.76 2.78 2.87 2.91 4.02 4.52 5.25 5.56
(d) Here is R code for calculating and graphing the likelihood function for these data.
# function for calculating the Logistic likelihood for data x and theta=th
LOLF<-function(th,x)
{n<-length(x)
L<-exp(n*th)*(prod(1+exp(th-x)))^(-2)
return(L)}
th<-seq(1,3,0.01)
L<-sapply(th,LOLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
The graph of the likelihood function is given in Figure 9.5.

[Figure 9.5: L(θ) plotted against θ for the generated data.]

(e) Newton's Method can be used to find the maximum likelihood estimate θ̂ and the observed information I(θ̂) numerically for these data.
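A minimal R sketch of such a Newton iteration (an illustration, not the code from the original notes; the score and information functions are those derived in part (a), the data vector x is assumed to be the sample from part (c), and the starting value 2 is arbitrary):

# Sketch: Newton's Method for the Logistic(theta, 1) maximum likelihood estimate
score <- function(th, x) length(x) - 2 * sum(exp(th - x) / (1 + exp(th - x)))
info  <- function(th, x) 2 * sum(exp(th - x) / (1 + exp(th - x))^2)
th <- 2                                   # starting value
for (i in 1:10) th <- th + score(th, x) / info(th, x)
thetahat <- th; Ithetahat <- info(thetahat, x)
c(thetahat = thetahat, Ithetahat = Ithetahat)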
I(θ̂) = 11.65138
(g) Here is R code for plotting the relative likelihood function for based on these data.
# function for calculating Logistic relative likelihood function
LORLF<-function(th,thetahat,x)
{R<-LOLF(th,x)/LOLF(thetahat,x)
return(R)}
#
# plot the Logistic relative likelihood function
#plus a line to determine the 15% likelihood interval
th<-seq(1,3,0.01)
R<-sapply(th,LORLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
abline(a=0.15,b=0,col="red",lwd=2)
The graph of the relative likelihood function is given in Figure 9.5.
[Figure: R(θ) plotted against θ, with the horizontal line R = 0.15 for determining the 15% likelihood interval.]
(h) Here is R code for determining the 15% likelihood interval and the approximate 95% confidence interval (6.18).
# determine a 15% likelihood interval using uniroot
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=1,upper=1.8)$root
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=2.2,upper=3)$root
# calculate an approximate 95% confidence intervals for theta
L95<-thetahat-1.96/sqrt(Ithetahat)
U95<-thetahat+1.96/sqrt(Ithetahat)
cat("Approximate 95% confidence interval = ",L95,U95) # display values
[1.443893, 2.592304]
which are very close due to the symmetric nature of the likelihood function.
9.6 Chapter 7
Exercise 7.1.11
If x1, x2, ..., xn is an observed random sample from the Gamma(α, β) distribution then the likelihood function for (α, β) is

   L(α, β) = ∏_{i=1}^{n} f(xi; α, β)

           = ∏_{i=1}^{n} xi^(α−1) e^(−xi/β) / [Γ(α) β^α]   for α > 0, β > 0

           = [Γ(α) β^α]^(−n) (∏_{i=1}^{n} xi)^(α−1) exp(−t2/β)   for α > 0, β > 0

where

   t2 = Σ_{i=1}^{n} xi

or more simply

   L(α, β) = [Γ(α) β^α]^(−n) (∏_{i=1}^{n} xi)^α exp(−t2/β)   for α > 0, β > 0

The log likelihood function is

   l(α, β) = log L(α, β) = −n log Γ(α) − nα log β + α t1 − t2/β   for α > 0, β > 0

where

   t1 = Σ_{i=1}^{n} log xi

The score vector is

   S(α, β) = [ ∂l/∂α   ∂l/∂β ]

           = [ −n ψ(α) + t1 − n log β    t2/β² − nα/β ]

where

   ψ(z) = d/dz log Γ(z)

is the digamma function.

The information matrix is

   I(α, β) = − [ ∂²l/∂α²    ∂²l/∂α∂β
                 ∂²l/∂β∂α   ∂²l/∂β²  ]

           = [ n ψ'(α)    n/β
               n/β        2t2/β³ − nα/β² ]

           = n [ ψ'(α)    1/β
                 1/β      2x̄/β³ − α/β² ]
where

   ψ'(z) = d/dz ψ(z)

is the trigamma function.

The expected information matrix is

   J(α, β) = E[I(α, β); X1, ..., Xn]

           = n [ ψ'(α)    1/β
                 1/β      2E(X; α, β)/β³ − α/β² ]

           = n [ ψ'(α)    1/β
                 1/β      α/β² ]

since E(X; α, β) = αβ.

S(α, β) = (0, 0) must be solved numerically to find the maximum likelihood estimates of α and β.
Exercise 7.1.14
The data are

   1.58 2.78 2.81 3.29 3.45 3.64 3.81 4.69 4.89 5.37
   5.47 5.52 5.87 6.07 6.11 6.12 6.26 6.42 6.74 7.49
   7.93 7.99 8.14 8.31 8.72 9.26 10.10 12.82 15.22 17.82

The maximum likelihood estimates of α and β can be found using Newton's Method

   [α^(i+1), β^(i+1)] = [α^(i), β^(i)] + S(α^(i), β^(i)) [I(α^(i), β^(i))]^(−1)   for i = 0, 1, ...
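A minimal R sketch of this iteration (an illustration, not the original code; it uses the score and information derived in Exercise 7.1.11, assumes x holds the data above, and uses moment estimates as an arbitrary starting point):

# Sketch: Newton's Method for the Gamma(alpha, beta) maximum likelihood estimates
score <- function(a, b, x) {
  n <- length(x)
  c(-n * digamma(a) + sum(log(x)) - n * log(b),
    sum(x) / b^2 - n * a / b)
}
info <- function(a, b, x) {
  n <- length(x)
  matrix(c(n * trigamma(a), n / b,
           n / b, 2 * sum(x) / b^3 - n * a / b^2), nrow = 2)
}
th <- c(mean(x)^2 / var(x), var(x) / mean(x))   # moment estimates as starting values
for (i in 1:20) th <- th + solve(info(th[1], th[2], x), score(th[1], th[2], x))
thetahat <- th
thetahat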
Exercise 7.2.3
(a) The following R code generates the required likelihood regions.
# function for calculating the Gamma relative likelihood for parameters a and b,
# maximum likelihood estimates that, and data x
GARLF<-function(a,b,that,x)
{t<-prod(x)
t2<-sum(x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(ah)*bh^ah)/(gamma(a)*b^a))^n*t^(a-ah)*exp(t2*(1/bh-1/b))
return(L)}
a<-seq(1,8.5,0.02)
b<-seq(0.2,4.5,0.01)
R<-outer(a,b,FUN = GARLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for (α, β) are shown in Figure 9.14.

[Figure 9.14: likelihood regions for (α, β) based on the 30 observations.]

The likelihood contours are not very elliptical in shape. The contours suggest that large values of α together with small values of β, or small values of α together with large values of β, are plausible given the observed data.

(b) Since R(3, 2.7) = 0.14, the point (3, 2.7) lies inside a 10% likelihood region so it is a plausible value of (α, β).
(d) The 1%, 5%, 10%, 50%, and 90% likelihood regions for (α, β) for 100 observations are shown in Figure 9.15. We note that for a larger number of observations the likelihood regions are more elliptical in shape.

[Figure 9.15: likelihood regions for (α, β) based on 100 observations.]
Exercise 7.4.4
The following R code graphs the approximate confidence regions. The function ConfRegion was used in Example 7.4.3.

# graph approximate confidence regions
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)

[Figure: approximate 90%, 95%, and 99% confidence regions for (α, β).]

These approximate confidence regions, which are ellipses, are very different from the likelihood regions in Figure 9.14. In particular we note that (α, β) = (3, 2.7) lies inside a 10% likelihood region but outside a 99% approximate confidence region.

There are only 30 observations and these differences suggest the Normal approximation is not very good. The likelihood regions are a better summary of the uncertainty in the estimates.
Exercise 7.4.7
Let

   [I(α̂, β̂)]^(−1) = [ v̂11  v̂12
                       v̂12  v̂22 ]

Since

   (α̃ − α, β̃ − β) [J(α̃, β̃)]^(1/2) →_D Z ~ BVN( (0, 0), [ 1  0
                                                          0  1 ] )

then for large n, Var(α̃) ≈ v̂11, Var(β̃) ≈ v̂22 and Cov(α̃, β̃) ≈ v̂12. Therefore an approximate 95% confidence interval for α is given by

   [ α̂ − 1.96 √v̂11 ,  α̂ + 1.96 √v̂11 ]
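A short R sketch of this calculation (illustrative only; it assumes thetahat = (α̂, β̂) and the observed information matrix Ithetahat computed in the earlier exercises are available):

# Sketch: approximate 95% confidence intervals from the inverse observed information
V <- solve(Ithetahat)              # estimated variance-covariance matrix
se <- sqrt(diag(V))                # standard errors of alphahat and betahat
cbind(lower = thetahat - 1.96 * se, upper = thetahat + 1.96 * se)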
9.7 Chapter 8
Exercise 8.1.7
(a) Let X = number of successes in n trials. Then X ~ Binomial(n, θ) and E(X) = nθ. If the null hypothesis is H0 : θ = θ0 and the alternative hypothesis is HA : θ ≠ θ0, then a suitable test statistic is D = |X − nθ0|.

For n = 100, x = 42, and θ0 = 0.5 the observed value of D is d = |x − nθ0| = |42 − 50| = 8. The p-value is

   P(|X − nθ0| ≥ |x − nθ0|; H0 : θ = θ0)

calculated using R. Since p-value > 0.1 there is no evidence based on the data against H0 : θ = 0.5.

(b) If the null hypothesis is H0 : θ = θ0 and the alternative hypothesis is HA : θ < θ0, then a suitable test statistic is D = nθ0 − X.

For n = 100, x = 42, and θ0 = 0.5 the observed value of D is d = nθ0 − x = 50 − 42 = 8. The p-value is

   P(nθ0 − X ≥ nθ0 − x; H0 : θ = θ0)

calculated using R. Since 0.05 < p-value < 0.1 there is weak evidence based on the data against H0 : θ = 0.5.
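A short R sketch of these p-value calculations (added for illustration; n = 100, x = 42, and θ0 = 0.5 as above):

# Sketch: exact Binomial p-values for the two-sided and one-sided tests
n <- 100; x <- 42; theta0 <- 0.5
d <- abs(x - n * theta0)
# (a) two-sided: P(|X - n*theta0| >= d)
sum(dbinom(0:n, n, theta0)[abs(0:n - n * theta0) >= d])
# (b) one-sided: P(X <= x) = P(n*theta0 - X >= n*theta0 - x)
pbinom(x, n, theta0)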
Exercise 8.2.5
The model for these data is (X1, X2, ..., X7) ~ Multinomial(63; θ1, θ2, ..., θ7) and the hypothesis of interest is H0 : θ1 = θ2 = ... = θ7 = 1/7. Since the model and parameters are completely specified this is a simple hypothesis. Since Σ_{j=1}^{7} θj = 1 there are only k = 6 parameters.

The likelihood ratio test statistic, which can be derived in the same way as Example 8.2.4, is

   Λ(X; θ0) = 2 Σ_{j=1}^{7} Xj log(Xj / Ej)

where Ej = 63/7 = 9 is the expected frequency for outcome j.
For these data the observed value of the likelihood ratio test statistic is

   λ(x; θ0) = 2 Σ_{j=1}^{7} xj log[xj / (63/7)]

            = 2 [ 22 log(22/9) + 7 log(7/9) + ... + 6 log(6/9) ]

            = 23.27396

calculated using R. Since p-value < 0.001 there is strong evidence based on the data against the hypothesis that the deaths are equally likely to occur on any day of the week.
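A sketch of this calculation in R (added for illustration; it assumes x is the vector of the seven observed daily counts, which are not reproduced in full here):

# Sketch: likelihood ratio statistic and p-value for H0: equal probabilities
# x is the vector of 7 observed daily death counts (22, 7, ..., 6; total 63)
E <- sum(x) / 7
lambda <- 2 * sum(x * log(x / E))
pvalue <- 1 - pchisq(lambda, df = 6)
c(lambda = lambda, pvalue = pvalue)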
Exercise 8.3.3
(a) Ω = {(θ1, θ2) : θ1 > 0, θ2 > 0}, which has dimension k = 2, and Ω0 = {(θ1, θ2) : θ1 = θ2, θ1 > 0, θ2 > 0}, which has dimension q = 1, and the hypothesis H0 : θ1 = θ2 is composite.

From Example 6.2.5 the likelihood function for an observed random sample x1, x2, ..., xn from a Poisson(θ1) distribution is

   L1(θ1) = θ1^(nx̄) e^(−nθ1)   for θ1 ≥ 0

with maximum likelihood estimate θ̂1 = x̄, and similarly the likelihood function for the independent sample y1, y2, ..., ym from a Poisson(θ2) distribution is L2(θ2) = θ2^(mȳ) e^(−mθ2) for θ2 ≥ 0 with maximum likelihood estimate θ̂2 = ȳ. Since the samples are independent the likelihood function for (θ1, θ2) is

   L(θ1, θ2) = L1(θ1) L2(θ2)   for θ1 ≥ 0, θ2 ≥ 0

The independence of the samples also implies the maximum likelihood estimators are θ̃1 = X̄ and θ̃2 = Ȳ.

If θ1 = θ2 = θ then

   d/dθ l(θ) = (nx̄ + mȳ)/θ − (n + m)

and d/dθ l(θ) = 0 for θ = (nx̄ + mȳ)/(n + m), and therefore

   max_{(θ1, θ2) ∈ Ω0} l(θ1, θ2; X, Y) = (nX̄ + mȲ) log[(nX̄ + mȲ)/(n + m)] − (nX̄ + mȲ)

The likelihood ratio test statistic is

   Λ(X, Y; Ω0) = 2 [ l(θ̃1, θ̃2; X, Y) − max_{(θ1, θ2) ∈ Ω0} l(θ1, θ2; X, Y) ]

               = 2 [ nX̄ log X̄ + mȲ log Ȳ − nX̄ − mȲ − (nX̄ + mȲ) log((nX̄ + mȲ)/(n + m)) + nX̄ + mȲ ]

               = 2 [ nX̄ log X̄ + mȲ log Ȳ − (nX̄ + mȲ) log((nX̄ + mȲ)/(n + m)) ]

with observed value

   λ(x, y; Ω0) = 2 [ nx̄ log x̄ + mȳ log ȳ − (nx̄ + mȳ) log((nx̄ + mȳ)/(n + m)) ]

(b) For n = 10, Σ_{i=1}^{10} xi = 22, m = 15, Σ_{i=1}^{15} yi = 40, the observed value of the likelihood ratio test statistic is λ(x, y; Ω0) = 0.5344026 and

   p-value ≈ P[W ≥ 0.5344026]   where W ~ χ²(1)

           = 2 [1 − P(Z ≤ √0.5344026)]   where Z ~ N(0, 1)

           = 0.4647618

calculated using R. Since p-value > 0.1 there is no evidence against H0 : θ1 = θ2 based on the data.
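A sketch of this calculation in R (added for illustration; the totals are those given in part (b)):

# Sketch: likelihood ratio test of H0: theta1 = theta2 for two Poisson samples
n <- 10; sx <- 22; m <- 15; sy <- 40
xbar <- sx / n; ybar <- sy / m; pooled <- (sx + sy) / (n + m)
lambda <- 2 * (sx * log(xbar) + sy * log(ybar) - (sx + sy) * log(pooled))
pvalue <- 1 - pchisq(lambda, df = 1)
c(lambda = lambda, pvalue = pvalue)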
10. Solutions to Selected End of Chapter Problems
10.1 Chapter 2
1(a) Starting with
P
1
x 1
= for j j < 1
x=0 1
it can be shown that
P
1
x
x = for j j < 1 (10.1)
x=1 (1 )2
P
1 1+
x2 x 1
= for j j < 1 (10.2)
x=1 (1 )3
P
1 1+4 + 2
x3 x 1
= for j j < 1 (10.3)
x=1 (1 )4
(1)
1 P
1
x (1 )2
= x = using (10.1) gives k =
k x=1 (1 )2
and therefore
f (x) = (1 )2 x x 1
for x = 1; 2; :::; 0 < <1
The graph of f(x) in Figure 10.1 is for θ = 0.3.

[Figure 10.1: f(x) for θ = 0.3.]
(2) 8
>
<0 if x < 1
F (x) = Px
>
: (1 )2 t t 1
=1 (1 + x x ) x
for x = 1; 2; :::
t=1
Note that F (x) is speci…ed by indicating its value at each jump point.
(3)
P
1 P
1
E (X) = x (1 )2 x x 1
= (1 )2 x2 x 1
x=1 x=1
(1 + )
= (1 )2 using (10.2)
(1 )3
(1 + )
=
(1 )
P
1 P
1
E X2 = x2 (1 )2 xx 1
= (1 )2 x3 x 1
x=1 x=1
(1 + 4 + 2 )
= (1 )2 using (10.3)
(1 )4
(1 + 4 + 2 )
=
(1 )2
(1 + 4 + 2 ) (1 + )2 2
V ar(X) = E(X 2 ) [E (X)]2 = 2 =
(1 )2 (1 ) (1 )2
(4) Using = 0:3,
P (0:5 < X 2) = P (X = 1) + P (X = 2)
= 0:49 + (0:49)(2)(0:3) = 0:784
P (X > 0:5; X 2)
P (X > 0:5jX 2) =
P (X 2)
P (X = 1) + P (X = 2)
= =1
P (X 2)
1:(b) (1)
Z1 Z1
1 1 1
= dx = 2 dx because of symmetry
k 1 + (x= )2 1 + (x= )2
1 0
Z1
1 1
= 2 dy let y = x; dy = dx
1 + y2
0
The graphs for θ = 0.5, 1 and 2 are plotted in Figure 10.2. The graph for each different value of θ is obtained from the graph for θ = 1 by simply relabeling the x and y axes. That is, on the x axis, each point x is relabeled θx and on the y axis, each point y is relabeled y/θ. The graph of f(x) below is for θ = 1. Note that the graph of f(x) is symmetric about the y axis.

[Figure 10.2: f(x) for θ = 0.5, 1, and 2.]
(2)
Zx
1 1 t
F (x) = h i dt = lim arctan jxb
2 b! 1
1
1 + (t= )
1 x 1
= arctan + for x 2 <
2
(3) Consider the integral
Z1 Z1
x t 1
h i dx = dt = lim ln 1 + b2 = +1
1 + (x= )2 1 + t2 b!1 2
0 0
does not converge absolutely and E (X) does not exist. Since E (X) does not exist V ar (X)
does not exist.
(4) Using = 1,
P (X > 0:5; X 2)
P (X > 0:5jX 2) =
P (X 2)
P (0:5 < X 2) F (2) F (0:5)
= =
P (X 2) F (2)
arctan (2) arctan (0:5)
= 0:2403
arctan (2) + 2
1:(c) (1)
Z1
1 jx j
= e dx let y = x , then dy = dx
k
1
Z1 Z1
jyj y
= e dy = 2 e dy by symmetry
1 0
= 2 (1) = 2 (0!) = 2
1
Thus k = 2 and
1
f (x) = e jx j for x 2 <; 2 <
2
The graphs for θ = −1, 0 and 2 are plotted in Figure 10.3. The graph for each different value of θ is obtained from the graph for θ = 0 by simply shifting the graph for θ = 0 to the right θ units if θ is positive and shifting the graph for θ = 0 to the left −θ units if θ is negative. Note that the graph of f(x) is symmetric about the line x = θ.

(2)

   F(x) = ∫_{−∞}^{x} (1/2) e^(t−θ) dt                                    for x ≤ θ

   F(x) = ∫_{−∞}^{θ} (1/2) e^(t−θ) dt + ∫_{θ}^{x} (1/2) e^(−(t−θ)) dt    for x > θ

so that

   F(x) = (1/2) e^(x−θ)          for x ≤ θ
   F(x) = 1 − (1/2) e^(−(x−θ))   for x > θ
[Figure 10.3: f(x) for θ = −1, 0, and 2.]
1
R1 jx j dx
converges, the integral 2 xe converges absolutely and by the symmetry of f (x)
1
we have E (X) = .
Z1
2 1
E X = x2 e jx j
dx let y = x , then dy = dx
2
1
Z1 Z1
1 2 jyj 1
= (y + ) e dy = y 2 + 2y + 2
e jyj
dy
2 2
1 1
Z1 Z1
= y2e y
dy + 0 + 2
e y
dy using the properties of even/odd functions
0 0
2 2
= (3) + (1) = 2! +
2
= 2+
Therefore
V ar(X) = E(X 2 ) [E (X)]2 = 2 + 2 2
=2
(4) Using = 0,
1 0:5 2
P (0:5 < X 2) = F (2) F (0:5) = (e e ) 0:2356
2
1:(e) (1)
Z1
1 1
= x2 e x
dx let y = x; dy = dx
k
0
Z1
1 1 2! 2
= 3 y2e y
dy = 3 (3) = 3 = 3
0
3
Thus k = 2 and
1 3 2 x
f (x) =
x e for x 0; > 0
2
The graphs for θ = 0.5, 1 and 2 are plotted in Figure 10.4. The graph for each different value of θ is obtained from the graph for θ = 1 by simply relabeling the x and y axes. That is, on the x axis, each point x is relabeled x/θ and on the y axis, each point y is relabeled θy.

[Figure 10.4: f(x) for θ = 0.5, 1, and 2.]
(2) 8
>
<0 if x 0
F (x) = Rx
>
: 21 3 2
t e t dt if x > 0
0
Therefore
8
<0 if x 0
F (x) =
:1 1 x 2 2
2e x +2 x+2 if x > 0
(3)
Z1
1 3 3 x 1
E (X) = x e dx let y = x; dy = dx
2
0
Z1
1 1 3! 3
= y3e y
dy = (4) = =
2 2 2
0
Z1
2 1 3 4 x 1
E X = x e dx let y = x; dy = dx
2
0
Z1
1 1 4! 12
= y4e y
dy = 2 (5) = 2 = 2
2 2 2 2
0
and
2
2 2 12 3 3
V ar (X) = E X [E (X)] = 2 = 2
(4) Using = 1,
P (X > 0:5; X 2)
P (X > 0:5jX 2) =
P (X 2)
P (0:5 < X 2) F (2) F (0:5)
= =
P (X 2) F (2)
1 0:5 13 2
e e (10)
= 2 4
1 2
1 2 e (10)
0:9555
[Figure: graph of the probability density function f(x).]
(3) Since f is a symmetric function about x = 0 then if E (jXj) exists then E (X) = 0.
Now
Z1 x=
xe
E (jXj) = 2 2 dx
1+e x=
0
Since
Z1 Z1
x x= y
e dx = ye dy = (2)
0 0
converges and
x xe x=
x=
e 2 for x 0
1+e x=
Z1 x=
xe
E (jXj) = 2 2 dx
1+e x=
0
By symmetry
Z1
x2 e x=
E X2 = 2 dx
1+e x=
1
Z0
x2 e x=
= 2 2 dx let y = x=
1+e x=
1
Z0
2 y2e y
= 2 dy
(1 + e y )2
1
e y 1
u = y2; du = 2ydy; dv = ; v=
(1 + e y )2 (1 + e y)
we have
Z0 Z0
y2e y y2 2y
dy = lim j0
y) a
dy
(1 + e y )2 a! 1 (1 + e 1+e y
1 1
Z0
a2 y a
= lim 2 dy lim e =1
a! 1 (1 + e a ) 1+e y a! 1
1
Z0
2a y
= lim 2 dy by L’Hospital’s Rule
a! 1 e a 1+e y
1
Z0
2 y
= lim a
2 y
dy by L’Hospital’s Rule
a! 1 e 1+e
1
Z0
y ey
= 0 2 y
dy multiply by
1+e ey
1
Z0
yey
= 2 dy
1 + ey
1
Let
u = ey ; du = ey dy; log u = y
to obtain
Z0 Z1
yey log u 2
dy = du =
1 + ey 1+u 12
1 0
Z1 2
log u
du =
1+u 12
0
Therefore
2
E X2 = 2 2
2
12
2 2
=
3
and
2 2
V ar (X) = E X 2 [E (X)]2 = 02
3
2 2
=
3
(4)
P (X > 0:5; X 2)
P (X > 0:5jX 2) =
P (X 2)
P (0:5 < X 2)
=
P (X 2)
F (2) F (0:5)
=
F (2)
1 1 1
= 1 0:25
1+e
1+e 1+e
0:2310
2:(a) Since
f (x; ) = (1 )2 x x 1
therefore
f0 (x) = f (x; = 0) = 0
and
f1 (x) = f (x; = 1) = 0
Since
1 x
f (x; ) 6= f0 (x ) and f (x; ) 6= f1
2: (b) Since
1
f (x; ) = h i for x 2 <; >0
1 + (x= )2
therefore
1
f1 (x) = f (x; = 1) = for x 2 <
(1 + x2 )
Since
1 x
f (x; ) = f1 for all x 2 <; >0
2: (c) Since
1 jx j
f (x; ) = e for x 2 <; 2<
2
therefore
1 jxj
f0 (x) = f (x; = 0) = e for x 2 <
2
Since
f (x; ) = f0 (x ) for x 2 <; 2<
therefore is a location parameter for this distribution.
2: (e) Since
1 3 2 x
f (x; ) = x e for x 0; >0
2
therefore
1
f1 (x) = f (x; = 1) = x2 e x
for x 0
2
Since
f (x; ) = f1 ( x) for x 0; >0
therefore 1= is a scale parameter for this distribution.
8 2
> c =2 ec(x )
<ke
> x< c
2
f (x) = ke (x ) =2 c x +c
>
>
:kec2 =2 e c(x ) x> +c
Therefore
Z c Z+c Z1
1 c2 =2 2
c2 =2
= e ec(x )
dx + e (x ) =2
dx + e e c(x )
dx
k
1 c +c
Z1 p Zc
2 =2 1 z 2 =2
= 2ec e cu
du + 2 p e dz
2
c c
c2 =2 1 cu b
p
= 2e lim e jc + 2 P (jZj c) where Z N (0; 1)
b!1 c
2 c2 =2 c2 p
= e e + 2 [2 (c) 1] where is the N (0; 1) c.d.f.
c
2 c2 =2 p
= e + 2 [2 (c) 1]
c
as required.
Zx
c2 =2
F (x) = ke ec(u )
du; let y = u
1
Z
x
2 =2
= kec ecy dy
1
2 =2 1 cy x
= kec lim e ja
a! 1 c
k c2 =2+c(x )
= e
c
and F ( c) = kc e c2 =2 .
If c x + c then
p Z
x
k c2 =2 1 (u )2 =2
F (x) = e +k 2 p e du let z = u
c 2
c
p Z
x
k c2 =2 1 z 2 =2
= e +k 2 p e dz
c 2
c
k c2 =2
p
= e + k 2 [ (x ) ( c)]
c
k c2 =2
p
= e + k 2 [ (x )+ (c) 1] :
c
If x > + c then
Z1
c2 =2 c(u )
F (x) = 1 ke e du let y = u
x
Z1
c2 =2 cy
= 1 ke e dy
x
c2 =2 1 cy b
= 1 ke lim e jx
b!1 c
k c2 =2 c(x )
= 1 e
c
Therefore
8 2
> k c =2+c(x )
<ce
>
p
x< c
2
F (x) = kc e c =2 + k 2 [ (x )+ (c) 1] c x +c
>
>
:1 k ec2 =2 c(x ) x> +c
c
Z1 Z1
k k
E X = x f (x) dx = xk f0 (x ) dx let y = u (10.4)
1 1
Z1
= (y + )k f0 (y) dy
1
In particular
Z1 Z1 Z1
E (X) = (y + ) f0 (y) dy = yf0 (y) dy + f0 (y) dy
1 1 1
Z1
= yf0 (y) dy + (1) since f0 (y) is a p.d.f.
1
Z1
= yf0 (y) dy +
1
Now
Z1 Z c Zc Z1
1 c2 =2 cy y 2 =2 c2 =2 cy
yf0 (y) dy = e ye dy + ye dy + e ye dy
k
1 1 c c
let y = u in the …rst integral
Z1 Zc Z1
c2 =2 cu y 2 =2 c2 =2 cy
= e ue du + ye dy + e ye dy
c c c
By integration by parts
2 3
Z1 Zb
1 1
ye cy
dy = lim 4 ye cy b
jc + e cy
dy 5
b!1 c c
c c
1 cy b 1 cy b
= lim ye jc e jc
b!1 c c2
1 c2
= 1+ e (10.5)
c2
Therefore
Z1
1 1 c2 1 c2
yf0 (y) dy = 1+ e +0+ 1+ e =0
k c2 c2
1
and
Z1
E (X) = yf0 (y) dy + = 0 + =
1
V ar (X) = E X 2 [E (X)]2
Z1
= (y + )2 f0 (y) dy 2
using (10:4)
1
Z1 Z1 Z1
2 2 2
= y f0 (y) dy + 2 yf0 (y) dy + f0 (y) dy
1 1 1
Z1
= y 2 f0 (y) dy + 2 (0) + 2
(1) 2
1
Z1
= y 2 f0 (y) dy
1
Now
Z1 Z c Zc Z1
1 2 c2 =2 2 cy 2 y 2 =2 c2 =2
y f0 (y) dy = e y e dy + y e dy + e y2e cy
dy
k
1 1 c c
(let y = u in the …rst integral)
Z1 Zc Z1
c2 =2 y 2 =2 c2 =2
= e cu
u e du + y 2 e
2
dy + e y2e cy
dy
c c c
Z1 Zc
2 =2 y 2 =2
= 2ec y2e cy
dy + y2e dy
c c
2 3
Z1 Zb
1 2
y2e cy
dy = lim 4 y 2 e cy b
jc + ye cy
dy 5
b!1 c c
c c
2 1 c2 c2
= ce + 1+ 2 e
c c
2 2 2
= c+ + 3 e c
c c
Also
Z+c Zc
y 2 =2 y 2 =2
y2e dy = 2 y2e dy
c 0
2 3
p Zc
y 2 =2 c 1 y 2 =2
= 2 4 ye j0 + 2 p e dy 5 using integration by parts
2
0
n o p
c2 =2
= 2 ce + 2 [ (c) 0:5]
p 2
= 2 [2 (c) 1] 2ce c =2
Therefore
Z1
V ar (X) = y 2 f0 (y) dy
1
2 2 2 p 2
= k 2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2
c c
( )
1
= 2
p
c2 =2 + 2 [2 (c)
ce 1]
2 2 2 p 2
2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2
c c
4: (c) Let
f0 (x) = f (x; = 0)
( 2
ke x =2 if jxj c
= 2
ke cjxj+c =2 if jxj > c
Since
(
ke (x )2 =2 if jx j c
f0 (x ) =
ke cjx j+c2 =2 if jx j>c
= f (x)
4.(d) On the graph in Figure 10.6 we have graphed f(x) for c = 1, θ = 0 (red), f(x) for c = 2, θ = 0 (blue), and the N(0, 1) probability density function (black).

[Figure 10.6: Graphs of f(x) for c = 1, θ = 0 (red), c = 2, θ = 0 (blue), and the N(0, 1) p.d.f. (black).]

We note that there is very little difference between the graphs of the N(0, 1) probability density function and the graph of f(x) for c = 2, θ = 0; however, as c becomes smaller (c = 1) the "tails" of the probability density function become much "fatter" relative to the N(0, 1) probability density function.
6: Since f1 (x) ; f2 (x) ; : : : ; fk (x) are probability density functions with support sets A1 ;
A2 ; : : : ; Ak then we know that fi (x) > 0 for all x 2 Ai ; i = 1; 2; : : : ; k. Also since
Pk Pk
0 < p1 ; p2 ; : : : ; pk 1 with pi = 1, we have that g (x) = pi fi (x) > 0 for all
i=1 i=1
S
k
x2A= Ai and A = support set of X. Also
i=1
Z1 k
X Z1 k
X k
X
g(x)dx = pi fi (x) dx = pi (1) = pi = 1
1 i=1 1 i=1 i=1
As well
Z1 k
X Z1
2 2
E X = x g(x)dx = pi x2 fi (x) dx
1 i=1 1
k
X
2 2
= pi i + i
i=1
since
Z1 Z1 Z1 Z1
2
x2 fi (x) dx = (x i) fi (x) dx + 2 i xfi (x) dx 2
i fi (x) dx
1 1 1 1
2 2 2
= i + 2 i i
2 2
= i + i
x 1 e x=
f (x) = for x > 0
( )
and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = ex = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 1g. Also
1 d 1 1
x=h (y) = log y and h (y) =
dy y
1 d 1
g (y) = f h (y) h (y)
dy
1 log y=
(log y) e 1
=
( ) y
1 1= 1
(log y) y
= for y 2 B
( )
and 0 otherwise.
x 1 e x=
f (x) = for x > 0
( )
and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 1=x = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also
1 1 d 1 1
x=h (y) = and h (y) =
y dy y2
1 d 1
g (y) = f h (y) h (y)
dy
y 1 e 1=( y)
= for y 2 B
( )
and 0 otherwise. This is the probability density function of an Inverse Gamma( ; ) random
variable. Therefore Y = X 1 Inverse Gamma( ; ).
xk 1 e x=
f (x) = k
for x > 0
(k)
and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 2x= = h(x) is a
one-to-one function on A and h maps the set A to the set B = fy : y > 0g. Also
1 y d 1
x=h (y) = and h (y) =
2 dy 2
The probability density function of Y is
1 d 1
g (y) = f h (y) h (y)
dy
yk 1 e y=2
= for y 2 B
(k) 2k
and 0 otherwise for k = 1; 2; : : : which is the probability density function of a 2 (2k) random
variable. Therefore Y = 2X/θ ~ χ²(2k).
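A quick simulation check of this result (an added illustration; k = 3, θ = 2, and the sample size are arbitrary choices):

# Sketch: check that 2X/theta ~ chi-square(2k) when X ~ Gamma(k, theta)
set.seed(4)
k <- 3; theta <- 2
x <- rgamma(1e5, shape = k, scale = theta)
ks.test(2 * x / theta, "pchisq", df = 2 * k)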
1 d 1 1
x=h (y) = log y and h (y) =
dy y
The probability density function of Y is
1 d 1
g (y) = f h (y) h (y)
dy
1 1
2 (log y )2
= p e2 y2B
y 2
Note this distribution is called the Lognormal distribution.
7:(e) Since X N ; 2 the probability density function of X is
1 1
2 (x )2
f (x) = p e2 for x 2 <
2
Let A = fx : f (x) > 0g = <. Now y = x 1 = h(x) is a one-to-one function on A and h
maps the set A to the set B = fy : y 6= 0; y 2 <g. Also
1 1 d 1 1
x=h (y) = and h (y) =
y dy y2
The probability density function of Y is
1 d 1
g (y) = f h (y) h (y)
dy
h i2
1 1
2
1
= p e2 y
for y 2 B
2 y2
1 d 1
g (y) = f h (y) h (y)
dy
1 1
= for y 2 <
1 + y2
and 0 otherwise. This is the probability density function of a Cauchy(1; 0) random variable.
Therefore Y = tan(X) Cauchy(1; 0).
7:(g) Since X Pareto( ; ) the probability density function of X is
1 d 1
g (y) = f h (y) h (y)
dy
= ey= +1
=e y
for y 2 B
ey=
and 0 otherwise. This is the probability density function of a Exponential(1) random
variable. Therefore Y = log (X= ) Exponential(1).
7:(h) If X Weibull(2; ) the probability density function of X is
2xe (x= )2
f (x) = 2 for x > 0; >0
and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = x2 = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also
1 d 1
x=h (y) = y 1=2 and h 1
(y) =
dy 2y 1=2
The probability density function of Y is
1 d 1
g (y) = f h (y) h (y)
dy
2
e y=
= 2 for y 2 B
2
and 0 otherwise for k = 1; 2; : : : which is the probability density function of a Exponential
random variable. Therefore Y = X 2 Exponential( 2 )
1 jxj
f (x) = e for x 2 <
2
p p
G (y) = P (Y y) = P X 2 y = P ( y X y)
Z py
1 jxj
= p 2
e dx
y
Z py
= e x dx by symmetry for y > 0
0
By the First Fundamental Theorem of Calculus and the chain rule the probability density
function of Y is
d p d p
g (y) = G (y) = e y y
dy dy
1 p
= p e y for y > 0
2 y
and 0 otherwise.
k+1
1 x2 ( k+1
2 )
2 p
f (x) = k
1+ for x 2 <
2 k k
p p
G (y) = P (Y y) = P X 2 y =P( y X y)
p
Z y
k+1
1 x2 ( k+1
2 )
2 p
= k
1+ dx
p 2 k k
y
p
Zy k+1 ( k+1
2 )
2 1 x2
= 2 k
p 1+ dx by symmetry for y > 0
2 k k
0
By the First Fundamental Theorem of Calculus and the chain rule the probability density
function of Y is
d
g (y) = G (y)
dy
k+1 ( k+1
2 1 y 2 ) d p
= 2 k
p 1+ y for y > 0
2 k k dy
k+1 ( k+1
2 1 y 2 ) 1
= 2 k p
p 1+ p for y > 0
2 k k 2 y
k+1
1 1=2
1 1 ( k+1
2 ) 1 p
2 1
= k 1
y 2 1+ y for y > 0 since =
2 2
k k 2
and 0 otherwise. This is the probability density function of a F(1; k) random variable.
Therefore Y = X 2 F(1; k).
8:(a) Let
n+1
2 1
cn = n
p
2
n
(n+1)=2
t2
f (t) = cn 1 + for t 2 <; n = 1; 2; :::
n
Since f ( t) = f (t), f is an even function whose graph is symmetric about the y axis.
Therefore if E (jT j) exists then due to symmetry E (T ) = 0. To determine when E (jT j)
exists, again due to symmetry, we only need to determine for what values of n the integral
Z1 (n+1)=2
t2
t 1+ dt
n
0
converges.
There are two cases to consider: n = 1 and n > 1.
For n = 1 we have
Z1
1 1 1
t 1 + t2 dt = lim ln 1 + t2 jb0 = lim ln 1 + b2 = 1
b!1 2 b!1 2
0
Z1 (n+1)=2
2 t2
t 1+ dt
n
0
converges.
Now
Z1 (n+1)=2
t2
t2 1 + dt
n
0
p
Zn (n+1)=2 Z1 (n+1)=2
t2 t2
= t2 1 + dt + t2 1 + dt
n p
n
0 n
The …rst integral is …nite since it is the integral of a …nite function over the …nite interval
p
[0; n]. We will show that the second integral
Z1 (n+1)=2
2 t2
t 1+ dt
p
n
n
diverges for n = 1; 2.
Now
Z1 (n+1)=2
2 t2 p
t 1+ dt let y = t= n
p
n
n
Z1
3=2 (n+1)=2
=n y2 1 + y2 dy (10.9)
1
For n = 1
y2 y2 1
= for y 1
(1 + y 2 ) 2 2
(y + y ) 2
and since
Z1
1
dy
2
1
diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 1.
(Note: For n = 1 we could also argue that V ar(T ) does not exist since E (T ) does not exist
for n = 1.)
For n = 2,
y2 y2 1
3=2 3=2
= 3=2 for y 1
(1 + y 2 ) (y 2 + y 2 ) 2 y
and since
Z1
1 1
dy
23=2 y
1
diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 2.
Then
" (n+1)=2
#
2 n t2
E T = 2cn lim t 1+ jb0
b!1 n 1 n
Z1 (n 1)=2
n t2
+2cn 1+ dt
n 1 n
0
Z1 (n 1)=2
n b n t2
= 2cn lim (n+1)=2
+ cn 1+ dt
n 1 b!1 b2 n 1 n
1+ n 1
1=2 Z1 (n 2+1)=2
cn n n y2
= cn 2 1+ dy
cn 2 n 1 n 2 n 2
1
1=2
cn n n
=
cn 2 n 1 n 2
where the integral equals one since the integrand is the p.d.f. of a t (n 2) random variable.
Finally
1=2
cn n n
cn 2 n 1 n 2
n+1 n 2
p 1=2
2 2 (n 2) n n
= n p n 2+1
2 n 2
n 1 n 2
n+1 n 1=2 1=2
2 2 1 n 2 n n
= n+1 n
2 1 2
n n 1 n 2
n+1 n+1 n
2 1 2 1 2 1 n
= n+1 n n
2 1 2 1 2 1 n 1
(n 1) 1 n
=
2 (n 2) =2 n 1
n
=
n 2
Therefore for n > 2
n
V ar(T ) = E T 2 =
n 2
(a + b) a
f (x) = x 1
(1 x)b 1
for 0 < x < 1
(a) (b)
or
Z1
(a) (b)
xa 1
(1 x)b 1
dx = (10.10)
(a + b)
0
Z1
(a + b)
E X k
= xk xa 1
(1 x)b 1
dx
(a) (b)
0
Z1
(a + b)
= xa+k 1
(1 x)b 1
dx
(a) (b)
0
(a + b) (a + k) (b)
= by (10:10)
(a) (b) (a + b + k)
(a + k) (a + b)
= for k = 1; 2; : : :
(a) (a + b + k)
For k = 1 we have
(a + 1) (a + b)
E Xk = E (X) =
(a) (a + b + 1)
a (a) (a + b)
=
(a) (a + b) (a + b)
a
=
a+b
For k = 2 we have
(a + 2) (a + b)
E X2 =
(a) (a + b + 2)
(a + 1) (a) (a) (a + b)
=
(a) (a + b + 1) (a + b) (a + b)
a (a + 1)
=
(a + b) (a + b + 1)
Therefore
V ar (X) = E X 2 [E (X)]2
2
a (a + 1) a
=
(a + b) (a + b + 1) a+b
2
a (a + 1) (a + b) a (a + b + 1)
=
(a + b)2 (a + b + 1)
a a2 + ab + a + b a2 + ab + a
=
(a + b)2 (a + b + 1)
ab
= 2
(a + b) (a + b + 1)
9.(b)

[Figure: Beta(a, b) probability density functions for (a, b) = (1, 3), (3, 1), (2, 4), (0.7, 0.7), and (2, 2).]
9: (c) If a = b = 1 then
f (x) = 1 for 0 < x < 1
and 0 otherwise. This is the Uniform(0; 1) probability density function.
10: We will prove this result assuming X is a continuous random variable. The proof for
X a discrete random variable follows in a similar manner with integrals replaced by sums.
Suppose X has probability density function f (x) and E jXjk exists for some integer
k > 1. Then the improper integral
Z1
jxjk f (x) dx
1
Since
0 jxjk f (x) f (x) for x 2 A
we have Z Z
k
0 jxj f (x) dx f (x) dx = P X 2 A 1 (10.11)
A A
R1 R
Convergence of jxjk f (x) dx and (10.11) imply the convergence of jxjk f (x) dx.
1 A
Now
Z1 Z Z
j j
jxj f (x) dx = jxj f (x) dx + jxjj f (x) dx for j = 1; 2; :::; k 1 (10.12)
1 A A
and Z
0 jxjj f (x) dx 1
A
R
by the same argument as in (10.11). Since jxjk f (x) dx converges and
A
dg k
=p
d (1 )
where k is chosen for convenience. We need to solve the separable di¤erential equation
Z Z
1
dg = k p d (10.13)
(1 )
Since
d p 1 d p
arcsin x = q x
dx p 2 dx
1 ( x)
1 1
= p p
1 x 2 x
1
= p
2 x (1 x)
13:(b)
P
1 xe
M (t) = E(etx ) = etx
x=0 x!
x
P
1 et
= e
x=0 x!
et
= e e by the Exponential Series
= e (et 1) for t 2 <
t
M 0 (t) = e (e 1)
et
E(X) = M 0 (0) =
t t
M 00 (t) = e (e 1)
( et )2 + e (e 1)
et
E(X 2 ) = M 00 (0) = 2
+
2
V ar(X) = E(X ) [E(X)]2 = 2
+ 2
=
13:(c)
Z1
1
M (t) = E(etx ) = etx e (x )=
dx
= Z1
e x 1
t 1 1
= e dx which converges for t > 0 or t <
Let
1 1
y= t x; dy = t dx
to obtain
= Z1 = Z1
e x 1
t e x 1
t
M (t) = e dx = e dx
= Z1 =
e y e y b
= e dy = lim e j 1
1
t (1 t) b!1 t
1
t
e= 1
t b e = 1
t
= lim e e = e
(1 t) b!1 (1 t)
e t 1
= for t <
(1 t)
e t e t
M 0 (t) = +
(1 t) (1 t)2
e t
= [ (1 t) + ]
(1 t)2
E(X) = M 0 (0) = +
e t e t 2 et
M 00 (t) = ( )+ + [ (1 t) + ]
(1 t)2 (1 t)2 (1 t)3
E(X 2 ) = M 00 (0) = + ( + 2 )( + )
2 2 2 2
= + +3 +2 = +2 +2
= ( + )2 + 2
13:(d)
Z1
tx 1
M (t) = E(e ) = etx e jx j dx
2
1
2 3
Z Z1
14
= etx ex dx + etx e (x ) dx5
2
1
2 3
Z Z1
14
= e ex(t+1) dx + e e x(1 t) dx5
2
1
" ! !#
1 e (t+1) e (1 t)
= e +e for t + 1 > 0 and 1 t>0
2 t+1 1 t
1 et et
= + for t 2 ( 1; 1)
2 t+1 1 t
et
= for t 2 ( 1; 1)
1 t2
e t e t (2t)
M 0 (t) = +
1 t2 (1 t2 )2
e t
= [ 1 t2 + 2t]
(1 t2 )2
E(X) = M 0 (0) =
e t e t e t (4t)
M 00 (t) = [ 2t + 2] + + (1 t2 ) + 2t
(1 t2 )2 (1 t2 )2 (1 t2 )4
E(X 2 ) = M 00 (0) = 2 + 2
13:(e)
Z1
tX
M (t) = E e = 2xetx xdx
0
Since Z
1 1
xetx dx = x etx + C
t t
2 1 tx 1
M (t) = x e j0
t t
2 1 t 2 1
= 1 e
t t t t
t
2 (t 1) e + 1
= ; if t 6= 0
t2
For t = 0, M (0) = E (1) = 1. Therefore
8
<1 if t = 0
M (t) = 2[(t 1)et +1]
: if t 6= 0
t2
Note that
1) et + 1
2 (t 0
lim M (t) = lim 2
indeterminate of the form
t!0 t!0 t 0
t
2 e + (t 1) e t
= lim by Hospital’s Rule
t!0 2t
et + (t 1) et 0
= lim indeterminate of the form
t!0 t 0
t t t
= lim e + e + (t 1) e by Hospital’s Rule
t!0
= 1+1 1=1
= M (0)
2 (t 1) et + 1 2 P1 ti
= (t 1) +1
t2 t2 i=0 i!
2 P 1 ti+1 P1 ti
= +1
t2 i=0 i! i=0 i!
2 P1 ti+1 P1 ti
= t + 1 + t + +1
t2 i=1 i! i=2 i!
2 Pt 1 i+1 Pt
1 i
= 2
t i=1 i! i=2 i!
2 Pt 1 i+1 P1 ti+1
=
t2 i=1 i! i=1 (i + 1)!
2 P 1
1 1
= ti+1
t2 i=1 i! (i + 1)!
and since
2 P1 1 1
2
ti+1 jt=0 = 1
t i=1 i! (i + 1)!
therefore M (t) has a Maclaurin series representation for all t 2 < given by
2 P1 1 1
ti+1
t2 i=1 i! (i + 1)!
P1 1 1
= 2 ti 1
i=1 i! (i + 1)!
P1 1 1
= 2 ti
i=0 (i + 1)! (i + 2)!
Since E X k = k! the coe¢ cient of tk in the Maclaurin series for M (t) we have
1 1 1 1 2
E (X) = (1!) (2) =2 =
(1 + 1)! (1 + 2)! 2 6 3
and
1 1 1 1 1
E X 2 = (2!) (2) =4 =
(2 + 1)! (2 + 2)! 6 24 2
Therefore
2
2 21 2 1
V ar (X) = E X [E (X)] = =
2 3 18
Alternatively we could …nd E (X) = M 0 (0) using the limit de…nition of the derivative
M (t) M (0)
M 0 (0) = lim
t!0 t
2[(t 1)et +1]
t2
1
= lim
t!0 t
2 (t 1) et + 1 t2
= lim
t!0 t3
2
= using L’Hospital’s Rule
3
Similarly E X 2 = M 00 (0) could be found using
M 0 (t) M 0 (0)
M 00 (0) = lim
t!0 t
where !
d 2 (t 1) et + 1
M 0 (t) =
dt t2
for t 6= 0.
13:(f )
Z1 Z2
tX tx
M (t) = E e = e xdx + etx (2 x) dx
0 1
Z1 Z2 Z2
= xetx dx + 2 etx dx xetx dx:
0 1 1
Since Z
1 1
xetx dx = x etx + C
t t
1 1 tx 1 2 tx 2 1 1
M (t) = x e j0 + e j1 x etx j21
t t t t t
1 1 t 1 1 2 2t 1 1 2t 1
= 1 e + e et 2 e 1 et
t t t t t t t t
2 1 1 1 1 2 1 1 1
= e2t + 2 + et 1 + 1 + 2
t t t t t t t t t
e2t t
2e + 1
= for t 6= 0
t2
For t = 0, M (0) = E (1) = 1. Therefore
(
1 if t = 0
M (t) = 2t e 2et +1
t2
if t 6= 0
Note that
2et + 1
e2t 0
lim M (t) = lim 2
, indeterminate of the form , use Hospital’s Rule
t!0 t!0 t 0
2e2t 2et 0
= lim , indeterminate of the form , use Hospital’s Rule
t!0 2t 0
2e2t et
= lim =2 1=1
t!0 1
and therefore M (t) exists and is continuous for t 2 <.
Using the Exponential series we have for t 6= 0
e2t 2et + 1 1 (2t)2 (2t)3 (2t)4
= f1 + 2t + + + +
t2 t2 2! 3! 4!
t2 t3 t4
2 1+t+ + + + + 1g
2! 3! 4!
7
= 1 + t + t2 + (10.14)
12
and since
7 2
1+t+ t + jt=0 = 1
12
(10.14) is the Maclaurin series representation for M (t) for t 2 <.
Since E X k = k! the coe¢ cient of tk in the Maclaurin series for M (t) we have
7 7
E (X) = 1! 1 = 1 and E X 2 = 2! =
12 6
Therefore
7 1
V ar (X) = E X 2 1= [E (X)]2 =
6 6
0
Alternatively we could …nd E (X) = M (0) using the limit de…nition of the derivative
e2t 2et +1
M (t) M (0) t2
1
M 0 (0) = lim = lim
t
t!0 t!0 t
7 4
2et + 1e2t t2 t2 + t3 + 12 t + t2
= lim = lim
t!0 t3 t!0 t3
7
= lim 1 + t + =1
t!0 12
Similarly E X 2 = M 00 (0) could be found using
M 0 (t) M 0 (0)
M 00 (0) = lim
t!0 t
where
d e2t 2et + 1 2 t e2t et e2t + 2et 1
M 00 (t) = = for t 6= 0
dt t2 t3
14:(a)
Therefore
p
K(t) = log M (t) = k log
1 qet
= k log p k log 1 qet for t < log q
qet kqet
K 0 (t) = k
=
1 qet 1 qet
kq kq
E(X) = K 0 (0) = =
1 q p
" #
1 qet et et qet
K 00 (t) = kq
(1 qet )2
1 q+q kq
V ar(X) = K 00 (0) = kq 2 = 2
(1 q) p
15:(b)
1+t
M (t) =
1 t
P
1
= (1 + t) tk for jtj < 1 by the Geometric series
k=0
P
1 P
1
= tk + tk+1
k=0 k=0
= 1 + t + t + ::: t + t2 + t3 + :::
2
P
1
= 1+ 2tk for jtj < 1 (10.15)
k=1
Since
P1 M (k) (0) P1 E(X k )
M (t) = tk = tk (10.16)
k=0 k! k=0 k!
then by matching coe¢ cients in the two series (10.15) and (10.16) we have
E(X k )
= 2 for k = 1; 2; :::
k!
or
E(X k ) = 2k! for k = 1; 2; :::
15:(c)
et
M (t) =
1 t2
P
1 ti P
1
= t2k for jtj < 1 by the Geometric series
i=0 i! k=0
t t3 t2
= 1+ + + + ::: 1 + t2 + t4 + ::: for jtj < 1
1! 2! 3!
1 1 1 1
= 1+ t+ 1+ t2 + + t3
1! 2! 1! 3!
1 1 1 1 1
+ 1+ + t4 + + + t5 + ::: for jtj < 1
2! 4! 1! 3! 5!
Since
P1 M (k) (0) P1 E(X k )
M (t) = tk = tk
k=0 k! k=0 k!
then by matching coe¢ cients in the two series we have
1 P
k
E X 2k = (2k)! for k = 1; 2; :::
i=0 (2i)!
Pk 1
E X 2k+1 = (2k + 1)! for k = 1; 2; :::
i=0 (2i + 1)!
16: (a)
Z1
tY tjZj 1 z 2 =2
MY (t) = E e =E e = etjzj p e dz
2
1
Z1
2 z 2 =2 z 2 =2
= p etz e dz since etjzj e is an even function
2
0
Z1 2 Z1
2 2 2et =2 2 2zt t2 )=2
= p e (z 2zt)=2
dz = p e (z dz
2 2
0 0
Z1
2 =2 1 (z t)2 =2
= 2et p e dz let y = (z t) ; dy = dz
2
0
Zt
t2 =2 1 y 2 =2
= 2e p e dy
2
1
t2 =2
= 2e (t) for t 2 <
Therefore
E (Y ) = E (jZj)
d
= MY (t) jt=0
dt
h i
2 =2 2 =2
= 2tet (t) + 2et (t) jt=0
r
2 2
= 0+ p =
2
d 0
(t) = (t)
dt
d 1 t2 =2
= p e
dt 2
te t2 =2
= p
2
and
0
(0) = 0
Therefore
d2 d d
MY (t) = MY (t)
dt2 dt dt
d h t2 =2 2
i
= 2te (t) + 2et =2 (t)
dt
d 2
= 2 et =2 [t (t) + (t)]
dt
n o
2 =2 2 =2
= 2 et t (t) + (t) + 0
(t) + tet [t (t) + (t)]
and
d2
E Y2 = MY (t) jt=0
dt2
0
= 2 (1) 0 + (0) + (0) + (0) [(0) (0) + (0)]
= 2 (0)
1
= 2 =1
2
Therefore
V ar (Y ) = V ar (jZj)
= E Y2 [E (Y )]2
r !2
2
= 1
2
= 1
2
=
18: Since
MX (t) = E(etx )
P
1
= ejt pj for jtj < h; h > 0
j=0
then
P
1
MX (log s) = ej log s pj
j=0
P1
= s j pj for j log sj < h; h > 0
j=0
and
P
1 P
1
s j pj = s j qj for j log sj < h; h > 0
j=0 j=0
Since two power series are equal if and only if their coe¢ cients are all equal we have pj = qj ,
j = 0; 1; ::: and therefore X and Y have the same distribution.
et
M (t) = for jtj < 1
1 t2
the moment generating function of Y = (X 1) =2 is
h i
MY (t) = E etY = E et(X 1)=2
t
= e t=2
E e ( 2 )X
t=2 t
= e M
2
t=2 et=2 t
= e ;= for <1
t 2 2
1 2
1
= 1 2 for jtj < 2
1 4t
19: (b)
2 2
1 2 1 1 1 2
M 0 (t) = ( 1) 1 t t = t 1 t
4 2 2 4
E(X) = M 0 (0) = 0
" #
2 3
1 1 2 1 2 1
M 00 (t) = 1 t + t ( 2) 1 t t
2 4 4 2
1
E(X 2 ) = M 00 (0) =
2
1 1
V ar(X) = E(X 2 ) [E(X)]2 = 0=
2 2
19: (c) Since the moment generating function of a Double Exponential( ; ) random variable
is
e t 1
M (t) = 2 2 for jtj <
1 t
and the moment generating function of Y is
1
MY (t) = 1 2 for jtj < 2
1 4t
therefore by the Uniqueness Theorem for Moment Generating Functions Y has a Double
Exponential 0; 12 distribution.
10.2 Chapter 3
1:(a)
1 P
1 P
1 P
1 P
1
= q 2 px+y = q 2 py px
k y=0 x=0 y=0 x=0
P
1 1
= q2 py by the Geometric Series since 0 < p < 1
y=0 1 p
P
1
= q py since q = 1 p
y=0
1
= q by the Geometric Series
1 p
= 1
Therefore k = 1.
1:(b) The marginal probability function of X is
!
P P
1 P
1
f1 (x) = P (X = x) = f (x; y) = q 2 px+y = q 2 px py
y y=0 y=0
1
= q 2 px by the Geometric Series
1 p
= qpx for x = 0; 1; :::
= q 2 pt (t + 1) for t = 0; 1; :::
Therefore
q 2 px+(t x) 1
P (X = xjX + Y = t) = 2 t
= for x = 0; 1; :::; t
q p (t + 1) t+1
2:(a)
e 2
f (x; y) = for x = 0; 1; :::; y; y = 0; 1; :::
x! (y x)!
OR
e 2
f (x; y) = for y = x; x + 1; :::; x = 0; 1; :::
x! (y x)!
P P
1 e 2
f1 (x) = f (x; y) =
y y=x x! (y x)!
e 2 P1 1
= let k = y x
x! y=x (y x)!
e 2 P1 1 e 2 e1
= =
x! k=0 k! x!
e 1
= x = 0; 1; ::: by the Exponential Series
x!
Note that X Poisson(1).
P P
y e 2
f2 (y) = f (x; y) =
x x=0 x! (y x)!
e 2 P
y
y!
=
y! x=0 x!(y x)!
e 2 P y x
y
= 1
y! x=0 x
e 2
= (1 + 1)y by the Binomial Series
y!
2y e 2
= for y = 0; 1; :::
y!
OR
Z1 Z1 Z Z Z1 1Z x2
Z1 Z1
1 x2 1 2
= k x y + y 2 j10
2
dx = k x2 1 x2 + 1 x2 dx
2 2
1 1
Z1 h i
2
= k 2x2 1 x2 + 1 x2 dx by symmetry
0
Z1
1 51 4 5
= k 1 x4 dx = k x x j = k and thus k =
5 0 5 4
0
Therefore
5 2
f (x; y) = x +y for (x; y) 2 A
4
   f(3/4, 1/2) = 0 ≠ f1(3/4) f2(1/2) > 0

[Figure: the region of integration for P(Y ≥ X + 1) within the support set A.]
Z Z Z Z
5 2 5 2
P (Y X + 1) = x + y dydx = 1 x + y dydx
4 4
(x;y) 2B (x;y) 2C
Z0 1Z x2
5 2
= 1 x + y dydx
4
x= 1 y=x+1
Z0
5 1 2
= 1 x2 y + y 2 j1x+1x dx
4 2
x= 1
Z0 n o
5 2
= 1 [2x2 (1 x2 ) + 1 x2 ] [2x2 (x + 1) + (x + 1)2 ] dx
8
1
Z0
5 5 1 5 1 4
= 1 x4 2x3 3x2 2x dx = 1 + x + x + x3 + x2 j0 1
8 8 5 2
1
5 1 1 5 2+5 3
= 1 ( 1) + + ( 1) + 1 = 1 =1
8 5 2 8 10 16
13
=
16
[Figure: the support set A = {(x, y) : x² ≤ y ≤ 1, −1 ≤ x ≤ 1}.]
Z1 Z1 Z Z
1 = f (x; y)dxdy = k x2 ydydx
1 1 (x;y) 2A
Z1 Z1 Z1
2 1 21
= k x ydydx = k x2 y j 2 dx
2 x
x= 1 y=x2 x= 1
Z1
k k 1 3 1 71
= x2 1 x4 dx = x x j 1
2 2 3 7
x= 1
k 1 1 1 1 1 1
= ( 1) + ( 1) = k
2 3 7 3 7 3 7
4k
=
21
Therefore k = 21=4 and
21 2
f (x; y) = x y for (x; y) 2 A
4
and 0 otherwise.
Z1
21
f1 (x) = x2 ydy
4
x2
21x2 h i
= y 2 j1x2
8
21x2
= 1 x4 for 1<x<1
8
and 0 otherwise. The support set of X is A1 = fx : 1 < x < 1g.
The support set A of (X; Y ) is not rectangular. To show that X and Y are not inde-
pendent random variables we only need to …nd x 2 A1 , and y 2 A2 such that (x; y) 2
= A.
1 1
Let x = 2 and y = 10 . Since
1 1 1 1
f ; = 0 6= f1 f2 >0
2 10 2 10
[Figure: the region where x ≤ y within the support set A.]
Z Z
21 2
P (X Y) = x ydydx
4
(x;y) 2B
Z1 Zx
21 2
= x ydydx
4
x=0 y=x2
Z1
21
= x2 y 2 jxx2 dx
8
0
Z1
21
= x2 x2 x4 dx
8
0
Z1
21
= x4 x6 dx
8
0
21 1 5 1 7 1
= x x j
8 5 7 0
21 7 5
=
8 35
3
=
20
f (x; y)
f1 (xjy) =
f2 (y)
21 2
4 x y
= 7 5=2
2y
3 2 3=2 p p
= x y for y<x< y; 0 < y < 1
2
and 0 otherwise. Check:
p
Z1 Zy
1 3=2
f1 (xjy) dx = y 3x2 dx
2 p
1 y
h p i
y
= y 3=2 x3 j0
3=2 3=2
= y y
= 1
f (x; y)
f2 (yjx) =
f1 (x)
21 2
4 x y
= 21x2
(1 x4 )
8
2y
= for x2 < y < 1; 1<x<1
(1 x4 )
[Figure: the support set A = {(x, y) : 0 ≤ x ≤ y ≤ 1}.]
Z1 Z1 Z Z
1 = f (x; y) dxdy = k (x + y) dydx
1 1 (x;y) 2A
Z1 Zy Z1
1 2
= k (x + y) dxdy = k x + xy jyx=0 dy
2
y=0 x=0 y=0
Z1
1 2
= k y + y 2 dx
2
0
Z1
3 2 k 31
= k y dy = y j0
2 2
0
k
=
2
Therefore k = 2 and
f (x; y) = 2 (x + y) for (x; y) 2 A
and 0 otherwise.
= y 2 + 2y 2 0
2
= 3y for 0 < y < 1
6:(d) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Zy
2 (x + y)
= x dx
3y 2
x=0
Zy
1
= 2 x2 + yx dx
3y 2
x=0
1 2 3
= x + yx2 jyx=0
3y 2 3
1 2 3
= y + y3
3y 2 3
1 5 3
= y
3y 2 3
5
= y for 0 < y < 1
9
Z1
E (Y jx) = yf2 (yjx) dy
1
Z1
2 (x + y)
= y dy
1 + 2x 3x2
y=x
2 3
Z1
1 4
= 2 xy + y 2 dy 5
1 + 2x 3x2
y=x
1 2
= xy 2 + y 3 j1y=x
1 + 2x 3x2 3
1 2 2
= 2
x+ x3 + x3
1 + 2x 3x 3 3
2 5 3
3 +x 3x
=
1 + 2x 3x2
2 + 3x 5x3
= for 0 < x < 1
3 (1 + 2x 3x2 )
[Figure: the support set A = {(x, y) : x > 0, y > 0, x + y < 1}.]
Z1 Z1 Z Z
1 = f (x; y) dxdx = k x2 ydydx
1 1 (x;y) 2A
Z1 Z
1 x Z1 Z1
1 21 k
= k 2
x ydydx = k x 2
y j x
dx = x2 (1 x)2 dx
2 0 2
x=0 y=0 0 0
Z1
k k 1 3 1 4 1 51
= x2 2x3 + x4 dx = x x + x j0
2 2 3 2 5
0
k 1 1 1 k 10 15 + 6
= + =
2 3 2 5 2 30
k
=
60
Therefore k = 60 and
and 0 otherwise. The support of X is A1 = fx : 0 < x < 1g. Note that X Beta(3; 3).
= 60y x2 dx
0
h i
= 20y x3 j10 y
and 0 otherwise. The support of Y is A2 = fy : 0 < y < 1g. Note that Y Beta(2; 4).
f (x; y)
f1 (xjy) =
f2 (y)
60x2 y
=
20y (1 y)3
3x2
= for 0 < x < 1 y; 0 < y < 1
(1 y)3
Check:
Z1 Z
1 y
h i
1 1 (1 y)3
f1 (xjy) dx = 3x2 dx = x3 j01 y
= =1
(1 y)3 (1 y)3 (1 y)3
1 0
10.2. CHAPTER 3 397
f (x; y)
f2 (yjx) =
f1 (x)
60x2 y
=
30x2 (1 x)2
2y
= for 0 < y < 1 x; 0 < x < 1
(1 x)2
6:(f ) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Z
1 y
3
= 3 (1 y) x x2 dx
0
3 1 41 y
= 3 (1 y) x j
4 0
3 3
= (1 y) (1 y)4
4
3
= (1 y) for 0 < y < 1
4
Z1
E (Y jx) = yf2 (yjx) dy
1
Z
1 x
2
= 2 (1 x) y (y) dy
0
2 1 31 x
= 2 (1 x) y j
3 0
2 2
= (1 x) (1 x)3
3
2
= (1 x) for 0 < x < 1
3
[Figure: the support set A = {(x, y) : 0 < y < x < ∞}.]
Z1 Z1 Z Z
x 2y
1 = f (x; y) dxdx = k e dydx
1 1 (x;y) 2A
Z1 Z1 Z1 h i
x 2y 2y x b
= k e dxdy = k e lim e jy dy
b!1
y=0 x=y 0
Z1 Z1
2y y b 3y
= k e e lim e dy = k e dy
b!1
0 0
Z1
k 3y
= 3e dy
3
0
3y 1
But 3e is the probability density function of a Exponential 3 random variable and
therefore the integral is equal to 1. Therefore 1 = k=3 or k = 3.
Therefore
x 2y
f (x; y) = 3e for (x; y) 2 A
and 0 otherwise.
2y x b
= 3e lim e jy
b!1
2y y b
= 3e e lim e
b!1
3y
= 3e for y > 0
1
and 0 otherwise. The support of Y is A2 = fy : y > 0g. Note that Y Exponential 3 .
6:(g) (iii) The conditional probability density function of X given Y = y is
f (x; y)
f1 (xjy) =
f2 (y)
3e x 2y
=
3e 3y
= e (x y) for x > y > 0
f (x; y)
f2 (yjx) =
f1 (x)
3e x 2y
= 3 x (1 2x )
2e e
2e 2y
= 2x
for 0 < y < x
1 e
6:(g) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Z1
(x y)
= xe dx
y
h i
= ey lim (x + 1) e x b
jy
b!1
b+1
= ey (y + 1) e y
lim
b!1 eb
= y + 1 for y > 0
Z1
E (Y jx) = yf2 (yjx) dy
1
Zx
2ye 2y
= dy
1 e 2x
0
1 1
= 2x
y+ e 2y jx0
1 e 2
1 1 1
= x+ e 2x
1 e 2x 2 2
1 (2x + 1) e 2x
= for x > 0
2 (1 e 2x )
and 0 otherwise.
7:(c) The conditional probability density function of X given Y = y is
f (x; y)
f1 (xjy) =
f2 (y)
1
1 x
=
log (1 y)
1
= for 0 < x < y < 1
(x 1) log (1 y)
and 0 otherwise.
E (Y j ) = n and V ar (Y j ) = n (1 )
a ab
E( )= , V ar ( ) =
a+b (a + b + 1) (a + b)2
and
E 2
= V ar ( ) + [E ( )]2
2
ab a
= +
(a + b + 1) (a + b)2 a+b
2
ab + a (a + b + 1)
=
(a + b + 1) (a + b)2
a [b + a(a + b) + a] a (a + b) (a + 1)
= 2 =
(a + b + 1) (a + b) (a + b + 1) (a + b)2
a (a + 1)
=
(a + b + 1) (a + b)
Therefore
a
E (Y ) = E [E (Y j )] = E (n ) = nE ( ) = n
a+b
and
V ar (Y ) = E [var (Y j )] + V ar [E (Y j )]
= E [n (1 )] + V ar (n )
2
= n E( ) E + n2 V ar ( )
a a (a + 1) n2 ab
= n +
a+b(a + b + 1) (a + b) (a + b + 1) (a + b)2
(a + b + 1) (a + 1) n2 ab
= na +
(a + b + 1) (a + b) (a + b + 1) (a + b)2
b (a + b) n2 ab
= na 2 +
(a + b + 1) (a + b) (a + b + 1) (a + b)2
nab (a + b + n)
=
(a + b + 1) (a + b)2
E (Y ) = E [E (Y j )] = E ( ) =
2
Since Y j Poisson( ), V ar (Y j ) = and since Gamma( ; ), V ar ( ) = .
V ar (Y ) = E [V ar (Y j )] + V ar [E (Y j )]
= E ( ) + V ar ( )
2
= +
Therefore
Z1 Z1
@ j+k @ j+k
M (t1 ; t2 ) = et1 x+t2 y f (x; y) dxdx
@tj1 @tk2 @tj1 @tk2
1 1
= E X j Y k et1 X+t2 Y
and
@ j+k
M (t1 ; t2 ) j(t1 ;t2 )=(0;0) = E X j Y k
@tj1 @tk2
as required. Note that this proof is for the case of (X; Y ) continuous random variables.
The proof for (X; Y ) discrete random variables follows in a similar manner with integrals
replaced by summations.
11: (b) Suppose that X and Y are independent random variables then
Suppose that M (t1 ; t2 ) = MX (t1 ) MY (t2 ) for all jt1 j < h; jt2 j < h for some h > 0. Then
Z1 Z1 Z1 Z1
t1 x+t2 y t1 x
e f (x; y) dxdx = e f1 (x) dx et2 Y f2 (y) dy
1 1 1 1
Z1 Z1
= et1 x+t2 y f1 (x) f2 (y) dxdx
1 1
But by the Uniqueness Theorem for Moment Generating Functions this can only hold
if f (x; y) = f1 (x) f2 (y) for all (x; y) and therefore X and Y are independent random
variables.
Thus we have shown that X and Y are independent random variables if and only if
MX (t1 ) MY (t2 ) = M (t1 ; t2 ).
Note that this proof is for the case of (X; Y ) continuous random variables. The proof for
(X; Y ) discrete random variables follows in a similar manner with integrals replaced by
summations.
By 11 (a)
@2
E (X1 X2 ) = M (t1 ; t2 ) j(t1 ;t2 )=(0;0)
@t1 @t2
@2 n
= p1 et1 + p2 et2 + p3 j(t1 ;t2 )=(0;0)
@t1 @t2
n 2
= n (n 1) p1 et1 p2 et2 p1 et1 + p2 et2 + p3 j(t1 ;t2 )=(0;0)
= n (n 1) p1 p2
Also
n
MX1 (t) = M (t; 0) = p1 et + p2 + p3
n
= p1 et + 1 p1 for t 2 <
and
d
E (X1 ) = MX1 (t) jt=0
dt
n 1
= np1 p1 et + 1 p1 jt=0
= np1
Similarly
E (X2 ) = np2
Therefore
(x ) tT is a scalar.
[x ( + t )] 1
[x ( + t )]T 2 tT t tT
= [(x ) t ] 1
[(x ) t ]T 2 tT t tT
h i
= [(x ) t ] 1
(x )T tT 2 tT t tT
= (x ) 1
(x )T (x ) 1
tT t 1
(x )T + t 1
tT 2 tT t tT
= (x ) 1
(x )T (x ) tT t (x )T + t tT 2 tT t tT
= (x ) 1
(x )T xtT + tT txT + t T
2 tT
= (x ) 1
(x )T xtT + tT xtT + tT 2 tT
= (x ) 1
(x )T 2xtT
= (x ) 1
(x )T 2xtT
as required.
Now
since
1 1
exp [x ( + t )] 1
[x ( + t )]T
2 j j1=2 2
is a BVN( + t ; ) probability density function and therefore the integral is equal to one.
13:(b) Since
1
MX1 (t) = M (t; 0) = exp 1t + t2 2
1 for t 2 <
2
which is the moment generating function of a N 1 ; 21 random variable, then by the
Uniqueness Theorem for Moment Generating Functions X1 N 1 ; 21 . By a similar
argument X2 N 2 ; 22 .
13:(c) Since
@2 @2 T 1
M (t1 ; t2 ) = exp t + tT t
@t1 @t2 @t1 @t2 2
@ 2 1 2 2 1 2 2
= exp 1 t1 + 2 t2 + t1 1 + t1 t2 1 2 + t2 2
@t1 @t2 2 2
@ 2
= 1 + t1 1 + t2 1 2 M (t1 ; t2 )
@t2
2 2
= 1 2 M (t1 ; t2 ) + 1 + t1 1 + t2 1 2 2 + t2 2 + t1 1 2 M (t1 ; t2 )
therefore
@2
E(XY ) = M (t1 ; t2 )j(t1 ;t2 )=(0;0)
@t1 @t2
= 1 2+ 1 2
13:(d) By Theorem 3.8.6, X1 and X2 are independent random variables if and only if
13:(e) Since
1
E exp(XtT ) = exp tT + t tT for t 2 <2
2
therefore
Efexp[(XA + b)tT ]g
= Efexp[XAtT + btT ]g
n h T
io
= exp(btT )E exp X tAT
T 1 T
= exp(btT ) exp tAT + tAT tAT
2
1
= exp btT + AtT + t AT A tT
2
1
= exp ( A + b) tT + t(AT A)t for t 2 <2
2
which is the moment generating function of a BVN A + b; AT A random variable, then
by the Uniqueness Theorem for Moment Generating Functions, XA+b BVN A + b; AT A .
13:(f ) First note that
" # " 1
#
1 2 1 2
1 2 1 2 1 2
= 2 2 (1 2) 2 = 2)
1
1
1 2 1 2 1 (1 1 2
2
2
" # 1=2
2 h i1=2 p
1=2 1 1 2 2 2 2 2
j j = 2 = 1 2 ( 1 2) = 1 2 1
1 2 2
" #
2 2
1 T 1 x1 1 x1 1 x2 2 x2 2
(x ) (x ) = 2)
2 +
(1 1 1 2 2
and
2
x1
(x ) 1
(x )T 1
1
" #
2 2 2
1 x1 1 x1 1 x2 2 x2 2 x1 1
= 2)
2 +
(1 1 1 2 2 1
" #
2 2 2
1 x1 1 x1 1 x2 2 x2 2 2 x1 1
= 2)
2 + 1
(1 1 1 2 2 1
1 2 2
2 2 2 2
= 2 (1 2)
(x2 2) 2 (x1 1 ) (x2 2) + 2 (x1 1)
2 1 1
2
1 2
= 2 (1 2)
(x2 2) (x1 1)
2 1
2
1 2
= 2 (1 2)
x2 2+ (x1 1)
2 1
14:(b)
2 1
MX (t) = M (t; 0) = = 1 for t < 1
2 t 1 2t
which is the moment generating function of an Exponential 12 random variable. Therefore
by the Uniqueness Theorem for Moment Generating Functions X Exponential 12 .
2 2
MY (t) = M (0; t) = = for t < 1
(1 t) (2 t) 2 3t + t2
which is not a moment generating function we recognize. We …nd the probability density
function of Y using
Zy
f2 (y) = 2e x y dx = 2e y e x jy0
0
y y
= 2e 1 e for y > 0
and 0 otherwise.
14:(c) E (X) = 12 since X Exponential 1
2 .
Alternatively
d d 2
E (X) = MX (t) jt=0 = jt=0
dt dt 2 t
2 ( 1) ( 1) 2 1
= 2 jt=0 = 4 = 2
(2 t)
Similarly
d d 2
E (Y ) = MY (t) jt=0 = jt=0
dt dt 2 3t + t2
2 ( 1) ( 3 + 2t) 6 3
= 2 jt=0 = =
(2 3t + t2 ) 4 2
Since
@2 @2 1
M (t1 ; t2 ) = 2
@t1 @t2 @t1 @t2 (1 t2 ) (2 t1 t2 )
@ 1
= 2
@t2 (1 t2 ) (2 t1 t2 )2
1 1 1 2
= 2 2 2 + (1
(1 t2 ) (2 t1 t2 ) t2 ) (2 t1 t2 )3
we obtain
@2
E (XY ) = M (t1 ; t2 ) j(t1 ;t2 )=(0;0)
@t1 @t2
1 1 1 2
= 2 2 2 + (1 j(t1 ;t2 )=(0;0)
(1 t2 ) (2 t1 t2 ) t2 ) (2 t1 t2 )3
1 2
= 2 + =1
4 8
Therefore
10.3 Chapter 4
1: We are given that X and Y are independent random variables and we want to show that
U = h (X) and V = g (Y ) are independent random variables. We will assume that X and
Y are continuous random variables. The proof for discrete random variables is obtained by
replacing integrals by sums.
Suppose X has probability density function f1 (x) and support set A1 , and Y has
probability density function f2 (y) and support set A2 . Then the joint probability density
function of X and Y is
P (U u; V v) = P (h (X) u; g (Y ) v)
ZZ
= f1 (x) f2 (y) dxdy
B
where
B = f(x; y) : h (x) u; g (y) vg
Let B1 = fx : h (x) ug and B2 = fy : g (y) vg. Then B = B1 B2 .
Since
ZZ
P (U u; V v) = f1 (x) f2 (y) dxdy
B1 B2
Z Z
= f1 (x) dx f2 (y) dy
B1 B2
= P (h (X) u) P (g (Y ) v)
= P (U u) P (V v) for all (u; v) 2 <2
y=1-x
R xy
x
0 1
Under S
v
1 v =u
R
uv
u
0 1
@ (x; y) 0 1
= = 1
@ (u; v) 1 1
g (u; v) = f (v; u v) j 1j
= 24v (u v) for (u; v) 2 RU V
and 0 otherwise.
2: (b) The marginal probability density function of U is given by
Z1
g1 (u) = g (u; v) dv
1
Zu
= 24v (u v) dv
0
= 12uv 2 8v 3 ju0
= 4u3 for 0 < u < 1
and 0 otherwise.
10.3. CHAPTER 4 415
6:(a)
y
1
y=t-x
A
t
t x
0 1
For 0 t 1
G (t) = P (T t) = P (X + Y t)
Z Z Zt Zt x
= 4xydydx = 4xydydx
(x;y) 2At x=0 y=0
Zt
= 2x y 2 jt0 x
dx
0
Zt
= 2x (t x)2 dx
0
Zt
= 2x t2 2tx + x2 dx
0
4 3 1 4t
= x2 t2 tx + x j0
3 2
4 4 1 4
= t4 t + t
3 2
1 4
= t
6
416 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Bt
t-1 y=t-x
x
0 t-1 1
Z1 h i Z1
= 2x 1 (t x)2 dx = 2x 1 t2 + 2tx x2 dx
t 1 t 1
4 1 41
= x 1 t + tx3
2 2
x j
3 2 t 1
4 1 4 1
= 1 t2 + t (t 1)2 1 t2 + t (t 1)3 (t 1)4
3 2 3 2
4 1 1
= 1 t2 + t + (t 1)3 (t + 3)
3 2 6
so
4 1 1
G (t) = t2 t+ (t 1)3 (t + 3) for 1 t 2
3 2 6
The probability density function of T = X + T is
8
dG (t) < 3 t
2 3
if 0 t 1
g (t) = = h i
dt :2t 4 1
1)2 (t + 3) + 16 (t 1)3
3 2 (t if 1 < t 2
8
< 2 t3 if 0 t 1
3
=
:2 t3 + 6t 4 if 1 < t 2
3
and 0 otherwise.
10.3. CHAPTER 4 417
Under S
y
1
Rxy
x
0 1
(k; 0) ! k2 ; 0 0<k<1
(0; k) ! (0; 0) 0<k<1
(1; k) ! (1; k) 0<k<1
2
(k; 1) ! k ;k 0<k<1
v
1 2
u=v or v=sqrt(u )
R
uv
u
0 1
p p 1
g (u; v) = f u; v= u
2u
p p 1
= 4 u v= u
2u
2v
= for (u; v) 2 RU V
u
and 0 otherwise.
6:(d) The marginal probability density function of U is
Z1
g1 (u) = g (u; v) dv
1
p
Zu
2v 1 h 2 pu i
= dv = v j0
u u
0
= 1 for 0 < u < 1
and 0 otherwise.
6:(e) The support set of X is A1 = fx : 0 < x < 1g and the support set of Y is
A2 = fy : 0 < y < 1g. Since
f (x; y) = 4xy = 2x (2y)
for all (x; y) 2 RXY = A1 A2 , therefore by the Factorization Theorem for Independence
X and Y are independent random variables. Also
f1 (x) = 2x for x 2 A1
and
f2 (y) = 2y for y 2 A2
so X and Y have the same distribution. Therefore
h i
E V 3 = E (XY )3
= E X3 E Y 3
2
= E X3
2 1 32
Z
= 4 x3 (2x) dx5
0
2
2 51
= x j
5 0
2
2
=
5
4
=
25
420 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Rxy
x
0 1
Under S
1
RU V = (u; v) : v < u < ; 0<v<1
v
which is picture in Figure 10.22.
10.3. CHAPTER 4 421
v
1
uv=1
. . .
v=u
Ruv
...
u
0 1
1 3 1 3
g ; = 0 6= g1 g2
2 4 2 4
Z1
g2 (v) = g (u; v) du
1
Z1=v
1
= 2v du
u
v
= 2v [ln v] j1=v
v
= 4v ln v for 0 < v < 1
and 0 otherwise.
The marginal probability density function of U is given by
Z1
g1 (u) = g (u; v) dv
1
8 Ru
>
> 1
2vdv if 0 < u < 1
>
> u
< 0
= 1=u
>
> 1
R
>
> 2vdv if u 1
: 0
u
8
>
<u if 0 < u < 1
=
>
: u13 if u 1
and 0 otherwise.
10.3. CHAPTER 4 423
where
RXY = f(x; y) : 0 < x < ; 0 < y < g
which is pictured in Figure 10.23.The transformation
y
θ
R
xy
x
0 θ
S: U =X Y; V = X + Y
v
2θ
v=2θ+u v=2θ-u
R
uv
v=-u v=u
u
-θ 0 θ
1 1 1
g (u; v) = f (u + v) ; (v u)
2 2 2
1
= for (u; v) 2 RU V
2 2
and 0 otherwise.
10.3. CHAPTER 4 425
8
u+
>
< 2 for <u 0
=
>
: u
for 0 < u <
2
juj
= 2 for <u<
and 0 otherwise.
426 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Note that
z12 + z22
" #
2 2 2
x1 1 1 x2 2 x1 1 x2 2 2 x1 1
= + 2)
2 +
1 (1 2 1 2 1
" #
2 2
1 x1 1 x1 1 x2 2 x2 2
= 2)
2 +
(1 1 1 2 2
1 T
= (x ) (x )
where h i h i
x= x1 x2 and = 1 2
h i
Therefore the joint probability density function of X = X1 X2 is
1 1
g (x) = 1=2
exp (x ) 1
(x )T for x 2 <2
2 j j 2
MU (s) = M (s; 0)
2 )s2 =2
= e(2 )s+(2
for s 2 <
MV (t) = M (0; t)
2 )t2 =2
= e(2 for t 2 <
and V N 0; 2 2 .
428 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
X1 = Y1 Y2 Y3 ; X2 = Y2 Y3 (1 Y1 ) ; X3 = Y3 (1 Y2 )
Let
RX = f(x1 ; x2 ; x3 ) : 0 < x1 < 1; 0 < x2 < 1; 0 < x3 < 1g
and
RY = f(y1 ; y2 ; y3 ) : 0 < y1 < 1; 0 < y2 < 1; 0 < y3 < 1g
The transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) maps RX into RY .
The Jacobian of the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) is
@x1 @x1 @x1
@y1 @y2 @y3
@ (x1 ; x2 ; x3 ) @x2 @x2 @x2
= @y1 @y2 @y3
@ (y1 ; y2 ; y3 ) @x3 @x3 @x3
@y1 @y2 @y3
y2 y3 y1 y3 y1 y2
= y2 y3 y3 (1 y1 ) y2 (1 y1 )
0 y3 1 y2
= y3 (1 y1 ) y22 y3 + y1 y22 y3 + (1 y2 ) y2 y32 (1 y1 ) + y1 y2 y32
= y3 y22 y3 y1 y22 y3 + y1 y22 y3 + (1 y2 ) y2 y32 y1 y2 y32 + y1 y2 y32
= y22 y32 + y2 y32 y22 y32
= y2 y32
and 0 otherwise.
10.3. CHAPTER 4 429
Let
g1 (y1 ) = 1 for y1 2 A1
g2 (y2 ) = 2y2 for y2 2 A2
and
1
g3 (y3 ) = y32 exp ( y3 ) for y3 2 A3
2
Since g (y1 ; y2 ; y3 ) = g1 (y1 ) g2 (y2 ) g3 (y3 ) for all (y1 ; y2 ; y3 ) 2 A1 A2 A3 therefore by the
Factorization Theorem for Independence, (Y1 ; Y2 ; Y3 ) are independent random variables.
Since Z
gi (yi ) dyi = 1 for i = 1; 2; 3
Ai
13: Let
RY = f(y1 ; y2 ; y3 ) : 0 < y1 < 1; 0 < y2 < 2 ; 0 < y3 < g
Consider the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) de…ned by
X1 = Y1 cos Y2 sin Y3; X2 = Y1 sin Y2 sin Y3 ; X3 = Y1 cos Y3
The Jacobian of the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) is
@x1 @x1 @x1
@y1 @y2 @y3
@ (x1 ; x2 ; x3 ) @x2 @x2 @x2
= @y1 @y2 @y3
@ (y1 ; y2 ; y3 ) @x3 @x3 @x3
@y1 @y2 @y3
and 0 otherwise.
Let
A1 = fy1 : y1 > 0g
A2 = fy2 : 0 < y2 < 2 g
A3 = fy3 : 0 < y3 < g
2 1 2
g1 (y1 ) = p y12 exp y for y1 2 A1
2 2 1
1
g2 (y2 ) = for y2 2 A2
2
and
1
g3 (y3 ) = sin y3 for y3 2 A3
2
Since g (y1 ; y2 ; y3 ) = g1 (y1 ) g2 (y2 ) g3 (y3 ) for all (y1 ; y2 ; y3 ) 2 A1 A2 A3 therefore by the
Factorization Theorem for Independence, (Y1 ; Y2 ; Y3 ) are independent random variables.
Since Z
gi (yi ) dyi = 1 for i = 1; 2; 3
Ai
MU (t) = E etU
= E et(X+Y )
= E etX E etY since X and Y are independent random variables
= MX (t) MY (t)
MU (t)
MY (t) =
MX (t)
m=2
(1 2t)
= n=2
(1 2t)
1 1
= (m n)=2
for t <
(1 2t) 2
P
n P
n P
n
Xi X = Xi nX = 0 and (si s) = 0
i=1 i=1 i=1
Therefore
P
n P
n s
ti Xi = si s+ Xi X +X (10.17)
i=1 i=1 n
Pn h s s i
= si Xi X + s Xi X + (si s) X + X
i=1 n n
Pn s P
n P
n
= si Ui + s Xi X +X (si s) + sX
i=1 n i=1 i=1
Pn
= si Ui + sX
i=1
P
n P
n
E exp si Ui + sX = E exp ti Xi (10.19)
i=1 i=1
P
n 1 2 P
n
= exp ti + t2i
i=1 2 i=1
16:(b)
P
n P
n s Pn s
ti = si s+ = (si s) + n =0+s=s (10.20)
i=1 i=1 n i=1 n
P
n P
n s 2
t2i = si s+ (10.21)
i=1 i=1 n
Pn s Pn P
n s 2
= (si s)2 + 2 (si s) +
i=1 n i=1 i=1 n
Pn s2
= (si s)2 + 0 +
i=1 n
Pn s2
= (si s)2 +
i=1 n
434 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
16:(c)
P
n P
n
M (s1 ; :::; sn ; s) = E exp si Ui + sX = E exp ti Xi
i=1 i=1
P
n 2 P
n
= exp ti + t2 by (10.19)
i=1 2 i=1 i
1 Pn s2
= exp s+ 2
(si s)2 + by (10.20) and (10.21)
2 i=1 n
1 s2 1 2P n
= exp s+ 2
exp (si s)2
2 n 2 i=1
16:(d) Since
1 2 s2
MX (s) = M (0; :::; 0; s) = exp s+
2 n
and
1 P
n
MU (s1 ; :::; sn ) = M (s1 ; :::; sn ; 0) = exp 2
(si s)2
2 i=1
we have
M (s1 ; :::; sn ; s) = MX (s) MU (s1 ; :::; sn )
By the Independence Theorem for Moment Generating Functions, X and U = (U1 ; U2 ; : : : ; Un )
are independent random variables. Therefore by Chapter 4, Problem 1, X and
Pn Pn
2
Ui2 = Xi X are independent random variables.
i=1 i=1
10.4. CHAPTER 5 435
10.4 Chapter 5
1: (a) Since Yi Exponential( ; 1), i = 1; 2; : : : independently then
Z1
(y ) (x )
P (Yi > x) = e dy = e for x > ; i = 1; 2; : : : (10.22)
x
Since 8
<1 if x >
lim Fn (x) =
n!1 :0 if x <
therefore
Xn !p (10.24)
(b) By (10.24) and the Limit Theorems
Xn
Un = !p 1
(c) Now
v
P (Vn v) = P [n (Xn ) < v] = P Xn +
n
n(v=n+ )
= 1 e using (10:23)
v
= 1 e for v 0
Vn !D V Exponential (1)
(d) Since
v
P (Wn w) = P n2 (Xn ) < w = P Xn +
n2
n(w=n2 + )
= 1 e
w=n
= 1 e for w 0
therefore
lim P (Wn w) = 0 for all w 2 <
n!1
which is not a cumulative distribution function. Therefore Wn has no limiting distribution.
436 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Gn (z) = P (Zn z)
= P (n [1 F (Yn )] z)
z
= P F (Yn ) 1
n
Let A be the support set of Xi . F (x) is an increasing function for x 2 A and therefore has
an inverse, F 1 , which is de…ned on the interval (0; 1). Therefore for 0 < z < n
z
Gn (z) = P F (Yn ) 1
n
1 z
= P Yn F 1
n
z
= 1 P Yn < F 1 1
h z in
n
= 1 F F 1 1 by (10.25)
n
z n
= 1 1
n
Since h z ni
z
lim 1 1 =1 e for z > 0
n!1 n
therefore 8
<0 if z < 0
lim Gn (z) =
n!1 :1 e z if z > 0
which is the cumulative distribution function of a Exponential(1) random variable. There-
fore by the de…nition of convergence in distribution
Zn !D Z Exponential (1)
10.4. CHAPTER 5 437
is
Mn (t) = E etYn
p t P n
= E exp n t+ p Xi
n i=1
p t P n
= e n t E exp p Xi
n i=1
p Q
n t
= e n t E exp p Xi
i=1 n
since X1 ; X2 ; : : : ; Xn are independent random variables
p n
t
= e n t M p
n
p h p i
= e n t exp n et= n 1
h p p i
= exp n t + n et= n 1 for t 2 <
and
p p
log Mn (t) = n t+n et= n
1 for t 2 <
By Taylor’s Theorem
x2 ec 3
ex = 1 + x + + x
2 3!
for some c between 0 and x. Therefore
p t 1 t 2 1 t 3 cn
et= n
= 1+ p + p + p e
n 2 n 3! n
t 1 t2 1 t3
= 1+ p + + ecn
n 2 n 3! n3=2
p
for some cn between 0 and t= n.
438 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Therefore
p p
log Mn (t) = n t+n et= n
1
p t 1 t2 1 t3
= n t+n p + + ecn
n 2 n 3! n3=2
1 2 t3 1
= t + p ecn for t 2 <
2 3! n
Since
lim cn = 0
n!1
it follows that
lim ecn = e0 = 1
n!1
Therefore
1 2 t3 1
lim log Mn (t) = lim t + p ecn
n!1 n!1 2 3! n
1 2 t 3
= t + (0) (1)
2 3!
1 2
= t for t 2 <
2
and
1
t2
lim Mn (t) = e 2 for t 2 <
n!1
Yn !D Y N (0; )
10.4. CHAPTER 5 439
is
Mn (t) = E etZn
1 P n p
= E exp t p Xi n
n i=1
p t P n
= E e n t exp p Xi
n i=1
n t Q
p n t
= e E exp p Xi
i=1 n
p n
n t t
= e M p
n
!n
p 1
n t
= e
1 ptn
p n p
t= n t n
= e 1 p for t <
n
By Taylor’s Theorem
x2 ec 3
ex = 1 + x + + x
2 3!
for some c between 0 and x. Therefore
p 2 3
t= n t 1 t 1 t
e = 1+ p + p + p ecn
n 2! n 3! n
2 2 3 3
t 1 t 1 t
= 1+ p + + ecn
n 2 n 3! n3=2
p
for some cn between 0 and t= n.
440 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Therefore
2 2 3 3 n
t 1 t 1 t t
Mn (t) = 1+ p + + ecn 1 p
n 2 n 3! n3=2 n
2 2 3 3
t 1 t 1 t
= [1 + p + + ecn
n 2 n 3! n3=2
2 2 3 3
t t t 1 t t 1 t t n
p p p p ecn p ]
n n n 2 n n 3! n3=2 n
2 2 3 3 3 3 4 4 n
1 t 1 t 1 t t
= 1 + ecn
2 n 2 n3=2 3! n3=2 n2
2 2 n
1 t (n)
= 1 +
2 n n
where
3 3 3 3 4 4
1 t 1 t t
(n) = + p ecn
2 n1=2 3! n n
Since
lim cn = 0
n!1
it follows that
lim ecn = e0 = 1
n!1
Also
3 3 4 4
t t
lim p = 0; lim =0
n!1 n n!1 n
and therefore
lim (n) = 0
n!1
Xn !p (10.29)
2
Xi
Let Wi = , i = 1; 2; : : : with
" #
2
Xi
E (Wi ) = E =1
Sn2 !p 2
(1) (1 + 0) = 2
442 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
and therefore
Sn
!p 1 (10.32)
E (Yi ) =
and
V ar (Yi ) = (1 )
for i = 1; 2; : : :
By 4.3.2(1)
P
n
Yi Binomial (n; )
i=1
Since Y1 ; Y2 ; : : : are independent and identically distributed random variables with
E (Yi ) = and V ar (Yi ) = (1 ) < 1, then by the Weak Law of Large Numbers
1 Pn
Yi = Yn !p
n i=1
P
n
Since Xn and Yi have the same distribution
i=1
Xn
Tn = !p (10.34)
n
(b) By (10.34) and the Limit Theorems
Xn Xn
Un = 1 !p (1 ) (10.35)
n n
(c) Since Y1 ; Y2 ; : : : are independent and identically distributed random variables with
E (Yi ) = and V ar (Yi ) = (1 ) < 1, then by the Central Limit Theorem
p
n Yn
p !D Z N (0; 1)
(1 )
P
n
Since Xn and Yi have the same distribution
i=1
p Xn
n n
Sn = p !D Z N (0; 1) (10.36)
(1 )
Wn !D W N (0; (1 )) (10.37)
444 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
1 1
g 0 (x) = q p
p 2 2 x
1 ( x)
1
= p
2 x (1 x)
and
g 0 (a) = g 0 ( )
1
= p
2 (1 )
1 p Z 1
Vn !D p Z (1 )= N 0;
2 (1 ) 2 4
and
1
V ar (Xi ) = 2
i = 1; 2; : : :
(a) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 1 and V ar (Xi ) = 1 2 < 1, then by the Weak Law of Large Numbers
Yn 1 Pn 1
Xn = = Xi !p (10.38)
n n i=1
(b) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 1 and V ar (Xi ) = 1 2 < 1, then by the Central Limit Theorem
p
n Xn 1
q !D Z N (0; 1) (10.39)
1
2
q
1
Since Z N(0; 1), W = 2 Z N 0; 1 2 and therefore
p 1 1
Wn = n Xn !D W N 0; 2 (10.40)
Therefore by (10.40)
p 1 1 1
n !D W N 0; 2 (10.42)
Vn
Next we determine the limiting distribution of
p
n (Vn )
1 1
which is the numerator of Zn . Let g (x) = x and a = . Then
g 0 (x) = x 2
and
g 0 (a) = g 0 1
= 2
or p
n (Vn )
q !D Z N (0; 1) (10.44)
2
(1 )
since if Z N(0; 1) then Z N(0; 1) by symmetry of the N(0; 1) distribution.
Since p
p pn(V n )
n (Vn ) 2
(1 )
Zn = p =q 2 (10.45)
Vn2 (1 Vn ) Vn (1 Vn )
2
(1 )
Z
Zn !D =Z N (0; 1)
1
10.4. CHAPTER 5 447
(a) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 2 and V ar (Xi ) = 2 2 < 1 then by the Weak Law of Large Numbers
1 Pn
Xn = Xi !p 2 (10.47)
n i=1
Xn 2 p
p !p p = 2
2 2
and p
2
p !p 1 (10.48)
Xn = 2
(b) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 2 and V ar (Xi ) = 2 2 < 1 then by the Central Limit Theorem
p p
n Xn 2 n Xn 2
Wn = p = p !D Z N (0; 1) (10.49)
2 2
2
By (10.49) and Slutsky’s Theorem
p p 2
Vn = n Xn 2 !D 2 Z N 0; 2 (10.50)
10.5 Chapter 6
3: (a) If Xi Geometric( ) then
where
P
n
t= xi
i=1
The log likelihood function is
^= n
n+t
and the maximum likelihood estimator is
^= n P
n
where T = Xi
n+T i=1
I( ) = S 0 ( ) = l00 ( )
n t
= 2 + for 0 < <1
(1 )2
Since I ( ) > 0 for all 0 < < 1, the graph of l ( ) is concave down and this also con…rms
that ^ = n= (n + t) is the maximum likelihood estimate.
10.5. CHAPTER 6 449
n t n t
I(^) = 2 + = 2 + t 2
^ (1 ^)2 n ( n+t )
n+t
n T nE (T )
E 2 + 2 = 2 +
(1 ) (1 )2
n n (1 )=
= 2 +
(1 )2
(1 )+
= n 2
(1 )
n
= 2 for 0 < <1
(1 )
3: (c) Since
(1 )
= E (Xi ) =
then by the invariance property of maximum likelihood estimators the maximum likelihood
estimator of = E (Xi ) is
1 ^ T
^= = =X
^ n
P
20
3: (d) If n = 20 and t = xi = 40 then the maximum likelihood estimate of is
i=1
^= 20 1
=
20 + 40 3
The relative likelihood function of is given by
L( ) 20
(1 )40
R( ) = = for 0 1
L(^) 1 20 2 40
3 3
R (0:5) = 0:03344 implies that = 0:5 is outside a 10% likelihood interval and we would
conclude that = 0:5 is not a very plausible value of given the data.
450 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
1.0
0.8
0.6
R(θ)
0.4
0.2
0.0
n! h in x1 x2
2 x1
L ( 1; 2) = [2 (1 )]x2 (1 )2
x1 !x2 ! (n x1 x2 )!
n!
= 2 x2 2x1 +x2
(1 )2n 2x1 x2
for 0 < <1
x1 !x2 ! (n x1 x2 )!
The log likelihood function is
n!
l ( ) = log + x2 log 2
x1 !x2 ! (n x1 x2 )!
+ (2x1 + x2 ) log + (2n 2x1 x2 ) log (1 ) for 0 < <1
Since
2x1 + x2
S ( ) > 0 for 0 < <
2n
and
2x1 + x2
S ( ) < 0 for 1 > >
2n
therefore by the …rst derivative test, l ( ) has an absolute maximum at
= (2x1 + x2 ) = (2n). Thus the maximum likelihood estimate of is
^ = 2x1 + x2
2n
and the maximum likelihood estimator of is
^ = 2X1 + X2
2n
The information function is
2x1 + x2 2n (2x1 + x2 )
I( )= 2 + for 0 < <1
(1 )2
and the observed information is
2x1 + x2 (2n)2 (2x1 + x2 ) (2n)2 [2n (2x1 + x2 )]
I(^) = I = +
2n (2x1 + x2 )2 [2n (2x1 + x2 )]2
(2n)2 (2n)2 2n
= + = 2x1 +x2 2x1 +x2
(2x1 + x2 ) [2n (2x1 + x2 )] 2n 1 2n
2n
=
^ 1 ^
Since
2
X1 Binomial n; and X2 Binomial (n; 2 (1 ))
2
E (2X1 + X2 ) = 2n + n [2 (1 )] = 2n
The expected information is
2X1 + X2 2n
(2X1 + X2 )
J( ) = E 2 +
(1 )2
2n 2n (1 ) 1 1
= 2 + 2 = 2n +
(1 ) 1
2n
= for 0 < < 1
(1 )
6: (a) Given
k
P (k children in family; ) = for k = 1; 2; : : :
1 2 1
P (0 children in family; ) = for 0 < <
1 2
452 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
6: (b) S ( ) = 0 if
2 2 82 1
98 164 + 49 = 0 or + =0
49 2
Therefore S ( ) = 0 if
q s s
82 82 2 1
49 49 41 1 4 2 82 2 41 1 (82)2 2 (49)2
= = 2=
2 49 2 49 49 2 (49)2
41 1p 41 1 p
= 6724 4802 = 1922
49 98 49 98
p
Since 0 < < 21 and = 41 1
49 + 98 1922 > 1, we choose
41 1p
= 1922
49 98
10.5. CHAPTER 6 453
Since
41 1p 41 1p 1
S ( ) > 0 for 0 < < = 1922 and S ( ) < 0 for = 1922 < <
49 98 49 98 2
therefore the maximum likelihood estimate of is
^= 41 1p
= 1922 0:389381424147286 0:3894
49 98
The observed information for the given data is
68 17 49
I(^) = + 2 1666:88
(1 2^)2 (1 ^)2 ^
6: (c) A graph of the relative likelihood function is given in Figure 10.26.A 15% likelihood
1.0
0.8
0.6
R(θ)
0.4
0.2
0.0
P ~ + 1 log (1 p) ~
n
~ 1
= P 0 log (1 p)
n
1
= P 0 Q (X; ) log (1 p)
n
1
= P Q (X; ) log (1 p) P (Q (X; ) 0)
n
log(1 p)
= 1 e 0
= 1 (1 p) = p
h i
^+ 1
log (1 p) ; ^ is a 100p% con…dence interval for .
n
Since
1 1 p ~ + 1 log 1 + p
P ~ + log
n 2 n 2
1 1+p ~ 1 1 p
= P log log
n 2 n 2
1 1+p 1 1 p
= P log Q (X; ) log
n 2 n 2
1 p 1+p
= 1 elog( 2 ) 1 elog( 2 )
1 p 1 p
= + + + =p
2 2 2 2
10.5. CHAPTER 6 455
h i
^+ 1
log 1 p
;^ + 1
log 1+p
is a 100p% con…dence interval for .
n 2 n 2
h i
The interval ^ + n1 log (1 p) ; ^ is a better choice since it contains ^ while the interval
h i
^ + 1 log 1 p ; ^ + 1 log 1+p does not.
n 2 n 2
1 1
8:(a) If x1 ; x2 ; : : : ; xn is an observed random sample from the Gamma 2; distribution
then the likelihood function is
Q
n
L( ) = f (xi ; )
i=1
1=2 1=2 xi
Q
n xi e
= 1
i=1 2
1=2 n
Q
n 1 n=2 t
= xi e for >0
i=1 2
where
P
n
t= xi
i=1
or more simply
n=2 t
L( ) = e for >0
The log likelihood function is
l ( ) = log L ( )
n
= log t for >0
2
and the score function is
d n n 2 t
S( )= l( ) = t= for >0
d 2 2
n
S ( ) = 0 for =
2t
Since
n
S ( ) > 0 for 0 < <
2t
and
n
S ( ) < 0 for >
2t
n
therefore by the …rst derivative test l ( ) has a absolute maximum at = 2t . Thus
^= n = 1
2t 2x
is the maximum likelihood estimate of and
~= 1
2X
456 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
1 1
E (Xi ) = and V ar (Xi ) =
2 2 2
and by the Weak Law of Large Numbers
1
X !p
2
and by the Limit Theorems
~ = 1 !p 1 =
2X 2 21
as required.
8:(c) By the Invariance Property of Maximum Likelihood Estimates the maximum likeli-
hood estimate of
1
= V ar (Xi ) =
2 2
is
2
1 1 4t2 t
^= 2 = 2 = =2 = 2x2
2^
n
2 2t 2n2 n
1
M (t) = for t <
t 1=2
1
P
n
The moment generating function of Q = 2 Xi is
i=1
P
n
MQ (t) = E etQ = E exp 2t Xi
i=1
Q
n Q
n
= E [exp (2tXi )] = M (2t )
i=1 i=1
!n=2
1
= 2t
for 2t <
1
1 1
= n=2
for t <
(1 2t) 2
which is the moment generating function of a 2 (n) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, Q 2 (n).
10.5. CHAPTER 6 457
To construct a 95% equal tail con…dence interval for we …nd a and b such that
P (Q a) = 0:025 = P (Q > b) so that
or
a b
P < < = 0:95
2T 2T
a b
so that 2t ; 2t is a 95% equal tail con…dence interval for .
For n = 20 we have
P
20
For t = xi = 6 a 95% equal tail con…dence interval for is
i=1
9:59 34:17
; = [0:80; 2:85]
2(6) 2(6)
Since = 0:7 is not in the 95% con…dence interval it is not a plausible value of in light of
the data.
8:(e) The information function is
d n
I( )= S( )= 2 for >0
d 2
The expected information is
n n
J ( ) = E [I ( ; X1 ; : : : ; Xn )] = E = for >0
2 2 2 2
and h i1=2 p p
n ~ n 1
J(~) ~ =p =p
2~ 2~ 2X
By the Central Limit Theorem
p
n X 21 p 1
= 2n X !D Z N (0; 1) (10.51)
p1 2
2
Let
1 1
g(x) = and a =
2x 2
then
1
g 0 (x) =
2x2
1 1
g(a) = g = 1 =
2 2 2
458 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
and
1 1
g 0 (a) = g 0 = 2 = 2 2
2 2 1
2
By the Delta Theorem and (10.51)
p 1
2n !D 2 2Z N 0; 4 4
2X
or p
2n ~ !D 2 2Z N 0; 4 4
By Slutsky’s Theorem p
2n ~
!D Z N (0; 1)
2 2
But if Z N(0; 1) then Z N(0; 1) and thus
p
n ~
p !D Z N (0; 1) (10.52)
2
P
20
For n = 20 and t = xi = 6;
i=1
^= 1 5 n 20
6 = and J ^ = 2 = 2 = 3:6
2 20
3 2^ 2 53
P
20
For n = 20 and t = xi = 6 the relative likelihood function of is given by
i=1
10 6 10
L( ) e 3
R( ) = = = e10 6
for >0
L(^) 5 10 5
3 e 10
0.4
0.2
0.0
1 2 3 4
10.6 Chapter 7
1: Since Xi has cumulative distribution function
2
1
F (x; 1; 2) =1 for x 1; 1 > 0; 2 >0
x
For each value of 2 the likelihood function is maximized over 1 by taking 1 to be as large
as possible subject to 0 < 1 x(1) . Therefore for …xed 2 the likelihood is maximized
for 1 = x(1) . Since this is true for all values of 2 the value of ( 1 ; 2 ) which maximizes
L ( 1 ; 2 ) will necessarily have 1 = x(1) .
To …nd the value of 2 which maximizes L x(1) ; 2 consider the function
1
n n 2 Q
n 2
L2 ( 2 ) = L x(1) ; 2 = 2 x(1) xi for 2 >0
i=1
Now
d n P
n
l2 ( 2 ) = l20 ( 2 ) = + n log x(1) log xi
d 2 2 i=1
n Pn xi
= log
2 i=1 x(1)
n
= t
2
n 2t
=
2
10.6. CHAPTER 7 461
where
P
n xi
t= log
i=1 x(1)
Now l20 ( 2 ) = 0 for 2 = n=t. Since l20 ( 2 ) > 0 for 0 < 2 < n=t and l20 ( 2 ) < 0 for 2 > n=t
therefore by the …rst derivative test l2 ( 2 ) is maximized for 2 = n=t = ^2 . Therefore
L2 ( 2 ) = L x(1) ; 2 is also maximized for 2 = ^2 . Therefore the maximum likelihood
estimates are
^1 = x(1) and ^2 = n
Pn
log xxi
(1)
i=1
and the maximum likelihood estimators are
~1 = X(1) and ~2 = n
P
n
Xi
log X(1)
i=1
4: (a) If the events S and H are independent events then P (S \ H) = P (S)P (H) = ,
P (S \ H) = P (S)P (H) = (1 ), etc.
The likelihood function is
n!
L( ; ) = ( )x11 [ (1 )]x12 [(1 ) ]x21 [(1 ) (1 )]x22
x11 !x12 !x21 !x22 !
or more simply (ignoring constants with respect to and )
L( ; ) = x11 +x12
(1 )x21 +x22 x11 +x21
(1 )x12 +x22 for 0 1, 0 1
l( ; ) = (x11 + x12 ) log + (x21 + x22 ) log(1 ) + (x11 + x21 ) log + (x12 + x22 ) log(1 )
for 0 < < 1, 0 < <1
Since
@l x11 + x12 x21 + x22 x11 + x12 n (x11 + x12 ) x11 + x12 n
= = =
@ 1 1 (1 )
@l x11 + x21 x12 + x22 x11 + x21 n (x11 + x21 ) x11 + x21 n
= = =
@ 1 1 (1 )
the score vector is
h i
x11 +x12 n x11 +x21 n
S( ; ) = (1 ) (1 )
for 0 < < 1, 0 < <1
4:(b) Since X11 + X12 = the number of times the event S is observed and P (S) = , then
the distribution of X11 + X12 is Binomial(n; ). Therefore E (X11 + X12 ) = n and
Since X11 + X21 = the number of times the event H is observed and P (H) = , then the
distribution of X11 + X21 is Binomial(n; ). Therefore E (X11 + X21 ) = n and
xi 1
7:(a) If Yi Binomial(1; pi ) where pi = 1 + e and x1 ; x2 ; : : : ; xn are known
constants, the likelihood function for ( ; ) is
Q
n 1 yi
L( ; ) = p (1 pi )1 yi
i=1 yi i
or more simply
Q
n
L( ; ) = pyi i (1 pi )1 yi
i=1
The log likelihood function is
P
n
l ( ; ) = log L ( ; ) = [yi log (pi ) + (1 yi ) log (1 pi )]
i=1
10.6. CHAPTER 7 463
Note that
@pi @ 1 e xi
xi
= 1+e =
@ @ (1 + e x i )2
1 e xi
= xi ) (1 + e xi )
= pi (1 pi )
(1 + e
and
@pi @ 1 xi e xi
xi
= 1+e =
@ @ (1 + e xi )2
1 e xi
= xi xi ) (1 xi )
= xi pi (1 pi )
(1 + e +e
Therefore
@l @l @pi Pn yi (1 yi ) @pi
= =
@ @pi @ i=1 pi (1 pi ) @
Pn yi (1 pi ) (1 yi ) pi
= pi (1 pi )
i=1 pi (1 pi )
Pn
= [yi (1 pi ) (1 yi ) pi ]
i=1
Pn
= (yi pi )
i=1
@l @l @pi Pn yi (1 yi ) @pi
= =
@ @pi @ i=1 pi (1 pi ) @
P yi (1 pi ) (1 yi ) pi
n
= xi pi (1 pi )
i=1 pi (1 pi )
Pn
= xi [yi (1 pi ) (1 yi ) pi ]
i=1
Pn
= xi (yi pi )
i=1
@2l @ @l @ P
n Pn @p
i P
n
= = (yi pi ) = = xi pi (1 pi )
@ @ @ @ @ i=1 i=1 @ i=1
464 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
and
@2l @ @l @ P
n P
n @pi P
n
2 = @ = xi (yi pi ) = xi = x2i pi (1 pi )
@ @ @ i=1 i=1 @ i=1
which is a constant function of the random variables Y1 ; Y2 ; :::; Yn and therefore the expected
information is J ( ; ) = I ( ; )
5:(b) The maximum likelihood estimates of and are found by solving the equations
h i
@l @l
S( ; ) = @ @
P
n P
n
= (yi pi ) xi (yi pi )
i=1 i=1
h i
= 0 0
(0) ; (0)
where is an initial estimate of ( ; ).
10.7. CHAPTER 8 465
10.7 Chapter 8
1:(a) The hypothesis H0 : = 0 is a simple hypothesis since the model is completely
speci…ed.
From Example 6.3.6 the likelihood function is
n Q
n
L( ) = xi for >0
i=1
^= n
P
n
log xi
i=1
( 0 ; X) = 2 log R ( 0 ; X)
" ~#
n
0 Q
n 0
= 2 log Xi
~ i=1
~ P log Xi
n
0
= 2 n log 0
~ i=1
0 ~ 1Pn
= 2 n log +n 0 log Xi
~ n i=1
P
n
log Xi
= 2n log
0
+ 0
~ 1 since i=1
=
1
~ ~ n ~
0 0
= 2n 1 log
~ ~
( 0 ; x) = 2 log R ( 0 ; X)
0 0
= 2n 1 log
^ ^
466 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
The parameter space is = f : > 0g which has dimension 1 and thus k = 1. The
approximate p-value is
2
p-value P (W ( 0 ; x)) where W (1)
h p i
= 2 1 P Z ( 0 ; x) where Z N (0; 1)
P
20
(b) If n = 20 and log xi = 25 and H0 : = 1 then ^ = 20=25 = 0:8 the observed value
i=1
of the likelihood ratio test statistic is
( 0 ; x) = 2 log R ( 0 ; X)
1 1
= 2 (20) 1 log
0:8 0:8
= 40 [0:25 log (1:25)]
= 1:074258
and
2
p value P (W 1:074258) where W (1)
h p i
= 2 1 P Z 1:074258 where Z N (0; 1)
= 0:2999857
calculated using R. Since p value > 0:1 there is no evidence against H0 : = 1 based on
the data.
2n 1 P
n
L1 ( 1 ) = 1 exp 2 x2i for 1 >0
1 i=1
1=2
P
n
with maximum likelihood estimate ^1 = 1
n x2i .
i=1
Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from a
Weibull(2; 2 ) distribution is
2m 1 P
m
L2 ( 2 ) = 2 exp 2 yi2 for 2 >0
2 i=1
1=2
P
m
with maximum likelihood estimate ^2 = 1
m yi2 .
i=1
10.7. CHAPTER 8 467
The independence of the samples implies the maximum likelihood estimators are
1=2 1=2
~1 = 1 Pn
~2 = 1 Pm
X2 Y2
n i=1 i m i=1 i
Therefore
1 Pn 1 Pm
l(~1 ; ~2 ; X; Y) = n log X2 m log Y2 (n + m)
n i=1 i m i=1 i
If 1 = 2 = then the log likelihood function is
1 P
n P
m
l( ) = 2 (n + m) log 2 x2i + yi2 for >0
i=1 i=1
d 2 (n + m) 2 P
n P
m
l( ) = + 3 x2i + yi2
d i=1 i=1
d
and d l ( ) = 0 for
1=2
1 P
n P
m
= x2i + yi2
n+m i=1 i=1
and therefore
1 P
n P
m
max l( 1 ; 2 ; X; Y) = (n + m) log Xi2 + Yi2 (n + m)
( 1 ; 2 )2 0 n+m i=1 i=1
(X; Y; 0)
= 2 l(~1 ; ~2 ; X; Y) max l( 1 ; 2 ; X; Y)
( 1 ; 2 )2 0
1 P
n 1 Pm
= 2[ n log Xi2 m log Y2 (n + m)
n i=1 m i=1 i
1 Pn P
m
+ (n + m) log Xi2 + Yi2 + (n + m) ]
n + m i=1 i=1
1 Pn P
m
= 2[ (n + m) log Xi2 + Yi2
n + m i=1 i=1
1 P n 1 Pm
n log Xi2 m log Y2 ]
n i=1 m i=1 i
468 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
1 P
n P
m
(x; y; 0) = 2[ (n + m) log x2i + yi2
n+m i=1 i=1
1 Pn 1 Pm
n log x2 m log y2 ]
n i=1 i m i=1 i
469
Summary of Discrete Distributions
Moment
Probability
Notation and Mean Variance Generating
Function
Parameters EX VarX Function
fx
Mt
Discrete Uniforma, b b
1
b−a1 ab b−a1 2 −1 1
b−a1
∑ e tx
b≥a 2 12 xa
x a, a 1, … , b
a, b integers t∈
HypergeometricN, r, n xr N−r
n−x
Nn
N 1, 2, … nr nr
1 − r
N−n Not tractable
N N N N−1
n 0, 1, … , N x max 0, n − N r,
r 0, 1, … , N … , minr, n
Binomialn, p
nx p x q n−x np npq pe t q n
0 ≤ p ≤ 1, q 1 − p
x 0, 1, … , n t∈
n 1, 2, …
Bernoullip p x q 1−x p pq pe t q
0 ≤ p ≤ 1, q 1 − p x 0, 1 t∈
fx 1 , x 2 , … , x k
Multinomialn; p 1 , p 2 , … , p k Mt 1 , t 2 , … , t k−1
n!
x 1 !x 2 !x k !
p x11 p x22 p xk k
0 ≤ pi ≤ 1 VarX i p 1 e t 1 p 2 e t 2
x i 0, 1, … , n EX i np i
i 1, 2, … , k np i 1 − p i p k−1 e t k−1 p k n
i 1, 2, … , k i 1, 2, … , k
k i 1, 2, … , k ti ∈
and ∑ p i 1 k
i1 and ∑ x i n i 1, 2, … , k − 1
i1
Summary of Continuous Distributions
Probability Moment
Notation and Density Mean Variance Generating
Parameters Function EX VarX Function
fx Mt
e bt −e at
Uniforma, b
1
ab b−a 2 b−at
t≠0
b−a
2 12
ba axb 1 t0
Γab
ΓaΓb
x a−1 1 − x b−1
k−1
Betaa, b 0x1 a ab 1∑ ai
abi
tk
k!
ab ab1ab 2 k1 i0
a 0, b 0
Γ x −1 e −x dx t∈
0
2 2
e −x− /2 2 t 2 /2
N, 2 2 2 e t
∈ , 2 0 x∈ t∈
2 2
e −log x− /2 2
Lognormal, 2 2 /2 e 2
2 x e DNE
2 2 2
∈ , 0 x0 −e
1 1
Exponential e −x/
2 1−t
1
0 x≥0 t
Two Parameter 1 e t
e −x−/ 2 1−t
Exponential,
1
x≥ t
∈ , 0
Double 1 e t
e −|x−|/ 2 1− 2 t 2
Exponential,
2 2
1
x∈ |t|
∈ , 0
1 x−/ −
Extreme Value,
e x−/−e 22 e t Γ1 t
0. 5772 6
∈ , 0 x∈ t −1/
Euler’s constant
x −1 e −x/
Gamma, 1 − t −
Γ 2
1
0, 0 x0 t
x −−1 e −1/x 1 1
Inverse Gamma, Γ −1 2 −1 2 −2 DNE
0, 0 x0 1 2
Summary of Continuous Distributions Continued
Probability Moment
Notation and Density Mean Variance Generating
Parameters Function EX VarX Function
fx Mt
x k/2−1 e −x/2
2 k 2 k/2 Γk/2 k 2k 1 − 2t −k/2
1
k 1, 2, … x0 t 2
Weibull, x −1 e −x/ 1 2 Γ1 2
Γ1 Not tractable
2 1
0, 0 x0 −Γ 1
2
Pareto, x 1 −1 −1 2 −2 DNE
0, 0 x 1 2
e −x−/
Logistic, 1e −x−/
2
22
e t Γ1 − tΓ1 t
3
∈ , 0 x∈
1
Cauchy, 1x−/ 2 DNE DNE DNE
∈ , 0 x∈
2 −k1/2
Γ k1 1 xk k
tk 2
k
0 k−2 DNE
k Γ 2
k 1, 2, … k 2, 3, … k 3, 4, …
x∈
k1
k1 2 k 1 k 2
k2
Γ 2
Fk 1 , k 2 k1 k2
k2 2k 22 k 1 k 2 −2
Γ 2
Γ 2
k 1 1, 2, … k 1 k 2
k 2 −2 k 1 k 2 −2 2 k 2 −4 DNE
k1 −
x 2
−1
1 k1
x 2 k2 2 k2 4
k 2 1, 2, … k2
x0
X1
X X2
BVN,
1
2
1 ∈ , 2 ∈ fx 1 , x 2 Mt 1 , t 2
1
e − 12 x− T −1 x− e
T t 1 t T t
21 1 2 2|| 1/2
2
1 2 22 x 1 ∈ , x 2 ∈ t 1 ∈ , t 2 ∈
1 0, 2 0
−1 1
12. Distribution Tables
473
N(0,1) Cumulative
Distribution Function
df \ p 0.6 0.7 0.8 0.9 0.95 0.975 0.99 0.995 0.999 0.9995
1 0.3249 0.7265 1.3764 3.0777 6.3138 12.7062 31.8205 63.6567 318.3088 636.6192
2 0.2887 0.6172 1.0607 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.5844 0.9785 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.5686 0.9410 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.5594 0.9195 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.5534 0.9057 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.5491 0.8960 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.5459 0.8889 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.5435 0.8834 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.5415 0.8791 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.5399 0.8755 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.5386 0.8726 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178
13 0.2586 0.5375 0.8702 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.5366 0.8681 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.5357 0.8662 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.5350 0.8647 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.5344 0.8633 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.5338 0.8620 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.5333 0.8610 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.5329 0.8600 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.5325 0.8591 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.5321 0.8583 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.5317 0.8575 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.5314 0.8569 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.5312 0.8562 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.5309 0.8557 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.5306 0.8551 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.5304 0.8546 1.3125 1.7011 2.0484 2.4671 2.7633 3.4082 3.6739
29 0.2557 0.5302 0.8542 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2556 0.5300 0.8538 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
40 0.2550 0.5286 0.8507 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
50 0.2547 0.5278 0.8489 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960
60 0.2545 0.5272 0.8477 1.2958 1.6706 2.0003 2.3901 2.6603 3.2317 3.4602
70 0.2543 0.5268 0.8468 1.2938 1.6669 1.9944 2.3808 2.6479 3.2108 3.4350
80 0.2542 0.5265 0.8461 1.2922 1.6641 1.9901 2.3739 2.6387 3.1953 3.4163
90 0.2541 0.5263 0.8456 1.2910 1.6620 1.9867 2.3685 2.6316 3.1833 3.4019
100 0.2540 0.5261 0.8452 1.2901 1.6602 1.9840 2.3642 2.6259 3.1737 3.3905
>100 0.2535 0.5247 0.8423 1.2832 1.6479 1.9647 2.3338 2.5857 3.1066 3.3101