
STATISTICS 330 COURSE NOTES

Cyntha A. Struthers
Department of Statistics and Actuarial Science, University of Waterloo

Fall 2024 Edition


Contents

1. Preview 1
1.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2. Univariate Random Variables 5


2.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Location and Scale Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.6 Functions of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.8 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.9 Variance Stabilizing Transformation . . . . . . . . . . . . . . . . . . . . . . 44
2.10 Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.11 Calculus Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.12 Chapter 2 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3. Multivariate Random Variables 61


3.1 Joint and Marginal Cumulative Distribution Functions . . . . . . . . . . . . 62
3.2 Bivariate Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Bivariate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6 Joint Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.7 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.8 Joint Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . 102
3.9 Multinomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.10 Bivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.11 Calculus Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.12 Chapter 3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116


4. Functions of Two or More Random Variables 121


4.1 Cumulative Distribution Function Technique . . . . . . . . . . . . . . . . . 121
4.2 One-to-One Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3 Moment Generating Function Technique . . . . . . . . . . . . . . . . . . . . 137
4.4 Chapter 4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

5. Limiting or Asymptotic Distributions 151


5.1 Convergence in Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.2 Convergence in Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.3 Weak Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.4 Moment Generating Function Technique for Limiting Distributions . . . . . 167
5.5 Additional Limit Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.6 Chapter 5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

6. Maximum Likelihood Estimation - One Parameter 183


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.2 Maximum Likelihood Method . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.3 Score and Information Functions . . . . . . . . . . . . . . . . . . . . . . . . 191
6.4 Likelihood Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.5 Limiting Distribution of Maximum Likelihood Estimator . . . . . . . . . . . 208
6.6 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.7 Approximate Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . 218
6.8 Chapter 6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

7. Maximum Likelihood Estimation - Multiparameter 227


7.1 Likelihood and Related Functions . . . . . . . . . . . . . . . . . . . . . . . . 228
7.2 Likelihood Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
7.3 Limiting Distribution of Maximum Likelihood Estimator . . . . . . . . . . . 240
7.4 Approximate Confidence Regions . . . . . . . . . . . . . . . . . . . . . . . . 242
7.5 Chapter 7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

8. Hypothesis Testing 253


8.1 Test of Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
8.2 Likelihood Ratio Tests for Simple Hypotheses . . . . . . . . . . . . . . . . . 257
8.3 Likelihood Ratio Tests for Composite Hypotheses . . . . . . . . . . . . . . . 265
8.4 Chapter 8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

9. Solutions to Chapter Exercises 275


9.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
9.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.3 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
9.4 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

9.5 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318


9.6 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
9.7 Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

10. Solutions to Selected End of Chapter Problems 337


10.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
10.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
10.3 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
10.4 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
10.5 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
10.6 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
10.7 Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

11. Summary of Named Distributions 469

12. Distribution Tables 473



Preface
In order to provide improved versions of these Course Notes for students in subsequent
terms, please email corrections, sections that are confusing, or comments/suggestions to
[email protected].
1. Preview

The following examples will illustrate the ideas and concepts discussed in these Course
Notes. They also indicate how these ideas and concepts are connected to each other.

1.1 Example

The number of service interruptions in a communications system over 200 separate days is
summarized in the following frequency table:

Number of interruptions    0    1    2    3   4   5   >5   Total
Observed frequency        64   71   42   18   4   1    0     200

It is believed that a Poisson model will fit these data well. Why might this be a reasonable
assumption? (PROBABILITY MODELS)
If we let the random variable X = number of interruptions in a day and assume that
the Poisson model is reasonable then the probability function of X is given by

P(X = x) = λ^x e^(−λ) / x!   for x = 0, 1, ...

where λ is a parameter of the model which represents the mean number of service
interruptions in a day. (RANDOM VARIABLES, PROBABILITY FUNCTIONS, EXPECTATION,
MODEL PARAMETERS) Since λ is unknown we might estimate it using the sample mean

x̄ = [64(0) + 71(1) + ··· + 1(5)] / 200 = 230/200 = 1.15

(POINT ESTIMATION) The estimate λ̂ = x̄ is the maximum likelihood estimate of λ.
It is the value of λ which maximizes the likelihood function. (MAXIMUM LIKELIHOOD
ESTIMATION) The likelihood function is the probability of the observed data as a function
of the unknown parameter(s) in the model. The maximum likelihood estimate is thus the
value of λ which maximizes the probability of observing the given data.


In this example the likelihood function is given by

L(λ) = P(observing 0 interruptions 64 times, ..., >5 interruptions 0 times; λ)

     = [200! / (64! 71! ··· 1! 0!)] (λ^0 e^(−λ)/0!)^64 (λ^1 e^(−λ)/1!)^71 ··· (λ^5 e^(−λ)/5!)^1 (Σ_{x=6}^∞ λ^x e^(−λ)/x!)^0

     = c λ^{64(0)+71(1)+···+1(5)} e^{−(64+71+···+1)λ}

     = c λ^230 e^(−200λ)   for λ > 0

where

c = [200! / (64! 71! ··· 1! 0!)] (1/0!)^64 (1/1!)^71 ··· (1/5!)^1

The maximum likelihood estimate of λ can be found by solving dL/dλ = 0, or equivalently
d log L/dλ = 0, and verifying that the solution corresponds to a maximum.
If we want an interval of values for λ which are reasonable given the data then we
could construct a confidence interval for λ. (INTERVAL ESTIMATION) To construct
confidence intervals we need to find the sampling distribution of the estimator. In this
example we would need to find the distribution of the estimator

X̄ = (X_1 + X_2 + ··· + X_n) / n

where X_i = number of interruptions on day i, i = 1, 2, ..., 200. (FUNCTIONS OF
RANDOM VARIABLES: cumulative distribution function technique, one-to-one transformations,
moment generating function technique) Since X_i ~ Poisson(λ) with E(X_i) = λ
and Var(X_i) = λ, the distribution of X̄ for large n is approximately N(λ, λ/n) by the
Central Limit Theorem. (LIMITING DISTRIBUTIONS)
Suppose the manufacturer of the communications system claimed that the mean number
of interruptions was 1. Then we would like to test the hypothesis H: λ = 1. (TESTS OF
HYPOTHESIS) A test of hypothesis uses a test statistic to measure the evidence based on
the observed data against the hypothesis. A test statistic with good properties for testing
H: λ = λ_0 is the likelihood ratio statistic, −2 log[L(λ_0)/L(λ̂)]. (LIKELIHOOD RATIO
STATISTIC) For large n the distribution of the likelihood ratio statistic is approximately
χ²(1) if the hypothesis H: λ = λ_0 is true.
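As a quick illustration (not part of the original notes), the following Python sketch computes the maximum likelihood estimate λ̂ = x̄ from the frequency table above, together with the rough normal-theory interval suggested by the Central Limit Theorem; the variable names are arbitrary and the proper interval constructions are developed in Chapter 6.

    # Minimal sketch (plain Python) of the Poisson calculations in Example 1.1.
    counts = {0: 64, 1: 71, 2: 42, 3: 18, 4: 4, 5: 1}    # observed frequencies
    n = sum(counts.values())                             # 200 days
    total = sum(x * f for x, f in counts.items())        # 230 interruptions in total
    lam_hat = total / n                                  # MLE = sample mean = 1.15

    se = (lam_hat / n) ** 0.5                            # sqrt(lambda_hat / n)
    ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)      # rough 95% interval
    print(lam_hat, ci)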

1.2 Example
The following are relief times in hours for 20 patients receiving a pain killer:

1.1  1.4  1.3  1.7  1.9  1.8  1.6  2.2  1.7  2.7
4.1  1.8  1.5  1.2  1.4  3.0  1.7  2.3  1.6  2.0

It is believed that the Weibull distribution with probability density function

f(x) = (β/θ^β) x^(β−1) e^{−(x/θ)^β}   for x > 0, θ > 0, β > 0

will provide a good fit to the data. (CONTINUOUS MODELS, PROBABILITY DENSITY
FUNCTIONS) Assuming independent observations the (approximate) likelihood function is

L(θ, β) = ∏_{i=1}^{20} (β/θ^β) x_i^(β−1) e^{−(x_i/θ)^β}   for θ > 0, β > 0

where x_i is the observed relief time for the ith patient. (MULTIPARAMETER LIKELIHOODS)
The maximum likelihood estimates θ̂ and β̂ are found by simultaneously solving

∂ log L/∂θ = 0   and   ∂ log L/∂β = 0

Since an explicit solution to these equations cannot be obtained, a numerical solution must
be found using an iterative method. (NEWTON'S METHOD) Also, since the maximum
likelihood estimators cannot be given explicitly, approximate confidence intervals and tests
of hypothesis must be based on the asymptotic distributions of the maximum likelihood
estimators. (LIMITING OR ASYMPTOTIC DISTRIBUTIONS OF MAXIMUM LIKELIHOOD ESTIMATORS)
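As an aside (not part of the original notes), the following Python sketch maximizes this Weibull log-likelihood numerically for the 20 relief times. It assumes numpy and scipy are available and uses a general-purpose optimizer (Nelder-Mead) rather than a hand-coded Newton's method; the parameterization matches the density above, with shape β and scale θ.

    import numpy as np
    from scipy.optimize import minimize

    x = np.array([1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
                  4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0])

    def neg_log_lik(par):
        beta, theta = par                      # shape beta > 0, scale theta > 0
        if beta <= 0 or theta <= 0:
            return np.inf
        return -np.sum(np.log(beta) - beta * np.log(theta)
                       + (beta - 1) * np.log(x) - (x / theta) ** beta)

    fit = minimize(neg_log_lik, x0=[1.0, float(np.mean(x))], method="Nelder-Mead")
    beta_hat, theta_hat = fit.x                # maximum likelihood estimates
    print(beta_hat, theta_hat)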
2. Univariate Random Variables

In this chapter we review concepts that were introduced in a previous probability course
such as STAT 220/230/240, as well as introduce new concepts. In Section 2.1 the concepts
of random experiments, sample spaces, probability models, and rules of probability are
reviewed. The concepts of a sigma algebra and probability set function are also introduced.
In Section 2.2 we define a random variable and its cumulative distribution function. It
is important to note that the definition of a cumulative distribution function is the same
for all types of random variables. In Section 2.3 we define a discrete random variable and
review the named discrete distributions (Hypergeometric, Binomial, Geometric, Negative
Binomial, Poisson). In Section 2.4 we define a continuous random variable and review
the named continuous distributions (Uniform, Exponential, Normal, and Chi-squared). We
also introduce new continuous distributions (Gamma, Two Parameter Exponential, Weibull,
Cauchy, Pareto). A summary of the named discrete and continuous distributions
that are used in these Course Notes can be found in Chapter 11. In Section 2.5
we discuss location and scale parameters. In Section 2.6 we review the cumulative
distribution function technique for finding the distribution of a function of a random
variable and prove a theorem which can be used in the case of a monotone function. In
Section 2.7 we review expectations of functions of random variables. In Sections 2.8-2.10
we introduce new material related to expectation such as inequalities, variance stabilizing
transformations and moment generating functions. Section 2.11 contains a number of
useful calculus results which will be used throughout these Course Notes.

2.1 Probability

To model real life phenomena for which we cannot predict exactly what will happen we
assign numbers, called probabilities, to outcomes of interest which reflect the likelihood of
such outcomes. To do this it is useful to introduce the concepts of an experiment and its
associated sample space. Consider some phenomenon or process which is repeatable, at
least in theory. We call the phenomenon or process a random experiment and refer to a
single repetition of the experiment as a trial. For such an experiment we consider the set
of all possible outcomes.


2.1.1 Definition - Sample Space

A sample space S is a set of all the distinct outcomes for a random experiment, with the
property that in a single trial, one and only one of these outcomes occurs.

To assign probabilities to the events of interest for a given experiment we begin by defining
a collection of subsets of a sample space S which is rich enough to define all the events of
interest for the experiment. We call such a collection of subsets a sigma algebra.

2.1.2 Definition - Sigma Algebra

A collection of subsets of a set S is called a sigma algebra, denoted by B, if it satisfies the
following properties:
(1) ∅ ∈ B where ∅ is the empty set
(2) If A ∈ B then Ā ∈ B
(3) If A_1, A_2, ... ∈ B then ∪_{i=1}^∞ A_i ∈ B

Suppose A_1, A_2, ... are subsets of the sample space S which correspond to events of interest
for the experiment. To complete the probability model for the experiment we need to assign
real numbers P(A_i), i = 1, 2, ..., where P(A_i) is called the probability of A_i. To develop
the theory of probability these probabilities must satisfy certain properties. The following
Axioms of Probability are a set of axioms which allow a mathematical structure to be
developed.

2.1.3 Definition - Probability Set Function

Let B be a sigma algebra associated with the sample space S. A probability set function is
a function P with domain B that satisfies the following axioms:
(A1) P(A) ≥ 0 for all A ∈ B
(A2) P(S) = 1
(A3) If A_1, A_2, ... ∈ B are pairwise mutually exclusive events, that is, A_i ∩ A_j = ∅ for all
i ≠ j, then

P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)

Note: The probabilities P(A_1), P(A_2), ... can be assigned in any way as long as they satisfy
these three axioms. However, if we wish to model real life phenomena we would assign the
probabilities such that they correspond to the relative frequencies of events in a repeatable
experiment.

2.1.4 Example
Let B be a sigma algebra associated with the sample space S and let P be a probability
set function with domain B. If A, B ∈ B then prove the following:
(a) P(∅) = 0
(b) If A and B are mutually exclusive events then P(A ∪ B) = P(A) + P(B).
(c) P(Ā) = 1 − P(A)
(d) If A ⊆ B then P(A) ≤ P(B). Note: A ⊆ B means a ∈ A implies a ∈ B.

Solution
(a) Let A_1 = S and A_i = ∅ for i = 2, 3, .... Since ∪_{i=1}^∞ A_i = S then by Definition 2.1.3 (A3)
it follows that

P(S) = P(S) + Σ_{i=2}^∞ P(∅)

and by (A2) we have

1 = 1 + Σ_{i=2}^∞ P(∅)

By (A1) the right side is a series of non-negative numbers which must converge to the left
side, which equals 1 and is therefore finite. This gives a contradiction unless P(∅) = 0, as required.
(b) Let A_1 = A, A_2 = B, and A_i = ∅ for i = 3, 4, .... Since ∪_{i=1}^∞ A_i = A ∪ B then by (A3)

P(A ∪ B) = P(A) + P(B) + Σ_{i=3}^∞ P(∅)

and since P(∅) = 0 by result (a) it follows that

P(A ∪ B) = P(A) + P(B)

(c) Since S = A ∪ Ā and A ∩ Ā = ∅ then by (A2) and the result proved in (b) it follows
that

1 = P(S) = P(A ∪ Ā) = P(A) + P(Ā)

or

P(Ā) = 1 − P(A)

(d) Since B = (A ∩ B) ∪ (Ā ∩ B) = A ∪ (Ā ∩ B) and A ∩ (Ā ∩ B) = ∅ then by (b)
P(B) = P(A) + P(Ā ∩ B). But by (A1), P(Ā ∩ B) ≥ 0 so it follows that P(B) ≥ P(A).

2.1.5 Exercise
Let B be a sigma algebra associated with the sample space S and let P be a probability
set function with domain B. If A, B ∈ B then prove the following:
(a) 0 ≤ P(A) ≤ 1
(b) P(A ∩ B̄) = P(A) − P(A ∩ B)
(c) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For a given experiment we are sometimes interested in the probability of an event given
that we know the event of interest has occurred in a certain subset of S. For example,
the experiment might involve people of different ages and we may be interested in an event
only for a given age group. This leads us to define conditional probability.

2.1.6 Definition - Conditional Probability

Let B be a sigma algebra associated with the sample space S and suppose A, B ∈ B. The
conditional probability of event A given event B is

P(A|B) = P(A ∩ B) / P(B)   provided P(B) > 0

2.1.7 Example
The following table of probabilities is based on data from the 2011 Canadian census. The
probabilities are for Canadians aged 25-34.

Highest level of education attained             Employed    Unemployed

No certificate, diploma or degree                 0.066        0.010

High school diploma or equivalent                 0.185        0.016

Postsecondary certificate, diploma or degree      0.683        0.040

If a person is selected at random what is the probability the person


(a) is employed?
(b) has no certi…cate, diploma or degree?
(c) is unemployed and has at least a high school diploma or equivalent?
(d) has at least a high school diploma or equivalent given that they are unemployed?

Solution

(a) Let E be the event "employed", A_1 be the event "no certificate, diploma or degree",
A_2 be the event "high school diploma or equivalent", and A_3 be the event "postsecondary
certificate, diploma or degree".

P(E) = P(E ∩ A_1) + P(E ∩ A_2) + P(E ∩ A_3)
     = 0.066 + 0.185 + 0.683
     = 0.934

(b)

P(A_1) = P(E ∩ A_1) + P(Ē ∩ A_1)
       = 0.066 + 0.010
       = 0.076

(c)

P(unemployed and has at least a high school diploma or equivalent)
= P(Ē ∩ (A_2 ∪ A_3))
= P(Ē ∩ A_2) + P(Ē ∩ A_3)
= 0.016 + 0.040 = 0.056

(d)

P(A_2 ∪ A_3 | Ē) = P(Ē ∩ (A_2 ∪ A_3)) / P(Ē)
                 = 0.056 / 0.066
                 = 0.848
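A quick check of these four calculations, added here as an aside and not part of the original notes, is the following plain-Python sketch; the dictionary keys are arbitrary labels for the rows and columns of the table above.

    # Joint probabilities keyed by (education level, employment status).
    p = {("none", "E"): 0.066, ("none", "U"): 0.010,
         ("hs",   "E"): 0.185, ("hs",   "U"): 0.016,
         ("post", "E"): 0.683, ("post", "U"): 0.040}

    p_employed = sum(v for (edu, emp), v in p.items() if emp == "E")    # (a) 0.934
    p_none = p[("none", "E")] + p[("none", "U")]                        # (b) 0.076
    p_unemp_hs_plus = p[("hs", "U")] + p[("post", "U")]                 # (c) 0.056
    p_unemployed = sum(v for (edu, emp), v in p.items() if emp == "U")  # 0.066
    print(p_employed, p_none, p_unemp_hs_plus,
          p_unemp_hs_plus / p_unemployed)                               # (d) about 0.848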

If the occurrence of event B does not affect the probability of the event A, then the events
are called independent events.

2.1.8 Definition - Independent Events

Let B be a sigma algebra associated with the sample space S and suppose A, B ∈ B. A
and B are independent events if

P(A ∩ B) = P(A) P(B)

2.1.9 Example
In Example 2.1.7 are the events "unemployed" and "no certificate, diploma or degree"
independent events?

Solution
The events "unemployed" and "no certificate, diploma or degree" are not independent since

0.010 = P(Ē ∩ A_1) ≠ P(Ē) P(A_1) = (0.066)(0.076)

2.2 Random Variables


A probability model for a random experiment is often easier to construct if the outcomes of
the experiment are real numbers. When the outcomes are not real numbers, the outcomes
can be mapped to numbers using a function called a random variable. When the observed
data are numerical values such as the number of interruptions in a day in a communications
system or the length of time until relief after taking a pain killer, random variables are still
used in constructing probability models.

2.2.1 Definition of a Random Variable

A random variable X is a function from a sample space S to the real numbers ℝ, that is,

X : S → ℝ

such that P(X ≤ x) is defined for all x ∈ ℝ.

Note: 'X ≤ x' is an abbreviation for {ω ∈ S : X(ω) ≤ x} where {ω ∈ S : X(ω) ≤ x} ∈ B
and B is a sigma algebra associated with the sample space S.

2.2.2 Example
Three friends Ali, Benita and Chen are enrolled in STAT 330. Suppose we are interested in
whether these friends earn a grade of 70 or more. If we let A represent the event "Ali earns
a grade of 70 or more", B represent the event "Benita earns a grade of 70 or more", and C
represent the event "Chen earns a grade of 70 or more" then a suitable sample space is

S = {ABC, ĀBC, AB̄C, ABC̄, ĀB̄C, ĀBC̄, AB̄C̄, ĀB̄C̄}

Suppose we are mostly interested in how many of these friends earn a grade of 70 or
more. We can define the random variable X = "number of friends who earn a grade of 70
or more". The range of X is {0, 1, 2, 3} with associated mapping

X(ABC) = 3
X(ĀBC) = X(AB̄C) = X(ABC̄) = 2
X(ĀB̄C) = X(ĀBC̄) = X(AB̄C̄) = 1
X(ĀB̄C̄) = 0

An important function associated with random variables is the cumulative distribution
function.

2.2.3 Definition - Cumulative Distribution Function

The cumulative distribution function (c.d.f.) of a random variable X is defined by

F(x) = P(X ≤ x)   for x ∈ ℝ

Note: The cumulative distribution function is defined for all real numbers.

2.2.4 Properties - Cumulative Distribution Function

(1) F is a non-decreasing function, that is,

F(x_1) ≤ F(x_2)   for all x_1 < x_2

(2)
lim_{x→−∞} F(x) = 0   and   lim_{x→∞} F(x) = 1

(3) F is a right-continuous function, that is,

lim_{x→a+} F(x) = F(a)

(4) For all a < b

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)

(5) For all b

P(X = b) = F(b) − lim_{a→b−} F(a)

2.2.5 Example
Suppose X is a random variable with cumulative distribution function

F(x) = P(X ≤ x) =
    0      x < 0
    0.1    0 ≤ x < 1
    0.3    1 ≤ x < 2
    0.6    2 ≤ x < 3
    1      x ≥ 3

(a) Graph the function F(x).

(b) Determine the probabilities
(i) P(X ≤ 1)
(ii) P(X ≤ 2)
(iii) P(X ≤ 2.4)
(iv) P(X = 2)
(v) P(0 < X ≤ 2)
(vi) P(0 ≤ X ≤ 2).

[Figure 2.1: Graph of F(x) = P(X ≤ x) for Example 2.2.5]

Solution
(a) See Figure 2.1.
(b) (i)
P(X ≤ 1) = F(1) = 0.3

(ii)
P(X ≤ 2) = F(2) = 0.6

(iii)
P(X ≤ 2.4) = P(X ≤ 2) = F(2) = 0.6

(iv)
P(X = 2) = F(2) − lim_{x→2−} F(x) = 0.6 − 0.3 = 0.3
or
P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = F(2) − F(1) = 0.6 − 0.3 = 0.3

(v)
P(0 < X ≤ 2) = P(X ≤ 2) − P(X ≤ 0) = F(2) − F(0) = 0.6 − 0.1 = 0.5

(vi)
P(0 ≤ X ≤ 2) = P(X ≤ 2) − P(X < 0) = F(2) − 0 = 0.6

2.2.6 Example
Suppose X is a random variable with cumulative distribution function

F(x) = P(X ≤ x) =
    0                        x ≤ 0
    (2/5)x³                  0 < x ≤ 1
    (1/5)(12x − 3x² − 7)     1 < x < 2
    1                        x ≥ 2

(a) Graph the function F(x).

(b) Determine the probabilities
(i) P(X ≤ 1)
(ii) P(X ≤ 2)
(iii) P(X ≤ 2.4)
(iv) P(X = 0.5)
(v) P(X = b), for b ∈ ℝ
(vi) P(1 < X ≤ 2.4)
(vii) P(1 ≤ X ≤ 2.4).

Solution

(a) See Figure 2.2.

[Figure 2.2: Graph of F(x) = P(X ≤ x) for Example 2.2.6]

(b) (i)
P(X ≤ 1) = F(1) = (2/5)(1)³ = 2/5 = 0.4

(ii)
P(X ≤ 2) = F(2) = 1

(iii)
P(X ≤ 2.4) = F(2.4) = 1

(iv)
P(X = 0.5) = F(0.5) − lim_{x→0.5−} F(x) = F(0.5) − F(0.5) = 0

(v)
P(X = b) = F(b) − lim_{a→b−} F(a) = F(b) − F(b) = 0   for all b ∈ ℝ

(vi)
P(1 < X ≤ 2.4) = F(2.4) − F(1) = 1 − 0.4 = 0.6

(vii)
P(1 ≤ X ≤ 2.4) = P(1 < X ≤ 2.4)   since P(X = 1) = 0
               = 0.6

2.3 Discrete Random Variables


A set A is countable if the number of elements in the set is finite or the elements of the set
can be put into a one-to-one correspondence with the positive integers.

2.3.1 Definition - Discrete Random Variable

A random variable X defined on a sample space S is a discrete random variable if there is
a countable subset A ⊆ ℝ such that P(X ∈ A) = 1.

2.3.2 Definition - Probability Function

If X is a discrete random variable then the probability function (p.f.) of X is given by

f(x) = P(X = x) = F(x) − lim_{ε→0+} F(x − ε)   for x ∈ ℝ

The set A = {x : f(x) > 0} is called the support set of X.

2.3.3 Properties - Probability Function

(1)
f(x) ≥ 0   for x ∈ ℝ

(2)
Σ_{x∈A} f(x) = 1

2.3.4 Example
In Example 2.2.5 find the support set A, show that X is a discrete random variable and
determine its probability function.

Solution
The support set of X is A = {0, 1, 2, 3} which is a countable set. Its probability function is

f(x) = P(X = x) =
    0.1                                        if x = 0
    P(X ≤ 1) − P(X ≤ 0) = 0.3 − 0.1 = 0.2      if x = 1
    P(X ≤ 2) − P(X ≤ 1) = 0.6 − 0.3 = 0.3      if x = 2
    P(X ≤ 3) − P(X ≤ 2) = 1 − 0.6 = 0.4        if x = 3

or

x                   0     1     2     3     Total
f(x) = P(X = x)    0.1   0.2   0.3   0.4      1
Since P(X ∈ A) = Σ_{x=0}^{3} P(X = x) = 1, X is a discrete random variable.

In the next example we review four of the named distributions which were introduced in a
previous probability course.

2.3.5 Example
Suppose a box contains a red balls and b black balls. For each of the following find
the probability function of the random variable X and show that Σ_{x∈A} f(x) = 1 where
A = {x : f(x) > 0} is the support set of X.
(a) X = number of red balls among n balls drawn at random without replacement.
(b) X = number of red balls among n balls drawn at random with replacement.
(c) X = number of black balls selected before obtaining the first red ball if sampling is
done at random with replacement.
(d) X = number of black balls selected before obtaining the kth red ball if sampling is done
at random with replacement.

Solution
(a) If n balls are selected at random without replacement from a box of a red balls and
b black balls then the random variable X = number of red balls has a Hypergeometric
distribution with probability function

f(x) = P(X = x) = (a choose x)(b choose n−x) / (a+b choose n)   for x = max(0, n−b), ..., min(a, n)

By the Hypergeometric identity 2.11.6

Σ_{x∈A} f(x) = Σ_{x=max(0,n−b)}^{min(a,n)} (a choose x)(b choose n−x) / (a+b choose n)
             = [1/(a+b choose n)] Σ_x (a choose x)(b choose n−x)
             = (a+b choose n) / (a+b choose n)
             = 1

(b) If n balls are selected at random with replacement from a box of a red balls and b black
balls then we have a sequence of Bernoulli trials and the random variable X = number of
red balls has a Binomial distribution with probability function

f(x) = P(X = x) = (n choose x) p^x (1−p)^(n−x)   for x = 0, 1, ..., n

where p = a/(a+b). By the Binomial series 2.11.3(1)

Σ_{x∈A} f(x) = Σ_{x=0}^{n} (n choose x) p^x (1−p)^(n−x)
             = (p + 1 − p)^n
             = 1

(c) If sampling is done with replacement then we have a sequence of Bernoulli trials and
the random variable X = number of black balls selected before obtaining the first red ball
has a Geometric distribution with probability function

f(x) = P(X = x) = p(1−p)^x   for x = 0, 1, ...

By the Geometric series 2.11.1

Σ_{x∈A} f(x) = Σ_{x=0}^{∞} p(1−p)^x
             = p / [1 − (1−p)]
             = 1

(d) If sampling is done with replacement then we have a sequence of Bernoulli trials and
the random variable X = number of black balls selected before obtaining the kth red ball
has a Negative Binomial distribution with probability function

f(x) = P(X = x) = (x+k−1 choose x) p^k (1−p)^x   for x = 0, 1, ...

Using the identity 2.11.4(1)

(x+k−1 choose x) = (−1)^x (−k choose x)

the probability function can be written as

f(x) = P(X = x) = (−k choose x) p^k (p−1)^x   for x = 0, 1, ...

By the Binomial series 2.11.3(2)

Σ_{x∈A} f(x) = Σ_{x=0}^{∞} (−k choose x) p^k (p−1)^x
             = p^k Σ_{x=0}^{∞} (−k choose x) (p−1)^x
             = p^k (1 + p − 1)^(−k)
             = 1
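As an aside (not in the original notes), the four probability functions above are all available in scipy.stats; the following sketch, which assumes scipy is installed, checks numerically for one choice of a, b, n, p and k that each probability function sums to 1. The scipy parameterizations noted in the comments correspond to the counting conventions used above.

    from scipy import stats

    a, b, n, p, k = 6, 4, 5, 0.6, 3

    # Hypergeometric(population a+b, a "success" balls, n draws)
    hyper = sum(stats.hypergeom.pmf(x, a + b, a, n) for x in range(n + 1))
    # Binomial(n, p)
    binom = sum(stats.binom.pmf(x, n, p) for x in range(n + 1))
    # Geometric: failures before the first success (scipy counts trials, so shift with loc=-1)
    geom = sum(stats.geom.pmf(x, p, loc=-1) for x in range(1000))
    # Negative Binomial: failures before the kth success
    nbinom = sum(stats.nbinom.pmf(x, k, p) for x in range(1000))
    print(hyper, binom, geom, nbinom)   # each is 1 up to truncation/rounding error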

2.3.6 Example
If X is a random variable with probability function

f(x) = λ^x e^(−λ) / x!   for x = 0, 1, ...;  λ > 0                    (2.1)

show that
Σ_{x=0}^{∞} f(x) = 1

Solution
By the Exponential series 2.11.7

Σ_{x=0}^{∞} f(x) = Σ_{x=0}^{∞} λ^x e^(−λ) / x!
                 = e^(−λ) Σ_{x=0}^{∞} λ^x / x!
                 = e^(−λ) e^λ
                 = 1

The probability function (2.1) is called the Poisson probability function.

2.3.7 Exercise
If X is a random variable with probability function

f(x) = −(1−p)^x / (x log p)   for x = 1, 2, ...;  0 < p < 1

show that
Σ_{x=1}^{∞} f(x) = 1
Hint: Use the Logarithmic series 2.11.8.

Important Note: A summary of the named distributions used in these Course Notes can
be found in Chapter 11.

2.4 Continuous Random Variables


2.4.1 Definition - Continuous Random Variable
Suppose X is a random variable with cumulative distribution function F. If F is a continuous
function for all x ∈ ℝ and F is differentiable except possibly at countably many points
then X is called a continuous random variable.

Note: The definition (2.2.3) and properties (2.2.4) of the cumulative distribution function
hold for the random variable X regardless of whether X is discrete or continuous.

2.4.2 Example
Suppose X is a random variable with cumulative distribution function

F(x) = P(X ≤ x) =
    0                          x ≤ −1
    −(1/2)(x+1)² + x + 1       −1 < x ≤ 0
    (1/2)(x−1)² + x            0 < x < 1
    1                          x ≥ 1

Show that X is a continuous random variable.

Solution

The cumulative distribution function F is a continuous function for all x ∈ ℝ since it is a
piecewise function composed of continuous functions and

lim_{x→a} F(x) = F(a)

at the break points a = −1, 0, 1.

The function F is differentiable for all x ≠ −1, 0, 1 since it is a piecewise function
composed of differentiable functions. Since

lim_{h→0−} [F(−1+h) − F(−1)]/h = 0 ≠ lim_{h→0+} [F(−1+h) − F(−1)]/h = 1
lim_{h→0−} [F(0+h) − F(0)]/h = 0 = lim_{h→0+} [F(0+h) − F(0)]/h
lim_{h→0−} [F(1+h) − F(1)]/h = 1 ≠ lim_{h→0+} [F(1+h) − F(1)]/h = 0

F is differentiable at x = 0 but not differentiable at x = −1, 1. The set {−1, 1} is a countable
set.
Since F is a continuous function for all x ∈ ℝ and F is differentiable except at countably
many points, X is a continuous random variable.

2.4.3 Definition - Probability Density Function

If X is a continuous random variable with cumulative distribution function F(x) then the
probability density function (p.d.f.) of X is f(x) = F′(x) if F is differentiable at x. The
set A = {x : f(x) > 0} is called the support set of X.
Note: At the countably many points at which F′(a) does not exist, f(a) may be assigned
any convenient value since the probabilities P(X ≤ x) will be unaffected by the choice. We
usually choose f(a) ≥ 0 and most often we choose f(a) = 0.

2.4.4 Properties - Probability Density Function

(1) f(x) ≥ 0 for all x ∈ ℝ
(2) ∫_{−∞}^{∞} f(x) dx = lim_{x→∞} F(x) − lim_{x→−∞} F(x) = 1
(3) f(x) = lim_{h→0} [F(x+h) − F(x)]/h = lim_{h→0} P(x < X ≤ x+h)/h if this limit exists
(4) F(x) = ∫_{−∞}^{x} f(t) dt,  x ∈ ℝ
(5) P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a) = ∫_a^b f(x) dx
(6) P(X = b) = F(b) − lim_{a→b−} F(a) = F(b) − F(b) = 0
(since F is continuous).

2.4.5 Example
Find and sketch the probability density function of the random variable X with the cumulative
distribution function in Example 2.2.6.

Solution
By taking the derivative of F(x) we obtain

F′(x) =
    0                                                x < 0
    (d/dx)[(2/5)x³] = (6/5)x²                        0 < x < 1
    (d/dx)[(1/5)(12x − 3x² − 7)] = (1/5)(12 − 6x)    1 < x < 2
    0                                                x > 2

We can assign any values to f(0), f(1), and f(2). For convenience we choose
f(0) = f(2) = 0 and f(1) = 6/5.

The probability density function is

f(x) =
    (6/5)x²           0 < x ≤ 1
    (1/5)(12 − 6x)    1 < x < 2
    0                 otherwise

The graph of f(x) is given in Figure 2.3.

[Figure 2.3: Graph of the probability density function for Example 2.4.5]

Note: See Table 2.1 in Section 2.7 for a summary of the differences between the properties
of a discrete and continuous random variable.

2.4.6 Example
Suppose X is a random variable with cumulative distribution function

F(x) =
    0               x ≤ a
    (x−a)/(b−a)     a < x ≤ b
    1               x > b

where b > a.
(a) Sketch F(x), the cumulative distribution function of X.
(b) Find f(x), the probability density function of X, and sketch it.
(c) Is it possible for f(x) to take on values greater than one?

Solution
(a) See Figure 2.4 for a sketch of the cumulative distribution function.

[Figure 2.4: Graph of the cumulative distribution function for Example 2.4.6]

(b) By taking the derivative of F(x) we obtain

F′(x) =
    0           x < a
    1/(b−a)     a < x < b
    0           x > b

The derivative of F(x) does not exist for x = a or x = b. For convenience we define
f(a) = f(b) = 1/(b−a) so that

f(x) = 1/(b−a)   for a ≤ x ≤ b
     = 0         otherwise

See Figure 2.5 for a sketch of the probability density function. Note that we could define
f(a) and f(b) to be any values and the cumulative distribution function would remain the
same since x = a and x = b are countably many points.
The random variable X is said to have a Uniform(a, b) distribution. We write this as
X ~ Uniform(a, b).
(c) If a = 0 and b = 0.5 then f(x) = 2 for 0 ≤ x ≤ 0.5. This example illustrates that the
probability density function is not a probability and that the probability density function
can take on values greater than one.
The important restriction for continuous random variables is

∫_{−∞}^{∞} f(x) dx = 1

[Figure 2.5: Graph of the probability density function of a Uniform(a, b) random variable]

2.4.7 Example
Consider the function

f(x) = β / x^(β+1)   for x ≥ 1

and 0 otherwise. For what values of β is this function a probability density function?

Solution
Using the result (2.8) from Section 2.11

∫_{−∞}^{∞} f(x) dx = ∫_1^∞ β / x^(β+1) dx
                    = lim_{b→∞} ∫_1^b β x^(−β−1) dx
                    = lim_{b→∞} [−x^(−β)]_1^b
                    = 1 − lim_{b→∞} 1/b^β
                    = 1   if β > 0

Also f(x) ≥ 0 if β > 0. Therefore f(x) is a probability density function for all β > 0.
X is said to have a Pareto(1, β) distribution.

A useful function for evaluating integrals associated with several named random variables
is the Gamma function.

2.4.8 Definition - Gamma Function

The gamma function, denoted by Γ(α) for all α > 0, is given by

Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy

2.4.9 Properties - Gamma Function

(1) Γ(α) = (α − 1) Γ(α − 1)   for α > 1
(2) Γ(n) = (n − 1)!   for n = 1, 2, ...
(3) Γ(1/2) = √π

2.4.10 Example
Suppose X is a random variable with probability density function

f(x) = x^(α−1) e^(−x/β) / [Γ(α) β^α]   for x > 0, α > 0, β > 0

and 0 otherwise.
X is said to have a Gamma distribution with parameters α and β and we write
X ~ Gamma(α, β).
(a) Verify that

∫_{−∞}^{∞} f(x) dx = 1

(b) What special probability density function is obtained for α = 1?

(c) Graph the probability density functions for
(i) α = 1, β = 3
(ii) α = 2, β = 1.5
(iii) α = 5, β = 0.6
(iv) α = 10, β = 0.3
on the same graph.

Note: See Chapter 11 - Summary of Named Distributions. Note that the notation for
parameters used for named distributions is not necessarily the same in all textbooks. This
is especially true for distributions with two or more parameters.

Solution
(a)

∫_{−∞}^{∞} f(x) dx = ∫_0^∞ x^(α−1) e^(−x/β) / [Γ(α) β^α] dx     let y = x/β
                    = [1/(Γ(α) β^α)] ∫_0^∞ (yβ)^(α−1) e^(−y) β dy
                    = [1/Γ(α)] ∫_0^∞ y^(α−1) e^(−y) dy
                    = Γ(α)/Γ(α)
                    = 1

(b) If α = 1 the probability density function is

f(x) = (1/β) e^(−x/β)   for x > 0, β > 0

and 0 otherwise, which is the Exponential(β) distribution.

(c) See Figure 2.6

[Figure 2.6: Gamma(α, β) probability density functions for (α, β) = (1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]
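As an aside (not in the original notes), the following sketch, assuming scipy is available, checks part (a) numerically for one choice of α and β and evaluates the four densities of Figure 2.6 at a single point using scipy's equivalent Gamma(shape = α, scale = β) parameterization.

    import numpy as np
    from math import gamma as gamma_fn
    from scipy import stats, integrate

    alpha, beta = 2.0, 1.5
    pdf = lambda x: x ** (alpha - 1) * np.exp(-x / beta) / (gamma_fn(alpha) * beta ** alpha)
    area, _ = integrate.quad(pdf, 0, np.inf)
    print(area)                                            # 1.0 up to numerical error

    for a_, b_ in [(1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]:
        print(a_, b_, stats.gamma.pdf(1.0, a_, scale=b_))  # density at x = 1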

2.4.11 Exercise
Suppose X is a random variable with probability density function

f(x) = (β/θ^β) x^(β−1) e^{−(x/θ)^β}   for x > 0, θ > 0, β > 0

and 0 otherwise.
X is said to have a Weibull distribution with parameters β and θ and we write
X ~ Weibull(β, θ).
(a) Verify that

∫_{−∞}^{∞} f(x) dx = 1

(b) What special probability density function is obtained for β = 1?

(c) Graph the probability density functions for
(i) β = 1, θ = 0.5
(ii) β = 2, θ = 0.5
(iii) β = 2, θ = 1
(iv) β = 3, θ = 1
on the same graph.

2.4.12 Exercise
Suppose X is a random variable with probability density function

f(x) = β α^β / x^(β+1)   for x > α, α > 0, β > 0

and 0 otherwise.
X is said to have a Pareto distribution with parameters α and β and we write
X ~ Pareto(α, β).
(a) Verify that

∫_{−∞}^{∞} f(x) dx = 1

(b) Graph the probability density functions for
(i) α = 1, β = 1
(ii) α = 1, β = 2
(iii) α = 0.5, β = 1
(iv) α = 0.5, β = 2
on the same graph.

2.5 Location and Scale Parameters


In Chapter 6 we will look at methods for constructing confidence intervals for an unknown
parameter θ. If the parameter θ is either a location parameter or a scale parameter then a
confidence interval is easier to construct.

2.5.1 Definition - Location Parameter

Suppose X is a continuous random variable with probability density function f(x; θ) where
θ is a parameter of the distribution. Let F_0(x) = F(x; θ = 0) and f_0(x) = f(x; θ = 0).
The parameter θ is called a location parameter of the distribution if

F(x; θ) = F_0(x − θ)   for θ ∈ ℝ

or equivalently
f(x; θ) = f_0(x − θ)   for θ ∈ ℝ

2.5.2 Definition - Scale Parameter

Suppose X is a continuous random variable with probability density function f(x; θ) where
θ is a parameter of the distribution. Let F_1(x) = F(x; θ = 1) and f_1(x) = f(x; θ = 1).
The parameter θ is called a scale parameter of the distribution if

F(x; θ) = F_1(x/θ)   for θ > 0

or equivalently
f(x; θ) = (1/θ) f_1(x/θ)   for θ > 0

2.5.3 Example
Suppose X is a continuous random variable with probability density function

f(x) = (1/β) e^{−(x−θ)/β}   for x ≥ θ, θ ∈ ℝ, β > 0

and 0 otherwise.
X is said to have a Two Parameter Exponential distribution and we write
X ~ Two Parameter Exponential(θ, β).
(a) If X ~ Two Parameter Exponential(θ, 1) show that θ is a location parameter for this
distribution. Sketch the probability density function for θ = −1, 0, 1 on the same graph.
(b) If X ~ Two Parameter Exponential(0, β) show that β is a scale parameter for this
distribution. Sketch the probability density function for β = 0.5, 1, 2 on the same graph.

Solution
(a) For X ~ Two Parameter Exponential(θ, 1) the probability density function is

f(x; θ) = e^{−(x−θ)}   for x ≥ θ, θ ∈ ℝ

and 0 otherwise.
Let
f_0(x) = f(x; θ = 0) = e^(−x)   for x > 0
and 0 otherwise. Then

f(x; θ) = e^{−(x−θ)} = f_0(x − θ)   for all θ ∈ ℝ

and therefore θ is a location parameter of this distribution.

See Figure 2.7 for a sketch of the probability density function for θ = −1, 0, 1.

[Figure 2.7: Two Parameter Exponential(θ, 1) probability density function for θ = −1, 0, 1]

(b) For X ~ Two Parameter Exponential(0, β) the probability density function is

f(x; β) = (1/β) e^(−x/β)   for x > 0, β > 0

and 0 otherwise, which is the Exponential(β) probability density function.

Let
f_1(x) = f(x; β = 1) = e^(−x)   for x > 0
and 0 otherwise. Then

f(x; β) = (1/β) e^(−x/β) = (1/β) f_1(x/β)   for all β > 0

and therefore β is a scale parameter of this distribution.

See Figure 2.8 for a sketch of the probability density function for β = 0.5, 1, 2.

[Figure 2.8: Exponential(β) probability density function for β = 0.5, 1, 2]

2.5.4 Exercise
Suppose X is a continuous random variable with probability density function

f(x) = 1 / (πβ {1 + [(x − θ)/β]²})   for x ∈ ℝ, θ ∈ ℝ, β > 0

X is said to have a two parameter Cauchy distribution and we write
X ~ Cauchy(θ, β).
(a) If X ~ Cauchy(θ, 1) then show that θ is a location parameter for the distribution. Graph
the Cauchy(θ, 1) probability density function for θ = −1, 0 and 1 on the same graph.
(b) If X ~ Cauchy(0, β) then show that β is a scale parameter for the distribution. Graph
the Cauchy(0, β) probability density function for β = 0.5, 1 and 2 on the same graph.

2.6 Functions of a Random Variable

Suppose X is a continuous random variable with probability density function f and cumulative
distribution function F and we wish to find the probability density function of the
random variable Y = h(X) where h is a real-valued function. In this section we look at
techniques for determining the distribution of Y.

2.6.1 Cumulative Distribution Function Technique

A useful technique for determining the distribution of a function of a random variable
Y = h(X) is the cumulative distribution function technique. This technique involves obtaining
an expression for G(y) = P(Y ≤ y), the cumulative distribution function of Y,
in terms of F, the cumulative distribution function of X. The corresponding probability
density function g of Y is found by differentiating G. Care must be taken to determine the
support set of the random variable Y.

2.6.2 Example

If Z ~ N(0, 1) find the probability density function of Y = Z². What type of random
variable is Y?

Solution

If Z ~ N(0, 1) then the probability density function of Z is

f(z) = (1/√(2π)) e^(−z²/2)   for z ∈ ℝ

Let G(y) = P(Y ≤ y) be the cumulative distribution function of Y = Z². Since the support
set of the random variable Z is ℝ and Y = Z² then the support set of Y is B = {y : y > 0}.
For y ∈ B

G(y) = P(Y ≤ y)
     = P(Z² ≤ y)
     = P(−√y ≤ Z ≤ √y)
     = ∫_{−√y}^{√y} (1/√(2π)) e^(−z²/2) dz
     = 2 ∫_0^{√y} (1/√(2π)) e^(−z²/2) dz   since f(z) is an even function

For y ∈ B the probability density function of Y is

g(y) = (d/dy) G(y) = (d/dy) [2 ∫_0^{√y} (1/√(2π)) e^(−z²/2) dz]
     = (2/√(2π)) e^{−(√y)²/2} (d/dy)(√y)
     = (2/√(2π)) e^{−(√y)²/2} [1/(2√y)]                                    (2.2)
     = (1/√(2πy)) e^(−y/2)                                                 (2.3)

by 2.11.10.
We recognize (2.3) as the probability density function of a Chi-squared(1) random variable
so Y = Z² ~ χ²(1).

The above solution provides a proof of the following theorem.

2.6.3 Theorem
If Z ~ N(0, 1) then Z² ~ χ²(1).

2.6.4 Example
Suppose X ~ Exponential(θ). The cumulative distribution function for X is

F(x) = P(X ≤ x) =
    0                x ≤ 0
    1 − e^(−x/θ)     x > 0

Determine the distribution of the random variable Y = F(X) = 1 − e^(−X/θ).

Solution

Since Y = 1 − e^(−X/θ), X = −θ log(1 − Y) = F^(−1)(Y) for X > 0 and 0 < Y < 1.

For 0 < y < 1 the cumulative distribution function of Y is

G(y) = P(Y ≤ y) = P(1 − e^(−X/θ) ≤ y)
     = P(X ≤ −θ log(1 − y))
     = F(−θ log(1 − y))
     = 1 − e^{log(1−y)}
     = 1 − (1 − y)
     = y

If U ~ Uniform(0, 1) then the cumulative distribution function of U is

P(U ≤ u) =
    0    u ≤ 0
    u    0 < u < 1
    1    u ≥ 1

Since G(y) = y for 0 < y < 1, the random variable Y = F(X) = 1 − e^(−X/θ) has a
Uniform(0, 1) distribution.

This is an example of a result which holds more generally as summarized in the following
theorem.

2.6.5 Theorem - Probability Integral Transformation

If X is a continuous random variable with cumulative distribution function F then the
random variable

Y = F(X) = ∫_{−∞}^{X} f(t) dt                                              (2.4)

has a Uniform(0, 1) distribution.

Proof

Suppose the continuous random variable X has support set A = {x : f(x) > 0}. For all
x ∈ A, F is an increasing function since F is a cumulative distribution function. Therefore
for all x ∈ A the function F has an inverse function F^(−1).
For 0 < y < 1, the cumulative distribution function of Y = F(X) is

G(y) = P(Y ≤ y) = P(F(X) ≤ y)
     = P(X ≤ F^(−1)(y))
     = F(F^(−1)(y))
     = y

which is the cumulative distribution function of a Uniform(0, 1) random variable. Therefore
Y = F(X) ~ Uniform(0, 1) as required.

Note: Because of the form of the function (transformation) Y = F(X) in (2.4), this
transformation is called the probability integral transformation. This result holds for any
cumulative distribution function F corresponding to a continuous random variable.

2.6.6 Theorem
Suppose F is a cumulative distribution function for a continuous random variable. If
U ~ Uniform(0, 1) then the random variable X = F^(−1)(U) also has cumulative distribution
function F.

Proof
Suppose that the support set of the random variable X = F^(−1)(U) is A. For x ∈ A, the
cumulative distribution function of X = F^(−1)(U) is

P(X ≤ x) = P(F^(−1)(U) ≤ x)
         = P(U ≤ F(x))
         = F(x)

since P(U ≤ u) = u for 0 < u < 1 if U ~ Uniform(0, 1). Therefore X = F^(−1)(U) has
cumulative distribution function F.

Note: The result of the previous theorem is important because it provides a method for
generating observations from a continuous distribution. Let u be an observation generated
from a Uniform(0, 1) distribution using a random number generator. Then by Theorem
2.6.6, x = F^(−1)(u) is an observation from the distribution with cumulative distribution
function F.

2.6.7 Example
Explain how Theorem 2.6.6 can be used to generate observations from a Weibull(β, 1)
distribution.

Solution
If X has a Weibull(β, 1) distribution then the cumulative distribution function is

F(x; β) = ∫_0^x β y^(β−1) e^{−y^β} dy
        = 1 − e^{−x^β}   for x > 0

The inverse cumulative distribution function is

F^(−1)(u) = [−log(1 − u)]^(1/β)   for 0 < u < 1

If u is an observation from the Uniform(0, 1) distribution then x = [−log(1 − u)]^(1/β) is an
observation from the Weibull(β, 1) distribution by Theorem 2.6.6.
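As an aside (not in the original notes), the following Python sketch, assuming numpy is available, carries out exactly this recipe: Uniform(0, 1) observations are transformed through the inverse cumulative distribution function to give Weibull(β, 1) observations, and the sample mean is compared with the theoretical mean Γ(1/β + 1).

    import numpy as np
    from math import gamma

    rng = np.random.default_rng(2024)
    beta = 2.0                                   # shape parameter, scale = 1
    u = rng.uniform(size=100_000)                # Uniform(0, 1) observations
    x = (-np.log(1 - u)) ** (1 / beta)           # Weibull(beta, 1) observations

    print(x.mean(), gamma(1 / beta + 1))         # simulated vs. theoretical mean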

If we wish to find the distribution of the random variable Y = h(X) and h is a one-to-one
real-valued function then the following theorem can be used.

2.6.8 Theorem - One-to-One Transformation of a Random Variable

Suppose X is a continuous random variable with probability density function f and support
set A = {x : f(x) > 0}. Let Y = h(X) where h is a real-valued function. Let B = {y :
g(y) > 0} be the support set of the random variable Y. If h is a one-to-one function from
A to B and (d/dx) h(x) is continuous for x ∈ A, then the probability density function of Y is

g(y) = f(h^(−1)(y)) |(d/dy) h^(−1)(y)|   for y ∈ B

Proof
We prove this theorem using the cumulative distribution function technique.
(1) Suppose h is an increasing function and (d/dx) h(x) is continuous for x ∈ A. Then h^(−1)(y)
is also an increasing function and (d/dy) h^(−1)(y) > 0 for y ∈ B. The cumulative distribution
function of Y = h(X) is

G(y) = P(Y ≤ y) = P(h(X) ≤ y)
     = P(X ≤ h^(−1)(y))   since h is an increasing function
     = F(h^(−1)(y))

Therefore

g(y) = (d/dy) G(y) = (d/dy) F(h^(−1)(y))
     = F′(h^(−1)(y)) (d/dy) h^(−1)(y)   by the Chain Rule
     = f(h^(−1)(y)) |(d/dy) h^(−1)(y)|   for y ∈ B, since (d/dy) h^(−1)(y) > 0

(2) Suppose h is a decreasing function and (d/dx) h(x) is continuous for x ∈ A. Then h^(−1)(y)
is also a decreasing function and (d/dy) h^(−1)(y) < 0 for y ∈ B. The cumulative distribution
function of Y = h(X) is

G(y) = P(Y ≤ y) = P(h(X) ≤ y)
     = P(X ≥ h^(−1)(y))   since h is a decreasing function
     = 1 − F(h^(−1)(y))

Therefore

g(y) = (d/dy) G(y) = (d/dy) [1 − F(h^(−1)(y))]
     = −F′(h^(−1)(y)) (d/dy) h^(−1)(y)   by the Chain Rule
     = f(h^(−1)(y)) |(d/dy) h^(−1)(y)|   for y ∈ B, since (d/dy) h^(−1)(y) < 0

These two cases give the desired result.

2.6.9 Example
The following two results were used extensively in your previous probability and statistics
courses.
(a) If Z ~ N(0, 1) then Y = μ + σZ ~ N(μ, σ²).
(b) If X ~ N(μ, σ²) then Z = (X − μ)/σ ~ N(0, 1).
Prove these results using Theorem 2.6.8.

Solution
(a) If Z ~ N(0, 1) then the probability density function of Z is

f(z) = (1/√(2π)) e^(−z²/2)   for z ∈ ℝ

Y = μ + σZ = h(Z) is an increasing function with inverse function Z = h^(−1)(Y) = (Y − μ)/σ.
Since the support set of Z is A = ℝ, the support set of Y is B = ℝ.
Since

(d/dy) h^(−1)(y) = (d/dy) [(y − μ)/σ] = 1/σ

then by Theorem 2.6.8 the probability density function of Y is

g(y) = f(h^(−1)(y)) |(d/dy) h^(−1)(y)|
     = (1/(σ√(2π))) e^{−[(y−μ)/σ]²/2}   for y ∈ ℝ

which is the probability density function of an N(μ, σ²) random variable.
Therefore if Z ~ N(0, 1) then Y = μ + σZ ~ N(μ, σ²).
(b) If X ~ N(μ, σ²) then the probability density function of X is

f(x) = (1/(σ√(2π))) e^{−[(x−μ)/σ]²/2}   for x ∈ ℝ

Z = (X − μ)/σ = h(X) is an increasing function with inverse function X = h^(−1)(Z) = μ + σZ.
Since the support set of X is A = ℝ, the support set of Z is B = ℝ.
Since

(d/dz) h^(−1)(z) = (d/dz)(μ + σz) = σ

then by Theorem 2.6.8 the probability density function of Z is

g(z) = f(h^(−1)(z)) |(d/dz) h^(−1)(z)|
     = (1/√(2π)) e^(−z²/2)   for z ∈ ℝ

which is the probability density function of an N(0, 1) random variable.
Therefore if X ~ N(μ, σ²) then Z = (X − μ)/σ ~ N(0, 1).

2.6.10 Example
Use Theorem 2.6.8 to prove the following relationship between the Pareto distribution and
the Exponential distribution.

If X ~ Pareto(1, β) then Y = log(X) ~ Exponential(1/β)

Solution
If X ~ Pareto(1, β) then the probability density function of X is

f(x) = β / x^(β+1)   for x ≥ 1, β > 0

Y = log(X) = h(X) is an increasing function with inverse function X = e^Y = h^(−1)(Y).
Since the support set of X is A = {x : x ≥ 1}, the support set of Y is B = {y : y > 0}.
Since

(d/dy) h^(−1)(y) = (d/dy) e^y = e^y

then by Theorem 2.6.8 the probability density function of Y is

g(y) = f(h^(−1)(y)) |(d/dy) h^(−1)(y)|
     = [β / (e^y)^(β+1)] e^y = β e^(−βy)   for y > 0

which is the probability density function of an Exponential(1/β) random variable as required.

2.6.11 Exercise
Use Theorem 2.6.8 to prove the following relationship between the Exponential distribution
and the Weibull distribution.

If X ~ Exponential(1) then Y = θX^(1/β) ~ Weibull(β, θ)   for y > 0, θ > 0, β > 0

2.6.12 Exercise
Suppose X is a random variable with probability density function

f(x) = θ x^(θ−1)   for 0 < x < 1, θ > 0

and 0 otherwise.
Use Theorem 2.6.8 to prove that Y = −log X ~ Exponential(1/θ).

2.7 Expectation
In this section we define the expectation operator E which maps random variables to real
numbers. These numbers have an interpretation in terms of long run averages for repeated
independent trials of an experiment associated with the random variable. Much of this
section is a review of material covered in a previous probability course.

2.7.1 Definition - Expectation

Suppose h(x) is a real-valued function.
If X is a discrete random variable with probability function f(x) and support set A then
the expectation of the random variable h(X) is defined by

E[h(X)] = Σ_{x∈A} h(x) f(x)

provided the sum converges absolutely, that is, provided

E(|h(X)|) = Σ_{x∈A} |h(x)| f(x) < ∞

If X is a continuous random variable with probability density function f(x) then the
expectation of the random variable h(X) is defined by

E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx

provided the integral converges absolutely, that is, provided

E(|h(X)|) = ∫_{−∞}^{∞} |h(x)| f(x) dx < ∞

If E(|h(X)|) = ∞ then we say that E[h(X)] does not exist.

E[h(X)] is also called the expected value of the random variable h(X).

2.7.2 Example
Find E(X) if X ~ Geometric(p).

Solution
If X ~ Geometric(p) then

f(x) = p q^x   for x = 0, 1, ...;  q = 1 − p, 0 < p < 1

and

E(X) = Σ_{x∈A} x f(x) = Σ_{x=0}^{∞} x p q^x = Σ_{x=1}^{∞} x p q^x
     = pq Σ_{x=1}^{∞} x q^(x−1)   which converges if 0 < q < 1
     = pq / (1 − q)²   by 2.11.2(2)
     = q/p   if 0 < q < 1

2.7.3 Example
Suppose X ~ Pareto(1, β) with probability density function

f(x) = β / x^(β+1)   for x ≥ 1, β > 0

and 0 otherwise. Find E(X). For what values of β does E(X) exist?

Solution

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_1^∞ x β / x^(β+1) dx
     = ∫_1^∞ β / x^β dx   which converges for β > 1 by (2.8)
     = lim_{b→∞} ∫_1^b β x^(−β) dx = lim_{b→∞} [β x^(−β+1)/(−β+1)]_1^b
     = [β/(β−1)] (1 − lim_{b→∞} 1/b^(β−1))
     = β/(β−1)   for β > 1

Therefore E(X) = β/(β−1) and the mean exists only for β > 1.

2.7.4 Exercise
Suppose X is a nonnegative continuous random variable with cumulative distribution function
F(x) and E(X) < ∞. Show that

E(X) = ∫_0^∞ [1 − F(x)] dx

Hint: Use integration by parts with u = 1 − F(x).

2.7.5 Theorem - Expectation is a Linear Operator

Suppose X is a random variable with probability (density) function f(x), a and b are real
constants, and g(x) and h(x) are real-valued functions. Then

E(aX + b) = aE(X) + b
E[ag(X) + bh(X)] = aE[g(X)] + bE[h(X)]

Proof (Continuous Case)

E(aX + b) = ∫_{−∞}^{∞} (ax + b) f(x) dx
          = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx   by properties of integrals
          = aE(X) + b(1)   by Definition 2.7.1 and Property 2.4.4
          = aE(X) + b

E[ag(X) + bh(X)] = ∫_{−∞}^{∞} [ag(x) + bh(x)] f(x) dx
                 = a ∫_{−∞}^{∞} g(x) f(x) dx + b ∫_{−∞}^{∞} h(x) f(x) dx   by properties of integrals
                 = aE[g(X)] + bE[h(X)]   by Definition 2.7.1

as required.

The following named expectations are used frequently.

2.7.6 Special Expectations

(1) The mean of a random variable
E(X) = μ

(2) The kth moment (about the origin) of a random variable
E(X^k)

(3) The kth moment about the mean of a random variable
E[(X − μ)^k]

(4) The kth factorial moment of a random variable
E(X^(k)) = E[X(X − 1) ··· (X − k + 1)]

(5) The variance of a random variable
Var(X) = E[(X − μ)²] = σ²   where μ = E(X)

2.7.7 Theorem - Properties of Variance

σ² = Var(X)
   = E(X²) − μ²
   = E[X(X − 1)] + μ − μ²

Var(aX + b) = a² Var(X)

and

E(X²) = σ² + μ²

Proof (Continuous Case)

Var(X) = E[(X − μ)²] = E(X² − 2μX + μ²)
       = E(X²) − 2μ E(X) + μ²   by Theorem 2.7.5
       = E(X²) − 2μ² + μ²
       = E(X²) − μ²

Also

Var(X) = E(X²) − μ² = E[X(X − 1) + X] − μ²
       = E[X(X − 1)] + E(X) − μ²   by Theorem 2.7.5
       = E[X(X − 1)] + μ − μ²

Var(aX + b) = E{[aX + b − (aμ + b)]²}
            = E[(aX − aμ)²] = E[a²(X − μ)²]
            = a² E[(X − μ)²]   by Theorem 2.7.5
            = a² Var(X)   by definition

Rearranging σ² = E(X²) − μ² gives

E(X²) = σ² + μ²

2.7.8 Example

If X ~ Binomial(n, p) then show

E(X^(k)) = n^(k) p^k   for k = 1, 2, ...

and thus find E(X) and Var(X).

Solution

E(X^(k)) = Σ_{x=k}^{n} x^(k) (n choose x) p^x (1−p)^(n−x)
         = Σ_{x=k}^{n} n^(k) (n−k choose x−k) p^x (1−p)^(n−x)   by 2.11.4(1)
         = n^(k) Σ_{y=0}^{n−k} (n−k choose y) p^(y+k) (1−p)^(n−y−k)   where y = x − k
         = n^(k) p^k Σ_{y=0}^{n−k} (n−k choose y) p^y (1−p)^((n−k)−y)
         = n^(k) p^k (p + 1 − p)^(n−k)   by 2.11.3(1)
         = n^(k) p^k   for k = 1, 2, ...

For k = 1 we obtain

E(X^(1)) = E(X) = n^(1) p^1 = np

For k = 2 we obtain

E(X^(2)) = E[X(X − 1)] = n^(2) p² = n(n − 1)p²

Therefore

Var(X) = E[X(X − 1)] + μ − μ²
       = n(n − 1)p² + np − n²p²
       = −np² + np
       = np(1 − p)

2.7.9 Exercise
Show the following:
(a) If X ~ Poisson(λ) then E(X^(k)) = λ^k for k = 1, 2, ...
(b) If X ~ Negative Binomial(k, p) then E(X^(j)) = (−k)^(j) [(p−1)/p]^j for j = 1, 2, ...
(c) If X ~ Gamma(α, β) then E(X^p) = β^p Γ(α + p)/Γ(α) for p > −α.
(d) If X ~ Weibull(β, θ) then E(X^k) = θ^k Γ(k/β + 1) for k = 1, 2, ...
In each case find E(X) and Var(X).

Table 2.1 summarizes the differences between the properties of a discrete and continuous
random variable.

Property: c.d.f.
  Discrete: F(x) = P(X ≤ x) = Σ_{t≤x} P(X = t); F is a right-continuous function for all x ∈ ℝ
  Continuous: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt; F is a continuous function for all x ∈ ℝ

Property: p.f./p.d.f.
  Discrete: f(x) = P(X = x)
  Continuous: f(x) = F′(x) ≠ P(X = x) = 0

Property: probability of an event
  Discrete: P(X ∈ E) = Σ_{x∈E} P(X = x) = Σ_{x∈E} f(x)
  Continuous: P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx

Property: total probability
  Discrete: Σ_{x∈A} P(X = x) = Σ_{x∈A} f(x) = 1, where A = support set of X
  Continuous: ∫_{−∞}^{∞} f(x) dx = 1

Property: expectation
  Discrete: E[g(X)] = Σ_{x∈A} g(x) f(x), where A = support set of X
  Continuous: E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Table 2.1 Properties of discrete versus continuous random variables



2.8 Inequalities
In Chapter 5 we consider limiting distributions of a sequence of random variables. The
following inequalities which involve the moments of a distribution are useful for proving
limit theorems.

2.8.1 Markov's Inequality

P(|X| ≥ c) ≤ E(|X|^k) / c^k   for all k, c > 0

Proof (Continuous Case)

Suppose X is a continuous random variable with probability density function f(x). Let

A = {x : |x/c|^k ≥ 1} = {x : |x| ≥ c}   since c > 0

Then

E(|X|^k)/c^k = E(|X/c|^k) = ∫_{−∞}^{∞} |x/c|^k f(x) dx
             = ∫_A |x/c|^k f(x) dx + ∫_{Ā} |x/c|^k f(x) dx
             ≥ ∫_A |x/c|^k f(x) dx   since ∫_{Ā} |x/c|^k f(x) dx ≥ 0
             ≥ ∫_A f(x) dx   since |x/c|^k ≥ 1 for x ∈ A
             = P(|X| ≥ c)

as required. (The proof of the discrete case follows by replacing integrals with sums.)

2.8.2 Chebyshev's Inequality

Suppose X is a random variable with finite mean μ and finite variance σ². Then for any
k > 0

P(|X − μ| ≥ kσ) ≤ 1/k²

2.8.3 Exercise
Use Markov's Inequality to prove Chebyshev's Inequality.
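As an aside (not in the original notes), the following sketch, assuming numpy is available, compares the Chebyshev bound 1/k² with the probability P(|X − μ| ≥ kσ) estimated by simulation for an Exponential(1) random variable, for which μ = σ = 1; the bound is conservative but valid.

    import numpy as np

    rng = np.random.default_rng(330)
    x = rng.exponential(scale=1.0, size=1_000_000)
    mu, sigma = 1.0, 1.0
    for k in (1.5, 2.0, 3.0):
        est = np.mean(np.abs(x - mu) >= k * sigma)   # estimated tail probability
        print(k, est, 1 / k ** 2)                    # estimate vs. Chebyshev bound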

2.9 Variance Stabilizing Transformation

In Chapter 6 we look at methods for constructing a confidence interval for an unknown
parameter θ based on data X. To do this it is often useful to find a transformation, g(X),
of the data X whose variance is approximately constant with respect to θ.
Suppose X is a random variable with finite mean E(X) = θ. Suppose also that X has
finite variance Var(X) = σ²(θ) and standard deviation √Var(X) = σ(θ), also depending
on θ. Let Y = g(X) where g is a differentiable function. By the linear approximation

Y = g(X) ≈ g(θ) + g′(θ)(X − θ)

Therefore
E(Y) ≈ E[g(θ) + g′(θ)(X − θ)] = g(θ)
since
E[g′(θ)(X − θ)] = g′(θ) E[(X − θ)] = 0
Also

Var(Y) ≈ Var[g′(θ)(X − θ)] = [g′(θ)]² Var(X) = [g′(θ)]² σ²(θ)              (2.5)

If we want Var(Y) constant with respect to θ then we should choose g such that

[g′(θ)]² Var(X) = [g′(θ)]² σ²(θ) = constant

In other words we need to solve the differential equation

dg/dθ = k/σ(θ)

where k is a conveniently chosen constant.

2.9.1 Example
If X ~ Poisson(θ) then show that the random variable Y = g(X) = √X has approximately
constant variance.

Solution
If X ~ Poisson(θ) then √Var(X) = σ(θ) = √θ. For g(X) = √X, g′(X) = (1/2) X^(−1/2).
Therefore by (2.5), the variance of Y = g(X) = √X is approximately

[g′(θ)]² σ²(θ) = [(1/2) θ^(−1/2)]² θ = 1/4

which is a constant.
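As an aside (not in the original notes), the following sketch, assuming numpy is available, illustrates this numerically: as the Poisson mean grows, the variance of X grows with it, while the variance of √X stays near 1/4.

    import numpy as np

    rng = np.random.default_rng(1)
    for mean in (2, 5, 10, 50):
        x = rng.poisson(mean, size=500_000)
        print(mean, x.var(), np.sqrt(x).var())   # Var(X) near the mean, Var(sqrt(X)) near 0.25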

2.9.2 Exercise
If X ~ Exponential(θ) then show that the random variable Y = g(X) = log X has
approximately constant variance.

2.10 Moment Generating Functions


If we are given the probability (density) function of a random variable X or the cumulative
distribution function of the random variable X then we can determine everything there is to
know about the distribution of X. There is a third type of function, the moment generating
function, which also uniquely determines a distribution. The moment generating function is
closely related to other transforms used in mathematics, the Laplace and Fourier transforms.
Moment generating functions are a powerful tool for determining the distributions of
functions of random variables (Chapter 4), particularly sums, as well as determining the
limiting distribution of a sequence of random variables (Chapter 5).

2.10.1 Definition - Moment Generating Function

If X is a random variable then M(t) = E(e^(tX)) is called the moment generating function (m.g.f.) of X if this expectation exists for all t ∈ (−h, h) for some h > 0.

Important: When determining the moment generating function M(t) of a random variable the values of t for which the expectation exists should always be stated.

2.10.2 Example
(a) Find the moment generating function of the random variable X ~ Gamma(α, β).
(b) Find the moment generating function of the random variable X ~ Negative Binomial(k, p).

Solution
(a) If X ~ Gamma(α, β) then

M(t) = ∫_{−∞}^{∞} e^(tx) f(x) dx = ∫_0^{∞} e^(tx) x^(α−1) e^(−x/β) / [Γ(α) β^α] dx
     = [1/(Γ(α) β^α)] ∫_0^{∞} x^(α−1) e^(−x(1−βt)/β) dx   which converges for t < 1/β;  let y = x(1 − βt)/β
     = [1/(Γ(α) β^α)] ∫_0^{∞} [βy/(1 − βt)]^(α−1) e^(−y) [β/(1 − βt)] dy
     = [1/(Γ(α)(1 − βt)^α)] ∫_0^{∞} y^(α−1) e^(−y) dy
     = Γ(α)/[Γ(α)(1 − βt)^α]
     = (1 − βt)^(−α)   for t < 1/β

(b) If X ~ Negative Binomial(k, p) then

M(t) = Σ_{x=0}^{∞} e^(tx) C(−k, x) p^k (−q)^x   where q = 1 − p
     = p^k Σ_{x=0}^{∞} C(−k, x) (−qe^t)^x   which converges for qe^t < 1
     = p^k (1 − qe^t)^(−k)   by 2.11.3(2) for e^t < q^(−1)
     = [p/(1 − qe^t)]^k   for t < −log(q)
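As a sanity check on part (a), the Gamma m.g.f. can be compared with a direct numerical evaluation of E(e^(tX)); the parameter values below are arbitrary, chosen so that t < 1/β.

```python
import numpy as np
from scipy import integrate, special

# Numerically verify M(t) = (1 - beta*t)^(-alpha) for X ~ Gamma(alpha, beta).
alpha, beta, t = 2.5, 1.5, 0.3     # illustrative values with t < 1/beta

def integrand(x):
    # e^{tx} times the Gamma(alpha, beta) density
    return np.exp(t * x) * x**(alpha - 1) * np.exp(-x / beta) / (special.gamma(alpha) * beta**alpha)

numeric, _ = integrate.quad(integrand, 0, np.inf)
print(numeric, (1 - beta * t) ** (-alpha))   # the two values should agree closely
```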

2.10.3 Exercise
(a) Show that the moment generating function of the random variable X ~ Binomial(n, p) is M(t) = (q + pe^t)^n for t ∈ ℝ.
(b) Show that the moment generating function of the random variable X ~ Poisson(θ) is M(t) = e^(θ(e^t − 1)) for t ∈ ℝ.

If the moment generating function of a random variable X exists then the following theorem gives us a method for determining the distribution of the random variable Y = aX + b which is a linear function of X.

2.10.4 Theorem - Moment Generating Function of a Linear Function

Suppose the random variable X has moment generating function M_X(t) defined for t ∈ (−h, h) for some h > 0. Let Y = aX + b where a, b ∈ ℝ and a ≠ 0. Then the moment generating function of Y is

M_Y(t) = e^(bt) M_X(at)   for |t| < h/|a|

Proof

M_Y(t) = E(e^(tY)) = E(e^(t(aX+b)))
       = e^(bt) E(e^(atX))   which exists for |at| < h
       = e^(bt) M_X(at)   for |t| < h/|a|

as required.

2.10.5 Example
(a) Find the moment generating function of Z ~ N(0, 1).
(b) Use (a) and the fact that X = μ + σZ ~ N(μ, σ²) to find the moment generating function of a N(μ, σ²) random variable.

Solution
(a) The moment generating function of Z is

M_Z(t) = ∫_{−∞}^{∞} e^(tz) (1/√(2π)) e^(−z²/2) dz
       = ∫_{−∞}^{∞} (1/√(2π)) e^(−(z² − 2tz)/2) dz
       = e^(t²/2) ∫_{−∞}^{∞} (1/√(2π)) e^(−(z−t)²/2) dz
       = e^(t²/2)   for t ∈ ℝ

by 2.4.4(2) since
(1/√(2π)) e^(−(z−t)²/2)
is the probability density function of a N(t, 1) random variable.
(b) By Theorem 2.10.4 the moment generating function of X = μ + σZ is

M_X(t) = e^(μt) M_Z(σt)
       = e^(μt) e^((σt)²/2)
       = e^(μt + σ²t²/2)   for t ∈ ℝ

2.10.6 Exercise
If X ~ Negative Binomial(k, p) then find the moment generating function of Y = X + k, k = 1, 2, ...

2.10.7 Theorem - Moments from Moment Generating Function

Suppose the random variable X has moment generating function M(t) defined for t ∈ (−h, h) for some h > 0. Then M(0) = 1 and

M^(k)(0) = E(X^k)   for k = 1, 2, ...

where
M^(k)(t) = d^k/dt^k M(t)
is the kth derivative of M(t).

Proof (Continuous Case)

Note that
M(0) = E(X^0) = E(1) = 1
and also that

d^k/dt^k e^(tx) = x^k e^(tx),   k = 1, 2, ...    (2.6)

The result (2.6) can be proved by induction.
Now if X is a continuous random variable with moment generating function M(t) defined for t ∈ (−h, h) for some h > 0 then

M^(k)(t) = d^k/dt^k E(e^(tX))
         = d^k/dt^k ∫_{−∞}^{∞} e^(tx) f(x) dx
         = ∫_{−∞}^{∞} [d^k/dt^k e^(tx)] f(x) dx,   k = 1, 2, ...

assuming the operations of differentiation and integration can be exchanged. (This interchange of operations cannot always be done but for the moment generating functions of interest in this course the result does hold.)
Using (2.6) we have

M^(k)(t) = ∫_{−∞}^{∞} x^k e^(tx) f(x) dx = E(X^k e^(tX)),   t ∈ (−h, h) for some h > 0

Letting t = 0 we obtain
M^(k)(0) = E(X^k),   k = 1, 2, ...
as required.

2.10.8 Example
If X ~ Gamma(α, β) then M(t) = (1 − βt)^(−α), t < 1/β. Find E(X^k), k = 1, 2, ... using Theorem 2.10.7.

Solution

M′(t) = d/dt (1 − βt)^(−α) = αβ(1 − βt)^(−α−1)

so
E(X) = M′(0) = αβ

M″(t) = d²/dt² (1 − βt)^(−α) = α(α + 1)β²(1 − βt)^(−α−2)

so
E(X²) = M″(0) = α(α + 1)β²

Continuing in this manner we have

M^(k)(t) = d^k/dt^k (1 − βt)^(−α) = α(α + 1)···(α + k − 1) β^k (1 − βt)^(−α−k)   for k = 1, 2, ...

so

E(X^k) = M^(k)(0) = α(α + 1)···(α + k − 1) β^k = (α + k − 1)^(k) β^k   for k = 1, 2, ...
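The first two moments from Example 2.10.8 can be checked numerically; this is only an illustration with arbitrarily chosen parameter values.

```python
from scipy import stats

# E(X) = alpha*beta and E(X^2) = alpha*(alpha+1)*beta^2 for X ~ Gamma(alpha, beta).
alpha, beta = 3.0, 2.0
X = stats.gamma(a=alpha, scale=beta)   # scipy's shape/scale match (alpha, beta) here

print(X.moment(1), alpha * beta)                     # E(X)
print(X.moment(2), alpha * (alpha + 1) * beta**2)    # E(X^2)
```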

2.10.9 Important Idea

Suppose M^(k)(t), k = 1, 2, ... exists for t ∈ (−h, h) for some h > 0. Then M(t) has a Maclaurin series given by

Σ_{k=0}^{∞} [M^(k)(0)/k!] t^k

where
M^(0)(0) = M(0) = 1

The coefficient of t^k in this power series is equal to

M^(k)(0)/k! = E(X^k)/k!

Therefore if we can obtain a Maclaurin series for M(t), for example, by using the Binomial series or the Exponential series, then we can find E(X^k) by using

E(X^k) = k! × [coefficient of t^k in the Maclaurin series for M(t)]    (2.7)

2.10.10 Example
Suppose X ~ Gamma(α, β). Find E(X^k) by using the Binomial series expansion for M(t) = (1 − βt)^(−α), t < 1/β.

Solution

M(t) = (1 − βt)^(−α)
     = Σ_{k=0}^{∞} C(−α, k) (−βt)^k   for |βt| < 1   by 2.11.3(2)
     = Σ_{k=0}^{∞} C(−α, k) (−β)^k t^k   for |t| < 1/β

The coefficient of t^k in M(t) is

C(−α, k) (−β)^k   for k = 1, 2, ...

Therefore

E(X^k) = k! × coefficient of t^k in the Maclaurin series for M(t)
       = k! C(−α, k) (−β)^k
       = k! [(−α)^(k)/k!] (−β)^k
       = (−α)(−α − 1)···(−α − k + 1) (−β)^k
       = (α + k − 1)(α + k − 2)···(α + 1)(α) β^k
       = (α + k − 1)^(k) β^k   for k = 1, 2, ...

which is the same result as obtained in Example 2.10.8.


Moment generating functions are particularly useful for …nding distributions of sums of
independent random variables. The following theorem plays an important role in this
technique.

2.10.11 Uniqueness Theorem for Moment Generating Functions


Suppose the random variable X has moment generating function MX (t) and the random
variable Y has moment generating function MY (t). Suppose also that MX (t) = MY (t) for
all t 2 ( h; h) for some h > 0. Then X and Y have the same distribution, that is,
P (X s) = FX (s) = FY (s) = P (Y s) for all s 2 <.

Proof
See Problem 18 for the proof of this result in the discrete case.
2.10. MOMENT GENERATING FUNCTIONS 51

2.10.12 Example
If X ~ Exponential(1) then find the distribution of Y = θ + βX where β > 0 and θ ∈ ℝ.

Solution
From Example 2.4.10 we know that if X ~ Exponential(1) then X ~ Gamma(1, 1) so

M_X(t) = 1/(1 − t)   for t < 1

By Theorem 2.10.4

M_Y(t) = e^(θt) M_X(βt)
       = e^(θt)/(1 − βt)   for t < 1/β

By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Two Parameter Exponential(θ, β) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a Two Parameter Exponential(θ, β) distribution.

2.10.13 Example
If X ~ Gamma(α, β), where α is a positive integer, then show 2X/β ~ χ²(2α).

Solution
From Example 2.10.2 the moment generating function of X is

M(t) = (1 − βt)^(−α)   for t < 1/β

By Theorem 2.10.4 the moment generating function of Y = 2X/β is

M_Y(t) = M_X(2t/β)   for |t| < 1/2
       = [1/(1 − β(2t/β))]^α   for |t| < 1/2
       = (1 − 2t)^(−α)   for |t| < 1/2

By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a χ²(2α) random variable if α is a positive integer. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a χ²(2α) distribution if α is a positive integer.
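Example 2.10.13 is easy to check by simulation; the parameter values below are arbitrary, with α a positive integer as required.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, beta = 3, 1.7

# If X ~ Gamma(alpha, beta) then 2X/beta should behave like chi-square(2*alpha).
x = stats.gamma(a=alpha, scale=beta).rvs(size=100_000, random_state=rng)
y = 2 * x / beta

for q in [0.25, 0.5, 0.9]:
    print(q, np.quantile(y, q), stats.chi2(df=2 * alpha).ppf(q))
```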

2.10.14 Exercise
Suppose the random variable X has moment generating function

M(t) = e^(t²/2)   for t ∈ ℝ

(a) Use (2.7) and the Exponential series 2.11.7 to find E(X) and Var(X).
(b) Find the moment generating function of Y = 2X − 1. What is the distribution of Y?

2.11 Calculus Review

2.11.1 Geometric Series

Σ_{x=0}^{∞} a t^x = a + at + at² + ··· = a/(1 − t)   for |t| < 1

2.11.2 Useful Results

(1)
Σ_{x=0}^{∞} t^x = 1/(1 − t)   for |t| < 1

(2)
Σ_{x=1}^{∞} x t^(x−1) = 1/(1 − t)²   for |t| < 1

2.11.3 Binomial Series

(1) For n ∈ Z⁺ (the positive integers)

(a + b)^n = Σ_{x=0}^{n} C(n, x) a^(n−x) b^x

where
C(n, x) = n!/[x!(n − x)!] = n^(x)/x!

(2) For n ∈ Q (the rational numbers) and |t| < 1

(1 + t)^n = Σ_{x=0}^{∞} C(n, x) t^x

where
C(n, x) = n^(x)/x! = n(n − 1)···(n − x + 1)/x!

2.11.4 Important Identities

(1) C(n, x) x^(k) = n^(k) C(n − k, x − k)

(2) C(x + k − 1, x) = C(x + k − 1, k − 1) = (−1)^x C(−k, x)

2.11.5 Multinomial Theorem

If n is a positive integer and a₁, a₂, ..., a_k are real numbers, then

(a₁ + a₂ + ··· + a_k)^n = Σ Σ ··· Σ [n!/(x₁! x₂! ··· x_k!)] a₁^(x₁) a₂^(x₂) ··· a_k^(x_k)

where the summation extends over all non-negative integers x₁, x₂, ..., x_k with x₁ + x₂ + ··· + x_k = n.

2.11.6 Hypergeometric Identity

Σ_{x=0}^{∞} C(a, x) C(b, n − x) = C(a + b, n)

2.11.7 Exponential Series

e^x = 1 + x/1! + x²/2! + ··· = Σ_{n=0}^{∞} x^n/n!   for x ∈ ℝ

2.11.8 Logarithmic Series

ln(1 + x) = log(1 + x) = x − x²/2 + x³/3 − ···   for −1 < x ≤ 1

2.11.9 First Fundamental Theorem of Calculus (FTC1)

If f is continuous on [a, b] then the function g defined by

g(x) = ∫_a^x f(t) dt   for a ≤ x ≤ b

is continuous on [a, b], differentiable on (a, b) and g′(x) = f(x).

2.11.10 Fundamental Theorem of Calculus and the Chain Rule

Suppose we want the derivative with respect to x of G(x) where

G(x) = ∫_a^{h(x)} f(t) dt   for a ≤ x ≤ b

and h(x) is a differentiable function on [a, b]. If we define

g(u) = ∫_a^u f(t) dt

then G(x) = g(h(x)). Then by the Chain Rule

G′(x) = g′(h(x)) h′(x) = f(h(x)) h′(x)   for a < x < b
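The result in 2.11.10 can be verified symbolically for a particular choice of f and h; the functions below are arbitrary examples.

```python
import sympy as sp

# Check that d/dx of G(x) = integral_0^{h(x)} f(t) dt equals f(h(x)) * h'(x)
# for f(t) = exp(-t**2) and h(x) = x**3 (illustrative choices).
x, t = sp.symbols('x t')
f = sp.exp(-t**2)
h = x**3

G = sp.integrate(f, (t, 0, h))
lhs = sp.diff(G, x)
rhs = f.subs(t, h) * sp.diff(h, x)
print(sp.simplify(lhs - rhs))   # prints 0
```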

2.11.11 Improper Integrals

(a) If ∫_a^b f(x) dx exists for every number b ≥ a then

∫_a^{∞} f(x) dx = lim_{b→∞} ∫_a^b f(x) dx

provided this limit exists. If the limit exists we say the improper integral converges; otherwise we say the improper integral diverges.

(b) If ∫_a^b f(x) dx exists for every number a ≤ b then

∫_{−∞}^b f(x) dx = lim_{a→−∞} ∫_a^b f(x) dx

provided this limit exists.

(c) If both ∫_a^{∞} f(x) dx and ∫_{−∞}^a f(x) dx are convergent then we define

∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^a f(x) dx + ∫_a^{∞} f(x) dx

where a is any real number.

2.11.12 Comparison Test for Improper Integrals

Suppose that f and g are continuous functions with f(x) ≥ g(x) ≥ 0 for x ≥ a.
(a) If ∫_a^{∞} f(x) dx is convergent then ∫_a^{∞} g(x) dx is convergent.
(b) If ∫_a^{∞} g(x) dx is divergent then ∫_a^{∞} f(x) dx is divergent.

2.11.13 Useful Result for Comparison Test

∫_1^{∞} (1/x^p) dx converges if and only if p > 1    (2.8)

2.11.14 Useful Inequalities

1/(1 + y^p) ≤ 1/y^p   for y ≥ 1, p > 0
1/(1 + y^p) ≥ 1/(y^p + y^p) = 1/(2y^p)   for y ≥ 1, p > 0

2.11.15 Taylor's Theorem

Suppose f is a real-valued function such that the derivatives f^(1), f^(2), ..., f^(n+1) all exist on an open interval containing the point x = a. Then

f(x) = f(a) + f^(1)(a)(x − a) + [f^(2)(a)/2!](x − a)² + ··· + [f^(n)(a)/n!](x − a)^n + [f^(n+1)(c)/(n + 1)!](x − a)^(n+1)

for some c between a and x.

2.12 Chapter 2 Problems


1. Consider the following functions:

(a) f(x) = kxθ^x for x = 1, 2, ...; 0 < θ < 1
(b) f(x) = k[1 + (x/θ)²]^(−1) for x ∈ ℝ, θ > 0
(c) f(x) = ke^(−|x−θ|) for x ∈ ℝ, θ ∈ ℝ
(d) f(x) = k(1 − x)^θ for 0 < x < 1, θ > 0
(e) f(x) = kx²e^(−x/θ) for x > 0, θ > 0
(f) f(x) = kx^(−(θ+1)) for x ≥ 1, θ > 0
(g) f(x) = ke^(−x/θ)[1 + e^(−x/θ)]^(−2) for x ∈ ℝ, θ > 0
(h) f(x) = kx^(−3)e^(−1/(θx)) for x > 0, θ > 0
(i) f(x) = k(1 + x)^(−θ) for x > 0, θ > 1
(j) f(x) = k(1 − x)x^(θ−1) for 0 < x < 1, θ > 0

In each case:
(1) Determine k so that f(x) is a probability (density) function and sketch f(x).
(2) Let X be a random variable with probability (density) function f(x). Find the cumulative distribution function of X.
(3) Find E(X) and Var(X) using the probability (density) function. Indicate the values of θ for which E(X) and Var(X) exist.
(4) Find P(0.5 < X ≤ 2) and P(X > 0.5 | X ≤ 2).

In (a) use θ = 0.3, in (b) use θ = 1, in (c) use θ = 0, in (d) use θ = 5, in (e) use θ = 1, in (f) use θ = 1, in (g) use θ = 2, in (h) use θ = 1, in (i) use θ = 2, in (j) use θ = 3.

2. Determine if θ is a location parameter, a scale parameter, or neither for the distributions in (b)-(j) of Problem 1.

3. (a) If X ~ Weibull(2, θ) then show θ is a scale parameter for this distribution.
   (b) If X ~ Uniform(0, θ) then show θ is a scale parameter for this distribution.

4. Suppose X is a continuous random variable with probability density function

f(x) = ke^(−(x−θ)²/2)   for |x − θ| ≤ c
f(x) = ke^(−c|x−θ| + c²/2)   for |x − θ| > c

(a) Show that

1/k = (2/c)e^(−c²/2) + √(2π)[2Φ(c) − 1]

where Φ is the N(0, 1) cumulative distribution function.
(b) Find the cumulative distribution function of X, E(X) and Var(X).
(c) Show that θ is a location parameter for this distribution.
(d) On the same graph sketch f(x) for c = 1, θ = 0, f(x) for c = 2, θ = 0 and the N(0, 1) probability density function. What do you notice?

5. The Geometric and Exponential distributions both have a property referred to as the memoryless property.

(a) Suppose X ~ Geometric(p). Show that P(X ≥ k + j | X ≥ k) = P(X ≥ j) where k and j are nonnegative integers. Explain why this is called the memoryless property.
(b) Show that if Y ~ Exponential(θ) then P(Y ≥ a + b | Y ≥ a) = P(Y ≥ b) where a > 0 and b > 0.

6. Suppose that f₁(x), f₂(x), ..., f_k(x) are probability density functions with support sets A₁, A₂, ..., A_k; means μ₁, μ₂, ..., μ_k; and finite variances σ₁², σ₂², ..., σ_k² respectively. Suppose that 0 < p₁, p₂, ..., p_k < 1 and Σ_{i=1}^{k} p_i = 1.

(a) Show that g(x) = Σ_{i=1}^{k} p_i f_i(x) is a probability density function.
(b) Let X be a random variable with probability density function g(x). Find the support set of X, E(X) and Var(X).

7.

(a) If X ~ Gamma(α, β) then find the probability density function of Y = e^X.
(b) If X ~ Gamma(α, β) then show Y = 1/X ~ Inverse Gamma(α, β).
(c) If X ~ Gamma(k, β) then show Y = 2X/β ~ χ²(2k) for k = 1, 2, ....
(d) If X ~ N(μ, σ²) then find the probability density function of Y = e^X.
(e) If X ~ N(μ, σ²) then find the probability density function of Y = X^(−1).
(f) If X ~ Uniform(−π/2, π/2) then show that Y = tan X ~ Cauchy(1, 0).
(g) If X ~ Pareto(α, β) then show that Y = α log(X/β) ~ Exponential(1).
(h) If X ~ Weibull(2, θ) then show that Y = X² ~ Exponential(θ²).
(i) If X ~ Double Exponential(0, 1) then find the probability density function of Y = X².
(j) If X ~ t(k) then show that Y = X² ~ F(1, k).

8. Suppose T ~ t(n).

(a) Show that E(T) = 0 if n > 1.
(b) Show that Var(T) = n/(n − 2) if n > 2.

9. Suppose X ~ Beta(a, b).

(a) Find E(X^k) for k = 1, 2, .... Use this result to find E(X) and Var(X).
(b) Graph the probability density function for (i) a = 0.7, b = 0.7, (ii) a = 1, b = 3, (iii) a = 2, b = 2, (iv) a = 2, b = 4, and (v) a = 3, b = 1 on the same graph.
(c) What special probability density function is obtained for a = b = 1?

10. If E(|X|^k) exists for some integer k > 1, then show that E(|X|^j) exists for j = 1, 2, ..., k − 1.

11. If X ~ Binomial(n, θ), find the variance stabilizing transformation g(X) such that Var[g(X)] is approximately constant.

12. Prove that for any random variable X,

E(X⁴) ≥ (1/4) P(X² ≥ 1/2)

13. For each of the following probability (density) functions derive the moment generating function M(t). State the values for which M(t) exists and use the moment generating function to find the mean and variance.

(a) f(x) = C(n, x) p^x (1 − p)^(n−x) for x = 0, 1, ..., n; 0 < p < 1
(b) f(x) = θ^x e^(−θ)/x! for x = 0, 1, ...; θ > 0
(c) f(x) = (1/β) e^(−(x−θ)/β) for x > θ; θ ∈ ℝ, β > 0
(d) f(x) = (1/2) e^(−|x−θ|) for x ∈ ℝ; θ ∈ ℝ
(e) f(x) = 2x for 0 < x < 1
(f) f(x) = x for 0 ≤ x ≤ 1; f(x) = 2 − x for 1 < x ≤ 2; f(x) = 0 otherwise

14. Suppose X is a random variable with moment generating function M(t) = E(e^(tX)) which exists for t ∈ (−h, h) for some h > 0. Then K(t) = log M(t) is called the cumulant generating function of X.

(a) Show that E(X) = K′(0) and Var(X) = K″(0).
(b) If X ~ Negative Binomial(k, p) then use (a) to find E(X) and Var(X).

15. For each of the following find the Maclaurin series for M(t) using known series. Thus determine all the moments of X if X is a random variable with moment generating function M(t):

(a) M(t) = (1 − t)^(−3) for |t| < 1
(b) M(t) = (1 + t)/(1 − t) for |t| < 1
(c) M(t) = e^t/(1 − t²) for |t| < 1

16. Suppose Z ~ N(0, 1) and Y = |Z|.

(a) Show that M_Y(t) = 2Φ(t)e^(t²/2) for t ∈ ℝ where Φ(t) is the cumulative distribution function of a N(0, 1) random variable.
(b) Use (a) to find E(|Z|) and Var(|Z|).

17. Suppose X ~ χ²(1) and Z ~ N(0, 1). Use the properties of moment generating functions to compute E(X^k) and E(Z^k) for k = 1, 2, .... How are these two related? Is this what you expected?

18. Suppose X and Y are discrete random variables such that P(X = j) = p_j and P(Y = j) = q_j for j = 0, 1, .... Suppose also that M_X(t) = M_Y(t) for t ∈ (−h, h), h > 0. Show that X and Y have the same distribution. (Hint: Compare M_X(log s) and M_Y(log s) and recall that if two power series are equal then their coefficients are equal.)

19. Suppose X is a random variable with moment generating function M(t) = e^t/(1 − t²) for |t| < 1.

(a) Find the moment generating function of Y = (X − 1)/2.
(b) Use the moment generating function of Y to find E(Y) and Var(Y).
(c) What is the distribution of Y?
3. Multivariate Random Variables

Models for real phenomena usually involve more than a single random variable. When there are multiple random variables associated with an experiment or process we usually denote them as X, Y, ... or as X₁, X₂, .... For example, your final mark in a course might involve X₁ = your assignment mark, X₂ = your midterm test mark, and X₃ = your exam mark. We need to extend the ideas introduced in Chapter 2 for univariate random variables to deal with multivariate random variables.

In Section 3.1 we begin by defining the joint and marginal cumulative distribution functions since these definitions hold regardless of what type of random variable we have. We define these functions in the case of two random variables X and Y. More than two random variables will be considered in specific examples in later sections. In Section 3.2 we briefly review discrete joint probability functions and marginal probability functions that were introduced in a previous probability course. In Section 3.3 we introduce the ideas needed for two continuous random variables and look at detailed examples since this is new material. In Section 3.4 we define independence for two random variables and show how the Factorization Theorem for Independence can be used. When two random variables are not independent then we are interested in conditional distributions. In Section 3.5 we review the definition of a conditional probability function for discrete random variables and define a conditional probability density function for continuous random variables, which is new material. In Section 3.6 we review expectations of functions of discrete random variables. We also define expectations of functions of continuous random variables, which is new material except for the case of Normal random variables. In Section 3.7 we define conditional expectations which arise from the conditional distributions discussed in Section 3.5. In Section 3.8 we discuss moment generating functions for two or more random variables, and show how the Factorization Theorem for Moment Generating Functions can be used to prove that random variables are independent. In Section 3.9 we review the Multinomial distribution and its properties. In Section 3.10 we introduce the very important Bivariate Normal distribution and its properties. Section 3.11 contains some useful results related to evaluating double integrals.


3.1 Joint and Marginal Cumulative Distribution Functions


We begin with the definitions and properties of the cumulative distribution functions associated with two random variables.

3.1.1 Definition - Joint Cumulative Distribution Function

Suppose X and Y are random variables defined on a sample space S. The joint cumulative distribution function of X and Y is given by

F(x, y) = P(X ≤ x, Y ≤ y)   for (x, y) ∈ ℝ²

3.1.2 Properties - Joint Cumulative Distribution Function

(1) F is non-decreasing in x for fixed y
(2) F is non-decreasing in y for fixed x
(3) lim_{x→−∞} F(x, y) = 0 and lim_{y→−∞} F(x, y) = 0
(4) lim_{(x,y)→(−∞,−∞)} F(x, y) = 0 and lim_{(x,y)→(∞,∞)} F(x, y) = 1

3.1.3 Definition - Marginal Distribution Function

The marginal cumulative distribution function of X is given by

F₁(x) = lim_{y→∞} F(x, y) = P(X ≤ x)   for x ∈ ℝ

The marginal cumulative distribution function of Y is given by

F₂(y) = lim_{x→∞} F(x, y) = P(Y ≤ y)   for y ∈ ℝ

Note: The definitions and properties of the joint cumulative distribution function and the marginal cumulative distribution functions hold for both (X, Y) discrete random variables and for (X, Y) continuous random variables.
Joint and marginal cumulative distribution functions for discrete random variables are not very convenient for determining probabilities. Joint and marginal probability functions, which are defined in Section 3.2, are more frequently used for discrete random variables. In Section 3.3 we look at specific examples of joint and marginal cumulative distribution functions for continuous random variables. In Chapter 5 we will see the important role of cumulative distribution functions in determining asymptotic distributions.

3.2 Bivariate Discrete Distributions


Suppose X and Y are random variables defined on a sample space S. If there is a countable subset A ⊆ ℝ² such that P[(X, Y) ∈ A] = 1, then X and Y are discrete random variables.
Probabilities for discrete random variables are most easily handled in terms of joint probability functions.

3.2.1 Definition - Joint Probability Function

Suppose X and Y are discrete random variables.
The joint probability function of X and Y is given by

f(x, y) = P(X = x, Y = y)   for (x, y) ∈ ℝ²

The set A = {(x, y) : f(x, y) > 0} is called the support set of (X, Y).

3.2.2 Properties of Joint Probability Function

(1) f(x, y) ≥ 0 for (x, y) ∈ ℝ²
(2) Σ Σ_{(x,y) ∈ A} f(x, y) = 1
(3) For any set R ⊆ ℝ²
P[(X, Y) ∈ R] = Σ Σ_{(x,y) ∈ R} f(x, y)

3.2.3 Definition - Marginal Probability Function

Suppose X and Y are discrete random variables with joint probability function f(x, y).
The marginal probability function of X is given by

f₁(x) = P(X = x) = Σ_{all y} f(x, y)   for x ∈ ℝ

and the marginal probability function of Y is given by

f₂(y) = P(Y = y) = Σ_{all x} f(x, y)   for y ∈ ℝ

3.2.4 Example
In a fourth year statistics course there are 10 actuarial science students, 9 statistics students
and 6 math business students. Five students are selected at random without replacement.
Let X be the number of actuarial science students selected, and let Y be the number of
statistics students selected.

Find

(a) the joint probability function of X and Y

(b) the marginal probability function of X

(c) the marginal probability function of Y

(d) P (X > Y )

Solution
(a) The joint probability function of X and Y is

f(x, y) = P(X = x, Y = y) = C(10, x) C(9, y) C(6, 5 − x − y) / C(25, 5)
for x = 0, 1, ..., 5; y = 0, 1, ..., 5; x + y ≤ 5

(b) The marginal probability function of X is

f₁(x) = P(X = x) = Σ_{y} C(10, x) C(9, y) C(6, 5 − x − y) / C(25, 5)
      = [C(10, x) C(15, 5 − x) / C(25, 5)] Σ_{y} [C(9, y) C(6, 5 − x − y) / C(15, 5 − x)]
      = C(10, x) C(15, 5 − x) / C(25, 5)   for x = 0, 1, ..., 5

by the Hypergeometric identity 2.11.6. Note that the marginal probability function of X is Hypergeometric(25, 10, 5). This makes sense because, when we are only interested in the number of actuarial science students, we only have two types of objects (actuarial science students and non-actuarial science students) and we are sampling without replacement which gives us the familiar Hypergeometric probability function.

(c) The marginal probability function of Y is

f₂(y) = P(Y = y) = Σ_{x} C(10, x) C(9, y) C(6, 5 − x − y) / C(25, 5)
      = [C(9, y) C(16, 5 − y) / C(25, 5)] Σ_{x} [C(10, x) C(6, 5 − x − y) / C(16, 5 − y)]
      = C(9, y) C(16, 5 − y) / C(25, 5)   for y = 0, 1, ..., 5

by the Hypergeometric identity 2.11.6. The marginal probability function of Y is Hypergeometric(25, 9, 5).
(d)

P(X > Y) = Σ Σ_{(x,y): x > y} f(x, y)
         = f(1, 0) + f(2, 0) + f(3, 0) + f(4, 0) + f(5, 0)
           + f(2, 1) + f(3, 1) + f(4, 1)
           + f(3, 2)
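The joint probability function in Example 3.2.4 is easy to tabulate directly, which gives a quick numerical check of parts (a) and (d); this is only a verification, not part of the solution.

```python
from math import comb

# Example 3.2.4: 10 actuarial science, 9 statistics, 6 math business students;
# 5 selected at random without replacement.
def f(x, y):
    if x < 0 or y < 0 or x + y > 5:
        return 0.0
    return comb(10, x) * comb(9, y) * comb(6, 5 - x - y) / comb(25, 5)

total = sum(f(x, y) for x in range(6) for y in range(6))              # should equal 1
p_x_gt_y = sum(f(x, y) for x in range(6) for y in range(6) if x > y)  # part (d)
print(total, p_x_gt_y)
```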

3.2.5 Exercise
The Hardy-Weinberg law of genetics states that, under certain conditions, the relative frequencies with which three genotypes AA, Aa and aa occur in the population will be θ², 2θ(1 − θ) and (1 − θ)² respectively where 0 < θ < 1. Suppose n members of a very large population are selected at random.
Let X be the number of AA types selected and let Y be the number of Aa types selected.
Find
(a) the joint probability function of X and Y
(b) the marginal probability function of X
(c) the marginal probability function of Y
(d) P(X + Y = t) for t = 0, 1, ....

3.3 Bivariate Continuous Distributions


Probabilities for continuous random variables can also be specified in terms of joint probability density functions.

3.3.1 Definition - Joint Probability Density Function

Suppose that F(x, y) is a continuous function and that

f(x, y) = ∂²F(x, y)/∂x∂y

exists and is a continuous function except possibly along a finite number of curves. Suppose also that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Then X and Y are said to be continuous random variables with joint probability density function f. The set A = {(x, y) : f(x, y) > 0} is called the support set of (X, Y).

Note: We will arbitrarily define f(x, y) to be equal to 0 when ∂²F(x, y)/∂x∂y does not exist although we could define it to be any real number.

3.3.2 Properties - Joint Probability Density Function

(1) f(x, y) ≥ 0 for all (x, y) ∈ ℝ²
(2) P[(X, Y) ∈ R] = ∫∫_R f(x, y) dx dy for R ⊆ ℝ²
    = the volume under the surface z = f(x, y) and above the region R in the xy plane

3.3.3 Example
Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = x + y   for 0 < x < 1, 0 < y < 1

and 0 otherwise. The support set of (X, Y) is A = {(x, y) : 0 < x < 1, 0 < y < 1}.
The joint probability density function for (x, y) ∈ A is graphed in Figure 3.1. We notice that the surface is the portion of the plane z = x + y lying above the region A.

[Figure 3.1: Graph of joint probability density function for Example 3.3.3]

(a) Show that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

(b) Find
(i) P(X ≤ 1/3, Y ≤ 1/2)
(ii) P(X ≤ Y)
(iii) P(X + Y ≤ 1/2)
(iv) P(XY ≤ 1/2).

Solution
(a) A graph of the support set for (X, Y) is given in Figure 3.2. Such a graph is useful for determining the limits of integration of the double integral.

[Figure 3.2: Graph of the support set of (X, Y) for Example 3.3.3]

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫∫_{(x,y)∈A} (x + y) dx dy
= ∫_0^1 ∫_0^1 (x + y) dx dy
= ∫_0^1 [x²/2 + xy]_{x=0}^{x=1} dy
= ∫_0^1 (1/2 + y) dy
= [y/2 + y²/2]_0^1
= 1/2 + 1/2
= 1

(b) (i) A graph of the region of integration is given in Figure 3.3.

[Figure 3.3: Graph of the integration region for Example 3.3.3(b)(i)]

P(X ≤ 1/3, Y ≤ 1/2) = ∫∫_B (x + y) dx dy
= ∫_0^{1/2} ∫_0^{1/3} (x + y) dx dy
= ∫_0^{1/2} [x²/2 + xy]_{x=0}^{x=1/3} dy
= ∫_0^{1/2} [(1/2)(1/3)² + y/3] dy
= [y/18 + y²/6]_0^{1/2}
= (1/18)(1/2) + (1/6)(1/2)²
= 5/72

(ii) A graph of the region of integration is given in Figure 3.4. Note that when the region is not rectangular then care must be taken with the limits of integration.

[Figure 3.4: Graph of the integration region for Example 3.3.3(b)(ii)]

P(X ≤ Y) = ∫∫_C (x + y) dx dy
= ∫_{y=0}^{1} ∫_{x=0}^{y} (x + y) dx dy
= ∫_0^1 [x²/2 + xy]_{x=0}^{x=y} dy
= ∫_0^1 (y²/2 + y²) dy
= ∫_0^1 (3/2) y² dy = [y³/2]_0^1
= 1/2

Alternatively
P(X ≤ Y) = ∫_{x=0}^{1} ∫_{y=x}^{1} (x + y) dy dx

Why does the answer of 1/2 make sense when you look at Figure 3.1?

(iii) A graph of the region of integration is given in Figure 3.5.

[Figure 3.5: Graph of the region of integration for Example 3.3.3(b)(iii)]

P(X + Y ≤ 1/2) = ∫∫_D (x + y) dx dy
= ∫_{x=0}^{1/2} ∫_{y=0}^{1/2 − x} (x + y) dy dx
= ∫_0^{1/2} [xy + y²/2]_{y=0}^{y=1/2 − x} dx
= ∫_0^{1/2} [x(1/2 − x) + (1/2 − x)²/2] dx
= ∫_0^{1/2} (1/8 − x²/2) dx = [x/8 − x³/6]_0^{1/2}
= 1/16 − 1/48
= 1/24

Alternatively
P(X + Y ≤ 1/2) = ∫_{y=0}^{1/2} ∫_{x=0}^{1/2 − y} (x + y) dx dy

Why does this small probability make sense when you look at Figure 3.1?

(iv) A graph of the region of integration E is given in Figure 3.6. In this example the integration can be done more easily by integrating over the region F, the part of the support set on which xy > 1/2.

[Figure 3.6: Graph of the region of integration for Example 3.3.3(b)(iv)]

P(XY ≤ 1/2) = ∫∫_E (x + y) dx dy
= 1 − ∫∫_F (x + y) dx dy
= 1 − ∫_{x=1/2}^{1} ∫_{y=1/(2x)}^{1} (x + y) dy dx
= 1 − ∫_{1/2}^{1} [xy + y²/2]_{y=1/(2x)}^{y=1} dx
= 1 − ∫_{1/2}^{1} {x + 1/2 − [1/2 + 1/(8x²)]} dx
= 1 − ∫_{1/2}^{1} [x − 1/(8x²)] dx
= 1 − [x²/2 + 1/(8x)]_{1/2}^{1}
= 3/4
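Since the density and the events in Example 3.3.3 are simple, all four probabilities can be double-checked by Monte Carlo; the rejection sampler below uses the bound f(x, y) ≤ 2 on the unit square, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Draw from f(x, y) = x + y on the unit square by rejection sampling.
u = rng.uniform(0, 1, size=(3 * n, 2))
accept = rng.uniform(0, 2, size=3 * n) <= u.sum(axis=1)
x, y = u[accept, 0][:n], u[accept, 1][:n]

print(np.mean((x <= 1/3) & (y <= 1/2)), 5/72)   # (i)
print(np.mean(x <= y), 1/2)                     # (ii)
print(np.mean(x + y <= 1/2), 1/24)              # (iii)
print(np.mean(x * y <= 1/2), 3/4)               # (iv)
```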

3.3.4 Definition of Marginal Probability Density Function

Suppose X and Y are continuous random variables with joint probability density function f(x, y).
The marginal probability density function of X is given by

f₁(x) = ∫_{−∞}^{∞} f(x, y) dy   for x ∈ ℝ

and the marginal probability density function of Y is given by

f₂(y) = ∫_{−∞}^{∞} f(x, y) dx   for y ∈ ℝ
3.3.5 Example
For the joint probability density function in Example 3.3.3 determine:
(a) the marginal probability density function of X and the marginal probability density
function of Y
(b) the joint cumulative distribution function of X and Y
(c) the marginal cumulative distribution function of X and the marginal cumulative distri-
bution function of Y

Solution
(a) The marginal probability density function of X is

f₁(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 (x + y) dy = [xy + y²/2]_0^1 = x + 1/2   for 0 < x < 1

and 0 otherwise.
Since both the joint probability density function f(x, y) and the support set A are symmetric in x and y then by symmetry the marginal probability density function of Y is

f₂(y) = y + 1/2   for 0 < y < 1

and 0 otherwise.
(b) Since

P(X ≤ x, Y ≤ y) = ∫_0^y ∫_0^x (s + t) ds dt = ∫_0^y [s²/2 + st]_{s=0}^{s=x} dt = ∫_0^y (x²/2 + xt) dt
= [x²t/2 + xt²/2]_{t=0}^{t=y}
= (x²y + xy²)/2   for 0 < x < 1, 0 < y < 1

P(X ≤ x, Y ≤ y) = ∫_0^x ∫_0^1 (s + t) dt ds = (x² + x)/2   for 0 < x < 1, y ≥ 1

P(X ≤ x, Y ≤ y) = ∫_0^y ∫_0^1 (s + t) ds dt = (y² + y)/2   for x ≥ 1, 0 < y < 1

the joint cumulative distribution function of X and Y is

F(x, y) = P(X ≤ x, Y ≤ y) =
  0                if x ≤ 0 or y ≤ 0
  (x²y + xy²)/2    if 0 < x < 1, 0 < y < 1
  (x² + x)/2       if 0 < x < 1, y ≥ 1
  (y² + y)/2       if x ≥ 1, 0 < y < 1
  1                if x ≥ 1, y ≥ 1

(c) Since the support set of (X, Y) is A = {(x, y) : 0 < x < 1, 0 < y < 1} then

F₁(x) = P(X ≤ x) = lim_{y→∞} F(x, y) = F(x, 1) = (x² + x)/2   for 0 < x < 1

Alternatively

F₁(x) = P(X ≤ x) = ∫_{−∞}^x f₁(s) ds = ∫_0^x (s + 1/2) ds = (x² + x)/2   for 0 < x < 1

In either case the marginal cumulative distribution function of X is

F₁(x) = 0 for x ≤ 0;   (x² + x)/2 for 0 < x < 1;   1 for x ≥ 1

By symmetry the marginal cumulative distribution function of Y is

F₂(y) = 0 for y ≤ 0;   (y² + y)/2 for 0 < y < 1;   1 for y ≥ 1

3.3.6 Exercise
Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = k/(1 + x + y)³   for x ≥ 0, y ≥ 0

and 0 otherwise.
(a) Determine k and sketch f(x, y).
(b) Find
(i) P(X ≤ 1, Y ≤ 2)
(ii) P(X ≤ Y)
(iii) P(X + Y ≤ 1)
(c) Determine the marginal probability density function of X and the marginal probability density function of Y.
(d) Determine the joint cumulative distribution function of X and Y.
(e) Determine the marginal cumulative distribution function of X and the marginal cumulative distribution function of Y.

3.3.7 Exercise
Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = ke^(−x−y)   for y ≥ x ≥ 0

and 0 otherwise.
(a) Determine k and sketch f(x, y).
(b) Find
(i) P(X ≤ 1, Y ≤ 2)
(ii) P(X ≤ Y)
(iii) P(X + Y ≤ 1)
(c) Determine the marginal probability density function of X and the marginal probability density function of Y.
(d) Determine the joint cumulative distribution function of X and Y.
(e) Determine the marginal cumulative distribution function of X and the marginal cumulative distribution function of Y.

3.4 Independent Random Variables


Suppose we are modeling a phenomenon involving two random variables. For example suppose X is your final mark in this course and Y is the time you spent doing practice problems. We would be interested in whether the distribution of one random variable affects the distribution of the other. The following definition defines this idea precisely. The definition should remind you of the definition of independent events (see Definition 2.1.8).

3.4.1 Definition - Independent Random Variables

Two random variables X and Y are called independent random variables if and only if

P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B)

for all sets A and B of real numbers.

Definition 3.4.1 is not very convenient for determining the independence of two random variables. The following theorem shows how to use the marginal and joint cumulative distribution functions or the marginal and joint probability (density) functions to determine if two random variables are independent.

3.4.2 Theorem - Independent Random Variables

(1) Suppose X and Y are random variables with joint cumulative distribution function F(x, y). Suppose also that F₁(x) is the marginal cumulative distribution function of X and F₂(y) is the marginal cumulative distribution function of Y. Then X and Y are independent random variables if and only if

F(x, y) = F₁(x) F₂(y)   for all (x, y) ∈ ℝ²

(2) Suppose X and Y are random variables with joint probability (density) function f(x, y). Suppose also that f₁(x) is the marginal probability (density) function of X with support set A₁ = {x : f₁(x) > 0} and f₂(y) is the marginal probability (density) function of Y with support set A₂ = {y : f₂(y) > 0}. Then X and Y are independent random variables if and only if

f(x, y) = f₁(x) f₂(y)   for all (x, y) ∈ A₁ × A₂

where A₁ × A₂ = {(x, y) : x ∈ A₁, y ∈ A₂}.

Proof
(1) For given (x, y), let A_x = {s : s ≤ x} and let B_y = {t : t ≤ y}. Then by Definition 3.4.1 X and Y are independent random variables if and only if

P(X ∈ A_x and Y ∈ B_y) = P(X ∈ A_x) P(Y ∈ B_y)

for all (x, y) ∈ ℝ².
But
P(X ∈ A_x and Y ∈ B_y) = P(X ≤ x, Y ≤ y) = F(x, y)
P(X ∈ A_x) = P(X ≤ x) = F₁(x)
and
P(Y ∈ B_y) = F₂(y)
Therefore X and Y are independent random variables if and only if
F(x, y) = F₁(x) F₂(y)   for all (x, y) ∈ ℝ²
as required.
(2) (Continuous Case) From (1) we have X and Y are independent random variables if and only if

F(x, y) = F₁(x) F₂(y)    (3.1)

for all (x, y) ∈ ℝ². Now ∂F₁(x)/∂x exists for x ∈ A₁ and ∂F₂(y)/∂y exists for y ∈ A₂. Taking the partial derivative ∂²/∂x∂y of both sides of (3.1) where the partial derivative exists implies that X and Y are independent random variables if and only if

∂²F(x, y)/∂x∂y = [∂F₁(x)/∂x][∂F₂(y)/∂y]   for all (x, y) ∈ A₁ × A₂

or
f(x, y) = f₁(x) f₂(y)   for all (x, y) ∈ A₁ × A₂
as required.

Note: The discrete case can be proved using an argument similar to the one used for (1).

3.4.3 Example

(a) In Example 3.2.4 determine if X and Y are independent random variables.


(b) In Example 3.3.3 determine if X and Y are independent random variables.

Solution
(a) Since the total number of students is fixed, a larger number of actuarial science students would imply a smaller number of statistics students and we would guess that the random variables are not independent. To show this we only need to find one pair of values (x, y) for which P(X = x, Y = y) ≠ P(X = x)P(Y = y). Since

P(X = 0, Y = 0) = C(10, 0) C(9, 0) C(6, 5) / C(25, 5) = C(6, 5) / C(25, 5)
P(X = 0) = C(10, 0) C(15, 5) / C(25, 5) = C(15, 5) / C(25, 5)
P(Y = 0) = C(9, 0) C(16, 5) / C(25, 5) = C(16, 5) / C(25, 5)

and

P(X = 0, Y = 0) = C(6, 5)/C(25, 5) ≠ P(X = 0)P(Y = 0) = [C(15, 5)/C(25, 5)][C(16, 5)/C(25, 5)]

therefore by Theorem 3.4.2, X and Y are not independent random variables.
(b) Since

f(x, y) = x + y   for 0 < x < 1, 0 < y < 1
f₁(x) = x + 1/2   for 0 < x < 1,   f₂(y) = y + 1/2   for 0 < y < 1

it would appear that X and Y are not independent random variables. To show this we only need to find one pair of values (x, y) for which f(x, y) ≠ f₁(x)f₂(y). Since

f(2/3, 1/3) = 1 ≠ f₁(2/3) f₂(1/3) = (7/6)(5/6)

therefore by Theorem 3.4.2, X and Y are not independent random variables.

3.4.4 Exercise
In Exercises 3.2.5, 3.3.6 and 3.3.7 determine if X and Y are independent random variables.

In the previous examples we determined whether the random variables were independent
using the joint probability (density) function and the marginal probability (density) func-
tions. The following very useful theorem does not require us to determine the marginal
probability (density) functions.

3.4.5 Factorization Theorem for Independence

Suppose X and Y are random variables with joint probability (density) function f(x, y). Suppose also that A is the support set of (X, Y), A₁ is the support set of X, and A₂ is the support set of Y. Then X and Y are independent random variables if and only if there exist non-negative functions g(x) and h(y) such that

f(x, y) = g(x) h(y)   for all (x, y) ∈ A₁ × A₂

Notes:
(1) If the Factorization Theorem for Independence holds then the marginal probability (density) function of X will be proportional to g and the marginal probability (density) function of Y will be proportional to h.
(2) Whenever the support set A is not rectangular the random variables will not be independent. The reason for this is that when the support set is not rectangular it will always be possible to find a point (x, y) such that x ∈ A₁ with f₁(x) > 0, and y ∈ A₂ with f₂(y) > 0 so that f₁(x)f₂(y) > 0, but (x, y) ∉ A so f(x, y) = 0. This means there is a point (x, y) such that f(x, y) ≠ f₁(x)f₂(y) and therefore X and Y are not independent random variables.
(3) The above definitions and theorems can easily be extended to the random vector (X₁, X₂, ..., X_n).

Proof (Continuous Case)

If X and Y are independent random variables then by Theorem 3.4.2

f(x, y) = f₁(x) f₂(y)   for all (x, y) ∈ A₁ × A₂

Letting g(x) = f₁(x) and h(y) = f₂(y) proves there exist g(x) and h(y) such that

f(x, y) = g(x) h(y)   for all (x, y) ∈ A₁ × A₂

If there exist non-negative functions g(x) and h(y) such that

f(x, y) = g(x) h(y)   for all (x, y) ∈ A₁ × A₂

then

f₁(x) = ∫_{−∞}^{∞} f(x, y) dy = g(x) ∫_{y∈A₂} h(y) dy = c g(x)   for x ∈ A₁

and

f₂(y) = ∫_{−∞}^{∞} f(x, y) dx = h(y) ∫_{x∈A₁} g(x) dx = k h(y)   for y ∈ A₂

Now

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{y∈A₂} ∫_{x∈A₁} g(x) h(y) dx dy
  = [∫_{x∈A₁} g(x) dx][∫_{y∈A₂} h(y) dy] = kc

Since ck = 1

f(x, y) = g(x) h(y) = ck g(x) h(y) = [c g(x)][k h(y)] = f₁(x) f₂(y)   for all (x, y) ∈ A₁ × A₂

and by Theorem 3.4.2 X and Y are independent random variables.

3.4.6 Example
Suppose X and Y are discrete random variables with joint probability function

f(x, y) = θ^(x+y) e^(−2θ) / (x! y!)   for x = 0, 1, ...; y = 0, 1, ...

(a) Determine if X and Y are independent random variables.
(b) Determine the marginal probability function of X and the marginal probability function of Y.

Solution
(a) The support set of (X, Y) is A = {(x, y) : x = 0, 1, ...; y = 0, 1, ...} which is rectangular. The support set of X is A₁ = {x : x = 0, 1, ...}, and the support set of Y is A₂ = {y : y = 0, 1, ...}.
Let

g(x) = θ^x e^(−θ)/x!   and   h(y) = θ^y e^(−θ)/y!

Then f(x, y) = g(x)h(y) for all (x, y) ∈ A₁ × A₂. Therefore by the Factorization Theorem for Independence X and Y are independent random variables.
(b) By inspection we can see that g(x) is the probability function for a Poisson(θ) random variable. Therefore the marginal probability function of X is

f₁(x) = θ^x e^(−θ)/x!   for x = 0, 1, ...

Similarly the marginal probability function of Y is

f₂(y) = θ^y e^(−θ)/y!   for y = 0, 1, ...

and Y ~ Poisson(θ).

3.4.7 Example
Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = (3/2) y (1 − x²)   for −1 < x < 1, 0 < y < 1

and 0 otherwise.
(a) Determine if X and Y are independent random variables.
(b) Determine the marginal probability density function of X and the marginal probability density function of Y.

Solution
(a) The support set of (X, Y) is A = {(x, y) : −1 < x < 1, 0 < y < 1} which is rectangular. The support set of X is A₁ = {x : −1 < x < 1}, and the support set of Y is A₂ = {y : 0 < y < 1}.
Let

g(x) = 1 − x²   and   h(y) = (3/2) y

Then f(x, y) = g(x)h(y) for all (x, y) ∈ A₁ × A₂. Therefore by the Factorization Theorem for Independence X and Y are independent random variables.
(b) Since the marginal probability density function of Y is proportional to h(y) we know f₂(y) = k h(y) for 0 < y < 1 where k is determined by

1 = k ∫_0^1 (3/2) y dy = (3k/4) y² |_0^1 = 3k/4

Therefore k = 4/3 and

f₂(y) = 2y   for 0 < y < 1

and 0 otherwise.
Since X and Y are independent random variables f(x, y) = f₁(x) f₂(y) or f₁(x) = f(x, y)/f₂(y) for x ∈ A₁. Therefore the marginal probability density function of X is

f₁(x) = f(x, y)/f₂(y) = [(3/2) y (1 − x²)]/(2y) = (3/4)(1 − x²)   for −1 < x < 1

and 0 otherwise.

3.4.8 Example
Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = 2/π   for 0 < x < √(1 − y²), −1 < y < 1

and 0 otherwise.
(a) Determine if X and Y are independent random variables.
(b) Determine the marginal probability density function of X and the marginal probability density function of Y.

Solution
(a) The support set of (X, Y), which is

A = {(x, y) : 0 < x < √(1 − y²), −1 < y < 1}

is graphed in Figure 3.7.

[Figure 3.7: Graph of the support set of (X, Y) for Example 3.4.8]

The region A can also be described as

A = {(x, y) : 0 < x < 1, −√(1 − x²) < y < √(1 − x²)}

The region A is half of a unit circle which has area equal to π/2. Since the probability density function is constant or uniform on this region we can see that f(x, y) must equal 2/π on the region A since the volume of the solid must be equal to 1.

The support set for X is

A₁ = {x : 0 < x < 1}

and the support set for Y is

A₂ = {y : −1 < y < 1}

If we choose the point (0.9, 0.9) ∈ A₁ × A₂, then f(0.9, 0.9) = 0 but f₁(0.9) > 0 and f₂(0.9) > 0, so f(0.9, 0.9) ≠ f₁(0.9) f₂(0.9) and therefore X and Y are not independent random variables.
(b) When the support set is not rectangular care must be taken to determine the marginal probability density functions.
To find the marginal probability density function of X we use the description of the support set in which the range of X does not depend on y, which is

A = {(x, y) : 0 < x < 1, −√(1 − x²) < y < √(1 − x²)}

The marginal probability density function of X is

f₁(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{−√(1−x²)}^{√(1−x²)} (2/π) dy = (4/π)√(1 − x²)   for 0 < x < 1

and 0 otherwise.
To find the marginal probability density function of Y we use the description of the support set in which the range of Y does not depend on x, which is

A = {(x, y) : 0 < x < √(1 − y²), −1 < y < 1}

The marginal probability density function of Y is

f₂(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^{√(1−y²)} (2/π) dx = (2/π)√(1 − y²)   for −1 < y < 1

and 0 otherwise.
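Because (X, Y) is uniform on the half disc, the marginal f₁(x) = (4/π)√(1 − x²) is easy to check by simulation; the sampler, grid points and bin width below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000

# Sample (X, Y) uniformly on {0 < x < sqrt(1 - y^2), -1 < y < 1} by rejection
# from the rectangle (0, 1) x (-1, 1).
x = rng.uniform(0, 1, 3 * n)
y = rng.uniform(-1, 1, 3 * n)
keep = x**2 + y**2 < 1
x = x[keep][:n]

# Histogram estimate of f1(x) at a few points versus (4/pi)*sqrt(1 - x^2).
h = 0.01
for x0 in [0.1, 0.5, 0.9]:
    print(x0, np.mean(np.abs(x - x0) < h) / (2 * h), 4 / np.pi * np.sqrt(1 - x0**2))
```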

3.5 Conditional Distributions

In Section 2.1 we defined the conditional probability of event A given event B as

P(A|B) = P(A ∩ B)/P(B)   provided P(B) > 0

The concept of conditional probability can also be extended to random variables.

3.5.1 Definition - Conditional Probability (Density) Function

Suppose X and Y are random variables with joint probability (density) function f(x, y), and marginal probability (density) functions f₁(x) and f₂(y) respectively. Suppose also that the support set of (X, Y) is A = {(x, y) : f(x, y) > 0}.
The conditional probability (density) function of X given Y = y is given by

f₁(x|y) = f(x, y)/f₂(y)    (3.2)

for (x, y) ∈ A provided f₂(y) ≠ 0.
The conditional probability (density) function of Y given X = x is given by

f₂(y|x) = f(x, y)/f₁(x)    (3.3)

for (x, y) ∈ A provided f₁(x) ≠ 0.

Notes:
(1) If X and Y are discrete random variables then

f₁(x|y) = P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/f₂(y)

and

Σ_x f₁(x|y) = Σ_x f(x, y)/f₂(y) = [1/f₂(y)] Σ_x f(x, y) = f₂(y)/f₂(y) = 1

Similarly for f₂(y|x).

(2) If X and Y are continuous random variables

∫_{−∞}^{∞} f₁(x|y) dx = ∫_{−∞}^{∞} [f(x, y)/f₂(y)] dx = [1/f₂(y)] ∫_{−∞}^{∞} f(x, y) dx = f₂(y)/f₂(y) = 1

Similarly for f₂(y|x).


(3) If X is a continuous random variable then f1 (x) 6= P (X = x) and P (X = x) = 0 for all
x. Therefore to justify the de…nition of the conditional probability density function of Y
given X = x when X and Y are continuous random variables we consider P (Y yjX = x)
as a limit

P (Y yjX = x) = lim P (Y yjx X x + h)


h!0
R Ry
x+h
f (u; v) dvdu
x 1
= lim
h!0 R
x+h
f1 (u) du
x

d
R Ry
x+h
dh f (u; v) dvdu
x 1
= lim by L’Hôpital’s Rule
h!0
d
R
x+h
dh f1 (u) du
x
Ry
f (x + h; v) dv
1
= lim by the Fundamental Theorem of Calculus
h!0 f1 (x + h)
Ry
limh!0 f (x + h; v) dv
1
=
limh!0 f1 (x + h)
Ry
f (x; v) dv
1
=
f1 (x)
assuming that the limits exist and that integration and the limit operation can be inter-
changed. If we di¤erentiate the last term with respect to y using the Fundamental Theorem
of Calculus we have
d f (x; y)
P (Y yjX = x) =
dy f1 (x)
86 3. MULTIVARIATE RANDOM VARIABLES

which gives us a justi…cation for using

f (x; y)
f2 (yjx) =
f1 (x)

as the conditional probability density function of Y given X = x.


(4) For a given value of x, call it x , we can think of obtaining the conditional probability
density function of Y given X = x geometrically in the following way. Think of the curve
of intersection which is obtained by cutting through the surface z = f (x; y) with the plane
x = x which is parallel to the yz plane. The curve of intersection is z = f (x ; y) which
is a curve lying in the plane x = x . The area under the curve z = f (x ; y) and lying
above the xy plane is not necessarily equal to 1 and therefore z = f (x ; y) is not a proper
probability density function. However if we consider the curve z = f (x ; y) =f1 (x ), which
is just a rescaled version of the curve z = f (x ; y), then the area lying under the curve
z = f (x ; y) =f1 (x ) and lying above the xy plane is equal to 1. This is the probability
density function we want.

3.5.2 Example
In Example 3.4.8 determine the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x.

Solution
The conditional probability density function of X given Y = y is

f₁(x|y) = f(x, y)/f₂(y) = (2/π) / [(2/π)√(1 − y²)] = 1/√(1 − y²)   for 0 < x < √(1 − y²), −1 < y < 1

Note that for each y ∈ (−1, 1), the conditional probability density function of X given Y = y is Uniform(0, √(1 − y²)). This makes sense because the joint probability density function is constant on its support set.
The conditional probability density function of Y given X = x is

f₂(y|x) = f(x, y)/f₁(x) = (2/π) / [(4/π)√(1 − x²)] = 1/[2√(1 − x²)]   for −√(1 − x²) < y < √(1 − x²), 0 < x < 1

Note that for each x ∈ (0, 1), the conditional probability density function of Y given X = x is Uniform(−√(1 − x²), √(1 − x²)). This again makes sense because the joint probability density function is constant on its support set.

3.5.3 Exercise
In Exercise 3.2.5 show that the conditional probability function of Y given X = x is

Binomial(n − x, 2θ(1 − θ)/(1 − θ²))

Why does this make sense?

3.5.4 Exercise
In Example 3.3.3 and Exercises 3.3.6 and 3.3.7 determine the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x. Be sure to check that

∫_{−∞}^{∞} f₁(x|y) dx = 1   and   ∫_{−∞}^{∞} f₂(y|x) dy = 1

When choosing a model for bivariate data it is sometimes easier to specify a conditional probability (density) function and a marginal probability (density) function. The joint probability (density) function can then be determined using the Product Rule which is obtained by rewriting (3.2) and (3.3).

3.5.5 Product Rule

Suppose X and Y are random variables with joint probability (density) function f(x, y), marginal probability (density) functions f₁(x) and f₂(y) respectively and conditional probability (density) functions f₁(x|y) and f₂(y|x). Then

f(x, y) = f₁(x|y) f₂(y) = f₂(y|x) f₁(x)

3.5.6 Example
In modeling survival in a certain insect population it is assumed that the number of eggs laid by a single female follows a Poisson(θ) distribution. It is also assumed that each egg has probability p of surviving independently of any other egg. Determine the probability function of the number of eggs that survive.

Solution
Let Y = number of eggs laid and let X = number of eggs that survive. Then Y ~ Poisson(θ) and X|Y = y ~ Binomial(y, p). We want to determine the marginal probability function of X.
By the Product Rule the joint probability function of X and Y is

f(x, y) = f₁(x|y) f₂(y)
        = C(y, x) p^x (1 − p)^(y−x) · θ^y e^(−θ)/y!
        = p^x e^(−θ) (1 − p)^(y−x) θ^y / [x! (y − x)!]

with support set

A = {(x, y) : x = 0, 1, ..., y; y = 0, 1, ...}

which can also be written as

A = {(x, y) : y = x, x + 1, ...; x = 0, 1, ...}    (3.4)

The marginal probability function of X can be obtained using

f₁(x) = Σ_{all y} f(x, y)

Since we are summing over y we need to use the second description of the support set given in (3.4). So

f₁(x) = Σ_{y=x}^{∞} p^x e^(−θ) (1 − p)^(y−x) θ^y / [x! (y − x)!]
      = [p^x θ^x e^(−θ) / x!] Σ_{y=x}^{∞} [θ(1 − p)]^(y−x) / (y − x)!   let u = y − x
      = [(pθ)^x e^(−θ) / x!] Σ_{u=0}^{∞} [θ(1 − p)]^u / u!
      = [(pθ)^x e^(−θ) / x!] e^(θ(1−p))   by the Exponential series 2.11.7
      = (pθ)^x e^(−pθ) / x!   for x = 0, 1, ...

which we recognize as a Poisson(pθ) probability function.
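The thinning result in Example 3.5.6 can be illustrated by simulation; the values of θ and p below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
theta, p, n = 6.0, 0.3, 200_000

eggs = rng.poisson(theta, size=n)        # Y ~ Poisson(theta)
survivors = rng.binomial(eggs, p)        # X | Y = y ~ Binomial(y, p)

# The empirical distribution of X should match Poisson(p * theta).
for x in range(5):
    print(x, np.mean(survivors == x), stats.poisson(p * theta).pmf(x))
```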

3.5.7 Example
Determine the marginal probability density function of X if Y ~ Gamma(α, 1/β) and the conditional distribution of X given Y = y is Weibull(p, y^(−1/p)).

Solution
Since Y ~ Gamma(α, 1/β)

f₂(y) = β^α y^(α−1) e^(−βy) / Γ(α)   for y > 0

and 0 otherwise.
Since the conditional distribution of X given Y = y is Weibull(p, y^(−1/p))

f₁(x|y) = p y x^(p−1) e^(−yx^p)   for x > 0

By the Product Rule the joint probability density function of X and Y is

f(x, y) = f₁(x|y) f₂(y)
        = p y x^(p−1) e^(−yx^p) · β^α y^(α−1) e^(−βy) / Γ(α)
        = [p β^α x^(p−1) / Γ(α)] y^α e^(−y(β + x^p))

The support set is

A = {(x, y) : x > 0, y > 0}

which is a rectangular region.
The marginal probability density function of X is

f₁(x) = ∫_{−∞}^{∞} f(x, y) dy
      = [p β^α x^(p−1) / Γ(α)] ∫_0^{∞} y^α e^(−y(β + x^p)) dy   let u = y(β + x^p)
      = [p β^α x^(p−1) / Γ(α)] ∫_0^{∞} [u/(β + x^p)]^α e^(−u) [1/(β + x^p)] du
      = [p β^α x^(p−1) / Γ(α)] [1/(β + x^p)]^(α+1) ∫_0^{∞} u^α e^(−u) du
      = [p β^α x^(p−1) / Γ(α)] [1/(β + x^p)]^(α+1) Γ(α + 1)   by 2.4.8
      = α p β^α x^(p−1) / (β + x^p)^(α+1)   since Γ(α + 1) = αΓ(α)

for x > 0 and 0 otherwise. This distribution is a member of the Burr family of distributions which is frequently used by actuaries for modeling household income, crop prices, insurance risk, and many other financial variables.

The following theorem gives us one more method for determining whether two random variables are independent.

3.5.8 Theorem
Suppose X and Y are random variables with marginal probability (density) functions f₁(x) and f₂(y) respectively and conditional probability (density) functions f₁(x|y) and f₂(y|x). Suppose also that A₁ is the support set of X, and A₂ is the support set of Y. Then X and Y are independent random variables if and only if either of the following holds

f₁(x|y) = f₁(x)   for all x ∈ A₁

or

f₂(y|x) = f₂(y)   for all y ∈ A₂

3.5.9 Example
Suppose the conditional distribution of X given Y = y is

f₁(x|y) = e^(−x)/(1 − e^(−y))   for 0 < x < y

and 0 otherwise. Are X and Y independent random variables?

Solution
Since the conditional distribution of X given Y = y depends on y then f₁(x|y) = f₁(x) cannot hold for all x in the support set of X and therefore X and Y are not independent random variables.

3.6 Joint Expectations

As with univariate random variables we define the expectation operator for bivariate random variables. The discrete case is a review of material you would have seen in a previous probability course.

3.6.1 Definition - Joint Expectation

Suppose h(x, y) is a real-valued function.
If X and Y are discrete random variables with joint probability function f(x, y) and support set A then

E[h(X, Y)] = Σ Σ_{(x,y) ∈ A} h(x, y) f(x, y)

provided the joint sum converges absolutely.
If X and Y are continuous random variables with joint probability density function f(x, y) then

E[h(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy

provided the joint integral converges absolutely.

3.6.2 Theorem
Suppose X and Y are random variables with joint probability (density) function f(x, y), a and b are real constants, and g(x, y) and h(x, y) are real-valued functions. Then

E[ag(X, Y) + bh(X, Y)] = aE[g(X, Y)] + bE[h(X, Y)]

Proof (Continuous Case)

E[ag(X, Y) + bh(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [ag(x, y) + bh(x, y)] f(x, y) dx dy
= a ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy + b ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy
  by properties of double integrals
= aE[g(X, Y)] + bE[h(X, Y)]   by Definition 3.6.1

3.6.3 Corollary
(1)
E(aX + bY) = aE(X) + bE(Y) = aμ_X + bμ_Y
where μ_X = E(X) and μ_Y = E(Y).

(2) If X₁, X₂, ..., X_n are random variables and a₁, a₂, ..., a_n are real constants then

E(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i E(X_i) = Σ_{i=1}^{n} a_i μ_i

where μ_i = E(X_i).
(3) If X₁, X₂, ..., X_n are random variables with E(X_i) = μ, i = 1, 2, ..., n then

E(X̄) = (1/n) Σ_{i=1}^{n} E(X_i) = (1/n) Σ_{i=1}^{n} μ = nμ/n = μ

Proof of (1) (Continuous Case)

E(aX + bY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (ax + by) f(x, y) dx dy
= a ∫_{−∞}^{∞} x [∫_{−∞}^{∞} f(x, y) dy] dx + b ∫_{−∞}^{∞} y [∫_{−∞}^{∞} f(x, y) dx] dy
= a ∫_{−∞}^{∞} x f₁(x) dx + b ∫_{−∞}^{∞} y f₂(y) dy
= aE(X) + bE(Y) = aμ_X + bμ_Y

3.6.4 Theorem - Expectation and Independence

(1) If X and Y are independent random variables and g(x) and h(y) are real valued functions then

E[g(X)h(Y)] = E[g(X)] E[h(Y)]

(2) More generally if X₁, X₂, ..., X_n are independent random variables and h₁, h₂, ..., h_n are real valued functions then

E[Π_{i=1}^{n} h_i(X_i)] = Π_{i=1}^{n} E[h_i(X_i)]

Proof of (1) (Continuous Case)

Since X and Y are independent random variables then by Theorem 3.4.2

f(x, y) = f₁(x) f₂(y)   for all (x, y) ∈ A

where A is the support set of (X, Y). Therefore

E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f₁(x) f₂(y) dx dy
= ∫_{−∞}^{∞} h(y) f₂(y) [∫_{−∞}^{∞} g(x) f₁(x) dx] dy
= E[g(X)] ∫_{−∞}^{∞} h(y) f₂(y) dy
= E[g(X)] E[h(Y)]

3.6.5 Definition - Covariance

The covariance of random variables X and Y is defined by

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

If Cov(X, Y) = 0 then X and Y are called uncorrelated random variables.

3.6.6 Theorem - Covariance and Independence

If X and Y are random variables then

Cov(X, Y) = E(XY) − μ_X μ_Y

If X and Y are independent random variables then Cov(X, Y) = 0.

Proof

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
= E(XY − μ_X Y − μ_Y X + μ_X μ_Y)
= E(XY) − μ_X E(Y) − μ_Y E(X) + μ_X μ_Y
= E(XY) − E(X)E(Y) − E(Y)E(X) + E(X)E(Y)
= E(XY) − E(X)E(Y)

Now if X and Y are independent random variables then by Theorem 3.6.4
E(XY) = E(X)E(Y) and therefore Cov(X, Y) = 0.

3.6.7 Theorem - Variance of a Linear Combination

(1) Suppose X and Y are random variables and a and b are real constants then

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y)
             = a²σ²_X + b²σ²_Y + 2abCov(X, Y)

(2) Suppose X₁, X₂, ..., X_n are random variables with Var(X_i) = σ²_i and a₁, a₂, ..., a_n are real constants then

Var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² σ²_i + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} a_i a_j Cov(X_i, X_j)

(3) If X₁, X₂, ..., X_n are independent random variables and a₁, a₂, ..., a_n are real constants then

Var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² σ²_i

(4) If X₁, X₂, ..., X_n are independent random variables with Var(X_i) = σ², i = 1, 2, ..., n then

Var(X̄) = Var((1/n) Σ_{i=1}^{n} X_i) = (1/n²) Σ_{i=1}^{n} Var(X_i) = nσ²/n² = σ²/n

Proof of (1)

Var(aX + bY) = E[(aX + bY − aμ_X − bμ_Y)²]
= E{[a(X − μ_X) + b(Y − μ_Y)]²}
= E[a²(X − μ_X)² + b²(Y − μ_Y)² + 2ab(X − μ_X)(Y − μ_Y)]
= a²E[(X − μ_X)²] + b²E[(Y − μ_Y)²] + 2abE[(X − μ_X)(Y − μ_Y)]
= a²σ²_X + b²σ²_Y + 2abCov(X, Y)

3.6.8 Definition - Correlation Coefficient

The correlation coefficient of random variables X and Y is defined by

ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y)

3.6.9 Example
For the joint probability density function in Example 3.3.3 find ρ(X, Y).

Solution
Z1 Z1 Z1 Z1
E (XY ) = xyf (x; y)dxdy = xy (x + y) dxdy
1 1 0 0
Z1 Z1 Z1
1 3 1
= x2 y + xy 2 dxdy = x y + x2 y 2 j10 dy
3 2
0 0 0
Z1
1 1 1 2 1 3 1 1 1
= y + y 2 dy = y + y j0 = +
3 2 6 6 6 6
0
1
=
3
3.6. JOINT EXPECTATIONS 95

Z1 Z1 Z1
1 1
E (X) = xf1 (x)dx = x x+ dx = x2 + x dx
2 2
1 0 0
1 3 1 2 1
= x + x j0
3 4
1 1 7
= + =
3 4 12

Z1 Z1 Z1
2 2 2 1 1
E X = x f1 (x)dx = x x+ dx = x3 + x2 dx
2 2
1 0 0
1 4 1 3 1
= x + x j0
4 6
1 1 3+2 5
= + = =
4 6 12 12

2
5 7
V ar (X) = E X 2 [E (X)]2 =
12 12
60 49 11
= =
144 144
By symmetry
7 11
E (Y ) = and V ar (Y ) =
12 144
Therefore
1 7 7 48 49 1
Cov (X; Y ) = E (XY ) E (X) E (Y ) = = =
3 12 12 144 144

and
Cov(X; Y )
(X; Y ) =
X Y
1
q 144
=
11 11
144 144
1 144
=
144 11
1
=
11
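The exact value $\rho(X,Y) = -1/11$ can be checked quickly by simulation. The following short sketch is an illustrative addition (it is not part of these notes); it assumes Python with NumPy, and the sample size and seed are arbitrary choices. It samples from $f(x,y)=x+y$ on the unit square by rejection sampling and compares the sample correlation with $-1/11 \approx -0.0909$.

```python
# Monte Carlo check of Example 3.6.9 (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(330)
n_draws = 200_000

# Rejection sampling: f(x, y) = x + y <= 2 on the unit square, so accept a
# uniformly generated point (x, y) with probability (x + y) / 2.
x = rng.uniform(size=n_draws)
y = rng.uniform(size=n_draws)
u = rng.uniform(size=n_draws)
keep = 2.0 * u <= x + y
x, y = x[keep], y[keep]

print("sample correlation:", np.corrcoef(x, y)[0, 1])   # should be close to -0.0909
print("exact value       :", -1 / 11)
```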

3.6.10 Exercise
For the joint probability density function in Exercise 3.3.7 find $\rho(X,Y)$.

3.6.11 Theorem
If $\rho(X,Y)$ is the correlation coefficient of random variables $X$ and $Y$ then
$$-1 \le \rho(X,Y) \le 1$$
$\rho(X,Y)=1$ if and only if $Y=aX+b$ for some $a>0$, and $\rho(X,Y)=-1$ if and only if $Y=aX+b$ for some $a<0$.

Proof
Let $S = X+tY$, where $t \in \Re$. Then $E(S)=\mu_S$ and
$$Var(S) = E\left[(S-\mu_S)^2\right] = E\left\{[(X+tY)-(\mu_X+t\mu_Y)]^2\right\} = E\left\{[(X-\mu_X)+t(Y-\mu_Y)]^2\right\}$$
$$= E\left[(X-\mu_X)^2+2t(X-\mu_X)(Y-\mu_Y)+t^2(Y-\mu_Y)^2\right] = t^2\sigma_Y^2+2Cov(X,Y)t+\sigma_X^2$$
Now $Var(S) \ge 0$ for any $t \in \Re$ implies that the quadratic $t^2\sigma_Y^2+2Cov(X,Y)t+\sigma_X^2$ in the variable $t$ must have at most one real root. To have at most one real root the discriminant of this quadratic must be less than or equal to zero. Therefore
$$[2Cov(X,Y)]^2-4\sigma_X^2\sigma_Y^2 \le 0$$
or
$$[Cov(X,Y)]^2 \le \sigma_X^2\sigma_Y^2$$
or
$$|\rho(X,Y)| = \frac{|Cov(X,Y)|}{\sigma_X\sigma_Y} \le 1$$
and therefore
$$-1 \le \rho(X,Y) \le 1$$
To see that $\rho(X,Y)=\pm 1$ corresponds to a linear relationship between $X$ and $Y$, note that $\rho(X,Y)=\pm 1$ implies
$$|Cov(X,Y)| = \sigma_X\sigma_Y$$
and therefore
$$[2Cov(X,Y)]^2-4\sigma_X^2\sigma_Y^2 = 0$$
which corresponds to a zero discriminant in the quadratic. This means that there exists one real number $t^*$ for which
$$Var(S) = Var(X+t^*Y) = 0$$
But $Var(X+t^*Y)=0$ implies $X+t^*Y$ must equal a constant, that is, $X+t^*Y=c$. Thus $X$ and $Y$ satisfy a linear relationship.

3.7 Conditional Expectation

Since conditional probability (density) functions are also probability (density) functions, expectations can be defined in terms of these conditional probability (density) functions as in the following definition.

3.7.1 Definition - Conditional Expectation

The conditional expectation of $g(X)$ given $Y=y$ is given by
$$E[g(X)|y] = \sum_{x} g(x)f_1(x|y)$$
if $X$ is a discrete random variable and
$$E[g(X)|y] = \int_{-\infty}^{\infty} g(x)f_1(x|y)\,dx$$
if $X$ is a continuous random variable, provided the sum/integral converges absolutely.

The conditional expectation of $h(Y)$ given $X=x$ is defined in a similar manner.

3.7.2 Special Cases

(1) The conditional mean of $X$ given $Y=y$ is denoted by $E(X|y)$.

(2) The conditional variance of $X$ given $Y=y$ is denoted by $Var(X|y)$ and is given by
$$Var(X|y) = E\left\{[X-E(X|y)]^2\,\big|\,y\right\} = E\left(X^2|y\right)-[E(X|y)]^2$$

3.7.3 Example

For the joint probability density function in Example 3.4.8 find $E(Y|x)$, the conditional mean of $Y$ given $X=x$, and $Var(Y|x)$, the conditional variance of $Y$ given $X=x$.

Solution
From Example 3.5.2 we have
$$f_2(y|x) = \frac{1}{2\sqrt{1-x^2}} \quad \text{for } -\sqrt{1-x^2} < y < \sqrt{1-x^2},\ 0<x<1$$

Therefore
$$E(Y|x) = \int_{-\infty}^{\infty} yf_2(y|x)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y\,\frac{1}{2\sqrt{1-x^2}}\,dy = \frac{1}{2\sqrt{1-x^2}}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y\,dy = \frac{1}{4\sqrt{1-x^2}}\,y^2\Big|_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = 0$$
Since $E(Y|x)=0$,
$$Var(Y|x) = E\left(Y^2|x\right) = \int_{-\infty}^{\infty} y^2f_2(y|x)\,dy = \frac{1}{2\sqrt{1-x^2}}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y^2\,dy = \frac{1}{6\sqrt{1-x^2}}\,y^3\Big|_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = \frac{1}{3}\left(1-x^2\right)$$
Recall that the conditional distribution of $Y$ given $X=x$ is Uniform$\left(-\sqrt{1-x^2},\sqrt{1-x^2}\right)$. The results above can be verified by noting that if $U \sim \text{Uniform}(a,b)$ then $E(U)=\frac{a+b}{2}$ and $Var(U)=\frac{(b-a)^2}{12}$.

3.7.4 Exercise
In Exercises 3.5.3 and 3.3.7 find $E(Y|x)$, $Var(Y|x)$, $E(X|y)$ and $Var(X|y)$.

3.7.5 Theorem
If $X$ and $Y$ are independent random variables then $E[g(X)|y] = E[g(X)]$ and $E[h(Y)|x] = E[h(Y)]$.

Proof (Continuous Case)

$$E[g(X)|y] = \int_{-\infty}^{\infty} g(x)f_1(x|y)\,dx = \int_{-\infty}^{\infty} g(x)f_1(x)\,dx \quad \text{by Theorem 3.5.8}$$
$$= E[g(X)]$$
as required. $E[h(Y)|x] = E[h(Y)]$ follows in a similar manner.

3.7.6 Definition

$E[g(X)|Y]$ is the function of the random variable $Y$ whose value is $E[g(X)|y]$ when $Y=y$. This means of course that $E[g(X)|Y]$ is a random variable.

3.7.7 Example

In Example 3.5.6 the joint model was specified by $Y \sim \text{Poisson}(\theta)$ and $X|Y=y \sim \text{Binomial}(y,p)$, and we showed that $X \sim \text{Poisson}(p\theta)$. Determine $E(X|Y=y)$, $E(X|Y)$, $E[E(X|Y)]$, and $E(X)$. What do you notice about $E[E(X|Y)]$ and $E(X)$?

Solution
Since $X|Y=y \sim \text{Binomial}(y,p)$,
$$E(X|Y=y) = py$$
and
$$E(X|Y) = pY$$
which is a random variable. Since $Y \sim \text{Poisson}(\theta)$ and $E(Y)=\theta$,
$$E[E(X|Y)] = E(pY) = pE(Y) = p\theta$$
Now since $X \sim \text{Poisson}(p\theta)$, we have $E(X)=p\theta$.

We notice that
$$E[E(X|Y)] = E(X)$$
The following theorem indicates that this result holds generally.



3.7.8 Theorem
Suppose $X$ and $Y$ are random variables. Then
$$E\{E[g(X)|Y]\} = E[g(X)]$$
Proof (Continuous Case)
$$E\{E[g(X)|Y]\} = E\left[\int_{-\infty}^{\infty} g(x)f_1(x|y)\,dx\right] = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(x)f_1(x|y)\,dx\right]f_2(y)\,dy$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)f_1(x|y)f_2(y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)\left[\int_{-\infty}^{\infty} f(x,y)\,dy\right]dx$$
$$= \int_{-\infty}^{\infty} g(x)f_1(x)\,dx = E[g(X)]$$

3.7.9 Corollary - Law of Total Expectation

Suppose $X$ and $Y$ are random variables. Then
$$E[E(X|Y)] = E(X)$$
Proof
Let $g(X)=X$ in Theorem 3.7.8 and the result follows.

3.7.10 Theorem - Law of Total Variance

Suppose $X$ and $Y$ are random variables. Then
$$Var(X) = E[Var(X|Y)] + Var[E(X|Y)]$$
Proof
$$Var(X) = E\left(X^2\right)-[E(X)]^2 = E\left[E\left(X^2|Y\right)\right]-\{E[E(X|Y)]\}^2 \quad \text{by Theorem 3.7.8}$$
$$= E\left\{E\left(X^2|Y\right)-[E(X|Y)]^2\right\} + E\left\{[E(X|Y)]^2\right\}-\{E[E(X|Y)]\}^2$$
$$= E[Var(X|Y)] + Var[E(X|Y)]$$

When the joint model is specified in terms of a conditional distribution for $X|Y=y$ and a marginal distribution for $Y$, Theorems 3.7.8 and 3.7.10 give a method for finding expectations of functions of $X$ without having to determine the marginal distribution of $X$.

3.7.11 Example
Suppose $P \sim \text{Uniform}(0,0.1)$ and $Y|P=p \sim \text{Binomial}(10,p)$. Find $E(Y)$ and $Var(Y)$.

Solution
Since $P \sim \text{Uniform}(0,0.1)$,
$$E(P) = \frac{0+0.1}{2} = \frac{1}{20}, \qquad Var(P) = \frac{(0.1-0)^2}{12} = \frac{1}{1200}$$
and
$$E\left(P^2\right) = Var(P)+[E(P)]^2 = \frac{1}{1200}+\left(\frac{1}{20}\right)^2 = \frac{1}{1200}+\frac{1}{400} = \frac{1+3}{1200} = \frac{4}{1200} = \frac{1}{300}$$
Since $Y|P=p \sim \text{Binomial}(10,p)$,
$$E(Y|p) = 10p, \qquad E(Y|P) = 10P$$
and
$$Var(Y|p) = 10p(1-p), \qquad Var(Y|P) = 10P(1-P) = 10\left(P-P^2\right)$$
Therefore
$$E(Y) = E[E(Y|P)] = E(10P) = 10E(P) = 10\left(\frac{1}{20}\right) = \frac{1}{2}$$
and
$$Var(Y) = E[Var(Y|P)]+Var[E(Y|P)] = E\left[10\left(P-P^2\right)\right]+Var(10P)$$
$$= 10\left[E(P)-E\left(P^2\right)\right]+100\,Var(P) = 10\left(\frac{1}{20}-\frac{1}{300}\right)+100\left(\frac{1}{1200}\right) = \frac{11}{20}$$
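The two laws can also be illustrated by a short simulation. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, and the sample size and seed are arbitrary. It draws $P$ and then $Y|P=p$ exactly as in the example and compares the simulated mean and variance of $Y$ with $1/2$ and $11/20$.

```python
# Simulation check of Example 3.7.11 (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(2024)
n_draws = 500_000

p = rng.uniform(0.0, 0.1, size=n_draws)   # P ~ Uniform(0, 0.1)
y = rng.binomial(10, p)                   # Y | P = p ~ Binomial(10, p)

print("simulated E(Y)  :", y.mean(), "  exact:", 0.5)       # E(Y) = 1/2
print("simulated Var(Y):", y.var(),  "  exact:", 11 / 20)   # Var(Y) = 11/20
```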

3.7.12 Exercise
In Example 3.5.7 find $E(X)$ and $Var(X)$ using Corollary 3.7.9 and Theorem 3.7.10.

3.7.13 Exercise
Suppose $P \sim \text{Beta}(a,b)$ and $Y|P=p \sim \text{Geometric}(p)$. Find $E(Y)$ and $Var(Y)$.

3.8 Joint Moment Generating Functions

Moment generating functions can also be defined for bivariate and multivariate random variables. As mentioned previously, moment generating functions are a powerful tool for determining the distributions of functions of random variables (Chapter 4), particularly sums, as well as determining the limiting distribution of a sequence of random variables (Chapter 5).

3.8.1 Definition - Joint Moment Generating Function

If $X$ and $Y$ are random variables then
$$M(t_1,t_2) = E\left(e^{t_1X+t_2Y}\right)$$
is called the joint moment generating function of $X$ and $Y$ if this expectation exists (joint sum/integral converges absolutely) for all $t_1 \in (-h_1,h_1)$ for some $h_1>0$, and all $t_2 \in (-h_2,h_2)$ for some $h_2>0$.
More generally, if $X_1,X_2,\ldots,X_n$ are random variables then
$$M(t_1,t_2,\ldots,t_n) = E\left[\exp\left(\sum_{i=1}^{n} t_iX_i\right)\right]$$
is called the joint moment generating function of $X_1,X_2,\ldots,X_n$ if this expectation exists for all $t_i \in (-h_i,h_i)$ for some $h_i>0$, $i=1,2,\ldots,n$.

If the joint moment generating function is known then it is straightforward to obtain the moment generating functions of the marginal distributions.

3.8.2 Important Note

If $M(t_1,t_2)$ exists for all $t_1 \in (-h_1,h_1)$ for some $h_1>0$, and all $t_2 \in (-h_2,h_2)$ for some $h_2>0$, then the moment generating function of $X$ is given by
$$M_X(t) = E\left(e^{tX}\right) = M(t,0) \quad \text{for } t \in (-h_1,h_1)$$
and the moment generating function of $Y$ is given by
$$M_Y(t) = E\left(e^{tY}\right) = M(0,t) \quad \text{for } t \in (-h_2,h_2)$$

3.8.3 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = e^{-y} \quad \text{for } 0<x<y<\infty$$
and 0 otherwise.

(a) Find the joint moment generating function of $X$ and $Y$.

(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?

(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?

Solution
(a) The joint moment generating function is
$$M(t_1,t_2) = E\left(e^{t_1X+t_2Y}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x+t_2y}f(x,y)\,dx\,dy = \int_{y=0}^{\infty}\int_{x=0}^{y} e^{t_1x+t_2y}e^{-y}\,dx\,dy$$
$$= \int_{0}^{\infty} e^{t_2y-y}\left(\int_{0}^{y} e^{t_1x}\,dx\right)dy = \int_{0}^{\infty} e^{t_2y-y}\,\frac{1}{t_1}\left(e^{t_1y}-1\right)dy = \frac{1}{t_1}\int_{0}^{\infty}\left[e^{-(1-t_1-t_2)y}-e^{-(1-t_2)y}\right]dy$$
which converges for $t_1+t_2<1$ and $t_2<1$.

Therefore
$$M(t_1,t_2) = \frac{1}{t_1}\lim_{b\to\infty}\left[\frac{-1}{1-t_1-t_2}\,e^{-(1-t_1-t_2)y}\Big|_{0}^{b}+\frac{1}{1-t_2}\,e^{-(1-t_2)y}\Big|_{0}^{b}\right] = \frac{1}{t_1}\left(\frac{1}{1-t_1-t_2}-\frac{1}{1-t_2}\right)$$
$$= \frac{1}{t_1}\cdot\frac{(1-t_2)-(1-t_1-t_2)}{(1-t_1-t_2)(1-t_2)} = \frac{1}{(1-t_1-t_2)(1-t_2)} \quad \text{for } t_1+t_2<1 \text{ and } t_2<1$$

(b) The moment generating function of $X$ is
$$M_X(t) = E\left(e^{tX}\right) = M(t,0) = \frac{1}{(1-t-0)(1-0)} = \frac{1}{1-t} \quad \text{for } t<1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an Exponential(1) distribution.

(c) The moment generating function of $Y$ is
$$M_Y(t) = E\left(e^{tY}\right) = M(0,t) = \frac{1}{(1-0-t)(1-t)} = \frac{1}{(1-t)^2} \quad \text{for } t<1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Gamma(2,1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a Gamma(2,1) distribution.

3.8.4 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = e^{-x-y} \quad \text{for } x>0,\ y>0$$
and 0 otherwise.
(a) Find the joint moment generating function of $X$ and $Y$.
(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?
(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?

Solution
(a) The joint moment generating function is
$$M(t_1,t_2) = E\left(e^{t_1X+t_2Y}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x+t_2y}f(x,y)\,dx\,dy = \int_{0}^{\infty}\int_{0}^{\infty} e^{t_1x+t_2y}e^{-x-y}\,dx\,dy$$
$$= \left(\int_{0}^{\infty} e^{-y(1-t_2)}\,dy\right)\left(\int_{0}^{\infty} e^{-x(1-t_1)}\,dx\right) \quad \text{which converges for } t_1<1,\ t_2<1$$
$$= \frac{1}{1-t_1}\cdot\frac{1}{1-t_2} \quad \text{for } t_1<1,\ t_2<1$$

(b) The moment generating function of $X$ is
$$M_X(t) = E\left(e^{tX}\right) = M(t,0) = \frac{1}{(1-t)(1-0)} = \frac{1}{1-t} \quad \text{for } t<1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an Exponential(1) distribution.

(c) The moment generating function of $Y$ is
$$M_Y(t) = E\left(e^{tY}\right) = M(0,t) = \frac{1}{(1-0)(1-t)} = \frac{1}{1-t} \quad \text{for } t<1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has an Exponential(1) distribution.

3.8.5 Theorem
If $X$ and $Y$ are random variables with joint moment generating function $M(t_1,t_2)$ which exists for all $t_1 \in (-h_1,h_1)$ for some $h_1>0$, and all $t_2 \in (-h_2,h_2)$ for some $h_2>0$, then
$$E\left(X^jY^k\right) = \frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}\,M(t_1,t_2)\bigg|_{(t_1,t_2)=(0,0)}$$

Proof
See Problem 11(a).

3.8.6 Independence Theorem for Moment Generating Functions

Suppose $X$ and $Y$ are random variables with joint moment generating function $M(t_1,t_2)$ which exists for all $t_1 \in (-h_1,h_1)$ for some $h_1>0$, and all $t_2 \in (-h_2,h_2)$ for some $h_2>0$. Then $X$ and $Y$ are independent random variables if and only if
$$M(t_1,t_2) = M_X(t_1)M_Y(t_2)$$
for all $t_1 \in (-h_1,h_1)$ and $t_2 \in (-h_2,h_2)$, where $M_X(t_1)=M(t_1,0)$ and $M_Y(t_2)=M(0,t_2)$.

Proof
See Problem 11(b).

3.8.7 Example
Use Theorem 3.8.6 to determine if $X$ and $Y$ are independent random variables in Examples 3.8.3 and 3.8.4.

Solution
For Example 3.8.3
$$M(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)} \quad \text{for } t_1+t_2<1 \text{ and } t_2<1$$
$$M_X(t_1) = \frac{1}{1-t_1} \quad \text{for } t_1<1, \qquad M_Y(t_2) = \frac{1}{(1-t_2)^2} \quad \text{for } t_2<1$$
Since
$$M\left(\tfrac{1}{4},\tfrac{1}{4}\right) = \frac{1}{\left(1-\tfrac{1}{4}-\tfrac{1}{4}\right)\left(1-\tfrac{1}{4}\right)} = \frac{8}{3} \neq M_X\left(\tfrac{1}{4}\right)M_Y\left(\tfrac{1}{4}\right) = \frac{1}{1-\tfrac{1}{4}}\cdot\frac{1}{\left(1-\tfrac{1}{4}\right)^2} = \left(\frac{4}{3}\right)^3$$
then by Theorem 3.8.6, $X$ and $Y$ are not independent random variables.



For Example 3.8.4
$$M(t_1,t_2) = \frac{1}{(1-t_1)(1-t_2)} \quad \text{for } t_1<1,\ t_2<1$$
$$M_X(t_1) = \frac{1}{1-t_1} \quad \text{for } t_1<1, \qquad M_Y(t_2) = \frac{1}{1-t_2} \quad \text{for } t_2<1$$
Since
$$M(t_1,t_2) = \frac{1}{(1-t_1)(1-t_2)} = M_X(t_1)M_Y(t_2) \quad \text{for all } t_1<1,\ t_2<1$$
then by Theorem 3.8.6, $X$ and $Y$ are independent random variables.

3.8.8 Example
Suppose $X_1,X_2,\ldots,X_n$ are independent and identically distributed random variables each with moment generating function $M(t)$, $t \in (-h,h)$ for some $h>0$. Find $M(t_1,t_2,\ldots,t_n)$, the joint moment generating function of $X_1,X_2,\ldots,X_n$. Find the moment generating function of $T = \sum_{i=1}^{n} X_i$.

Solution
Since the $X_i$'s are independent random variables each with moment generating function $M(t)$, $t \in (-h,h)$ for some $h>0$, the joint moment generating function of $X_1,X_2,\ldots,X_n$ is
$$M(t_1,t_2,\ldots,t_n) = E\left[\exp\left(\sum_{i=1}^{n} t_iX_i\right)\right] = E\left(\prod_{i=1}^{n} e^{t_iX_i}\right) = \prod_{i=1}^{n} E\left(e^{t_iX_i}\right) = \prod_{i=1}^{n} M(t_i)$$
for $t_i \in (-h,h)$, $i=1,2,\ldots,n$, for some $h>0$.
The moment generating function of $T = \sum_{i=1}^{n} X_i$ is
$$M_T(t) = E\left(e^{tT}\right) = E\left[\exp\left(t\sum_{i=1}^{n} X_i\right)\right] = M(t,t,\ldots,t) = \prod_{i=1}^{n} M(t) = [M(t)]^n \quad \text{for } t \in (-h,h) \text{ for some } h>0$$

3.9 Multinomial Distribution

The most widely used discrete multivariate distribution is the Multinomial distribution, which was introduced in a previous probability course. We give its joint probability function and its important properties here.

3.9.1 Definition - Multinomial Distribution

Suppose $(X_1,X_2,\ldots,X_k)$ are discrete random variables with joint probability function
$$f(x_1,x_2,\ldots,x_k) = \frac{n!}{x_1!x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$$
for $x_i=0,1,\ldots,n$, $i=1,2,\ldots,k$, $\sum_{i=1}^{k} x_i=n$; $0 \le p_i \le 1$, $i=1,2,\ldots,k$, $\sum_{i=1}^{k} p_i=1$.
Then $(X_1,X_2,\ldots,X_k)$ is said to have a Multinomial distribution. We write $(X_1,X_2,\ldots,X_k) \sim \text{Multinomial}(n;p_1,p_2,\ldots,p_k)$.

Notes:
(1) Since $\sum_{i=1}^{k} X_i=n$, the Multinomial distribution is actually a joint distribution for $k-1$ random variables, which can be written as
$$f(x_1,x_2,\ldots,x_{k-1}) = \frac{n!}{x_1!x_2!\cdots x_{k-1}!\left(n-\sum_{i=1}^{k-1}x_i\right)!}\,p_1^{x_1}p_2^{x_2}\cdots p_{k-1}^{x_{k-1}}\left(1-\sum_{i=1}^{k-1}p_i\right)^{n-\sum_{i=1}^{k-1}x_i}$$
for $x_i=0,1,\ldots,n$, $i=1,2,\ldots,k-1$, $\sum_{i=1}^{k-1}x_i \le n$; $0 \le p_i \le 1$, $i=1,2,\ldots,k-1$, $\sum_{i=1}^{k-1}p_i \le 1$.

(2) If $k=2$ we obtain the familiar Binomial distribution
$$f(x_1) = \frac{n!}{x_1!(n-x_1)!}\,p_1^{x_1}(1-p_1)^{n-x_1} = \binom{n}{x_1}p_1^{x_1}(1-p_1)^{n-x_1}$$
for $x_1=0,1,\ldots,n$; $0 \le p_1 \le 1$.

(3) If $k=3$ we obtain the Trinomial distribution
$$f(x_1,x_2) = \frac{n!}{x_1!x_2!(n-x_1-x_2)!}\,p_1^{x_1}p_2^{x_2}(1-p_1-p_2)^{n-x_1-x_2}$$
for $x_i=0,1,\ldots,n$, $i=1,2$, $x_1+x_2 \le n$, and $0 \le p_i \le 1$, $i=1,2$, $p_1+p_2 \le 1$.

3.9.2 Theorem - Properties of the Multinomial Distribution

If $(X_1,X_2,\ldots,X_k) \sim \text{Multinomial}(n;p_1,p_2,\ldots,p_k)$, then
(1) $(X_1,X_2,\ldots,X_{k-1})$ has joint moment generating function
$$M(t_1,t_2,\ldots,t_{k-1}) = E\left(e^{t_1X_1+t_2X_2+\cdots+t_{k-1}X_{k-1}}\right) = \left(p_1e^{t_1}+p_2e^{t_2}+\cdots+p_{k-1}e^{t_{k-1}}+p_k\right)^n \tag{3.5}$$
for $(t_1,t_2,\ldots,t_{k-1}) \in \Re^{k-1}$.

(2) Any subset of $X_1,X_2,\ldots,X_k$ also has a Multinomial distribution. In particular
$$X_i \sim \text{Binomial}(n,p_i) \quad \text{for } i=1,2,\ldots,k$$

(3) If $T = X_i+X_j$, $i \neq j$, then
$$T \sim \text{Binomial}(n,\,p_i+p_j)$$

(4)
$$Cov(X_i,X_j) = -np_ip_j \quad \text{for } i \neq j$$

(5) The conditional distribution of any subset of $(X_1,X_2,\ldots,X_k)$ given the remaining coordinates is a Multinomial distribution. In particular the conditional probability function of $X_i$ given $X_j=x_j$, $i \neq j$, is
$$X_i|X_j=x_j \sim \text{Binomial}\left(n-x_j,\ \frac{p_i}{1-p_j}\right)$$

(6) The conditional distribution of $X_i$ given $T = X_i+X_j = t$, $i \neq j$, is
$$X_i|X_i+X_j=t \sim \text{Binomial}\left(t,\ \frac{p_i}{p_i+p_j}\right)$$

3.9.3 Example
Suppose $(X_1,X_2,\ldots,X_k) \sim \text{Multinomial}(n;p_1,p_2,\ldots,p_k)$.
(a) Prove $(X_1,X_2,\ldots,X_{k-1})$ has joint moment generating function
$$M(t_1,t_2,\ldots,t_{k-1}) = \left(p_1e^{t_1}+p_2e^{t_2}+\cdots+p_{k-1}e^{t_{k-1}}+p_k\right)^n$$
for $(t_1,t_2,\ldots,t_{k-1}) \in \Re^{k-1}$.
(b) Prove $(X_1,X_2,X_3) \sim \text{Multinomial}(n;p_1,p_2,1-p_1-p_2)$.
(c) Prove $X_i \sim \text{Binomial}(n,p_i)$ for $i=1,2,\ldots,k$.
(d) Prove $T = X_1+X_2 \sim \text{Binomial}(n,p_1+p_2)$.

Solution
(a) Let $A = \left\{(x_1,x_2,\ldots,x_k) : x_i=0,1,\ldots,n,\ i=1,2,\ldots,k,\ \sum_{i=1}^{k}x_i=n\right\}$. Then
$$M(t_1,t_2,\ldots,t_{k-1}) = E\left(e^{t_1X_1+t_2X_2+\cdots+t_{k-1}X_{k-1}}\right)$$
$$= \sum\!\cdots\!\sum_{(x_1,x_2,\ldots,x_k)\in A} e^{t_1x_1+t_2x_2+\cdots+t_{k-1}x_{k-1}}\,\frac{n!}{x_1!x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_{k-1}^{x_{k-1}}p_k^{x_k}$$
$$= \sum\!\cdots\!\sum_{(x_1,x_2,\ldots,x_k)\in A} \frac{n!}{x_1!x_2!\cdots x_k!}\left(p_1e^{t_1}\right)^{x_1}\left(p_2e^{t_2}\right)^{x_2}\cdots\left(p_{k-1}e^{t_{k-1}}\right)^{x_{k-1}}p_k^{x_k}$$
$$= \left(p_1e^{t_1}+p_2e^{t_2}+\cdots+p_{k-1}e^{t_{k-1}}+p_k\right)^n \quad \text{for } (t_1,t_2,\ldots,t_{k-1}) \in \Re^{k-1}$$
by the Multinomial Theorem 2.11.5.

(b) The joint moment generating function of $(X_1,X_2,X_3)$ is
$$M(t_1,t_2,0,\ldots,0) = \left(p_1e^{t_1}+p_2e^{t_2}+(1-p_1-p_2)\right)^n \quad \text{for } (t_1,t_2) \in \Re^2$$
which is of the form (3.5), so by the Uniqueness Theorem for Moment Generating Functions, $(X_1,X_2,X_3) \sim \text{Multinomial}(n;p_1,p_2,1-p_1-p_2)$.

(c) The moment generating function of $X_i$ is
$$M(0,\ldots,0,t_i,0,\ldots,0) = \left(p_ie^{t_i}+(1-p_i)\right)^n \quad \text{for } t_i \in \Re$$
for $i=1,2,\ldots,k$, which is the moment generating function of a Binomial$(n,p_i)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $X_i \sim \text{Binomial}(n,p_i)$ for $i=1,2,\ldots,k$.

(d) The moment generating function of $T = X_1+X_2$ is
$$M_T(t) = E\left(e^{tT}\right) = E\left(e^{t(X_1+X_2)}\right) = E\left(e^{tX_1+tX_2}\right) = M(t,t,0,\ldots,0)$$
$$= \left(p_1e^{t}+p_2e^{t}+(1-p_1-p_2)\right)^n = \left((p_1+p_2)e^{t}+(1-p_1-p_2)\right)^n \quad \text{for } t \in \Re$$
which is the moment generating function of a Binomial$(n,p_1+p_2)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $T \sim \text{Binomial}(n,p_1+p_2)$.
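Property (4) of Theorem 3.9.2, $Cov(X_i,X_j) = -np_ip_j$, is easy to check numerically. The following sketch is an added illustration (not part of the original notes); it assumes Python with NumPy, and the particular $n$, $p$, sample size and seed are arbitrary.

```python
# Simulation check of Cov(X_i, X_j) = -n p_i p_j for the Multinomial distribution
# (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(42)
n, p = 20, np.array([0.2, 0.3, 0.5])
counts = rng.multinomial(n, p, size=200_000)      # each row is one (X_1, X_2, X_3)

sample_cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]
print("sample Cov(X1, X2):", sample_cov)          # should be close to -1.2
print("exact -n p1 p2    :", -n * p[0] * p[1])    # -20(0.2)(0.3) = -1.2
```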

3.9.4 Exercise
Prove property (3) in Theorem 3.9.2.

3.10 Bivariate Normal Distribution

The best known bivariate continuous distribution is the Bivariate Normal distribution. We give its joint probability density function written in vector notation so we can easily introduce the multivariate version of this distribution, called the Multivariate Normal distribution, in Chapter 7.

3.10.1 Definition - Bivariate Normal Distribution (BVN)

Suppose $X_1$ and $X_2$ are random variables with joint probability density function
$$f(x_1,x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})^T\right] \quad \text{for } (x_1,x_2) \in \Re^2$$
where
$$\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
and $\Sigma$ is a nonsingular matrix. Then $\mathbf{X} = (X_1,X_2)$ is said to have a Bivariate Normal distribution. We write $\mathbf{X} \sim \text{BVN}(\boldsymbol{\mu},\Sigma)$.

The Bivariate Normal distribution has many special properties.

3.10.2 Theorem - Properties of the BVN Distribution

If $\mathbf{X} \sim \text{BVN}(\boldsymbol{\mu},\Sigma)$, then
(1) $(X_1,X_2)$ has joint moment generating function
$$M(t_1,t_2) = E\left(e^{t_1X_1+t_2X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right] = \exp\left(\boldsymbol{\mu}\mathbf{t}^T+\frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad \text{for all } \mathbf{t}=(t_1,t_2) \in \Re^2$$
(2) $X_1 \sim \text{N}(\mu_1,\sigma_1^2)$ and $X_2 \sim \text{N}(\mu_2,\sigma_2^2)$.
(3) $Cov(X_1,X_2) = \rho\sigma_1\sigma_2$ and $Cor(X_1,X_2) = \rho$ where $-1 \le \rho \le 1$.
(4) $X_1$ and $X_2$ are independent random variables if and only if $\rho=0$.
(5) If $\mathbf{c}=(c_1,c_2)$ is a nonzero vector of constants then
$$c_1X_1+c_2X_2 \sim \text{N}\left(\boldsymbol{\mu}\mathbf{c}^T,\ \mathbf{c}\Sigma\mathbf{c}^T\right)$$
(6) If $A$ is a $2\times 2$ nonsingular matrix and $\mathbf{b}$ is a $1\times 2$ vector then
$$\mathbf{X}A+\mathbf{b} \sim \text{BVN}\left(\boldsymbol{\mu}A+\mathbf{b},\ A^T\Sigma A\right)$$
(7)
$$X_2|X_1=x_1 \sim \text{N}\left(\mu_2+\rho\sigma_2(x_1-\mu_1)/\sigma_1,\ \sigma_2^2(1-\rho^2)\right)$$
and
$$X_1|X_2=x_2 \sim \text{N}\left(\mu_1+\rho\sigma_1(x_2-\mu_2)/\sigma_2,\ \sigma_1^2(1-\rho^2)\right)$$
(8) $(\mathbf{X}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})^T \sim \chi^2(2)$

Proof
For proofs of properties (1)-(4) and (6)-(7) see Problem 13.
(5) The moment generating function of $c_1X_1+c_2X_2$ is
$$E\left[e^{t(c_1X_1+c_2X_2)}\right] = E\left[e^{(c_1t)X_1+(c_2t)X_2}\right] = \exp\left(\boldsymbol{\mu}\begin{bmatrix} c_1t \\ c_2t \end{bmatrix}+\frac{1}{2}\begin{bmatrix} c_1t & c_2t \end{bmatrix}\Sigma\begin{bmatrix} c_1t \\ c_2t \end{bmatrix}\right)$$
$$= \exp\left(\boldsymbol{\mu}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}t+\frac{1}{2}\begin{bmatrix} c_1 & c_2 \end{bmatrix}\Sigma\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}t^2\right) = \exp\left(\boldsymbol{\mu}\mathbf{c}^Tt+\frac{1}{2}\mathbf{c}\Sigma\mathbf{c}^Tt^2\right) \quad \text{for } t \in \Re,\ \text{where } \mathbf{c}=(c_1,c_2)$$
which is the moment generating function of a $\text{N}\left(\boldsymbol{\mu}\mathbf{c}^T,\mathbf{c}\Sigma\mathbf{c}^T\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $c_1X_1+c_2X_2 \sim \text{N}\left(\boldsymbol{\mu}\mathbf{c}^T,\mathbf{c}\Sigma\mathbf{c}^T\right)$.
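Property (5) can also be checked by simulation. The sketch below is an added illustration (not in the original notes); it assumes Python with NumPy, and the mean vector, covariance matrix, coefficients, sample size and seed are all arbitrary choices.

```python
# Simulation check of property (5): c1*X1 + c2*X2 is Normal with mean mu c^T and
# variance c Sigma c^T (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
c = np.array([3.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=300_000)   # rows are (X1, X2) ~ BVN(mu, Sigma)
w = x @ c                                              # W = 3*X1 - X2

print("simulated mean, variance:", w.mean(), w.var())
print("exact     mean, variance:", c @ mu, c @ Sigma @ c)   # 5 and 8 for these choices
```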

The BVN joint probability density function is graphed in Figures 3.8 - 3.10.

[Figure 3.8: Graph of the BVN p.d.f. with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$; surface plot of $f(x,y)$ omitted]

The graphs all have the same mean vector $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ but different variance/covariance matrices $\Sigma$. The axes all have the same scale.

[Figure 3.9: Graph of the BVN p.d.f. with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$; surface plot of $f(x,y)$ omitted]

[Figure 3.10: Graph of the BVN p.d.f. with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 0.6 & 0.5 \\ 0.5 & 1 \end{bmatrix}$; surface plot of $f(x,y)$ omitted]

3.11 Calculus Review

Consider the region $R$ in the $xy$-plane in Figure 3.11.

[Figure 3.11: Region 1, the region $R$ between the curves $y=g(x)$ and $y=h(x)$ for $x$ between $x=a$ and $x=b$]

Suppose $f(x,y) \ge 0$ for all $(x,y) \in \Re^2$. The graph of $z=f(x,y)$ is a surface in 3-space lying above or touching the $xy$-plane. The volume of the solid bounded by the surface $z=f(x,y)$ and the $xy$-plane above the region $R$ is given by
$$\text{Volume} = \int_{x=a}^{b}\int_{y=g(x)}^{h(x)} f(x,y)\,dy\,dx$$

[Figure 3.12: Region 2, the region $R$ between the curves $x=g(y)$ and $x=h(y)$ for $y$ between two horizontal lines]

If $R$ is the region in Figure 3.12 then the volume is given by
$$\text{Volume} = \int_{y=c}^{d}\int_{x=g(y)}^{h(y)} f(x,y)\,dx\,dy$$

Give an expression for the volume of the solid bounded by the surface $z=f(x,y)$ and the $xy$-plane above the region $R = R_1 \cup R_2$ in Figure 3.13.

[Figure 3.13: Region 3, the region $R = R_1 \cup R_2$ with boundary curves $y=g(x)$ and $x=h(y)$, reference points $a_1,a_2,a_3$ on the $x$-axis and $b_1,b_2,b_3$ on the $y$-axis]



3.12 Chapter 3 Problems


1. Suppose X and Y are discrete random variables with joint probability function

$$f(x,y) = kq^2p^{x+y} \quad \text{for } x=0,1,\ldots;\ y=0,1,\ldots;\ 0<p<1,\ q=1-p$$

(a) Determine the value of k.


(b) Find the marginal probability function of X and the marginal probability func-
tion of Y . Are X and Y independent random variables?
(c) Find P (X = xjX + Y = t).

2. Suppose X and Y are discrete random variables with joint probability function

$$f(x,y) = \frac{e^{-2}}{x!(y-x)!} \quad \text{for } x=0,1,\ldots,y;\ y=0,1,\ldots$$

(a) Find the marginal probability function of X and the marginal probability func-
tion of Y .
(b) Are X and Y independent random variables?

3. Suppose X and Y are continuous random variables with joint probability density
function
$$f(x,y) = k(x^2+y) \quad \text{for } 0<y<1-x^2,\ -1<x<1$$

(a) Determine k:
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (Y X + 1).

4. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = kx2 y for x2 < y < 1

(a) Determine k.
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (X Y ).
(e) Find the conditional probability density function of X given Y = y and the
conditional probability density function of Y given X = x.

5. Suppose X and Y are continuous random variables with joint probability density
function
$$f(x,y) = kxe^{-y} \quad \text{for } 0<x<1,\ 0<y<\infty$$

(a) Determine k.
(b) Find the marginal probability density function of X and the marginal probability
density function of Y .
(c) Are X and Y independent random variables?
(d) Find P (X + Y t).

6. Suppose each of the following functions is a joint probability density function for
continuous random variables X and Y .

(a) $f(x,y) = k$ for $0<x<y<1$

(b) $f(x,y) = kx$ for $0<x^2<y<1$
(c) $f(x,y) = kxy$ for $0<y<x<1$
(d) $f(x,y) = k(x+y)$ for $0<x<y<1$
(e) $f(x,y) = \frac{k}{x}$ for $0<y<x<1$
(f) $f(x,y) = kx^2y$ for $0<x<1,\ 0<y<1,\ 0<x+y<1$
(g) $f(x,y) = ke^{-x-2y}$ for $0<y<x<\infty$
In each case:
(i) Determine k.
(ii) Find the marginal probability density function of X and the marginal prob-
ability density function of Y .
(iii) Find the conditional probability density function of X given Y = y and the
conditional probability density function of Y given X = x.
(iv) Find E (Xjy) and E (Y jx).

7. Suppose X Uniform(0; 1) and the conditional probability density function of Y


given X = x is
$$f_2(y|x) = \frac{1}{1-x} \quad \text{for } 0<x<y<1$$
Determine:

(a) the joint probability density function of X and Y


(b) the marginal probability density function of Y
(c) the conditional probability density function of X given Y = y.

8. Suppose X and Y are continuous random variables. Suppose also that the marginal
probability density function of X is
$$f_1(x) = \frac{1}{3}(1+4x) \quad \text{for } 0<x<1$$
and the conditional probability density function of $Y$ given $X=x$ is
$$f_2(y|x) = \frac{2y+4x}{1+4x} \quad \text{for } 0<x<1,\ 0<y<1$$
Determine:

(a) the joint probability density function of X and Y


(b) the marginal probability density function of Y
(c) the conditional probability density function of X given Y = y.

9. Suppose that $\theta \sim \text{Beta}(a,b)$ and $Y|\theta \sim \text{Binomial}(n,\theta)$. Find $E(Y)$ and $Var(Y)$.

10. Assume that $Y$ denotes the number of bacteria in a cubic centimetre of liquid and that $Y|\lambda \sim \text{Poisson}(\lambda)$. Further assume that $\lambda$ varies from location to location and $\lambda \sim \text{Gamma}(\alpha,\beta)$.

    (a) Find $E(Y)$ and $Var(Y)$.

    (b) If $\alpha$ is a positive integer then show that the marginal probability function of $Y$ is Negative Binomial.

11. Suppose X and Y are random variables with joint moment generating function
M (t1 ; t2 ) which exists for all jt1 j < h1 and jt2 j < h2 for some h1 ; h2 > 0.

(a) Show that
$$E\left(X^jY^k\right) = \frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}\,M(t_1,t_2)\bigg|_{(t_1,t_2)=(0,0)}$$
(b) Prove that X and Y are independent random variables if and only if
M (t1 ; t2 ) = MX (t1 )MY (t2 ).
(c) If (X1 ; X2 ; X3 ) Multinomial(n; p1 ; p2 ; p3 ) …nd Cov(X1 ; X2 ).

12. Suppose X and Y are discrete random variables with joint probability function

$$f(x,y) = \frac{e^{-2}}{x!(y-x)!} \quad \text{for } x=0,1,\ldots,y;\ y=0,1,\ldots$$

(a) Find the joint moment generating function of X and Y .


(b) Find Cov(X; Y ).

13. Suppose $\mathbf{X} = (X_1,X_2) \sim \text{BVN}(\boldsymbol{\mu},\Sigma)$.

    (a) Let $\mathbf{t} = (t_1,t_2)$. Use matrix multiplication to verify that
    $$(\mathbf{x}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})^T - 2\mathbf{x}\mathbf{t}^T = [\mathbf{x}-(\boldsymbol{\mu}+\mathbf{t}\Sigma)]\Sigma^{-1}[\mathbf{x}-(\boldsymbol{\mu}+\mathbf{t}\Sigma)]^T - 2\boldsymbol{\mu}\mathbf{t}^T - \mathbf{t}\Sigma\mathbf{t}^T$$
    Use this identity to show that the joint moment generating function of $X_1$ and $X_2$ is
    $$M(t_1,t_2) = E\left(e^{t_1X_1+t_2X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right] = \exp\left(\boldsymbol{\mu}\mathbf{t}^T+\frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad \text{for all } \mathbf{t}=(t_1,t_2) \in \Re^2$$
    (b) Use moment generating functions to show $X_1 \sim \text{N}(\mu_1,\sigma_1^2)$ and $X_2 \sim \text{N}(\mu_2,\sigma_2^2)$.

    (c) Use moment generating functions to show $Cov(X_1,X_2) = \rho\sigma_1\sigma_2$. Hint: Use the result in Problem 11(a).
    (d) Use moment generating functions to show that $X_1$ and $X_2$ are independent random variables if and only if $\rho=0$.
    (e) Let $A$ be a $2\times 2$ nonsingular matrix and $\mathbf{b}$ be a $1\times 2$ vector. Use the moment generating function to show that
    $$\mathbf{X}A+\mathbf{b} \sim \text{BVN}\left(\boldsymbol{\mu}A+\mathbf{b},\ A^T\Sigma A\right)$$
    (f) Verify that
    $$(\mathbf{x}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})^T = \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \frac{1}{\sigma_2^2(1-\rho^2)}\left[x_2-\mu_2-\frac{\rho\sigma_2}{\sigma_1}(x_1-\mu_1)\right]^2$$
    and thus show that the conditional distribution of $X_2$ given $X_1=x_1$ is $\text{N}(\mu_2+\rho\sigma_2(x_1-\mu_1)/\sigma_1,\ \sigma_2^2(1-\rho^2))$. Note that by symmetry the conditional distribution of $X_1$ given $X_2=x_2$ is $\text{N}(\mu_1+\rho\sigma_1(x_2-\mu_2)/\sigma_2,\ \sigma_1^2(1-\rho^2))$.

14. Suppose X and Y are continuous random variables with joint probability density
function
$$f(x,y) = 2e^{-x-y} \quad \text{for } 0<x<y<\infty$$

(a) Find the joint moment generating function of X and Y .


(b) Determine the marginal distributions of X and Y .
(c) Find Cov(X; Y ).
4. Functions of Two or More
Random Variables

In this chapter we look at techniques for determining the distributions of functions of two or more random variables. These techniques are extremely important for determining the distributions of estimators such as maximum likelihood estimators, the distributions of pivotal quantities for constructing confidence intervals, and the distributions of test statistics for testing hypotheses.
In Section 4.1 we extend the cumulative distribution function technique introduced in Section 2.6 to a function of two or more random variables. In Section 4.2 we look at a method for determining the distribution of a one-to-one transformation of two or more random variables which is an extension of the result in Theorem 2.6.8. In particular we show how the t distribution, which you would have used in your previous statistics course, arises as the ratio of a standard Normal random variable to the square root of an independent Chi-squared random variable divided by its degrees of freedom. In Section 4.3 we see how moment generating functions can be used for determining the distribution of a sum of random variables which is an extension of Theorem 2.10.4. In particular we prove that a linear combination of independent Normal random variables has a Normal distribution. This is a result which was used extensively in previous probability and statistics courses.

4.1 Cumulative Distribution Function Technique


Suppose X1 ; X2 ; : : : ; Xn are continuous random variables with joint probability density
function f (x1 ; x2 ; : : : ; xn ). The probability density function of Y = h (X1 ; X2 ; : : : ; Xn )
can be determined using the cumulative distribution function technique that was used in
Section 2.6 for the case n = 1.

4.1.1 Example
Suppose X and Y are continuous random variables with joint probability density function

f (x; y) = 3y for 0 < x < y < 1

and 0 otherwise. Determine the probability density function of T = XY .


Solution
The support set of $(X,Y)$ is $A = \{(x,y) : 0<x<y<1\}$, which is the union of the regions $E$ and $F$ shown in Figure 4.1.

[Figure 4.1: Support set $A$ for Example 4.1.1, partitioned by the curve $x=t/y$ into the region $E$ (where $xy \le t$) and the region $F$ (where $xy > t$)]

For $0<t<1$,
$$G(t) = P(T \le t) = P(XY \le t) = \iint_{(x,y)\in E} 3y\,dx\,dy$$
Due to the shape of the region $E$, the double integral over $E$ would have to be written as the sum of two double integrals. It is easier to find $G(t)$ using
$$G(t) = \iint_{(x,y)\in E} 3y\,dx\,dy = 1-\iint_{(x,y)\in F} 3y\,dx\,dy = 1-\int_{y=\sqrt{t}}^{1}\int_{x=t/y}^{y} 3y\,dx\,dy$$
$$= 1-\int_{\sqrt{t}}^{1} 3y\,x\Big|_{x=t/y}^{y}\,dy = 1-\int_{\sqrt{t}}^{1} 3y\left(y-\frac{t}{y}\right)dy = 1-\int_{\sqrt{t}}^{1}\left(3y^2-3t\right)dy$$
$$= 1-\left(y^3-3ty\right)\Big|_{\sqrt{t}}^{1} = 1-\left(1-3t-t^{3/2}+3t^{3/2}\right) = 3t-2t^{3/2} \quad \text{for } 0<t<1$$

The cumulative distribution function for $T$ is
$$G(t) = \begin{cases} 0 & t \le 0 \\ 3t-2t^{3/2} & 0<t<1 \\ 1 & t \ge 1 \end{cases}$$
Now a cumulative distribution function must be a continuous function for all real values. Therefore as a check we note that
$$\lim_{t\to 0^{+}}\left(3t-2t^{3/2}\right) = 0 = G(0) \qquad \text{and} \qquad \lim_{t\to 1^{-}}\left(3t-2t^{3/2}\right) = 1 = G(1)$$
so indeed $G(t)$ is a continuous function for all $t \in \Re$.

Since $\frac{d}{dt}G(t) = 0$ for $t<0$ and $t>1$, and
$$\frac{d}{dt}G(t) = \frac{d}{dt}\left(3t-2t^{3/2}\right) = 3-3t^{1/2} \quad \text{for } 0<t<1$$
the probability density function of $T$ is
$$g(t) = 3-3t^{1/2} \quad \text{for } 0<t<1$$
and 0 otherwise.
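The answer can be checked by simulation. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, and the sample size and seed are arbitrary. It uses the fact that the marginal density of $Y$ here is $3y^2$ on $(0,1)$ (so $Y$ can be generated as $U^{1/3}$) and $X|Y=y \sim \text{Uniform}(0,y)$, then compares the empirical distribution of $T=XY$ with $G(t)=3t-2t^{3/2}$.

```python
# Monte Carlo check of Example 4.1.1 (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(411)
n_draws = 400_000

y = rng.uniform(size=n_draws) ** (1 / 3)   # Y has c.d.f. y^3 on (0, 1)
x = rng.uniform(size=n_draws) * y          # X | Y = y ~ Uniform(0, y)
t = x * y                                  # T = XY

for t0 in (0.1, 0.25, 0.5, 0.75):
    print(t0, (t <= t0).mean(), 3 * t0 - 2 * t0 ** 1.5)   # empirical vs G(t)
```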

4.1.2 Exercise
Suppose X and Y are continuous random variables with joint probability density function

f (x; y) = 3y for 0 x y 1

and 0 otherwise. Find the probability density function of S = Y =X.

4.1.3 Example
Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed continuous random
variables each with probability density function f (x) and cumulative distribution function
F (x). Find the probability density function of U = max (X1 ; X2 ; : : : ; Xn ) = X(n) and
V = min (X1 ; X2 ; : : : ; Xn ) = X(1) .

Solution
For $u \in \Re$, the cumulative distribution function of $U$ is
$$G(u) = P(U \le u) = P[\max(X_1,X_2,\ldots,X_n) \le u] = P(X_1 \le u, X_2 \le u, \ldots, X_n \le u)$$
$$= P(X_1 \le u)P(X_2 \le u)\cdots P(X_n \le u) \quad \text{since } X_1,X_2,\ldots,X_n \text{ are independent random variables}$$
$$= \prod_{i=1}^{n} P(X_i \le u) = \prod_{i=1}^{n} F(u) = [F(u)]^n \quad \text{since } X_1,X_2,\ldots,X_n \text{ are identically distributed}$$
Suppose $A$ is the support set of $X_i$, $i=1,2,\ldots,n$. The probability density function of $U$ is
$$g(u) = \frac{d}{du}G(u) = \frac{d}{du}[F(u)]^n = n[F(u)]^{n-1}f(u) \quad \text{for } u \in A$$
and 0 otherwise.
For $v \in \Re$, the cumulative distribution function of $V$ is
$$H(v) = P(V \le v) = P[\min(X_1,X_2,\ldots,X_n) \le v] = 1-P[\min(X_1,X_2,\ldots,X_n) > v]$$
$$= 1-P(X_1>v, X_2>v, \ldots, X_n>v) = 1-P(X_1>v)P(X_2>v)\cdots P(X_n>v)$$
$$= 1-\prod_{i=1}^{n} P(X_i>v) = 1-\prod_{i=1}^{n}[1-F(v)] = 1-[1-F(v)]^n$$
The probability density function of $V$ is
$$h(v) = \frac{d}{dv}H(v) = \frac{d}{dv}\left\{1-[1-F(v)]^n\right\} = n[1-F(v)]^{n-1}f(v) \quad \text{for } v \in A$$
and 0 otherwise.
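These cumulative distribution functions are easy to confirm by simulation. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, uses Uniform(0,1) samples so that $F(u)=u$, and the sample sizes and seed are arbitrary.

```python
# Simulation check of Example 4.1.3 with Uniform(0,1) samples
# (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(413)
n, n_reps = 5, 200_000

samples = rng.uniform(size=(n_reps, n))
u_max = samples.max(axis=1)     # U = X_(n)
v_min = samples.min(axis=1)     # V = X_(1)

for q in (0.5, 0.8, 0.9):
    print("P(U <= %.1f): sim %.4f  exact %.4f" % (q, (u_max <= q).mean(), q ** n))
    print("P(V <= %.1f): sim %.4f  exact %.4f" % (q, (v_min <= q).mean(), 1 - (1 - q) ** n))
```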

4.2 One-to-One Transformations

In this section we look at how to determine the joint distribution of a one-to-one transformation of two or more random variables. We concentrate on the bivariate case for ease of presentation. The method does extend to more than two random variables. See Problems 12 and 13 at the end of this chapter for examples of one-to-one transformations of three random variables.
We begin with some notation and a theorem which gives sufficient conditions for determining whether a transformation is one-to-one in the bivariate case, followed by the theorem which gives the joint probability density function of the two new random variables.

Suppose the transformation $S$ defined by
$$u = h_1(x,y), \qquad v = h_2(x,y)$$
is a one-to-one transformation for all $(x,y) \in R_{XY}$ and that $S$ maps the region $R_{XY}$ into the region $R_{UV}$ in the $uv$ plane. Since $S : (x,y) \to (u,v)$ is a one-to-one transformation there exists an inverse transformation $T$ defined by
$$x = w_1(u,v), \qquad y = w_2(u,v)$$
such that $T = S^{-1} : (u,v) \to (x,y)$ for all $(u,v) \in R_{UV}$. The Jacobian of the transformation $T$ is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1}$$
where $\frac{\partial(u,v)}{\partial(x,y)}$ is the Jacobian of the transformation $S$.

4.2.1 Inverse Mapping Theorem

Consider the transformation $S$ defined by
$$u = h_1(x,y), \qquad v = h_2(x,y)$$
Suppose the partial derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are continuous functions in a neighbourhood of the point $(a,b)$. Suppose also that $\frac{\partial(u,v)}{\partial(x,y)} \neq 0$ at the point $(a,b)$. Then there is a neighbourhood of the point $(a,b)$ in which $S$ has an inverse.

Note: These are sufficient but not necessary conditions for the inverse to exist.

4.2.2 Theorem - One-to-One Bivariate Transformations

Let $X$ and $Y$ be continuous random variables with joint probability density function $f(x,y)$ and let $R_{XY} = \{(x,y) : f(x,y)>0\}$ be the support set of $(X,Y)$. Suppose the transformation $S$ defined by
$$U = h_1(X,Y), \qquad V = h_2(X,Y)$$
is a one-to-one transformation with inverse transformation
$$X = w_1(U,V), \qquad Y = w_2(U,V)$$
Suppose also that $S$ maps $R_{XY}$ into $R_{UV}$. Then $g(u,v)$, the joint probability density function of $U$ and $V$, is given by
$$g(u,v) = f(w_1(u,v),w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right|$$
for all $(u,v) \in R_{UV}$. (Compare Theorem 2.6.8 for univariate random variables.)

4.2.3 Proof
We want to find $g(u,v)$, the joint probability density function of the random variables $U$ and $V$. Suppose $S^{-1}$ maps the region $B \subseteq R_{UV}$ into the region $A \subseteq R_{XY}$. Then
$$P[(U,V) \in B] = \iint_{B} g(u,v)\,du\,dv \tag{4.1}$$
and
$$P[(U,V) \in B] = P[(X,Y) \in A] = \iint_{A} f(x,y)\,dx\,dy = \iint_{B} f(w_1(u,v),w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right|du\,dv \tag{4.2}$$
where the last equality follows by the Change of Variable Theorem. Since this is true for all $B \subseteq R_{UV}$ we have, by comparing (4.1) and (4.2), that the joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(w_1(u,v),w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right|$$
for all $(u,v) \in R_{UV}$.

In the following example we see how Theorem 4.2.2 can be used to show that the sum of two independent Exponential(1) random variables is a Gamma random variable.

4.2.4 Example
Suppose $X \sim \text{Exponential}(1)$ and $Y \sim \text{Exponential}(1)$ independently. Find the joint probability density function of $U = X+Y$ and $V = X$. Show that $U \sim \text{Gamma}(2,1)$.

Solution
Since $X \sim \text{Exponential}(1)$ and $Y \sim \text{Exponential}(1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x)f_2(y) = e^{-x}e^{-y} = e^{-x-y}$$
with support set $R_{XY} = \{(x,y) : x>0,\ y>0\}$, which is shown in Figure 4.2.

[Figure 4.2: Support set $R_{XY}$ for Example 4.2.4, the first quadrant of the $xy$-plane]

The transformation
$$S : \quad U = X+Y, \quad V = X$$
has inverse transformation
$$X = V, \quad Y = U-V$$
Under $S$ the boundaries of $R_{XY}$ are mapped as
$$(k,0) \to (k,k) \quad \text{for } k \ge 0$$
$$(0,k) \to (k,0) \quad \text{for } k \ge 0$$
and the point $(1,2)$ is mapped to the point $(3,1)$. Thus $S$ maps $R_{XY}$ into
$$R_{UV} = \{(u,v) : 0<v<u\}$$



as shown in Figure 4.3.

[Figure 4.3: Support set $R_{UV}$ for Example 4.2.4, the region between the $u$-axis and the line $v=u$]

The Jacobian of the inverse transformation is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1$$
Note that the transformation $S$ is a linear transformation and so we would expect the Jacobian of the transformation to be a constant.
The joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(w_1(u,v),w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| = f(v,u-v)\,|-1| = e^{-u} \quad \text{for } (u,v) \in R_{UV}$$
and 0 otherwise.
To find the marginal probability density function of $U$ we note that the support set $R_{UV}$ is not rectangular and the range of integration for $v$ will depend on $u$. The marginal probability density function of $U$ is
$$g_1(u) = \int_{-\infty}^{\infty} g(u,v)\,dv = \int_{v=0}^{u} e^{-u}\,dv = ue^{-u} \quad \text{for } u>0$$
and 0 otherwise, which is the probability density function of a Gamma(2,1) random variable. Therefore $U \sim \text{Gamma}(2,1)$.

In the following exercise we see how the sum and difference of two independent Exponential(1) random variables give a Gamma random variable and a Double Exponential random variable respectively.

4.2.5 Exercise
Suppose $X \sim \text{Exponential}(1)$ and $Y \sim \text{Exponential}(1)$ independently. Find the joint probability density function of $U = X+Y$ and $V = X-Y$. Show that $U \sim \text{Gamma}(2,1)$ and $V \sim \text{Double Exponential}(0,1)$.

In the following example we see how the Gamma and Beta distributions are related.

4.2.6 Example
Suppose $X \sim \text{Gamma}(a,1)$ and $Y \sim \text{Gamma}(b,1)$ independently. Find the joint probability density function of $U = X+Y$ and $V = \frac{X}{X+Y}$. Show that $U \sim \text{Gamma}(a+b,1)$ and $V \sim \text{Beta}(a,b)$ independently. Find $E(V)$ by finding $E\left(\frac{X}{X+Y}\right)$.

Solution
Since $X \sim \text{Gamma}(a,1)$ and $Y \sim \text{Gamma}(b,1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x)f_2(y) = \frac{x^{a-1}e^{-x}}{\Gamma(a)}\cdot\frac{y^{b-1}e^{-y}}{\Gamma(b)} = \frac{x^{a-1}y^{b-1}e^{-x-y}}{\Gamma(a)\Gamma(b)}$$
with support set $R_{XY} = \{(x,y) : x>0,\ y>0\}$, which is the same support set as shown in Figure 4.2.
The transformation
$$S : \quad U = X+Y, \quad V = \frac{X}{X+Y}$$
has inverse transformation
$$X = UV, \quad Y = U(1-V)$$
Under $S$ the boundaries of $R_{XY}$ are mapped as
$$(k,0) \to (k,1) \quad \text{for } k \ge 0$$
$$(0,k) \to (k,0) \quad \text{for } k \ge 0$$
and the point $(1,2)$ is mapped to the point $\left(3,\frac{1}{3}\right)$. Thus $S$ maps $R_{XY}$ into
$$R_{UV} = \{(u,v) : u>0,\ 0<v<1\}$$
as shown in Figure 4.4.
[Figure 4.4: Support set $R_{UV}$ for Example 4.2.6, the semi-infinite strip $u>0$, $0<v<1$]

The Jacobian of the inverse transformation is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} v & u \\ 1-v & -u \end{vmatrix} = -uv-u+uv = -u$$
The joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(uv,u(1-v))\,|-u| = \frac{(uv)^{a-1}[u(1-v)]^{b-1}e^{-u}|-u|}{\Gamma(a)\Gamma(b)} = u^{a+b-1}e^{-u}\,\frac{v^{a-1}(1-v)^{b-1}}{\Gamma(a)\Gamma(b)} \quad \text{for } (u,v) \in R_{UV}$$
and 0 otherwise.
To find the marginal probability density functions of $U$ and $V$ we note that the support set of $U$ is $B_1 = \{u : u>0\}$ and the support set of $V$ is $B_2 = \{v : 0<v<1\}$. Since
$$g(u,v) = \underbrace{u^{a+b-1}e^{-u}}_{h_1(u)}\ \underbrace{\frac{v^{a-1}(1-v)^{b-1}}{\Gamma(a)\Gamma(b)}}_{h_2(v)}$$
for all $(u,v) \in B_1\times B_2$ then, by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. Also by the Factorization Theorem for Independence the probability density function of $U$ must be proportional to $h_1(u)$. By writing
$$g(u,v) = \left[\frac{u^{a+b-1}e^{-u}}{\Gamma(a+b)}\right]\left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,v^{a-1}(1-v)^{b-1}\right]$$
we note that the function in the first square bracket is the probability density function of a Gamma$(a+b,1)$ random variable and therefore $U \sim \text{Gamma}(a+b,1)$. It follows that the function in the second square bracket must be the probability density function of $V$, which is a Beta$(a,b)$ probability density function. Therefore $U \sim \text{Gamma}(a+b,1)$ independently of $V \sim \text{Beta}(a,b)$.
In Chapter 2, Problem 9 the moments of a Beta random variable were found by integration. Here is a rather clever way of finding $E(V)$ using the mean of a Gamma random variable. In Exercise 2.7.9 it was shown that the mean of a Gamma$(\alpha,\beta)$ random variable is $\alpha\beta$.
Now
$$E(UV) = E\left[(X+Y)\,\frac{X}{X+Y}\right] = E(X) = (a)(1) = a$$
since $X \sim \text{Gamma}(a,1)$. But $U$ and $V$ are independent random variables so
$$a = E(UV) = E(U)E(V)$$
But since $U \sim \text{Gamma}(a+b,1)$ we know $E(U)=a+b$, so
$$a = E(U)E(V) = (a+b)E(V)$$
Solving for $E(V)$ gives
$$E(V) = \frac{a}{a+b}$$
Higher moments can be found in a similar manner using the higher moments of a Gamma random variable.
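The conclusions of Example 4.2.6 can be checked numerically. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, and the shape parameters, sample size and seed are arbitrary choices.

```python
# Simulation check of Example 4.2.6 (illustrative sketch, not part of the notes):
# with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1) independent, U = X + Y and V = X/(X+Y)
# should be independent with U ~ Gamma(a+b, 1) and V ~ Beta(a, b), so E(V) = a/(a+b).
import numpy as np

rng = np.random.default_rng(426)
a, b, n_draws = 2.0, 3.0, 300_000

x = rng.gamma(shape=a, scale=1.0, size=n_draws)
y = rng.gamma(shape=b, scale=1.0, size=n_draws)
u, v = x + y, x / (x + y)

print("E(V): sim %.4f  exact %.4f" % (v.mean(), a / (a + b)))
print("E(U): sim %.4f  exact %.4f" % (u.mean(), a + b))
print("corr(U, V) (should be near 0):", np.corrcoef(u, v)[0, 1])
```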

4.2.7 Exercise
Suppose $X \sim \text{Beta}(a,b)$ and $Y \sim \text{Beta}(a+b,c)$ independently. Find the joint probability density function of $U = XY$ and $V = X$. Show that $U \sim \text{Beta}(a,b+c)$.

In the following example we see how a rather unusual transformation can be used to transform two independent Uniform(0,1) random variables into two independent N(0,1) random variables. This transformation is referred to as the Box-Muller Transformation after the two statisticians George E. P. Box and Mervin Edgar Muller, who published this result in 1958.

4.2.8 Example - Box-Muller Transformation

Suppose $X \sim \text{Uniform}(0,1)$ and $Y \sim \text{Uniform}(0,1)$ independently. Find the joint probability density function of
$$U = (-2\log X)^{1/2}\cos(2\pi Y)$$
$$V = (-2\log X)^{1/2}\sin(2\pi Y)$$
Show that $U \sim \text{N}(0,1)$ and $V \sim \text{N}(0,1)$ independently. Explain how you could use this result to generate independent observations from a N(0,1) distribution.

Solution
Since $X \sim \text{Uniform}(0,1)$ and $Y \sim \text{Uniform}(0,1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x)f_2(y) = (1)(1) = 1$$
with support set $R_{XY} = \{(x,y) : 0<x<1,\ 0<y<1\}$.
Consider the transformation
$$S : \quad U = (-2\log X)^{1/2}\cos(2\pi Y), \quad V = (-2\log X)^{1/2}\sin(2\pi Y)$$
To determine the support set of $(U,V)$ we note that $0<y<1$ implies $-1 \le \cos(2\pi y) \le 1$. Also $0<x<1$ implies $0<(-2\log x)^{1/2}<\infty$. Therefore $u = (-2\log x)^{1/2}\cos(2\pi y)$ takes on values in the interval $(-\infty,\infty)$. By a similar argument $v = (-2\log x)^{1/2}\sin(2\pi y)$ also takes on values in the interval $(-\infty,\infty)$. Therefore the support set of $(U,V)$ is $R_{UV} = \Re^2$.
The inverse of the transformation $S$ can be determined. In particular we note that since
$$U^2+V^2 = \left[(-2\log X)^{1/2}\cos(2\pi Y)\right]^2+\left[(-2\log X)^{1/2}\sin(2\pi Y)\right]^2 = (-2\log X)\left[\cos^2(2\pi Y)+\sin^2(2\pi Y)\right] = -2\log X$$
and
$$\frac{V}{U} = \frac{\sin(2\pi Y)}{\cos(2\pi Y)} = \tan(2\pi Y)$$
the inverse transformation is
$$X = e^{-\frac{1}{2}(U^2+V^2)}, \qquad Y = \frac{1}{2\pi}\arctan\left(\frac{V}{U}\right)$$
To determine the Jacobian of the inverse transformation it is simpler to use the result
$$\frac{\partial(x,y)}{\partial(u,v)} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1} = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}^{-1}$$
Since
$$\begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix} = \begin{vmatrix} -\frac{1}{x}(-2\log x)^{-1/2}\cos(2\pi y) & -2\pi(-2\log x)^{1/2}\sin(2\pi y) \\ -\frac{1}{x}(-2\log x)^{-1/2}\sin(2\pi y) & 2\pi(-2\log x)^{1/2}\cos(2\pi y) \end{vmatrix} = -\frac{2\pi}{x}\left[\cos^2(2\pi y)+\sin^2(2\pi y)\right] = -\frac{2\pi}{x}$$

Therefore
$$\frac{\partial(x,y)}{\partial(u,v)} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1} = \left(-\frac{2\pi}{x}\right)^{-1} = -\frac{x}{2\pi} = -\frac{1}{2\pi}e^{-\frac{1}{2}(u^2+v^2)}$$
The joint probability density function of $U$ and $V$ is
$$g(u,v) = f(w_1(u,v),w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| = (1)\,\frac{1}{2\pi}e^{-\frac{1}{2}(u^2+v^2)} = \frac{1}{2\pi}e^{-\frac{1}{2}(u^2+v^2)} \quad \text{for } (u,v) \in \Re^2$$
The support set of $U$ is $\Re$ and the support set of $V$ is $\Re$. Since $g(u,v)$ can be written as
$$g(u,v) = \left(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}u^2}\right)\left(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}v^2}\right)$$
for all $(u,v) \in \Re\times\Re = \Re^2$, therefore by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. We also note that the joint probability density function is the product of two N(0,1) probability density functions. Therefore $U \sim \text{N}(0,1)$ and $V \sim \text{N}(0,1)$ independently.
Let $x$ and $y$ be two independent Uniform(0,1) observations which have been generated using a random number generator. Then from the result above we have that
$$u = (-2\log x)^{1/2}\cos(2\pi y)$$
$$v = (-2\log x)^{1/2}\sin(2\pi y)$$
are two independent N(0,1) observations.
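The recipe above translates directly into a short piece of code. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, and the function name, sample size and seed are arbitrary choices.

```python
# Minimal Box-Muller sketch (illustrative addition, not part of the notes):
# pairs of Uniform(0,1) observations are turned into pairs of independent N(0,1) observations.
import numpy as np

def box_muller(n_pairs, rng):
    """Return 2*n_pairs independent N(0,1) observations via the Box-Muller transformation."""
    x = rng.uniform(size=n_pairs)             # X ~ Uniform(0, 1)
    y = rng.uniform(size=n_pairs)             # Y ~ Uniform(0, 1)
    r = np.sqrt(-2.0 * np.log(x))             # (-2 log X)^(1/2)
    u = r * np.cos(2.0 * np.pi * y)
    v = r * np.sin(2.0 * np.pi * y)
    return np.concatenate([u, v])

rng = np.random.default_rng(1958)
z = box_muller(100_000, rng)
print("mean, variance:", z.mean(), z.var())   # should be close to 0 and 1
```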

The result in the following theorem is one that was used (without proof) in a previous statistics course such as STAT 221/231/241 to construct confidence intervals and test hypotheses regarding the mean in a N$(\mu,\sigma^2)$ model when the variance $\sigma^2$ is unknown.

4.2.9 Theorem - t Distribution

If $X \sim \chi^2(n)$ independently of $Z \sim \text{N}(0,1)$ then
$$T = \frac{Z}{\sqrt{X/n}} \sim \text{t}(n)$$

Proof
The transformation T = p Z is not a one-to-one transformation. However if we add the
X=n
variable U = X to complete the transformation and consider the transformation
Z
S: T =p , U =X
X=n

then this transformation has an inverse transformation given by


1=2
U
X = U, Z = T
n

Since X 2 (n) independently of Z N(0; 1) the joint probability density function of


X and Z is
1 1 z 2 =2
f (x; z) = f1 (x) f2 (z) = xn=2 1
e x=2
p e
2n=2 (n=2) 2
1 z 2 =2
= p xn=2 1
e x=2
e
2(n+1)=2 (n=2)

with support set RXZ = f(x; z) : x > 0; z 2 <g. The transformation S maps RXZ into
RT U = f(t; u) : t 2 <; u > 0g.
The Jacobian of the inverse transformation is
@x @x
@ (x; z) @t @u 0 1 u 1=2
= @z @z = u 1=2 @z
=
@ (t; u) @t @u n @u
n

The joint probability density function of U and V is given by

u 1=2 u 1=2
g (t; u) = f t ;u
n n
1 2 u 1=2
= p un=2 1 e u=2 e t u=(2n)
2(n+1)=2 (n=2) n
1 2
= p u(n+1)=2 1 e u(1+t =n)=2 for (t; u) 2 RT U
2(n+1)=2 (n=2) n

and 0 otherwise.
To determine the distribution of T we need to …nd the marginal probability density
function for T .
Z1
g1 (t) = g (t; u) du
1
Z1
1 u(1+t2 =n)=2
= p u(n+1)=2 1
e du
2(n+1)=2 (n=2) n
0

2 2 1 1
t2
Let y = u2 1 + tn so that u = 2y 1 + tn and du = 2 1 + n dy. Note that when
u = 0 then y = 0, and when u ! 1 then y ! 1. Therefore

Z1 " #
1 (n+1)=2 1
" 1
#
1 t2 y t2
g1 (t) = p 2y 1 + e 2 1+ dy
2(n+1)=2 (n=2) n n n
0
" #
1 (n+1)=2 Z1
1 t2
= p 2(n+1)=2 1+ y (n+1)=2 1
e y
dy
2(n+1)=2 (n=2) n n
0
(n+1)=2
1 t2 n+1
= p 1+
(n=2) n n 2
n+1 (n+1)=2
2p t2
= 1+ for t 2 <
(n=2) n n

which is the probability density function of a random variable with a t(n) distribution.
Therefore
Z
T =p t(n)
X=n

as required.

4.2.10 Example

Use Theorem 4.2.9 to …nd E (T ) and V ar (T ) if T t(n).

Solution
If X 2 (n) independently of Z N(0; 1) then we know from the previous theorem that

Z
T =p t(n)
X=n

Now
!
Z p 1=2
E (T ) = E p = nE (Z) E X
X=n

since X and Z are independent random variables. Since E (Z) = 0 it follows that E (T ) = 0

as long as E X 1=2 exists. Since X 2 (n)

Z1
k 1
E X = xk xn=2 1
e x=2
dx
2n=2 (n=2)
0
Z1
1
= xk+n=2 1
e x=2
dx let y = x=2
2n=2 (n=2)
0
Z1
1
= (2y)k+n=2 1
e y
(2) dy
2n=2 (n=2)
0
Z1
2k+n=2
= y k+n=2 1
e y
dy
2n=2 (n=2)
0
2k (n=2 + k)
= (4.3)
(n=2)
which exists for n=2 + k > 0. If k = 1=2 then the integral exists for n=2 > 1=2 or n > 1.
Therefore
E (T ) = 0 for n > 1
Now

V ar (T ) = E T 2 [E (T )]2
= E T2 since E (T ) = 0

and
Z2
E T2 = E = nE Z 2 E X 1
X=n
Since Z N(0; 1) then

E Z2 = V ar (Z) + [E (Z)]2 = 1 + 02
= 1

Also by (4.3)
2 1 (n=2 1) 1
1
E X = =
(n=2) 2 (n=2 1)
1
=
n 2
which exists for n > 2. Therefore

V ar (T ) = E T 2 = nE Z 2 E X 1

1
= n (1)
n 2
n
= for n > 2
n 2

The following theorem concerns the F distribution which is used in testing hypotheses about
the parameters in a multiple linear regression model.

4.2.11 Theorem - F Distribution

If $X \sim \chi^2(n)$ independently of $Y \sim \chi^2(m)$ then
$$U = \frac{X/n}{Y/m} \sim \text{F}(n,m)$$

4.2.12 Exercise
(a) Prove Theorem 4.2.11. Hint: Complete the transformation with V = Y .
(b) Find E(U ) and V ar(U ) and note for what values of n and m that they exist.
Hint: Use the technique and results of Example 4.2.10.

4.3 Moment Generating Function Technique

The moment generating function technique is particularly useful in determining the distribution of a sum of two or more independent random variables if the moment generating functions of the random variables exist.

4.3.1 Theorem
Suppose $X_1,X_2,\ldots,X_n$ are independent random variables and $X_i$ has moment generating function $M_i(t)$ which exists for $t \in (-h,h)$ for some $h>0$. The moment generating function of $Y = \sum_{i=1}^{n} X_i$ is given by
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) \quad \text{for } t \in (-h,h)$$
If the $X_i$'s are independent and identically distributed random variables each with moment generating function $M(t)$ then $Y = \sum_{i=1}^{n} X_i$ has moment generating function
$$M_Y(t) = [M(t)]^n \quad \text{for } t \in (-h,h)$$

Proof
The moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = E\left(e^{tY}\right) = E\left[\exp\left(t\sum_{i=1}^{n} X_i\right)\right] = \prod_{i=1}^{n} E\left(e^{tX_i}\right) \quad \text{since } X_1,X_2,\ldots,X_n \text{ are independent random variables}$$
$$= \prod_{i=1}^{n} M_i(t) \quad \text{for } t \in (-h,h)$$
If $X_1,X_2,\ldots,X_n$ are identically distributed each with moment generating function $M(t)$ then
$$M_Y(t) = \prod_{i=1}^{n} M(t) = [M(t)]^n \quad \text{for } t \in (-h,h)$$
as required.
as required.

Note: This theorem in conjunction with the Uniqueness Theorem for Moment Generating Functions can be used to find the distribution of $Y$.

Here is a summary of results about sums of random variables for the named distributions.

4.3.2 Special Results

(1) If $X_i \sim \text{Binomial}(n_i,p)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{Binomial}\left(\sum_{i=1}^{n} n_i,\ p\right)$$
(2) If $X_i \sim \text{Poisson}(\theta_i)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{Poisson}\left(\sum_{i=1}^{n} \theta_i\right)$$
(3) If $X_i \sim \text{Negative Binomial}(k_i,p)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{Negative Binomial}\left(\sum_{i=1}^{n} k_i,\ p\right)$$
(4) If $X_i \sim \text{Exponential}(\theta)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{Gamma}(n,\theta)$$
(5) If $X_i \sim \text{Gamma}(\alpha_i,\beta)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{Gamma}\left(\sum_{i=1}^{n} \alpha_i,\ \beta\right)$$
(6) If $X_i \sim \chi^2(k_i)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$$
(7) If $X_i \sim \text{N}(\mu,\sigma^2)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 \sim \chi^2(n)$$
i=1

Proof
(1) Suppose Xi Binomial(ni ; p), i = 1; 2; : : : ; n independently. The moment generating
function of Xi is
ni
Mi (t) = pet + q for t 2 <
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1

Q
n
MY (t) = Mi (t)
i=1
Q
n
ni
= pet + q
i=1
P
n
ni
= pet + q i=1 for t 2 <
P
n
which is the moment generating function of a Binomial ni ; p random variable. There-
i=1
P
n P
n
fore by the Uniqueness Theorem for Moment Generating Functions Xi Binomial ni ; p .
i=1 i=1
(2) Suppose Xi Poisson( i ), i = 1; 2; : : : ; n independently. The moment generating func-
tion of Xi is
t
Mi (t) = e i (e 1)
for t 2 <
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1

Q
n
MY (t) = Mi (t)
i=1
Q
n t
= e i (e 1)
i=1
!
P
n
i (et 1)
= e i=1
for t 2 <
P
n
which is the moment generating function of a Poisson i random variable. Therefore
i=1
P
n P
n
by the Uniqueness Theorem for Moment Generating Functions Xi Poisson i .
i=1 i=1

(3) Suppose Xi Negative Binomial(ki ; p), i = 1; 2; : : : ; n independently. The moment


generating function of Xi is
p ki
Mi (t) = for t < log (q)
1 qet
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1

Q
n
MY (t) = Mi (t)
i=1
Q
n p ki
=
i=1 1 qet
Pn
ki
p i=1
= for t < log (q)
1 qet
P
n
which is the moment generating function of a Negative Binomial ki ; p random vari-
i=1
able. Therefore by the Uniqueness Theorem for Moment Generating Functions
Pn Pn
Xi Negative Binomial ki ; p .
i=1 i=1
(4) Suppose Xi Exponential( ), i = 1; 2; : : : ; n independently. The moment generating
function of each Xi is
1 1
M (t) = for t <
1 t
Pn
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1
n
1 1
MY (t) = [M (t)]n = = for t <
1 t
which is the moment generating function of a Gamma(n; ) random variable. Therefore by
P
n
the Uniqueness Theorem for Moment Generating Functions Xi Gamma(n; ).
i=1
(5) Suppose Xi Gamma( i ; ), i = 1; 2; : : : ; n independently. The moment generating
function of Xi is
i
1 1
Mi (t) = for t <
1 t
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1

Q
n Q
n 1 i
MY (t) = Mi (t) =
i=1 i=1 1 t
P
n
i
1 i=1 1
= for t <
1 t

P
n
which is the moment generating function of a Gamma i; random variable. There-
i=1
P
n P
n
fore by the Uniqueness Theorem for Moment Generating Functions Xi Gamma i; .
i=1 i=1

(6) Suppose Xi 2 (k
i ), i = 1; 2; : : : ; n independently. The moment generating function
of Xi is
ki
1 1
Mi (t) = for t <
1 2t 2
P
n
for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is
i=1

Q
n
MY (t) = Mi (t)
i=1
ki
Q
n 1
=
i=1 1 2t
P
n
ki
1 i=1 1
= for t <
1 2t 2

2
P
n
which is the moment generating function of a ki random variable. Therefore by
i=1
P
n
2
P
n
the Uniqueness Theorem for Moment Generating Functions Xi ki .
i=1 i=1

(7) Suppose $X_i \sim \text{N}(\mu,\sigma^2)$, $i=1,2,\ldots,n$ independently. Then by Example 2.6.9 and Theorem 2.6.3
$$\left(\frac{X_i-\mu}{\sigma}\right)^2 \sim \chi^2(1) \quad \text{for } i=1,2,\ldots,n$$
and by (6)
$$\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 \sim \chi^2(n)$$
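Results such as (4) above are easy to illustrate by simulation. The sketch below is an added illustration (not part of the original notes); it assumes Python with NumPy, uses the scale parameterization so that an Exponential$(\theta)$ random variable has mean $\theta$ as in these notes, and the particular $n$, $\theta$, sample size and seed are arbitrary.

```python
# Simulation check of Special Result (4): a sum of n independent Exponential(theta)
# random variables behaves like a Gamma(n, theta) random variable
# (illustrative sketch, not part of the notes).
import numpy as np

rng = np.random.default_rng(432)
n, theta, n_reps = 5, 2.0, 200_000

sums = rng.exponential(scale=theta, size=(n_reps, n)).sum(axis=1)
gamma_draws = rng.gamma(shape=n, scale=theta, size=n_reps)

print("sum of exponentials: mean %.3f  var %.3f" % (sums.mean(), sums.var()))
print("Gamma(n, theta)    : mean %.3f  var %.3f" % (gamma_draws.mean(), gamma_draws.var()))
print("exact              : mean %.3f  var %.3f" % (n * theta, n * theta ** 2))
```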

4.3.3 Exercise
Suppose $X_1,X_2,\ldots,X_n$ are independent and identically distributed random variables with moment generating function $M(t)$, $E(X_i)=\mu$, and $Var(X_i)=\sigma^2<\infty$. Give an expression for the moment generating function of $Z = \sqrt{n}\left(\bar{X}-\mu\right)/\sigma$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, in terms of $M(t)$.

The following theorem is one that was used in your previous probability and statistics courses without proof. The method of moment generating functions now allows us to easily prove this result.
proof this result.

4.3.4 Theorem - Linear Combination of Independent Normal Random Variables

If $X_i \sim \text{N}(\mu_i,\sigma_i^2)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} a_iX_i \sim \text{N}\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$$

Proof
Suppose Xi N( i ; 2 ), i = 1; 2; : : : ; n independently. The moment generating function of
i
Xi is
2 2
Mi (t) = e i t+ i t =2 for t 2 <
P
n
for i = 1; 2; : : : ; n. The moment generating function of Y = ai Xi is
i=1

MY (t) = E etY
P
n
= E exp t ai Xi
i=1
Q
n
= E e(ai t)Xi since X1 ; X2 ; : : : ; Xn are independent random variables
i=1
Q
n
= Mi (ai t)
i=1
Q
n 2 2 2
= e i ai t+ i ai t =2

i=1
P
n P
n
= exp ai i t+ a2i 2
i t2 =2 for t 2 <
i=1 i=1

P
n P
n
which is the moment generating function of a N ai i ; a2i 2
i random variable.
i=1 i=1
Therefore by the Uniqueness Theorem for Moment Generating Functions
P
n P
n P
n
ai Xi N ai i ; a2i 2
i
i=1 i=1 i=1

4.3.5 Corollary
If $X_i \sim \text{N}(\mu,\sigma^2)$, $i=1,2,\ldots,n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \text{N}\left(n\mu,\ n\sigma^2\right)$$
and
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$

Proof
To prove that
P
n
2
Xi N n ;n
i=1

let ai = 1, = , and 2 = 2 in Theorem 4.3.4 to obtain


i i

P
n P
n P
n
2
Xi N ;
i=1 i=1 i=1

or
P
n
2
Xi N n ;n
i=1

To prove that
1 Pn 2
X= Xi N ;
n i=1 n
we note that
P
n 1
X= Xi
i=1 n
Let ai = n1 , i = , and 2
i = 2 in Theorem 4.3.4 to obtain
!
2
P
n 1 P
n 1 P
n 1 2
X= Xi N ;
i=1 n i=1 n i=1 n

or
2
X N ;
n

The following identity will be used in proving Theorem 4.3.8.

4.3.6 Useful Identity

$$\sum_{i=1}^{n}(X_i-\mu)^2 = \sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 + n\left(\bar{X}-\mu\right)^2$$

4.3.7 Exercise
Prove the identity 4.3.6.

As mentioned previously, the t distribution is used to construct confidence intervals and test hypotheses regarding the mean in a N$(\mu,\sigma^2)$ model. We are now able to prove the theorem on which these results are based.

4.3.8 Theorem
If $X_i \sim \text{N}(\mu,\sigma^2)$, $i=1,2,\ldots,n$ independently, then
$$\bar{X} \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$
independently of
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}{\sigma^2} \sim \chi^2(n-1)$$
where
$$S^2 = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}{n-1}$$
Proof
For a proof that X and S 2 are independent random variables please see Problem 16.
By identity 4.3.6
P
n P
n
2 2
(Xi )2 = Xi X +n X
i=1 i=1

Dividing both sides by 2 gives


2 2
P
n Xi (n 1) S 2 X
= 2
+ p
| {z } = n
|i=1 {z } | {z }
Y U V

Since X and S 2 are independent random variables, it follows that U and V are independent
random variables.
By 4.3.2(7)
2
P
n Xi 2
Y = (n)
i=1

with moment generating function

n=2 1
MY (t) = (1 2t) for t < (4.4)
2
2
X N ; n was proved in Corollary 4.3.5. By Example 2.6.9

X
p N (0; 1)
= n

and by Theorem 2.6.3


2
X 2
V = p (1)
= n

with moment generating function

1=2 1
MV (t) = (1 2t) for t < (4.5)
2
Since U and V are independent random variables and Y = U + V then

MY (t) = E etY = E et(U +V ) = E etU E etV = MU (t) MV (t) (4.6)

Substituting (4.4) and (4.5) into (4.6) gives

n=2 1=2 1
(1 2t) = MU (t) (1 2t) for t <
2
or
1(n 1)=2
MU (t) = (1 2t) for t <
2
which is the moment generating function of a 2 (n 1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions

(n 1) S 2 2
U= 2
(n 1)

4.3.9 Theorem
If $X_i \sim \mathrm{N}\left(\mu, \sigma^2\right)$, $i = 1, 2, \ldots, n$ independently then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \mathrm{t}(n-1)$$
Proof
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} = \frac{Z}{\sqrt{\dfrac{U}{n-1}}}$$
where
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathrm{N}(0, 1)$$
independently of
$$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
Therefore by Theorem 4.2.9
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \mathrm{t}(n-1)$$

The following theorem is useful for testing the equality of variances in a two sample Normal
model.

4.3.10 Theorem
Suppose $X_1, X_2, \ldots, X_n$ are independent $\mathrm{N}\left(\mu_1, \sigma_1^2\right)$ random variables, and independently $Y_1, Y_2, \ldots, Y_m$ are independent $\mathrm{N}\left(\mu_2, \sigma_2^2\right)$ random variables. Let
$$S_1^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} \quad \text{and} \quad S_2^2 = \frac{\sum_{i=1}^{m}\left(Y_i - \bar{Y}\right)^2}{m-1}$$
Then
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \mathrm{F}(n-1,\ m-1)$$

4.3.11 Exercise
Prove Theorem 4.3.10. Hint: Use Theorems 4.3.8 and 4.2.11.
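Although the proof is left as an exercise, the result itself can be checked by simulation. A minimal sketch (NumPy and SciPy assumed; the sample sizes and standard deviations are arbitrary) follows.

```python
# Simulation consistent with Theorem 4.3.10:
# (S1^2/sigma1^2)/(S2^2/sigma2^2) should follow F(n-1, m-1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, m, s1, s2, reps = 8, 12, 1.5, 0.7, 100_000

x = rng.normal(0.0, s1, size=(reps, n))
y = rng.normal(0.0, s2, size=(reps, m))
ratio = (x.var(axis=1, ddof=1) / s1**2) / (y.var(axis=1, ddof=1) / s2**2)

# proportion exceeding the F(n-1, m-1) 95th percentile should be near 0.05
print(np.mean(ratio > stats.f.ppf(0.95, n - 1, m - 1)))
```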

4.4 Chapter 4 Problems


1. Show that if X and Y are independent random variables then U = h (X) and
V = g (Y ) are also independent random variables where h and g are real-valued
functions.

2. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = 24xy \quad \text{for } 0 < x + y < 1,\ 0 < x < 1,\ 0 < y < 1$$
and 0 otherwise.

(a) Find the joint probability density function of $U = X + Y$ and $V = X$. Be sure to specify the support set of $(U, V)$.
(b) Find the marginal probability density function of $U$ and the marginal probability density function of $V$. Be sure to specify their support sets.

3. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = e^{-y} \quad \text{for } 0 < x < y < \infty$$
and 0 otherwise.

(a) Find the joint probability density function of $U = X + Y$ and $V = X$. Be sure to specify the support set of $(U, V)$.
(b) Find the marginal probability density function of $U$ and the marginal probability density function of $V$. Be sure to specify their support sets.

4. Suppose $X$ and $Y$ are nonnegative continuous random variables with joint probability density function $f(x, y)$. Show that the probability density function of $U = X + Y$ is given by
$$g(u) = \int_0^{\infty} f(v, u - v)\,dv$$
Hint: Consider the transformation $U = X + Y$ and $V = X$.

5. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = 2(x + y) \quad \text{for } 0 < x < y < 1$$
and 0 otherwise.

(a) Find the joint probability density function of $U = X$ and $V = XY$. Be sure to specify the support set of $(U, V)$.
(b) Are $U$ and $V$ independent random variables?
(c) Find the marginal probability density functions of $U$ and $V$. Be sure to specify their support sets.

6. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = 4xy \quad \text{for } 0 < x < 1,\ 0 < y < 1$$
and 0 otherwise.

(a) Find the probability density function of $T = X + Y$ using the cumulative distribution function technique.
(b) Find the joint probability density function of $S = X$ and $T = X + Y$. Find the marginal probability density function of $T$ and compare your answer to the one you obtained in (a).
(c) Find the joint probability density function of $U = X^2$ and $V = XY$. Be sure to specify the support set of $(U, V)$.
(d) Find the marginal probability density functions of $U$ and $V$.
(e) Find $E(V^3)$. (Hint: Are $X$ and $Y$ independent random variables?)

7. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = 4xy \quad \text{for } 0 < x < 1,\ 0 < y < 1$$
and 0 otherwise.

(a) Find the joint probability density function of $U = X/Y$ and $V = XY$. Be sure to specify the support set of $(U, V)$.
(b) Are $U$ and $V$ independent random variables?
(c) Find the marginal probability density functions of $U$ and $V$. Be sure to specify their support sets.

8. Suppose $X$ and $Y$ are independent Uniform$(0, \theta)$ random variables. Find the probability density function of $U = X - Y$.
(Hint: Complete the transformation with $V = X + Y$.)

9. Suppose $Z_1 \sim \mathrm{N}(0, 1)$ and $Z_2 \sim \mathrm{N}(0, 1)$ independently. Let
$$X_1 = \mu_1 + \sigma_1 Z_1, \qquad X_2 = \mu_2 + \sigma_2\left[\rho Z_1 + \left(1 - \rho^2\right)^{1/2} Z_2\right]$$
where $-\infty < \mu_1, \mu_2 < \infty$, $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < 1$.

(a) Show that $(X_1, X_2)^T \sim \mathrm{BVN}(\mu, \Sigma)$.
(b) Show that $(X - \mu)^T \Sigma^{-1}(X - \mu) \sim \chi^2(2)$. Hint: Show $(X - \mu)^T \Sigma^{-1}(X - \mu) = Z^T Z$ where $Z = (Z_1, Z_2)^T$.

10. Suppose $X \sim \mathrm{N}\left(\mu, \sigma^2\right)$ and $Y \sim \mathrm{N}\left(\mu, \sigma^2\right)$ independently. Let $U = X + Y$ and $V = X - Y$.

(a) Find the joint moment generating function of $U$ and $V$.
(b) Use (a) to show that $U$ and $V$ are independent random variables.

11. Let $X$ and $Y$ be independent $\mathrm{N}(0, 1)$ random variables and let $U = X/Y$.

(a) Show that $U \sim$ Cauchy$(1, 0)$.
(b) Show that the Cauchy$(1, 0)$ probability density function is the same as the $\mathrm{t}(1)$ probability density function.

12. Let $X_1, X_2, X_3$ be independent Exponential$(1)$ random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = X_1 + X_2 + X_3$$
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.

13. Let $X_1, X_2, X_3$ be independent $\mathrm{N}(0, 1)$ random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$X_1 = Y_1 \cos Y_2 \sin Y_3, \quad X_2 = Y_1 \sin Y_2 \sin Y_3, \quad X_3 = Y_1 \cos Y_3$$
for $0 < y_1 < \infty$, $0 < y_2 < 2\pi$, $0 < y_3 < \pi$.
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.

14. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson$(\theta)$ distribution. Find the conditional probability function of $X_1, X_2, \ldots, X_n$ given $T = \sum_{i=1}^{n} X_i = t$.

15. Suppose $X \sim \chi^2(n)$, $X + Y \sim \chi^2(m)$, $m > n$ and $X$ and $Y$ are independent random variables. Use the properties of moment generating functions to show that $Y \sim \chi^2(m - n)$.
16. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the $\mathrm{N}(\mu, \sigma^2)$ distribution. In this problem we wish to show that $\bar{X}$ and $\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independent random variables. Note that this implies that $\bar{X}$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are also independent random variables.
Let $U = (U_1, U_2, \ldots, U_n)$ where $U_i = X_i - \bar{X}$, $i = 1, 2, \ldots, n$ and let
$$M(s_1, s_2, \ldots, s_n, s) = E\left[\exp\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right]$$
be the joint moment generating function of $U$ and $\bar{X}$.

(a) Let $t_i = s_i - \bar{s} + s/n$, $i = 1, 2, \ldots, n$ where $\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i$. Show that
$$E\left[\exp\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right] = E\left[\exp\left(\sum_{i=1}^{n} t_i X_i\right)\right] = \exp\left(\mu\sum_{i=1}^{n} t_i + \sigma^2\sum_{i=1}^{n} t_i^2/2\right)$$
Hint: Since $X_i \sim \mathrm{N}(\mu, \sigma^2)$
$$E[\exp(t_i X_i)] = \exp\left(\mu t_i + \sigma^2 t_i^2/2\right)$$
(b) Verify that $\sum_{i=1}^{n} t_i = s$ and $\sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n}(s_i - \bar{s})^2 + s^2/n$.
(c) Use (a) and (b) to show that
$$M(s_1, s_2, \ldots, s_n, s) = \exp\left[\mu s + \left(\sigma^2/n\right)\left(s^2/2\right)\right]\exp\left[\sigma^2\sum_{i=1}^{n}(s_i - \bar{s})^2/2\right]$$
(d) Show that the random variable $\bar{X}$ is independent of the random vector $U$ and thus $\bar{X}$ and $\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independent. Hint: $M_{\bar{X}}(s) = M(0, 0, \ldots, 0, s)$ and $M_U(s_1, s_2, \ldots, s_n) = M(s_1, s_2, \ldots, s_n, 0)$.
5. Limiting or Asymptotic Distributions

In a previous probability course the Poisson approximation to the Binomial distribution
$$\binom{n}{x}p^x(1-p)^{n-x} \approx \frac{(np)^x e^{-np}}{x!} \quad \text{for } x = 0, 1, \ldots, n$$
if $n$ is large and $p$ is small was used.

As well the Normal approximation to the Binomial distribution, where $X_n \sim$ Binomial$(n, p)$ with probability function $\binom{n}{x}p^x(1-p)^{n-x}$ for $x = 0, 1, \ldots, n$,
$$P(X_n \le x) \approx P\left(Z \le \frac{x - np}{\sqrt{np(1-p)}}\right) \quad \text{where } Z \sim \mathrm{N}(0, 1)$$
if $n$ is large and $p$ is close to $1/2$ (a special case of the very important Central Limit Theorem) was used. These are examples of what we will call limiting or asymptotic distributions.
In this chapter we consider a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and look at the definitions and theorems related to determining the limiting distribution of such a sequence. In Section 5.1 we define convergence in distribution and look at several examples to illustrate its meaning. In Section 5.2 we define convergence in probability and examine its relationship to convergence in distribution. In Section 5.3 we look at the Weak Law of Large Numbers which is an important theorem when examining the behaviour of estimators of unknown parameters (Chapter 6). In Section 5.4 we use the moment generating function to find limiting distributions including a proof of the Central Limit Theorem. The Central Limit Theorem was used in STAT 221/231/241 to construct an approximate confidence interval for an unknown parameter. In Section 5.5 additional limit theorems for finding limiting distributions are introduced. These additional theorems allow us to determine new limiting distributions by combining the limiting distributions which have been determined from definitions, the Weak Law of Large Numbers, and/or the Central Limit Theorem.

5.1 Convergence in Distribution

In calculus you studied sequences of real numbers $a_1, a_2, \ldots, a_n, \ldots$ and learned theorems which allowed you to evaluate limits such as $\lim_{n\to\infty} a_n$. In this course we are interested in a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and what happens to the distribution of $X_n$ as $n \to \infty$. We do this by examining what happens to $F_n(x) = P(X_n \le x)$, the cumulative distribution function of $X_n$, as $n \to \infty$. Note that for a fixed value of $x$, the sequence $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of $x$. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} F_n(x)$. We will need to take care in determining how $F_n(x)$ behaves as $n \to \infty$ for all real values of $x$. To formalize these ideas we give the following definition for convergence in distribution of a sequence of random variables.

5.1.1 Definition - Convergence in Distribution

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables and let $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ be the corresponding sequence of cumulative distribution functions, that is, $X_n$ has cumulative distribution function $F_n(x) = P(X_n \le x)$. Let $X$ be a random variable with cumulative distribution function $F(x) = P(X \le x)$. We say $X_n$ converges in distribution to $X$ and write
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points $x$ at which $F(x)$ is continuous. We call $F$ the limiting or asymptotic distribution of $X_n$.
Note:
(1) Although we say the random variable $X_n$ converges in distribution to the random variable $X$, the definition of convergence in distribution is defined in terms of the pointwise convergence of the corresponding sequence of cumulative distribution functions.
(2) This definition holds for both discrete and continuous random variables.
(3) One way to think about convergence in distribution is that, if $X_n \to_D X$, then for large $n$
$$F_n(x) = P(X_n \le x) \approx F(x) = P(X \le x)$$
if $x$ is a point of continuity of $F(x)$. How good the approximation is will depend on the values of $n$ and $x$.

The following theorem and corollary will be useful in determining limiting distributions.

5.1.2 Theorem - $e$ Limit

If $b$ and $c$ are real constants and $\lim_{n\to\infty}\psi(n) = 0$ then
$$\lim_{n\to\infty}\left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn} = e^{bc}$$

5.1.3 Corollary
If $b$ and $c$ are real constants then
$$\lim_{n\to\infty}\left(1 + \frac{b}{n}\right)^{cn} = e^{bc}$$

5.1.4 Example
Let $Y_i \sim$ Exponential$(1)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n) - \log n$. Find the limiting distribution of $X_n$.

Solution
Since $Y_i \sim$ Exponential$(1)$
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ 1 - e^{-y} & y > 0 \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$\begin{aligned}
F_n(x) = P(X_n \le x) &= P(\max(Y_1, Y_2, \ldots, Y_n) - \log n \le x) \\
&= P(\max(Y_1, Y_2, \ldots, Y_n) \le x + \log n) \\
&= P(Y_1 \le x + \log n,\ Y_2 \le x + \log n,\ \ldots,\ Y_n \le x + \log n) \\
&= \prod_{i=1}^{n} P(Y_i \le x + \log n) \\
&= \prod_{i=1}^{n}\left[1 - e^{-(x + \log n)}\right] \quad \text{for } x + \log n > 0 \\
&= \left(1 - \frac{e^{-x}}{n}\right)^n \quad \text{for } x > -\log n
\end{aligned}$$
As $n \to \infty$, $-\log n \to -\infty$ so
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty}\left[1 + \frac{(-e^{-x})}{n}\right]^n = e^{-e^{-x}} \quad \text{for } x \in \Re$$
by 5.1.3.
Consider the function
$$F(x) = e^{-e^{-x}} \quad \text{for } x \in \Re$$
Since $F'(x) = e^{-x}e^{-e^{-x}} > 0$ for all $x \in \Re$, $F(x)$ is a continuous, increasing function for $x \in \Re$. Also $\lim_{x\to-\infty} F(x) = \lim_{x\to-\infty} e^{-e^{-x}} = 0$, and $\lim_{x\to\infty} F(x) = \lim_{x\to\infty} e^{-e^{-x}} = 1$. Therefore $F(x)$ is a cumulative distribution function for a continuous random variable.
Let $X$ be a random variable with cumulative distribution function $F(x)$. Since
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for all $x \in \Re$, that is at all points $x$ at which $F(x)$ is continuous, therefore
$$X_n \to_D X$$
In Figure 5.1 you can see how quickly the curves Fn (x) approach the limiting curve
F (x).

[Figure 5.1: Graphs of $F_n(x) = \left(1 - \frac{e^{-x}}{n}\right)^n$ for $n = 1, 2, 5, 10, \infty$]
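The same comparison can be made numerically. The short sketch below (NumPy assumed; the grid of $x$ values is arbitrary) evaluates $F_n(x)$ for several $n$ and the limiting curve $F(x) = e^{-e^{-x}}$.

```python
# Numerical illustration of Example 5.1.4: Fn(x) = (1 - exp(-x)/n)^n
# approaches F(x) = exp(-exp(-x)) as n grows.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
for n in [1, 2, 5, 10, 100]:
    Fn = (1 - np.exp(-x) / n) ** n
    print(n, np.round(Fn, 4))
print("limit", np.round(np.exp(-np.exp(-x)), 4))
```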

5.1.5 Example
Let $Y_i \sim$ Uniform$(0, \theta)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n)$. Find the limiting distribution of $X_n$.

Solution
Since $Y_i \sim$ Uniform$(0, \theta)$
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ \dfrac{y}{\theta} & 0 < y < \theta \\ 1 & y \ge \theta \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$\begin{aligned}
F_n(x) = P(X_n \le x) &= P(\max(Y_1, Y_2, \ldots, Y_n) \le x) \\
&= P(Y_1 \le x,\ Y_2 \le x,\ \ldots,\ Y_n \le x) = \prod_{i=1}^{n} P(Y_i \le x) \\
&= \begin{cases} 0 & x \le 0 \\ \left(\dfrac{x}{\theta}\right)^n & 0 < x < \theta \\ 1 & x \ge \theta \end{cases}
\end{aligned}$$
Therefore
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x < \theta \\ 1 & x \ge \theta \end{cases} = F(x)$$

In Figure 5.2 you can see how quickly the curves Fn (x) approach the limiting curve F (x).

[Figure 5.2: Graphs of $F_n(x) = \left(\frac{x}{\theta}\right)^n$ for $\theta = 2$ and $n = 1, 2, 5, 10, 100, \infty$]

It is straightforward to check that $F(x)$ is a cumulative distribution function for the discrete random variable $X$ with probability function
$$f(x) = \begin{cases} 1 & x = \theta \\ 0 & \text{otherwise} \end{cases}$$

Therefore
$$X_n \to_D X$$
Since $X$ only takes on one value with probability one, $X$ is called a degenerate random variable. When $X_n$ converges in distribution to a degenerate random variable we also call this convergence in probability to a constant as defined in the next section.

5.1.6 Comment
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n \to_D X$. Then for large $n$ we can use the approximation
$$P(X_n \le x) \approx P(X \le x)$$
If $X$ is degenerate at $b$ then $P(X = b) = 1$ and this approximation is not very useful. However, if the limiting distribution is degenerate then we could use this result in another way. In Example 5.1.5 we showed that if $Y_i \sim$ Uniform$(0, \theta)$, $i = 1, 2, \ldots, n$ independently then $X_n = \max(Y_1, Y_2, \ldots, Y_n)$ converges in distribution to a degenerate random variable $X$ with $P(X = \theta) = 1$. This result is rather useful since, if we have observed data $y_1, y_2, \ldots, y_n$ from a Uniform$(0, \theta)$ distribution and $\theta$ is unknown, then this suggests using $y_{(n)} = \max(y_1, y_2, \ldots, y_n)$ as an estimate of $\theta$ if $n$ is reasonably large. We will discuss this idea in more detail in Chapter 6.

5.2 Convergence in Probability

Definition 5.1.1 is useful for finding the limiting distribution of a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ when the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained. In other cases we may use the following definition to determine the limiting distribution.

5.2.1 Definition - Convergence in Probability

A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a random variable $X$ if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$$
We write
$$X_n \to_p X$$
Convergence in probability is a stronger form of convergence than convergence in distribution in the sense that convergence in probability implies convergence in distribution as stated in the following theorem. However, if $X_n$ converges in distribution to $X$, then $X_n$ may or may not converge in probability to $X$.

5.2.2 Theorem - Convergence in Probability Implies Convergence in Distribution
If $X_n \to_p X$ then $X_n \to_D X$.

In Example 5.1.5 the limiting distribution was degenerate. When the limiting distribution is degenerate we say $X_n$ converges in probability to a constant. The following definition, which follows from Definition 5.2.1, can be used for proving convergence in probability.

5.2.3 Definition - Convergence in Probability to a Constant

A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a constant $b$ if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - b| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - b| < \varepsilon) = 1$$
We write
$$X_n \to_p b$$

5.2.4 Example
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with $E(X_n) = \mu_n$ and $Var(X_n) = \sigma_n^2$. If $\lim_{n\to\infty}\mu_n = a$ and $\lim_{n\to\infty}\sigma_n^2 = 0$ then show $X_n \to_p a$.

Solution
To show $X_n \to_p a$ we need to show that for all $\varepsilon > 0$
$$\lim_{n\to\infty} P(|X_n - a| < \varepsilon) = 1$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
Recall Markov's Inequality. For all $k, c > 0$
$$P(|X| \ge c) \le \frac{E\left(|X|^k\right)}{c^k}$$
Therefore by Markov's Inequality with $k = 2$ and $c = \varepsilon$ we have
$$0 \le \lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) \le \lim_{n\to\infty}\frac{E\left(|X_n - a|^2\right)}{\varepsilon^2} \tag{5.1}$$
Now
$$E\left(|X_n - a|^2\right) = E\left[(X_n - a)^2\right] = E\left[(X_n - \mu_n)^2 + 2(\mu_n - a)(X_n - \mu_n) + (\mu_n - a)^2\right]$$
Since
$$\lim_{n\to\infty} E\left[(X_n - \mu_n)^2\right] = \lim_{n\to\infty} Var(X_n) = \lim_{n\to\infty}\sigma_n^2 = 0$$
and
$$\lim_{n\to\infty}(\mu_n - a) = 0 \quad \text{since } \lim_{n\to\infty}\mu_n = a$$
therefore
$$\lim_{n\to\infty} E\left(|X_n - a|^2\right) = \lim_{n\to\infty} E\left[(X_n - a)^2\right] = 0$$
which also implies
$$\lim_{n\to\infty}\frac{E\left(|X_n - a|^2\right)}{\varepsilon^2} = 0 \tag{5.2}$$
Thus by (5.1), (5.2) and the Squeeze Theorem
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
for all $\varepsilon > 0$ as required.

The proof in Example 5.2.4 used Definition 5.2.3 to prove convergence in probability. The reason for this is that the distribution of the $X_i$'s was not specified. Only conditions on $E(X_n)$ and $Var(X_n)$ were specified. This means that the result in Example 5.2.4 holds for any sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ satisfying the given conditions.

If the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained then the following theorem can also be used to prove convergence in probability to a constant.

5.2.5 Theorem
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n$ has cumulative distribution function $F_n(x)$. If
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty} P(X_n \le x) = \begin{cases} 0 & x < b \\ 1 & x > b \end{cases}$$
then $X_n \to_p b$.

Note: We do not need to worry about whether $\lim_{n\to\infty} F_n(b)$ exists since $x = b$ is a point of discontinuity of the limiting distribution (see Definition 5.1.1).

5.2.6 Example
Let $Y_i \sim$ Exponential$(\theta, 1)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \min(Y_1, Y_2, \ldots, Y_n)$. Show that $X_n \to_p \theta$.

Solution
Since $Y_i \sim$ Exponential$(\theta, 1)$
$$P(Y_i > y) = \begin{cases} e^{-(y - \theta)} & y > \theta \\ 1 & y \le \theta \end{cases}$$
for $i = 1, 2, \ldots$. Since $Y_1, Y_2, \ldots, Y_n$ are independent random variables
$$\begin{aligned}
F_n(x) = 1 - P(X_n > x) &= 1 - P(\min(Y_1, Y_2, \ldots, Y_n) > x) \\
&= 1 - P(Y_1 > x,\ Y_2 > x,\ \ldots,\ Y_n > x) = 1 - \prod_{i=1}^{n} P(Y_i > x) \\
&= \begin{cases} 0 & x \le \theta \\ 1 - e^{-n(x - \theta)} & x > \theta \end{cases}
\end{aligned}$$
Therefore
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x \le \theta \\ 1 & x > \theta \end{cases}$$
which we note is not a cumulative distribution function since the function is not right-continuous at $x = \theta$. However
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x < \theta \\ 1 & x > \theta \end{cases}$$
and therefore by Theorem 5.2.5, $X_n \to_p \theta$. In Figure 5.3 you can see how quickly the limit is approached.

[Figure 5.3: Graphs of $F_n(x) = 1 - e^{-n(x - \theta)}$ for $\theta = 2$ and $n = 1, 2, 5, 10, 100, \infty$]

5.3 Weak Law of Large Numbers

In this section we look at a very important result which we will use in Chapter 6 to show that maximum likelihood estimators have good properties. This result is called the Weak Law of Large Numbers. Needless to say there is another law called the Strong Law of Large Numbers but we will not consider this law here.
Also in this section we will look at some simulations to illustrate the theoretical result in the Weak Law of Large Numbers.

5.3.1 Weak Law of Large Numbers

Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n, \ldots$ where
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Then
$$\bar{X}_n \to_p \mu$$

Proof
Using Definition 5.2.3 we need to show
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
We apply Chebyshev's Theorem (see 2.8.2) to the random variable $\bar{X}_n$ where $E\left(\bar{X}_n\right) = \mu$ (see Corollary 3.6.3(3)) and $Var\left(\bar{X}_n\right) = \sigma^2/n$ (see Theorem 3.6.7(4)) to obtain
$$P\left(\left|\bar{X}_n - \mu\right| \ge \frac{k\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2} \quad \text{for all } k > 0$$
Let $k = \frac{\sqrt{n}\,\varepsilon}{\sigma}$. Then
$$0 \le P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2} \quad \text{for all } \varepsilon > 0$$
Since
$$\lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 0$$
therefore by the Squeeze Theorem
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
as required.

Notes:
(1) The proof of the Weak Law of Large Numbers does not actually require that the random
variables be identically distributed, only that they all have the same mean and variance.
As well the proof does not require knowing the distribution of these random variables.
(2) In words the Weak Law of Large Numbers says that the sample mean $\bar{X}_n$ approaches the population mean $\mu$ as $n \to \infty$.

5.3.2 Example
If $X \sim$ Pareto$(1, \beta)$ then $X$ has probability density function
$$f(x) = \frac{\beta}{x^{\beta + 1}} \quad \text{for } x \ge 1$$
and 0 otherwise. $X$ has cumulative distribution function
$$F(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 - \dfrac{1}{x^{\beta}} & \text{for } x \ge 1 \end{cases}$$
and inverse cumulative distribution function
$$F^{-1}(x) = (1 - x)^{-1/\beta} \quad \text{for } 0 < x < 1$$
Also
$$E(X) = \begin{cases} \infty & \text{if } 0 < \beta \le 1 \\ \dfrac{\beta}{\beta - 1} & \text{if } \beta > 1 \end{cases}$$
and
$$Var(X) = \frac{\beta}{(\beta - 1)^2(\beta - 2)} \quad \text{if } \beta > 2$$
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed Pareto$(1, \beta)$ random variables. By the Weak Law of Large Numbers
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_p E(X) = \frac{\beta}{\beta - 1} \quad \text{for } \beta > 1$$
If $U_i \sim$ Uniform$(0, 1)$, $i = 1, 2, \ldots, n$ independently then by Theorem 2.6.6
$$X_i = F^{-1}(U_i) = (1 - U_i)^{-1/\beta} \sim \text{Pareto}(1, \beta)$$
$i = 1, 2, \ldots, n$ independently. If we generate Uniform$(0, 1)$ observations $u_1, u_2, \ldots, u_n$ using a random number generator and then let $x_i = (1 - u_i)^{-1/\beta}$ then $x_1, x_2, \ldots, x_n$ are observations from the Pareto$(1, \beta)$ distribution.
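A minimal sketch of this simulation (NumPy assumed; the seed and sample size are arbitrary) generates Pareto$(1, 5)$ observations by the inverse cumulative distribution function method and tracks the running sample mean.

```python
# Generate Pareto(1, beta) observations and track the running sample mean.
import numpy as np

rng = np.random.default_rng(4)
beta, n = 5.0, 500

u = rng.uniform(size=n)
x = (1 - u) ** (-1 / beta)                  # Pareto(1, beta) observations
running_mean = np.cumsum(x) / np.arange(1, n + 1)

print(running_mean[-1])                     # should be near beta/(beta-1) = 1.25
```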
The points $(i, x_i)$, $i = 1, 2, \ldots, 500$ for one simulation of 500 observations from a Pareto$(1, 5)$ distribution are plotted in Figure 5.4.

[Figure 5.4: 500 observations from a Pareto$(1, 5)$ distribution]

Figure 5.5 shows a plot of the points $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean. We note that the sample mean $\bar{x}_n$ is approaching the population mean $\mu = E(X) = \frac{5}{5-1} = 1.25$ as $n$ increases.

[Figure 5.5: Graph of $\bar{x}_n$ versus $n$ for 500 Pareto$(1, 5)$ observations]


If we generate a further 1000 values of $x_i$, and plot $(i, x_i)$, $i = 1, 2, \ldots, 1500$ we obtain the graph in Figure 5.6.

[Figure 5.6: 1500 observations from a Pareto$(1, 5)$ distribution]

The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 1500$ is shown in Figure 5.7. We note that the sample mean $\bar{x}_n$ stays very close to the population mean $\mu = 1.25$ for $n > 500$.

[Figure 5.7: Graph of $\bar{x}_n$ versus $n$ for 1500 Pareto$(1, 5)$ observations]

Note that these figures correspond to only one set of simulated data. If we generated another set of data using a random number generator the actual data points would change. However what would stay the same is that the sample mean for the new data set would still approach the mean value $E(X) = 1.25$ as $n$ increases.
The points $(i, x_i)$, $i = 1, 2, \ldots, 500$ for one simulation of 500 observations from a Pareto$(1, 0.5)$ distribution are plotted in Figure 5.8. For this distribution
$$E(X) = \int_1^{\infty} x\,\frac{0.5}{x^{1.5}}\,dx = \int_1^{\infty}\frac{0.5}{x^{0.5}}\,dx \quad \text{which diverges to } \infty$$
Note in Figure 5.8 that there are some very large observations. In particular there is one observation which is close to $40{,}000$.

[Figure 5.8: 500 observations from a Pareto$(1, 0.5)$ distribution]

The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ for these data is given in Figure 5.9. Note that the mean $\bar{x}_n$ does not appear to be approaching a fixed value.

[Figure 5.9: Graph of $\bar{x}_n$ versus $n$ for 500 Pareto$(1, 0.5)$ observations]


In Figure 5.10 the points $(n, \bar{x}_n)$ for a set of $50{,}000$ observations generated from a Pareto$(1, 0.5)$ distribution are plotted. Note that the mean $\bar{x}_n$ does not approach a fixed value and in general is getting larger as $n$ gets large. This is consistent with $E(X)$ diverging to $\infty$.

[Figure 5.10: Graph of $\bar{x}_n$ versus $n$ for $50{,}000$ Pareto$(1, 0.5)$ observations]

5.3.3 Example
If $X \sim$ Cauchy$(0, 1)$ then
$$f(x) = \frac{1}{\pi\left(1 + x^2\right)} \quad \text{for } x \in \Re$$
The probability density function, shown in Figure 5.11, is symmetric about the $y$ axis and the median of the distribution is equal to $0$.

[Figure 5.11: Cauchy$(0, 1)$ probability density function]

The mean of a Cauchy$(0, 1)$ random variable does not exist since
$$\int_{-\infty}^{0}\frac{1}{\pi}\,\frac{x}{1 + x^2}\,dx \quad \text{diverges to } -\infty \tag{5.3}$$
and
$$\int_{0}^{\infty}\frac{1}{\pi}\,\frac{x}{1 + x^2}\,dx \quad \text{diverges to } \infty \tag{5.4}$$
The cumulative distribution function of a Cauchy$(0, 1)$ random variable is
$$F(x) = \frac{1}{\pi}\arctan x + \frac{1}{2} \quad \text{for } x \in \Re$$
and the inverse cumulative distribution function is
$$F^{-1}(x) = \tan\left[\pi\left(x - \frac{1}{2}\right)\right] \quad \text{for } 0 < x < 1$$
If we generate Uniform$(0, 1)$ observations $u_1, u_2, \ldots, u_N$ using a random number generator and then let $x_i = \tan\left[\pi\left(u_i - \frac{1}{2}\right)\right]$, then the $x_i$'s are observations from the Cauchy$(0, 1)$ distribution.
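A minimal sketch of this simulation (NumPy assumed; seed and sample size are arbitrary) shows that, unlike in the Pareto$(1, 5)$ case, the running sample mean of Cauchy$(0, 1)$ observations does not settle down.

```python
# Generate Cauchy(0, 1) observations by the inverse CDF method and track the running mean.
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
u = rng.uniform(size=n)
x = np.tan(np.pi * (u - 0.5))                # Cauchy(0, 1) observations
running_mean = np.cumsum(x) / np.arange(1, n + 1)

print(running_mean[[999, 99_999, n - 1]])    # wanders; no convergence to a constant
```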
The points $(n, \bar{x}_n)$ for a set of $500{,}000$ observations generated from a Cauchy$(0, 1)$ distribution are plotted in Figure 5.12. Note that $\bar{x}_n$ does not approach a fixed value. However, unlike the Pareto$(1, 0.5)$ example in which $\bar{x}_n$ was getting larger as $n$ got larger, we see in Figure 5.12 that $\bar{x}_n$ drifts back and forth around the line $y = 0$. This behaviour, which is consistent with (5.3) and (5.4), continues even if more observations are generated.

[Figure 5.12: Graph of $\bar{x}_n$ versus $n$ for $500{,}000$ Cauchy$(0, 1)$ observations]


5.4 Moment Generating Function Technique for Limiting Distributions

We now look at the moment generating function technique for determining a limiting distribution. Suppose we have a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is the corresponding sequence of moment generating functions. For a fixed value of $t$, the sequence $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of $t$. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} M_n(t)$. We will need to take care in determining how $M_n(t)$ behaves as $n \to \infty$ for an interval of values of $t$ containing the value $0$. Of course this technique only works if the moment generating function exists and is tractable.

5.4.1 Limit Theorem for Moment Generating Functions

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that $X_n$ has moment generating function $M_n(t)$. Let $X$ be a random variable with moment generating function $M(t)$. If there exists an $h > 0$ such that
$$\lim_{n\to\infty} M_n(t) = M(t) \quad \text{for all } t \in (-h, h)$$
then
$$X_n \to_D X$$
Note:
(1) The sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in distribution if the corresponding sequence of moment generating functions $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ converges pointwise on an interval containing $0$.
(2) This result holds for both discrete and continuous random variables.

Recall from Definition 5.1.1 that
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points $x$ at which $F(x)$ is continuous. If $X$ is a discrete random variable then the cumulative distribution function is a right continuous function. The values of $x$ of main interest for a discrete random variable are exactly the points at which $F(x)$ is discontinuous. The following theorem indicates that $\lim_{n\to\infty} F_n(x) = F(x)$ holds for the values of $x$ at which $F(x)$ is discontinuous if $X_n$ and $X$ are non-negative integer-valued random variables. The named discrete distributions Bernoulli, Binomial, Geometric, Negative Binomial, and Poisson are all non-negative integer-valued random variables.
5.4.2 Theorem
Suppose $X_n$ and $X$ are non-negative integer-valued random variables. If $X_n \to_D X$ then $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ holds for all $x$ and in particular
$$\lim_{n\to\infty} P(X_n = x) = P(X = x) \quad \text{for } x = 0, 1, \ldots$$

5.4.3 Example
Consider the sequence of random variables $X_1, X_2, \ldots, X_k, \ldots$ where $X_k \sim$ Negative Binomial$(k, p)$. Use Theorem 5.4.1 to determine the limiting distribution of $X_k$ as $k \to \infty$, $p \to 1$ such that $kq/p = \mu$ remains constant where $q = 1 - p$. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_k = x)$.

Solution
If $X_k \sim$ Negative Binomial$(k, p)$ then
$$M_k(t) = E\left(e^{tX_k}\right) = \left(\frac{p}{1 - qe^t}\right)^k \quad \text{for } t < -\log q \tag{5.5}$$
If $\mu = kq/p$ then
$$p = \frac{k}{\mu + k} \quad \text{and} \quad q = \frac{\mu}{\mu + k} \tag{5.6}$$
Substituting (5.6) into (5.5) and simplifying gives
$$M_k(t) = \left(\frac{\frac{k}{\mu + k}}{1 - \frac{\mu}{\mu + k}e^t}\right)^k = \left[\frac{1}{1 - \frac{\mu}{k}\left(e^t - 1\right)}\right]^k = \left[1 - \frac{\mu\left(e^t - 1\right)}{k}\right]^{-k} \quad \text{for } t < \log\frac{\mu + k}{\mu}$$
Now
$$\lim_{k\to\infty}\left[1 - \frac{\mu\left(e^t - 1\right)}{k}\right]^{-k} = e^{\mu\left(e^t - 1\right)} \quad \text{for } t \in \Re$$
by Corollary 5.1.3. Since $M(t) = e^{\mu\left(e^t - 1\right)}$ for $t \in \Re$ is the moment generating function of a Poisson$(\mu)$ random variable then by Theorem 5.4.1, $X_k \to_D X \sim$ Poisson$(\mu)$.
By Theorem 5.4.2
$$P(X_k = x) = \binom{-k}{x}p^k(-q)^x \approx \frac{(kq/p)^x e^{-kq/p}}{x!} \quad \text{for } x = 0, 1, \ldots$$
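The quality of this approximation can be examined numerically. The following sketch (SciPy assumed; the values of $k$, $p$ and $x$ are arbitrary, and it is assumed that SciPy's `nbinom`, which counts failures before the $k$th success, matches the parameterization used here) compares the exact and approximate probabilities.

```python
# Negative Binomial(k, p) probabilities versus the Poisson(kq/p) approximation.
from scipy import stats

k, p = 50, 0.9
mu = k * (1 - p) / p                  # kq/p, about 5.56
for x in [0, 3, 6, 10]:
    print(x, stats.nbinom.pmf(x, k, p), stats.poisson.pmf(x, mu))
```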
5.4.4 Exercise - Poisson Approximation to the Binomial Distribution

Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n \sim$ Binomial$(n, p)$. Use Theorem 5.4.1 to determine the limiting distribution of $X_n$ as $n \to \infty$, $p \to 0$ such that $np = \mu$ remains constant. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_n = x)$.

In your previous probability and statistics courses you would have used the Central Limit Theorem (without proof!) for approximating Binomial and Poisson probabilities as well as constructing approximate confidence intervals. We now give a proof of this theorem.

5.4.5 Central Limit Theorem

Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = \sqrt{n}\left(\bar{X}_n - \mu\right)/\sigma$ and $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{\sigma} \to_D Z \sim \mathrm{N}(0, 1)$$

Proof
We can write $Z_n$ as
$$Z_n = \frac{1}{\sqrt{n}\,\sigma}\sum_{i=1}^{n}(X_i - \mu)$$
Suppose that for $i = 1, 2, \ldots$, $X_i$ has moment generating function $M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Then for $i = 1, 2, \ldots$, $(X_i - \mu)$ has moment generating function $M(t) = e^{-\mu t}M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Note that
$$M(0) = 1, \qquad M'(0) = E(X_i - \mu) = E(X_i) - \mu = 0$$
and
$$M''(0) = E\left[(X_i - \mu)^2\right] = Var(X_i) = \sigma^2$$
Also by Taylor's Theorem (see 2.11.15) for $n = 2$ we have
$$M(t) = M(0) + M'(0)t + \frac{1}{2}M''(c)t^2 = 1 + \frac{1}{2}M''(c)t^2 \tag{5.7}$$
for some $c$ between $0$ and $t$.

Since $X_1, X_2, \ldots, X_n$ are independent and identically distributed, the moment generating function of $Z_n$ is
$$M_n(t) = E\left(e^{tZ_n}\right) = E\left[\exp\left(\frac{t}{\sqrt{n}\,\sigma}\sum_{i=1}^{n}(X_i - \mu)\right)\right] = \left[M\left(\frac{t}{\sqrt{n}\,\sigma}\right)\right]^n \quad \text{for } \frac{|t|}{\sqrt{n}\,\sigma} < h \tag{5.8}$$
Using (5.7) in (5.8) gives
$$M_n(t) = \left[1 + \frac{1}{2}M''(c_n)\left(\frac{t}{\sqrt{n}\,\sigma}\right)^2\right]^n = \left\{1 + \frac{1}{2}\left(\frac{t}{\sqrt{n}\,\sigma}\right)^2\left[M''(c_n) - M''(0)\right] + \frac{1}{2}\left(\frac{t}{\sqrt{n}\,\sigma}\right)^2 M''(0)\right\}^n$$
for some $c_n$ between $0$ and $\frac{t}{\sqrt{n}\,\sigma}$.
But $M''(0) = \sigma^2$ so
$$M_n(t) = \left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{t^2\left[M''(c_n) - M''(0)\right]}{2\sigma^2 n}\right\}^n$$
for some $c_n$ between $0$ and $\frac{t}{\sqrt{n}\,\sigma}$.
Since $c_n$ is between $0$ and $\frac{t}{\sqrt{n}\,\sigma}$, $c_n \to 0$ as $n \to \infty$. Since $M''(t)$ is continuous on $(-h, h)$
$$\lim_{n\to\infty} M''(c_n) = M''\left(\lim_{n\to\infty} c_n\right) = M''(0) = \sigma^2$$
and
$$\lim_{n\to\infty}\frac{t^2\left[M''(c_n) - M''(0)\right]}{2\sigma^2} = 0$$
Therefore by Theorem 5.1.2, with
$$\psi(n) = \frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]$$
we have
$$\lim_{n\to\infty} M_n(t) = \lim_{n\to\infty}\left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{t^2\left[M''(c_n) - M''(0)\right]}{2\sigma^2 n}\right\}^n = e^{\frac{1}{2}t^2} \quad \text{for } |t| < \infty$$
which is the moment generating function of a $\mathrm{N}(0, 1)$ random variable. Therefore by Theorem 5.4.1
$$Z_n \to_D Z \sim \mathrm{N}(0, 1)$$
as required.

Note: Although this proof assumes that the moment generating function of $X_i$, $i = 1, 2, \ldots$ exists, it does not make any assumptions about the form of the distribution of the $X_i$'s. There are other more general proofs of the Central Limit Theorem which only assume the existence of the variance $\sigma^2$ (which implies the existence of the mean $\mu$).

5.4.6 Example - Normal Approximation to the $\chi^2$ Distribution
Suppose $Y_n \sim \chi^2(n)$, $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - n)/\sqrt{2n}$. Show that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim \mathrm{N}(0, 1)$$
Solution
Let $X_i \sim \chi^2(1)$, $i = 1, 2, \ldots$ independently. Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = 1$ and $Var(X_i) = 2$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\bar{X}_n - 1\right)}{\sqrt{2}} \to_D Z \sim \mathrm{N}(0, 1)$$
But $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ so
$$\frac{\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - 1\right)}{\sqrt{2}} = \frac{S_n - n}{\sqrt{2n}}$$
where $S_n = \sum_{i=1}^{n} X_i$. Therefore
$$\frac{S_n - n}{\sqrt{2n}} \to_D Z \sim \mathrm{N}(0, 1)$$
Now by 4.3.2(6), $S_n \sim \chi^2(n)$ and therefore $Y_n$ and $S_n$ have the same distribution. It follows that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim \mathrm{N}(0, 1)$$
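A quick numerical comparison (SciPy assumed; the value of $n$ and the evaluation point are arbitrary) shows how close the $\chi^2(n)$ distribution is to $\mathrm{N}(n, 2n)$ when $n$ is large.

```python
# Compare an exact chi-square(n) probability with its normal approximation.
import numpy as np
from scipy import stats

n = 200
x = n + 1.5 * np.sqrt(2 * n)       # a point 1.5 "standard deviations" above n

print(stats.chi2.cdf(x, df=n))     # exact P(Y_n <= x)
print(stats.norm.cdf(1.5))         # normal approximation, about 0.933
```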

5.4.7 Exercise - Normal Approximation to the Binomial Distribution

Suppose $Y_n \sim$ Binomial$(n, p)$, $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - np)/\sqrt{np(1-p)}$. Show that
$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}} \to_D Z \sim \mathrm{N}(0, 1)$$
Hint: Let $X_i \sim$ Binomial$(1, p)$, $i = 1, 2, \ldots, n$ independently.


5.5 Additional Limit Theorems

Suppose we know the limiting distribution of one or more sequences of random variables by using the definitions and/or theorems in the previous sections of this chapter. The theorems in this section allow us to more easily determine the limiting distribution of a function of these sequences.

5.5.1 Limit Theorems

(1) If $X_n \to_p a$ and $g$ is continuous at $x = a$ then $g(X_n) \to_p g(a)$.
(2) If $X_n \to_p a$, $Y_n \to_p b$ and $g(x, y)$ is continuous at $(a, b)$ then $g(X_n, Y_n) \to_p g(a, b)$.
(3) (Slutsky's Theorem) If $X_n \to_p a$, $Y_n \to_D Y$ and $g(a, y)$ is continuous for all $y \in$ support set of $Y$ then $g(X_n, Y_n) \to_D g(a, Y)$.

Proof of (1)
Since $g$ is continuous at $x = a$ then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x - a| < \delta$ implies $|g(x) - g(a)| < \varepsilon$. By Example 2.1.4(d) this implies that
$$P(|g(X_n) - g(a)| < \varepsilon) \ge P(|X_n - a| < \delta)$$
Since $X_n \to_p a$ it follows that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \ge \lim_{n\to\infty} P(|X_n - a| < \delta) = 1$$
But
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \le 1$$
so by the Squeeze Theorem
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) = 1$$
and therefore $g(X_n) \to_p g(a)$.

5.5.2 Example
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim \mathrm{N}(0, 1)$ then find the limiting distributions of each of the following:
(a) $\sqrt{X_n}$
(b) $X_n + Y_n$
(c) $Y_n + Z_n$
(d) $X_n Z_n$
(e) $Z_n^2$

Solution
(a) Let $g(x) = \sqrt{x}$ which is a continuous function for all $x \in \Re^+$. Since $X_n \to_p a$ then by 5.5.1(1), $\sqrt{X_n} = g(X_n) \to_p g(a) = \sqrt{a}$ or $\sqrt{X_n} \to_p \sqrt{a}$.
(b) Let $g(x, y) = x + y$ which is a continuous function for all $(x, y) \in \Re^2$. Since $X_n \to_p a$ and $Y_n \to_p b$ then by 5.5.1(2), $X_n + Y_n = g(X_n, Y_n) \to_p g(a, b) = a + b$ or $X_n + Y_n \to_p a + b$.
(c) Let $g(y, z) = y + z$ which is a continuous function for all $(y, z) \in \Re^2$. Since $Y_n \to_p b$ and $Z_n \to_D Z \sim \mathrm{N}(0, 1)$ then by 5.5.1(3), $Y_n + Z_n = g(Y_n, Z_n) \to_D g(b, Z) = b + Z$ or $Y_n + Z_n \to_D b + Z$ where $Z \sim \mathrm{N}(0, 1)$. Since $b + Z \sim \mathrm{N}(b, 1)$, therefore $Y_n + Z_n \to_D b + Z \sim \mathrm{N}(b, 1)$.
(d) Let $g(x, z) = xz$ which is a continuous function for all $(x, z) \in \Re^2$. Since $X_n \to_p a$ and $Z_n \to_D Z \sim \mathrm{N}(0, 1)$ then by Slutsky's Theorem, $X_n Z_n = g(X_n, Z_n) \to_D g(a, Z) = aZ$ or $X_n Z_n \to_D aZ$ where $Z \sim \mathrm{N}(0, 1)$. Since $aZ \sim \mathrm{N}\left(0, a^2\right)$, therefore $X_n Z_n \to_D aZ \sim \mathrm{N}\left(0, a^2\right)$.
(e) Let $g(x, z) = z^2$ which is a continuous function for all $(x, z) \in \Re^2$. Since $Z_n \to_D Z \sim \mathrm{N}(0, 1)$ then by Slutsky's Theorem, $Z_n^2 = g(X_n, Z_n) \to_D g(a, Z) = Z^2$ or $Z_n^2 \to_D Z^2$ where $Z \sim \mathrm{N}(0, 1)$. Since $Z^2 \sim \chi^2(1)$, therefore $Z_n^2 \to_D Z^2 \sim \chi^2(1)$.

5.5.3 Exercise
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim \mathrm{N}(0, 1)$ then find the limiting distributions of each of the following:
(a) $X_n^2$
(b) $X_n Y_n$
(c) $X_n / Y_n$
(d) $X_n - 2Z_n$
(e) $1/Z_n$

In Example 5.5.2 we identified the function $g$ in each case. As with other limit theorems we tend not to explicitly identify the function $g$ once we have a good idea of how the theorems work as illustrated in the next example.

5.5.4 Example
Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}}$$
Find the limiting distribution of $Z_n$.

Solution
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}} \to_D Z \sim \mathrm{N}(0, 1) \tag{5.9}$$
and by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.10}$$
By (5.10) and 5.5.1(1)
$$U_n = \sqrt{\frac{\bar{X}_n}{\theta}} \to_p \sqrt{\frac{\theta}{\theta}} = 1 \tag{5.11}$$
Now
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}}}{\sqrt{\dfrac{\bar{X}_n}{\theta}}} = \frac{W_n}{U_n}$$
By (5.9), (5.11), and Slutsky's Theorem
$$Z_n = \frac{W_n}{U_n} \to_D \frac{Z}{1} = Z \sim \mathrm{N}(0, 1)$$

5.5.5 Example
Suppose $X_i \sim$ Uniform$(0, 1)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $U_1, U_2, \ldots, U_n, \ldots$ where $U_n = \max(X_1, X_2, \ldots, X_n)$. Show that
(a) $U_n \to_p 1$
(b) $e^{U_n} \to_p e$
(c) $\sin(1 - U_n) \to_p 0$
(d) $V_n = n(1 - U_n) \to_D V \sim$ Exponential$(1)$
(e) $1 - e^{-V_n} \to_D 1 - e^{-V} \sim$ Uniform$(0, 1)$
(f) $(U_n + 1)^2[n(1 - U_n)] \to_D 4V \sim$ Exponential$(4)$

Solution
(a) Since $X_1, X_2, \ldots, X_n$ are Uniform$(0, 1)$ random variables then for $i = 1, 2, \ldots$
$$P(X_i \le x) = \begin{cases} 0 & x \le 0 \\ x & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$$
Since $X_1, X_2, \ldots, X_n$ are independent random variables
$$F_n(u) = P(U_n \le u) = P(\max(X_1, X_2, \ldots, X_n) \le u) = \prod_{i=1}^{n} P(X_i \le u) = \begin{cases} 0 & u \le 0 \\ u^n & 0 < u < 1 \\ 1 & u \ge 1 \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(u) = \begin{cases} 0 & u < 1 \\ 1 & u \ge 1 \end{cases}$$
and by Theorem 5.2.5
$$U_n = \max(X_1, X_2, \ldots, X_n) \to_p 1 \tag{5.12}$$
(b) By (5.12) and 5.5.1(1)
$$e^{U_n} \to_p e^1 = e$$
(c) By (5.12) and 5.5.1(1)
$$\sin(1 - U_n) \to_p \sin(1 - 1) = \sin(0) = 0$$
(d) The cumulative distribution function of $V_n = n(1 - U_n)$ is
$$\begin{aligned}
G_n(v) = P(V_n \le v) &= P(n(1 - \max(X_1, X_2, \ldots, X_n)) \le v) \\
&= P\left(\max(X_1, X_2, \ldots, X_n) \ge 1 - \frac{v}{n}\right) \\
&= 1 - P\left(\max(X_1, X_2, \ldots, X_n) \le 1 - \frac{v}{n}\right) \\
&= 1 - \prod_{i=1}^{n} P\left(X_i \le 1 - \frac{v}{n}\right) \\
&= \begin{cases} 0 & v \le 0 \\ 1 - \left(1 - \frac{v}{n}\right)^n & v > 0 \end{cases}
\end{aligned}$$
Therefore
$$\lim_{n\to\infty} G_n(v) = \begin{cases} 0 & v \le 0 \\ 1 - e^{-v} & v > 0 \end{cases}$$
which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore by Definition 5.1.1
$$V_n = n(1 - U_n) \to_D V \sim \text{Exponential}(1) \tag{5.13}$$
(e) By (5.13) and Slutsky's Theorem
$$1 - e^{-V_n} \to_D 1 - e^{-V} \quad \text{where } V \sim \text{Exponential}(1)$$
The probability density function of $W = 1 - e^{-V}$ is
$$g(w) = e^{\log(1 - w)}\,\frac{d}{dw}\left[-\log(1 - w)\right] = 1 \quad \text{for } 0 < w < 1$$
which is the probability density function of a Uniform$(0, 1)$ random variable. Therefore
$$1 - e^{-V_n} \to_D 1 - e^{-V} \sim \text{Uniform}(0, 1)$$
(f) By (5.12) and 5.5.1(1)
$$(U_n + 1)^2 \to_p (1 + 1)^2 = 4 \tag{5.14}$$
By (5.13), (5.14), and Slutsky's Theorem
$$(U_n + 1)^2[n(1 - U_n)] \to_D 4V$$
where $V \sim$ Exponential$(1)$. If $V \sim$ Exponential$(1)$ then $4V \sim$ Exponential$(4)$ so
$$(U_n + 1)^2[n(1 - U_n)] \to_D 4V \sim \text{Exponential}(4)$$

5.5.6 Delta Method

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that
$$n^b(X_n - a) \to_D X \tag{5.15}$$
for some $b > 0$. Suppose the function $g(x)$ is differentiable at $a$ and $g'(a) \ne 0$. Then
$$n^b[g(X_n) - g(a)] \to_D g'(a)X$$

Proof
By Taylor's Theorem (2.11.15) we have
$$g(X_n) = g(a) + g'(c_n)(X_n - a)$$
or
$$g(X_n) - g(a) = g'(c_n)(X_n - a) \tag{5.16}$$
where $c_n$ is between $a$ and $X_n$.
From (5.15) it follows that $X_n \to_p a$. Since $c_n$ is between $X_n$ and $a$, therefore $c_n \to_p a$ and by 5.5.1(1)
$$g'(c_n) \to_p g'(a) \tag{5.17}$$
Multiplying (5.16) by $n^b$ gives
$$n^b[g(X_n) - g(a)] = g'(c_n)\,n^b(X_n - a) \tag{5.18}$$
Therefore by (5.15), (5.17), (5.18) and Slutsky's Theorem
$$n^b[g(X_n) - g(a)] \to_D g'(a)X$$

5.5.7 Example
Suppose $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots$ independently. Find the limiting distributions of each of the following:
(a) $\bar{X}_n$
(b) $U_n = \sqrt{n}\left(\bar{X}_n - \theta\right)$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n}$
(d) $V_n = \sqrt{n}\left[\log\left(\bar{X}_n\right) - \log\theta\right]$

Solution
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.19}$$
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta} \to_D Z \sim \mathrm{N}(0, 1) \tag{5.20}$$
Therefore by Slutsky's Theorem
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D \theta Z \sim \mathrm{N}\left(0, \theta^2\right) \tag{5.21}$$
(c) $Z_n$ can be written as
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta}}{\dfrac{\bar{X}_n}{\theta}}$$
By (5.19) and 5.5.1(1),
$$\frac{\bar{X}_n}{\theta} \to_p \frac{\theta}{\theta} = 1 \tag{5.22}$$
By (5.20), (5.22), and Slutsky's Theorem
$$Z_n \to_D \frac{Z}{1} = Z \sim \mathrm{N}(0, 1)$$
(d) Let $g(x) = \log x$, $a = \theta$, and $b = 1/2$. Then $g'(x) = \frac{1}{x}$ and $g'(a) = g'(\theta) = \frac{1}{\theta}$. By (5.21) and the Delta Method
$$n^{1/2}\left[\log\left(\bar{X}_n\right) - \log\theta\right] \to_D \frac{1}{\theta}(\theta Z) = Z \sim \mathrm{N}(0, 1)$$
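Part (d) can be checked by simulation. The following sketch (NumPy assumed; $\theta$, $n$ and the number of repetitions are arbitrary) verifies that $\sqrt{n}\left[\log\left(\bar{X}_n\right) - \log\theta\right]$ behaves approximately like a $\mathrm{N}(0, 1)$ random variable for large $n$.

```python
# Simulation check of the Delta Method result in Example 5.5.7(d).
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 3.0, 400, 50_000

xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)
v = np.sqrt(n) * (np.log(xbar) - np.log(theta))

print(v.mean(), v.var())   # should be close to 0 and 1 respectively
```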

5.5.8 Exercise
Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Show that
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D U \sim \mathrm{N}(0, \theta)$$
and
$$V_n = \sqrt{n}\left(\sqrt{\bar{X}_n} - \sqrt{\theta}\right) \to_D V \sim \mathrm{N}\left(0, \frac{1}{4}\right)$$

5.5.9 Theorem
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that
$$\sqrt{n}(X_n - a) \to_D X \sim \mathrm{N}\left(0, \sigma^2\right) \tag{5.23}$$
Suppose the function $g(x)$ is differentiable at $a$. Then
$$\sqrt{n}[g(X_n) - g(a)] \to_D W \sim \mathrm{N}\left(0, \left[g'(a)\right]^2\sigma^2\right)$$
provided $g'(a) \ne 0$.

Proof
Suppose $g(x)$ is a differentiable function at $a$ and $g'(a) \ne 0$. Let $b = 1/2$. Then by (5.23) and the Delta Method it follows that
$$\sqrt{n}[g(X_n) - g(a)] \to_D W \sim \mathrm{N}\left(0, \left[g'(a)\right]^2\sigma^2\right)$$
5.6 Chapter 5 Problems

1. Suppose $Y_i \sim$ Exponential$(\theta, 1)$, $i = 1, 2, \ldots$ independently. Find the limiting distributions of

(a) $X_n = \min(Y_1, Y_2, \ldots, Y_n)$
(b) $U_n = X_n/\theta$
(c) $V_n = n(X_n - \theta)$
(d) $W_n = n^2(X_n - \theta)$

2. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed continuous random variables with cumulative distribution function $F(x)$ and probability density function $f(x)$. Let $Y_n = \max(X_1, X_2, \ldots, X_n)$.
Show that
$$Z_n = n[1 - F(Y_n)] \to_D Z \sim \text{Exponential}(1)$$

3. Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Find $M_n(t)$, the moment generating function of
$$Y_n = \sqrt{n}\left(\bar{X}_n - \theta\right)$$
Show that
$$\lim_{n\to\infty}\log M_n(t) = \frac{1}{2}\theta t^2$$
What is the limiting distribution of $Y_n$?

4. Suppose $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots$ independently. Show that the moment generating function of
$$Z_n = \left(\sum_{i=1}^{n} X_i - n\theta\right)\Big/\left(\theta\sqrt{n}\right)$$
is
$$M_n(t) = e^{-t\sqrt{n}}\left(1 - t/\sqrt{n}\right)^{-n}$$
Find $\lim_{n\to\infty} M_n(t)$ and thus determine the limiting distribution of $Z_n$.

5. If $Z \sim \mathrm{N}(0, 1)$ and $W_n \sim \chi^2(n)$ independently then we know
$$T_n = \frac{Z}{\sqrt{W_n/n}} \sim \mathrm{t}(n)$$
Show that
$$T_n \to_D Y \sim \mathrm{N}(0, 1)$$

6. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$, $Var(X_i) = \sigma^2 < \infty$, and $E(X_i^4) < \infty$. Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2$$
and
$$T_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{S_n}$$
Show that
$$S_n \to_p \sigma$$
and
$$T_n \to_D Z \sim \mathrm{N}(0, 1)$$

7. Let $X_n \sim$ Binomial$(n, \theta)$. Find the limiting distributions of

(a) $T_n = \dfrac{X_n}{n}$
(b) $U_n = \dfrac{X_n}{n}\left(1 - \dfrac{X_n}{n}\right)$
(c) $W_n = \sqrt{n}\left(\dfrac{X_n}{n} - \theta\right)$
(d) $Z_n = \dfrac{W_n}{\sqrt{U_n}}$
(e) $V_n = \sqrt{n}\left[\arcsin\left(\sqrt{\dfrac{X_n}{n}}\right) - \arcsin\left(\sqrt{\theta}\right)\right]$
(f) Compare the variances of the limiting distributions of $W_n$, $Z_n$ and $V_n$ and comment.

8. Suppose $X_i \sim$ Geometric$(\theta)$, $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of

(a) $\bar{X}_n = \dfrac{Y_n}{n}$
(b) $W_n = \sqrt{n}\left(\bar{X}_n - \dfrac{1-\theta}{\theta}\right)$
(c) $V_n = \dfrac{1}{1 + \bar{X}_n}$
(d) $Z_n = \dfrac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{V_n^2\left(1 - V_n\right)}}$

9. Suppose $X_i \sim$ Gamma$(2, \theta)$, $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of

(a) $\bar{X}_n = \dfrac{Y_n}{n}$ and $\sqrt{\bar{X}_n}$
(b) $W_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\sqrt{2}\,\theta}$ and $V_n = \sqrt{n}\left(\bar{X}_n - 2\theta\right)$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\bar{X}_n/\sqrt{2}}$
(d) $U_n = \sqrt{n}\left[\log\left(\bar{X}_n\right) - \log(2\theta)\right]$
(e) Compare the variances of the limiting distributions of $Z_n$ and $U_n$.

10. Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with
$$E(X_n) = \mu$$
and
$$Var(X_n) = \frac{a}{n^p} \quad \text{for } p > 0$$
Show that
$$X_n \to_p \mu$$
6. Maximum Likelihood Estimation - One Parameter

In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates of one unknown parameter. Some of this material was introduced in a previous statistics course such as STAT 221/231/241.

In Section 6.2 we review the definitions needed for the method of maximum likelihood estimation, the derivations of the maximum likelihood estimates for the unknown parameter in the Binomial, Poisson and Exponential models, and the important invariance property of maximum likelihood estimates. You will notice that we pay more attention to verifying that the maximum likelihood estimate does correspond to a maximum using the first derivative test. Example 6.2.9 is new and illustrates how the maximum likelihood estimate is found when the support set of the random variable depends on the unknown parameter.

In Section 6.3 we define the score function, the information function, and the expected information function. These functions play an important role in the distribution of the maximum likelihood estimator. These functions are also used in Newton's Method which is a method for determining the maximum likelihood estimate in cases where there is no explicit solution. Although the maximum likelihood estimates in nearly all the examples you saw previously could be found explicitly, this is not true in general.

In Section 6.4 we review likelihood intervals. Likelihood intervals provide a way to summarize the uncertainty in an estimate. In Section 6.5 we give a theorem on the limiting distribution of the maximum likelihood estimator. This important theorem tells us why maximum likelihood estimators are good estimators.

In Section 6.6 we review how to find a confidence interval using a pivotal quantity. Confidence intervals also give us a way to summarize the uncertainty in an estimate. We also give a theorem on how to obtain a pivotal quantity using the maximum likelihood estimator if the parameter is either a scale or location parameter. In Section 6.7 we review how to find an approximate confidence interval using an asymptotic pivotal quantity. We then show how to use asymptotic pivotal quantities based on the limiting distribution of the maximum likelihood estimator to construct approximate confidence intervals.

6.1 Introduction
Suppose the random variable $X$ (possibly a vector of random variables) has probability function/probability density function $f(x; \theta)$. Suppose also that $\theta$ is unknown and $\theta \in \Omega$ where $\Omega$ is the parameter space or the set of possible values of $\theta$. Let $X$ be the potential data that is to be collected. In your previous statistics course you learned how numerical and graphical summaries as well as goodness of fit tests could be used to check whether the assumed model for an observed set of data $x$ was reasonable. In this course we will assume that the fit of the model has been checked and that the main focus now is to use the model and the data to determine point and interval estimates of $\theta$.

6.1.1 Definition - Statistic

A statistic, $T = T(X)$, is a function of the data $X$ which does not depend on any unknown parameters.

6.1.2 Example

Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample, that is, $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ where $\mu$ and $\sigma^2$ are unknown.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$, and the sample minimum $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ are statistics.
The random variable $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is not a statistic since it is not only a function of the data $X$ but also a function of the unknown parameters $\mu$ and $\sigma^2$.

6.1.3 Definition - Estimator and Estimate

A statistic $T = T(X)$ that is used to estimate $\tau(\theta)$, a function of $\theta$, is called an estimator of $\tau(\theta)$ and an observed value of the statistic $t = t(x)$ is called an estimate of $\tau(\theta)$.

6.1.4 Example

Suppose $X = (X_1, X_2, \ldots, X_n)$ are independent and identically distributed random variables from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Suppose $x = (x_1, x_2, \ldots, x_n)$ is an observed random sample from this distribution.
The random variable $\bar{X}$ is an estimator of $\mu$. The number $\bar{x}$ is an estimate of $\mu$.
The random variable $S$ is an estimator of $\sigma$. The number $s$ is an estimate of $\sigma$.

6.2 Maximum Likelihood Method

Suppose $X$ is a discrete random variable with probability function $P(X = x; \theta) = f(x; \theta)$, $\theta \in \Omega$ where the scalar parameter $\theta$ is unknown. Suppose also that $x$ is an observed value of the random variable $X$. Then the probability of observing this value is $P(X = x; \theta) = f(x; \theta)$. With the observed value of $x$ substituted into $f(x; \theta)$ we have a function of the parameter $\theta$ only, referred to as the likelihood function and denoted $L(\theta; x)$ or $L(\theta)$. In the absence of any other information, it seems logical that we should estimate the parameter $\theta$ using a value most compatible with the data. For example we might choose the value of $\theta$ which maximizes the probability of the observed data or equivalently the value of $\theta$ which maximizes the likelihood function.

6.2.1 Definition - Likelihood Function: Discrete Case

Suppose $X$ is a discrete random variable with probability function $f(x; \theta)$, where $\theta$ is a scalar, $\theta \in \Omega$, and $x$ is an observed value of $X$. The likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X = x; \theta) = f(x; \theta) \quad \text{for } \theta \in \Omega$$
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with probability function $f(x; \theta)$ and $x = (x_1, x_2, \ldots, x_n)$ are the observed data then the likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$

6.2.2 Definition - Maximum Likelihood Estimate and Estimator

The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data $x$ and we write $\hat{\theta} = \hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random variable, is denoted by $\tilde{\theta} = \tilde{\theta}(X)$.

The shape of the likelihood function and the value of $\theta$ at which it is maximized are not affected if $L(\theta)$ is multiplied by a constant. Indeed it is not the absolute value of the likelihood function that is important but the relative values at two different values of the parameter, for example, $L(\theta_1)/L(\theta_2)$. This ratio can be interpreted as how much more or less consistent the data are with the parameter $\theta_1$ as compared to $\theta_2$. The ratio $L(\theta_1)/L(\theta_2)$ is also unaffected if $L(\theta)$ is multiplied by a constant. In view of this the likelihood may be defined as $P(X = x; \theta)$ or as any constant multiple of it.

To find the maximum likelihood estimate we usually solve the equation $\frac{d}{d\theta}L(\theta) = 0$. If the value of $\theta$ which maximizes $L(\theta)$ occurs at an endpoint of $\Omega$, of course solving this equation does not provide the value at which the likelihood is maximized. In Example 6.2.9 we see that solving $\frac{d}{d\theta}L(\theta) = 0$ does not give the maximum likelihood estimate. Most often however we will find $\hat{\theta}$ by solving $\frac{d}{d\theta}L(\theta) = 0$.

Since the log function (Note: $\log = \ln$) is an increasing function, the value of $\theta$ which maximizes the likelihood $L(\theta)$ also maximizes $\log L(\theta)$, the logarithm of the likelihood function. Since it is usually simpler to find the derivative of a sum of $n$ terms rather than a product, it is often easier to determine the maximum likelihood estimate of $\theta$ by solving $\frac{d}{d\theta}\log L(\theta) = 0$.

It is important to verify that $\hat{\theta}$ is the value of $\theta$ which maximizes $L(\theta)$ or equivalently $l(\theta)$. This can be done using the first derivative test. Recall that the second derivative test only checks for a local extremum.

6.2.3 Definition - Log Likelihood Function

The log likelihood function is defined as
$$l(\theta) = l(\theta; x) = \log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data and $\log$ is the natural logarithmic function.

6.2.4 Example
Suppose in a sequence of $n$ Bernoulli trials the probability of success is equal to $\theta$ and we have observed $x$ successes. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The likelihood function for $\theta$ based on $x$ successes in $n$ trials is
$$L(\theta) = P(X = x; \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
or more simply
$$L(\theta) = \theta^x(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
Suppose $x \ne 0$ and $x \ne n$. The log likelihood function is
$$l(\theta) = x\log\theta + (n - x)\log(1 - \theta) \quad \text{for } 0 < \theta < 1$$
with derivative
$$\frac{d}{d\theta}l(\theta) = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = \frac{x(1 - \theta) - \theta(n - x)}{\theta(1 - \theta)} = \frac{x - n\theta}{\theta(1 - \theta)} \quad \text{for } 0 < \theta < 1$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = x/n$ which is the sample proportion. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < x/n$ and $\frac{d}{d\theta}l(\theta) < 0$ if $x/n < \theta < 1$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = x/n$.
If $x = 0$ then
$$L(\theta) = (1 - \theta)^n \quad \text{for } 0 \le \theta \le 1$$
which is a decreasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 0$ or $\theta = 0/n$.
If $x = n$ then
$$L(\theta) = \theta^n \quad \text{for } 0 \le \theta \le 1$$
which is an increasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 1$ or $\theta = n/n$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample proportion $\theta = x/n$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = x/n$ and the maximum likelihood estimator is $\tilde{\theta} = X/n$.
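The result is easy to confirm numerically by evaluating the log likelihood over a grid of $\theta$ values. A minimal sketch (NumPy assumed; the data $n = 20$, $x = 7$ are arbitrary) follows.

```python
# The log likelihood l(theta) = x log(theta) + (n - x) log(1 - theta)
# is maximized at theta-hat = x/n.
import numpy as np

n, x = 20, 7
theta = np.linspace(0.01, 0.99, 9801)
loglik = x * np.log(theta) + (n - x) * np.log(1 - theta)

print(theta[np.argmax(loglik)])   # about 0.35 = x/n
```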

6.2.5 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Poisson$(\theta)$ distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} P(X_i = x_i; \theta) = \prod_{i=1}^{n}\frac{\theta^{x_i}e^{-\theta}}{x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)\theta^{\sum_{i=1}^{n} x_i}e^{-n\theta} \quad \text{for } \theta \ge 0$$
or more simply
$$L(\theta) = \theta^{n\bar{x}}e^{-n\theta} \quad \text{for } \theta \ge 0$$
Suppose $\bar{x} \ne 0$. The log likelihood function is
$$l(\theta) = n(\bar{x}\log\theta - \theta) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = n\left(\frac{\bar{x}}{\theta} - 1\right) = \frac{n}{\theta}(\bar{x} - \theta) \quad \text{for } \theta > 0$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = \bar{x}$ which is the sample mean. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$.
If $\bar{x} = 0$ then
$$L(\theta) = e^{-n\theta} \quad \text{for } \theta \ge 0$$
which is a decreasing function of $\theta$ on the interval $[0, \infty)$. $L(\theta)$ is maximized at the endpoint $\theta = 0 = \bar{x}$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample mean $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.

6.2.6 Likelihood Functions for Continuous Models

Suppose $X$ is a continuous random variable with probability density function $f(x; \theta)$. For a continuous random variable, $P(X = x; \theta)$ is unsuitable as a definition of the likelihood function since this probability always equals zero.
For continuous data we usually observe only the value of $X$ rounded to some degree of precision, for example, data on waiting times is rounded to the closest second or data on heights is rounded to the closest centimeter. The actual observation is really a discrete random variable. For example, suppose we observe $X$ correct to one decimal place. Then
$$P(\text{we observe } 1.1;\ \theta) = \int_{1.05}^{1.15} f(x; \theta)\,dx \approx (0.1)f(1.1; \theta)$$
assuming the function $f(x; \theta)$ is reasonably smooth over the interval. More generally, suppose $x_1, x_2, \ldots, x_n$ are the observations from a random sample from the distribution with probability density function $f(x; \theta)$ which have been rounded to the nearest $\Delta$ which is assumed to be small. Then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) \approx \prod_{i=1}^{n}\Delta f(x_i; \theta) = \Delta^n\prod_{i=1}^{n} f(x_i; \theta)$$
If we assume that the precision $\Delta$ does not depend on the unknown parameter $\theta$, then the term $\Delta^n$ can be ignored. This argument leads us to adopt the following definition of the likelihood function for a random sample from a continuous distribution.

6.2.7 Definition - Likelihood Function: Continuous Case

If $x = (x_1, x_2, \ldots, x_n)$ are the observed values of a random sample from a distribution with probability density function $f(x; \theta)$, then the likelihood function is defined as
$$L(\theta) = L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$

6.2.8 Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Exponential( ) distribution.
Find the likelihood function, the log likelihood function, the maximum likelihood estimate
of and the maximum likelihood estimator of .

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \prod_{i=1}^{n}\frac{1}{\theta}e^{-x_i/\theta} = \frac{1}{\theta^n}\exp\left(-\sum_{i=1}^{n} x_i/\theta\right) = \theta^{-n}e^{-n\bar{x}/\theta} \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = -n\left(\log\theta + \frac{\bar{x}}{\theta}\right) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = -n\left(\frac{1}{\theta} - \frac{\bar{x}}{\theta^2}\right) = \frac{n}{\theta^2}(\bar{x} - \theta)$$
Now $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \bar{x}$. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.
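As a check, the sketch below compares the analytic maximum likelihood estimate $\bar{x}$ with the maximizer of $l(\theta)$ found numerically using optimize. The data are simulated for illustration and the true value used to generate them is an assumption, not part of the example.
# Numerical check of the Exponential(theta) maximum likelihood estimate
set.seed(1)
x <- rexp(25, rate = 1/3)                       # hypothetical sample with true theta = 3
n <- length(x); xbar <- mean(x)
loglik <- function(th) -n*(log(th) + xbar/th)   # log likelihood
optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum
xbar                                            # agrees with the numerical maximizer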

6.2.9 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Uniform($0,\theta$) distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The probability density function of a Uniform($0,\theta$) random variable is
$$f(x;\theta) = \frac{1}{\theta} \quad \text{for } 0 \leq x \leq \theta$$
and zero otherwise. The support set of the random variable $X$ is $[0,\theta]$ which depends on the unknown parameter $\theta$. In such examples care must be taken in determining the maximum likelihood estimate of $\theta$.
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \prod_{i=1}^{n}\frac{1}{\theta} = \frac{1}{\theta^n} \quad \text{if } 0 \leq x_i \leq \theta,\ i = 1, 2, \ldots, n, \text{ and } \theta > 0$$
To determine the value of $\theta$ which maximizes $L(\theta)$ we note that $L(\theta)$ can be written as
$$L(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < x_{(n)} \\[1ex] \dfrac{1}{\theta^n} & \text{if } \theta \geq x_{(n)} \end{cases}$$
where $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$ is the maximum of the sample. To see this remember that in order to observe the sample $x_1, x_2, \ldots, x_n$ the value of $\theta$ must be at least as large as all the observed $x_i$'s.
$L(\theta)$ is a decreasing function of $\theta$ on the interval $[x_{(n)},\infty)$. Therefore $L(\theta)$ is maximized at $\theta = x_{(n)}$. The maximum likelihood estimate of $\theta$ is $\hat{\theta} = x_{(n)}$ and the maximum likelihood estimator is $\tilde{\theta} = X_{(n)}$.

Note: In this example there is no solution to $\frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}(-n\log\theta) = 0$, so the maximum likelihood estimate of $\theta$ is not found by solving $\frac{d}{d\theta}l(\theta) = 0$.

One of the reasons the method of maximum likelihood is so widely used is the invariance
property of the maximum likelihood estimate under one-to-one transformations.

6.2.10 Theorem - Invariance of the Maximum Likelihood Estimate

If $\hat{\theta}$ is the maximum likelihood estimate of $\theta$ then $g(\hat{\theta})$ is the maximum likelihood estimate of $g(\theta)$.

Note: The invariance property of the maximum likelihood estimate means that if we know the maximum likelihood estimate of $\theta$ then we know the maximum likelihood estimate of any function of $\theta$.

6.2.11 Example

In Example 6.2.8 find the maximum likelihood estimate of the median of the distribution and the maximum likelihood estimate of $Var(\tilde{\theta})$.

Solution
If $X$ has an Exponential($\theta$) distribution then the median $m$ is found by solving
$$0.5 = \int_{0}^{m}\frac{1}{\theta}e^{-x/\theta}\,dx$$
to obtain
$$m = -\theta\log(0.5)$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of $m$ is $\hat{m} = -\hat{\theta}\log(0.5) = -\bar{x}\log(0.5)$.
Since $X_i$ has an Exponential($\theta$) distribution with $Var(X_i) = \theta^2$, $i = 1, 2, \ldots, n$ independently, the variance of the maximum likelihood estimator $\tilde{\theta} = \bar{X}$ is
$$Var(\tilde{\theta}) = Var(\bar{X}) = \frac{\theta^2}{n}$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of $Var(\tilde{\theta})$ is $\hat{\theta}^2/n = \bar{x}^2/n$.
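A short sketch of the invariance property in action follows; the data are simulated for illustration only.
# Invariance of the MLE: plug thetahat = xbar into functions of theta
set.seed(2)
x <- rexp(40, rate = 1/2)        # hypothetical Exponential(theta) sample, true theta = 2
thetahat <- mean(x)              # MLE of theta
mhat <- -thetahat*log(0.5)       # MLE of the median m = -theta*log(0.5)
varhat <- thetahat^2/length(x)   # MLE of Var(thetatilde) = theta^2/n
c(thetahat, mhat, varhat)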

6.3 Score and Information Functions

The derivative of the log likelihood function plays an important role in the method of maximum likelihood. This function is often called the score function.

6.3.1 Definition - Score Function

The score function is defined as
$$S(\theta) = S(\theta; x) = \frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}\log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

Another function which plays an important role in the method of maximum likelihood is the information function.

6.3.2 Definition - Information Function

The information function is defined as
$$I(\theta) = I(\theta; x) = -\frac{d^2}{d\theta^2}l(\theta) = -\frac{d^2}{d\theta^2}\log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data. $I(\hat{\theta})$ is called the observed information.

In Section 6.7 we will see how the observed information $I(\hat{\theta})$ can be used to construct an approximate confidence interval for the unknown parameter $\theta$. $I(\hat{\theta})$ also tells us about the concavity of the log likelihood function $l(\theta)$.

6.3.3 Example
Find the observed information for Example 6.2.5. Suppose the maximum likelihood estimate of $\theta$ was $\hat{\theta} = 2$. Compare $I(\hat{\theta}) = I(2)$ for $n = 10$ and $n = 25$. Plot the function $r(\theta) = l(\theta) - l(\hat{\theta})$ for $n = 10$ and $n = 25$ on the same graph.

Solution
From Example 6.2.5, the score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{n}{\theta}(\bar{x} - \theta) \quad \text{for } \theta > 0$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = -n\frac{d}{d\theta}\left[\frac{\bar{x}}{\theta} - 1\right] = \frac{n\bar{x}}{\theta^2}$$
and the observed information is
$$I(\hat{\theta}) = \frac{n\bar{x}}{\hat{\theta}^2} = \frac{n\hat{\theta}}{\hat{\theta}^2} = \frac{n}{\hat{\theta}}$$
If $n = 10$ then $I(\hat{\theta}) = I(2) = n/\hat{\theta} = 10/2 = 5$. If $n = 25$ then $I(\hat{\theta}) = I(2) = 25/2 = 12.5$. See Figure 6.1. The function $r(\theta) = l(\theta) - l(\hat{\theta})$ is more concave and symmetric for $n = 25$ than for $n = 10$. As the number of observations increases we have more "information" about the unknown parameter $\theta$.
Although we view the likelihood, log likelihood, score and information functions as functions of $\theta$, they are also functions of the observed data $x$. When it is important to emphasize the dependence on the data $x$ we will write $L(\theta; x)$, $S(\theta; x)$, and $I(\theta; x)$. When we wish to determine the sampling distribution of the corresponding random variables we will write $L(\theta; X)$, $S(\theta; X)$, and $I(\theta; X)$.


Figure 6.1: Poisson Log Likelihoods for n = 10 and n = 25
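R code along the following lines could be used to reproduce a plot like Figure 6.1; the values $n = 10$, $n = 25$ and $\hat{\theta} = 2$ are those from the example.
# Plot r(theta) = l(theta) - l(thetahat) for the Poisson model with thetahat = 2
rfun <- function(th, n, thetahat) n*(thetahat*log(th/thetahat) - (th - thetahat))
th <- seq(1, 3.5, 0.01)
plot(th, rfun(th, 10, 2), type = "l", xlab = expression(theta),
     ylab = expression(r(theta)), ylim = c(-10, 0))
lines(th, rfun(th, 25, 2), lty = 2)   # more concentrated for n = 25
legend("bottomright", legend = c("n = 10", "n = 25"), lty = c(1, 2))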

Here is one more function which plays an important role in the method of maximum likelihood.

6.3.4 Definition - Expected Information Function

If $\theta$ is a scalar then the expected information function is given by
$$J(\theta) = E[I(\theta; X)] = E\left[-\frac{d^2}{d\theta^2}l(\theta; X)\right] \quad \text{for } \theta \in \Omega$$
where $X$ is the potential data.

Note:
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x;\theta)$ then
$$J(\theta) = E\left[-\frac{d^2}{d\theta^2}l(\theta; X)\right] = nE\left[-\frac{d^2}{d\theta^2}\log f(X;\theta)\right] \quad \text{for } \theta \in \Omega$$
where $X$ has probability density function $f(x;\theta)$.



6.3.5 Example
For each of the following find the observed information $I(\hat{\theta})$ and the expected information $J(\theta)$. Compare $I(\hat{\theta})$ and $J(\hat{\theta})$. Determine the mean and variance of the maximum likelihood estimator $\tilde{\theta}$. Compare the expected information with the variance of the maximum likelihood estimator.
(a) Example 6.2.4 (Binomial)
(b) Example 6.2.5 (Poisson)
(c) Example 6.2.8 (Exponential)

Solution
(a) From Example 6.2.4, the score function based on $x$ successes in $n$ Bernoulli trials is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{x}{\theta} - \frac{n-x}{1-\theta} \quad \text{for } 0 < \theta < 1$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = \frac{x}{\theta^2} + \frac{n-x}{(1-\theta)^2} \quad \text{for } 0 < \theta < 1$$
Since the maximum likelihood estimate is $\hat{\theta} = \frac{x}{n}$ the observed information is
$$I(\hat{\theta}) = \frac{x}{\hat{\theta}^2} + \frac{n-x}{(1-\hat{\theta})^2} = n\left[\frac{1}{\hat{\theta}} + \frac{1}{1-\hat{\theta}}\right] = \frac{n}{\hat{\theta}(1-\hat{\theta})}$$
If $X$ has a Binomial($n,\theta$) distribution then $E(X) = n\theta$ and $Var(X) = n\theta(1-\theta)$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X)] = E\left[\frac{X}{\theta^2} + \frac{n-X}{(1-\theta)^2}\right] = \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} = \frac{n[(1-\theta)+\theta]}{\theta(1-\theta)} = \frac{n}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.



The maximum likelihood estimator is $\tilde{\theta} = X/n$ with
$$E(\tilde{\theta}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{n\theta}{n} = \theta$$
and
$$Var(\tilde{\theta}) = Var\left(\frac{X}{n}\right) = \frac{1}{n^2}Var(X) = \frac{n\theta(1-\theta)}{n^2} = \frac{\theta(1-\theta)}{n} = \frac{1}{J(\theta)}$$
(b) From Example 6.3.3 the information function based on Poisson data $x_1, x_2, \ldots, x_n$ is
$$I(\theta) = \frac{n\bar{x}}{\theta^2} \quad \text{for } \theta > 0$$
Since the maximum likelihood estimate is $\hat{\theta} = \bar{x}$ the observed information is
$$I(\hat{\theta}) = \frac{n\bar{x}}{\hat{\theta}^2} = \frac{n}{\hat{\theta}}$$
Since $X_i$ has a Poisson($\theta$) distribution with $E(X_i) = Var(X_i) = \theta$ then $E(\bar{X}) = \theta$ and $Var(\bar{X}) = \theta/n$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = E\left[\frac{n\bar{X}}{\theta^2}\right] = \frac{n}{\theta} \quad \text{for } \theta > 0$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.
The maximum likelihood estimator is $\tilde{\theta} = \bar{X}$ with
$$E(\tilde{\theta}) = E(\bar{X}) = \theta$$
and
$$Var(\tilde{\theta}) = Var(\bar{X}) = \frac{\theta}{n} = \frac{1}{J(\theta)}$$
(c) From Example 6.2.8 the score function based on Exponential data $x_1, x_2, \ldots, x_n$ is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = n\left(\frac{\bar{x}}{\theta^2} - \frac{1}{\theta}\right) \quad \text{for } \theta > 0$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = n\left(\frac{2\bar{x}}{\theta^3} - \frac{1}{\theta^2}\right) \quad \text{for } \theta > 0$$

Since the maximum likelihood estimate is $\hat{\theta} = \bar{x}$ the observed information is
$$I(\hat{\theta}) = n\left(\frac{2\hat{\theta}}{\hat{\theta}^3} - \frac{1}{\hat{\theta}^2}\right) = \frac{n}{\hat{\theta}^2}$$
Since $X_i$ has an Exponential($\theta$) distribution with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$ then $E(\bar{X}) = \theta$ and $Var(\bar{X}) = \theta^2/n$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = nE\left[\frac{2\bar{X}}{\theta^3} - \frac{1}{\theta^2}\right] = n\left(\frac{2\theta}{\theta^3} - \frac{1}{\theta^2}\right) = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.
The maximum likelihood estimator is $\tilde{\theta} = \bar{X}$ with
$$E(\tilde{\theta}) = E(\bar{X}) = \theta$$
and
$$Var(\tilde{\theta}) = Var(\bar{X}) = \frac{\theta^2}{n} = \frac{1}{J(\theta)}$$
In all three examples we have $I(\hat{\theta}) = J(\hat{\theta})$, $E(\tilde{\theta}) = \theta$, and $Var(\tilde{\theta}) = [J(\theta)]^{-1}$.
In the three previous examples we observed that $E(\tilde{\theta}) = \theta$ and therefore $\tilde{\theta}$ was an unbiased estimator of $\theta$. This is not always true for maximum likelihood estimators, as we see in the next example. However, maximum likelihood estimators usually have other good properties. Suppose $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$ is the maximum likelihood estimator based on a sample of size $n$. If $\lim_{n\to\infty}E(\tilde{\theta}_n) = \theta$ then $\tilde{\theta}_n$ is an asymptotically unbiased estimator of $\theta$. If $\tilde{\theta}_n \to_p \theta$ then $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.
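Before turning to the next example, here is a small simulation sketch of the pattern $E(\tilde{\theta}) \approx \theta$ and $Var(\tilde{\theta}) \approx 1/J(\theta) = \theta^2/n$ for the Exponential model. The true value and sample size are made-up illustration values.
# Simulation check: mean and variance of the Exponential MLE thetatilde = Xbar
set.seed(3)
theta <- 2; n <- 30                          # hypothetical true value and sample size
thetatilde <- replicate(10000, mean(rexp(n, rate = 1/theta)))
mean(thetatilde)                             # close to theta
var(thetatilde)                              # close to theta^2/n = 1/J(theta)
theta^2/n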

6.3.6 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the distribution with probability density function
$$f(x;\theta) = \theta x^{\theta-1} \quad \text{for } 0 \leq x \leq 1,\ \theta > 0 \tag{6.1}$$
(a) Find the score function, the maximum likelihood estimator, the information function, the observed information, and the expected information.
(b) Show that $T = -\sum_{i=1}^{n}\log X_i \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$.
(c) Use (b) and 2.7.9 to show that $\tilde{\theta}$ is not an unbiased estimator of $\theta$. Show however that $\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.
(d) Show that $\tilde{\theta}$ is a consistent estimator of $\theta$.

(e) Use (b) and 2.7.9 to find $Var(\tilde{\theta})$. Compare $Var(\tilde{\theta})$ with the expected information.

Solution
(a) The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \prod_{i=1}^{n}\theta x_i^{\theta-1} = \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta-1} \quad \text{for } \theta > 0$$
or more simply
$$L(\theta) = \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta} \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = n\log\theta + \theta\sum_{i=1}^{n}\log x_i = n\log\theta - \theta t \quad \text{for } \theta > 0$$
where $t = -\sum_{i=1}^{n}\log x_i$. The score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{n}{\theta} - t = \frac{1}{\theta}(n - \theta t) \quad \text{for } \theta > 0$$
Now $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = n/t$. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < n/t$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > n/t$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = n/t$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = n/t$ and the maximum likelihood estimator is $\tilde{\theta} = n/T$ where $T = -\sum_{i=1}^{n}\log X_i$.
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$
and the observed information is
$$I(\hat{\theta}) = \frac{n}{\hat{\theta}^2}$$
The expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = E\left[\frac{n}{\theta^2}\right] = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$

(b) From Exercise 2.6.12 we have that if $X_i$ has the probability density function (6.1) then
$$Y_i = -\log X_i \sim \text{Exponential}\left(\frac{1}{\theta}\right) \quad \text{for } i = 1, 2, \ldots, n \tag{6.2}$$
From 4.3.2(4) we have that
$$T = -\sum_{i=1}^{n}\log X_i = \sum_{i=1}^{n} Y_i \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$$
(c) Since $T \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$ then from 2.7.9 we have
$$E(T^p) = \frac{\Gamma(n+p)}{\theta^p\,\Gamma(n)} \quad \text{for } p > -n$$
Assuming $n > 1$ we have
$$E(T^{-1}) = \theta\,\frac{\Gamma(n-1)}{\Gamma(n)} = \frac{\theta}{n-1}$$
so
$$E(\tilde{\theta}) = E\left(\frac{n}{T}\right) = nE(T^{-1}) = \frac{n\theta}{n-1} = \frac{\theta}{1 - \frac{1}{n}} \neq \theta$$
and therefore $\tilde{\theta}$ is not an unbiased estimator of $\theta$.
Now $\tilde{\theta}$ is an estimator based on a sample of size $n$. Since
$$\lim_{n\to\infty}E(\tilde{\theta}) = \lim_{n\to\infty}\frac{\theta}{1 - \frac{1}{n}} = \theta$$
therefore $\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.
(d) By (6.2), $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed random variables with $E(Y_i) = \frac{1}{\theta}$ and $Var(Y_i) = \frac{1}{\theta^2} < \infty$. Therefore by the Weak Law of Large Numbers
$$\frac{T}{n} = \frac{1}{n}\sum_{i=1}^{n} Y_i \to_p \frac{1}{\theta}$$
and by the Limit Theorems
$$\tilde{\theta} = \frac{n}{T} \to_p \theta$$
Thus $\tilde{\theta}$ is a consistent estimator of $\theta$.

(e) To determine $Var(\tilde{\theta})$ we note that
$$Var(\tilde{\theta}) = Var\left(\frac{n}{T}\right) = n^2\,Var\left(\frac{1}{T}\right) = n^2\left\{E\left[\left(\frac{1}{T}\right)^2\right] - \left[E\left(\frac{1}{T}\right)\right]^2\right\} = n^2\left\{E(T^{-2}) - [E(T^{-1})]^2\right\}$$
Since
$$E(T^{-2}) = \theta^2\,\frac{\Gamma(n-2)}{\Gamma(n)} = \frac{\theta^2}{(n-1)(n-2)}$$
then
$$Var(\tilde{\theta}) = n^2\left[\frac{\theta^2}{(n-1)(n-2)} - \frac{\theta^2}{(n-1)^2}\right] = \frac{n^2\theta^2}{(n-1)^2(n-2)} = \frac{\theta^2}{\left(1 - \frac{1}{n}\right)^2(n-2)}$$
We note that $Var(\tilde{\theta}) \neq \frac{\theta^2}{n} = \frac{1}{J(\theta)}$, however for large $n$, $Var(\tilde{\theta}) \approx \frac{1}{J(\theta)}$.
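The bias formula $E(\tilde{\theta}) = \theta/(1 - \frac{1}{n})$ from (c) can be checked by simulation. The sketch below uses a made-up true value and relies on the fact that the model (6.1) is the Beta($\theta$, 1) distribution, so samples can be generated with rbeta.
# Simulation check of E(thetatilde) = theta/(1 - 1/n) for f(x;theta) = theta*x^(theta-1)
set.seed(4)
theta <- 3; n <- 10                              # hypothetical values
thetatilde <- replicate(20000, {x <- rbeta(n, theta, 1); n/(-sum(log(x)))})
mean(thetatilde)                                 # close to theta/(1 - 1/n)
theta/(1 - 1/n)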

6.3.7 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Weibull($\theta$, 1) distribution with probability density function
$$f(x;\theta) = \theta x^{\theta-1} e^{-x^{\theta}} \quad \text{for } x > 0,\ \theta > 0$$
Find the score function and the information function. How would you find the maximum likelihood estimate of $\theta$?

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \prod_{i=1}^{n}\theta x_i^{\theta-1} e^{-x_i^{\theta}} = \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta-1}\exp\left(-\sum_{i=1}^{n} x_i^{\theta}\right) \quad \text{for } \theta > 0$$

or more simply
$$L(\theta) = \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta}\exp\left(-\sum_{i=1}^{n} x_i^{\theta}\right) \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = n\log\theta + \theta\sum_{i=1}^{n}\log x_i - \sum_{i=1}^{n} x_i^{\theta} \quad \text{for } \theta > 0$$
The score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{n}{\theta} + t - \sum_{i=1}^{n} x_i^{\theta}\log x_i \quad \text{for } \theta > 0$$
where $t = \sum_{i=1}^{n}\log x_i$.
Notice that $S(\theta) = 0$ cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data $x_1, x_2, \ldots, x_n$. Since
$$\frac{d}{d\theta}S(\theta) = -\frac{n}{\theta^2} - \sum_{i=1}^{n} x_i^{\theta}(\log x_i)^2 \quad \text{for } \theta > 0$$
is negative for all values of $\theta > 0$, we know that the function $S(\theta)$ is always decreasing and therefore there is only one solution to $S(\theta) = 0$. The solution to $S(\theta) = 0$ gives the maximum likelihood estimate.
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = \frac{n}{\theta^2} + \sum_{i=1}^{n} x_i^{\theta}(\log x_i)^2 \quad \text{for } \theta > 0$$

To illustrate how to find the maximum likelihood estimate for a given sample of data, we randomly generate 20 observations from the Weibull($\theta$, 1) distribution. To do this we use the result of Example 2.6.7 in which we showed that if $u$ is an observation from the Uniform(0, 1) distribution then $x = [-\log(1-u)]^{1/\theta}$ is an observation from the Weibull($\theta$, 1) distribution.
The following R code generates the data, plots the likelihood function, finds $\hat{\theta}$ by solving $S(\theta) = 0$ using the R function uniroot, and determines $S(\hat{\theta})$ and the observed information $I(\hat{\theta})$.

# randomly generate 20 observations from a Weibull(theta,1)


# using a random theta value between 0.5 and 1.5
set.seed(20086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=0.5,max=1.5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((-log(1-runif(20)))^(1/truetheta),2))
x
#
# function for calculating Weibull likelihood for data x and theta=th
WBLF<-function(th,x)
{n<-length(x)
L<-th^n*prod(x)^th*exp(-sum(x^th))
return(L)}
#
# function for calculating Weibull score for data x and theta=th
WBSF<-function(th,x)
{n<-length(x)
t<-sum(log(x))
S<-(n/th)+t-sum(log(x)*x^th)
return(S)}
#
# function for calculating Weibull information for data x and theta=th
WBIF<-function(th,x)
{n<-length(x)
I<-(n/th^2)+sum(log(x)^2*x^th)
return(I)}
#
# plot the Weibull likelihood function
th<-seq(0.25,0.75,0.01)
L<-sapply(th,WBLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
#
# find thetahat using uniroot function
thetahat<-uniroot(function(th) WBSF(th,x),lower=0.4,upper=0.6)$root
cat("thetahat = ",thetahat) # display value of thetahat
# display value of Score function at thetahat
cat("S(thetahat) = ",WBSF(thetahat,x))
# calculate observed information
cat("Observed Information = ",WBIF(thetahat,x))

The generated data are

0.01 0.01 0.05 0.07 0.10 0.11 0.23 0.28 0.44 0.46
0.64 1.07 1.16 1.25 2.40 3.03 3.65 5.90 6.60 30.07

The likelihood function is graphed in Figure 6.2. The solution to $S(\theta) = 0$ determined by uniroot was $\hat{\theta} = 0.4951607$. $S(\hat{\theta}) = 2.468574 \times 10^{-5}$, which is close to zero, and the observed information was $I(\hat{\theta}) = 181.8069$ which is positive and indicates a local maximum. We have already shown in this example that the solution to $S(\theta) = 0$ is the unique maximum likelihood estimate.

Figure 6.2: Likelihood function for Example 6.3.7

Note that the interval [0.25, 0.75] used to graph the likelihood function was determined by trial and error. The values lower=0.4 and upper=0.6 used for uniroot were determined from the graph of the likelihood function. From the graph it is easy to see that the value of $\hat{\theta}$ lies in the interval [0.4, 0.6].

Newton's Method, which is a numerical method for finding the roots of an equation usually discussed in first year calculus, can be used for finding the maximum likelihood estimate. Newton's Method usually works quite well for finding maximum likelihood estimates because the likelihood function is often very quadratic in shape.

6.3.8 Newton's Method

Let $\theta^{(0)}$ be an initial estimate of $\theta$. The estimate $\theta^{(i)}$ can be updated using
$$\theta^{(i+1)} = \theta^{(i)} + \frac{S(\theta^{(i)})}{I(\theta^{(i)})} \quad \text{for } i = 0, 1, \ldots$$

Notes:
(1) The initial estimate, $\theta^{(0)}$, may be determined by graphing $L(\theta)$ or $l(\theta)$.
(2) The algorithm is usually run until the value of $\theta^{(i)}$ no longer changes to a reasonable number of decimal places. When the algorithm is stopped it is always important to check that the value of $\theta$ obtained does indeed maximize $L(\theta)$.
(3) This algorithm is also called the Newton-Raphson Method.
(4) $I(\theta)$ can be replaced by $J(\theta)$ for a similar algorithm which is called the Method of Scoring.
(5) If the support set of $X$ depends on $\theta$ (e.g. Uniform($0,\theta$)) then $\hat{\theta}$ is not found by solving $S(\theta) = 0$.

6.3.9 Example

Use Newton's Method to find the maximum likelihood estimate in Example 6.3.7.

Solution
Here is R code for Newton’s Method for the Weibull Example
# Newton’s Method for Weibull Example
NewtonWB<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+WBSF(thold,x)/WBIF(thold,x)
print(thnew)}
return(thnew)}
#
thetahat<-NewtonWB(0.2,x)
Newton's Method converges after four iterations and the value of thetahat returned is $\hat{\theta} = 0.4951605$, which is the same value to six decimal places as was obtained above using the uniroot function.

6.4 Likelihood Intervals

In your previous statistics course, likelihood intervals were introduced as one approach to constructing an interval estimate for the unknown parameter $\theta$.

6.4.1 Definition - Relative Likelihood Function

The relative likelihood function $R(\theta)$ is defined by
$$R(\theta) = R(\theta; x) = \frac{L(\theta)}{L(\hat{\theta})} \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

The relative likelihood function takes on values between 0 and 1 and can be used to rank parameter values according to their plausibilities in light of the observed data. If $R(\theta_1) = 0.1$, for example, then $\theta_1$ is rather an implausible parameter value because the data are ten times more probable when $\theta = \hat{\theta}$ than they are when $\theta = \theta_1$. However, if $R(\theta_1) = 0.5$, then $\theta_1$ is a fairly plausible value because it gives the data 50% of the maximum possible probability under the model.

6.4.2 Definition - Likelihood Interval

The set of values $\theta$ for which $R(\theta) \geq p$ is called a 100p% likelihood interval for $\theta$.

Values of $\theta$ inside a 10% likelihood interval are referred to as plausible values in light of the observed data. Values of $\theta$ outside a 10% likelihood interval are referred to as implausible values given the observed data. Values of $\theta$ inside a 50% likelihood interval are very plausible and values of $\theta$ outside a 1% likelihood interval are very implausible in light of the data.

6.4.3 Definition - Log Relative Likelihood Function

The log relative likelihood function is the natural logarithm of the relative likelihood function:
$$r(\theta) = r(\theta; x) = \log[R(\theta)] \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

Likelihood regions or intervals may be determined from a graph of $R(\theta)$ or $r(\theta)$. Alternatively, they can be found by solving $R(\theta) - p = 0$ or $r(\theta) - \log p = 0$. In most cases this must be done numerically.

6.4.4 Example
Plot the relative likelihood function for $\theta$ in Example 6.3.7. Find 10% and 50% likelihood intervals for $\theta$.

Solution
Here is R code to plot the relative likelihood function for the Weibull Example, with lines for determining 10% and 50% likelihood intervals for $\theta$, as well as code to determine these intervals using uniroot.
The R function WBRLF uses the R function WBLF from Example 6.3.7.
The R function WBRLF uses the R function WBLF from Example 6.3.7.
# function for calculating Weibull relative likelihood function
WBRLF<-function(th,thetahat,x)
{R<-WBLF(th,x)/WBLF(thetahat,x)
return(R)}
#
# plot the Weibull relative likelihood function
th<-seq(0.25,0.75,0.01)
R<-sapply(th,WBRLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
# add lines to determine 10% and 50% likelihood intervals
abline(a=0.10,b=0,col="red",lwd=2)
abline(a=0.50,b=0,col="blue",lwd=2)
#
# use uniroot to determine endpoints of 10%, 15%, and 50% likelihood intervals
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.6,upper=0.7)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.35,upper=0.45)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.55,upper=0.65)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.6,upper=0.7)$root

The graph of the relative likelihood function is given in Figure 6.3.
The upper and lower values used for uniroot were determined using this graph.
The 10% likelihood interval for $\theta$ is [0.34, 0.65].
The 50% likelihood interval for $\theta$ is [0.41, 0.58].
For later reference, the 15% likelihood interval for $\theta$ is [0.3550, 0.6401].


Figure 6.3: Relative Likelihood function for Example 6.3.7

6.4.5 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Uniform($0,\theta$) distribution. Plot the relative likelihood function for $\theta$ if $n = 10$ and $x_{(10)} = 0.5$. Find 10% and 50% likelihood intervals for $\theta$.

Solution
From Example 6.2.9 the likelihood function for $n = 10$ and $\hat{\theta} = x_{(10)} = 0.5$ is
$$L(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < 0.5 \\[1ex] \dfrac{1}{\theta^{10}} & \text{if } \theta \geq 0.5 \end{cases}$$
The relative likelihood function
$$R(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < 0.5 \\[1ex] \left(\dfrac{0.5}{\theta}\right)^{10} & \text{if } \theta \geq 0.5 \end{cases}$$
is graphed in Figure 6.4 along with lines for determining 10% and 50% likelihood intervals. To determine the value of $\theta$ at which the horizontal line $R = p$ intersects the graph of $R(\theta)$ we solve $(0.5/\theta)^{10} = p$ to obtain $\theta = 0.5\,p^{-1/10}$. Since $R(\theta) = 0$ if $0 < \theta < 0.5$, a 100p% likelihood interval for $\theta$ is of the form $[0.5,\ 0.5\,p^{-1/10}]$.

A 10% likelihood interval is
$$\left[0.5,\ 0.5\,(0.1)^{-1/10}\right] = [0.5,\ 0.629]$$
A 50% likelihood interval is
$$\left[0.5,\ 0.5\,(0.5)^{-1/10}\right] = [0.5,\ 0.536]$$


Figure 6.4: Relative likelihood function for Example 6.4.5

More generally, for an observed random sample $x_1, x_2, \ldots, x_n$ from the Uniform($0,\theta$) distribution, a 100p% likelihood interval for $\theta$ will be of the form $[x_{(n)},\ x_{(n)}\,p^{-1/n}]$.
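Since the interval has a closed form, it is simple to compute directly; the sketch below uses the values from the example above ($n = 10$, $x_{(n)} = 0.5$).
# 100p% likelihood intervals for Uniform(0, theta) based on the sample maximum
LI.unif <- function(xn, n, p) c(xn, xn*p^(-1/n))
LI.unif(xn = 0.5, n = 10, p = 0.10)   # 10% likelihood interval
LI.unif(xn = 0.5, n = 10, p = 0.50)   # 50% likelihood interval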

6.4.6 Exercise
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Two Parameter Exponential($\theta$, 1) distribution. Plot the relative likelihood function for $\theta$ if $n = 12$ and $x_{(1)} = 2$. Find 10% and 50% likelihood intervals for $\theta$.

6.5 Limiting Distribution of Maximum Likelihood Estimator

If there is no explicit expression for the maximum likelihood estimator $\tilde{\theta}$, as in Example 6.3.7, then its sampling distribution can only be obtained by simulation. This makes it difficult to determine the properties of the maximum likelihood estimator. In particular, to determine how good an estimator is, we look at its mean and variance. The mean indicates whether the estimator is close, on average, to the true value of $\theta$, while the variance indicates the uncertainty in the estimator. The larger the variance, the more uncertainty in our estimation. To determine how good an estimator is we also look at how the mean and variance behave as the sample size $n \to \infty$.
We saw in Example 6.3.5 that $E(\tilde{\theta}) = \theta$ and $Var(\tilde{\theta}) = [J(\theta)]^{-1} \to 0$ as $n \to \infty$ for the Binomial, Poisson, and Exponential models. For each of these models the maximum likelihood estimator is an unbiased estimator of $\theta$. In Example 6.3.6 we saw that $E(\tilde{\theta}) \to \theta$ and $Var(\tilde{\theta}) \to 0$ as $n \to \infty$, so the maximum likelihood estimator $\tilde{\theta}$ is an asymptotically unbiased estimator. In Example 6.3.7 we are not able to determine $E(\tilde{\theta})$ and $Var(\tilde{\theta})$.
The following theorem gives the limiting distribution of the maximum likelihood estimator in general under certain restrictions.

6.5.1 Theorem - Limiting Distribution of Maximum Likelihood Estimator
Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random sample from the probability (density) function $f(x;\theta)$ for $\theta \in \Omega$. Let $\tilde{\theta}_n = \tilde{\theta}_n(X_n)$ be the maximum likelihood estimator of $\theta$ based on $X_n$.
Then under certain (regularity) conditions
$$\tilde{\theta}_n \to_p \theta \tag{6.3}$$
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim \text{N}(0,1) \tag{6.4}$$
$$-2\log R(\theta; X_n) = -2\log\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)} \to_D W \sim \chi^2(1) \tag{6.5}$$
for each $\theta \in \Omega$.
The proof of this result, which depends on applying Taylor's Theorem to the score function, is beyond the scope of this course. The regularity conditions are a bit complicated but essentially they are a set of conditions which ensure that the error term in Taylor's Theorem goes to zero as $n \to \infty$. One of the conditions is that the support set of $f(x;\theta)$ does not depend on $\theta$. Therefore, for example, this theorem cannot be applied to the maximum likelihood estimator in the case of a random sample from the Uniform($0,\theta$) distribution.

This is actually not a problem since the distribution of the maximum likelihood estimator in this case can be determined exactly.
Since (6.3) holds, $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.
Theorem 6.5.1 implies that for sufficiently large $n$, $\tilde{\theta}_n$ has an approximately N($\theta$, $1/J(\theta)$) distribution. Therefore for large $n$
$$E(\tilde{\theta}_n) \approx \theta$$
and the maximum likelihood estimator is an asymptotically unbiased estimator of $\theta$.
Since $\tilde{\theta}_n$ has an approximately N($\theta$, $1/J(\theta)$) distribution this also means that for sufficiently large $n$
$$Var(\tilde{\theta}_n) \approx \frac{1}{J(\theta)}$$
$1/J(\theta)$ is called the asymptotic variance of $\tilde{\theta}_n$. Of course $J(\theta)$ is unknown because $\theta$ is unknown. By (6.3), (6.4), and Slutsky's Theorem
$$(\tilde{\theta}_n - \theta)\left[J(\tilde{\theta}_n)\right]^{1/2} \to_D Z \sim \text{N}(0,1) \tag{6.6}$$
which implies that the asymptotic variance of $\tilde{\theta}_n$ can be estimated using $1/J(\hat{\theta}_n)$. Therefore for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n) \approx \frac{1}{J(\hat{\theta}_n)}$$
By the Weak Law of Large Numbers
$$\frac{1}{n}I(\theta; X_n) = \frac{1}{n}\sum_{i=1}^{n}\left[-\frac{d^2}{d\theta^2}\log f(X_i;\theta)\right] \to_p E\left[-\frac{d^2}{d\theta^2}\log f(X;\theta)\right] \tag{6.7}$$
Therefore by (6.3), (6.4), (6.7) and the Limit Theorems it follows that
$$(\tilde{\theta}_n - \theta)\left[I(\tilde{\theta}_n; X_n)\right]^{1/2} \to_D Z \sim \text{N}(0,1) \tag{6.8}$$
which implies that the asymptotic variance of $\tilde{\theta}_n$ can also be estimated using $1/I(\hat{\theta}_n)$ where $I(\hat{\theta}_n)$ is the observed information. Therefore for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n) \approx \frac{1}{I(\hat{\theta}_n)}$$
Results (6.6), (6.8) and (6.5) can be used to construct approximate confidence intervals for $\theta$. In Chapter 8 we will see how result (6.5) can be used in a test of hypothesis.
Although we will not prove Theorem 6.5.1 in general, we can prove the results in a particular case. The following example illustrates how techniques and theorems from previous chapters can be used together to obtain the results of interest. It is also a good review of several ideas covered thus far in these Course Notes.

6.5.2 Example
(a) Suppose $X \sim$ Weibull($2,\theta$). Show that $E(X^2) = \theta^2$ and $Var(X^2) = \theta^4$.
(b) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Weibull($2,\theta$) distribution. Find the maximum likelihood estimator $\tilde{\theta}$ of $\theta$, the information function $I(\theta)$, the observed information $I(\hat{\theta})$, and the expected information $J(\theta)$.
(c) Show that
$$\tilde{\theta}_n \to_p \theta$$
(d) Show that
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim \text{N}(0,1)$$

Solution
(a) From Exercise 2.7.9 we have
$$E(X^k) = \theta^k\,\Gamma\!\left(\frac{k}{2} + 1\right) \quad \text{for } k = 1, 2, \ldots \tag{6.9}$$
if $X \sim$ Weibull($2,\theta$). Therefore
$$E(X^2) = \theta^2\,\Gamma\!\left(\frac{2}{2} + 1\right) = \theta^2$$
$$E(X^4) = \theta^4\,\Gamma\!\left(\frac{4}{2} + 1\right) = 2\theta^4$$
and
$$Var(X^2) = E\left[(X^2)^2\right] - \left[E(X^2)\right]^2 = 2\theta^4 - \theta^4 = \theta^4$$
(b) The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \prod_{i=1}^{n}\frac{2x_i e^{-(x_i/\theta)^2}}{\theta^2} = \theta^{-2n}\left(\prod_{i=1}^{n} 2x_i\right)\exp\left(-\frac{1}{\theta^2}\sum_{i=1}^{n} x_i^2\right) \quad \text{for } \theta > 0$$
or more simply
$$L(\theta) = \theta^{-2n}\exp\left(-\frac{t}{\theta^2}\right) \quad \text{for } \theta > 0$$
where $t = \sum_{i=1}^{n} x_i^2$. The log likelihood function is
$$l(\theta) = -2n\log\theta - \frac{t}{\theta^2} \quad \text{for } \theta > 0$$
The score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = -\frac{2n}{\theta} + \frac{2t}{\theta^3} = \frac{2}{\theta^3}\left(t - n\theta^2\right) \quad \text{for } \theta > 0$$

Now $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \left(\frac{t}{n}\right)^{1/2}$. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \left(\frac{t}{n}\right)^{1/2}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \left(\frac{t}{n}\right)^{1/2}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \left(\frac{t}{n}\right)^{1/2}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \left(\frac{t}{n}\right)^{1/2}$. The maximum likelihood estimator is
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} \quad \text{where } T = \sum_{i=1}^{n} X_i^2$$
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = \frac{6t}{\theta^4} - \frac{2n}{\theta^2} \quad \text{for } \theta > 0$$
and the observed information is
$$I(\hat{\theta}) = \frac{6t}{\hat{\theta}^4} - \frac{2n}{\hat{\theta}^2} = \frac{6(n\hat{\theta}^2)}{\hat{\theta}^4} - \frac{2n}{\hat{\theta}^2} = \frac{4n}{\hat{\theta}^2}$$
To find the expected information we note that, from (a), $E(X_i^2) = \theta^2$ and thus
$$E(T) = \sum_{i=1}^{n} E(X_i^2) = n\theta^2$$
Therefore the expected information is
$$J(\theta) = E\left[\frac{6T}{\theta^4} - \frac{2n}{\theta^2}\right] = \frac{6E(T)}{\theta^4} - \frac{2n}{\theta^2} = \frac{6n\theta^2}{\theta^4} - \frac{2n}{\theta^2} = \frac{4n}{\theta^2} \quad \text{for } \theta > 0$$

(c) To show that $\tilde{\theta}_n \to_p \theta$ we need to show that
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} = \left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)^{1/2} \to_p \theta$$
Since $X_1^2, X_2^2, \ldots, X_n^2$ are independent and identically distributed random variables with $E(X_i^2) = \theta^2$ and $Var(X_i^2) = \theta^4$ for $i = 1, 2, \ldots, n$, then by the Weak Law of Large Numbers
$$\frac{T}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i^2 \to_p \theta^2$$
and by 5.5.1(1)
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} \to_p \theta \tag{6.10}$$
as required.

(d) To show that
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim \text{N}(0,1)$$
we need to show that
$$[J(\theta)]^{1/2}(\tilde{\theta}_n - \theta) = \frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2} - \theta\right] \to_D Z \sim \text{N}(0,1)$$
Since $X_1^2, X_2^2, \ldots, X_n^2$ are independent and identically distributed random variables with $E(X_i^2) = \theta^2$ and $Var(X_i^2) = \theta^4$ for $i = 1, 2, \ldots, n$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\frac{T}{n} - \theta^2\right)}{\theta^2} \to_D Z \sim \text{N}(0,1) \tag{6.11}$$
Let $g(x) = \sqrt{x}$ and $a = \theta^2$. Then $\frac{d}{dx}g(x) = \frac{1}{2\sqrt{x}}$ and $g'(a) = \frac{1}{2\theta}$. By (6.11) and the Delta Method we have
$$\frac{\sqrt{n}\left[\left(\frac{T}{n}\right)^{1/2} - \theta\right]}{\theta^2} \to_D \frac{1}{2\theta}Z \sim \text{N}\left(0, \frac{1}{4\theta^2}\right) \tag{6.12}$$
or
$$\frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2} - \theta\right] \to_D Z \sim \text{N}(0,1) \tag{6.13}$$
as required.
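The limiting Normal distribution in (d) can be illustrated by simulation. The sketch below uses a made-up true value of $\theta$ and sample size, generates Weibull(2, $\theta$) samples with rweibull, and standardizes $\tilde{\theta}_n$ with $[J(\theta)]^{1/2} = 2\sqrt{n}/\theta$.
# Simulation check of the limiting distribution in Example 6.5.2(d)
set.seed(5)
theta <- 2; n <- 50                                   # hypothetical values
z <- replicate(10000, {
  x <- rweibull(n, shape = 2, scale = theta)          # Weibull(2, theta) sample
  thetatilde <- sqrt(mean(x^2))                       # maximum likelihood estimate
  (thetatilde - theta)*sqrt(4*n/theta^2)              # standardized with J(theta) = 4n/theta^2
})
c(mean(z), var(z))     # approximately 0 and 1
qqnorm(z); qqline(z)   # approximately standard Normal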

6.5.3 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Uniform($0,\theta$) distribution. Since the support set of $X_i$ depends on $\theta$, Theorem 6.5.1 does not hold. Show however that the maximum likelihood estimator $\tilde{\theta}_n = X_{(n)}$ is a consistent estimator of $\theta$.

Solution
In Example 5.1.5 we showed that $\tilde{\theta}_n = \max(X_1, X_2, \ldots, X_n) \to_p \theta$ and therefore $\tilde{\theta}_n$ is a consistent estimator of $\theta$.

6.6 Confidence Intervals

In your previous statistics course a confidence interval was used to summarize the available information about an unknown parameter. Confidence intervals allow us to quantify the uncertainty in the unknown parameter.

6.6.1 Definition - Confidence Interval

Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$, and $A(X)$ and $B(X)$ are statistics. If
$$P[A(X) \leq \theta \leq B(X)] = p \quad \text{for } 0 < p < 1$$
then $[a(x), b(x)]$ is called a 100p% confidence interval for $\theta$ where $x$ are the observed data.

Confidence intervals can be constructed in a straightforward manner if a pivotal quantity exists.

6.6.2 Definition - Pivotal Quantity

Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$. The random variable $Q(X;\theta)$ is called a pivotal quantity if the distribution of $Q$ does not depend on $\theta$.

Pivotal quantities can be used for constructing confidence intervals in the following way. Since the distribution of $Q(X;\theta)$ is known we can write down a probability statement of the form
$$P(q_1 \leq Q(X;\theta) \leq q_2) = p$$
where $q_1$ and $q_2$ do not depend on $\theta$. If $Q$ is a monotone function of $\theta$ then this statement can be rewritten as
$$P[A(X) \leq \theta \leq B(X)] = p$$
and the interval $[a(x), b(x)]$ is a 100p% confidence interval.

6.6.3 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Exponential($\theta$) distribution. Determine the distribution of
$$Q(X;\theta) = \frac{2\sum_{i=1}^{n} X_i}{\theta}$$
and thus show $Q(X;\theta)$ is a pivotal quantity. Show how $Q(X;\theta)$ can be used to construct a 100p% equal tail confidence interval for $\theta$.

Solution
Since $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Exponential($\theta$) distribution then by 4.3.2(4)
$$Y = \sum_{i=1}^{n} X_i \sim \text{Gamma}(n, \theta)$$
and by Chapter 2, Problem 7(c)
$$Q(X;\theta) = \frac{2Y}{\theta} = \frac{2\sum_{i=1}^{n} X_i}{\theta} \sim \chi^2(2n)$$
and therefore $Q(X;\theta)$ is a pivotal quantity.
Find values $a$ and $b$ such that
$$P(W \leq a) = \frac{1-p}{2} \quad \text{and} \quad P(W \geq b) = \frac{1-p}{2}$$
where $W \sim \chi^2(2n)$.
Since $P(a \leq W \leq b) = p$ then
$$P\left(a \leq \frac{2\sum_{i=1}^{n} X_i}{\theta} \leq b\right) = p$$
or
$$P\left(\frac{2\sum_{i=1}^{n} X_i}{b} \leq \theta \leq \frac{2\sum_{i=1}^{n} X_i}{a}\right) = p$$
Therefore
$$\left[\frac{2\sum_{i=1}^{n} x_i}{b},\ \frac{2\sum_{i=1}^{n} x_i}{a}\right]$$
is a 100p% equal tail confidence interval for $\theta$.
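For a given data set this interval is easily computed with qchisq; the sketch below uses hypothetical values of $n$, $\sum x_i$ and $p$.
# Exact equal tail confidence interval for Exponential(theta) using 2*sum(x)/theta ~ chi-square(2n)
n <- 15; sumx <- 30; p <- 0.95        # hypothetical values
a <- qchisq((1 - p)/2, df = 2*n)
b <- qchisq((1 + p)/2, df = 2*n)
c(2*sumx/b, 2*sumx/a)                 # 100p% equal tail confidence interval for theta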

The following theorem gives the pivotal quantity in the case in which $\theta$ is either a location or scale parameter.

6.6.4 Theorem
Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from $f(x;\theta)$ and let $\tilde{\theta} = \tilde{\theta}(X)$ be the maximum likelihood estimator of the scalar parameter $\theta$ based on $X$.
(1) If $\theta$ is a location parameter of the distribution then $Q(X;\theta) = \tilde{\theta} - \theta$ is a pivotal quantity.
(2) If $\theta$ is a scale parameter of the distribution then $Q(X;\theta) = \tilde{\theta}/\theta$ is a pivotal quantity.

6.6.5 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Weibull($2,\theta$) distribution. Use Theorem 6.6.4 to find a pivotal quantity $Q(X;\theta)$. Show how the pivotal quantity can be used to construct a 100p% equal tail confidence interval for $\theta$.

Solution
From Chapter 2, Problem 3(a) we know that $\theta$ is a scale parameter for the Weibull($2,\theta$) distribution. From Example 6.5.2 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} = \left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)^{1/2}$$
Therefore by Theorem 6.6.4
$$Q(X;\theta) = \frac{\tilde{\theta}}{\theta} = \frac{1}{\theta}\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)^{1/2}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the distribution of $Q(X;\theta)$. This looks difficult at first until we notice that
$$\left(\frac{\tilde{\theta}}{\theta}\right)^2 = \frac{1}{n\theta^2}\sum_{i=1}^{n} X_i^2$$
This form suggests looking for the distribution of $\sum_{i=1}^{n} X_i^2$ which is a sum of independent and identically distributed random variables $X_1^2, X_2^2, \ldots, X_n^2$.
From Chapter 2, Problem 7(h) we have that if $X_i \sim$ Weibull($2,\theta$), $i = 1, 2, \ldots, n$ then
$$X_i^2 \sim \text{Exponential}(\theta^2) \quad \text{for } i = 1, 2, \ldots, n$$
Therefore $\sum_{i=1}^{n} X_i^2$ is a sum of independent Exponential($\theta^2$) random variables. By 4.3.2(4)
$$\sum_{i=1}^{n} X_i^2 \sim \text{Gamma}(n, \theta^2)$$
and by Chapter 2, Problem 7(h)
$$Q_1(X;\theta) = \frac{2\sum_{i=1}^{n} X_i^2}{\theta^2} \sim \chi^2(2n)$$
and therefore $Q_1(X;\theta)$ is a pivotal quantity.



Now
$$Q_1(X;\theta) = 2n\left(\frac{\tilde{\theta}}{\theta}\right)^2 = 2n\left[Q(X;\theta)\right]^2$$
is a one-to-one function of $Q(X;\theta)$. To see that $Q_1(X;\theta)$ and $Q(X;\theta)$ generate the same confidence intervals for $\theta$ we note that
$$P(a \leq Q_1(X;\theta) \leq b) = P\left[a \leq 2n\left(\frac{\tilde{\theta}}{\theta}\right)^2 \leq b\right] = P\left[\left(\frac{a}{2n}\right)^{1/2} \leq \frac{\tilde{\theta}}{\theta} \leq \left(\frac{b}{2n}\right)^{1/2}\right] = P\left[\left(\frac{a}{2n}\right)^{1/2} \leq Q(X;\theta) \leq \left(\frac{b}{2n}\right)^{1/2}\right]$$
To construct a 100p% equal tail confidence interval we choose $a$ and $b$ such that
$$P(W \leq a) = \frac{1-p}{2} \quad \text{and} \quad P(W \geq b) = \frac{1-p}{2}$$
where $W \sim \chi^2(2n)$. Since
$$p = P\left[\left(\frac{a}{2n}\right)^{1/2} \leq \frac{\tilde{\theta}}{\theta} \leq \left(\frac{b}{2n}\right)^{1/2}\right] = P\left[\tilde{\theta}\left(\frac{2n}{b}\right)^{1/2} \leq \theta \leq \tilde{\theta}\left(\frac{2n}{a}\right)^{1/2}\right]$$
a 100p% equal tail confidence interval is
$$\left[\hat{\theta}\left(\frac{2n}{b}\right)^{1/2},\ \hat{\theta}\left(\frac{2n}{a}\right)^{1/2}\right]$$
where $\hat{\theta} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)^{1/2}$.
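A sketch of the corresponding computation in R follows, assuming hypothetical values of $n$, $\sum x_i^2$ and $p$.
# Equal tail confidence interval for theta in the Weibull(2, theta) model
n <- 20; sumx2 <- 85; p <- 0.95            # hypothetical values
thetahat <- sqrt(sumx2/n)                  # maximum likelihood estimate of theta
a <- qchisq((1 - p)/2, df = 2*n)
b <- qchisq((1 + p)/2, df = 2*n)
thetahat*c(sqrt(2*n/b), sqrt(2*n/a))       # 100p% equal tail confidence interval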

6.6.6 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Uniform($0,\theta$) distribution. Use Theorem 6.6.4 to find a pivotal quantity $Q(X;\theta)$. Show how the pivotal quantity can be used to construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta},\ a\hat{\theta}]$.

Solution
From Chapter 2, Problem 3(b) we know that $\theta$ is a scale parameter for the Uniform($0,\theta$) distribution. From Example 6.2.9 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta} = X_{(n)}$$
Therefore by Theorem 6.6.4
$$Q(X;\theta) = \frac{\tilde{\theta}}{\theta} = \frac{X_{(n)}}{\theta}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the distribution of $Q(X;\theta)$:
$$P(Q(X;\theta) \leq q) = P\left(\frac{\tilde{\theta}}{\theta} \leq q\right) = P(X_{(n)} \leq q\theta) = \prod_{i=1}^{n} P(X_i \leq q\theta) = \prod_{i=1}^{n} q = q^n \quad \text{for } 0 \leq q \leq 1$$
since $P(X_i \leq x) = x/\theta$ for $0 \leq x \leq \theta$.
To construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta},\ a\hat{\theta}]$ we need to choose $a$ such that
$$p = P(\tilde{\theta} \leq \theta \leq a\tilde{\theta}) = P\left(\frac{1}{a} \leq \frac{\tilde{\theta}}{\theta} \leq 1\right) = P\left(\frac{1}{a} \leq Q(X;\theta) \leq 1\right) = P(Q(X;\theta) \leq 1) - P\left(Q(X;\theta) \leq \frac{1}{a}\right) = 1 - a^{-n}$$
or $a = (1-p)^{-1/n}$. The 100p% confidence interval for $\theta$ is
$$\left[\hat{\theta},\ (1-p)^{-1/n}\,\hat{\theta}\right]$$
where $\hat{\theta} = x_{(n)}$.
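Since the interval has a closed form, it is a one-line computation; the values below are hypothetical.
# 100p% confidence interval for Uniform(0, theta) of the form [thetahat, a*thetahat]
thetahat <- 1.5; n <- 20; p <- 0.95          # hypothetical values, thetahat = x_(n)
c(thetahat, thetahat*(1 - p)^(-1/n))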

6.7 Approximate Confidence Intervals

A pivotal quantity does not always exist. For example there is no pivotal quantity for the Binomial or the Poisson distributions. In these cases we use an asymptotic pivotal quantity to construct approximate confidence intervals.

6.7.1 Definition - Asymptotic Pivotal Quantity

Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random variable whose distribution depends on $\theta$. The random variable $Q(X_n;\theta)$ is called an asymptotic pivotal quantity if the limiting distribution of $Q(X_n;\theta)$ as $n \to \infty$ does not depend on $\theta$.

In your previous statistics course, approximate confidence intervals for the Binomial and Poisson distributions were justified using a Central Limit Theorem argument. We are now able to clearly justify the asymptotic pivotal quantity using the theorems of Chapter 5.

6.7.2 Example
Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random sample from the Poisson($\theta$) distribution. Show that
$$Q(X_n;\theta) = \frac{\sqrt{n}\,(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n}}$$
is an asymptotic pivotal quantity. Show how $Q(X_n;\theta)$ can be used to construct an approximate 100p% equal tail confidence interval for $\theta$.

Solution
In Example 5.5.4, the Weak Law of Large Numbers, the Central Limit Theorem and Slutsky's Theorem were all used to prove
$$Q(X_n;\theta) = \frac{\sqrt{n}\,(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n}} \to_D Z \sim \text{N}(0,1) \tag{6.14}$$
and thus $Q(X_n;\theta)$ is an asymptotic pivotal quantity.
Let $a$ be the value such that $P(Z \leq a) = (1+p)/2$ where $Z \sim \text{N}(0,1)$. Then by (6.14) we have
$$p \approx P\left(-a \leq \frac{\sqrt{n}\,(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n}} \leq a\right) = P\left(\bar{X}_n - a\sqrt{\frac{\bar{X}_n}{n}} \leq \theta \leq \bar{X}_n + a\sqrt{\frac{\bar{X}_n}{n}}\right)$$
and an approximate 100p% equal tail confidence interval for $\theta$ is
$$\left[\bar{x}_n - a\sqrt{\frac{\bar{x}_n}{n}},\ \bar{x}_n + a\sqrt{\frac{\bar{x}_n}{n}}\right]$$

or
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n}{n}}\right] \tag{6.15}$$
since $\hat{\theta}_n = \bar{x}_n$.
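For observed Poisson data the interval (6.15) can be computed directly; the sketch below uses hypothetical values for $n$, $\sum x_i$ and $p$.
# Approximate 100p% confidence interval for Poisson theta based on (6.15)
n <- 20; thetahat <- 40/n; p <- 0.95        # hypothetical values, thetahat = xbar
a <- qnorm((1 + p)/2)
thetahat + c(-1, 1)*a*sqrt(thetahat/n)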

6.7.3 Exercise
Suppose $X_n \sim$ Binomial($n,\theta$). Show that
$$Q(X_n;\theta) = \frac{\sqrt{n}\left(\frac{X_n}{n} - \theta\right)}{\sqrt{\frac{X_n}{n}\left(1 - \frac{X_n}{n}\right)}}$$
is an asymptotic pivotal quantity. Show that an approximate 100p% equal tail confidence interval for $\theta$ based on $Q(X_n;\theta)$ is given by
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right] \tag{6.16}$$
where $\hat{\theta}_n = \frac{x_n}{n}$.

6.7.4 Approximate Confidence Intervals and the Limiting Distribution of the Maximum Likelihood Estimator
The limiting distribution of the maximum likelihood estimator $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$ can also be used to construct approximate confidence intervals. This is particularly useful in cases in which the maximum likelihood estimate cannot be found explicitly.
Since
$$\left[J(\tilde{\theta}_n)\right]^{1/2}(\tilde{\theta}_n - \theta) \to_D Z \sim \text{N}(0,1)$$
then $\left[J(\tilde{\theta}_n)\right]^{1/2}(\tilde{\theta}_n - \theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n \pm a\sqrt{\frac{1}{J(\hat{\theta}_n)}} = \left[\hat{\theta}_n - a\sqrt{\frac{1}{J(\hat{\theta}_n)}},\ \hat{\theta}_n + a\sqrt{\frac{1}{J(\hat{\theta}_n)}}\right] \tag{6.17}$$
where $P(Z \leq a) = \frac{1+p}{2}$ and $Z \sim \text{N}(0,1)$.
Similarly since
$$\left[I(\tilde{\theta}_n; X)\right]^{1/2}(\tilde{\theta}_n - \theta) \to_D Z \sim \text{N}(0,1)$$
then $\left[I(\tilde{\theta}_n; X)\right]^{1/2}(\tilde{\theta}_n - \theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n \pm a\sqrt{\frac{1}{I(\hat{\theta}_n)}} = \left[\hat{\theta}_n - a\sqrt{\frac{1}{I(\hat{\theta}_n)}},\ \hat{\theta}_n + a\sqrt{\frac{1}{I(\hat{\theta}_n)}}\right] \tag{6.18}$$

where $I(\hat{\theta}_n)$ is the observed information.

Notes:
(1) One drawback of these intervals is that we don't know how large $n$ needs to be to obtain a good approximation.
(2) These approximate confidence intervals are both symmetric about $\hat{\theta}_n$ which may not be a reasonable summary of the plausible values of $\theta$ in light of the observed data. See likelihood intervals below.
(3) It is possible to obtain approximate confidence intervals based on a given data set which contain values which are not valid for $\theta$. For example, an interval may contain negative values although $\theta$ must be positive.

6.7.5 Example

Use the results from Example 6.3.5 to determine the approximate 100p% confidence intervals based on (6.17) and (6.18) in the case of Binomial data and Poisson data. Compare these intervals with the intervals in (6.16) and (6.15).

Solution
From Example 6.3.5 we have that for Binomial data
$$I(\hat{\theta}_n) = J(\hat{\theta}_n) = \frac{n}{\hat{\theta}_n(1-\hat{\theta}_n)}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right]$$
which is the same interval as in (6.16).
From Example 6.3.5 we have that for Poisson data
$$I(\hat{\theta}_n) = J(\hat{\theta}_n) = \frac{n}{\hat{\theta}_n}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n}{n}}\right]$$
which is the same interval as in (6.15).



6.7.6 Example
For Example 6.3.7 construct an approximate 95% confidence interval based on (6.18). Compare this with the 15% likelihood interval determined in Example 6.4.4.

Solution
From Example 6.3.7 we have $\hat{\theta} = 0.4951605$ and $I(\hat{\theta}) = 181.8069$. Therefore an approximate 95% confidence interval based on (6.18) is
$$\hat{\theta} \pm 1.96\sqrt{\frac{1}{I(\hat{\theta})}} = 0.4951605 \pm 1.96\sqrt{\frac{1}{181.8069}} = 0.4951605 \pm 0.145362 = [0.3498,\ 0.6405]$$
From Example 6.4.4 the 15% likelihood interval is [0.3550, 0.6401] which is very similar. We expect this to happen since the relative likelihood function in Figure 6.3 is very symmetric.

6.7.7 Approximate Confidence Intervals and Likelihood Intervals

In your previous statistics course you learned that likelihood intervals are also approximate confidence intervals.

6.7.8 Theorem
If $a$ is a value such that $p = 2P(Z \leq a) - 1$ where $Z \sim \text{N}(0,1)$, then the likelihood interval $\left\{\theta : R(\theta) \geq e^{-a^2/2}\right\}$ is an approximate 100p% confidence interval.

Proof
By Theorem 6.5.1
$$-2\log R(\theta; X_n) = -2\log\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)} \to_D W \sim \chi^2(1) \tag{6.19}$$
where $X_n = (X_1, X_2, \ldots, X_n)$. The confidence coefficient corresponding to the interval $\left\{\theta : R(\theta) \geq e^{-a^2/2}\right\}$ is
$$P\left[\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)} \geq e^{-a^2/2}\right] = P\left[-2\log R(\theta; X_n) \leq a^2\right] \approx P(W \leq a^2) \quad \text{where } W \sim \chi^2(1), \text{ by (6.19)}$$
$$= 2P(Z \leq a) - 1 \quad \text{where } Z \sim \text{N}(0,1)$$
$$= p$$
as required.

6.7.9 Example
Since
$$0.95 = 2P(Z \leq 1.96) - 1 \quad \text{where } Z \sim \text{N}(0,1)$$
and
$$e^{-(1.96)^2/2} = e^{-1.9208} \approx 0.1465 \approx 0.15$$
a 15% likelihood interval for $\theta$ is also an approximate 95% confidence interval for $\theta$.
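The correspondence between the confidence coefficient $p$ and the likelihood level $e^{-a^2/2}$ is easy to tabulate in R; the sketch below reproduces the 15%/95% pairing and gives rough values for the pairings in Exercise 6.7.10 below.
# Likelihood level exp(-a^2/2) corresponding to a 100p% confidence interval
p <- c(0.95, 0.998, 0.76)
a <- qnorm((1 + p)/2)
round(exp(-a^2/2), 3)    # approximately 0.15, 0.01, 0.50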

6.7.10 Exercise
(a) Show that a 1% likelihood interval is an approximate 99.8% confidence interval.
(b) Show that a 50% likelihood interval is an approximate 76% confidence interval.

Note that while the confidence intervals given by (6.17) or (6.18) are symmetric about the point estimate $\hat{\theta}_n$, this is not true in general for likelihood intervals.

6.7.11 Example
For Example 6.3.7 compare the 15% likelihood interval with the approximate 95% confidence interval in Example 6.7.6.

Solution
From Example 6.4.4 the 15% likelihood interval is
$$[0.3550,\ 0.6401]$$
and from Example 6.7.6 the approximate 95% confidence interval is
$$[0.3498,\ 0.6405]$$
These intervals are very close and agree to 2 decimal places. The reason is that the likelihood function (see Figure 6.3) is very symmetric about the maximum likelihood estimate. The approximate intervals (6.17) or (6.18) will be close to the corresponding likelihood interval whenever the likelihood function is reasonably symmetric about the maximum likelihood estimate.

6.7.12 Exercise
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Logistic($\theta$, 1) distribution with probability density function
$$f(x;\theta) = \frac{e^{-(x-\theta)}}{\left[1 + e^{-(x-\theta)}\right]^2} \quad \text{for } x \in \mathbb{R},\ \theta \in \mathbb{R}$$

(a) Find the likelihood function, the score function, and the information function. How would you find the maximum likelihood estimate of $\theta$?
(b) Show that if $u$ is an observation from the Uniform(0, 1) distribution then
$$x = \theta - \log\left(\frac{1}{u} - 1\right)$$
is an observation from the Logistic($\theta$, 1) distribution.


(c) Use the following R code to randomly generate 30 observations from a Logistic($\theta$, 1) distribution.
# randomly generate 30 observations from a Logistic(theta,1)
# using a random theta value between 2 and 3
set.seed(21086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=2,max=3)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truetheta-log(1/runif(30)-1)),2))
x
(d) Use R to plot the likelihood function for $\theta$ based on these data.
(e) Use Newton's Method and R to find $\hat{\theta}$.
(f) What are the values of $S(\hat{\theta})$ and $I(\hat{\theta})$?
(g) Use R to plot the relative likelihood function for $\theta$ based on these data.
(h) Compare the 15% likelihood interval with the approximate 95% confidence interval (6.18).

6.8 Chapter 6 Problems

1. Suppose $X \sim$ Binomial($n,\theta$). Plot the log relative likelihood function for $\theta$ if $x = 3$ is observed for $n = 100$. On the same graph plot the log relative likelihood function for $\theta$ if $x = 6$ is observed for $n = 200$. Compare the graphs as well as the 10% likelihood interval and 50% likelihood interval for $\theta$.

2. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Discrete Uniform($1,\theta$) distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$. If $n = 20$ and $x_{(20)} = 33$, find a 15% likelihood interval for $\theta$.

3. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Geometric($\theta$) distribution.

(a) Find the score function and the maximum likelihood estimator of $\theta$.
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of $E(X_i)$.
(d) If $n = 20$ and $\sum_{i=1}^{20} x_i = 40$, find the maximum likelihood estimate of $\theta$ and a 15% likelihood interval for $\theta$. Is $\theta = 0.5$ a plausible value of $\theta$? Why?

4. Suppose $(X_1, X_2, X_3) \sim$ Multinomial($n$; $\theta^2$, $2\theta(1-\theta)$, $(1-\theta)^2$). Find the maximum likelihood estimator of $\theta$, the observed information and the expected information.

5. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Pareto($1,\theta$) distribution.

(a) Find the score function and the maximum likelihood estimator of $\theta$.
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of $E(X_i)$.
(d) If $n = 20$ and $\sum_{i=1}^{20}\log x_i = 10$, find the maximum likelihood estimate of $\theta$ and a 15% likelihood interval for $\theta$. Is $\theta = 0.1$ a plausible value of $\theta$? Why?
(e) Show that
$$Q(X;\theta) = 2\theta\sum_{i=1}^{n}\log X_i$$
is a pivotal quantity. (Hint: What is the distribution of $\log X$ if $X \sim$ Pareto($1,\theta$)?) Use this pivotal quantity to determine a 95% equal tail confidence interval for $\theta$ for the data in (d). Compare this interval with the 15% likelihood interval.

6. The following model is proposed for the distribution of family size in a large population:
$$P(k \text{ children in family};\theta) = \theta^k \quad \text{for } k = 1, 2, \ldots$$
$$P(0 \text{ children in family};\theta) = \frac{1 - 2\theta}{1 - \theta}$$
The parameter $\theta$ is unknown and $0 < \theta < \frac{1}{2}$. Fifty families were chosen at random from the population. The observed numbers of children are given in the following table:

No. of children: 0, 1, 2, 3, 4 (Total 50 families)
Frequency observed: 17, 22, 7, 3, 1

(a) Find the likelihood, log likelihood, score and information functions for $\theta$.
(b) Find the maximum likelihood estimate of $\theta$ and the observed information.
(c) Find a 15% likelihood interval for $\theta$.
(d) A large study done 20 years earlier indicated that $\theta = 0.45$. Is this value plausible for these data?
(e) Calculate estimated expected frequencies. Does the model give a reasonable fit to the data?

7. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Two Parameter Exponential($\theta$, 1) distribution. Show that $\theta$ is a location parameter and $\tilde{\theta} = X_{(1)}$ is the maximum likelihood estimator of $\theta$. Show that
$$P(\tilde{\theta} - \theta \leq q) = 1 - e^{-nq} \quad \text{for } q \geq 0$$
and thus show that
$$\left[\hat{\theta} + \frac{1}{n}\log(1-p),\ \hat{\theta}\right]$$
and
$$\left[\hat{\theta} + \frac{1}{n}\log\left(\frac{1-p}{2}\right),\ \hat{\theta} + \frac{1}{n}\log\left(\frac{1+p}{2}\right)\right]$$
are both 100p% confidence intervals for $\theta$. Which confidence interval seems more reasonable?
8. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Gamma$\left(\frac{1}{2}, \frac{1}{\theta}\right)$ distribution.

(a) Find $\tilde{\theta}_n$, the maximum likelihood estimator of $\theta$.
(b) Justify the statement $\tilde{\theta}_n \to_p \theta$.
(c) Find the maximum likelihood estimator of $Var(X_i)$.

(d) Use moment generating functions to show that $Q = 2\theta\sum_{i=1}^{n} X_i \sim \chi^2(n)$. If $n = 20$ and $\sum_{i=1}^{20} x_i = 6$, use the pivotal quantity $Q$ to construct an exact 95% equal tail confidence interval for $\theta$. Is $\theta = 0.7$ a plausible value of $\theta$?
(e) Verify that $(\tilde{\theta}_n - \theta)\left[J(\tilde{\theta}_n)\right]^{1/2} \to_D Z \sim \text{N}(0,1)$. Use this asymptotic pivotal quantity to construct an approximate 95% confidence interval for $\theta$. Compare this interval with the exact confidence interval from (d) and a 15% likelihood interval for $\theta$. What do the approximate confidence interval and the likelihood interval indicate about the plausibility of the value $\theta = 0.7$?

9. The number of coliform bacteria $X$ in a 10 cubic centimeter sample of water from a section of lake near a beach has a Poisson($\theta$) distribution.

(a) If a random sample of $n$ specimen samples is taken and $X_1, X_2, \ldots, X_n$ are the respective numbers of observed bacteria, find the likelihood function, the maximum likelihood estimator and the expected information for $\theta$.
(b) If $n = 20$ and $\sum_{i=1}^{20} x_i = 40$, obtain an approximate 95% confidence interval for $\theta$ and a 15% likelihood interval for $\theta$. Compare the intervals.
(c) Suppose there is a fast, simple test which can detect whether there are bacteria present, but not the exact number. If $Y$ is the number of samples out of $n$ which have bacteria, show that
$$P(Y = y) = \binom{n}{y}(1 - e^{-\theta})^y(e^{-\theta})^{n-y} \quad \text{for } y = 0, 1, \ldots, n$$
(d) If $n = 20$ and we found $y = 17$ of the samples contained bacteria, use the likelihood function from part (c) to get an approximate 95% confidence interval for $\theta$. Hint: Let $\phi = 1 - e^{-\theta}$ and use the likelihood function for $\phi$ to get an approximate confidence interval for $\phi$ and then transform this to an approximate confidence interval for $\theta = -\log(1 - \phi)$.
7. Maximum Likelihood Estimation - Multiparameter

In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates for the case in which the unknown parameter is a vector of unknown parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. In your previous statistics course you would have seen the N($\mu, \sigma^2$) model with two unknown parameters $\theta = (\mu, \sigma^2)$ and the simple linear regression model N($\alpha + \beta x, \sigma^2$) with three unknown parameters $\theta = (\alpha, \beta, \sigma^2)$.

Although the case of $k$ parameters is a natural extension of the one parameter case, the $k$ parameter case is more challenging. For example, the maximum likelihood estimates are usually found by solving $k$ nonlinear equations in $k$ unknowns $\theta_1, \theta_2, \ldots, \theta_k$. In most cases there are no explicit solutions and the maximum likelihood estimates must be found using a numerical method such as Newton's Method. Another challenging issue is how to summarize the uncertainty in the $k$ estimates. For one parameter it is straightforward to summarize the uncertainty using a likelihood interval or a confidence interval. For $k$ parameters these intervals become regions in $\mathbb{R}^k$ which are difficult to visualize and interpret.

In Section 7.1 we give all the definitions related to finding the maximum likelihood estimates for $k$ unknown parameters. These definitions are analogous to the definitions which were given in Chapter 6 for one unknown parameter. We also give the extension of the invariance property of maximum likelihood estimates and Newton's Method for $k$ variables. In Section 7.2 we define likelihood regions and show how to find them for the case $k = 2$.

In Section 7.3 we introduce the Multivariate Normal distribution which is the natural extension of the Bivariate Normal distribution discussed in Section 3.10. We also give the limiting distribution of the maximum likelihood estimator of $\theta$ which is a natural extension of Theorem 6.5.1.

In Section 7.4 we show how to obtain approximate confidence regions for $\theta$ based on the limiting distribution of the maximum likelihood estimator of $\theta$ and show how to find them for the case $k = 2$. We also show how to find approximate confidence intervals for individual parameters and indicate that these intervals must be used with care.

7.1 Likelihood and Related Functions

7.1.1 Definition - Likelihood Function
Suppose $X$ is a (vector) random variable with probability (density) function $f(x;\theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Omega$ and $\Omega$ is the parameter space or set of possible values of $\theta$. Suppose also that $x$ is an observed value of $X$. The likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = f(x;\theta) \quad \text{for } \theta \in \Omega \tag{7.1}$$
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with probability function $f(x;\theta)$ and $x = (x_1, x_2, \ldots, x_n)$ are the observed data, then the likelihood function for $\theta$ based on the observed data $x_1, x_2, \ldots, x_n$ is
$$L(\theta) = L(\theta; x) = \prod_{i=1}^{n} f(x_i;\theta) \quad \text{for } \theta \in \Omega$$
Note: If $X$ is a discrete random variable then $L(\theta) = P(\text{observing the data } x;\theta)$. If $X$ is a continuous random variable then an argument similar to the one in 6.2.6 can be made to justify the use of (7.1).

7.1.2 Definition - Maximum Likelihood Estimate and Estimator

The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data $x$ and we write $\hat{\theta} = \hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random vector, is denoted by $\tilde{\theta} = \tilde{\theta}(X)$.

As in the case of $k = 1$ it is frequently easier to work with the log likelihood function, which is maximized at the same value of $\theta$ as the likelihood function.

7.1.3 Definition - Log Likelihood Function

The log likelihood function is defined as
$$l(\theta) = l(\theta; x) = \log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data and log is the natural logarithmic function.

The maximum likelihood estimate of $\theta$, $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$, is usually found by solving $\frac{\partial l(\theta)}{\partial\theta_j} = 0$, $j = 1, 2, \ldots, k$ simultaneously. See Chapter 7, Problem 1 for an example in which the maximum likelihood estimate is not found in this way.

7.1.4 Definition - Score Vector

If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the score vector (function) is a $1 \times k$ vector of functions defined as
$$S(\theta) = S(\theta; x) = \left[\frac{\partial l(\theta)}{\partial\theta_1}, \frac{\partial l(\theta)}{\partial\theta_2}, \ldots, \frac{\partial l(\theta)}{\partial\theta_k}\right] \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

We will see that, as in the case of one parameter, the information matrix provides information about the variance of the maximum likelihood estimator.

7.1.5 Definition - Information Matrix

If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the information matrix (function) $I(\theta) = I(\theta; x)$ is a $k \times k$ symmetric matrix of functions whose $(i,j)$ entry is given by
$$-\frac{\partial^2 l(\theta)}{\partial\theta_i\,\partial\theta_j} \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data. $I(\hat{\theta})$ is called the observed information matrix.

7.1.6 Definition - Expected Information Matrix

If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the expected information matrix (function) $J(\theta)$ is a $k \times k$ symmetric matrix of functions whose $(i,j)$ entry is given by
$$E\left[-\frac{\partial^2 l(\theta; X)}{\partial\theta_i\,\partial\theta_j}\right] \quad \text{for } \theta \in \Omega$$
The invariance property of the maximum likelihood estimator also holds in the multiparameter case.

7.1.7 Theorem - Invariance of the Maximum Likelihood Estimate

If $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ is the maximum likelihood estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then $g(\hat{\theta})$ is the maximum likelihood estimate of $g(\theta)$.

7.1.8 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the N($\mu, \sigma^2$) distribution. Find the score vector, the information matrix, the expected information matrix and the maximum likelihood estimator of $\theta = (\mu, \sigma^2)$. Find the observed information matrix $I(\hat{\mu}, \hat{\sigma}^2)$ and thus verify that $(\hat{\mu}, \hat{\sigma}^2)$ is the maximum likelihood estimator of $(\mu, \sigma^2)$. What is the maximum likelihood estimator of the parameter $\sigma/\mu$, which is called the coefficient of variation?

Solution
The likelihood function is
$$L(\mu, \sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}(x_i - \mu)^2\right] = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] \quad \text{for } \mu \in \mathbb{R},\ \sigma^2 > 0$$
or more simply
$$L(\mu, \sigma^2) = (\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] \quad \text{for } \mu \in \mathbb{R},\ \sigma^2 > 0$$
The log likelihood function is
$$l(\mu, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right] \quad \text{for } \mu \in \mathbb{R},\ \sigma^2 > 0$$
where
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Now
$$\frac{\partial l}{\partial\mu} = \frac{n}{\sigma^2}(\bar{x} - \mu)$$
and
$$\frac{\partial l}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right]$$
The equations
$$\frac{\partial l}{\partial\mu} = 0, \qquad \frac{\partial l}{\partial\sigma^2} = 0$$
are solved simultaneously for
$$\mu = \bar{x} \quad \text{and} \quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(n-1)}{n}s^2$$
Since
$$\frac{\partial^2 l}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 l}{\partial\mu\,\partial\sigma^2} = -\frac{n(\bar{x} - \mu)}{\sigma^4}$$
$$\frac{\partial^2 l}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right]$$

the information matrix is
$$I(\mu, \sigma^2) = \begin{bmatrix} \dfrac{n}{\sigma^2} & \dfrac{n(\bar{x} - \mu)}{\sigma^4} \\[2ex] \dfrac{n(\bar{x} - \mu)}{\sigma^4} & -\dfrac{n}{2\sigma^4} + \dfrac{1}{\sigma^6}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right] \end{bmatrix}$$
Since
$$I_{11}(\hat{\mu}, \hat{\sigma}^2) = \frac{n}{\hat{\sigma}^2} > 0 \quad \text{and} \quad \det I(\hat{\mu}, \hat{\sigma}^2) = \frac{n^2}{2\hat{\sigma}^6} > 0$$
then by the second derivative test the maximum likelihood estimates of $\mu$ and $\sigma^2$ are
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(n-1)}{n}s^2$$
and the maximum likelihood estimators are
$$\tilde{\mu} = \bar{X} \quad \text{and} \quad \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{(n-1)}{n}S^2$$
The observed information matrix is
$$I(\hat{\mu}, \hat{\sigma}^2) = \begin{bmatrix} \dfrac{n}{\hat{\sigma}^2} & 0 \\[1.5ex] 0 & \dfrac{n}{2\hat{\sigma}^4} \end{bmatrix}$$
Now
$$E\left[\frac{n}{\sigma^2}\right] = \frac{n}{\sigma^2}, \qquad E\left[\frac{n(\bar{X} - \mu)}{\sigma^4}\right] = 0$$
Also
$$E\left\{-\frac{n}{2\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)S^2 + n(\bar{X} - \mu)^2\right]\right\} = -\frac{n}{2\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)E(S^2) + nE\left((\bar{X} - \mu)^2\right)\right] = -\frac{n}{2\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)\sigma^2 + \sigma^2\right] = \frac{n}{2\sigma^4}$$
since
$$E(\bar{X} - \mu) = 0, \quad E\left[(\bar{X} - \mu)^2\right] = Var(\bar{X}) = \frac{\sigma^2}{n} \quad \text{and} \quad E(S^2) = \sigma^2$$
Therefore the expected information matrix is
$$J(\mu, \sigma^2) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\[1.5ex] 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}$$
and the inverse of the expected information matrix is
$$\left[J(\mu, \sigma^2)\right]^{-1} = \begin{bmatrix} \dfrac{\sigma^2}{n} & 0 \\[1.5ex] 0 & \dfrac{2\sigma^4}{n} \end{bmatrix}$$

Note that
$$Var(\bar{X}) = \frac{\sigma^2}{n}$$
$$Var(\tilde{\sigma}^2) = Var\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{2(n-1)\sigma^4}{n^2} \approx \frac{2\sigma^4}{n}$$
and
$$Cov(\bar{X}, \tilde{\sigma}^2) = Cov\left[\bar{X}, \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = 0$$
since $\bar{X}$ and $\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independent random variables.
By the invariance property of maximum likelihood estimators the maximum likelihood estimator of $\sigma/\mu$ is $\tilde{\sigma}/\tilde{\mu}$.
Recall from your previous statistics course that inferences for $\mu$ and $\sigma^2$ are made using the independent pivotal quantities
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1) \quad \text{and} \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
See Figure 7.1 for a graph of $R(\mu, \sigma^2)$ for $n = 50$, $\hat{\mu} = 5$ and $\hat{\sigma}^2 = 4$.


Figure 7.1: Normal Relative Likelihood Function for $n = 50$, $\hat{\mu} = 5$ and $\hat{\sigma}^2 = 4$
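A sketch of the corresponding computations in R follows; the data are simulated and the true values used to generate them are made-up illustration values.
# Maximum likelihood estimates and observed information matrix for the N(mu, sigma^2) model
set.seed(6)
x <- rnorm(50, mean = 5, sd = 2)              # hypothetical sample
n <- length(x)
muhat <- mean(x)
sig2hat <- mean((x - muhat)^2)                # equals (n-1)/n * s^2
Ihat <- rbind(c(n/sig2hat, 0),
              c(0, n/(2*sig2hat^2)))          # observed information matrix
muhat; sig2hat; Ihat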



7.1.9 Exercise
Suppose $Y_i \sim$ N$(\alpha + \beta x_i, \sigma^2)$, $i = 1, 2, \ldots, n$ independently where the $x_i$ are known constants.
Show that the maximum likelihood estimators of $\alpha$, $\beta$ and $\sigma^2$ are given by
$\tilde{\alpha} = \bar{Y} - \tilde{\beta}\bar{x}$
$\tilde{\beta} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
$\tilde{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}\left(Y_i - \tilde{\alpha} - \tilde{\beta}x_i\right)^2$
A quick numerical check of these formulas is sketched below.
Note: $\tilde{\alpha}$ and $\tilde{\beta}$ are also the least squares estimators of $\alpha$ and $\beta$.
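The following is a minimal R sketch (not a derivation) that computes the stated estimates for simulated data and compares the first two with the least squares estimates from lm(); the true parameter values used in the simulation are arbitrary choices for illustration.
# minimal sketch: closed-form estimates for the simple linear model versus lm()
set.seed(123)
n <- 25
x <- runif(n, 0, 10)                      # known constants
y <- 2 + 3 * x + rnorm(n, sd = 1.5)       # simulated responses (alpha = 2, beta = 3 chosen arbitrarily)
betahat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alphahat <- mean(y) - betahat * mean(x)
sig2hat <- mean((y - alphahat - betahat * x)^2)   # note divisor n, not n - 2
c(alphahat, betahat, sig2hat)
coef(lm(y ~ x))                           # least squares estimates agree with alphahat and betahat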

7.1.10 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Beta$(a, b)$ distribution with
probability density function
$f(x; a, b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,x^{a-1}(1-x)^{b-1}$ for $0 < x < 1$, $a > 0$, $b > 0$
Find the likelihood function, the score vector, the information matrix, and the expected
information matrix. How would you find the maximum likelihood estimates of $a$ and $b$?

Solution
The likelihood function is
$L(a, b) = \prod_{i=1}^{n}\dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,x_i^{a-1}(1-x_i)^{b-1}
= \left[\dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\right]^{n}\left(\prod_{i=1}^{n}x_i\right)^{a-1}\left(\prod_{i=1}^{n}(1-x_i)\right)^{b-1}$ for $a > 0$, $b > 0$
or more simply
$L(a, b) = \left[\dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\right]^{n}\left(\prod_{i=1}^{n}x_i\right)^{a}\left(\prod_{i=1}^{n}(1-x_i)\right)^{b}$ for $a > 0$, $b > 0$
The log likelihood function is
$l(a, b) = n\left[\log\Gamma(a+b) - \log\Gamma(a) - \log\Gamma(b) + a t_1 + b t_2\right]$ for $a > 0$, $b > 0$
where
$t_1 = \dfrac{1}{n}\sum_{i=1}^{n}\log x_i$ and $t_2 = \dfrac{1}{n}\sum_{i=1}^{n}\log(1 - x_i)$

Let
$\psi(z) = \dfrac{d}{dz}\log\Gamma(z) = \dfrac{\Gamma'(z)}{\Gamma(z)}$
which is called the digamma function.
The score vector is
$S(a, b) = \left[\begin{array}{cc}\dfrac{\partial l}{\partial a} & \dfrac{\partial l}{\partial b}\end{array}\right]
= n\left[\begin{array}{cc}\psi(a+b) - \psi(a) + t_1 & \psi(a+b) - \psi(b) + t_2\end{array}\right]$
for $a > 0$, $b > 0$. $S(a, b) = (0, 0)$ must be solved numerically to find the maximum likelihood
estimates of $a$ and $b$.
Let
$\psi'(z) = \dfrac{d}{dz}\psi(z)$
which is called the trigamma function.
The information matrix is
$I(a, b) = n\begin{bmatrix}\psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b)\end{bmatrix}$
for $a > 0$, $b > 0$, which is also the expected information matrix.

7.1.11 Exercise

Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Gamma$(\alpha, \beta)$ distribution.
Find the likelihood function, the score vector, the information matrix, and the expected
information matrix. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?

Often $S(\theta_1, \theta_2, \ldots, \theta_k) = (0, 0, \ldots, 0)$ must be solved numerically using a method such as
Newton's Method.

7.1.12 Newton’s Method

Let $\theta^{(0)}$ be an initial estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The estimate $\theta^{(i)}$ can be updated
using
$\theta^{(i+1)} = \theta^{(i)} + S(\theta^{(i)})\left[I(\theta^{(i)})\right]^{-1}$ for $i = 0, 1, \ldots$
Note: The initial estimate, $\theta^{(0)}$, may be determined by calculating $L(\theta)$ for a grid of
values to determine the region in which $L(\theta)$ attains a maximum. A short sketch of such a grid search is given below.
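The grid search mentioned in the note can be done with a few lines of R. This is a minimal sketch for the Beta$(a,b)$ log likelihood of Example 7.1.10; the helper BELLF and the small data vector are hypothetical illustrations, not part of the text's code.
# minimal sketch: coarse grid search for a Newton starting value (Beta example)
BELLF <- function(a, b, x) {
  n <- length(x)
  t1 <- mean(log(x))
  t2 <- mean(log(1 - x))
  n * (lgamma(a + b) - lgamma(a) - lgamma(b) + a * t1 + b * t2)
}
x <- c(0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.75)   # illustrative data on (0,1)
agrid <- seq(0.5, 6, 0.25)
bgrid <- seq(0.5, 6, 0.25)
ll <- outer(agrid, bgrid, Vectorize(function(a, b) BELLF(a, b, x)))
start <- which(ll == max(ll), arr.ind = TRUE)
c(agrid[start[1]], bgrid[start[2]])            # use this (a, b) as theta^(0) in Newton's Method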

7.1.13 Example
Use the following R code to randomly generate 35 observations from a Beta(a; b) distribution
# randomly generate 35 observations from a Beta(a,b)
set.seed(32086689) # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=1,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rbeta(35,truea,trueb),2))
x
Use Newton's Method and R to find $(\hat{a}, \hat{b})$.
What are the values of $S(\hat{a}, \hat{b})$ and $I(\hat{a}, \hat{b})$?

Solution
The generated data are

0.08 0.19 0.21 0.25 0.28 0.29 0.29 0.30 0.30 0.32
0.34 0.36 0.39 0.45 0.45 0.47 0.48 0.49 0.54 0.54
0.55 0.55 0.56 0.56 0.61 0.63 0.64 0.65 0.69 0.69
0.73 0.77 0.79 0.81 0.85

The maximum likelihood estimates of $a$ and $b$ can be found using Newton's Method given
by
$(a^{(i+1)}, b^{(i+1)}) = (a^{(i)}, b^{(i)}) + S(a^{(i)}, b^{(i)})\left[I(a^{(i)}, b^{(i)})\right]^{-1}$
for $i = 0, 1, \ldots$ until convergence.
Here is R code for Newton’s Method for the Beta Example.
# function for calculating the Beta score vector for a and b and data x
BESF<-function(a,b,x)
{S<-length(x)*c(digamma(a+b)-digamma(a)+mean(log(x)),
                digamma(a+b)-digamma(b)+mean(log(1-x)))
return(S)}
#
# function for calculating the Beta information matrix for a and b
# (uses the data vector x from the calling environment)
BEIF<-function(a,b)
{I<-length(x)*cbind(c(trigamma(a)-trigamma(a+b),-trigamma(a+b)),
                    c(-trigamma(a+b),trigamma(b)-trigamma(a+b)))
return(I)}

# Newton’s Method for Beta Example


NewtonBE<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+BESF(thold[1],thold[2],x)%*%solve(BEIF(thold[1],thold[2]))
print(thnew)}
return(thnew)}
thetahat<-NewtonBE(2,2,x)
The maximum likelihood estimates are $\hat{a} = 2.824775$ and $\hat{b} = 2.97317$. The score vector
evaluated at $(\hat{a}, \hat{b})$ is
$S(\hat{a}, \hat{b}) = \begin{bmatrix} 3.108624 \times 10^{-14} & 7.771561 \times 10^{-15} \end{bmatrix}$
which indicates we have obtained a local extremum. The observed information matrix is
$I(\hat{a}, \hat{b}) = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}$

Note that since
$\det[I(\hat{a}, \hat{b})] = (8.249382)(7.381967) - (6.586959)^2 > 0$
and
$[I(\hat{a}, \hat{b})]_{11} = 8.249382 > 0$
then by the second derivative test we have found the maximum likelihood estimates.

7.1.14 Exercise
Use the following R code to randomly generate 30 observations from a Gamma$(\alpha, \beta)$ distribution
# randomly generate 30 observations from a Gamma(alpha,beta)
set.seed(32067489) # set the seed so results can be reproduced
# use randomly generated alpha and beta values
truea<-runif(1,min=1,max=3)
trueb<-runif(1,min=3,max=5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rgamma(30,truea,scale=trueb),2))
x
Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta})$. What are the values of $S(\hat{\alpha}, \hat{\beta})$ and $I(\hat{\alpha}, \hat{\beta})$?

7.2 Likelihood Regions


For one unknown parameter likelihood intervals provide a way of summarizing the uncer-
tainty in the maximum likelihood estimate by providing an intervals of values which are
plausible given the observed data. For k unknown parameter summarizing the uncertainty
is more challenging. We begin with the de…nition of a likelihood region which is the natural
extension of a likelihood interval.

7.2.1 Definition - Likelihood Regions

The set of values $\theta$ for which $R(\theta) \geq p$ is called a 100$p$% likelihood region for $\theta$.

A 100$p$% likelihood region for two unknown parameters $\theta = (\theta_1, \theta_2)$ is given by
$\{(\theta_1, \theta_2) : R(\theta_1, \theta_2) \geq p\}$. These regions will be approximately elliptical in shape. To show this we note
that for $(\theta_1, \theta_2)$ sufficiently close to $(\hat{\theta}_1, \hat{\theta}_2)$ we have
$L(\theta_1, \theta_2) \approx L(\hat{\theta}_1, \hat{\theta}_2) + S(\hat{\theta}_1, \hat{\theta}_2)\begin{bmatrix}\hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2\end{bmatrix} - \frac{1}{2}\begin{bmatrix}\hat{\theta}_1 - \theta_1 & \hat{\theta}_2 - \theta_2\end{bmatrix} I(\hat{\theta}_1, \hat{\theta}_2)\begin{bmatrix}\hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2\end{bmatrix}
= L(\hat{\theta}_1, \hat{\theta}_2) - \frac{1}{2}\begin{bmatrix}\hat{\theta}_1 - \theta_1 & \hat{\theta}_2 - \theta_2\end{bmatrix} I(\hat{\theta}_1, \hat{\theta}_2)\begin{bmatrix}\hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2\end{bmatrix}$ since $S(\hat{\theta}_1, \hat{\theta}_2) = (0, 0)$
Therefore
$R(\theta_1, \theta_2) = \dfrac{L(\theta_1, \theta_2)}{L(\hat{\theta}_1, \hat{\theta}_2)}
\approx 1 - \dfrac{1}{2L(\hat{\theta}_1, \hat{\theta}_2)}\begin{bmatrix}\hat{\theta}_1 - \theta_1 & \hat{\theta}_2 - \theta_2\end{bmatrix}\begin{bmatrix}\hat{I}_{11} & \hat{I}_{12} \\ \hat{I}_{12} & \hat{I}_{22}\end{bmatrix}\begin{bmatrix}\hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2\end{bmatrix}
= 1 - \left[2L(\hat{\theta}_1, \hat{\theta}_2)\right]^{-1}\left[(\theta_1 - \hat{\theta}_1)^2\hat{I}_{11} + 2(\theta_1 - \hat{\theta}_1)(\theta_2 - \hat{\theta}_2)\hat{I}_{12} + (\theta_2 - \hat{\theta}_2)^2\hat{I}_{22}\right]$
The set of points $(\theta_1, \theta_2)$ which satisfy $R(\theta_1, \theta_2) = p$ is approximately the set of points
$(\theta_1, \theta_2)$ which satisfy
$(\theta_1 - \hat{\theta}_1)^2\hat{I}_{11} + 2(\theta_1 - \hat{\theta}_1)(\theta_2 - \hat{\theta}_2)\hat{I}_{12} + (\theta_2 - \hat{\theta}_2)^2\hat{I}_{22} = 2(1 - p)L(\hat{\theta}_1, \hat{\theta}_2)$
which we recognize as the points on an ellipse centred at $(\hat{\theta}_1, \hat{\theta}_2)$. Therefore a 100$p$%
likelihood region for two unknown parameters $\theta = (\theta_1, \theta_2)$ will be the set of points on and
inside a region which will be approximately elliptical in shape.
A similar argument can be made to show that the likelihood regions for three unknown
parameters $\theta = (\theta_1, \theta_2, \theta_3)$ will be approximate ellipsoids in $\Re^3$.

7.2.2 Example
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters $(a, b)$
in Example 7.1.13. Comment on the shapes of the regions.
(b) Is the value $(2.5, 3.5)$ a plausible value of $(a, b)$?

Solution
(a) The following R code generates the required likelihood regions.
# function for calculating Beta relative likelihood function
# for parameters a,b and data x
BERLF<-function(a,b,that,x)
{t1<-prod(x)
t2<-prod(1-x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(a+b)*gamma(ah)*gamma(bh))/
(gamma(a)*gamma(b)*gamma(ah+bh)))^n*t1^(a-ah)*t2^(b-bh)
return(L)}
#
a<-seq(0.5,5.5,0.01)
b<-seq(0.5,6,0.01)
R<-outer(a,b,FUN = BERLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for $(a, b)$ are shown in Figure 7.2.
The likelihood contours are approximate ellipses but they are not symmetric about
the maximum likelihood estimates $(\hat{a}, \hat{b}) = (2.824775, 2.97317)$. The likelihood regions are
more stretched for larger values of $a$ and $b$. The ellipses are also skewed relative to the $ab$
coordinate axes. The skewness of the likelihood contours relative to the $ab$ coordinate axes
is determined by the value of $\hat{I}_{12}$. If the value of $\hat{I}_{12}$ is close to zero the skewness will be
small.
(b) Since $R(2.5, 3.5) = 0.082$ the point $(2.5, 3.5)$ lies outside a 10% likelihood region so it
is not a very plausible value of $(a, b)$.
Note however that $a = 2.5$ is a plausible value of $a$ for some values of $b$; for example,
$a = 2.5$, $b = 2.5$ lies inside a 50% likelihood region so $(2.5, 2.5)$ is a plausible value of $(a, b)$.
We see that when there is more than one parameter we need to determine whether a
set of values are jointly plausible given the observed data.


Figure 7.2: Likelihood regions for Beta example

7.2.3 Exercise
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters $(\alpha, \beta)$
in Exercise 7.1.14. Comment on the shapes of the regions.
(b) Is the value $(3, 2.7)$ a plausible value of $(\alpha, \beta)$?
(c) Use the R code in Exercise 7.1.14 to generate 100 observations from the Gamma$(\alpha, \beta)$
distribution.
(d) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\alpha, \beta)$ for the data
generated in (c). Comment on the shapes of these regions as compared to the regions in
(a).

7.3 Limiting Distribution of Maximum Likelihood Estimator


To discuss the asymptotic properties of the maximum likelihood estimator in the multipa-
rameter case we need to define convergence in probability and convergence in distribution
of a sequence of random vectors.

7.3.1 Definition - Convergence of a Sequence of Random Vectors

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n, \ldots$ be a sequence of random vectors where $\mathbf{X}_n = (X_{1n}, X_{2n}, \ldots, X_{kn})$.
Let $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ be a random vector and $\mathbf{x} = (x_1, x_2, \ldots, x_k)$.
(1) If $X_{in} \rightarrow_p X_i$ for $i = 1, 2, \ldots, k$, then
$\mathbf{X}_n \rightarrow_p \mathbf{X}$
(2) Let $F_n(\mathbf{x}) = P(X_{1n} \leq x_1, X_{2n} \leq x_2, \ldots, X_{kn} \leq x_k)$ be the cumulative distribution
function of $\mathbf{X}_n$. Let $F(\mathbf{x}) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_k \leq x_k)$ be the cumulative
distribution function of $\mathbf{X}$. If
$\lim_{n \rightarrow \infty} F_n(\mathbf{x}) = F(\mathbf{x})$
at all points of continuity of $F(\mathbf{x})$ then
$\mathbf{X}_n \rightarrow_D \mathbf{X} = (X_1, X_2, \ldots, X_k)$

To discuss the asymptotic properties of the maximum likelihood estimator in the multipa-
rameter case we also need the definition and properties of the Multivariate Normal Dis-
tribution. The Multivariate Normal distribution is the natural extension of the Bivariate
Normal distribution which was discussed in Section 3.10.

7.3.2 Definition - Multivariate Normal Distribution (MVN)

Let $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ be a $1 \times k$ random vector with $E(X_i) = \mu_i$ and $Cov(X_i, X_j) = \sigma_{ij}$,
$i, j = 1, 2, \ldots, k$. (Note: $Cov(X_i, X_i) = \sigma_{ii} = Var(X_i) = \sigma_i^2$.) Let $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$
be the mean vector and $\Sigma$ be the $k \times k$ symmetric covariance matrix whose $(i,j)$ entry is
$\sigma_{ij}$. Suppose also that the inverse matrix of $\Sigma$, $\Sigma^{-1}$, exists. If the joint probability density
function of $(X_1, X_2, \ldots, X_k)$ is given by
$f(x_1, x_2, \ldots, x_k) = \dfrac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}}\exp\left[-\dfrac{1}{2}(\mathbf{x} - \mu)\Sigma^{-1}(\mathbf{x} - \mu)^T\right]$ for $\mathbf{x} \in \Re^k$
where $\mathbf{x} = (x_1, x_2, \ldots, x_k)$ then $\mathbf{X}$ is said to have a Multivariate Normal distribution. We
write $\mathbf{X} \sim$ MVN$(\mu, \Sigma)$.

The following theorem gives some important properties of the Multivariate Normal distri-
bution. These properties are a natural extension of the properties of the Bivariate Normal
distribution found in Theorem 3.10.2.

7.3.3 Theorem - Properties of MVN Distribution


Suppose X = (X1 ; X2 ; : : : ; Xk ) MVN( ; ). Then
(1) X has joint moment generating function

1
M (t) = exp tT + t tT for t = (t1 ; t2 ; : : : ; tk ) 2 <k
2

(2) Any subset of X1 ; X2 ; : : : ; Xk also has a MVN distribution and in particular


Xi N i ; 2i ; i = 1; 2; : : : ; k.
(3)
1
(X ) (X )T 2
(k)
(4) Let c = (c1 ; c2 ; : : : ; ck ) be a nonzero vector of constants then

P
k
XcT = ci Xi N cT ; c cT
i=1

(5) Let A be a k p vector of constants of rank p k then

XA N( A; AT A)

(6) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the rest of the co-
ordinates is a MVN distribution. In particular the conditional probability density function
of Xi given Xj = xj ; i 6= j; is
2 2
Xi jXj = xj N( i + ij i (xj j )= j ; (1 ij ) i )
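Properties (3) and (4) are easy to illustrate numerically. The following is a minimal R sketch using MASS::mvrnorm with an arbitrarily chosen mean vector and covariance matrix (not taken from the text); it checks that the quadratic form in (3) behaves like a $\chi^2(2)$ variable and that the linear combination in (4) has the stated mean and variance.
# minimal sketch: simulating MVN vectors and checking properties (3) and (4)
library(MASS)                                    # for mvrnorm()
mu <- c(1, 2)                                    # arbitrary mean vector
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)         # arbitrary covariance matrix
X <- mvrnorm(10000, mu = mu, Sigma = Sigma)
# property (3): (X - mu) Sigma^{-1} (X - mu)^T should behave like chi-squared(2)
q <- rowSums((sweep(X, 2, mu) %*% solve(Sigma)) * sweep(X, 2, mu))
c(mean(q), var(q))                               # should be close to 2 and 4
# property (4): c1*X1 + c2*X2 should be Normal with mean c mu^T and variance c Sigma c^T
cc <- c(3, -1)
y <- X %*% cc
c(mean(y), sum(cc * mu))                         # sample and theoretical means agree
c(var(y), drop(t(cc) %*% Sigma %*% cc))          # sample and theoretical variances agree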

The following theorem gives the asymptotic distribution of the maximum likelihood esti-
mator in the multiparameter case. This theorem looks very similar to Theorem 6.5.1 with
the scalar quantities replaced by the appropriate vectors and matrices.

7.3.4 Theorem - Limiting Distribution of the Maximum Likelihood Esti-


mator
Suppose $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Omega$
and the dimension of $\Omega$ is $k$. Let $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$ be the maximum likelihood
estimator of $\theta$ based on $\mathbf{X}_n$. Let $\mathbf{0}_k$ be a $1 \times k$ vector of zeros, $I_k$ be the $k \times k$ identity
matrix, and $[J(\theta)]^{1/2}$ be a matrix such that $[J(\theta)]^{1/2}[J(\theta)]^{1/2} = J(\theta)$. Then under certain
(regularity) conditions
$\tilde{\theta}_n \rightarrow_p \theta$  (7.2)
$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$  (7.3)
$-2\log R(\theta; \mathbf{X}_n) = 2[l(\tilde{\theta}_n; \mathbf{X}_n) - l(\theta; \mathbf{X}_n)] \rightarrow_D W \sim \chi^2(k)$  (7.4)
for each $\theta \in \Omega$.

Since $\tilde{\theta}_n \rightarrow_p \theta$, $\tilde{\theta}_n$ is a consistent estimator of $\theta$.

Theorem 7.3.4 implies that for sufficiently large $n$, $\tilde{\theta}_n$ has an approximately
MVN$(\theta, [J(\theta)]^{-1})$ distribution. Therefore for sufficiently large $n$
$E(\tilde{\theta}_n) \approx \theta$
and therefore $\tilde{\theta}_n$ is an asymptotically unbiased estimator of $\theta$. Also
$Var(\tilde{\theta}_n) \approx [J(\theta)]^{-1}$
where $[J(\theta)]^{-1}$ is the inverse matrix of the matrix $J(\theta)$. (Since $J(\theta)$ is a $k \times k$ symmetric
matrix, $[J(\theta)]^{-1}$ is also a $k \times k$ symmetric matrix.) $[J(\theta)]^{-1}$ is called the asymptotic
variance/covariance matrix of $\tilde{\theta}_n$. Of course $J(\theta)$ is unknown because $\theta$ is unknown. But
(7.2), (7.3) and Slutsky's Theorem imply that
$(\tilde{\theta}_n - \theta)[J(\tilde{\theta}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
Therefore for sufficiently large $n$ we have
$Var(\tilde{\theta}_n) \approx \left[J(\hat{\theta}_n)\right]^{-1}$
where $\left[J(\hat{\theta}_n)\right]^{-1}$ is the inverse matrix of $J(\hat{\theta}_n)$.
It is also possible to show that
$(\tilde{\theta}_n - \theta)[I(\tilde{\theta}_n; \mathbf{X})]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
so that for sufficiently large $n$ we also have
$Var(\tilde{\theta}_n) \approx \left[I(\hat{\theta}_n)\right]^{-1}$
where $\left[I(\hat{\theta}_n)\right]^{-1}$ is the inverse matrix of the observed information matrix $I(\hat{\theta}_n)$.
These results can be used to construct approximate confidence regions for $\theta$ as shown
in the next section.
Note: These results do not hold if the support set of $X$ depends on $\theta$.

7.4 Approximate Confidence Regions

7.4.1 Definition - Confidence Region
A 100$p$% confidence region for $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ based on $\mathbf{X}$ is a region $R(\mathbf{X}) \subseteq \Re^k$ which
satisfies
$P[\theta \in R(\mathbf{X})] = p$
Exact confidence regions can only be obtained in a very few special cases such as Normal
linear models. More generally we must rely on approximate confidence regions based on
the results of Theorem 7.3.3.

7.4.2 Asymptotic Pivotal Quantities and Approximate Confidence Regions

The limiting distribution of $\tilde{\theta}_n$ can be used to obtain approximate confidence regions for
$\theta$. Since
$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
it follows from Theorem 7.3.3(3) and Limit Theorems that
$(\tilde{\theta}_n - \theta)J(\tilde{\theta}_n)(\tilde{\theta}_n - \theta)^T \rightarrow_D W \sim \chi^2(k)$
An approximate 100$p$% confidence region for $\theta$ based on the asymptotic pivotal quantity
$(\tilde{\theta}_n - \theta)J(\tilde{\theta}_n)(\tilde{\theta}_n - \theta)^T$ is the set of all vectors $\theta$ in the set
$\{\theta : (\hat{\theta}_n - \theta)J(\hat{\theta}_n)(\hat{\theta}_n - \theta)^T \leq c\}$
where $c$ is the value such that $P(W \leq c) = p$ and $W \sim \chi^2(k)$.
Similarly since
$(\tilde{\theta}_n - \theta)[I(\tilde{\theta}_n; \mathbf{X}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
it follows from Theorem 7.3.3(3) and Limit Theorems that
$(\tilde{\theta}_n - \theta)I(\tilde{\theta}_n; \mathbf{X}_n)(\tilde{\theta}_n - \theta)^T \rightarrow_D W \sim \chi^2(k)$
An approximate 100$p$% confidence region for $\theta$ based on the asymptotic pivotal quantity
$(\tilde{\theta}_n - \theta)I(\tilde{\theta}_n; \mathbf{X}_n)(\tilde{\theta}_n - \theta)^T$ is the set of all vectors $\theta$ in the set
$\{\theta : (\hat{\theta}_n - \theta)I(\hat{\theta}_n)(\hat{\theta}_n - \theta)^T \leq c\}$
where $I(\hat{\theta}_n)$ is the observed information.
Finally since
$-2\log R(\theta; \mathbf{X}_n) \rightarrow_D W \sim \chi^2(k)$
an approximate 100$p$% confidence region for $\theta$ based on this asymptotic pivotal quantity is
the set of all vectors $\theta$ satisfying
$\{\theta : -2\log R(\theta; \mathbf{x}) \leq c\}$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. Since
$\{\theta : -2\log R(\theta; \mathbf{x}) \leq c\} = \left\{\theta : R(\theta; \mathbf{x}) \geq e^{-c/2}\right\}$
we recognize that this region is actually a likelihood region.



7.4.3 Example
Use R and the results from Examples 7.1.10 and 7.1.13 to graph approximate 90%, 95%,
and 99% confidence regions for $(a, b)$. Compare these approximate confidence regions with
the likelihood regions in Example 7.2.2.

Solution
From Example 7.1.10, for a random sample from the Beta$(a, b)$ distribution the information
matrix and the expected information matrix are given by
$I(a, b) = n\begin{bmatrix}\psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b)\end{bmatrix} = J(a, b)$
Since
$\begin{bmatrix}\tilde{a} - a & \tilde{b} - b\end{bmatrix} J(\tilde{a}, \tilde{b})\begin{bmatrix}\tilde{a} - a \\ \tilde{b} - b\end{bmatrix} \rightarrow_D W \sim \chi^2(2)$
an approximate 100$p$% confidence region for $(a, b)$ is given by
$\left\{(a, b) : \begin{bmatrix}\hat{a} - a & \hat{b} - b\end{bmatrix} J(\hat{a}, \hat{b})\begin{bmatrix}\hat{a} - a \\ \hat{b} - b\end{bmatrix} \leq c\right\}$
where $P(W \leq c) = p$. Since $\chi^2(2) =$ Gamma$(1, 2) =$ Exponential$(2)$, $c$ can be determined
using
$p = P(W \leq c) = \int_0^c \frac{1}{2}e^{-x/2}\,dx = 1 - e^{-c/2}$
which gives
$c = -2\log(1 - p)$
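The closed form $c = -2\log(1-p)$ is special to the $\chi^2(2)$ distribution; in R it can be checked against qchisq, which is also how $c$ would be obtained for other dimensions $k$. A small sketch:
# check: chi-squared(2) quantiles agree with -2*log(1-p)
p <- c(0.90, 0.95, 0.99)
qchisq(p, df = 2)        # 4.605170 5.991465 9.210340
-2 * log(1 - p)          # same values
qchisq(0.95, df = 3)     # for k = 3 there is no simple closed form; use qchisq directly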
For $p = 0.95$, $c = -2\log(0.05) = 5.99$; an approximate 95% confidence region is given by
$\left\{(a, b) : \begin{bmatrix}\hat{a} - a & \hat{b} - b\end{bmatrix} J(\hat{a}, \hat{b})\begin{bmatrix}\hat{a} - a \\ \hat{b} - b\end{bmatrix} \leq 5.99\right\}$
If we let
$J(\hat{a}, \hat{b}) = \begin{bmatrix}\hat{J}_{11} & \hat{J}_{12} \\ \hat{J}_{12} & \hat{J}_{22}\end{bmatrix}$
then the approximate confidence region can be written as
$\{(a, b) : (\hat{a} - a)^2\hat{J}_{11} + 2(\hat{a} - a)(\hat{b} - b)\hat{J}_{12} + (\hat{b} - b)^2\hat{J}_{22} \leq 5.99\}$
We note that the approximate confidence region is the set of points on and inside the ellipse
$(\hat{a} - a)^2\hat{J}_{11} + 2(\hat{a} - a)(\hat{b} - b)\hat{J}_{12} + (\hat{b} - b)^2\hat{J}_{22} = 5.99$

which is centred at $(\hat{a}, \hat{b})$.
For the data in Example 7.1.13, $\hat{a} = 2.824775$, $\hat{b} = 2.97317$ and
$I(\hat{a}, \hat{b}) = J(\hat{a}, \hat{b}) = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}$
Approximate 90% ($-2\log(0.1) = 4.61$), 95% ($-2\log(0.05) = 5.99$), and 99% ($-2\log(0.01) = 9.21$)
confidence regions are shown in Figure 7.3.
The following R code generates the required approximate con…dence regions.
# function for calculating values for determining confidence regions
ConfRegion<-function(a,b,th,info)
{c<-(th[1]-a)^2*info[1,1]+2*(th[1]-a)*(th[2]-b)*info[1,2]+(th[2]-b)^2*info[2,2]
return(c)}
#
# observed information evaluated at the maximum likelihood estimates
Ithetahat<-BEIF(thetahat[1],thetahat[2])
# graph approximate confidence regions
a<-seq(1,5.5,0.01)
b<-seq(1,6,0.01)
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
#

Figure 7.3: Approximate con…dence regions for Beta(a; b) example



A 10% likelihood region for $(a, b)$ is given by $\{(a, b) : R(a, b; \mathbf{x}) \geq 0.1\}$. Since
$-2\log R(a, b; \mathbf{X}_n) \rightarrow_D W \sim \chi^2(2) = \text{Exponential}(2)$
we have
$P[R(a, b; \mathbf{X}) \geq 0.1] = P[-2\log R(a, b; \mathbf{X}) \leq -2\log(0.1)]
\approx P(W \leq -2\log(0.1)) = 1 - e^{-[-2\log(0.1)]/2} = 1 - 0.1 = 0.9$
and therefore a 10% likelihood region corresponds to an approximate 90% confidence region.
Similarly 1% and 5% likelihood regions correspond to approximate 99% and 95% confidence
regions respectively.
If we compare the likelihood regions in Figure 7.2 with the approximate confidence
regions shown in Figure 7.3 we notice that the confidence regions are exact ellipses centred
at the maximum likelihood estimates whereas the likelihood regions are only approximate
ellipses not centred at the maximum likelihood estimates. We notice that there are values
inside an approximate 99% confidence region which are outside a 1% likelihood region.
The point $(a, b) = (1, 1.5)$ is an example. There were only 35 observations in this data
set. The differences between the likelihood regions and the approximate confidence regions
indicate that the Normal approximation might not be good. In this example the likelihood
regions provide a better summary of the uncertainty in the estimates.

7.4.4 Exercise
Use R and the results from Exercises 7.1.11 and 7.1.14 to graph approximate 90%, 95%,
and 99% confidence regions for $(\alpha, \beta)$. Compare these approximate confidence regions with
the likelihood regions in Exercise 7.2.3.

Since likelihood regions and approximate confidence regions cannot be graphed or easily
interpreted for more than two parameters, we often construct approximate confidence intervals
for individual parameters. Such confidence intervals are often referred to as marginal
confidence intervals. These confidence intervals must be used with care as we will see in
Example 7.4.6.
Approximate confidence intervals can also be constructed for a linear combination of parameters.
An illustration is given in Example 7.4.6.

7.4.5 Approximate Marginal Confidence Intervals

Let $\theta_i$ be the $i$th entry in the vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. Since
$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
it follows that an approximate 100$p$% marginal confidence interval for $\theta_i$ is given by
$\left[\hat{\theta}_i - a\sqrt{\hat{v}_{ii}},\ \hat{\theta}_i + a\sqrt{\hat{v}_{ii}}\right]$
where $\hat{\theta}_i$ is the $i$th entry in the vector $\hat{\theta}_n$, $\hat{v}_{ii}$ is the $(i,i)$ entry of the matrix $[J(\hat{\theta}_n)]^{-1}$, and
$a$ is the value such that $P(Z \leq a) = \frac{1+p}{2}$ where $Z \sim$ N$(0,1)$.
Similarly since
$(\tilde{\theta}_n - \theta)[I(\tilde{\theta}_n; \mathbf{X}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{MVN}(\mathbf{0}_k, I_k)$
it follows that an approximate 100$p$% confidence interval for $\theta_i$ is given by
$\left[\hat{\theta}_i - a\sqrt{\hat{v}_{ii}},\ \hat{\theta}_i + a\sqrt{\hat{v}_{ii}}\right]$
where $\hat{v}_{ii}$ is now the $(i,i)$ entry of the matrix $[I(\hat{\theta}_n)]^{-1}$.

7.4.6 Example
Using the results from Examples 7.1.10 and 7.1.13 determine approximate 95% marginal
confidence intervals for $a$, $b$, and an approximate confidence interval for $a + b$.

Solution
Let
$\left[J(\hat{a}, \hat{b})\right]^{-1} = \begin{bmatrix}\hat{v}_{11} & \hat{v}_{12} \\ \hat{v}_{12} & \hat{v}_{22}\end{bmatrix}$
Since
$\begin{bmatrix}\tilde{a} - a & \tilde{b} - b\end{bmatrix}[J(\tilde{a}, \tilde{b})]^{1/2} \rightarrow_D \mathbf{Z} \sim \text{BVN}\left((0, 0), \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\right)$
then for large $n$, $Var(\tilde{a}) \approx \hat{v}_{11}$, $Var(\tilde{b}) \approx \hat{v}_{22}$ and $Cov(\tilde{a}, \tilde{b}) \approx \hat{v}_{12}$. Therefore an approximate
95% confidence interval for $a$ is given by
$\left[\hat{a} - 1.96\sqrt{\hat{v}_{11}},\ \hat{a} + 1.96\sqrt{\hat{v}_{11}}\right]$
and an approximate 95% confidence interval for $b$ is given by
$\left[\hat{b} - 1.96\sqrt{\hat{v}_{22}},\ \hat{b} + 1.96\sqrt{\hat{v}_{22}}\right]$

For the data in Example 7.1.13, $\hat{a} = 2.824775$, $\hat{b} = 2.97317$ and
$\left[I(\hat{a}, \hat{b})\right]^{-1} = \left[J(\hat{a}, \hat{b})\right]^{-1}
= \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}^{-1}
= \begin{bmatrix} 0.4216186 & 0.3762120 \\ 0.3762120 & 0.4711608 \end{bmatrix}$
An approximate 95% marginal confidence interval for $a$ is
$\left[2.824775 - 1.96\sqrt{0.4216186},\ 2.824775 + 1.96\sqrt{0.4216186}\right] = [1.998403, 3.651148]$
and an approximate 95% confidence interval for $b$ is
$\left[2.97317 - 1.96\sqrt{0.4711608},\ 2.97317 + 1.96\sqrt{0.4711608}\right] = [2.049695, 3.896645]$
Note that $a = 2.1$ is in the approximate 95% marginal confidence interval for $a$ and $b = 3.8$
is in the approximate 95% marginal confidence interval for $b$ and yet the point $(2.1, 3.8)$
is not in the approximate 95% joint confidence region for $(a, b)$. Clearly these marginal
confidence intervals for $a$ and $b$ must be used with care.
To obtain an approximate 95% confidence interval for $a + b$ we note that
$Var(\tilde{a} + \tilde{b}) = Var(\tilde{a}) + Var(\tilde{b}) + 2Cov(\tilde{a}, \tilde{b}) \approx \hat{v}_{11} + \hat{v}_{22} + 2\hat{v}_{12} = \hat{v}$
so that an approximate 95% confidence interval for $a + b$ is given by
$\left[\hat{a} + \hat{b} - 1.96\sqrt{\hat{v}},\ \hat{a} + \hat{b} + 1.96\sqrt{\hat{v}}\right]$
For the data in Example 7.1.13
$\hat{a} + \hat{b} = 2.824775 + 2.97317 = 5.797945$
$\hat{v} = \hat{v}_{11} + \hat{v}_{22} + 2\hat{v}_{12} = 0.4216186 + 0.4711608 + 2(0.3762120) = 1.645203$
and an approximate 95% confidence interval for $a + b$ is
$\left[5.797945 - 1.96\sqrt{1.645203},\ 5.797945 + 1.96\sqrt{1.645203}\right] = [3.283941, 8.311949]$
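These calculations can be reproduced in R from the observed information matrix. The following is a minimal sketch assuming Ihat holds $I(\hat{a}, \hat{b})$ from Example 7.1.13 (for instance as returned by the BEIF function) and thetahat holds $(\hat{a}, \hat{b})$.
# minimal sketch: approximate 95% marginal intervals and an interval for a + b
thetahat <- c(2.824775, 2.97317)
Ihat <- matrix(c(8.249382, -6.586959, -6.586959, 7.381967), 2, 2)
V <- solve(Ihat)                               # approximate variance/covariance matrix
thetahat[1] + c(-1, 1) * 1.96 * sqrt(V[1, 1])  # interval for a
thetahat[2] + c(-1, 1) * 1.96 * sqrt(V[2, 2])  # interval for b
vsum <- V[1, 1] + V[2, 2] + 2 * V[1, 2]        # approximate Var(a~ + b~)
sum(thetahat) + c(-1, 1) * 1.96 * sqrt(vsum)   # interval for a + b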

7.4.7 Exercise
Using the results from Exercises 7.1.11 and 7.1.14 determine approximate 95% marginal
confidence intervals for $\alpha$, $\beta$, and an approximate confidence interval for $\alpha + \beta$.

7.5 Chapter 7 Problems


1. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the distribution with cumulative
distribution function
$F(x; \theta_1, \theta_2) = 1 - \left(\dfrac{\theta_1}{x}\right)^{\theta_2}$ for $x \geq \theta_1$, $\theta_1 > 0$, $\theta_2 > 0$
Find the maximum likelihood estimates and the maximum likelihood estimators of
$\theta_1$ and $\theta_2$.

2. Suppose $(X_1, X_2, X_3) \sim$ Multinomial$(n; \theta_1, \theta_2, \theta_3)$. Verify that the maximum likelihood
estimators of $\theta_1$ and $\theta_2$ are $\tilde{\theta}_1 = X_1/n$ and $\tilde{\theta}_2 = X_2/n$. Find the expected
information for $\theta_1$ and $\theta_2$.

3. Suppose $x_{11}, x_{12}, \ldots, x_{1n_1}$ is an observed random sample from the N$(\mu_1, \sigma^2)$ distribution
and independently $x_{21}, x_{22}, \ldots, x_{2n_2}$ is an observed random sample from the
N$(\mu_2, \sigma^2)$ distribution. Find the maximum likelihood estimators of $\mu_1$, $\mu_2$, and $\sigma^2$.

4. In a large population of males ages 40-50, the proportion who are regular smokers is
$\alpha$ where $0 \leq \alpha \leq 1$ and the proportion who have hypertension (high blood pressure) is
$\beta$ where $0 \leq \beta \leq 1$. Suppose that $n$ men are selected at random from this population
and the observed data are
Category:  $S \cap H$   $S \cap \bar{H}$   $\bar{S} \cap H$   $\bar{S} \cap \bar{H}$
Frequency:  $x_{11}$   $x_{12}$   $x_{21}$   $x_{22}$

where $S$ is the event the male is a smoker and $H$ is the event the male has hypertension.

(a) Assuming the events $S$ and $H$ are independent determine the likelihood function,
the score vector, the maximum likelihood estimates, and the information matrix
for $\alpha$ and $\beta$.
(b) Determine the expected information matrix and its inverse matrix. What do
you notice regarding the diagonal entries of the inverse matrix?

5. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Logistic$(\mu, \beta)$ distribution.

(a) Find the likelihood function, the score vector, and the information matrix for $\mu$
and $\beta$. How would you find the maximum likelihood estimates of $\mu$ and $\beta$?
(b) Show that if $u$ is an observation from the Uniform$(0, 1)$ distribution then
$x = \mu - \beta\log\left(\dfrac{1}{u} - 1\right)$
is an observation from the Logistic$(\mu, \beta)$ distribution.



(c) Use the following R code to randomly generate 30 observations from a Logistic$(\mu, \beta)$
distribution.
# randomly generate 30 observations from a Logistic(mu,beta)
# using a random mu and beta values
set.seed(21086689) # set the seed so results can be reproduced
truemu<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truemu-truebeta*log(1/runif(30)-1)),2))
x
(d) Use Newton's Method and R to find $(\hat{\mu}, \hat{\beta})$. Determine $S(\hat{\mu}, \hat{\beta})$ and $I(\hat{\mu}, \hat{\beta})$.
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\mu, \beta)$.
(f) Use R to graph approximate 90%, 95%, and 99% confidence regions for $(\mu, \beta)$.
Compare these approximate confidence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal confidence intervals for $\mu$, $\beta$, and an
approximate confidence interval for $\mu + \beta$.

6. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Weibull$(\alpha, \beta)$ distribution.

(a) Find the likelihood function, the score vector, and the information matrix for $\alpha$
and $\beta$. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?
(b) Show that if $u$ is an observation from the Uniform$(0, 1)$ distribution then
$x = \beta\left[-\log(1 - u)\right]^{1/\alpha}$
is an observation from the Weibull$(\alpha, \beta)$ distribution.


(c) Use the following R code to randomly generate 40 observations from a Weibull$(\alpha, \beta)$
distribution.
# randomly generate 40 observations from a Weibull(alpha,beta)
# using random values for alpha and beta
set.seed(21086689) # set the seed so results can be reproduced
truealpha<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(truebeta*(-log(1-runif(40)))^(1/truealpha),2))
x
(d) Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta})$. Determine $S(\hat{\alpha}, \hat{\beta})$ and $I(\hat{\alpha}, \hat{\beta})$.
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\alpha, \beta)$.

(f) Use R to graph approximate 90%, 95%, and 99% confidence regions for $(\alpha, \beta)$.
Compare these approximate confidence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal confidence intervals for $\alpha$, $\beta$, and an
approximate confidence interval for $\alpha + \beta$.
7. Suppose $Y_i \sim$ Binomial$(1, p_i)$, $i = 1, 2, \ldots, n$ independently where $p_i = \left(1 + e^{-\alpha - \beta x_i}\right)^{-1}$
and the $x_i$ are known constants.

(a) Determine the likelihood function, the score vector, and the expected information
matrix for $\alpha$ and $\beta$.
(b) Explain how you would use Newton's method to find the maximum likelihood
estimates of $\alpha$ and $\beta$.

8. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Three Parameter Burr
distribution with probability density function
$f(x; \alpha, \beta, \gamma) = \dfrac{\alpha\gamma\,(x/\beta)^{\alpha - 1}}{\beta\left[1 + (x/\beta)^{\alpha}\right]^{\gamma + 1}}$ for $x > 0$, $\alpha > 0$, $\beta > 0$, $\gamma > 0$
(a) Find the likelihood function, the score vector, and the information matrix for $\alpha$,
$\beta$, and $\gamma$. How would you find the maximum likelihood estimates of $\alpha$, $\beta$, and
$\gamma$?
(b) Show that if $u$ is an observation from the Uniform$(0, 1)$ distribution then
$x = \beta\left[(1 - u)^{-1/\gamma} - 1\right]^{1/\alpha}$
is an observation from the Three Parameter Burr distribution.
(c) Use the following R code to randomly generate 60 observations from the Three
Parameter Burr distribution.
# randomly generate 60 observations from the 3 Parameter Burr
# distribution using random values for alpha, beta and gamma
set.seed(21086689) # set the seed so results can be reproduced
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=3,max=4)
truec<-runif(1,min=3,max=4)
# data are sorted and rounded to 2 decimal places for easier display
x<-sort(round(trueb*((1-runif(60))^(-1/truec)-1)^(1/truea),2))
x
(d) Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$. Determine $S(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ and $I(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$.
Use the second derivative test to verify that $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ are the maximum likelihood
estimates.
(e) Determine approximate 95% marginal confidence intervals for $\alpha$, $\beta$, and $\gamma$, and
an approximate confidence interval for $\alpha + \beta + \gamma$.
8. Hypothesis Testing

Point estimation is a useful statistical procedure for estimating unknown parameters in


a model based on observed data. Interval estimation is a useful statistical procedure for
quantifying the uncertainty in these estimates. Hypothesis testing is another important
statistical procedure which is used for deciding whether a given statement is supported by
the observed data.
In Section 8.1 we review the definitions and steps of a test of hypothesis. Much of
this material was introduced in a previous statistics course such as STAT 221/231/241. In
Section 8.2 we look at how the likelihood function can be used to construct a test of hypothesis
when the model is completely specified by the hypothesis of interest. The material in this
section is mostly a review of material covered in a previous statistics course. In Section 8.3
we look at how the likelihood function can be used to construct a test of hypothesis
when the model is not completely specified by the hypothesis of interest. The material in
this section is mostly new material.

8.1 Test of Hypothesis


In order to analyse a set of data $\mathbf{x}$ we often assume a model $f(x; \theta)$ where $\theta \in \Omega$ and $\Omega$
is the parameter space or set of possible values of $\theta$. A test of hypothesis is a statistical
procedure used for evaluating the strength of the evidence provided by the observed data
against an hypothesis. An hypothesis is a statement about the model. In many cases the
hypothesis can be formulated in terms of the parameter $\theta$ as
$H_0: \theta \in \Omega_0$
where $\Omega_0$ is some subset of $\Omega$. $H_0$ is called the null hypothesis. When conducting a test
of hypothesis there is usually another statement of interest which is the statement which
reflects what might be true if $H_0$ is not supported by the observed data. This statement is
called the alternative hypothesis and is denoted $H_A$ or $H_1$. In many cases $H_A$ may simply
take the form
$H_A: \theta \notin \Omega_0$
In constructing a test of hypothesis it is useful to distinguish between simple and composite
hypotheses.


8.1.1 Definition - Simple and Composite Hypotheses

If the hypothesis completely specifies the model including any parameters in the model
then the hypothesis is simple; otherwise the hypothesis is composite.

8.1.2 Example

For each of the following indicate whether the null hypothesis is simple or composite. Specify
$\Omega$ and $\Omega_0$ and determine the dimension of each.
(a) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample
from a Poisson$(\theta)$ distribution. The hypothesis of interest is $H_0: \theta = \theta_0$ where $\theta_0$ is a
specified value of $\theta$.
(b) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample
from a Gamma$(\alpha, \beta)$ distribution. The hypothesis of interest is $H_0: \alpha = \alpha_0$ where $\alpha_0$ is a
specified value of $\alpha$.
(c) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample
from an Exponential$(\theta_1)$ distribution and independently the observed data $\mathbf{y} = (y_1, y_2, \ldots, y_m)$
represent a random sample from an Exponential$(\theta_2)$ distribution. The hypothesis of interest
is $H_0: \theta_1 = \theta_2$.
(d) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample
from a N$(\mu_1, \sigma_1^2)$ distribution and independently the observed data $\mathbf{y} = (y_1, y_2, \ldots, y_m)$
represent a random sample from a N$(\mu_2, \sigma_2^2)$ distribution. The hypothesis of interest is
$H_0: \mu_1 = \mu_2$, $\sigma_1^2 = \sigma_2^2$.

Solution
(a) This is a simple hypothesis since the model and the unknown parameter are completely
specified. $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and $\Omega_0 = \{\theta_0\}$ which has dimension 0.
(b) This is a composite hypothesis since $\beta$ is not specified by $H_0$. $\Omega = \{(\alpha, \beta) : \alpha > 0, \beta > 0\}$
which has dimension 2 and $\Omega_0 = \{(\alpha_0, \beta) : \beta > 0\}$ which has dimension 1.
(c) This is a composite hypothesis since $\theta_1$ and $\theta_2$ are not specified by $H_0$.
$\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$ which has dimension 2 and
$\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2, \theta_1 > 0, \theta_2 > 0\}$ which has dimension 1.
(d) This is a composite hypothesis since $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are not specified by $H_0$.
$\Omega = \left\{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2) : \mu_1 \in \Re, \sigma_1^2 > 0, \mu_2 \in \Re, \sigma_2^2 > 0\right\}$ which has dimension 4
and $\Omega_0 = \left\{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2) : \mu_1 = \mu_2, \sigma_1^2 = \sigma_2^2, \mu_1 \in \Re, \sigma_1^2 > 0, \mu_2 \in \Re, \sigma_2^2 > 0\right\}$ which has
dimension 2.

To measure the evidence against H0 based on the observed data we use a test statistic or
discrepancy measure.

8.1.3 Definition - Test Statistic or Discrepancy Measure


A test statistic or discrepancy measure D is a function of the data X that is constructed
to measure the degree of “agreement” between the data X and the null hypothesis H0 .

A test statistic is usually chosen so that a small observed value of the test statistic
indicates close agreement between the observed data and the null hypothesis H0 while a
large observed value of the test statistic indicates poor agreement. The test statistic is
chosen before the data are examined and the choice re‡ects the type of departure from the
null hypothesis H0 that we wish to detect as specified by the alternative hypothesis HA . A
general method for constructing test statistics can be based on the likelihood function as
we will see in the next two sections.

8.1.4 Example
For Example 8.1.2(a) suggest a test statistic which could be used if the alternative hypothesis
is $H_A: \theta \neq \theta_0$. Suggest a test statistic which could be used if the alternative hypothesis
is $H_A: \theta > \theta_0$ and if the alternative hypothesis is $H_A: \theta < \theta_0$.

Solution
If $H_0: \theta = \theta_0$ is true then $E(\bar{X}) = \theta_0$. If the alternative hypothesis is $H_A: \theta \neq \theta_0$ then a
reasonable test statistic which could be used is $D = \left|\bar{X} - \theta_0\right|$.
If the alternative hypothesis is $H_A: \theta > \theta_0$ then a reasonable test statistic which could be
used is $D = \bar{X} - \theta_0$.
If the alternative hypothesis is $H_A: \theta < \theta_0$ then a reasonable test statistic which could be
used is $D = \theta_0 - \bar{X}$.

After the data have been collected the observed value of the test statistic is calculated.
Assuming the null hypothesis H0 is true we compute the probability of observing a value
of the test statistic at least as great as that observed. This probability is called the p-value
of the data in relation to the null hypothesis H0 .

8.1.5 Definition - p-value

Suppose we use the test statistic $D = D(\mathbf{X})$ to test the null hypothesis $H_0$. Suppose also
that $d = D(\mathbf{x})$ is the observed value of $D$. The p-value or observed significance level of the
test of hypothesis $H_0$ using test statistic $D$ is
p-value $= P(D \geq d; H_0)$

The p-value is the probability of observing such poor agreement using test statistic D
between the null hypothesis H0 and the data if the null hypothesis H0 is true. If the p-value

is very small, then such poor agreement would occur very rarely if the null hypothesis H0 is
true, and we interpret this to mean that the observed data are providing evidence against
the null hypothesis H0 . The smaller the p-value the stronger the evidence against the null
hypothesis H0 based on the observed data. A large p-value does not mean that the null
hypothesis H0 is true but only indicates a lack of evidence against the null hypothesis H0
based on the observed data and using the test statistic D.

The following table gives a rough guideline for interpreting p-values. These are only
guidelines. The interpretation of p-values must always be made in the context of a given
study.

Table 10.1: Guidelines for interpreting p-values


p-value Interpretation
p-value > 0.10 No evidence against H0 based on the observed data.
0.05 < p-value <= 0.10 Weak evidence against H0 based on the observed data.
0.01 < p-value <= 0.05 Evidence against H0 based on the observed data.
0.001 < p-value <= 0.01 Strong evidence against H0 based on the observed data.
p-value <= 0.001 Very strong evidence against H0 based on the observed data.

8.1.6 Example

For Example 8.1.4 suppose x = 5:7, n = 25 and 0 = 5. Determine the p-value for both
HA : 6= 0 and HA : > 0 . Give a conclusion in each case.

Solution
For $\bar{x} = 5.7$, $n = 25$, $\theta_0 = 5$, and $H_A: \theta \neq 5$ the observed value of the test statistic is
$d = |5.7 - 5| = 0.7$, and
p-value $= P\left(\left|\bar{X} - 5\right| \geq 0.7;\ H_0: \theta = 5\right)
= P(|T - 125| \geq 17.5)$ where $T = \sum_{i=1}^{25} X_i \sim$ Poisson$(125)$
$= P(T \leq 107.5) + P(T \geq 142.5)
= P(T \leq 107) + P(T \geq 143)
= 0.05605429 + 0.06113746
= 0.1171917$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on
the data.
For $\bar{x} = 5.7$, $n = 25$, $\theta_0 = 5$, and $H_A: \theta > 5$ the observed value of the test statistic is
$d = 5.7 - 5 = 0.7$, and
p-value $= P\left(\bar{X} - 5 \geq 0.7;\ H_0: \theta = 5\right)
= P(T \geq 125 + 17.5)$ where $T = \sum_{i=1}^{25} X_i \sim$ Poisson$(125)$
$= P(T \geq 143) = 0.06113746$
Since $0.05 <$ p-value $\leq 0.1$ there is weak evidence against $H_0: \theta = 5$ based on the data.
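The Poisson tail probabilities quoted above can be obtained with ppois; the following short sketch reproduces both p-values.
# p-values for Example 8.1.6 using T ~ Poisson(125) under H0
pl <- ppois(107, 125)          # P(T <= 107) = 0.05605429
pu <- 1 - ppois(142, 125)      # P(T >= 143) = 0.06113746
pl + pu                        # two-sided p-value, 0.1171917
pu                             # one-sided p-value for HA: theta > 5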

8.1.7 Exercise
Suppose in a Binomial experiment 42 successes have been observed in 100 trials and the
hypothesis of interest is $H_0: \theta = 0.5$.
(a) If the alternative hypothesis is $H_A: \theta \neq 0.5$, suggest a suitable test statistic, calculate
the p-value and give a conclusion.
(b) If the alternative hypothesis is $H_A: \theta < 0.5$, suggest a suitable test statistic, calculate
the p-value and give a conclusion.

8.2 Likelihood Ratio Tests for Simple Hypotheses


In Examples 8.1.4 and 8.1.7 it was reasonably straightforward to suggest a test statistic
which made sense. In this section we consider a general method for constructing a test
statistic which has good properties in the case of a simple hypothesis. The test statistic
we use is the likelihood ratio test statistic which was introduced in your previous statistics
course.
Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ where $\theta \in \Omega$ and the
dimension of $\Omega$ is $k$. Suppose also that the hypothesis of interest is $H_0: \theta = \theta_0$ where the
elements of $\theta_0$ are completely specified. $H_0$ can also be written as $H_0: \theta \in \Omega_0$ where $\Omega_0$
consists of the single point $\theta_0$. The dimension of $\Omega_0$ is zero. $H_0$ is a simple hypothesis
since the model and all the parameters are completely specified. The likelihood ratio test
statistic for this simple hypothesis is
$\Lambda(\mathbf{X}; \theta_0) = -2\log R(\theta_0; \mathbf{X})
= -2\log\left[\dfrac{L(\theta_0; \mathbf{X})}{L(\tilde{\theta}; \mathbf{X})}\right]
= 2\left[l(\tilde{\theta}; \mathbf{X}) - l(\theta_0; \mathbf{X})\right]$
where $\tilde{\theta} = \tilde{\theta}(\mathbf{X})$ is the maximum likelihood estimator of $\theta$. Note that this test statistic
implicitly assumes that the alternative hypothesis is $H_A: \theta \neq \theta_0$ or $H_A: \theta \notin \Omega_0$.

Let the observed value of the likelihood ratio test statistic be
$\Lambda(\mathbf{x}; \theta_0) = 2\left[l(\hat{\theta}; \mathbf{x}) - l(\theta_0; \mathbf{x})\right]$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. The p-value is
p-value $= P[\Lambda(\mathbf{X}; \theta_0) \geq \Lambda(\mathbf{x}; \theta_0); H_0]$
Note that the p-value is calculated assuming $H_0: \theta = \theta_0$ is true. In general this p-value
is difficult to determine exactly since the distribution of the random variable $\Lambda(\mathbf{X}; \theta_0)$ is
usually intractable. We use the result from Theorem 7.3.4 which says that under certain
(regularity) conditions
$-2\log R(\theta; \mathbf{X}_n) = 2[l(\tilde{\theta}_n; \mathbf{X}_n) - l(\theta; \mathbf{X}_n)] \rightarrow_D W \sim \chi^2(k)$  (8.1)
for each $\theta \in \Omega$ where $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$.
Therefore based on the asymptotic result (8.1) and assuming $H_0: \theta = \theta_0$ is true, the
p-value for testing $H_0: \theta = \theta_0$ using the likelihood ratio test statistic can be approximated
using
p-value $\approx P[W \geq \Lambda(\mathbf{x}; \theta_0)]$ where $W \sim \chi^2(k)$

8.2.1 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the N$(\mu, \sigma^2)$ distribution where $\sigma^2$ is
known. Show that, in this special case, the likelihood ratio test statistic for testing
$H_0: \mu = \mu_0$ has exactly a $\chi^2(1)$ distribution.

Solution
From Example 7.1.8 we have that the likelihood function of $\mu$ is
$L(\mu) = \sigma^{-n}\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\exp\left[-\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}\right]$ for $\mu \in \Re$
or more simply
$L(\mu) = \exp\left[-\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}\right]$ for $\mu \in \Re$
The corresponding log likelihood function is
$l(\mu) = -\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}$ for $\mu \in \Re$
Solving
$\dfrac{dl}{d\mu} = \dfrac{n(\bar{x} - \mu)}{\sigma^2} = 0$
gives $\mu = \bar{x}$. Since $l(\mu)$ is a quadratic function which is concave down we know that $\hat{\mu} = \bar{x}$
is the maximum likelihood estimate. The corresponding maximum likelihood estimator of
$\mu$ is
$\tilde{\mu} = \bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
Since $L(\hat{\mu}) = 1$, the relative likelihood function is
$R(\mu) = \dfrac{L(\mu)}{L(\hat{\mu})} = \exp\left[-\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}\right]$ for $\mu \in \Re$
The likelihood ratio test statistic for testing the hypothesis $H_0: \mu = \mu_0$ is
$\Lambda(\mu_0; \mathbf{X}) = -2\log\dfrac{L(\mu_0; \mathbf{X})}{L(\tilde{\mu}; \mathbf{X})}
= -2\log\exp\left[-\dfrac{n(\bar{X} - \mu_0)^2}{2\sigma^2}\right]$ since $\tilde{\mu} = \bar{X}$
$= \dfrac{n(\bar{X} - \mu_0)^2}{\sigma^2}
= \left(\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)^2$
If $H_0: \mu = \mu_0$ is true then $\bar{X} \sim$ N$(\mu_0, \sigma^2/n)$ and
$\left(\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1)$
as required.
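A quick numerical illustration (a sketch, not part of the original example): for simulated data with known $\sigma$ the likelihood ratio statistic equals the square of the usual Z statistic, so its p-value can be computed either from $\chi^2(1)$ or from N$(0,1)$. The true mean used in the simulation is an arbitrary choice.
# minimal sketch: the Normal (known sigma) likelihood ratio statistic is Z^2
set.seed(7)
sigma <- 2; mu0 <- 10; n <- 30
x <- rnorm(n, mean = 10.5, sd = sigma)          # simulated data
z <- (mean(x) - mu0) / (sigma / sqrt(n))
lambda <- n * (mean(x) - mu0)^2 / sigma^2       # likelihood ratio statistic
c(lambda, z^2)                                  # identical
1 - pchisq(lambda, df = 1)                      # exact p-value
2 * (1 - pnorm(abs(z)))                         # same value from the Normal distribution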

8.2.2 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson$(\theta)$ distribution.
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Verify that the likelihood
ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$.
(b) Suppose $\bar{x} = 6$ and $n = 25$. Use the likelihood ratio test statistic to test $H_0: \theta = 5$.
Compare this with the test in Example 8.1.6.

Solution
(a) From Example 6.2.5 we have the likelihood function
$L(\theta) = \theta^{n\bar{x}}e^{-n\theta}$ for $\theta \geq 0$
and maximum likelihood estimate $\hat{\theta} = \bar{x}$. The relative likelihood function can be written
as
$R(\theta) = \dfrac{L(\theta)}{L(\hat{\theta})} = \left(\dfrac{\theta}{\hat{\theta}}\right)^{n\hat{\theta}}e^{n(\hat{\theta} - \theta)}$ for $\theta \geq 0$

The likelihood ratio test statistic for $H_0: \theta = \theta_0$ is
$\Lambda(\theta_0; \mathbf{X}) = -2\log R(\theta_0; \mathbf{X})
= -2\log\left[\left(\dfrac{\theta_0}{\tilde{\theta}}\right)^{n\tilde{\theta}}e^{n(\tilde{\theta} - \theta_0)}\right]
= -2n\left[\tilde{\theta}\log\left(\dfrac{\theta_0}{\tilde{\theta}}\right) + \tilde{\theta} - \theta_0\right]
= 2n\tilde{\theta}\left[\dfrac{\theta_0}{\tilde{\theta}} - 1 - \log\left(\dfrac{\theta_0}{\tilde{\theta}}\right)\right]$  (8.2)
To verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$, or
equivalently if $\frac{\theta_0}{\tilde{\theta}} < 1$ or $\frac{\theta_0}{\tilde{\theta}} > 1$, consider the function
$g(t) = a[(t - 1) - \log(t)]$ for $t > 0$ and $a > 0$  (8.3)
We note that $g(t) \rightarrow \infty$ as $t \rightarrow 0^+$ and as $t \rightarrow \infty$. Now
$g'(t) = a\left(1 - \dfrac{1}{t}\right) = a\left(\dfrac{t - 1}{t}\right)$ for $t > 0$ and $a > 0$
Since $g'(t) < 0$ for $0 < t < 1$, and $g'(t) > 0$ for $t > 1$ we can conclude that the function
$g(t)$ is a decreasing function for $0 < t < 1$ and an increasing function for $t > 1$ with an
absolute minimum at $t = 1$. Since $g(1) = 0$, $g(t)$ is positive for all $t > 0$, $t \neq 1$.
Therefore if we let $t = \frac{\theta_0}{\tilde{\theta}}$ in (8.2) then we see that $\Lambda(\theta_0; \mathbf{X})$ will be large for small values
of $t = \frac{\theta_0}{\tilde{\theta}} < 1$ or large values of $t = \frac{\theta_0}{\tilde{\theta}} > 1$.
(b) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of the likelihood ratio test
statistic is
$\Lambda(5; \mathbf{x}) = -2\log R(5; \mathbf{x})
= -2\log\left[\left(\dfrac{5}{6}\right)^{25(6)}e^{25(6 - 5)}\right]
= 4.6965$
The parameter space is $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and thus $k = 1$. The
approximate p-value is
p-value $\approx P(W \geq 4.6965)$ where $W \sim \chi^2(1)$
$= 2\left[1 - P\left(Z \leq \sqrt{4.6965}\right)\right]$ where $Z \sim$ N$(0,1)$
$= 0.0302$
calculated using R. Since $0.01 <$ p-value $\leq 0.05$ there is evidence against $H_0: \theta = 5$ based
on the data. Compared with the answer in Example 8.1.6 for $H_A: \theta \neq 5$ we note that the
p-values are slightly different but the conclusion is the same.
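The observed value 4.6965 and its approximate p-value can be computed directly in R using the closed form (8.2); a minimal sketch:
# Example 8.2.2(b): Poisson likelihood ratio test of H0: theta = 5
n <- 25; xbar <- 6; theta0 <- 5
lambda <- 2 * n * xbar * (theta0 / xbar - 1 - log(theta0 / xbar))
lambda                                   # 4.696463
1 - pchisq(lambda, df = 1)               # approximate p-value, about 0.0302
2 * (1 - pnorm(sqrt(lambda)))            # equivalent Normal calculation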
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 261

8.2.3 Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( ) distribution.
(a) Find the likelihood ratio test statistic for testing H0 : = 0. Verify that the likelihood
ratio statistic takes on large values if ~ > 0 or ~ < 0 .
(b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 : = 5.
(c) From Example 6.6.3 we have
P
n
2 Xi
i=1 2n~ 2
Q(X; ) = = (2n)

is a pivotal quantity. Explain how this pivotal quantity could be used to test H0 : = 0
if (i) HA : < 0 , (ii) HA : > 0 , and (iii) HA : 6= 0 .
(d) Suppose x = 6 and n = 25. Use the test statistic from (c) for HA : 6= 0 to test
H0 : = 5. Compare the answer with the answer in (b).

Solution
(a) From Example 6.2.8 we have the likelihood function
$L(\theta) = \theta^{-n}e^{-n\bar{x}/\theta}$ for $\theta > 0$
and maximum likelihood estimate $\hat{\theta} = \bar{x}$. The relative likelihood function can be written
as
$R(\theta) = \dfrac{L(\theta)}{L(\hat{\theta})} = \left(\dfrac{\hat{\theta}}{\theta}\right)^{n}e^{n(1 - \hat{\theta}/\theta)}$ for $\theta > 0$
The likelihood ratio test statistic for $H_0: \theta = \theta_0$ is
$\Lambda(\theta_0; \mathbf{X}) = -2\log R(\theta_0; \mathbf{X})
= -2\log\left[\left(\dfrac{\tilde{\theta}}{\theta_0}\right)^{n}e^{n(1 - \tilde{\theta}/\theta_0)}\right]
= 2n\left[\dfrac{\tilde{\theta}}{\theta_0} - 1 - \log\left(\dfrac{\tilde{\theta}}{\theta_0}\right)\right]$
To verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$, or
equivalently if $\frac{\tilde{\theta}}{\theta_0} < 1$ or $\frac{\tilde{\theta}}{\theta_0} > 1$, we note that $\Lambda(\theta_0; \mathbf{X})$ is of the form (8.3) so an argument
similar to Example 8.2.2(a) can be used with $t = \frac{\tilde{\theta}}{\theta_0}$.
(b) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of the likelihood ratio test
statistic is
$\Lambda(5; \mathbf{x}) = -2\log R(5; \mathbf{x})
= -2\log\left[\left(\dfrac{6}{5}\right)^{25}e^{25(1 - 6/5)}\right]
= 0.8839222$

The parameter space is $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and thus $k = 1$. The
approximate p-value is
p-value $\approx P(W \geq 0.8839222)$ where $W \sim \chi^2(1)$
$= 2\left[1 - P\left(Z \leq \sqrt{0.8839222}\right)\right]$ where $Z \sim$ N$(0,1)$
$= 0.3471$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on
the data.
(c) (i) If $H_A: \theta > \theta_0$ we could let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we
would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta > \theta_0$ is true
then $E(\tilde{\theta}) = \theta > \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be larger than 1
and therefore large values of $D$ provide evidence against $H_0: \theta = \theta_0$. The corresponding
p-value would be
p-value $= P\left(\dfrac{\tilde{\theta}}{\theta_0} \geq \dfrac{\hat{\theta}}{\theta_0};\ H_0\right) = P\left(W \geq \dfrac{2n\hat{\theta}}{\theta_0}\right)$ where $W \sim \chi^2(2n)$
(ii) If $H_A: \theta < \theta_0$ we could still let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we
would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta < \theta_0$ is true
then $E(\tilde{\theta}) = \theta < \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be smaller than
1 and therefore small values of $D$ provide evidence against $H_0: \theta = \theta_0$. The corresponding
p-value would be
p-value $= P\left(\dfrac{\tilde{\theta}}{\theta_0} \leq \dfrac{\hat{\theta}}{\theta_0};\ H_0\right) = P\left(W \leq \dfrac{2n\hat{\theta}}{\theta_0}\right)$ where $W \sim \chi^2(2n)$
(iii) If $H_A: \theta \neq \theta_0$ we could still let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we
would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta \neq \theta_0$ is true
then $E(\tilde{\theta}) = \theta \neq \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be either larger
or smaller than 1 and therefore both large and small values of $D$ provide evidence against
$H_0: \theta = \theta_0$. If a large (small) value of $D$ is observed it is not simple to determine exactly
which small (large) values should also be considered. Since we are not that concerned about
the exact p-value, the p-value is usually calculated more simply as
p-value $= \min\left[2P\left(W \geq \dfrac{2n\hat{\theta}}{\theta_0}\right),\ 2P\left(W \leq \dfrac{2n\hat{\theta}}{\theta_0}\right)\right]$ where $W \sim \chi^2(2n)$
(d) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of $D = \frac{\tilde{\theta}}{\theta_0}$ is $d = \frac{6}{5}$ with
p-value $= \min\left[2P\left(W \geq \dfrac{(50)6}{5}\right),\ 2P\left(W \leq \dfrac{(50)6}{5}\right)\right]$ where $W \sim \chi^2(50)$
$= \min(0.3145, 1.6855) = 0.3145$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on
the data.
We notice the test statistic $D = \frac{\tilde{\theta}}{\theta_0}$ and $\Lambda(\theta_0; \mathbf{X}) = 2n\left[\frac{\tilde{\theta}}{\theta_0} - 1 - \log\left(\frac{\tilde{\theta}}{\theta_0}\right)\right]$ are both
functions of $\frac{\tilde{\theta}}{\theta_0}$. For this example the p-values are similar and the conclusions are the same.

8.2.4 Example
The following table gives the observed frequencies of the six faces in 100 rolls of a die:

Face: j 1 2 3 4 5 6 Total
Observed Frequency: xj 16 15 14 20 22 13 100

Are these observations consistent with the hypothesis that the die is fair?

Solution
The model for these data is $(X_1, X_2, \ldots, X_6) \sim$ Multinomial$(100; \theta_1, \theta_2, \ldots, \theta_6)$ and the
hypothesis of interest is $H_0: \theta_1 = \theta_2 = \cdots = \theta_6 = \frac{1}{6}$. Since the model and parameters are
completely specified this is a simple hypothesis. Since $\sum_{j=1}^{6}\theta_j = 1$ there are really only $k = 5$
parameters. The likelihood function for $(\theta_1, \theta_2, \ldots, \theta_5)$ is
$L(\theta_1, \theta_2, \ldots, \theta_5) = \dfrac{n!}{x_1!x_2!\cdots x_5!x_6!}\,\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_5^{x_5}(1 - \theta_1 - \theta_2 - \cdots - \theta_5)^{x_6}$
or more simply
$L(\theta_1, \theta_2, \ldots, \theta_5) = \theta_1^{x_1}\theta_2^{x_2}\cdots\theta_5^{x_5}(1 - \theta_1 - \theta_2 - \cdots - \theta_5)^{x_6}$
for $0 \leq \theta_j \leq 1$ for $j = 1, 2, \ldots, 5$ and $\sum_{j=1}^{5}\theta_j \leq 1$. The log likelihood function is
$l(\theta_1, \theta_2, \ldots, \theta_5) = \sum_{j=1}^{5}x_j\log\theta_j + x_6\log(1 - \theta_1 - \theta_2 - \cdots - \theta_5)$
Now
$\dfrac{\partial l}{\partial\theta_j} = \dfrac{x_j}{\theta_j} - \dfrac{n - x_1 - x_2 - \cdots - x_5}{1 - \theta_1 - \theta_2 - \cdots - \theta_5}
= \dfrac{x_j(1 - \theta_1 - \theta_2 - \cdots - \theta_5) - \theta_j(n - x_1 - x_2 - \cdots - x_5)}{\theta_j(1 - \theta_1 - \theta_2 - \cdots - \theta_5)}$ for $j = 1, 2, \ldots, 5$
since $x_6 = n - x_1 - x_2 - \cdots - x_5$. We could solve $\frac{\partial l}{\partial\theta_j} = 0$ for $j = 1, 2, \ldots, 5$ simultaneously.
In the Binomial case we know $\hat{\theta} = \frac{x}{n}$. It seems reasonable that the maximum likelihood
estimate of $\theta_j$ is $\hat{\theta}_j = \frac{x_j}{n}$ for $j = 1, 2, \ldots, 5$. To verify this is true we substitute $\theta_j = \frac{x_j}{n}$ for
$j = 1, 2, \ldots, 5$ into $\frac{\partial l}{\partial\theta_j}$ to obtain
$x_j - \dfrac{x_j}{n}\sum_{i \neq j}x_i - x_j + \dfrac{x_j}{n}\sum_{i \neq j}x_i = 0$
Therefore the maximum likelihood estimator of $\theta_j$ is $\tilde{\theta}_j = \dfrac{X_j}{n}$ for $j = 1, 2, \ldots, 5$. Note also
that by the invariance property of maximum likelihood estimators $\tilde{\theta}_6 = 1 - \sum_{j=1}^{5}\tilde{\theta}_j = \dfrac{X_6}{n}$.
Therefore we can write
$l(\tilde{\theta}_1, \tilde{\theta}_2, \tilde{\theta}_3, \tilde{\theta}_4, \tilde{\theta}_5; \mathbf{X}) = \sum_{j=1}^{6}X_j\log\left(\dfrac{X_j}{n}\right)$
Since the null hypothesis is $H_0: \theta_1 = \theta_2 = \cdots = \theta_6 = \frac{1}{6}$
$l(\theta_0; \mathbf{X}) = \sum_{j=1}^{6}X_j\log\left(\dfrac{1}{6}\right)$
so the likelihood ratio test statistic is
$\Lambda(\mathbf{X}; \theta_0) = 2\left[l(\tilde{\theta}; \mathbf{X}) - l(\theta_0; \mathbf{X})\right]
= 2\left[\sum_{j=1}^{6}X_j\log\left(\dfrac{X_j}{n}\right) - \sum_{j=1}^{6}X_j\log\left(\dfrac{1}{6}\right)\right]
= 2\sum_{j=1}^{6}X_j\log\left(\dfrac{X_j}{E_j}\right)$
where $E_j = n/6$ is the expected frequency for outcome $j$. This test statistic is the likelihood
ratio Goodness of Fit test statistic introduced in your previous statistics course.
For these data the observed value of the likelihood ratio test statistic is
$\Lambda(\mathbf{x}; \theta_0) = 2\sum_{j=1}^{6}x_j\log\left(\dfrac{x_j}{100/6}\right)
= 2\left[16\log\left(\dfrac{16}{100/6}\right) + 15\log\left(\dfrac{15}{100/6}\right) + \cdots + 13\log\left(\dfrac{13}{100/6}\right)\right]
= 3.699649$
The approximate p-value is
p-value $\approx P(W \geq 3.699649) = 0.5934162$ where $W \sim \chi^2(5)$
calculated using R. Since p-value $> 0.1$ there is no evidence based on the data against the
hypothesis of a fair die.

Note:
(1) In this example the data (X1 ; X2 ; : : : ; X6 ) are not a random sample. The conditions
for (8.1) hold by thinking of the experiment as a sequence of n independent trials with 6
outcomes on each trial.
(2) You may recall from your previous statistics course that the $\chi^2$ approximation is reasonable
if the expected frequency $E_j$ in each category is at least 5.
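The goodness of fit calculation in Example 8.2.4 can be reproduced with a few lines of R; a minimal sketch:
# Example 8.2.4: likelihood ratio goodness of fit test for a fair die
x <- c(16, 15, 14, 20, 22, 13)
e <- rep(sum(x) / 6, 6)                  # expected frequencies under H0
lambda <- 2 * sum(x * log(x / e))
lambda                                   # 3.699649
1 - pchisq(lambda, df = 5)               # approximate p-value, 0.5934162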

8.2.5 Exercise
In a long-term study of heart disease in a large group of men, it was noted that 63 men who
had no previous record of heart problems died suddenly of heart attacks. The following
table gives the number of such deaths recorded on each day of the week:

Day of Week Mon. Tues. Wed. Thurs. Fri. Sat. Sun.


No. of Deaths 22 7 6 13 5 4 6

Test the hypothesis of interest that the deaths are equally likely to occur on any day of the
week.

8.3 Likelihood Ratio Tests for Composite Hypotheses


Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ where $\theta \in \Omega$ and $\Omega$ has
dimension $k$. Suppose we wish to test $H_0: \theta \in \Omega_0$ where $\Omega_0$ is a subset of $\Omega$ of dimension
$q$ where $0 < q < k$. The hypothesis $H_0$ is a composite hypothesis since all the values of
the unknown parameters are not specified. For testing composite hypotheses we use the
likelihood ratio test statistic
$\Lambda(\mathbf{X}; \Omega_0) = -2\log\left[\dfrac{\max_{\theta \in \Omega_0}L(\theta; \mathbf{X})}{\max_{\theta \in \Omega}L(\theta; \mathbf{X})}\right]
= 2\left[l(\tilde{\theta}; \mathbf{X}) - \max_{\theta \in \Omega_0}l(\theta; \mathbf{X})\right]$
where $\tilde{\theta} = \tilde{\theta}(X_1, X_2, \ldots, X_n)$ is the maximum likelihood estimator of $\theta$. Note that this
test statistic implicitly assumes that the alternative hypothesis is $H_A: \theta \notin \Omega_0$.
Let the observed value of the likelihood ratio test statistic be
$\Lambda(\mathbf{x}; \Omega_0) = 2\left[l(\hat{\theta}; \mathbf{x}) - \max_{\theta \in \Omega_0}l(\theta; \mathbf{x})\right]$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. The p-value is
p-value $= P[\Lambda(\mathbf{X}; \Omega_0) \geq \Lambda(\mathbf{x}; \Omega_0); H_0]$
Note that the p-value is calculated assuming $H_0: \theta \in \Omega_0$ is true. In general this p-value
is difficult to determine exactly since the distribution of the random variable $\Lambda(\mathbf{X}; \Omega_0)$ is

usually intractable. Under certain (regularity) conditions it can be shown that, assuming
the hypothesis $H_0: \theta \in \Omega_0$ is true,
$\Lambda(\mathbf{X}; \Omega_0) = 2\left[l(\tilde{\theta}_n; \mathbf{X}_n) - \max_{\theta \in \Omega_0}l(\theta; \mathbf{X}_n)\right] \rightarrow_D W \sim \chi^2(k - q)$
where $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$. The approximate p-value is
given by
p-value $\approx P[W \geq \Lambda(\mathbf{x}; \Omega_0)]$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data.
Note:
The number of degrees of freedom is the difference between the dimension of $\Omega$ and the
dimension of $\Omega_0$. The degrees of freedom can also be determined as the number of parameters
estimated under the model with no restrictions minus the number of parameters
estimated under the model with restrictions imposed by the null hypothesis $H_0$.

8.3.1 Example
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Gamma$(\alpha, \beta)$ distribution. Find
the likelihood ratio test statistic for testing $H_0: \alpha = \alpha_0$ where $\beta$ is unknown. Indicate how
to find the approximate p-value.
(b) For the data in Example 7.1.14 test the hypothesis $H_0: \alpha = 2$.

Solution
(a) From Example 8.1.2(b) we have $\Omega = \{(\alpha, \beta) : \alpha > 0, \beta > 0\}$ which has dimension $k = 2$
and $\Omega_0 = \{(\alpha_0, \beta) : \beta > 0\}$ which has dimension $q = 1$ and the hypothesis is composite.
From Exercise 7.1.11 the likelihood function is
$L(\alpha, \beta) = \left[\Gamma(\alpha)\beta^{\alpha}\right]^{-n}\left(\prod_{i=1}^{n}x_i\right)^{\alpha - 1}\exp\left(-\dfrac{1}{\beta}\sum_{i=1}^{n}x_i\right)$ for $\alpha > 0$, $\beta > 0$
and the log likelihood function is
$l(\alpha, \beta) = -n\log\Gamma(\alpha) - n\alpha\log\beta + \alpha\sum_{i=1}^{n}\log x_i - \dfrac{1}{\beta}\sum_{i=1}^{n}x_i$ for $\alpha > 0$, $\beta > 0$
The maximum likelihood estimators cannot be found explicitly so we write
$l(\tilde{\alpha}, \tilde{\beta}; \mathbf{X}) = -n\log\Gamma(\tilde{\alpha}) - n\tilde{\alpha}\log\tilde{\beta} + \tilde{\alpha}\sum_{i=1}^{n}\log X_i - \dfrac{1}{\tilde{\beta}}\sum_{i=1}^{n}X_i$
If $\alpha = \alpha_0$ then the log likelihood function is
$l(\alpha_0, \beta) = -n\log\Gamma(\alpha_0) - n\alpha_0\log\beta + \alpha_0\sum_{i=1}^{n}\log x_i - \dfrac{1}{\beta}\sum_{i=1}^{n}x_i$ for $\beta > 0$
which is only a function of $\beta$. To determine $\max_{(\alpha,\beta) \in \Omega_0}l(\alpha, \beta; \mathbf{X})$ we note that
$\dfrac{d}{d\beta}l(\alpha_0, \beta) = -\dfrac{n\alpha_0}{\beta} + \dfrac{1}{\beta^2}\sum_{i=1}^{n}x_i$
and $\frac{d}{d\beta}l(\alpha_0, \beta) = 0$ for $\beta = \frac{1}{n\alpha_0}\sum_{i=1}^{n}x_i = \frac{\bar{x}}{\alpha_0}$ and therefore
$\max_{(\alpha,\beta) \in \Omega_0}l(\alpha, \beta; \mathbf{X}) = -n\log\Gamma(\alpha_0) - n\alpha_0\log\left(\dfrac{\bar{X}}{\alpha_0}\right) + \alpha_0\sum_{i=1}^{n}\log X_i - n\alpha_0$
The likelihood ratio test statistic is
$\Lambda(\mathbf{X}; \Omega_0) = 2\left[l(\tilde{\alpha}, \tilde{\beta}; \mathbf{X}) - \max_{(\alpha,\beta) \in \Omega_0}l(\alpha, \beta; \mathbf{X})\right]$
$= 2\Big[-n\log\Gamma(\tilde{\alpha}) - n\tilde{\alpha}\log\tilde{\beta} + \tilde{\alpha}\sum_{i=1}^{n}\log X_i - \dfrac{1}{\tilde{\beta}}\sum_{i=1}^{n}X_i
+ n\log\Gamma(\alpha_0) + n\alpha_0\log\left(\dfrac{\bar{X}}{\alpha_0}\right) - \alpha_0\sum_{i=1}^{n}\log X_i + n\alpha_0\Big]$
$= 2n\left[\log\dfrac{\Gamma(\alpha_0)}{\Gamma(\tilde{\alpha})} + (\tilde{\alpha} - \alpha_0)\dfrac{1}{n}\sum_{i=1}^{n}\log X_i + \alpha_0\log\left(\dfrac{\bar{X}}{\alpha_0}\right) - \tilde{\alpha}\log\tilde{\beta} - \dfrac{\bar{X}}{\tilde{\beta}} + \alpha_0\right]$
with corresponding observed value
$\Lambda(\mathbf{x}; \Omega_0) = 2n\left[\log\dfrac{\Gamma(\alpha_0)}{\Gamma(\hat{\alpha})} + (\hat{\alpha} - \alpha_0)\dfrac{1}{n}\sum_{i=1}^{n}\log x_i + \alpha_0\log\left(\dfrac{\bar{x}}{\alpha_0}\right) - \hat{\alpha}\log\hat{\beta} - \dfrac{\bar{x}}{\hat{\beta}} + \alpha_0\right]$
Since $k - q = 2 - 1 = 1$
p-value $\approx P[W \geq \Lambda(\mathbf{x}; \Omega_0)] = 2\left[1 - P\left(Z \leq \sqrt{\Lambda(\mathbf{x}; \Omega_0)}\right)\right]$ where $W \sim \chi^2(1)$, $Z \sim$ N$(0,1)$
The degrees of freedom can also be determined by noticing that, under the full model two
parameters ($\alpha$ and $\beta$) were estimated, and under the null hypothesis $H_0: \alpha = \alpha_0$ only one
parameter ($\beta$) was estimated. Therefore $2 - 1 = 1$ is the degrees of freedom.
(b) For $H_0: \alpha = 2$ and the data in Example 7.1.14 we have $n = 30$, $\bar{x} = 6.824333$,
$\frac{1}{n}\sum_{i=1}^{n}\log x_i = 1.794204$, $\hat{\alpha} = 4.118407$, $\hat{\beta} = 1.657032$. The observed value of the likelihood
ratio test statistic is $\Lambda(\mathbf{x}; \Omega_0) = 6.886146$ with
p-value $\approx P(W \geq 6.886146)$ where $W \sim \chi^2(1)$
$= 2\left[1 - P\left(Z \leq \sqrt{6.886146}\right)\right]$ where $Z \sim$ N$(0,1)$
$= 0.008686636$
calculated using R. Since p-value $\leq 0.01$ there is strong evidence against $H_0: \alpha = 2$ based
on the data.
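Using the summary statistics quoted above, the observed likelihood ratio statistic and its approximate p-value can be computed in R; a minimal sketch (the values of $\hat{\alpha}$ and $\hat{\beta}$ are taken as given from Exercise 7.1.14):
# Example 8.3.1(b): test of H0: alpha = 2 for the Gamma data of Exercise 7.1.14
n <- 30; xbar <- 6.824333; tlog <- 1.794204      # tlog = (1/n) * sum(log(x))
ahat <- 4.118407; bhat <- 1.657032; a0 <- 2
lambda <- 2 * n * (lgamma(a0) - lgamma(ahat) + (ahat - a0) * tlog +
                   a0 * log(xbar / a0) - ahat * log(bhat) - xbar / bhat + a0)
lambda                                   # approximately 6.886
1 - pchisq(lambda, df = 1)               # approximately 0.0087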

8.3.2 Example
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Exponential$(\theta_1)$ distribution and
independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Exponential$(\theta_2)$ distribution.
Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the
approximate p-value.
(b) Find the approximate p-value if the observed data are $n = 10$, $\sum_{i=1}^{10}x_i = 22$, $m = 15$,
$\sum_{i=1}^{15}y_i = 40$. What would you conclude?

Solution
(a) From Example 8.1.2(c) we have = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension
k = 2 and 0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension q = 1 and the
hypothesis is composite.
From Example 6.2.8 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Exponential( 1 ) distribution is

n nx=
L1 ( 1 ) = 1 e 1
for 1 >0

with maximum likelihood estimate ^1 = x.


Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an
Exponential( 2 ) distribution is

m my=
L2 ( 2 ) = 2 e 2
for 2 >0

with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood
function for ( 1 ; 2 ) is

L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for 1 > 0; 2 >0

and the log likelihood function


nx my
l ( 1; 2) = n log 1 m log 2 for 1 > 0; 2 >0
1 2

The independence of the samples also implies the maximum likelihood estimators are still
~1 = X and ~2 = Y . Therefore

l(~1 ; ~2 ; X; Y) = n log X m log Y (n + m)

If 1 = 2 = then the log likelihood function is

(nx + my)
l( ) = (n + m) log for >0
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES 269

which is a function of $\theta$ only. To determine $\max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1,\theta_2;\mathbf{X},\mathbf{Y})$ we note that
$$\frac{d}{d\theta}l(\theta) = -\frac{n+m}{\theta} + \frac{n\bar{x}+m\bar{y}}{\theta^2}$$
and $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \frac{n\bar{x}+m\bar{y}}{n+m}$ and therefore
$$\max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1,\theta_2;\mathbf{X},\mathbf{Y}) = -(n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) - (n+m)$$
The likelihood ratio test statistic is
$$\Lambda(\mathbf{X},\mathbf{Y};\Omega_0) = 2\left[l(\tilde{\theta}_1,\tilde{\theta}_2;\mathbf{X},\mathbf{Y}) - \max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1,\theta_2;\mathbf{X},\mathbf{Y})\right]$$
$$= 2\left[-n\log\bar{X} - m\log\bar{Y} - (n+m) + (n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) + (n+m)\right]$$
$$= 2\left[(n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) - n\log\bar{X} - m\log\bar{Y}\right]$$
with corresponding observed value
$$\lambda(\mathbf{x},\mathbf{y};\Omega_0) = 2\left[(n+m)\log\left(\frac{n\bar{x}+m\bar{y}}{n+m}\right) - n\log\bar{x} - m\log\bar{y}\right]$$
Since $k - q = 2 - 1 = 1$,
$$\text{p-value} \approx P\left[W \ge \lambda(\mathbf{x},\mathbf{y};\Omega_0)\right] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\lambda(\mathbf{x},\mathbf{y};\Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
(b) For $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$ the observed value of the likelihood ratio test statistic is $\lambda(\mathbf{x},\mathbf{y};\Omega_0) = 0.2189032$ and
$$\text{p-value} \approx P(W \ge 0.2189032) \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{0.2189032}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.6398769$$
calculated using R. Since the p-value $> 0.5$ there is no evidence against $H_0: \theta_1 = \theta_2$ based on
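As a check, the following R lines, added here and not part of the original notes, compute $\lambda$ and the approximate p-value directly from the summary statistics given above.

# two-sample Exponential LRT: data summaries from part (b)
n <- 10; sumx <- 22; m <- 15; sumy <- 40
xbar <- sumx/n; ybar <- sumy/m
lambda <- 2*((n + m)*log((n*xbar + m*ybar)/(n + m)) - n*log(xbar) - m*log(ybar))
lambda                           # 0.2189032
2*(1 - pnorm(sqrt(lambda)))      # approximate p-value = 0.6398769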

8.3.3 Exercise
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson($\theta_1$) distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Poisson($\theta_2$) distribution. Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the approximate p-value.
(b) Find the approximate p-value if the observed data are $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$. What would you conclude?

8.3.4 Example
In a large population of males ages 40-50, the proportion who are regular smokers is $\theta$ where $0 \le \theta \le 1$, and the proportion who have hypertension (high blood pressure) is $\phi$ where $0 \le \phi \le 1$. Suppose that $n$ men are selected at random from this population and the observed data are

Category: $S \cap H$, $\ S \cap \bar{H}$, $\ \bar{S} \cap H$, $\ \bar{S} \cap \bar{H}$
Frequency: $x_{11}$, $\ x_{12}$, $\ x_{21}$, $\ x_{22}$

where $S$ is the event the male is a smoker and $H$ is the event the male has hypertension. Find the likelihood ratio test statistic for testing $H_0$: events $S$ and $H$ are independent. Indicate how to find the approximate p-value.

Solution
The model for these data is $(X_{11}, X_{12}, X_{21}, X_{22}) \sim$ Multinomial($n; \theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}$) with parameter space
$$\Omega = \left\{(\theta_{11},\theta_{12},\theta_{21},\theta_{22}) : 0 \le \theta_{ij} \le 1 \text{ for } i,j = 1,2,\ \sum_{j=1}^{2}\sum_{i=1}^{2}\theta_{ij} = 1\right\}$$
which has dimension $k = 3$.
Let $P(S) = \theta$ and $P(H) = \phi$. Then the hypothesis of interest can be written as $H_0: \theta_{11} = \theta\phi$, $\theta_{12} = \theta(1-\phi)$, $\theta_{21} = (1-\theta)\phi$, $\theta_{22} = (1-\theta)(1-\phi)$ and
$$\Omega_0 = \{(\theta_{11},\theta_{12},\theta_{21},\theta_{22}) : \theta_{11} = \theta\phi,\ \theta_{12} = \theta(1-\phi),\ \theta_{21} = (1-\theta)\phi,\ \theta_{22} = (1-\theta)(1-\phi),\ 0 \le \theta \le 1,\ 0 \le \phi \le 1\}$$
which has dimension $q = 2$, and the hypothesis is composite.


From Example 8.2.4 we can see that the likelihood function for $(\theta_{11},\theta_{12},\theta_{21},\theta_{22})$ is
$$L(\theta_{11},\theta_{12},\theta_{21},\theta_{22}) = \theta_{11}^{x_{11}}\,\theta_{12}^{x_{12}}\,\theta_{21}^{x_{21}}\,\theta_{22}^{x_{22}}$$
The log likelihood function is
$$l(\theta_{11},\theta_{12},\theta_{21},\theta_{22}) = \sum_{j=1}^{2}\sum_{i=1}^{2} x_{ij}\log\theta_{ij}$$
and the maximum likelihood estimate of $\theta_{ij}$ is $\hat{\theta}_{ij} = \frac{x_{ij}}{n}$ for $i,j = 1,2$. Therefore
$$l(\tilde{\theta}_{11},\tilde{\theta}_{12},\tilde{\theta}_{21},\tilde{\theta}_{22};\mathbf{X}) = \sum_{j=1}^{2}\sum_{i=1}^{2} X_{ij}\log\frac{X_{ij}}{n}$$

If the events $S$ and $H$ are independent events then from Chapter 7, Problem 4 we have that the likelihood function is
$$L(\theta,\phi) = (\theta\phi)^{x_{11}}\left[\theta(1-\phi)\right]^{x_{12}}\left[(1-\theta)\phi\right]^{x_{21}}\left[(1-\theta)(1-\phi)\right]^{x_{22}}$$
$$= \theta^{x_{11}+x_{12}}(1-\theta)^{x_{21}+x_{22}}\,\phi^{x_{11}+x_{21}}(1-\phi)^{x_{12}+x_{22}} \quad \text{for } 0 \le \theta \le 1,\ 0 \le \phi \le 1$$
The log likelihood is
$$l(\theta,\phi) = (x_{11}+x_{12})\log\theta + (x_{21}+x_{22})\log(1-\theta) + (x_{11}+x_{21})\log\phi + (x_{12}+x_{22})\log(1-\phi)$$
for $0 < \theta < 1$, $0 < \phi < 1$, and the maximum likelihood estimates are
$$\hat{\theta} = \frac{x_{11}+x_{12}}{n} \quad \text{and} \quad \hat{\phi} = \frac{x_{11}+x_{21}}{n}$$
Therefore
$$\max_{(\theta_{11},\theta_{12},\theta_{21},\theta_{22})\in\Omega_0} l(\theta_{11},\theta_{12},\theta_{21},\theta_{22};\mathbf{X})$$
$$= (X_{11}+X_{12})\log\left(\frac{X_{11}+X_{12}}{n}\right) + (X_{21}+X_{22})\log\left(\frac{X_{21}+X_{22}}{n}\right) + (X_{11}+X_{21})\log\left(\frac{X_{11}+X_{21}}{n}\right) + (X_{12}+X_{22})\log\left(\frac{X_{12}+X_{22}}{n}\right)$$
The likelihood ratio test statistic can be written as
$$\Lambda(\mathbf{X};\Omega_0) = 2\left[l(\tilde{\theta}_{11},\tilde{\theta}_{12},\tilde{\theta}_{21},\tilde{\theta}_{22};\mathbf{X}) - \max_{(\theta_{11},\theta_{12},\theta_{21},\theta_{22})\in\Omega_0} l(\theta_{11},\theta_{12},\theta_{21},\theta_{22};\mathbf{X})\right]$$
$$= 2\sum_{j=1}^{2}\sum_{i=1}^{2} X_{ij}\log\frac{X_{ij}}{E_{ij}}$$
where $E_{ij} = \frac{R_i C_j}{n}$, $R_i = X_{i1}+X_{i2}$, $C_j = X_{1j}+X_{2j}$ for $i,j = 1,2$. $E_{ij}$ is the expected frequency if the hypothesis of independence is true.
The corresponding observed value is
$$\lambda(\mathbf{x};\Omega_0) = 2\sum_{j=1}^{2}\sum_{i=1}^{2} x_{ij}\log\frac{x_{ij}}{e_{ij}}$$
where $e_{ij} = \frac{r_i c_j}{n}$, $r_i = x_{i1}+x_{i2}$, $c_j = x_{1j}+x_{2j}$ for $i,j = 1,2$.
Since $k - q = 3 - 2 = 1$,
$$\text{p-value} \approx P\left[W \ge \lambda(\mathbf{x};\Omega_0)\right] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\lambda(\mathbf{x};\Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
This of course is the usual test of independence in a two-way table which was discussed in
your previous statistics course.
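A short R sketch of this calculation is given below. It is not part of the original solution, and the matrix counts is a hypothetical placeholder for the observed frequencies $x_{11}, x_{12}, x_{21}, x_{22}$.

# likelihood ratio test of independence for a 2x2 table (counts is a hypothetical example)
counts <- matrix(c(20, 30, 15, 35), nrow = 2, byrow = TRUE)   # x11, x12 / x21, x22
n <- sum(counts)
expected <- outer(rowSums(counts), colSums(counts))/n          # e_ij = r_i*c_j/n
lambda <- 2*sum(counts*log(counts/expected))
2*(1 - pnorm(sqrt(lambda)))                                    # approximate p-value with 1 degree of freedom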

8.4 Chapter 8 Problems


1. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the distribution with probability density function
$$f(x;\theta) = \theta x^{\theta-1} \quad \text{for } 0 \le x \le 1,\ \theta > 0$$
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$.
(b) If $n = 20$ and $\sum_{i=1}^{20}\log x_i = -25$, find the approximate p-value for testing $H_0: \theta = 1$ using the asymptotic distribution of the likelihood ratio statistic. What would you conclude?

2. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Pareto($1, \beta$) distribution.
(a) Find the likelihood ratio test statistic for testing $H_0: \beta = \beta_0$. Indicate how to find the approximate p-value.
(b) For the data $n = 25$ and $\sum_{i=1}^{25}\log x_i = 40$, find the approximate p-value for testing $H_0: \beta = 1$. What would you conclude?

3. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the distribution with probability density function
$$f(x;\theta) = \frac{x}{\theta^2}\,e^{-\frac{1}{2}(x/\theta)^2} \quad \text{for } x > 0,\ \theta > 0$$
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Indicate how to find the approximate p-value.
(b) If $n = 20$ and $\sum_{i=1}^{20} x_i^2 = 10$, find the approximate p-value for testing $H_0: \theta = 0.1$. What would you conclude?

4. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Weibull($2, \theta_1$) distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Weibull($2, \theta_2$) distribution. Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the approximate p-value.

5. Suppose $(X_1, X_2, X_3) \sim$ Multinomial($n; \theta_1, \theta_2, \theta_3$).
(a) Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2 = \theta_3$. Indicate how to find the approximate p-value.
(b) Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta^2$, $\theta_2 = 2\theta(1-\theta)$, $\theta_3 = (1-\theta)^2$. Indicate how to find the approximate p-value.

6. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the N($\mu_1, \sigma_1^2$) distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the N($\mu_2, \sigma_2^2$) distribution. Find the likelihood ratio test statistic for testing $H_0: \mu_1 = \mu_2,\ \sigma_1^2 = \sigma_2^2$. Indicate how to find the approximate p-value.
9. Solutions to Chapter Exercises


9.1 Chapter 2
Exercise 2.1.5
(a) $P(A) \ge 0$ follows from Definition 2.1.3(A1). From Example 2.1.4(c) we have $P(\bar{A}) = 1 - P(A)$. But from Definition 2.1.3(A1) $P(\bar{A}) \ge 0$ and therefore $P(A) \le 1$.
(b) Since $A = (A \cap B) \cup (A \cap \bar{B})$ and $(A \cap B) \cap (A \cap \bar{B}) = \varnothing$ then by Example 2.1.4(b)
$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$
or
$$P(A \cap \bar{B}) = P(A) - P(A \cap B)$$
as required.
(c) Since
$$A \cup B = (A \cap \bar{B}) \cup (A \cap B) \cup (\bar{A} \cap B)$$
is the union of three mutually exclusive events then by Definition 2.1.3(A3) and Example 2.1.4(a) we have
$$P(A \cup B) = P(A \cap \bar{B}) + P(A \cap B) + P(\bar{A} \cap B) \qquad (9.1)$$
By the result proved in (b)
$$P(A \cap \bar{B}) = P(A) - P(A \cap B) \qquad (9.2)$$
and similarly
$$P(\bar{A} \cap B) = P(B) - P(A \cap B) \qquad (9.3)$$
Substituting (9.2) and (9.3) into (9.1) gives
$$P(A \cup B) = P(A) - P(A \cap B) + P(A \cap B) + P(B) - P(A \cap B) = P(A) + P(B) - P(A \cap B)$$
as required.

Exercise 2.3.7
By 2.11.8
$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad \text{for } -1 < x \le 1$$
Let $x = p - 1$ to obtain
$$\log p = (p-1) - \frac{(p-1)^2}{2} + \frac{(p-1)^3}{3} - \cdots \qquad (9.4)$$
which holds for $0 < p \le 2$ and therefore also holds for $0 < p < 1$. Now (9.4) can be written as
$$\log p = -(1-p) - \frac{(1-p)^2}{2} - \frac{(1-p)^3}{3} - \cdots = -\sum_{x=1}^{\infty}\frac{(1-p)^x}{x} \quad \text{for } 0 < p < 1$$
Therefore
$$\sum_{x=1}^{\infty} f(x) = \sum_{x=1}^{\infty}\frac{-(1-p)^x}{x\log p} = -\frac{1}{\log p}\sum_{x=1}^{\infty}\frac{(1-p)^x}{x} = -\frac{1}{\log p}\left(-\log p\right) = 1$$
which holds for $0 < p < 1$.
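The following R check, added here and not part of the original solution, confirms the result numerically; $p = 0.3$ is an arbitrary illustrative value.

# numerical check that the logarithmic series pmf sums to 1
p <- 0.3
x <- 1:10000
sum(-(1 - p)^x/(x*log(p)))    # approximately 1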

Exercise 2.4.11
(a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}\,e^{-(x/\beta)^{\alpha}}\,dx$$
Let $y = (x/\beta)^{\alpha}$. Then $dy = \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}\,dx$. When $x = 0$, $y = 0$ and as $x \to \infty$, $y \to \infty$. Therefore
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}\,e^{-(x/\beta)^{\alpha}}\,dx = \int_{0}^{\infty} e^{-y}\,dy = \Gamma(1) = 0! = 1$$
(b) If $\alpha = 1$ then
$$f(x) = \frac{1}{\beta}\,e^{-x/\beta} \quad \text{for } x > 0$$
which is the probability density function of an Exponential($\beta$) random variable.
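A quick numerical check in R, added here and not part of the original solution, confirms that this density integrates to 1 for arbitrary illustrative parameter values, including the $\alpha = 1$ exponential case.

# numerical check that the Weibull density (shape alpha, scale beta) integrates to 1
fweib <- function(x, alpha, beta) alpha*x^(alpha - 1)/beta^alpha * exp(-(x/beta)^alpha)
integrate(fweib, 0, Inf, alpha = 3, beta = 2)$value   # approximately 1
integrate(fweib, 0, Inf, alpha = 1, beta = 2)$value   # alpha = 1: the Exponential(beta) case, also 1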


(c) See Figure 9.1
[Figure: Weibull probability density functions for $(\alpha, \beta) = (1, 0.5),\ (2, 0.5),\ (2, 1),\ (3, 1)$.]

Figure 9.1: Graphs of Weibull probability density function

Exercise 2.4.12
(a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{\alpha}^{\infty} \frac{\beta\alpha^{\beta}}{x^{\beta+1}}\,dx = \lim_{b\to\infty}\int_{\alpha}^{b} \beta\alpha^{\beta}x^{-\beta-1}\,dx = \lim_{b\to\infty}\left[-\alpha^{\beta}x^{-\beta}\right]_{\alpha}^{b} = 1 - \alpha^{\beta}\lim_{b\to\infty}\frac{1}{b^{\beta}} = 1 \quad \text{since } \beta > 0$$

(b) See Figure 9.2

[Figure: Pareto probability density functions for $(\alpha, \beta) = (0.5, 2),\ (1, 2),\ (0.5, 1),\ (1, 1)$.]
Figure 9.2: Graphs of Pareto probability density function



Exercise 2.5.4
(a) For $X \sim$ Cauchy($\theta, 1$) the probability density function is
$$f(x;\theta) = \frac{1}{\pi\left[1 + (x-\theta)^2\right]} \quad \text{for } x \in \Re,\ \theta \in \Re$$
and 0 otherwise. See Figure 9.3 for a sketch of the probability density function for $\theta = -1, 0, 1$.

Figure 9.3: Cauchy($\theta$, 1) probability density functions for $\theta = -1, 0, 1$

Let
$$f_0(x) = f(x;\theta = 0) = \frac{1}{\pi(1+x^2)} \quad \text{for } x \in \Re$$
and 0 otherwise. Then
$$f(x;\theta) = \frac{1}{\pi\left[1 + (x-\theta)^2\right]} = f_0(x-\theta) \quad \text{for } x \in \Re,\ \theta \in \Re$$
and therefore $\theta$ is a location parameter of this distribution.


(b) For $X \sim$ Cauchy($0, \theta$) the probability density function is
$$f(x;\theta) = \frac{1}{\pi\theta\left[1 + (x/\theta)^2\right]} \quad \text{for } x \in \Re,\ \theta > 0$$
and 0 otherwise. See Figure 9.4 for a sketch of the probability density function for $\theta = 0.5, 1, 2$.
Let
$$f_1(x) = f(x;\theta = 1) = \frac{1}{\pi(1+x^2)} \quad \text{for } x \in \Re$$
Figure 9.4: Cauchy($0, \theta$) probability density functions for $\theta = 0.5, 1, 2$

and 0 otherwise. Then
$$f(x;\theta) = \frac{1}{\pi\theta\left[1 + (x/\theta)^2\right]} = \frac{1}{\theta}\,f_1\!\left(\frac{x}{\theta}\right) \quad \text{for } x \in \Re,\ \theta > 0$$
and therefore $\theta$ is a scale parameter of this distribution.



Exercise 2.6.11
If $X \sim$ Exponential(1) then the probability density function of $X$ is
$$f(x) = e^{-x} \quad \text{for } x \ge 0$$
$Y = \beta X^{1/\alpha} = h(X)$ for $\alpha > 0$, $\beta > 0$ is an increasing function with inverse function $X = (Y/\beta)^{\alpha} = h^{-1}(Y)$. Since the support set of $X$ is $A = \{x : x \ge 0\}$, the support set of $Y$ is $B = \{y : y \ge 0\}$.
Since
$$\frac{d}{dy}h^{-1}(y) = \frac{d}{dy}\left(\frac{y}{\beta}\right)^{\alpha} = \frac{\alpha y^{\alpha-1}}{\beta^{\alpha}}$$
then by Theorem 2.6.8 the probability density function of $Y$ is
$$g(y) = f\!\left(h^{-1}(y)\right)\left|\frac{d}{dy}h^{-1}(y)\right| = \frac{\alpha y^{\alpha-1}}{\beta^{\alpha}}\,e^{-(y/\beta)^{\alpha}} \quad \text{for } y \ge 0$$
which is the probability density function of a Weibull random variable with shape $\alpha$ and scale $\beta$, as required.

Exercise 2.6.12
$X$ is a random variable with probability density function
$$f(x) = \theta x^{\theta-1} \quad \text{for } 0 < x < 1,\ \theta > 0$$
and 0 otherwise. $Y = -\log X$ is a decreasing function of $X$ with inverse function $X = e^{-Y} = h^{-1}(Y)$. Since the support set of $X$ is $A = \{x : 0 < x < 1\}$, the support set of $Y$ is $B = \{y : y > 0\}$.
Since
$$\frac{d}{dy}h^{-1}(y) = \frac{d}{dy}e^{-y} = -e^{-y}$$
then by Theorem 2.6.8 the probability density function of $Y$ is
$$g(y) = f\!\left(h^{-1}(y)\right)\left|\frac{d}{dy}h^{-1}(y)\right| = \theta\left(e^{-y}\right)^{\theta-1}e^{-y} = \theta e^{-\theta y} \quad \text{for } y > 0$$
which is the probability density function of an Exponential$\left(\frac{1}{\theta}\right)$ random variable as required.
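A quick simulation check in R, added here and not part of the original solution, agrees with this result; $\theta = 2$ is an arbitrary illustrative value, and $X$ is generated as $U^{1/\theta}$ with $U \sim$ Uniform(0, 1).

# simulation check: Y = -log(X) should be Exponential(1/theta), i.e. have mean 1/theta
set.seed(330)
theta <- 2
x <- runif(100000)^(1/theta)   # X has cdf x^theta on (0,1), so pdf theta*x^(theta-1)
y <- -log(x)
mean(y)                        # approximately 1/theta = 0.5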

Exercise 2.7.4
Using integration by parts with u = 1 F (x) and dv = dx gives du = f (x) dx, v = x and
Z1 Z1
[1 F (x)] dx = x [1 F (x)] j1
0 + xf (x) dx
0 0
Z1
= lim x f (t) dt + E (X)
x!1
x

The desired result holds if


Z1
lim x f (t) dt = 0 (9.5)
x!1
x

Since
Z1 Z1
0 x f (t) dt tf (t) dt
x x

then (9.5) holds by the Squeeze Theorem if


Z1
lim tf (t) dt = 0
x!1
x

R1 Rx
Since E (X) = xf (x) dx exists then G (x) = tf (t) dt exists for all x > 0 and lim G (x) =
0 0 x!1
E (X).
By the First Fundamental Theorem of Calculus 2.11.9 and the de…nition of an improper
integral 2.11.11
Z1
tf (t) dt = lim G (b) G (x) = E (X) G (x)
b!1
x

Therefore
Z1
lim tf (t) dt = lim [E (X) G (x)] = E (X) lim G (x)
x!1 x!1 x!1
x
= E (X) E (X)
= 0

and (9.5) holds.



Exercise 2.7.9
(a) If X Poisson( ) then
P
1 x
e
E(X (k) ) = x(k)
x=0 x!
P
1 x k
k
= e let y = x k
x=k (x k)!
k P1 y
= e
y=0 y!
k
= e e by 2.11.7
k
= for k = 1; 2; : : :

Letting k = 1 and k = 2 we have

E X (1) = E (X) =

and
E X (2) = E [X (X 1)] = 2

so

V ar (X) = E [X (X 1)] + E (X) [E (X)]2


2 2
= + =

(b) If X Negative Binomial(k; p) then


P
1 k k
E(X (j) ) = x(j) p (p 1)x
x=0 x
P
1 k j
= pk (p 1)j ( k)(j) (p 1)x j
by 2.11.4(1)
x=j x j
P
1 k j
= pk (p 1)j ( k)(j) (p 1)x j
let y = x j
x=j x j
P
1 k j
= pk (p 1)j ( k)(j) (p 1)y
y=0 y
k j (j) k j
= p (p 1) ( k) (1 + p 1) by 2.11.3(2)
j
p 1
= ( k)(j) for j = 1; 2; : : :
p
Letting k = 1 and k = 2 we have
1
p 1 k (1 p)
E X (1) = E (X) = ( k)(1) =
p p
and
2 2
(2) (2) p 1 1 p
E X = E [X (X 1)] = ( k) = k (k + 1)
p p

so

V ar (X) = E [X (X 1)] + E (X) [E (X)]2


2 2
1 p k (1 p) k (1 p)
= k (k + 1) +
p p p
2
k (1 p) k (1 p) k (1 p)
= + = (1 p + p)
p2 p p2
k (1 p)
=
p2

(c) If X Gamma( ; ) then

Z1 1 e x= Z1 +p 1 e x=
p px x x
E(X ) = x dx = dx let y =
( ) ( )
0 0
Z1
1 +p 1 y
= ( y) e dy
( )
0
+p Z1
+p 1 y
= y e dy which converges for +p>0
( )
0

p ( + p)
= for p >
( )

Letting k = 1 and k = 2 we have

( + 1) ( )
E (X) = =
( ) ( )
=

and

( + 2) ( + 1) ( )
E X2 = E X2 = 2
= 2
( ) ( )
2
= ( + 1)

so

V ar (X) = E X 2 [E (X)]2 = ( + 1) 2
( )2
2
=

(c) If X Weibull( ; ) then


Z1 1 e (x= )
k x x
E(X ) = xk dx let y =
0
Z1 Z1
k
1= y k
= y e dy = y k= e y
dy
0 0

k k
= +1 for k = 1; 2; : : :

Letting k = 1 and k = 2 we have

1
E (X) = +1

and
2
E X2 = E X2 = 2
+1

so
2
2 1
V ar (X) = E X 2 [E (X)]2 = 2
+1 +1
( )
2
2 2 1
= +1 +1

Exercise 2.8.3
From Markov’s inequality we know

E jY jk
P (jY j c) for all k; c > 0 (9.6)
ck
Since we are given that X is a random variable with …nite mean and …nite variance 2

then h i
2
= E (X )2 = E jX j2

Substituting Y = X , k = 1, and c = k into (9.6) we obtain

E jX j2 2 1
P (jX j k ) 2 = 2 =
(k ) (k ) k2
or
1
P (jX j k )
k2
for all k > 0 as required.

Exercise 2.9.2
For an Exponential($\theta$) random variable $Var(X) = \left[\sigma(\theta)\right]^2 = \theta^2$. For $g(X) = \log X$, $g'(X) = \frac{1}{X}$. Therefore by (2.5), the variance of $Y = g(X) = \log X$ is approximately
$$\left[g'(\theta)\right]^2\left[\sigma(\theta)\right]^2 = \left(\frac{1}{\theta}\right)^2\theta^2 = 1$$
which is a constant.

Exercise 2.10.3
(a) If X Binomial(n; p) then

P
1 n x
M (t) = etx p (1 p)n x
x20 x
P1 n x
= et p (1 p)n x
which converges for t 2 <
x20 x
k
= pet + 1 p by 2.11.3(1)

(b) If X Poisson( ) then

P
1 x
e
M (t) = etx
x=0 x!
x
P
1 et
= e which converges for t 2 <
x=0 x!
et
= e e by 2.11.7
+ et
= e for t 2 <

Exercise 2.10.6
If X Negative Binomial(k; p) then
k
p
MX (t) = for t < log q
1 qet

By Theorem 2.10.4 the moment generating function of Y = X + k is

MY (t) = ekt MX (t)


k
pet
= for t < log q
1 qet

Exercise 2.10.14
(a) By the Exponential series 2.11.7
2
t2 =2 t2 =2 t2 =2
M (t) = e =1+ + +
1! 2!
1 1 4
= 1 + t2 + t for t 2 <
2 2!22
Since E(X k ) = k! coe¢ cient of tk in the Maclaurin series for M (t) we have

1
E (X) = 1! (0) = 0 and E X 2 = 2! =1
2

and so
V ar (X) = E X 2 [E (X)]2 = 1
(b) By Theorem 2.10.4 the moment generating function of Y = 2X 1 is

MY (t) = e t MX (2t) for t 2 <


t (2t)2 =2
= e e
t+(2t)2 =2
= e for t 2 <

By examining the list of moment generating functions in Chapter 11 we see that this is the
moment generating function of a N( 1; 4) random variable. Therefore by the Uniqueness
Theorem for Moment Generating Functions, Y has a N( 1; 4) distribution.

9.2 Chapter 3
Exercise 3.2.5
(a) The joint probability function of X and Y is

f (x; y) = P (X = x; Y = y)
n! h in x y
2 x
= [2 (1 )]y (1 )2
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n

which is a Multinomial n; 2
; 2 (1 ) ; (1 )2 distribution or the trinomial distribution.
(b) The marginal probability function of X is
n
Xx n! h in x y
2 x
f1 (x) = P (X = x) = [2 (1 )]y (1 )2
x!y! (n x y)!
y=0

n!
1
X (n x)! h in x y
2 x
= [2 (1 )]y (1 )2
x! (n x)! y! (n x y)!
y=0
n h in x
x
= 2
2 (1 ) + (1 )2 by the Binomial Series 2.11.3(1)
x
n 2 x 2 n x
= 1 for x = 0; 1; : : : ; n
x
2
and so X Binomial n; .
(c) In a similar manner to (b) the marginal probability function of Y can be shown to be
Binomial(n; 2 (1 )) since P (Aa) = 2 (1 ).
(d)
P P P
t
P (X + Y = t) = f (x; y) = f (x; t x)
(x;y): x+y=t x=0
t
X n! h in t
2 x
= [2 (1 )]t x
(1 )2
x! (t x)! (n t)!
x=0

n h in t
tX t 2 x
= (1 )2 [2 (1 )]t x
t x
x=0
n h in t t
= (1 )2 2
+ 2 (1 ) by the Binomial Series 2.11.3(1)
t
n h in t
t
= 2
+ 2 (1 ) (1 )2 for t = 0; 1; : : : ; n
t

Thus X + Y Binomial n; 2 + 2 (1 ) which makes sense since X + Y is counting the


number of times an AA or Aa type occurs in n trials and the probability of success is
P (AA) + P (Aa) = 2 + 2 (1 ).

Exercise 3.3.6
(a)
Z1 Z1 Z1 Z1
1
1 = f (x; y)dxdy = k dxdy
(1 + x + y)3
1 1 0 0
Z1
k 1
= lim ja0 dy
2 a!1 (1 + x + y)2
0
Z1
k 1 1
= lim 2 + dy
2 a!1 (1 + a + y) (1 + y)2
0
Z1
k 1
= dy
2 (1 + y)2
0
k 1 a
= lim j
2 a!1 (1 + y) 0
k 1
= lim +1
2 a!1 (1 + a)
k
=
2
Therefore k = 2. A graph of the joint probability density function is given in Figure 9.5.

Figure 9.5: Graph of joint probability density function for Exercise 3.3.6

(b) (i)

Z2 Z1 Z2
2 1
P (X 1; Y 2) = dxdy = j10 dy
(1 + x + y)3 (1 + x + y)2
0 0 0
Z2
1 1 1 1
= 2 + dy = j20
(2 + y) (1 + y)2 (2 + y) (1 + y)
0
1 1 1 1 5
= +1= (3 4 6 + 12) =
4 3 2 12 12

(ii) Since f (x; y) and the support set A = f(x; y) : x 0; y 0g are both symmetric in x
and y, P (X Y ) = 0:5
(iii)

Z1 Z
1 y Z1
2 1
P (X + Y 1) = dxdy = j10 y
dy
(1 + x + y)3 (1 + x + y)2
0 0 0
Z1
1 1 1 1 1
= 2 + dy = yj j1
(2) (1 + y)2 4 0 (1 + y) 0
0
1 1 1
= +0 +1 =
4 2 4

(c) Since

Z1 Z1
2 1
f (x; y)dy = dy = lim ja0
(1 + x + y)3 a!1 (1 + x + y)2
1 0
1
= for x 0
(1 + x)2

the marginal probability density function of X is


8
< 0 x<0
f1 (x) = 1
: (1+x)2
x 0

By symmetry the marginal probability density function of Y is


8
< 0 y<0
f2 (y) = 1
: (1+y)2
y 0

(d) Since

Zy Zx Zy
2 1 x
P (X x; Y y) = dsdt = 2 j0 dt
(1 + s + t)3 (1 + s + t)
0 0 0
Zy
1 1
= 2 + dt
(1 + x + t) (1 + t)2
0
1 1
= jy0
1+x+t 1+t
1 1 1
= + 1 for x 0, y 0
1+x+y 1+y 1+x
the joint cumulative distribution function of X and Y is
8
< 0 x < 0 or y < 0
F (x; y) = P (X x; Y y) = 1 1 1
: 1 + 1+x+y 1+y 1+x x 0, y 0

(e) Since

1 1 1
lim F (x; y) = lim 1+
y!1 y!1 1+x+y 1+y 1+x
1
= 1 for x 0
1+x
the marginal cumulative distribution function of X is
(
0 x<0
F1 (x) = P (X x) = 1
1 1+x x 0

Check:
d d 1
F1 (x) = 1
dx dx 1+x
1
=
(1 + x)2
= f1 (x) for x 0

By symmetry the marginal cumulative distribution function of Y is


8
< 0 y<0
F2 (y) = P (Y y) = 1
: 1 1+y y 0

Exercise 3.3.7
(a) Since the support set is A = f(x; y) : y > x 0g

Z1 Z1 Z1 Zy
x y
1 = f (x; y)dxdy = k e dxdy
1 1 0 0
Z1
x y y
= k e j0 dy
0
Z1
2y y
= k e +e dy
0
1 2y y
= k lim e e ja0
a!1 2
1 2a a 1
= k lim e e +1
a!1 2 2
k
=
2
and therefore k = 2. A graph of the joint probability density function is given in Figure 9.6

Figure 9.6: Graph of joint probability density function for Exercise 3.3.7

(b) (i) The region of integration is shown in Figure 9.7.

Z1 Z2 Z1
x y x y 2
P (X 1; Y 2) = 2 e dydx = 2 e e jx dx
x=0 y=x x=0
Z1 Z1
x 2 x 2 x 2x
= 2 e e +e dx = 2 e e +e dx
x=0 x=0

2 x 1 2x 1 1
= 2 e e e j10 = 2 e 3
e 2
e 2
+
2 2 2
3 2
= 1 + 2e 3e

Figure 9.7: Region of integration for Exercise 3.3.7(b)(i)

(ii) Since the support set A = f(x; y) : 0 < x < y < 1g contains only values for which
x < y then P (X Y ) = 1.
(iii) The region of integration is shown in Figure 9.8

Z1=2 Z
1 x Z1
x y x y 1 x
P (X + Y 1) = 2 e dydx = 2 e e jx dx
x=0 y=x x=0
Z1 Z1
x x 1 x 1 2x
= 2 e e +e dx = 2 e +e dx
x=0 x=0

1 1 2x 1=2 1 1 1 1 1
= 2 e x e j0 =2 e e +0+
2 2 2 2
1
= 1 2e


Figure 9.8: Region of integration for Exercise 3.3.7(b)(iii)

(c) The marginal probability density function of X is

Z1 Z1
x y x y a 2x
f1 (x) = f (x; y)dy = 2 e dy = 2e lim e jx = 2e for x > 0
a!1
1 x

and 0 otherwise which we recognize is an Exponential(1=2) probability density function.


The marginal probability density function of Y is

Z1 Zy
x y y x y y y
f (x; y)dy = 2 e dx = 2e e j0 = 2e 1 e for y > 0
1 0

and 0 otherwise.
(d) Since

Zx Zy Zx
s t s
P (X x; Y y) = 2 e dtds = 2 e e t jys ds
s=0 t=s 0
Zx
s y s
= 2 e e +e ds
0
Zx
y s 2s y s 1 2s
= 2 e e +e ds = 2 e e e jx0
2
0
x y 2x y
= 2e e 2e + 1 for y x 0

Zy Zy
s t
P (X x; Y y) = 2 e dtds
s=0 t=s
2y y
= e 2e + 1 for x > y > 0

and
P (X x; Y y) = 0 for x 0 or y 0
therefore the joint cumulative distribution function of X and Y is
8
x y e 2x 2e y + 1 y x 0
>
> 2e
<
F (x; y) = P (X x; Y y) = e 2y 2e y + 1 x>y>0
>
>
: 0 x 0 or y 0

(e) Since the support set is A = f(x; y) : y x 0g and


x y 2x y
lim F (x; y) = lim 2e e 2e +1
y!1 y!1
2x
= 1 e for x > 0

marginal cumulative distribution function of X is


(
0 x 0
F1 (x) = P (X x) = 2x
1 e x>0

which we also recognize as the cumulative distribution function of an Exponential(1=2)


random variable.
Since
x y 2x y 2y 2y y
lim F (x; y) = lim 2e e 2e + 1 = 2e e 2e +1
x!1 x!y
2y y
= e 2e + 1 for y > 0

the marginal cumulative distribution function of Y is


(
0 y 0
F2 (y) = P (Y y) =
1 + e 2y 2e y y 0

Check:
Zy Zy
1
P (Y y) = f2 (t) dt = 2 e t
e 2t
dt = 2 e t
+ e 2t
jy0
2
0 0
1 1
= 2 e y+ e 2y
+1 =1+e 2y
2e y
for y > 0
2 2
or
d d 2y y 2y y
F2 (y) = e 2e +1 = 2e + 2e = f2 (y) for y > 0
dy dy

Exercise 3.4.4
From the solution to Exercise 3.2.5 we have
n! h in x y
2 x
f (x; y) = [2 (1 )]y (1 )2
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n

n 2 x 2 n x
f1 (x) = 1 for x = 0; 1; : : : ; n
x
and
n
f2 (y) = [2 (1 )]y [1 2 (1 )]n y
for y = 0; 1; : : : ; n
y
Since
2 n
f (0; 0) = (1 )2n 6= f1 (0) f2 (0) = 1 [1 2 (1 )]n
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.6 we have
2
f (x; y) = for x 0, y 0
(1 + x + y)3
1
f1 (x) = for x 0
(1 + x)2
and
1
f2 (y) = for y 0
(1 + y)2
Since
f (0; 0) = 2 6= f1 (0) f2 (0) = (1) (1)
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.7 we have
x y
f (x; y) = 2e for 0 < x < y < 1
2x
f1 (x) = 2e for x > 0
and
y y
f2 (y) = 2e 1 e for y > 0
Since
3 1 2 2
f (1; 2) = 2e 6= f1 (1) f2 (2) = 2e 2e 1 e
therefore X and Y are not independent random variables.

Exercise 3.5.3

P (X = x; Y = y)
P (Y = yjX = x) =
P (X = x)
h in x y
2 x
n!
x!y!(n x y)! [2 (1 )]y (1 )2
=
n! 2 x 2 n x
x!(n x)! 1
h in x y
y 2
(n x)! [2 (1 )] (1 )
=
y! (n x y)! 2 y 2 n x y
1 1
" #y " #(n x) y
n x 2 (1 ) (1 )2
= 2 2
y 1 1
" #y " #(n x) y
2
n x 2 (1 ) 1 2 +
= 2 2
y 1 1
" #y " #(n x) y
2 2
n x 2 (1 ) 1 2 +2
= 2 2
y 1 1
" #y " #(n x) y
n x 2 (1 ) 2 (1 )
= 2 1 2
y 1 1

for y = 0; 1; : : : ; (n x) which is the probability function of a Binomial n x; 2 1(1 2 ) as


( )
required.
We are given that there are x genotypes of type AA. Therefore there are only n x
members (trials) whose genotype must be determined. The genotype can only be of type
Aa or type aa. In a population with only these two types the proportion of type Aa would
be
2 (1 ) 2 (1 )
2 = 2
2 (1 ) + (1 ) 1
and the proportion of type aa would be

(1 )2 (1 )2 2 (1 )
2 = 2 =1 2
2 (1 ) + (1 ) 1 1

Since we have (n x) independent trials with probability of Success (type Aa) equal to
2 (1 )
then it follows that the number of Aa types, given that there are x members of type
(1 2 )
AA, would follow a Binomial n x; 2 1(1 2 ) distribution.
( )

Exercise 3.5.4
For Example 3.3.3 the conditional probability density function of X given Y = y is
f (x; y) x+y
f1 (xjy) = = for 0 < x < 1 for each 0 < y < 1
f2 (y) y + 12

Check:
Z1 Z1
x+y 1 1 2 1 1
f1 (xjy) dx = 1 dx = 1 x + xyj10 = 1 +y 0 =1
y+2 y+ 2
2 y+ 2
2
1 0

By symmetry the conditional probability density function of Y given X = x is


x+y
f2 (yjx) = for 0 < y < 1 for each 0 < x < 1
x + 21

and
Z1
f2 (yjx) dy = 1
1

For Exercise 3.3.6 the conditional probability density function of X given Y = y is


2
f (x; y) (1+x+y)3 2 (1 + y)2
f1 (xjy) = = 1 = for x > 0 for each y > 0
f2 (y)
(1+y)2
(1 + x + y)3

Check:
Z1 Z1
2 (1 + y)2 2 1 1
f1 (xjy) dx = 3 dx = (1 + y) a!1
lim 2 j0
(1 + x + y) (1 + x + y)
1 0
1 1 1
= (1 + y)2 lim 2 + 2 = (1 + y)
2
=1
a!1 (1 + a + y) (1 + y) (1 + y)2
By symmetry the conditional probability density function of Y given X = x is

2 (1 + x)2
f2 (yjx) = for y > 0 for each x > 0
(1 + x + y)3
and
Z1
f2 (yjx) dy = 1
1

For Exercise 3.3.7 the conditional probability density function of X given Y = y is


f (x; y) 2e x y e x
f1 (xjy) = = = for 0 < x < y for each y > 0
f2 (y) 2e y (1 e y) 1 e y

Check:
Z1 Zy x y
e 1 x y 1 e
f1 (xjy) dx = y
dx = y
e j0 = y
=1
1 e 1 e 1 e
1 0

The conditional probability density function of Y given X = x is

f (x; y) 2e x y
f2 (yjx) = = = ex y
for y > x for each x > 0
f1 (x) 2e 2x

Check:
Z1 Z1
f2 (yjx) dy = ex y
dy = ex lim e y a
jx = ex 0 + e x
=1
a!1
1 x

Exercise 3.6.10

Z1 Z1 Z1 Zy
x y
E (XY ) = xyf (x; y)dxdy = 2 xye dxdy
1 1 0 0
0 1
Z1 Zy Z1
= 2 ye y @ xe x
dxA dy = 2 ye y
xe x
e x
jy0 dy
0 0 0
Z1 Z1
y y y
= 2 ye ye e + 1 dy = 2 y2e 2y
+ ye 2y
ye y
dy
0 0
Z1 Z1 Z1
= 2 y2e 2y
dy (2y) e 2y
dy + 2 ye y
dy
0 0 0
Z1 Z1
= 2 y2e 2y
dy (2y) e 2y
dy + 2 (2) but (2) = 1! = 1
0 0
Z1 Z1
2 2y 2y u 1
= 2 2 y e dy (2y) e dy let u = 2y or y = so dy = du
2 2
0 0
Z1 Z1
u 2
u1 u1
= 2 2 e du ue du
2 2 2
0 0
Z1 Z1
1 1 1 1
= 2 u2 e u
du ue u
du = 2 (3) (2) but (3) = 2! = 2
4 2 4 2
0 0
1 1
= 2 =1
2 2

Since X Exponential(1=2), E (X) = 1=2 and V ar (X) = 1=4.


Z1 Z1
y y
E (Y ) = yf2 (y)dy = 2 ye e + 1 dy
1 0
Z1 Z1 Z1
2y y 2y
= 2ye dy + 2 ye dy = 2ye dy + 2 (2)
0 0 0
Z1
2y
= 2 2ye dy let u = 2y
0
Z1 Z1
u1 1 u 1 1 3
= 2 ue du = 2 ue du = 2 (2) = 2 =
2 2 2 2 2
0 0

Z1 Z1
2 2
E Y = y f2 (y)dy = 2 y2e y
e y
+ 1 dy
1 0
Z1 Z1 Z1
2 2y 2 y
= 2 y e dy + 2 y e dy = 2 y2e 2y
dy + 2 (3)
0 0 0
Z1
= 4 2 y2e 2y
dy let u = 2y
0
Z1 Z1
u 2
u1 1 1 1 7
= 4 2 e du = 4 u2 e u
du = 4 (3) = 4 =
2 2 4 4 2 2
0 0

2
7 3 14 9 5
V ar (Y ) = E Y 2 [E (Y )]2 = = =
2 2 4 4
Therefore
1 3 3 1
Cov (X; Y ) = E (XY ) E (X) E (Y ) = 1 =1 =
2 2 4 4

and
1
Cov(X; Y ) 4 1
(X; Y ) = =q =p
X Y 1 5 5
4 4

Exercise 3.7.12
Since Y Gamma( ; 1 )

1
E (Y ) = =
2
1
V ar (Y ) = = 2

and

Z1 1 y
k y e
E Y = yk dy let u = y
( )
0
Z1 k Z1
u k+ 1
u1
= e du = uk+ 1
e u
du
( ) ( )
0 0
k
( + k)
= for +k >0
( )

Since XjY = y Weibull(p; y 1=p )

1=p 1 1=p 1
E (Xjy) = y 1+ , E (XjY ) = Y 1+
p p

and

1=p
2 2 2 1
V ar (Xjy) = y 1+ 1+
p p
2=p 2 2 1
V ar (XjY ) = Y 1+ 1+
p p

Therefore

1 1=p
E (X) = E [E (XjY )] = 1+ E Y
p
1=p 1
1 p
= 1+
p ( )
1 1
1+ p p
1=p
=
( )

and

V ar(X) = E[V ar(XjY )] + V ar[E(XjY )]


2 1 1
= E Y 2=p 1+ 2
1+ + V ar Y 1=p 1+
p p p
2 2 1 1
= 1+ 1+ E Y 2=p + 2 1 + V ar Y 1=p
p p p
2 2 1
= 1+ 1+ E Y 2=p
p p
1 2 h i2
+ 2 1+ E Y 1=p E Y 1=p
p
2 1
= 1+ E Y 2=p 2
1+ E Y 2=p
p p
1 1 h i2
+ 2 1+ E Y 2=p 2
1+ E Y 1=p
p p
2 1 h i2
= 1+ E Y 2=p 2
1+ E Y 1=p
p p
2 32
2=p 2 1p 1
2 p 2 1 4 p
5
= 1+ 1+
p ( ) p ( )
8 2 32 9
>
< 1+ p 2 2
1+ p1 1 >
=
p p
= 2=p 4 5
>
: ( ) ( ) >
;

Exercise 3.7.13
Since P Beta(a; b)

a ab
E (P ) = , V ar (P ) =
a+b (a + b + 1) (a + b)2

and
Z1
(a + b) a
E P k
= pk p 1
(1 p)b 1
dp
(a) (b)
0
Z1
(a + b)
= pa+k 1
(1 p)b 1
dp
(a) (b)
0
Z1
(a + b) (a + k) (a + k + b) k+a
= p 1
(1 p)b 1
dp
(a) (a + k + b) (a + k) (b)
0
(a + b) (a + k) (a + b) (a + k)
= (1) =
(a) (a + k + b) (a) (a + k + b)

provided a + k > 0 and a + k + b > 0.


Since Y jP = p Geometric(p)

1 p 1 P 1
E (Y jp) = , E (Y jP ) = = 1
p P P
and
1 p 1 P 1 1
V ar (Y jp) = , V ar (Y jP ) = = 2
p2 P2 P P
Therefore
1 1
E (Y ) = E [E (Y jP )] = E 1 =E P 1
P
(a + b) (a 1)
= 1
(a) (a 1 + b)
(a 1 + b) (a 1 + b) (a 1)
= 1
(a 1) (a 1) (a 1 + b)
a 1+b b
= 1=
a 1 a 1
provided a > 1 and a + b > 1.
Now
V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )]

1 1 1
= E + V ar 1
P2 P P
" #
2 2
1 1 1 1
= E E +E 1 E 1
P2 P P P
2
1 1 1 1 1 1
= E E +E 2E +1 E + 2E 1
P2 P P2 P P P
2 1 1 2
= 2E P E P E P
2
(a + b) (a 2) (a + b) (a 1) (a + b) (a 1)
= 2
(a) (a 2 + b) (a) (a 1 + b) (a) (a 1 + b)
2
(a + b 2) (a + b 1) a + b 1 a+b 1
= 2
(a 2) (a 1) a 1 a 1
a+b 1 a+b 2 a+b 1
= 2 1
a 1 a 2 a 1
ab (a + b 1)
=
(a 1)2 (a 2)

provided a > 2 and a + b > 2.



Exercise 3.9.4
If T = Xi + Xj ; i 6= j; then
T Binomial(n; pi + pj )
The moment generating function of T = Xi + Xj is

M (t) = E etT + E et(Xi +Xj ) = E etXi +tXj


= M (0; : : : ; 0; t; 0; : : : ; 0; t; 0; : : : ; 0)
n
= p1 + + pi et + + pj et + + pk 1 + pk for t 2 <
n
= (pi + pj ) et + (1 pi pj ) for t 2 <

which is the moment generating function of a Binomial(n; pi + pj ) random variable. There-


fore by the Uniqueness Theorem for Moment Generating Functions, T has a Binomial(n; pi + pj )
distribution.

9.3 Chapter 4
Exercise 4.1.2
The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions
E and F shown in Figure 9.3For s > 1

x=y/s
x=y
1

F E

1 x
0

Y
G (s) = P (S s) = P s = P (Y sX) = P (Y sX 0)
X
Z Z Z1 Zy Z1
= 3ydxdy = 3ydxdy = 3y xjyy=s dy
(x;y) 2 E y=0 x=y=s 0

Z1 Z1
y 1 1 1
= 3y y dy = 1 3y 2 dy = 1 y 3 j10 = 1
s s s s
0 0

The cumulative distribution function for S is


(
0 s 1
G (s) = 1
1 s s>1

1 1
As a check we note that lim 1 s = 0 = G (1) and lim 1 s = 1 so G (s) is a
s!1+ s!1
continuous function for all s 2 <.
For s > 1
d d 1 1
g (s) = G (s) = 1 =
ds ds s s2
The probability density function of S is
1
g (s) = for s > 1
s2
and 0 otherwise.

Exercise 4.2.5
Since X Exponential(1) and Y Exponential(1) independently, the joint probability
density function of X and Y is
x y
f (x; y) = f1 (x) f2 (y) = e e
x y
= e

with support set $R_{XY} = \{(x,y) : x > 0,\ y > 0\}$ which is shown in Figure 9.9.
Figure 9.9: Support set $R_{XY}$ for Example 4.2.5
The transformation
$$S: \quad U = X + Y, \quad V = X - Y$$
has inverse transformation
U +V U V
X= , Y =
2 2
Under S the boundaries of RXY are mapped as

(k; 0) ! (k; k) for k 0


(0; k) ! (k; k) for k 0

and the point (1; 2) is mapped to and (3; 1). Thus S maps RXY into

RU V = f(u; v) : u < v < u; u > 0g


as shown in Figure 9.10.


Figure 9.10: Support set RU V for Example 4.2.5

The Jacobian of the inverse transformation is


@x @x 1 1
@ (x; y) @u @v 2 2 1
= @y @y = 1 1 =
@ (u; v) @u @v 2 2 2

The joint probability density function of U and V is given by

u+v u v 1
g (u; v) = f ;
2 2 2
1 u
= e for (u; v) 2 RU V
2
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u.
Z1
g1 (u) = g (u; v) dv
1
Zu
1 u
= e dv
2
v= u
1
= ue u (2)
2
= ue u for u > 0

and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable.
Therefore U Gamma(2; 1).

To …nd the marginal probability density functions for V we need to consider the two
cases v 0 and v < 0. For v 0
Z1
g2 (v) = g (u; v) du
1
Z1
1 u
= e du
2
u=v
1 u b
= lim e jv
2 b!1
1 b v
= lim e e
2 b!1
1 v
= e
2
For v < 0
Z1
g2 (v) = g (u; v) du
1
Z1
1 u
= e du
2
u= v
1 u b
= lim e j v
2 b!1
1 b
= lim e ev
2 b!1
1 v
= e
2
Therefore the probability density function of V is
( 1 v
2e v<0
g2 (v) = 1 v
2e v 0

which is the probability density function of Double Exponential(0; 1) random variable.


Therefore V Double Exponential(0; 1).
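A small simulation in R, added here and not part of the original solution, is consistent with both conclusions.

# simulation check for Exercise 4.2.5: U = X + Y and V = X - Y with X, Y iid Exponential(1)
set.seed(330)
x <- rexp(100000); y <- rexp(100000)
u <- x + y; v <- x - y
c(mean(u), var(u))    # Gamma(2,1): mean 2, variance 2
c(mean(v), var(v))    # Double Exponential(0,1): mean 0, variance 2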

Exercise 4.2.7
Since X Beta(a; b) and Y Beta(a + b; c) independently, the joint probability density
function of X and Y is
(a + b) a 1 (a + b + c) a+b
f (x; y) = x (1 x)b 1
y 1
(1 y)c 1
(a) (b) (a + b) (c)
(a + b + c) a
= x 1
(1 x)b 1
y a+b 1
(1 y)c 1
(a) (b) (c)

with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g as shown in Figure 9.11.


Figure 9.11: Support RXY for Exercise 4.2.5

The transformation
S : U = XY , V = X
has inverse transformation
X =V, Y = U=V
Under S the boundaries of RXY are mapped as

(k; 0) ! (0; k) 0 k 1
(0; k) ! (0; 0) 0 k 1
(1; k) ! (k; 1) 0 k 1
(k; 1) ! (k; k) 0 k 1

1 1 1 1
and the point 2; 3 is mapped to and 6; 2 . Thus S maps RXY into

RU V = f(u; v) : 0 < u < v < 1g


as shown in Figure 9.12


Figure 9.12: Support set RU V for Exercise 4.2.5

The Jacobian of the inverse transformation is


@ (x; y) 0 1 1
= 1 @y =
@ (u; v) v @v
v

The joint probability density function of U and V is given by


u 1
g (u; v) = f v;
v v
(a + b + c) a 1 u a+b 1 u c 1 1
= v (1 v)b 1 1
(a) (b) (c) v v v
(a + b + c) a+b 1 u c 1
= u (1 v)b 1 v b 1
1 for (u; v) 2 RU V
(a) (b) (c) v
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u.
Z1 Z1
(a + b + c) a+b u c 1
g (u) = g (u; v) dv = u 1
(1 v)b 1
v b 1
1 dv
(a) (b) (c) v
1 v=u
Z1
(a + b + c) a+b u c 1 1
= u 1
v b+1
(1 v)b 1
1 dv
(a) (b) (c) v v2
u
Z1 b 1
(a + b + c) a 1 1 u c 1 u
= u ub 1
1 1 dv
(a) (b) (c) v v v2
u
Z1
(a + b + c) a 1 u b 1 u c 1 u
= u u 1 dv
(a) (b) (c) v v v2
u

To evaluate this integral we make the substitution


u
v u
t=
1 u
Then
u
u = (1 u) t
v
u
1 = 1 u (1 u) t = (1 u) (1 t)
v
u
= (1 u) dt
v2
When v = u then t = 1 and when v = 1 then t = 0. Therefore
Z1
(a + b + c) a
g1 (u) = u 1
[(1 u) t]b 1
[(1 u) (1 t)]c 1
(1 u) dt
(a) (b) (c)
0
Z1
(a + b + c) a b+c 1
= u 1
(1 u) tb 1
(1 t)c 1
dt
(a) (b) (c)
0
(a + b + c) a 1 (b) (c)
= u (1 u)b+c 1
(a) (b) (c) (b + c)
(a + b + c) a 1
= u (1 u)b+c 1 for 0 < u < 1
(a) (b + c)
and 0 otherwise which is the probability density function of a Beta(a; b + c) random variable.
Therefore U Beta(a; b + c).

Exercise 4.2.12
(a) Consider the transformation
X=n
S: U= , V =Y
Y =m
which has inverse transformation
n
X= UV , Y = V
m
Since X 2 (n) independently of Y 2 (m) then the joint probability density function
of X and Z is
1 1
f (x; y) = f1 (x) f2 (y) = xn=2 1
e x=2
y m=2 1
e y=2
2n=2 (n=2) 2m=2 (m=2)
1
= xn=2 1
e x=2 m=2 1
y e y=2
2(n+m)=2 (n=2) (m=2)
with support set RXY = f(x; y) : x > 0; y > 0g. The transformation S maps RXY into
RU V = f(u; v) : u > 0; v > 0g.

The Jacobian of the inverse transformation is


@x @x n @x
@ (x; y) @u @v m v @v n
= @y @y = = v
@ (u; v) @u @v 0 1 m

The joint probability density function of U and V is given by


n n
g (u; v) = f uv; v v
m m
1 n n=2 1 n n
= (uv)n=2 1
e ( m )uv=2 v m=2 1
e v=2
v
2(n+m)=2 (n=2) (m=2) m m
n n=2 n=2 1
u nu
= m
(n+m)=2
v (n+m)=2 1 e v( m +1)=2 for (u; v) 2 RU V
2 (n=2) (m=2)

and 0 otherwise.
To determine the distribution of U we still need to …nd the marginal probability density
function for U .
Z1
g1 (u) = g (u; v) dv
1
n n=2 n=2 1 Z1
u nu
= m
(n+m)=2
v (n+m)=2 1 e v( m +1)=2 dv
2 (n=2) (m=2)
0

1 1
n
Let y = v2 1 + m n
u so that v = 2y 1 + m u and dv = 2 1 + n
mu dy. Note that when
v = 0 then y = 0, and when v ! 1 then y ! 1. Therefore

n n=2 n=2 1 Z1
u n 1 (n+m)=2 1 n 1
m y
g1 (u) = 2y 1 + u e 2 1+ u dy
2(n+m)=2 (n=2) (m=2) m m
0
n n=2 n=2 1 (n+m)=2 Z1
u 2 n 1 (n+m)=2
= m
1+ u y (n+m)=2 1
e y
dy
2(n+m)=2 (n=2) (m=2) m
0
n n=2 n=2 1
m u n (n+m)=2 n+m
= 1+ u
(n=2) (m=2) m 2
n n=2 n+m
n (n+m)=2
= m 2
un=2 1
1+ u for u > 0
(n=2) (m=2) m

and 0 otherwise which is the probability density function of a random variable with a
F(n; m) distribution. Therefore U = YX=n
=m F(n; m).
(b) To …nd E (U ) we use

X=n m 1
E (U ) = E = E (X) E Y
Y =m n

since X 2 (n) independently of Y 2 (m). From Example 4.2.10 we know that if

W 2 (k) then
2p (k=2 + p) k
E (W p ) = for + p > 0
(k=2) 2
Therefore
2 (n=2 + 1)
E (X) = =n
(n=2)
2 1 (m=2 1) 1 1
1
E Y = = = if m > 2
(m=2) 2 (m=2 1) m 2
and
m 1 m
E (U ) = (n) E = if m > 2
n m 2 m 2
To …nd V ar (U ) we need
" #
(X=n)2 m2
E U2 =E = E X2 E Y 2
(Y =m)2 n2

Now
22 (n=2 + 2) n n
E X2 = =4 +1 = n (n + 2)
(n=2) 2 2
2 2 (m=2 2) 1 1
2
E Y = = m m = for m > 4
(m=2) 4 2 1 2 2 (m 2) (m 4)
and
m2 1 n+2 m2
E U2 = n (n + 2) = for m > 4
n2 (m 2) (m 4) n (m 2) (m 4)

Therefore

V ar (U ) = E U 2 [E (U )]2
2
n+2 m2 m
=
n (m 2) (m 4) m 2
m 2 n+2 1
=
m 2 n (m 4) m 2
m2 (n + 2) (m 2) n (m 4)
=
m 2 n (m 4) (m 2)
m 2 2 (n + m 2)
=
m 2 n (m 4) (m 2)
2m2 (n + m 2)
= for m > 4
n (m 2)2 (m 4)

Exercise 4.3.3
X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables with moment
generating function M (t), E (Xi ) = , and V ar (Xi ) = 2 < 1.
p
The moment generating function of Z = n X = is
p
n(X )=
MZ (t) = E etZ = E et
p
p
n = t t n1 P n
= e E exp Xi
n i=1
p
n = t t P n
= e E exp p Xi
n i=1
t Q
p n t
n =
= e E exp p Xi since X1 ; X2 ; : : : ; Xn are independent
i=1 n
t Q
p n t
n =
= e M p since X1 ; X2 ; : : : ; Xn are identically distributed
i=1 n
p n
n = t t
= e M p
n

Exercise 4.3.7
P
n P
n
2
(Xi )2 = Xi X +X
i=1 i=1
Pn
2 P
n P
n
2
= Xi X +2 X Xi X + X
i=1 i=1 i=1
Pn
2 2
= Xi X +n X
i=1

since
P
n P
n P
n P
n
Xi X = Xi X= Xi nX
i=1 i=1 i=1 i=1
Pn 1 Pn
= Xi n Xi
i=1 n i=1
Pn P
n
= Xi Xi
i=1 i=1
= 0

Exercise 4.3.11
Since X1 ; X2 ; : : : ; Xn are independent N 1 ; 21 random variables then by Theorem
4.3.8
Pn
2
2 Xi X
(n 1) S1
U= 2 = i=1 2
2
(n 1)
1 1

Since Y1 ; Y2 ; : : : ; Ym are independent N 2


2; 2 random variables then by Theorem 4.3.8

P
m
2
Yi Y
(m 1) S22 i=1 2
V = 2 = 2 (m 1)
2 2

U and V are independent random variables since X1 ; X2 ; : : : ; Xn are independent of


Y1 ; Y2 ; : : : ; Ym .
Now
2
(n 1)S1
2
S12 = 2
1
1
n 1
=
S22 = 2
2
2
(m 1)S2
2
2
m 1
U= (n 1)
=
V = (m 1)

Therefore by Theorem 4.2.11

S12 = 2
1
F (n 1; m 1)
S22 = 2
2

9.4 Chapter 5
Exercise 5.4.4
If $X_n \sim$ Binomial($n, p$) then
$$M_n(t) = E\left(e^{tX_n}\right) = \left(pe^t + q\right)^n \quad \text{for } t \in \Re \qquad (9.7)$$
If $\mu = np$ then
$$p = \frac{\mu}{n} \quad \text{and} \quad q = 1 - \frac{\mu}{n} \qquad (9.8)$$
Substituting (9.8) into (9.7) and simplifying gives
$$M_n(t) = \left(\frac{\mu}{n}e^t + 1 - \frac{\mu}{n}\right)^n = \left[1 + \frac{\mu(e^t - 1)}{n}\right]^n \quad \text{for } t \in \Re$$
Now
$$\lim_{n\to\infty}\left[1 + \frac{\mu(e^t - 1)}{n}\right]^n = e^{\mu(e^t - 1)} \quad \text{for } t \in \Re$$
by Corollary 5.1.3. Since $M(t) = e^{\mu(e^t - 1)}$ for $t \in \Re$ is the moment generating function of a Poisson($\mu$) random variable then by Theorem 5.4.1, $X_n \to_D X \sim$ Poisson($\mu$).
By Theorem 5.4.2
$$P(X_n = x) = \binom{n}{x}p^x q^{n-x} \approx \frac{(np)^x e^{-np}}{x!} \quad \text{for } x = 0, 1, \ldots$$
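The following R comparison, added here and not part of the original solution, shows how close the two probability functions are; $n = 100$ and $p = 0.03$ are arbitrary illustrative values.

# Binomial(n, p) probabilities versus the Poisson(np) approximation
n <- 100; p <- 0.03
x <- 0:8
round(cbind(binomial = dbinom(x, n, p), poisson = dpois(x, n*p)), 4)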

Exercise 5.4.7
Let Xi Binomial(1; p), i = 1; 2; : : : independently. Since X1 ; X2 ; : : : are independent and
identically distributed random variables with E (Xi ) = p and V ar (Xi ) = p (1 p), then
p
n Xn p
p !D Z N(0; 1)
p (1 p)
P
n
by the Central Limit Theorem. Let Sn = Xi . Then
i=1

1 P
n
n n Xi p p
S np n Xn p
p n = p p
i=1
= p !D Z N(0; 1)
np (1 p) n p (1 p) p (1 p)

Now by 4.3.2(1) Sn Binomial(n; p) and therefore Yn and Sn have the same distribution.
It follows that
Yn np
Zn = p !D Z N(0; 1)
np (1 p)

Exercise 5.5.3
(a) Let g (x) = x2 which is a continuous function for all x 2 <. Since Xn !p a then by
5.5.1(1), Xn2 = g (Xn ) !p g (a) = a2 or Xn2 !p a2 .
(b) Let g (x; y) = xy which is a continuous function for all (x; y) 2 <2 . Since Xn !p a and
Yn !p b then by 5.5.1(2), Xn Yn = g (Xn ; Yn ) !p g (a; b) = ab or Xn Yn !p ab.
(c) Let g (x; y) = x=y which is a continuous function for all (x; y) 2 <2 ; y 6= 0. Since
Xn !p a and Yn !p b 6= 0 then by 5.5.1(2), Xn =Yn = g (Xn ; Yn ) !p g (a; b) = a=b , b 6= 0
or Xn =Yn !p a=b, b 6= 0.
(d) Let g (x; z) = x 2z which is a continuous function for all (x; z) 2 <2 . Since Xn !p a
and Zn !D Z N(0; 1) then by Slutsky’s Theorem, Xn 2Zn = g (Xn ; Zn ) !D g (a; Z) =
a 2Z or Xn 2Zn !D a 2Z where Z N(0; 1). Since a 2Z N(a; 4), therefore
Xn 2Zn !D a 2Z N(a; 4)
(e) Let g (x; z) = 1=z which is a continuous function for all (x; z) 2 <2 ; z 6= 0. Since
Zn !D Z N(0; 1) then by Slutsky’s Theorem, 1=Zn = g (Xn ; Zn ) !D g (a; z) = 1=Z
or 1=Zn !D 1=Z where Z N(0; 1). Since h (z) = 1=z is a decreasing function for all z 6= 0
then by Theorem 2.6.8 the probability density function of W = 1=Z is

1 1=(2w2 ) 1
f (w) = p e for z 6= 0
2 w2

Exercise 5.5.8
By (5.9) p
n Xn
p !D Z N(0; 1)

and by Slutsky’s Theorem


p p
Un = n Xn !D Z N(0; ) (9.9)
p 1 1
Let g (x) = x, a = , and b = 1=2. Then g 0 (x) = p
2 x
and g 0 (a) = g 0 ( ) = 2
p . By (9.9)
and the Delta Method
p p 1 p 1 1
n1=2 Xn !D p ( Z) = Z N 0;
2 2 4

9.5 Chapter 6
Exercise 6.4.6
The probability density function of a Exponential(1; ) random variable is
(x )
f (x; ) = e for x and 2<

and zero otherwise. The support set of the random variable X is [ ; 1) which depends on
the unknown parameter .
The likelihood function is
Qn
L( ) = f (xi ; )
i=1
Q
n
(xi )
= e if xi , i = 1; 2; : : : ; n and 2<
i=1
Qn
xi
= e en if xi and 2<
i=1

or more simply 8
<0 if > x(1)
L( ) =
:en if x(1)
where $x_{(1)} = \min(x_1, x_2, \ldots, x_n)$ is the smallest observation in the sample. (Note: in order to observe the sample $x_1, x_2, \ldots, x_n$ the value of $\theta$ must be smaller than all the observed $x_i$'s.) $L(\theta)$ is an increasing function of $\theta$ on the interval $(-\infty, x_{(1)}]$. $L(\theta)$ is maximized at $\theta = x_{(1)}$.
The maximum likelihood estimate of is ^ = x(1) and the maximum likelihood estimator
is ~ = X(1) .
Note that in this example there is no solution to dd l ( ) = dd (n ) = 0 and the maximum
likelihood estimate of is not found by solving dd l ( ) = 0.
If n = 12 and x(1) = 2 8
<0 if > 2
L( ) =
:e12 if 2
The relative likelihood function is
8
<0 if >2
R( ) =
:e12( 2) if 2

is graphed in Figure 9.13 along with lines for determining 10% and 50% likelihood intervals.
To determine the value of at which the horizontal line R = p intersects the graph of
R( ) we solve e12( 2) = p to obtain = 2 + log p=12. Since R( ) = 0 if > 2 then a
100p% likelihood interval for is of the form [2 + log (p) =12; 2]. For p = 0:1 we obtain the
10% likelihood interval [2 + log (0:1) =12; 2] = [1:8081; 2]. For p = 0:5 we obtain the 50%
likelihood interval [2 + log (0:5) =12; 2] = [1:9422; 2].
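The following R lines, added here as a check and not part of the original solution, reproduce the two likelihood intervals quoted above for $n = 12$ and $x_{(1)} = 2$.

# 100p% likelihood interval for theta: [x(1) + log(p)/n, x(1)]
n <- 12; xmin <- 2
c(xmin + log(0.1)/n, xmin)   # 10% likelihood interval: [1.8081, 2]
c(xmin + log(0.5)/n, xmin)   # 50% likelihood interval: [1.9422, 2]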


Figure 9.13: Relative likelihood function for Exercise 6.4.6

Exercise 6.7.3
By Chapter 5, Problem 7 we have
p Xn
n n
Q(Xn ; ) = q !D Z N(0; 1) (9.10)
Xn Xn
n 1 n

and therefore Q(Xn ; ) is an asymptotic pivotal quantity.


Let a be the value such that P (Z a) = (1 + p) =2 where Z N(0; 1). Then by (9.10) we
have
0 1
p Xn
n n
p P@ a q aA
Xn Xn
n 1 n
s s !
Xn a Xn Xn Xn a Xn Xn
= P p 1 +p 1
n n n n n n n n

and an approximate 100p% equal tail con…dence interval for is


r r
xn a xn xn xn a xn xn
p 1 ; +p 1
n n n n n n n n
or 2 v v 3
u u
u^ 1 ^ u^ 1 ^
6 t t 7
6^ a ;^ + a 7
4 n n 5

where ^ = xn =n.
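A minimal R sketch of this interval, added here and not part of the original solution, is shown below; the values $x = 42$, $n = 100$ and $p = 0.95$ are hypothetical illustrative choices.

# approximate 100p% confidence interval for theta based on x successes in n trials
x <- 42; n <- 100; p <- 0.95
thetahat <- x/n
a <- qnorm((1 + p)/2)
thetahat + c(-1, 1)*a*sqrt(thetahat*(1 - thetahat)/n)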

Exercise 6.7.12
(a) The likelihood function is

Q
n Q
n e (x )
L( ) = f (xi ; ) =
i=1 i=1 1+e (x ) 2

P
n Q
n 1
= exp xi en for 2<
i=1 i=1 1+e (xi ) 2

or more simply
Q
n 1
L( ) = en for 2<
i=1 1+e (xi ) 2

The log likelihood function is


P
n
(xi )
l( ) = n 2 log 1 + e for 2<
i=1

The score function is


d P
n e (xi )
S( ) = l( ) = n 2 (xi )
d i=1 1 + e
Pn 1
= n 2 xi
for 2 <
i=1 1 + e

Notice that S( ) = 0 cannot be solved explicitly. The maximum likelihood estimate can
only be determined numerically for a given sample of data x1 ; x2 ; : : : ; xn . Note that since
" #
d Pn e xi
S( )= 2 2 for 2 <
d i=1 (1 + exi )

is negative for all values of > 0 then we know that S ( ) is always decreasing so there
is only one solution to S( ) = 0. Therefore the solution to S( ) = 0 gives the maximum
likelihood estimate.
The information function is
d2
I( ) = l( )
d 2
P
n e xi
= 2 2 for 2<
i=1 (1 + exi )

(b) If X Logistic( ; 1) then the cumulative distribution function of X is


1
F (x; ) = for x 2 <; 2<
1 + e (x )

Solving
1
u=
1 + e (x )

gives
1
x= log 1
u
Therefore the inverse cumulative distribution function is

1 1
F (u) = log 1 for 0 < u < 1
u
1
If u is an observation from the Uniform(0; 1) distribution then log u 1 is an obser-
vation from the Logistic( ; 1) distribution by Theorem 2.6.6.
(c) The generated data are

0:18 0:05 0:32 0:78 1:04 1:11 1:26 1:41 1:50 1:57
1:58 1:60 1:68 1:68 1:71 1:89 1:93 2:02 2:2:5 2:40
2:47 2:59 2:76 2:78 2:87 2:91 4:02 4:52 5:25 5:56

(d) Here is R code for calculating and graphing the likelihood function for these data.
# function for calculating Logistic information for data x and theta=th
LOLF<-function(th,x)
{n<-length(x)
L<-exp(n*th)*(prod(1+exp(th-x)))^(-2)
return(L)}
th<-seq(1,3,0.01)
L<-sapply(th,LOLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
The graph of the likelihood function is given in Figure 9.5.
(e) Here is R code for Newton's
Method. It requires functions for calculating the score and information


# function for calculating Logistic score for data x and theta=th
LOSF<-function(th,x)
{n<-length(x)
S<-n-2*sum(1/(1+exp(x-th)))
return(S)}
#
# function for calculating Logistic information for data x and theta=th
LOIF<-function(th,x)
{n<-length(x)
I<-2*sum(exp(x-th)/(1+exp(x-th))^2)
return(I)}
#
# Newton’s Method for Logistic Example
NewtonLO<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+LOSF(thold,x)/LOIF(thold,x)
print(thnew)}
return(thnew)}
# use Newton’s Method to find the maximum likelihood estimate
# use the mean of the data to begin Newton’s Method
# since theta is the mean of the distribution
thetahat<-NewtonLO(mean(x),x)
cat("thetahat = ",thetahat)
The maximum likelihood estimate found using Newton's Method is $\hat{\theta} = 2.018099$.

(f ) Here is R code for determining the values of S(^) and I(^).


# calculate Score(thetahat) and the observed information
Sthetahat<-LOSF(thetahat,x)
cat("S(thetahat) = ",Sthetahat)
Ithetahat<-LOIF(thetahat,x)
cat("Observed Information = ",Ithetahat)
The values of $S(\hat{\theta})$ and $I(\hat{\theta})$ are
$$S(\hat{\theta}) = 3.552714 \times 10^{-15} \qquad I(\hat{\theta}) = 11.65138$$

(g) Here is R code for plotting the relative likelihood function for based on these data.
# function for calculating Logistic relative likelihood function
LORLF<-function(th,thetahat,x)
{R<-LOLF(th,x)/LOLF(thetahat,x)
return(R)}
#
# plot the Logistic relative likelihood function
#plus a line to determine the 15% likelihood interval
th<-seq(1,3,0.01)
R<-sapply(th,LORLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
abline(a=0.15,b=0,col="red",lwd=2)
The graph of the relative likelihood function is given in Figure 9.5.

(h) Here is R code for determining the 15% likelihood interval and the approximate 95%
con…dence interval (6.18)
# determine a 15% likelihood interval using uniroot
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=1,upper=1.8)$root
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=2.2,upper=3)$root
# calculate an approximate 95% confidence intervals for theta
L95<-thetahat-1.96/sqrt(Ithetahat)
U95<-thetahat+1.96/sqrt(Ithetahat)
cat("Approximate 95% confidence interval = ",L95,U95) # display values

The 15% likelihood interval is


[1:4479; 2:593964]
The approximate 95% con…dence interval is

[1:443893; 2:592304]

which are very close due to the symmetric nature of the likelihood function.

9.6 Chapter 7
Exercise 7.1.11
If x1 ; x2 ; :::; xn is an observed random sample from the Gamma( ; ) distribution then the
likelihood function for ; is
Q
n
L( ; ) = f (xi ; ; )
i=1
Q
n xi 1 xi =
= e for > 0; >0
i=1 ( )
1
n Q
n t2
= [ ( ) ] xi exp for > 0; >0
i=1
where
P
n
t2 = xi
i=1
or more simply
n Q
n t2
L( ; ) = [ ( ) ] xi exp for > 0; >0
i=1
The log likelihood function is
l ( ; ) = log L ( ; )
t2
= n log ( ) n log + t1 for > 0; >0

where
P
n
t1 = log xi
i=1
The score vector is
h i
@l @l
S( ; ) = @ @
h i
t2 n
= n ( ) + t1 n log 2

where
d
(z) = log (z)
dz
is the digamma function.
The information matrix is
2 3
@2l @2l
@ 2 @ @
I( ; ) = 4 @2l @2l
5
@ @ @ 2
2 3
0 n
n ( )
= 4 n 2t2 n
5
3 2

2 3
0 1
( )
= n4 1 2x
5
3 2

where
0 d
(z) = (z)
dz
is the trigamma function.
The expected information matrix is

J ( ; ) = E [I ( ; ) ; X1 ; :::; Xn ]
2 0 1
3
( )
= n4 2E (X; ; )
1
5
3 2

2 3
0 1
( )
= n4 1 2
5
2 2

since E X; ; = .
S ( ; ) = (0 0) must be solved numerically to …nd the maximum likelihood estimates
of and .

Exercise 7.1.14
The data are

1:58 2:78 2:81 3:29 3:45 3:64 3:81 4:69 4:89 5:37
5:47 5:52 5:87 6:07 6:11 6:12 6:26 6:42 6:74 7:49
7:93 7:99 8:14 8:31 8:72 9:26 10:10 12:82 15:22 17:82

The maximum likelihood estimates of and can be found using Newton’s Method
h i h i h i 1
(i+1) (i+1) (i) (i) (i) (i) (i) (i)
= +S ; I ; for i = 0; 1; : : :

Here is R code for Newton’s Method for the Gamma Example.


# function for calculating Gamma score for a and b and data x
GASF<-function(a,b,x)
{t1<-sum(log(x))
t2<-sum(x)
n<-length(x)
S<-c(t1-n*(digamma(a)+log(b)),t2/b^2-n*a/b)
return(S)}
#
# function for calculating Gamma information for a and b and data x
GAIF<-function(a,b,x)
{I<-length(x)*cbind(c(trigamma(a),1/b),c(1/b,2*mean(x)/b^3-a/b^2))
return(I)}
#
# Newton’s Method for Gamma Example
NewtonGA<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+GASF(thold[1],thold[2],x)%*%solve(GAIF(thold[1],thold[2],x))
print(thnew)}
return(thnew)}
thetahat<-NewtonGA(2,2,x)

The maximum likelihood estimates are ^ = 4:118407 and ^ = 1:657032.


The score vector evaluated at (^ ; ^ ) is
h i
S(^ ; ^ ) = 0 1:421085 10 14

which indicates we have obtained a local extrema.


The observed information matrix is
" #
^ 8:239505 18:10466
I(^ ; ) =
18:10466 44:99752

Note that since det[I(^ ; ^ )] = (8:239505) (44:99752) (18:10466)2 > 0 and


[I(^ ; ^ )]11 = 8:239505 > 0 then by the second derivative test we have found the maximum
likelihood estimates.

Exercise 7.2.3
(a) The following R code generates the required likelihood regions.
# function for calculating Gamma relative likelihood for parameters a and b and
data x
GARLF<-function(a,b,that,x)
{t<-prod(x)
t2<-sum(x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(ah)*bh^ah)/(gamma(a)*b^a))^n * t^(a-ah)*exp(t2*(1/bh-1/b))
return(L)}
a<-seq(1,8.5,0.02)
b<-seq(0.2,4.5,0.01)
R<-outer(a,b,FUN = GARLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) are shown in Figure 9.14.
Figure 9.14: Likelihood regions for Gamma for 30 observations



The likelihood contours are not very elliptical in shape. The contours suggest that large
values of together with small values of or small values of together with large values
of are plausible given the observed data.
(b) Since R (3; 2:7) = 0:14 the point (3; 2:7) lies inside a 10% likelihood region so it is a
plausible value of ( ; ).
(d) The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for 100 observations are
shown in Figure 9.15. We note that for a larger number of observations the likelihood
regions are more elliptical in shape.
Figure 9.15: Likelihood regions for Gamma for 100 observations



Exercise 7.4.4
The following R code graphs the approximate con…dence regions. The function ConfRegion
was used in Example 7.4.3.
# graph approximate confidence regions
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
Figure 9.16: Approximate con…dence regions for Gamma for 30 observations

These approximate con…dence regions which are ellipses are very di¤erent than the
likelihood regions in Figure 9.14. In particular we note that ( ; ) = (3; 2:7) lies inside a
10% likelihood region but outside a 99% approximate con…dence region.
There are only 30 observations and these di¤erences suggest the Normal approximation
is not very good. The likelihood regions are a better summary of the uncertainty in the
estimates.

Exercise 7.4.7

Let " #
h i 1 v^11 v^12
I(^ ; ^ ) =
v^12 v^22
Since " #!
h i h i 1 0
~ ~ [J(~ ; ~ )]1=2 !D Z BVN 0 0 ;
0 1

then for large n, V ar(~ ) v^11 , V ar( ~ ) v^22 and Cov(~ ; ~ ) v^12 . Therefore an approxi-
mate 95% con…dence interval for a is given by
p p
[^ 1:96 v^11 ; ^ + 1:96 v^11 ]

and an approximate 95% con…dence interval for b is given by


p p
[ ^ 1:96 v^22 ; ^ + 1:96 v^22 ]

For the data in Exercise 7.1.14, ^ = 4:118407 and ^ = 1:657032 and


h i 1
I(^ ; ^ )
" # 1
8:239505 18:10466
=
18:10466 44:99752
" #
1:0469726 0:4212472
=
0:4212472 0:1917114

An approximate 95% marginal con…dence interval for is


p p
[4:118407 + 1:96 1:0469726; 4:118407 1:96 1:0469726] = [2:066341; 6:170473]

An approximate 95% con…dence interval for is


p p
[1:657032 1:96 0:1917114; 1:657032 + 1:96 0:1917114] = [1:281278; 2:032787]

To obtain an approximate 95% marginal con…dence interval for + we note that

V ar(~ + ~ ) = V ar(~ ) + V ar( ~ ) + 2Cov(~ ; ~ )


v^11 + v^22 + 2^
v12 = v^

so that an approximate 95% con…dence interval for + is given by


p p
[^ + ^ 1:96 v^; ^ + ^ + 1:96 v^]

For the data in Example 7.1.13

^ + ^ = 4:118407 + 1:657032 = 5:775439


v^ = v^11 + v^22 + 2^
v12 = 1:0469726 + 0:1917114 + 2( 0:4212472) = 0:3961895

and an approximate 95% marginal con…dence interval for + is


p p
[5:775439 + 1:96 0:3961895; 5:775439 1:96 0:3961895] = [4:998908; 6:551971]

9.7 Chapter 8
Exercise 8.1.7
(a) Let $X$ = number of successes in $n$ trials. Then $X \sim$ Binomial($n, \theta$) and $E(X) = n\theta$. If the null hypothesis is $H_0: \theta = \theta_0$ and the alternative hypothesis is $H_A: \theta \ne \theta_0$ then a suitable test statistic is $D = |X - n\theta_0|$.
For $n = 100$, $x = 42$, and $\theta_0 = 0.5$ the observed value of $D$ is $d = |x - n\theta_0| = |42 - 50| = 8$. The p-value is
$$P\left(|X - n\theta_0| \ge |x - n\theta_0|;\ H_0: \theta = \theta_0\right) = P(|X - 50| \ge 8) \quad \text{where } X \sim \text{Binomial}(100, 0.5)$$
$$= P(X \le 42) + P(X \ge 58) = 0.06660531 + 0.06660531 = 0.1332106$$
calculated using R. Since the p-value $> 0.1$ there is no evidence based on the data against $H_0: \theta = 0.5$.
(b) If the null hypothesis is $H_0: \theta = \theta_0$ and the alternative hypothesis is $H_A: \theta < \theta_0$ then a suitable test statistic is $D = n\theta_0 - X$.
For $n = 100$, $x = 42$, and $\theta_0 = 0.5$ the observed value of $D$ is $d = n\theta_0 - x = 50 - 42 = 8$. The p-value is
$$P\left(n\theta_0 - X \ge n\theta_0 - x;\ H_0: \theta = \theta_0\right) = P(50 - X \ge 8) \quad \text{where } X \sim \text{Binomial}(100, 0.5)$$
$$= P(X \le 42) = 0.06660531$$
calculated using R. Since $0.05 <$ p-value $< 0.1$ there is weak evidence based on the data against $H_0: \theta = 0.5$.
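The R calculations behind these two p-values, added here and not part of the original solution, are:

# part (a): two-sided p-value, X ~ Binomial(100, 0.5), observed x = 42
pbinom(42, 100, 0.5) + pbinom(57, 100, 0.5, lower.tail = FALSE)   # P(X <= 42) + P(X >= 58) = 0.1332106
# part (b): one-sided p-value
pbinom(42, 100, 0.5)                                              # P(X <= 42) = 0.06660531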

Exercise 8.2.5
The model for these data is $(X_1, X_2, \ldots, X_7) \sim$ Multinomial($63; \theta_1, \theta_2, \ldots, \theta_7$) and the hypothesis of interest is $H_0: \theta_1 = \theta_2 = \cdots = \theta_7 = \frac{1}{7}$. Since the model and parameters are completely specified this is a simple hypothesis. Since $\sum_{j=1}^{7}\theta_j = 1$ there are only $k = 6$ parameters.
The likelihood ratio test statistic, which can be derived in the same way as Example 8.2.4, is
$$\Lambda(\mathbf{X};\Omega_0) = 2\sum_{j=1}^{7} X_j\log\frac{X_j}{E_j}$$
where $E_j = 63/7$ is the expected frequency for outcome $j$.
For these data the observed value of the likelihood ratio test statistic is
$$\lambda(\mathbf{x};\Omega_0) = 2\sum_{j=1}^{7} x_j\log\frac{x_j}{63/7} = 2\left[22\log\frac{22}{63/7} + 7\log\frac{7}{63/7} + \cdots + 6\log\frac{6}{63/7}\right] = 23.27396$$
The approximate p-value is
$$\text{p-value} \approx P(W \ge 23.27396) \quad \text{where } W \sim \chi^2(6)$$
$$= 0.0007$$
calculated using R. Since the p-value $< 0.001$ there is strong evidence based on the data against the hypothesis that the deaths are equally likely to occur on any day of the week.
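A short R sketch of this calculation is given below; it is not part of the original solution, and the vector obs is a hypothetical placeholder for the seven observed frequencies, of which only the first (22), second (7), and last (6) appear explicitly above.

# likelihood ratio test of equal probabilities over the 7 days (obs is a hypothetical placeholder)
obs <- c(22, 7, 5, 8, 7, 8, 6)              # hypothetical counts summing to 63
e <- sum(obs)/7                             # expected frequency 63/7 = 9 under H0
lambda <- 2*sum(obs*log(obs/e))
pchisq(lambda, df = 6, lower.tail = FALSE)  # approximate p-value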

Exercise 8.3.3
(a) = f( 1 ; 2 ) : 1 > 0; 2> 0g which has dimension k = 2 and
0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension q = 1 and the hypothesis
H0 : 1 = 2 is composite.
From Example 6.2.5 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Poisson( 1 ) distribution is
nx n
L1 ( 1 ) = 1 e
1
for 1 0

with maximum likelihood estimate ^1 = x.


Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an
Poisson( 2 ) distribution is
ny n
L2 ( 2 ) = 2 e
2
for 2 0

with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood
function for ( 1 ; 2 ) is

L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for 1 0; 2 0

and the log likelihood function

l ( 1; 2) = nx log 1 n 1 + my log 2 m 2 for 1 > 0; 2 >0

The independence of the samples also implies the maximum likelihood estimators are
~1 = X and ~2 = Y . Therefore

l(~1 ; ~2 ; X; Y) = nX log X nX + mY log Y mY


= nX log X + mY log Y nX + mY

If 1 = 2 = then the log likelihood function is

l ( ) = (nx + my) log (n + m) for >0

which is only a function of . To determine max l( 1 ; 2 ; X; Y) we note that


( 1 ; 2 )2 0

d nx + my
l( ) = (n + m)
d
d nx+my
and d l ( ) = 0 for = n+m and therefore

nX + mY
max l( 1 ; 2 ; X; Y) = nX + mY log nX + mY
( 1 ; 2 )2 0 n+m

The likelihood ratio test statistic is
$$\Lambda(\mathbf{X},\mathbf{Y};\Omega_0) = 2\left[l(\tilde{\theta}_1,\tilde{\theta}_2;\mathbf{X},\mathbf{Y}) - \max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1,\theta_2;\mathbf{X},\mathbf{Y})\right]$$
$$= 2\left[n\bar{X}\log\bar{X} + m\bar{Y}\log\bar{Y} - (n\bar{X}+m\bar{Y}) - (n\bar{X}+m\bar{Y})\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) + (n\bar{X}+m\bar{Y})\right]$$
$$= 2\left[n\bar{X}\log\bar{X} + m\bar{Y}\log\bar{Y} - (n\bar{X}+m\bar{Y})\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right)\right]$$
with corresponding observed value
$$\lambda(\mathbf{x},\mathbf{y};\Omega_0) = 2\left[n\bar{x}\log\bar{x} + m\bar{y}\log\bar{y} - (n\bar{x}+m\bar{y})\log\left(\frac{n\bar{x}+m\bar{y}}{n+m}\right)\right]$$
Since $k - q = 2 - 1 = 1$,
$$\text{p-value} \approx P\left[W \ge \lambda(\mathbf{x},\mathbf{y};\Omega_0)\right] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\lambda(\mathbf{x},\mathbf{y};\Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
(b) For $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$ the observed value of the likelihood ratio test statistic is $\lambda(\mathbf{x},\mathbf{y};\Omega_0) = 0.5344026$ and
$$\text{p-value} \approx P(W \ge 0.5344026) \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{0.5344026}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.4647618$$
calculated using R. Since the p-value $> 0.1$ there is no evidence against $H_0: \theta_1 = \theta_2$ based on
the data.
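As in Example 8.3.2, the following R lines, added here and not part of the original solution, compute $\lambda$ and the approximate p-value from the summaries above.

# two-sample Poisson LRT: data summaries from part (b)
n <- 10; sumx <- 22; m <- 15; sumy <- 40
xbar <- sumx/n; ybar <- sumy/m; pooled <- (sumx + sumy)/(n + m)
lambda <- 2*(n*xbar*log(xbar) + m*ybar*log(ybar) - (sumx + sumy)*log(pooled))
lambda                           # 0.5344026
2*(1 - pnorm(sqrt(lambda)))      # approximate p-value = 0.4647618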
10. Solutions to Selected End of
Chapter Problems


10.1 Chapter 2

1(a) Starting with
\[
\sum_{x=0}^{\infty} \theta^x = \frac{1}{1-\theta} \quad \text{for } |\theta| < 1
\]
it can be shown that
\begin{align}
\sum_{x=1}^{\infty} x\theta^{x} &= \frac{\theta}{(1-\theta)^2} \quad \text{for } |\theta| < 1 \tag{10.1}\\
\sum_{x=1}^{\infty} x^2\theta^{x-1} &= \frac{1+\theta}{(1-\theta)^3} \quad \text{for } |\theta| < 1 \tag{10.2}\\
\sum_{x=1}^{\infty} x^3\theta^{x-1} &= \frac{1+4\theta+\theta^2}{(1-\theta)^4} \quad \text{for } |\theta| < 1 \tag{10.3}
\end{align}
(1)
\[
\frac{1}{k} = \sum_{x=1}^{\infty} x\theta^{x} = \frac{\theta}{(1-\theta)^2} \quad \text{using (10.1), which gives } k = \frac{(1-\theta)^2}{\theta}
\]
and therefore
\[
f(x) = (1-\theta)^2\, x\,\theta^{x-1} \quad \text{for } x = 1, 2, \ldots;\ 0 < \theta < 1
\]
The graph of $f(x)$ in Figure 10.1 is for $\theta = 0.3$.


Figure 10.1: Graph of f (x)

(2)
\[
F(x) =
\begin{cases}
0 & \text{if } x < 1 \\[4pt]
\displaystyle\sum_{t=1}^{x} (1-\theta)^2\, t\,\theta^{t-1} = 1 - (1 + x - x\theta)\,\theta^{x} & \text{for } x = 1, 2, \ldots
\end{cases}
\]
Note that $F(x)$ is specified by indicating its value at each jump point.
(3)
\begin{align*}
E(X) &= \sum_{x=1}^{\infty} x\,(1-\theta)^2 x\theta^{x-1} = (1-\theta)^2 \sum_{x=1}^{\infty} x^2\theta^{x-1}
= (1-\theta)^2\,\frac{1+\theta}{(1-\theta)^3} \quad \text{using (10.2)} \\
&= \frac{1+\theta}{1-\theta}
\end{align*}
\begin{align*}
E\!\left(X^2\right) &= \sum_{x=1}^{\infty} x^2\,(1-\theta)^2 x\theta^{x-1} = (1-\theta)^2 \sum_{x=1}^{\infty} x^3\theta^{x-1}
= (1-\theta)^2\,\frac{1+4\theta+\theta^2}{(1-\theta)^4} \quad \text{using (10.3)} \\
&= \frac{1+4\theta+\theta^2}{(1-\theta)^2}
\end{align*}
\[
\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1+4\theta+\theta^2}{(1-\theta)^2} - \frac{(1+\theta)^2}{(1-\theta)^2} = \frac{2\theta}{(1-\theta)^2}
\]
(4) Using $\theta = 0.3$,
\[
P(0.5 < X \le 2) = P(X = 1) + P(X = 2) = 0.49 + (0.49)(2)(0.3) = 0.784
\]
\[
P(X > 0.5 \mid X \le 2) = \frac{P(X > 0.5,\ X \le 2)}{P(X \le 2)} = \frac{P(X = 1) + P(X = 2)}{P(X \le 2)} = 1
\]
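A quick numerical check of (4) in R, using the probability function derived in (1) with $\theta = 0.3$ (a sketch):

theta <- 0.3
f <- function(x) (1 - theta)^2 * x * theta^(x - 1)   # f(x) for x = 1, 2, ...
p12 <- f(1) + f(2)   # P(0.5 < X <= 2); also equals P(X <= 2) since the support starts at x = 1
p12                  # 0.784
p12 / p12            # P(X > 0.5 | X <= 2) = 1, since the two events coincide here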
1:(b) (1)
Z1 Z1
1 1 1
= dx = 2 dx because of symmetry
k 1 + (x= )2 1 + (x= )2
1 0
Z1
1 1
= 2 dy let y = x; dy = dx
1 + y2
0

= 2 lim arctan (b) = 2 =


b!1 2
1
Thus k = and
1
f (x) = h i for x 2 <; >0
1 + (x= )2

The graphs for $\theta = 0.5$, 1 and 2 are plotted in Figure 10.2. The graph for each different
value of $\theta$ is obtained from the graph for $\theta = 1$ by simply relabeling the x and y axes. That
is, on the x axis each point $x$ is relabeled $\theta x$, and on the y axis each point $y$ is relabeled
$y/\theta$. The graph of $f(x)$ below is for $\theta = 1$. Note that the graph of $f(x)$ is symmetric about
the y axis.


Figure 10.2: Cauchy probability density functions

(2)
Zx
1 1 t
F (x) = h i dt = lim arctan jxb
2 b! 1
1
1 + (t= )
1 x 1
= arctan + for x 2 <
2
(3) Consider the integral
Z1 Z1
x t 1
h i dx = dt = lim ln 1 + b2 = +1
1 + (x= )2 1 + t2 b!1 2
0 0

Since the integral


Z1
x
h i dx
0
1 + (x= )2
does not converge, the integral
Z1
x
h i dx
1
1 + (x= )2

does not converge absolutely and E (X) does not exist. Since E (X) does not exist V ar (X)
does not exist.
(4) Using $\theta = 1$,
\[
P(0.5 < X \le 2) = F(2) - F(0.5) = \frac{1}{\pi}\left[\arctan(2) - \arctan(0.5)\right] \approx 0.2048
\]
\[
P(X > 0.5 \mid X \le 2) = \frac{P(X > 0.5,\ X \le 2)}{P(X \le 2)}
= \frac{F(2) - F(0.5)}{F(2)}
= \frac{\arctan(2) - \arctan(0.5)}{\arctan(2) + \frac{\pi}{2}} \approx 0.2403
\]
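A quick check of these values in R using the cumulative distribution function from (2) with $\theta = 1$ (a sketch; for $\theta = 1$ this is the standard Cauchy distribution, so pcauchy gives the same answer):

theta <- 1
F <- function(x) atan(x / theta) / pi + 0.5   # c.d.f. from part (2)
F(2) - F(0.5)                 # P(0.5 < X <= 2), about 0.2048
(F(2) - F(0.5)) / F(2)        # P(X > 0.5 | X <= 2), about 0.2403
pcauchy(2) - pcauchy(0.5)     # same probability via the built-in Cauchy c.d.f.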

1:(c) (1)
Z1
1 jx j
= e dx let y = x , then dy = dx
k
1
Z1 Z1
jyj y
= e dy = 2 e dy by symmetry
1 0
= 2 (1) = 2 (0!) = 2
1
Thus k = 2 and
1
f (x) = e jx j for x 2 <; 2 <
2
The graphs for $\theta = -1$, 0 and 2 are plotted in Figure 10.3. The graph for each different
value of $\theta$ is obtained from the graph for $\theta = 0$ by simply shifting the graph for $\theta = 0$ to
the right $\theta$ units if $\theta$ is positive and to the left $|\theta|$ units if $\theta$ is
negative. Note that the graph of $f(x)$ is symmetric about the line $x = \theta$.
(2) 8 x
> R 1 t
>
< 1 2 e dt
> x
F (x) =
>
> R 1 t Rx 1 t+
>
: 2 e dt + 2e dt x >
1
8
< 1 ex x
2
=
:1 + 1 t+ jx 1 x+
2 2 e =1 2e x>


Figure 10.3: Double Exponential probability density functions

(3) Since the improper integral


Z1 Z1
(x ) y
(x )e dx = ye dy = (2) = 1! = 1
0

1
R1 jx j dx
converges, the integral 2 xe converges absolutely and by the symmetry of f (x)
1
we have E (X) = .
Z1
2 1
E X = x2 e jx j
dx let y = x , then dy = dx
2
1
Z1 Z1
1 2 jyj 1
= (y + ) e dy = y 2 + 2y + 2
e jyj
dy
2 2
1 1
Z1 Z1
= y2e y
dy + 0 + 2
e y
dy using the properties of even/odd functions
0 0
2 2
= (3) + (1) = 2! +
2
= 2+
Therefore
V ar(X) = E(X 2 ) [E (X)]2 = 2 + 2 2
=2
(4) Using $\theta = 0$,
\[
P(0.5 < X \le 2) = F(2) - F(0.5) = \frac{1}{2}\left(e^{-0.5} - e^{-2}\right) \approx 0.2356
\]
\[
P(X > 0.5 \mid X \le 2) = \frac{P(X > 0.5,\ X \le 2)}{P(X \le 2)} = \frac{F(2) - F(0.5)}{F(2)}
= \frac{\frac{1}{2}\left(e^{-0.5} - e^{-2}\right)}{1 - \frac{1}{2}e^{-2}} \approx 0.2527
\]
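A quick check of these values in R using the cumulative distribution function from (2) with $\theta = 0$ (a sketch):

F <- function(x) ifelse(x <= 0, exp(x) / 2, 1 - exp(-x) / 2)   # double exponential c.d.f., theta = 0
F(2) - F(0.5)             # P(0.5 < X <= 2), about 0.2356
(F(2) - F(0.5)) / F(2)    # P(X > 0.5 | X <= 2), about 0.2527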

1:(e) (1)
Z1
1 1
= x2 e x
dx let y = x; dy = dx
k
0
Z1
1 1 2! 2
= 3 y2e y
dy = 3 (3) = 3 = 3
0
3
Thus k = 2 and
1 3 2 x
f (x) =
x e for x 0; > 0
2
The graphs for $\theta = 0.5$, 1 and 2 are plotted in Figure 10.4. The graph for each different
value of $\theta$ is obtained from the graph for $\theta = 1$ by simply relabeling the x and y axes. That
is, on the x axis each point $x$ is relabeled $x/\theta$, and on the y axis each point $y$ is relabeled
$\theta y$.


Figure 10.4: Gamma probability density functions

(2) 8
>
<0 if x 0
F (x) = Rx
>
: 21 3 2
t e t dt if x > 0
0

Using integration by parts twice we have


2 3
Zx Zx
1 14
3 2
t e t
dt = ( t)2 e t jx0 + 2 2
te t dt5
2 2
0
2 80 93
< Zx =
14
= ( x)2 e x + 2 ( t) e t jx0 + e t dt 5
2 : ;
0
1h n oi
= ( x)2 e x 2 ( x) e x 2 e t jx0
2
1h i
= ( x)2 e x 2 ( x) e x 2e x + 2
2
1
= 1 e x 2 x2 + 2 x + 2 for x > 0
2

Therefore
8
<0 if x 0
F (x) =
:1 1 x 2 2
2e x +2 x+2 if x > 0

(3)

Z1
1 3 3 x 1
E (X) = x e dx let y = x; dy = dx
2
0
Z1
1 1 3! 3
= y3e y
dy = (4) = =
2 2 2
0

Z1
2 1 3 4 x 1
E X = x e dx let y = x; dy = dx
2
0
Z1
1 1 4! 12
= y4e y
dy = 2 (5) = 2 = 2
2 2 2 2
0

and
2
2 2 12 3 3
V ar (X) = E X [E (X)] = 2 = 2

(4) Using $\theta = 1$,
\begin{align*}
P(0.5 < X \le 2) &= F(2) - F(0.5) = \left[1 - \frac{1}{2}e^{-x}\left(x^2 + 2x + 2\right)\right]\Big|_{0.5}^{2} \\
&= \left[1 - \frac{1}{2}e^{-2}(4 + 4 + 2)\right] - \left[1 - \frac{1}{2}e^{-0.5}\left(\tfrac{1}{4} + 1 + 2\right)\right] \\
&= \frac{1}{2}\left[\frac{13}{4}e^{-0.5} - 10e^{-2}\right] \\
&\approx 0.3089
\end{align*}
\begin{align*}
P(X > 0.5 \mid X \le 2) &= \frac{P(X > 0.5,\ X \le 2)}{P(X \le 2)}
= \frac{P(0.5 < X \le 2)}{P(X \le 2)} = \frac{F(2) - F(0.5)}{F(2)} \\
&= \frac{\frac{1}{2}\left[\frac{13}{4}e^{-0.5} - 10e^{-2}\right]}{1 - \frac{1}{2}e^{-2}(10)} \\
&\approx 0.9555
\end{align*}
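A quick check in R (a sketch; for $\theta = 1$ the probability density function in (1) is the Gamma density with shape 3 and rate 1, so pgamma can be used):

F <- function(x) pgamma(x, shape = 3, rate = 1)   # matches F(x) in (2) when theta = 1
F(2) - F(0.5)             # P(0.5 < X <= 2), about 0.3089
(F(2) - F(0.5)) / F(2)    # P(X > 0.5 | X <= 2), about 0.9555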

1:(g) (1) Since


ex= ex= e x=
f ( x) = 2 = 2 = 2 = f (x)
1 + ex= e2x= 1+e x= 1+e x=

therefore f is an even function which is symmetric about the y axis.


Z1 Z1 x=
1 e
= f (x)dx = 2 dx
k 1+e x=
1 1
Z1 x=
e
= 2 2 dx by symmetry
1+e x=
0
1 1 1 1
= 2 lim x=
jb0 = 2 lim b=
=2 =
b!1 1 + e b!1 1 + e 2 2
Therefore k = 1= .
(2)
Zx t=
e 1
F (x) = 2 dt = lim t=
jxa
1+e t= a! 11+e
1
1 1
= x=
lim a=
1+e a! 11+e
1
= x=
for x 2 <
1+e


Figure 10.5: Graph of f (x) for =2

(3) Since f is a symmetric function about x = 0 then if E (jXj) exists then E (X) = 0.
Now
Z1 x=
xe
E (jXj) = 2 2 dx
1+e x=
0

Since
Z1 Z1
x x= y
e dx = ye dy = (2)
0 0

converges and

x xe x=
x=
e 2 for x 0
1+e x=

therefore by the Comparison Test for Integrals the improper integral

Z1 x=
xe
E (jXj) = 2 2 dx
1+e x=
0

converges and thus E (X) = 0.



By symmetry
Z1
x2 e x=
E X2 = 2 dx
1+e x=
1
Z0
x2 e x=
= 2 2 dx let y = x=
1+e x=
1
Z0
2 y2e y
= 2 dy
(1 + e y )2
1

Using integration by parts with

e y 1
u = y2; du = 2ydy; dv = ; v=
(1 + e y )2 (1 + e y)

we have
Z0 Z0
y2e y y2 2y
dy = lim j0
y) a
dy
(1 + e y )2 a! 1 (1 + e 1+e y
1 1
Z0
a2 y a
= lim 2 dy lim e =1
a! 1 (1 + e a ) 1+e y a! 1
1
Z0
2a y
= lim 2 dy by L’Hospital’s Rule
a! 1 e a 1+e y
1
Z0
2 y
= lim a
2 y
dy by L’Hospital’s Rule
a! 1 e 1+e
1
Z0
y ey
= 0 2 y
dy multiply by
1+e ey
1
Z0
yey
= 2 dy
1 + ey
1

Let
u = ey ; du = ey dy; log u = y

to obtain
Z0 Z1
yey log u 2
dy = du =
1 + ey 1+u 12
1 0

This definite integral can be found at https://en.wikipedia.org/wiki/List_of_definite_integrals.

Z1 2
log u
du =
1+u 12
0

Therefore
2
E X2 = 2 2
2
12
2 2
=
3
and
2 2
V ar (X) = E X 2 [E (X)]2 = 02
3
2 2
=
3
(4)
\begin{align*}
P(0.5 < X \le 2) &= F(2) - F(0.5) \\
&= \frac{1}{1 + e^{-2/2}} - \frac{1}{1 + e^{-0.5/2}} \quad \text{using } \theta = 2 \\
&= \frac{1}{1 + e^{-1}} - \frac{1}{1 + e^{-0.25}} \\
&\approx 0.1689
\end{align*}
\begin{align*}
P(X > 0.5 \mid X \le 2) &= \frac{P(X > 0.5,\ X \le 2)}{P(X \le 2)}
= \frac{P(0.5 < X \le 2)}{P(X \le 2)} = \frac{F(2) - F(0.5)}{F(2)} \\
&= \left(1 + e^{-1}\right)\left[\frac{1}{1 + e^{-1}} - \frac{1}{1 + e^{-0.25}}\right] \\
&\approx 0.2310
\end{align*}
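A quick check in R (a sketch; $F(x)$ in (2) is the logistic cumulative distribution function with location 0 and scale $\theta$, so plogis with scale = 2 applies):

F <- function(x) plogis(x, location = 0, scale = 2)   # F(x) = 1 / (1 + exp(-x/2))
F(2) - F(0.5)             # P(0.5 < X <= 2), about 0.1689
(F(2) - F(0.5)) / F(2)    # P(X > 0.5 | X <= 2), about 0.2310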

2:(a) Since
f (x; ) = (1 )2 x x 1

therefore
f0 (x) = f (x; = 0) = 0
and
f1 (x) = f (x; = 1) = 0
Since
1 x
f (x; ) 6= f0 (x ) and f (x; ) 6= f1

therefore is neither a location nor scale parameter.

2: (b) Since
1
f (x; ) = h i for x 2 <; >0
1 + (x= )2

therefore
1
f1 (x) = f (x; = 1) = for x 2 <
(1 + x2 )
Since
1 x
f (x; ) = f1 for all x 2 <; >0

therefore is a scale parameter for this distribution.

2: (c) Since
1 jx j
f (x; ) = e for x 2 <; 2<
2
therefore
1 jxj
f0 (x) = f (x; = 0) = e for x 2 <
2
Since
f (x; ) = f0 (x ) for x 2 <; 2<
therefore is a location parameter for this distribution.

2: (e) Since
1 3 2 x
f (x; ) = x e for x 0; >0
2
therefore
1
f1 (x) = f (x; = 1) = x2 e x
for x 0
2
Since
f (x; ) = f1 ( x) for x 0; >0
therefore 1= is a scale parameter for this distribution.

4: (a) Note that f (x) can be written as

8 2
> c =2 ec(x )
<ke
> x< c
2
f (x) = ke (x ) =2 c x +c
>
>
:kec2 =2 e c(x ) x> +c

Therefore

Z c Z+c Z1
1 c2 =2 2
c2 =2
= e ec(x )
dx + e (x ) =2
dx + e e c(x )
dx
k
1 c +c
Z1 p Zc
2 =2 1 z 2 =2
= 2ec e cu
du + 2 p e dz
2
c c

c2 =2 1 cu b
p
= 2e lim e jc + 2 P (jZj c) where Z N (0; 1)
b!1 c
2 c2 =2 c2 p
= e e + 2 [2 (c) 1] where is the N (0; 1) c.d.f.
c
2 c2 =2 p
= e + 2 [2 (c) 1]
c

as required.

4: (b) If x < c then

Zx
c2 =2
F (x) = ke ec(u )
du; let y = u
1
Z
x
2 =2
= kec ecy dy
1
2 =2 1 cy x
= kec lim e ja
a! 1 c
k c2 =2+c(x )
= e
c

and F ( c) = kc e c2 =2 .

If c x + c then

p Z
x
k c2 =2 1 (u )2 =2
F (x) = e +k 2 p e du let z = u
c 2
c

p Z
x
k c2 =2 1 z 2 =2
= e +k 2 p e dz
c 2
c
k c2 =2
p
= e + k 2 [ (x ) ( c)]
c
k c2 =2
p
= e + k 2 [ (x )+ (c) 1] :
c

If x > + c then

Z1
c2 =2 c(u )
F (x) = 1 ke e du let y = u
x
Z1
c2 =2 cy
= 1 ke e dy
x

c2 =2 1 cy b
= 1 ke lim e jx
b!1 c
k c2 =2 c(x )
= 1 e
c

Therefore

8 2
> k c =2+c(x )
<ce
>
p
x< c
2
F (x) = kc e c =2 + k 2 [ (x )+ (c) 1] c x +c
>
>
:1 k ec2 =2 c(x ) x> +c
c

Since is a location parameter (see part (c))

Z1 Z1
k k
E X = x f (x) dx = xk f0 (x ) dx let y = u (10.4)
1 1
Z1
= (y + )k f0 (y) dy
1

In particular
Z1 Z1 Z1
E (X) = (y + ) f0 (y) dy = yf0 (y) dy + f0 (y) dy
1 1 1
Z1
= yf0 (y) dy + (1) since f0 (y) is a p.d.f.
1
Z1
= yf0 (y) dy +
1

Now
Z1 Z c Zc Z1
1 c2 =2 cy y 2 =2 c2 =2 cy
yf0 (y) dy = e ye dy + ye dy + e ye dy
k
1 1 c c
let y = u in the …rst integral
Z1 Zc Z1
c2 =2 cu y 2 =2 c2 =2 cy
= e ue du + ye dy + e ye dy
c c c

By integration by parts
2 3
Z1 Zb
1 1
ye cy
dy = lim 4 ye cy b
jc + e cy
dy 5
b!1 c c
c c
1 cy b 1 cy b
= lim ye jc e jc
b!1 c c2
1 c2
= 1+ e (10.5)
c2

Also since g (y) = ye y 2 =2 is a bounded odd function and [ c; c] is a symmetric interval


about 0
Z+c
y 2 =2
ye dy = 0
c

Therefore
Z1
1 1 c2 1 c2
yf0 (y) dy = 1+ e +0+ 1+ e =0
k c2 c2
1

and
Z1
E (X) = yf0 (y) dy + = 0 + =
1

To determine V ar (X) we note that

V ar (X) = E X 2 [E (X)]2
Z1
= (y + )2 f0 (y) dy 2
using (10:4)
1
Z1 Z1 Z1
2 2 2
= y f0 (y) dy + 2 yf0 (y) dy + f0 (y) dy
1 1 1
Z1
= y 2 f0 (y) dy + 2 (0) + 2
(1) 2

1
Z1
= y 2 f0 (y) dy
1

Now

Z1 Z c Zc Z1
1 2 c2 =2 2 cy 2 y 2 =2 c2 =2
y f0 (y) dy = e y e dy + y e dy + e y2e cy
dy
k
1 1 c c
(let y = u in the …rst integral)
Z1 Zc Z1
c2 =2 y 2 =2 c2 =2
= e cu
u e du + y 2 e
2
dy + e y2e cy
dy
c c c
Z1 Zc
2 =2 y 2 =2
= 2ec y2e cy
dy + y2e dy
c c

By integration by parts and using (10.5) we have

2 3
Z1 Zb
1 2
y2e cy
dy = lim 4 y 2 e cy b
jc + ye cy
dy 5
b!1 c c
c c
2 1 c2 c2
= ce + 1+ 2 e
c c
2 2 2
= c+ + 3 e c
c c

Also
Z+c Zc
y 2 =2 y 2 =2
y2e dy = 2 y2e dy
c 0
2 3
p Zc
y 2 =2 c 1 y 2 =2
= 2 4 ye j0 + 2 p e dy 5 using integration by parts
2
0
n o p
c2 =2
= 2 ce + 2 [ (c) 0:5]
p 2
= 2 [2 (c) 1] 2ce c =2

Therefore
Z1
V ar (X) = y 2 f0 (y) dy
1
2 2 2 p 2
= k 2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2
c c
( )
1
= 2
p
c2 =2 + 2 [2 (c)
ce 1]
2 2 2 p 2
2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2
c c

4: (c) Let

f0 (x) = f (x; = 0)
( 2
ke x =2 if jxj c
= 2
ke cjxj+c =2 if jxj > c

Since
(
ke (x )2 =2 if jx j c
f0 (x ) =
ke cjx j+c2 =2 if jx j>c
= f (x)

therefore is a location parameter for this distribution.



4: (d) On the graph in Figure 10.6 we have graphed f (x) for c = 1, = 0 (red), f (x) for
c = 2, = 0 (blue) and the N(0; 1) probability density function (black).


Figure 10.6: Graphs of f(x) for c = 1, θ = 0 (red), c = 2, θ = 0 (blue), and the N(0,1) p.d.f. (black).

We note that there is very little difference between the graph of the N(0,1) probability
density function and the graph of f(x) for c = 2, θ = 0; however, as c becomes smaller
(c = 1) the "tails" of the probability density function become much "fatter" relative to the
N(0,1) probability density function.

5: (a) Since X Geometric(p)


P
1
x p (1 p)k
P (X k) = p (1 p) = by the Geometric Series
x=k 1 (1 p)
= (1 p)k for k = 0; 1; : : : (10.6)
Therefore
P (X k + j; X k) P (X k + j)
P (X k + jjX k) = =
P (X k) P (X k)
(1 p)k+j
= by (10:6)
(1 p)k
= (1 p)j
= P (X j) for j = 0; 1; : : : (10.7)
Suppose we have a large number of items which are to be tested to determine if they are
defective or not. Suppose a proportion p of these items are defective. Items are tested
one after another until the …rst defective item is found. If we let the random variable X
be the number of good items found before observing the …rst defective item then X
Geometric(p) and (10.6) holds. Now P (X j) is the probability we …nd at least j good
items before observing the …rst defective item and P (X k + jjX k) is the probability
we …nd at least j more good items before observing the …rst defective item given that
we have already observed at least k good items before observing the …rst defective item.
Since these probabilities are the same for all nonnegative integers by (10.7), this implies
that, no matter how many good items we have already observed before observing the …rst
defective item, the probability of …nding at least j more good items before observing the
…rst defective item is the same as when we …rst began testing. It is like we have “forgotten”
that we have already observed at least k good items before observing the …rst defective
item. In other words, conditioning on the event that we have already observed at least
k good items before observing the …rst defective item does not a¤ect the probability of
observing at least j more good items before observing the …rst defective item.

5: (b) If Y Exponential( ) then


Z1
1 y= a=
P (Y a) = e dy = e for a > 0 (10.8)
a
Therefore
P (Y a + b; Y a) P (Y a + b)
P (Y a + bjY a) = =
P (Y a) P (Y a)
e (a+b)=
= a=
by (10.8)
e
b=
= e = P (Y b) for all a; b > 0
as required.

6: Since f1 (x) ; f2 (x) ; : : : ; fk (x) are probability density functions with support sets A1 ;
A2 ; : : : ; Ak then we know that fi (x) > 0 for all x 2 Ai ; i = 1; 2; : : : ; k. Also since
Pk Pk
0 < p1 ; p2 ; : : : ; pk 1 with pi = 1, we have that g (x) = pi fi (x) > 0 for all
i=1 i=1
S
k
x2A= Ai and A = support set of X. Also
i=1

Z1 k
X Z1 k
X k
X
g(x)dx = pi fi (x) dx = pi (1) = pi = 1
1 i=1 1 i=1 i=1

Therefore g (x) is a probability density function.


Now the mean of X is given by
Z1 k
X Z1
E (X) = xg(x)dx = pi xfi (x) dx
1 i=1 1
k
X
= pi i
i=1

As well
Z1 k
X Z1
2 2
E X = x g(x)dx = pi x2 fi (x) dx
1 i=1 1
k
X
2 2
= pi i + i
i=1

since
Z1 Z1 Z1 Z1
2
x2 fi (x) dx = (x i) fi (x) dx + 2 i xfi (x) dx 2
i fi (x) dx
1 1 1 1
2 2 2
= i + 2 i i
2 2
= i + i

Thus the variance of X is


k k
!2
X X
2 2
V ar (X) = pi i + i pi i
i=1 i=1

7:(a) Since X Gamma( ; ) the probability density function of X is

x 1 e x=
f (x) = for x > 0
( )

and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = ex = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 1g. Also

1 d 1 1
x=h (y) = log y and h (y) =
dy y

The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
1 log y=
(log y) e 1
=
( ) y
1 1= 1
(log y) y
= for y 2 B
( )

and 0 otherwise.

7:(b) Since X Gamma( ; ) the probability density function of X is

x 1 e x=
f (x) = for x > 0
( )

and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 1=x = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also

1 1 d 1 1
x=h (y) = and h (y) =
y dy y2

The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
y 1 e 1=( y)
= for y 2 B
( )

and 0 otherwise. This is the probability density function of an Inverse Gamma( ; ) random
variable. Therefore Y = X 1 Inverse Gamma( ; ).

7:(c) Since X Gamma(k; ) the probability density function of X is

xk 1 e x=
f (x) = k
for x > 0
(k)

and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 2x= = h(x) is a
one-to-one function on A and h maps the set A to the set B = fy : y > 0g. Also

1 y d 1
x=h (y) = and h (y) =
2 dy 2
The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
yk 1 e y=2
= for y 2 B
(k) 2k
and 0 otherwise for k = 1; 2; : : : which is the probability density function of a 2 (2k) random
variable. Therefore Y = 2X= 2 (2k).

7:(d) Since X N ; 2 the probability density function of X is


1 1
2 (x )2
f (x) = p e2 for x 2 <
2
Let A = fx : f (x) > 0g = <. Now y = ex = h(x) is a one-to-one function on A and h maps
the set A to the set B = fy : y > 0g. Also

1 d 1 1
x=h (y) = log y and h (y) =
dy y
The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
1 1
2 (log y )2
= p e2 y2B
y 2
Note this distribution is called the Lognormal distribution.
7:(e) Since X N ; 2 the probability density function of X is
1 1
2 (x )2
f (x) = p e2 for x 2 <
2
Let A = fx : f (x) > 0g = <. Now y = x 1 = h(x) is a one-to-one function on A and h
maps the set A to the set B = fy : y 6= 0; y 2 <g. Also

1 1 d 1 1
x=h (y) = and h (y) =
y dy y2
The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
h i2
1 1
2
1
= p e2 y
for y 2 B
2 y2

7:(f ) Since X Uniform 2 ; 2 the probability density function of X is


1
f (x) = for <x<
2 2
and 0 otherwise. Let A = fx : f (x) > 0g = x : 2 < x < 2 . Now y = tan(x) = h(x)
is a one-to-one function on A and h maps A to the set B = fy : 1 < y < 1g. Also
d 1
x = h 1 (y) = arctan(y) and dy h 1 (y) = 1+y 2 . The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy
1 1
= for y 2 <
1 + y2
and 0 otherwise. This is the probability density function of a Cauchy(1; 0) random variable.
Therefore Y = tan(X) Cauchy(1; 0).
7:(g) Since X Pareto( ; ) the probability density function of X is

f (x) = for x ; ; >0


x +1
and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = log (x= ) = h(x)
is a one-to-one function on A and h maps A to the set B = fy : 0 y < 1g. Also x =
d
h 1 (y) = ey= and dy h 1 (y) = ey= . The probability density function of Y is

1 d 1
g (y) = f h (y) h (y)
dy

= ey= +1
=e y
for y 2 B
ey=
and 0 otherwise. This is the probability density function of a Exponential(1) random
variable. Therefore Y = log (X= ) Exponential(1).
7:(h) If X Weibull(2; ) the probability density function of X is
2xe (x= )2
f (x) = 2 for x > 0; >0

and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = x2 = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also
1 d 1
x=h (y) = y 1=2 and h 1
(y) =
dy 2y 1=2
The probability density function of Y is
1 d 1
g (y) = f h (y) h (y)
dy
2
e y=
= 2 for y 2 B

2
and 0 otherwise for k = 1; 2; : : : which is the probability density function of a Exponential
random variable. Therefore Y = X 2 Exponential( 2 )

7:(i) Since X Double Exponential(0; 1) the probability density function of X is

1 jxj
f (x) = e for x 2 <
2

The cumulative distribution function of Y = X 2 is

p p
G (y) = P (Y y) = P X 2 y = P ( y X y)
Z py
1 jxj
= p 2
e dx
y
Z py
= e x dx by symmetry for y > 0
0

By the First Fundamental Theorem of Calculus and the chain rule the probability density
function of Y is

d p d p
g (y) = G (y) = e y y
dy dy
1 p
= p e y for y > 0
2 y

and 0 otherwise.

7:(j) Since X t(k) the probability density function of X is

k+1
1 x2 ( k+1
2 )
2 p
f (x) = k
1+ for x 2 <
2 k k

which is an even function. The cumulative distribution function of Y = X 2 is

p p
G (y) = P (Y y) = P X 2 y =P( y X y)
p
Z y
k+1
1 x2 ( k+1
2 )
2 p
= k
1+ dx
p 2 k k
y
p
Zy k+1 ( k+1
2 )
2 1 x2
= 2 k
p 1+ dx by symmetry for y > 0
2 k k
0

By the First Fundamental Theorem of Calculus and the chain rule the probability density

function of Y is
d
g (y) = G (y)
dy
k+1 ( k+1
2 1 y 2 ) d p
= 2 k
p 1+ y for y > 0
2 k k dy
k+1 ( k+1
2 1 y 2 ) 1
= 2 k p
p 1+ p for y > 0
2 k k 2 y
k+1
1 1=2
1 1 ( k+1
2 ) 1 p
2 1
= k 1
y 2 1+ y for y > 0 since =
2 2
k k 2

and 0 otherwise. This is the probability density function of a F(1; k) random variable.
Therefore Y = X 2 F(1; k).

8:(a) Let
n+1
2 1
cn = n
p
2
n

If T t (n) then T has probability density function

(n+1)=2
t2
f (t) = cn 1 + for t 2 <; n = 1; 2; :::
n

Since f ( t) = f (t), f is an even function whose graph is symmetric about the y axis.
Therefore if E (jT j) exists then due to symmetry E (T ) = 0. To determine when E (jT j)
exists, again due to symmetry, we only need to determine for what values of n the integral

Z1 (n+1)=2
t2
t 1+ dt
n
0

converges.
There are two cases to consider: n = 1 and n > 1.
For n = 1 we have
Z1
1 1 1
t 1 + t2 dt = lim ln 1 + t2 jb0 = lim ln 1 + b2 = 1
b!1 2 b!1 2
0

and therefore E (T ) does not exist.


For n > 1
Z1 (n+1)=2 (n 1)=2
t2 n t2
t 1+ dt = lim 1+ jb0
n b!1 n 1 n
0
" (n 1)=2
#
n b2
= 1 lim 1 +
n 1 b!1 n
n
=
n 1

and the integral converges. Therefore E (T ) = 0 for n > 1.

8:(b) To determine whether V ar(T ) = E T 2 (since E (T ) = 0) exists we need to determine


for what values of n the integral

Z1 (n+1)=2
2 t2
t 1+ dt
n
0

converges.

Now
Z1 (n+1)=2
t2
t2 1 + dt
n
0
p
Zn (n+1)=2 Z1 (n+1)=2
t2 t2
= t2 1 + dt + t2 1 + dt
n p
n
0 n

The …rst integral is …nite since it is the integral of a …nite function over the …nite interval
p
[0; n]. We will show that the second integral

Z1 (n+1)=2
2 t2
t 1+ dt
p
n
n

diverges for n = 1; 2.
Now
Z1 (n+1)=2
2 t2 p
t 1+ dt let y = t= n
p
n
n

Z1
3=2 (n+1)=2
=n y2 1 + y2 dy (10.9)
1

For n = 1
y2 y2 1
= for y 1
(1 + y 2 ) 2 2
(y + y ) 2
and since
Z1
1
dy
2
1

diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 1.
(Note: For n = 1 we could also argue that V ar(T ) does not exist since E (T ) does not exist
for n = 1.)
For n = 2,
y2 y2 1
3=2 3=2
= 3=2 for y 1
(1 + y 2 ) (y 2 + y 2 ) 2 y
and since
Z1
1 1
dy
23=2 y
1

diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 2.

Now for n > 2,


Z1 (n+1)=2
2 2 t2
E T = cn t 1+ dt
n
1
Z1 (n+1)=2
t2
= 2cn t2 1 + dt
n
0

since the integrand is an even function. Integrate by parts using


(n+1)=2
t2
u = t, dv = t 1 + dt
n
(n 1)=2
n t2
du = dt, v = 1+
n 1 n

Then
" (n+1)=2
#
2 n t2
E T = 2cn lim t 1+ jb0
b!1 n 1 n
Z1 (n 1)=2
n t2
+2cn 1+ dt
n 1 n
0
Z1 (n 1)=2
n b n t2
= 2cn lim (n+1)=2
+ cn 1+ dt
n 1 b!1 b2 n 1 n
1+ n 1

where we use symmetry on the second integral.


Now
b 1
lim (n+1)=2
= lim (n 1)=2
=0
b!1 2 b!1 n+1 b2
1 + bn n b 1+ n

by L’Hopital’s Rule. Also


Z1 (n 1)=2
n t2 y t
cn 1+ dt let p =p
n 1 n n 2 n
1

1=2 Z1 (n 2+1)=2
cn n n y2
= cn 2 1+ dy
cn 2 n 1 n 2 n 2
1
1=2
cn n n
=
cn 2 n 1 n 2

where the integral equals one since the integrand is the p.d.f. of a t (n 2) random variable.

Finally
1=2
cn n n
cn 2 n 1 n 2
n+1 n 2
p 1=2
2 2 (n 2) n n
= n p n 2+1
2 n 2
n 1 n 2
n+1 n 1=2 1=2
2 2 1 n 2 n n
= n+1 n
2 1 2
n n 1 n 2
n+1 n+1 n
2 1 2 1 2 1 n
= n+1 n n
2 1 2 1 2 1 n 1
(n 1) 1 n
=
2 (n 2) =2 n 1
n
=
n 2
Therefore for n > 2
n
V ar(T ) = E T 2 =
n 2

9: (a) To …nd E X k we …rst note that since

(a + b) a
f (x) = x 1
(1 x)b 1
for 0 < x < 1
(a) (b)

and 0 otherwise then


Z1
(a + b) a
x 1
(1 x)b 1
dx = 1
(a) (b)
0

or
Z1
(a) (b)
xa 1
(1 x)b 1
dx = (10.10)
(a + b)
0

for a > 0, b > 0. Therefore

Z1
(a + b)
E X k
= xk xa 1
(1 x)b 1
dx
(a) (b)
0
Z1
(a + b)
= xa+k 1
(1 x)b 1
dx
(a) (b)
0
(a + b) (a + k) (b)
= by (10:10)
(a) (b) (a + b + k)
(a + k) (a + b)
= for k = 1; 2; : : :
(a) (a + b + k)

For k = 1 we have

(a + 1) (a + b)
E Xk = E (X) =
(a) (a + b + 1)
a (a) (a + b)
=
(a) (a + b) (a + b)
a
=
a+b

For k = 2 we have

(a + 2) (a + b)
E X2 =
(a) (a + b + 2)
(a + 1) (a) (a) (a + b)
=
(a) (a + b + 1) (a + b) (a + b)
a (a + 1)
=
(a + b) (a + b + 1)

Therefore

V ar (X) = E X 2 [E (X)]2
2
a (a + 1) a
=
(a + b) (a + b + 1) a+b
2
a (a + 1) (a + b) a (a + b + 1)
=
(a + b)2 (a + b + 1)
a a2 + ab + a + b a2 + ab + a
=
(a + b)2 (a + b + 1)
ab
= 2
(a + b) (a + b + 1)

9: (b)


Figure 10.7: Graphs of Beta(a, b) probability density functions for (a, b) = (1, 3), (3, 1), (2, 4), (0.7, 0.7) and (2, 2)

9: (c) If a = b = 1 then
f (x) = 1 for 0 < x < 1
and 0 otherwise. This is the Uniform(0; 1) probability density function.

10: We will prove this result assuming X is a continuous random variable. The proof for
X a discrete random variable follows in a similar manner with integrals replaced by sums.
Suppose X has probability density function f (x) and E jXjk exists for some integer
k > 1. Then the improper integral
Z1
jxjk f (x) dx
1

converges. Let A = fx : jxj 1g. Then


Z1 Z Z
jxjk f (x) dx = jxjk f (x) dx + jxjk f (x) dx
1 A A

Since
0 jxjk f (x) f (x) for x 2 A
we have Z Z
k
0 jxj f (x) dx f (x) dx = P X 2 A 1 (10.11)
A A
R1 R
Convergence of jxjk f (x) dx and (10.11) imply the convergence of jxjk f (x) dx.
1 A
Now
Z1 Z Z
j j
jxj f (x) dx = jxj f (x) dx + jxjj f (x) dx for j = 1; 2; :::; k 1 (10.12)
1 A A

and Z
0 jxjj f (x) dx 1
A
R
by the same argument as in (10.11). Since jxjk f (x) dx converges and
A

jxjk f (x) jxjj f (x) for x 2 A, j = 1; 2; :::; k 1


R
then by the Comparison Theorem for Improper Integrals jxjj f (x) dx converges. Since
A
both integrals on the right side of (10.12) exist, therefore
Z1
j
E jXj = jxjj f (x) dx exists for j = 1; 2; :::; k 1
1

11: If $X \sim \text{Binomial}(n, \theta)$ then
\[
E(X) = n\theta \quad \text{and} \quad \operatorname{Var}(X) = n\theta(1-\theta)
\]
Let $W = X/n$. Then
\[
E(W) = \theta \quad \text{and} \quad \operatorname{Var}(W) = \frac{\theta(1-\theta)}{n}
\]
From the result in Section 2.9, to find a transformation $Y = g(W) = g(X/n)$ such
that $\operatorname{Var}(Y) \approx$ constant we need $g$ such that
\[
\frac{dg}{d\theta} = \frac{k}{\sqrt{\theta(1-\theta)}}
\]
where $k$ is chosen for convenience. We need to solve the separable differential equation
\[
\int dg = k\int \frac{1}{\sqrt{\theta(1-\theta)}}\, d\theta \tag{10.13}
\]
Since
\[
\frac{d}{dx}\arcsin\!\left(\sqrt{x}\right)
= \frac{1}{\sqrt{1 - \left(\sqrt{x}\right)^2}}\cdot\frac{d}{dx}\sqrt{x}
= \frac{1}{\sqrt{1-x}}\cdot\frac{1}{2\sqrt{x}}
= \frac{1}{2\sqrt{x(1-x)}}
\]
the solution to (10.13) is
\[
g(\theta) = 2k\arcsin\!\left(\sqrt{\theta}\right) + C
\]
Letting $k = 1/2$ and $C = 0$ we have $g(\theta) = \arcsin\!\left(\sqrt{\theta}\right)$.
Therefore if $X \sim \text{Binomial}(n, \theta)$ and $Y = \arcsin\!\left(\sqrt{X/n}\right)$ then
$\operatorname{Var}(Y) = \operatorname{Var}\!\left[\arcsin\!\left(\sqrt{X/n}\right)\right] \approx$ constant.
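A small simulation in R illustrates the stabilization (a sketch, with n = 50 chosen arbitrarily): the sample variance of arcsin(sqrt(X/n)) stays close to 1/(4n) across values of θ, while the variance of X/n does not.

set.seed(330)
n <- 50
theta <- c(0.1, 0.3, 0.5, 0.7, 0.9)
var_raw <- var_arcsin <- numeric(length(theta))
for (i in seq_along(theta)) {
  x <- rbinom(10000, size = n, prob = theta[i])
  var_raw[i]    <- var(x / n)               # depends strongly on theta
  var_arcsin[i] <- var(asin(sqrt(x / n)))   # roughly constant, about 1/(4n) = 0.005
}
round(cbind(theta, var_raw, var_arcsin), 5)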

13:(b)

P
1 xe
M (t) = E(etx ) = etx
x=0 x!
x
P
1 et
= e
x=0 x!
et
= e e by the Exponential Series
= e (et 1) for t 2 <

t
M 0 (t) = e (e 1)
et
E(X) = M 0 (0) =

t t
M 00 (t) = e (e 1)
( et )2 + e (e 1)
et
E(X 2 ) = M 00 (0) = 2
+
2
V ar(X) = E(X ) [E(X)]2 = 2
+ 2
=

13:(c)
Z1
1
M (t) = E(etx ) = etx e (x )=
dx

= Z1
e x 1
t 1 1
= e dx which converges for t > 0 or t <

Let
1 1
y= t x; dy = t dx

to obtain

= Z1 = Z1
e x 1
t e x 1
t
M (t) = e dx = e dx

= Z1 =
e y e y b
= e dy = lim e j 1
1
t (1 t) b!1 t
1
t

e= 1
t b e = 1
t
= lim e e = e
(1 t) b!1 (1 t)
e t 1
= for t <
(1 t)

e t e t
M 0 (t) = +
(1 t) (1 t)2
e t
= [ (1 t) + ]
(1 t)2
E(X) = M 0 (0) = +

e t e t 2 et
M 00 (t) = ( )+ + [ (1 t) + ]
(1 t)2 (1 t)2 (1 t)3
E(X 2 ) = M 00 (0) = + ( + 2 )( + )
2 2 2 2
= + +3 +2 = +2 +2
= ( + )2 + 2

V ar(X) = E(X 2 ) [E(X)]2


= ( + )2 + 2
( + )2
2
=

13:(d)
Z1
tx 1
M (t) = E(e ) = etx e jx j dx
2
1
2 3
Z Z1
14
= etx ex dx + etx e (x ) dx5
2
1
2 3
Z Z1
14
= e ex(t+1) dx + e e x(1 t) dx5
2
1
" ! !#
1 e (t+1) e (1 t)
= e +e for t + 1 > 0 and 1 t>0
2 t+1 1 t
1 et et
= + for t 2 ( 1; 1)
2 t+1 1 t
et
= for t 2 ( 1; 1)
1 t2

e t e t (2t)
M 0 (t) = +
1 t2 (1 t2 )2
e t
= [ 1 t2 + 2t]
(1 t2 )2
E(X) = M 0 (0) =

e t e t e t (4t)
M 00 (t) = [ 2t + 2] + + (1 t2 ) + 2t
(1 t2 )2 (1 t2 )2 (1 t2 )4
E(X 2 ) = M 00 (0) = 2 + 2

V ar(X) = E(X 2 ) [E(X)]2


2 2
= 2+
= 2

13:(e)
Z1
tX
M (t) = E e = 2xetx xdx
0

Since Z
1 1
xetx dx = x etx + C
t t

2 1 tx 1
M (t) = x e j0
t t
2 1 t 2 1
= 1 e
t t t t
t
2 (t 1) e + 1
= ; if t 6= 0
t2
For t = 0, M (0) = E (1) = 1. Therefore
8
<1 if t = 0
M (t) = 2[(t 1)et +1]
: if t 6= 0
t2

Note that

1) et + 1
2 (t 0
lim M (t) = lim 2
indeterminate of the form
t!0 t!0 t 0
t
2 e + (t 1) e t
= lim by Hospital’s Rule
t!0 2t
et + (t 1) et 0
= lim indeterminate of the form
t!0 t 0
t t t
= lim e + e + (t 1) e by Hospital’s Rule
t!0
= 1+1 1=1
= M (0)

Therefore M (t) exists and is continuous for all t 2 <.



Using the Exponential series we have for t 6= 0

2 (t 1) et + 1 2 P1 ti
= (t 1) +1
t2 t2 i=0 i!
2 P 1 ti+1 P1 ti
= +1
t2 i=0 i! i=0 i!
2 P1 ti+1 P1 ti
= t + 1 + t + +1
t2 i=1 i! i=2 i!
2 Pt 1 i+1 Pt
1 i
= 2
t i=1 i! i=2 i!
2 Pt 1 i+1 P1 ti+1
=
t2 i=1 i! i=1 (i + 1)!
2 P 1
1 1
= ti+1
t2 i=1 i! (i + 1)!

and since
2 P1 1 1
2
ti+1 jt=0 = 1
t i=1 i! (i + 1)!

therefore M (t) has a Maclaurin series representation for all t 2 < given by

2 P1 1 1
ti+1
t2 i=1 i! (i + 1)!
P1 1 1
= 2 ti 1
i=1 i! (i + 1)!
P1 1 1
= 2 ti
i=0 (i + 1)! (i + 2)!

Since E X k = k! the coe¢ cient of tk in the Maclaurin series for M (t) we have

1 1 1 1 2
E (X) = (1!) (2) =2 =
(1 + 1)! (1 + 2)! 2 6 3

and
1 1 1 1 1
E X 2 = (2!) (2) =4 =
(2 + 1)! (2 + 2)! 6 24 2

Therefore
2
2 21 2 1
V ar (X) = E X [E (X)] = =
2 3 18

Alternatively we could …nd E (X) = M 0 (0) using the limit de…nition of the derivative
M (t) M (0)
M 0 (0) = lim
t!0 t
2[(t 1)et +1]
t2
1
= lim
t!0 t
2 (t 1) et + 1 t2
= lim
t!0 t3
2
= using L’Hospital’s Rule
3
Similarly E X 2 = M 00 (0) could be found using

M 0 (t) M 0 (0)
M 00 (0) = lim
t!0 t
where !
d 2 (t 1) et + 1
M 0 (t) =
dt t2
for t 6= 0.
13:(f )
Z1 Z2
tX tx
M (t) = E e = e xdx + etx (2 x) dx
0 1
Z1 Z2 Z2
= xetx dx + 2 etx dx xetx dx:
0 1 1

Since Z
1 1
xetx dx = x etx + C
t t

1 1 tx 1 2 tx 2 1 1
M (t) = x e j0 + e j1 x etx j21
t t t t t
1 1 t 1 1 2 2t 1 1 2t 1
= 1 e + e et 2 e 1 et
t t t t t t t t
2 1 1 1 1 2 1 1 1
= e2t + 2 + et 1 + 1 + 2
t t t t t t t t t
e2t t
2e + 1
= for t 6= 0
t2
For t = 0, M (0) = E (1) = 1. Therefore
(
1 if t = 0
M (t) = 2t e 2et +1
t2
if t 6= 0

Note that
2et + 1
e2t 0
lim M (t) = lim 2
, indeterminate of the form , use Hospital’s Rule
t!0 t!0 t 0
2e2t 2et 0
= lim , indeterminate of the form , use Hospital’s Rule
t!0 2t 0
2e2t et
= lim =2 1=1
t!0 1
and therefore M (t) exists and is continuous for t 2 <.
Using the Exponential series we have for t 6= 0
e2t 2et + 1 1 (2t)2 (2t)3 (2t)4
= f1 + 2t + + + +
t2 t2 2! 3! 4!
t2 t3 t4
2 1+t+ + + + + 1g
2! 3! 4!
7
= 1 + t + t2 + (10.14)
12
and since
7 2
1+t+ t + jt=0 = 1
12
(10.14) is the Maclaurin series representation for M (t) for t 2 <.
Since E X k = k! the coe¢ cient of tk in the Maclaurin series for M (t) we have
7 7
E (X) = 1! 1 = 1 and E X 2 = 2! =
12 6
Therefore
7 1
V ar (X) = E X 2 1= [E (X)]2 =
6 6
0
Alternatively we could …nd E (X) = M (0) using the limit de…nition of the derivative
e2t 2et +1
M (t) M (0) t2
1
M 0 (0) = lim = lim
t
t!0 t!0 t
7 4
2et + 1e2t t2 t2 + t3 + 12 t + t2
= lim = lim
t!0 t3 t!0 t3
7
= lim 1 + t + =1
t!0 12
Similarly E X 2 = M 00 (0) could be found using
M 0 (t) M 0 (0)
M 00 (0) = lim
t!0 t
where
d e2t 2et + 1 2 t e2t et e2t + 2et 1
M 00 (t) = = for t 6= 0
dt t2 t3

14:(a)

K(t) = log M (t)


M 0 (t)
K 0 (t) =
M (t)
M 0 (0)
K 0 (0) =
M (0)
E(X)
=
1
= E(X) since M (0) = 1

M (t)M 00 (t) [M 0 (t)]2


K 00 (t) =
[M (t)]2
M (0)M 00 (0) [M 0 (0)]2
K 00 (0) =
[M (0)]2
E(X 2 ) [E(X)]2
=
1
= V ar(X)

14:(b) If X Negative Binomial(k; p) then


k
p
M (t) = for t < log q; q=1 p
1 qet

Therefore
p
K(t) = log M (t) = k log
1 qet
= k log p k log 1 qet for t < log q

qet kqet
K 0 (t) = k
=
1 qet 1 qet
kq kq
E(X) = K 0 (0) = =
1 q p

" #
1 qet et et qet
K 00 (t) = kq
(1 qet )2
1 q+q kq
V ar(X) = K 00 (0) = kq 2 = 2
(1 q) p

15:(b)
1+t
M (t) =
1 t
P
1
= (1 + t) tk for jtj < 1 by the Geometric series
k=0
P
1 P
1
= tk + tk+1
k=0 k=0
= 1 + t + t + ::: t + t2 + t3 + :::
2

P
1
= 1+ 2tk for jtj < 1 (10.15)
k=1

Since
P1 M (k) (0) P1 E(X k )
M (t) = tk = tk (10.16)
k=0 k! k=0 k!
then by matching coe¢ cients in the two series (10.15) and (10.16) we have

E(X k )
= 2 for k = 1; 2; :::
k!
or
E(X k ) = 2k! for k = 1; 2; :::

15:(c)
et
M (t) =
1 t2
P
1 ti P
1
= t2k for jtj < 1 by the Geometric series
i=0 i! k=0
t t3 t2
= 1+ + + + ::: 1 + t2 + t4 + ::: for jtj < 1
1! 2! 3!
1 1 1 1
= 1+ t+ 1+ t2 + + t3
1! 2! 1! 3!
1 1 1 1 1
+ 1+ + t4 + + + t5 + ::: for jtj < 1
2! 4! 1! 3! 5!
Since
P1 M (k) (0) P1 E(X k )
M (t) = tk = tk
k=0 k! k=0 k!
then by matching coe¢ cients in the two series we have
1 P
k
E X 2k = (2k)! for k = 1; 2; :::
i=0 (2i)!
Pk 1
E X 2k+1 = (2k + 1)! for k = 1; 2; :::
i=0 (2i + 1)!

16: (a)
Z1
tY tjZj 1 z 2 =2
MY (t) = E e =E e = etjzj p e dz
2
1
Z1
2 z 2 =2 z 2 =2
= p etz e dz since etjzj e is an even function
2
0
Z1 2 Z1
2 2 2et =2 2 2zt t2 )=2
= p e (z 2zt)=2
dz = p e (z dz
2 2
0 0
Z1
2 =2 1 (z t)2 =2
= 2et p e dz let y = (z t) ; dy = dz
2
0
Zt
t2 =2 1 y 2 =2
= 2e p e dy
2
1
t2 =2
= 2e (t) for t 2 <

where is the N(0; 1) cumulative distribution function.


16: (b) To …nd E (Y ) = E (jZj) we …rst note that
1
(0) =
2
d 1 t2 =2
(t) = (t) = p e
dt 2
and
1
(0) = p
2
Then
d d h t2 =2 i
MY (t) = 2e (t)
dt dt
2 2
= 2tet =2 (t) + 2et =2 (t)

Therefore

E (Y ) = E (jZj)
d
= MY (t) jt=0
dt
h i
2 =2 2 =2
= 2tet (t) + 2et (t) jt=0
r
2 2
= 0+ p =
2

To …nd V ar (Y ) = V ar (jZj) we note that

d 0
(t) = (t)
dt
d 1 t2 =2
= p e
dt 2
te t2 =2
= p
2
and
0
(0) = 0
Therefore
d2 d d
MY (t) = MY (t)
dt2 dt dt
d h t2 =2 2
i
= 2te (t) + 2et =2 (t)
dt
d 2
= 2 et =2 [t (t) + (t)]
dt
n o
2 =2 2 =2
= 2 et t (t) + (t) + 0
(t) + tet [t (t) + (t)]

and
d2
E Y2 = MY (t) jt=0
dt2
0
= 2 (1) 0 + (0) + (0) + (0) [(0) (0) + (0)]
= 2 (0)
1
= 2 =1
2

Therefore
\[
\operatorname{Var}(Y) = \operatorname{Var}(|Z|) = E\!\left(Y^2\right) - [E(Y)]^2
= 1 - \left(\sqrt{\frac{2}{\pi}}\,\right)^{2} = 1 - \frac{2}{\pi} = \frac{\pi - 2}{\pi}
\]
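A simulation check of these two moments in R (a sketch; Y = |Z| has the half-normal distribution):

set.seed(330)
y <- abs(rnorm(1e6))     # Y = |Z| with Z ~ N(0, 1)
mean(y); sqrt(2 / pi)    # both about 0.7979
var(y); 1 - 2 / pi       # both about 0.3634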

18: Since

MX (t) = E(etx )
P
1
= ejt pj for jtj < h; h > 0
j=0

then
P
1
MX (log s) = ej log s pj
j=0
P1
= s j pj for j log sj < h; h > 0
j=0

which is a power series in s. Similarly


P
1
MY (log s) = sj qj for j log sj < h; h > 0
j=0

which is also a power series in s.


We are given that MX (t) = MY (t) for jtj < h; h > 0. Therefore

MX (log s) = MY (log s) for j log sj < h; h > 0

and
P
1 P
1
s j pj = s j qj for j log sj < h; h > 0
j=0 j=0

Since two power series are equal if and only if their coe¢ cients are all equal we have pj = qj ,
j = 0; 1; ::: and therefore X and Y have the same distribution.

19: (a) Since X has moment generating function

et
M (t) = for jtj < 1
1 t2
the moment generating function of Y = (X 1) =2 is
h i
MY (t) = E etY = E et(X 1)=2

t
= e t=2
E e ( 2 )X

t=2 t
= e M
2
t=2 et=2 t
= e ;= for <1
t 2 2
1 2
1
= 1 2 for jtj < 2
1 4t

19: (b)
2 2
1 2 1 1 1 2
M 0 (t) = ( 1) 1 t t = t 1 t
4 2 2 4
E(X) = M 0 (0) = 0

" #
2 3
1 1 2 1 2 1
M 00 (t) = 1 t + t ( 2) 1 t t
2 4 4 2
1
E(X 2 ) = M 00 (0) =
2
1 1
V ar(X) = E(X 2 ) [E(X)]2 = 0=
2 2

19: (c) Since the moment generating function of a Double Exponential( ; ) random variable
is
e t 1
M (t) = 2 2 for jtj <
1 t
and the moment generating function of Y is
1
MY (t) = 1 2 for jtj < 2
1 4t

therefore by the Uniqueness Theorem for Moment Generating Functions Y has a Double
Exponential 0; 12 distribution.

10.2 Chapter 3
1:(a)
1 P
1 P
1 P
1 P
1
= q 2 px+y = q 2 py px
k y=0 x=0 y=0 x=0
P
1 1
= q2 py by the Geometric Series since 0 < p < 1
y=0 1 p
P
1
= q py since q = 1 p
y=0
1
= q by the Geometric Series
1 p
= 1

Therefore k = 1.
1:(b) The marginal probability function of X is
!
P P
1 P
1
f1 (x) = P (X = x) = f (x; y) = q 2 px+y = q 2 px py
y y=0 y=0

1
= q 2 px by the Geometric Series
1 p
= qpx for x = 0; 1; :::

By symmetry marginal probability function of Y is

f2 (y) = qpy for y = 0; 1; :::

The support set of (X; Y ) is A = f(x; y) : x = 0; 1; :::; y = 0; 1; :::g. Since

f (x; y) = f1 (x) f2 (y) for (x; y) 2 A

therefore X and Y are independent random variables.


1:(c)
P (X = x; X + Y = t) P (X = x; Y = t x)
P (X = xjX + Y = t) = =
P (X + Y = t) P (X + Y = t)
Now
P P P
t P
t
P (X + Y = t) = q 2 px+y = q 2 px+(t x)
= q 2 pt 1
(x;y): x+y=t x=0 x=0

= q 2 pt (t + 1) for t = 0; 1; :::

Therefore
q 2 px+(t x) 1
P (X = xjX + Y = t) = 2 t
= for x = 0; 1; :::; t
q p (t + 1) t+1

2:(a)
e 2
f (x; y) = for x = 0; 1; :::; y; y = 0; 1; :::
x! (y x)!
OR
e 2
f (x; y) = for y = x; x + 1; :::; x = 0; 1; :::
x! (y x)!

P P
1 e 2
f1 (x) = f (x; y) =
y y=x x! (y x)!
e 2 P1 1
= let k = y x
x! y=x (y x)!
e 2 P1 1 e 2 e1
= =
x! k=0 k! x!
e 1
= x = 0; 1; ::: by the Exponential Series
x!
Note that X Poisson(1).

P P
y e 2
f2 (y) = f (x; y) =
x x=0 x! (y x)!
e 2 P
y
y!
=
y! x=0 x!(y x)!
e 2 P y x
y
= 1
y! x=0 x
e 2
= (1 + 1)y by the Binomial Series
y!
2y e 2
= for y = 0; 1; :::
y!

Note that Y Poisson(2).


2:(b) Since (for example)
2 2
2 12 e 3
f (1; 2) = e 6= f1 (1)f2 (2) = e = 2e
2!
therefore X and Y are not independent random variables.

OR

The support set of X is A1 = fx : x = 0; 1; :::g, the support set of Y is A2 = fy : y = 0; 1; :::g


and the support set of (X; Y ) is A = f(x; y) : x = 0; 1; :::; y; y = 0; 1; :::g. Since A 6= A1 A2
therefore by the Factorization Theorem for Independence X and Y are not independent
random variables.

3:(a) The support set of (X; Y )

A = (x; y) : 0 < y < 1 x2 ; 1<x<1


n p p o
= (x; y) : 1 y < x < 1 y; 0 < y < 1

is pictured in Figure 10.8.


Figure 10.8: Support set for Problem 3(a)

Z1 Z1 Z Z Z1 1Z x2

1 = f (x; y)dxdy = k x2 + y dydx = k x2 + y dydx


1 1 (x;y) 2A x= 1 y=0

Z1 Z1
1 x2 1 2
= k x y + y 2 j10
2
dx = k x2 1 x2 + 1 x2 dx
2 2
1 1
Z1 h i
2
= k 2x2 1 x2 + 1 x2 dx by symmetry
0
Z1
1 51 4 5
= k 1 x4 dx = k x x j = k and thus k =
5 0 5 4
0

Therefore
5 2
f (x; y) = x +y for (x; y) 2 A
4

3:(b) The marginal probability density function of X is


Z1
f1 (x) = f (x; y)dy
1
1Z x2
5
= x2 + y dy
4
0
5
= 1 x4 for 1<x<1
8
and 0 otherwise. The support set of X is A1 = fx : 1 < x < 1g.
The marginal probability density function of Y is
Z1
f2 (y) = f (x; y)dx
1
p
Z1 y
5
= x2 + y dx
4 p
1 y
p
Z1 y
5
= x2 + y dx because of symmetry
2
0
p
5 1 3 1 y
= x + yxj0
2 3
5 1
= (1 y)3=2 + y (1 y)1=2
2 3
5
= (1 y)1=2 [(1 y) + 3y]
6
5
= (1 y)1=2 (1 + 2y) for 0 < y < 1
6
and 0 otherwise. The support set of Y is A2 = fy : 0 < y < 1g.
3:(c) The support set A of (X; Y ) is not rectangular. To show that X and Y are not
independent random variables we only need to …nd x 2 A1 , and y 2 A2 such that (x; y) 2
= A.
3
Let x = 4 and y = 12 . Since

3 1 3 1
f ; = 0 6= f1 f2 >0
4 2 4 2

therefore X and Y are not independent random variables.



3:(d) The region of integration is pictured in Figure 10.9.


Figure 10.9: Region of integration for Problem 3 (d)

Z Z Z Z
5 2 5 2
P (Y X + 1) = x + y dydx = 1 x + y dydx
4 4
(x;y) 2B (x;y) 2C

Z0 1Z x2
5 2
= 1 x + y dydx
4
x= 1 y=x+1
Z0
5 1 2
= 1 x2 y + y 2 j1x+1x dx
4 2
x= 1
Z0 n o
5 2
= 1 [2x2 (1 x2 ) + 1 x2 ] [2x2 (x + 1) + (x + 1)2 ] dx
8
1
Z0
5 5 1 5 1 4
= 1 x4 2x3 3x2 2x dx = 1 + x + x + x3 + x2 j0 1
8 8 5 2
1
5 1 1 5 2+5 3
= 1 ( 1) + + ( 1) + 1 = 1 =1
8 5 2 8 10 16
13
=
16

4:(a) The support set of (X; Y )

A = (x; y) : x2 < y < 1; 1<x<1


p p
= f(x; y) : y < x < y; 0 < y < 1g

is pictured in Figure 10.10.


Figure 10.10: Support set for Problem 4 (a)

Z1 Z1 Z Z
1 = f (x; y)dxdy = k x2 ydydx
1 1 (x;y) 2A
Z1 Z1 Z1
2 1 21
= k x ydydx = k x2 y j 2 dx
2 x
x= 1 y=x2 x= 1

Z1
k k 1 3 1 71
= x2 1 x4 dx = x x j 1
2 2 3 7
x= 1
k 1 1 1 1 1 1
= ( 1) + ( 1) = k
2 3 7 3 7 3 7
4k
=
21
Therefore k = 21=4 and

21 2
f (x; y) = x y for (x; y) 2 A
4
and 0 otherwise.

4:(b) The marginal probability density function of X is

Z1
21
f1 (x) = x2 ydy
4
x2
21x2 h i
= y 2 j1x2
8
21x2
= 1 x4 for 1<x<1
8
and 0 otherwise. The support set of X is A1 = fx : 1 < x < 1g.

The marginal probability density function of Y


p
Zy
21
f2 (y) = x2 ydx
4 p
x= y
Z p
y
21
= x2 ydx because of symmetry
2 0
7 h 3 py i
= y x j0
2
7
= y y 3=2
2
7 5=2
= y for 0 < y < 1
2
and 0 otherwise. The support set of Y is is A2 = fy : 0 < y < 1g.

The support set A of (X; Y ) is not rectangular. To show that X and Y are not inde-
pendent random variables we only need to …nd x 2 A1 , and y 2 A2 such that (x; y) 2
= A.
1 1
Let x = 2 and y = 10 . Since

1 1 1 1
f ; = 0 6= f1 f2 >0
2 10 2 10

therefore X and Y are not independent random variables.



4:(c) The region of integration is pictured in Figure 10.11.


Figure 10.11: Region of integration for Problem 4 (c)

\begin{align*}
P(X \ge Y) &= \iint_{(x,y)\in B} \frac{21}{4}x^2 y\, dy\, dx
= \int_{x=0}^{1}\int_{y=x^2}^{x} \frac{21}{4}x^2 y\, dy\, dx \\
&= \int_{0}^{1} \frac{21}{8}x^2\left[y^2\right]_{x^2}^{x} dx
= \int_{0}^{1} \frac{21}{8}x^2\left(x^2 - x^4\right) dx \\
&= \frac{21}{8}\int_{0}^{1}\left(x^4 - x^6\right) dx
= \frac{21}{8}\left[\frac{1}{5}x^5 - \frac{1}{7}x^7\right]_{0}^{1}
= \frac{21}{8}\cdot\frac{7 - 5}{35}
= \frac{3}{20}
\end{align*}
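The value 3/20 can be confirmed numerically in R by integrating the joint density over the region (a sketch):

# numerical check of P(X >= Y) for f(x, y) = (21/4) x^2 y on x^2 < y < 1
inner <- function(x) {
  sapply(x, function(xx) integrate(function(y) (21 / 4) * xx^2 * y,
                                   lower = xx^2, upper = xx)$value)
}
integrate(inner, lower = 0, upper = 1)$value   # 0.15 = 3/20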

4:(d) The conditional probability density function of X given Y = y is

f (x; y)
f1 (xjy) =
f2 (y)
21 2
4 x y
= 7 5=2
2y
3 2 3=2 p p
= x y for y<x< y; 0 < y < 1
2
and 0 otherwise. Check:
p
Z1 Zy
1 3=2
f1 (xjy) dx = y 3x2 dx
2 p
1 y
h p i
y
= y 3=2 x3 j0
3=2 3=2
= y y
= 1

The conditional probability density function of Y given X = x is

f (x; y)
f2 (yjx) =
f1 (x)
21 2
4 x y
= 21x2
(1 x4 )
8
2y
= for x2 < y < 1; 1<x<1
(1 x4 )

and 0 otherwise. Check:


Z1 Z1
1
f2 (yjx) dy = 2ydy
(1 x4 )
1 x2
1
= y 2 j1x2
(1 x4 )
1 x4
=
1 x4
= 1

6:(d) (i) The support set of (X; Y )

A = f(x; y) : 0 < x < y < 1g

is pictured in Figure 10.12.


Figure 10.12: Graph of support set for Problem 6(d) (i)

Z1 Z1 Z Z
1 = f (x; y) dxdy = k (x + y) dydx
1 1 (x;y) 2A
Z1 Zy Z1
1 2
= k (x + y) dxdy = k x + xy jyx=0 dy
2
y=0 x=0 y=0
Z1
1 2
= k y + y 2 dx
2
0
Z1
3 2 k 31
= k y dy = y j0
2 2
0
k
=
2
Therefore k = 2 and
f (x; y) = 2 (x + y) for (x; y) 2 A
and 0 otherwise.

6:(d) (ii) The marginal probability density function of X is


Z1 Z1
f1 (x) = f (x; y) dy = 2 (x + y) dy
1 y=x
2
= 2xy + y j1y=x
= (2x + 1) 2x + x2
2

= 1 + 2x 3x2 for 0 < x < 1

and 0 otherwise. The support set of X is A1 = fx : 0 < x < 1g.

The marginal probability density function of Y is


Z1 Zy
f2 (y) = f (x; y) dx = 2 (x + y) dx
1 x=0
= x + 2yx jyx=0
2

= y 2 + 2y 2 0
2
= 3y for 0 < y < 1

and 0 otherwise. The support set of Y is A2 = fy : 0 < y < 1g.

6:(d) (iii) The conditional probability density function of X given Y = y is


f (x; y)
f1 (xjy) =
f2 (y)
2 (x + y)
= for 0 < x < y < 1
3y 2
Check:
Z1 Zy Zy
2 (x + y) 1 3y 2
f1 (xjy) dx = 2
dx = 2 2 (x + y) dx = =1
3y 3y 3y 2
1 x=0 x=0

The conditional probability density function of Y given X = x is


f (x; y)
f2 (yjx) =
f1 (x)
2 (x + y)
= for 0 < x < y < 1
1 + 2x 3x2
and 0 otherwise. Check:
Z1 Z1 Z1
2 (x + y) 1 1 + 2x 3x2
f2 (yjx) dy = 2
dy = 2 (x + y) dx = =1
1 + 2x 3x 1 + 2x 3x2 1 + 2x 3x2
1 y=x y=x

6:(d) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Zy
2 (x + y)
= x dx
3y 2
x=0
Zy
1
= 2 x2 + yx dx
3y 2
x=0
1 2 3
= x + yx2 jyx=0
3y 2 3
1 2 3
= y + y3
3y 2 3
1 5 3
= y
3y 2 3
5
= y for 0 < y < 1
9

Z1
E (Y jx) = yf2 (yjx) dy
1
Z1
2 (x + y)
= y dy
1 + 2x 3x2
y=x
2 3
Z1
1 4
= 2 xy + y 2 dy 5
1 + 2x 3x2
y=x
1 2
= xy 2 + y 3 j1y=x
1 + 2x 3x2 3
1 2 2
= 2
x+ x3 + x3
1 + 2x 3x 3 3
2 5 3
3 +x 3x
=
1 + 2x 3x2
2 + 3x 5x3
= for 0 < x < 1
3 (1 + 2x 3x2 )

6:(f ) (i) The support set of (X; Y )

A = f(x; y) : 0 < y < 1 x; 0 < x < 1g


= f(x; y) : 0 < x < 1 y; 0 < y < 1g

is pictured in Figure 10.13.


Figure 10.13: Support set for Problem 6 (f ) (i)

Z1 Z1 Z Z
1 = f (x; y) dxdx = k x2 ydydx
1 1 (x;y) 2A
Z1 Z
1 x Z1 Z1
1 21 k
= k 2
x ydydx = k x 2
y j x
dx = x2 (1 x)2 dx
2 0 2
x=0 y=0 0 0
Z1
k k 1 3 1 4 1 51
= x2 2x3 + x4 dx = x x + x j0
2 2 3 2 5
0
k 1 1 1 k 10 15 + 6
= + =
2 3 2 5 2 30
k
=
60
Therefore k = 60 and

f (x; y) = 60x2 y for (x; y) 2 A


and 0 otherwise.

6:(f ) (ii) The marginal probability density function of X is


Z1
f1 (x) = f (x; y) dy
1
Z
1 x
2
= 60x ydy
0
= 30x 2
y 2 j10 x

= 30x2 (1 x)2 for 0 < x < 1

and 0 otherwise. The support of X is A1 = fx : 0 < x < 1g. Note that X Beta(3; 3).

The marginal probability density function. of Y is


Z1
f2 (y) = f (x; y) dx
1
Z
1 y

= 60y x2 dx
0
h i
= 20y x3 j10 y

= 20y (1 y)3 for 0 < y < 1

and 0 otherwise. The support of Y is A2 = fy : 0 < y < 1g. Note that Y Beta(2; 4).

6:(f ) (iii) The conditional probability density function of X given Y = y is

f (x; y)
f1 (xjy) =
f2 (y)
60x2 y
=
20y (1 y)3
3x2
= for 0 < x < 1 y; 0 < y < 1
(1 y)3

Check:
Z1 Z
1 y
h i
1 1 (1 y)3
f1 (xjy) dx = 3x2 dx = x3 j01 y
= =1
(1 y)3 (1 y)3 (1 y)3
1 0

The conditional probability density function of Y given X = x is

f (x; y)
f2 (yjx) =
f1 (x)
60x2 y
=
30x2 (1 x)2
2y
= for 0 < y < 1 x; 0 < x < 1
(1 x)2

and 0 otherwise. Check:


Z1 Z
1 x
1 1 (1 x)2
f2 (yjx) dy = 2ydy = y 2 j01 x
= =1
(1 x)2 (1 x)2 (1 x)2
1 0

6:(f ) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Z
1 y
3
= 3 (1 y) x x2 dx
0
3 1 41 y
= 3 (1 y) x j
4 0
3 3
= (1 y) (1 y)4
4
3
= (1 y) for 0 < y < 1
4

Z1
E (Y jx) = yf2 (yjx) dy
1
Z
1 x
2
= 2 (1 x) y (y) dy
0
2 1 31 x
= 2 (1 x) y j
3 0
2 2
= (1 x) (1 x)3
3
2
= (1 x) for 0 < x < 1
3

6:(g) (i) The support set of (X; Y )


A = f(x; y) : 0 < y < xg
is pictured in Figure 10.14.


Figure 10.14: Support set for Problem 6 (g) (i)

Z1 Z1 Z Z
x 2y
1 = f (x; y) dxdx = k e dydx
1 1 (x;y) 2A
Z1 Z1 Z1 h i
x 2y 2y x b
= k e dxdy = k e lim e jy dy
b!1
y=0 x=y 0
Z1 Z1
2y y b 3y
= k e e lim e dy = k e dy
b!1
0 0
Z1
k 3y
= 3e dy
3
0
3y 1
But 3e is the probability density function of a Exponential 3 random variable and
therefore the integral is equal to 1. Therefore 1 = k=3 or k = 3.
Therefore

x 2y
f (x; y) = 3e for (x; y) 2 A
and 0 otherwise.

6:(g) (ii) The marginal probability density function of X is


Z1 Zx
x 2y
f1 (x) = f (x; y) dy = 3e dy
1 0
x 1 2y x
= 3e e j0
2
3 x 2x
= e 1 e for x > 0
2
and 0 otherwise. The support of X is A1 = fx : x > 0g.

The marginal probability density function of Y is


Z1
f2 (y) = f (x; y) dx
1
Z1
x 2y
= 3e dx
y

2y x b
= 3e lim e jy
b!1

2y y b
= 3e e lim e
b!1
3y
= 3e for y > 0

1
and 0 otherwise. The support of Y is A2 = fy : y > 0g. Note that Y Exponential 3 .
6:(g) (iii) The conditional probability density function of X given Y = y is

f (x; y)
f1 (xjy) =
f2 (y)
3e x 2y
=
3e 3y
= e (x y) for x > y > 0

Note that XjY = y Two Parameter Exponential(y; 1).


The conditional probability density function of Y given X = x is

f (x; y)
f2 (yjx) =
f1 (x)
3e x 2y
= 3 x (1 2x )
2e e
2e 2y
= 2x
for 0 < y < x
1 e

and 0 otherwise. Check:


Z1 Zx 2y 2x
2e 1 2y x 1 e
f2 (yjx) dy = 2x
dy = 2x
e j0 = 2x
=1
1 e 1 e 1 e
1 0

6:(g) (iv)
Z1
E (Xjy) = xf1 (xjy) dx
1
Z1
(x y)
= xe dx
y
h i
= ey lim (x + 1) e x b
jy
b!1
b+1
= ey (y + 1) e y
lim
b!1 eb
= y + 1 for y > 0

Z1
E (Y jx) = yf2 (yjx) dy
1
Zx
2ye 2y
= dy
1 e 2x
0
1 1
= 2x
y+ e 2y jx0
1 e 2
1 1 1
= x+ e 2x
1 e 2x 2 2
1 (2x + 1) e 2x
= for x > 0
2 (1 e 2x )

7:(a) Since X Uniform(0; 1)

f1 (x) = 1 for 0 < x < 1

The joint probability density function of X and Y is

f (x; y) = f2 (yjx) f1 (x)


1
= (1)
1 x
1
= for 0 < x < y < 1
1 x
and 0 otherwise.
7:(b) The marginal probability density function of Y is
Z1
f2 (y) = f (x; y) dx
1
Zy
1
= dx
1 x
x=0
= log (1 x) jy0
= log (1 y) for 0 < y < 1

and 0 otherwise.
7:(c) The conditional probability density function of X given Y = y is

f (x; y)
f1 (xjy) =
f2 (y)
1
1 x
=
log (1 y)
1
= for 0 < x < y < 1
(x 1) log (1 y)

and 0 otherwise.

9: Since Y j Binomial(n; ) then

E (Y j ) = n and V ar (Y j ) = n (1 )

Since Beta(a; b) then

a ab
E( )= , V ar ( ) =
a+b (a + b + 1) (a + b)2

and

E 2
= V ar ( ) + [E ( )]2
2
ab a
= +
(a + b + 1) (a + b)2 a+b
2
ab + a (a + b + 1)
=
(a + b + 1) (a + b)2
a [b + a(a + b) + a] a (a + b) (a + 1)
= 2 =
(a + b + 1) (a + b) (a + b + 1) (a + b)2
a (a + 1)
=
(a + b + 1) (a + b)

Therefore
a
E (Y ) = E [E (Y j )] = E (n ) = nE ( ) = n
a+b
and

V ar (Y ) = E [var (Y j )] + V ar [E (Y j )]
= E [n (1 )] + V ar (n )
2
= n E( ) E + n2 V ar ( )
a a (a + 1) n2 ab
= n +
a+b(a + b + 1) (a + b) (a + b + 1) (a + b)2
(a + b + 1) (a + 1) n2 ab
= na +
(a + b + 1) (a + b) (a + b + 1) (a + b)2
b (a + b) n2 ab
= na 2 +
(a + b + 1) (a + b) (a + b + 1) (a + b)2
nab (a + b + n)
=
(a + b + 1) (a + b)2

10: (a) Since Y j Poisson( ), E (Y j ) = and since Gamma( ; ), E ( ) = .

E (Y ) = E [E (Y j )] = E ( ) =
2
Since Y j Poisson( ), V ar (Y j ) = and since Gamma( ; ), V ar ( ) = .

V ar (Y ) = E [V ar (Y j )] + V ar [E (Y j )]
= E ( ) + V ar ( )
2
= +

10: (b) Since Y j Poisson( ) and Gamma( ; ) we have


ye
f2 (yj ) = for y = 0; 1; : : : ; >0
y!
and
1e =
f1 ( ) = for >0
( )
and by the Product Rule
ye 1e =
f ( ; y) = f2 (yj ) f1 ( ) = for y = 0; 1; : : : ; >0
y! ( )
The marginal probability function of Y is
Z1
f2 (y) = f ( ; y) d
1
Z1
1 y+ 1 (1+1= ) 1 +1
= e d let x = 1+ =
y! ( )
0
y+ Z1
1
= xy+ 1
e x
dx
y! ( ) +1
0
y
(y + )
=
(1 + )y+ y! ( )
y
(y + 1) (y + 2) ( ) ( )
= y+
(1 + ) y! ( )
y
y+ 1 1 1
= 1 for y = 0; 1; : : :
y 1+ 1+
If is a nonnegative integer then we recognize this as the probability function of a Negative
Binomial ; 1+1 random variable.

11: (a) First note that


@ j+k
et1 x+t2 y = xj y k et1 x+t2 y
@tj1 @tk2

Therefore
Z1 Z1
@ j+k @ j+k
M (t1 ; t2 ) = et1 x+t2 y f (x; y) dxdx
@tj1 @tk2 @tj1 @tk2
1 1

(assuming the operations of integration and di¤erentiation can be interchanged)


Z1 Z1
= xj y k et1 x+t2 y f (x; y) dxdx
1 1

= E X j Y k et1 X+t2 Y

and
@ j+k
M (t1 ; t2 ) j(t1 ;t2 )=(0;0) = E X j Y k
@tj1 @tk2
as required. Note that this proof is for the case of (X; Y ) continuous random variables.
The proof for (X; Y ) discrete random variables follows in a similar manner with integrals
replaced by summations.
11: (b) Suppose that X and Y are independent random variables then

M (t1 ; t2 ) = E et1 X+t2 Y = E et1 X et2 Y = E et1 X E et2 Y = MX (t1 ) MY (t2 )

Suppose that M (t1 ; t2 ) = MX (t1 ) MY (t2 ) for all jt1 j < h; jt2 j < h for some h > 0. Then
Z1 Z1 Z1 Z1
t1 x+t2 y t1 x
e f (x; y) dxdx = e f1 (x) dx et2 Y f2 (y) dy
1 1 1 1
Z1 Z1
= et1 x+t2 y f1 (x) f2 (y) dxdx
1 1

But by the Uniqueness Theorem for Moment Generating Functions this can only hold
if f (x; y) = f1 (x) f2 (y) for all (x; y) and therefore X and Y are independent random
variables.
Thus we have shown that X and Y are independent random variables if and only if
MX (t1 ) MY (t2 ) = M (t1 ; t2 ).
Note that this proof is for the case of (X; Y ) continuous random variables. The proof for
(X; Y ) discrete random variables follows in a similar manner with integrals replaced by
summations.

11: (c) If (X1 ; X2 ; X3 ) Multinomial(n; p1 ; p2 ; p3 ) then


n
M (t1 ; t2 ) = E et1 X1 + et2 X2 = p1 et1 + p2 et2 + p3 for t1 2 <; t2 2 <

By 11 (a)

@2
E (X1 X2 ) = M (t1 ; t2 ) j(t1 ;t2 )=(0;0)
@t1 @t2
@2 n
= p1 et1 + p2 et2 + p3 j(t1 ;t2 )=(0;0)
@t1 @t2
n 2
= n (n 1) p1 et1 p2 et2 p1 et1 + p2 et2 + p3 j(t1 ;t2 )=(0;0)
= n (n 1) p1 p2

Also
n
MX1 (t) = M (t; 0) = p1 et + p2 + p3
n
= p1 et + 1 p1 for t 2 <

and
d
E (X1 ) = MX1 (t) jt=0
dt
n 1
= np1 p1 et + 1 p1 jt=0
= np1

Similarly
E (X2 ) = np2
Therefore

Cov (X1 ; X2 ) = E (X1 X2 ) E (X1 ) E (X2 )


= n (n 1) p1 p2 np1 np2
= np1 p2

13:(a) Note that since is a symmetric matrix T = so that 1 = I (2 2 identity


matrix) and T 1 T
= I. Also t = t T since t is a scalar and xtT = txT since
T

(x ) tT is a scalar.

[x ( + t )] 1
[x ( + t )]T 2 tT t tT
= [(x ) t ] 1
[(x ) t ]T 2 tT t tT
h i
= [(x ) t ] 1
(x )T tT 2 tT t tT
= (x ) 1
(x )T (x ) 1
tT t 1
(x )T + t 1
tT 2 tT t tT
= (x ) 1
(x )T (x ) tT t (x )T + t tT 2 tT t tT
= (x ) 1
(x )T xtT + tT txT + t T
2 tT
= (x ) 1
(x )T xtT + tT xtT + tT 2 tT
= (x ) 1
(x )T 2xtT
= (x ) 1
(x )T 2xtT

as required.
Now

M (t1 ; t2 ) = E et1 X1 +t2 X2 = E exp XtT


Z1 Z1
1 1
= 1=2
exp(xtT ) exp (x ) 1
(x )T dx1 dx2
2 j j 2
1 1
Z1 Z1
1 1h i
= exp (x ) 1
(x )T 2xtT dx1 dx2
2 j j1=2 2
1 1
Z1 Z1
1 1
= exp [x ( + t )] 1
[x ( + t )]T 2 tT t tT dx1 dx2
2 j j1=2 2
1 1
Z1 Z1
1 1 1
= exp t + t tT
T
exp [x ( + t )] 1
[x ( + t )]T dx1 dx2
2 2 j j1=2 2
1 1
1
= exp tT + t tT for all t = (t1 ; t2 ) 2 <2
2

since
1 1
exp [x ( + t )] 1
[x ( + t )]T
2 j j1=2 2
is a BVN( + t ; ) probability density function and therefore the integral is equal to one.

13:(b) Since
1
MX1 (t) = M (t; 0) = exp 1t + t2 2
1 for t 2 <
2
which is the moment generating function of a N 1 ; 21 random variable, then by the
Uniqueness Theorem for Moment Generating Functions X1 N 1 ; 21 . By a similar
argument X2 N 2 ; 22 .
13:(c) Since
@2 @2 T 1
M (t1 ; t2 ) = exp t + tT t
@t1 @t2 @t1 @t2 2
@ 2 1 2 2 1 2 2
= exp 1 t1 + 2 t2 + t1 1 + t1 t2 1 2 + t2 2
@t1 @t2 2 2
@ 2
= 1 + t1 1 + t2 1 2 M (t1 ; t2 )
@t2
2 2
= 1 2 M (t1 ; t2 ) + 1 + t1 1 + t2 1 2 2 + t2 2 + t1 1 2 M (t1 ; t2 )

therefore
@2
E(XY ) = M (t1 ; t2 )j(t1 ;t2 )=(0;0)
@t1 @t2
= 1 2+ 1 2

From (b) we know E(X1 ) = 1 and E(X2 ) = 2: Therefore

Cov(X; Y ) = E(XY ) E(X)E(Y ) = 1 2 + 1 2 1 2 = 1 2

13:(d) By Theorem 3.8.6, X1 and X2 are independent random variables if and only if

M (t1 ; t2 ) = MX1 (t1 )MX2 (t2 )

then X1 and X2 are independent random variables if and only if


1 1
exp 1 t1 + 2 t2 + t21 2
1 + t1 t2 1 2 + t22 2
2
2 2
1 1
= exp 1 t1 + t21 2
1 exp 2 t2 + t22 2
2
2 2
for all (t1 ; t2 ) 2 <2 or
1 1 1 1
1 t1 + 2 t2 + t21 2
1 + t1 t2 1 2 + t22 2
2 = 1 t1 + t21 2
1 + 2 t2 + t22 2
2
2 2 2 2
or
t1 t2 1 2 =0
for all (t1 ; t2 ) 2 <2 which is true if and only if = 0. Therefore X1 and X2 are independent
random variables if and only if = 0.

13:(e) Since
1
E exp(XtT ) = exp tT + t tT for t 2 <2
2
therefore

Efexp[(XA + b)tT ]g
= Efexp[XAtT + btT ]g
n h T
io
= exp(btT )E exp X tAT
T 1 T
= exp(btT ) exp tAT + tAT tAT
2
1
= exp btT + AtT + t AT A tT
2
1
= exp ( A + b) tT + t(AT A)t for t 2 <2
2
which is the moment generating function of a BVN A + b; AT A random variable, then
by the Uniqueness Theorem for Moment Generating Functions, XA+b BVN A + b; AT A .
13:(f ) First note that
" # " 1
#
1 2 1 2
1 2 1 2 1 2
= 2 2 (1 2) 2 = 2)
1
1
1 2 1 2 1 (1 1 2
2
2

" # 1=2
2 h i1=2 p
1=2 1 1 2 2 2 2 2
j j = 2 = 1 2 ( 1 2) = 1 2 1
1 2 2
" #
2 2
1 T 1 x1 1 x1 1 x2 2 x2 2
(x ) (x ) = 2)
2 +
(1 1 1 2 2

and
2
x1
(x ) 1
(x )T 1
1
" #
2 2 2
1 x1 1 x1 1 x2 2 x2 2 x1 1
= 2)
2 +
(1 1 1 2 2 1
" #
2 2 2
1 x1 1 x1 1 x2 2 x2 2 2 x1 1
= 2)
2 + 1
(1 1 1 2 2 1

1 2 2
2 2 2 2
= 2 (1 2)
(x2 2) 2 (x1 1 ) (x2 2) + 2 (x1 1)
2 1 1
2
1 2
= 2 (1 2)
(x2 2) (x1 1)
2 1
2
1 2
= 2 (1 2)
x2 2+ (x1 1)
2 1

The conditional probability density function of X2 given X1 = x1 is


f (x1 ; x2 )
f1 (x1 )
h i
1 1 1 (x T
2 j j1=2
exp 2 (x ) )
= 2
x1
p 1 exp 1 1
2 1 2 1
( " #)
2
1 1 1 T x1 1
= p p exp (x ) (x )
2 1 2 1 2 2 1
" #
2
1 1 1 2
= p p exp 2 (1 2)
x2 2 + (x1 1) for x 2 <2
2 2 1
2 2 2 1

which is the probability density function of a N + 2


(x1 2 2
2 1 1) ; 2 1 random
variable.
14:(a)
M (t1 ; t2 ) = E et1 X+t2 Y
2 y 3
Z1 Zy Z1 Z
t2 )y 4
= et1 x+t2 y
2e x y
dxdy = 2 e (1
e (1 t1 )x
dx5 dy
y=0 x=0 0 0
Z1 " #
e (1 t1 )x
= 2 e (1 t2 )y
jy0 dy if t1 < 1
1 t1
0
Z1 h i
2 (1 t2 )y (1 t1 )y
= e 1 e dy which converges if t1 < 1; t2 < 1
(1 t1 )
0
Z1 h i
2 (1 t2 )y (2 t1 t2 )y
= e e dy which converges if t1 + t2 < 2; t2 < 1
(1 t1 )
0
" #
2 e (1 t2 )y e (2 t1 t2 )y
= lim + jb0
(1 t1 ) b!1 (1 t2 ) (2 t1 t2 )
" #
2 e (1 t2 )b e (2 t1 t2 )b 1 1
= lim + +
(1 t1 ) b!1 (1 t2 ) (2 t1 t2 ) (1 t2 ) (2 t1 t2 )
2 1 1
=
(1 t1 ) (1 t2 ) (2 t1 t2 )
2 2 t1 t2 (1 t2 )
=
(1 t1 ) (1 t2 ) (2 t1 t2 )
2 1 t1
=
(1 t1 ) (1 t2 ) (2 t1 t2 )
2
= for t1 + t2 < 2; t2 < 1
(1 t2 ) (2 t1 t2 )

14.(b)
$$M_X(t) = M(t,0) = \frac{2}{2-t} = \frac{1}{1-\tfrac{1}{2}t} \quad \text{for } t<1$$
which is the moment generating function of an Exponential$\left(\tfrac{1}{2}\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions $X \sim$ Exponential$\left(\tfrac{1}{2}\right)$.
$$M_Y(t) = M(0,t) = \frac{2}{(1-t)(2-t)} = \frac{2}{2-3t+t^2} \quad \text{for } t<1$$
which is not a moment generating function we recognize. We find the probability density function of $Y$ using
$$f_2(y) = \int_{0}^{y} 2e^{-x-y}\,dx = 2e^{-y}\left[-e^{-x}\right]_{0}^{y} = 2e^{-y}\left(1-e^{-y}\right) \quad \text{for } y>0$$
and 0 otherwise.
14.(c) $E(X) = \tfrac{1}{2}$ since $X \sim$ Exponential$\left(\tfrac{1}{2}\right)$.
Alternatively
$$E(X) = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = \left.\frac{d}{dt}\frac{2}{2-t}\right|_{t=0} = \left.\frac{2}{(2-t)^2}\right|_{t=0} = \frac{2}{4} = \frac{1}{2}$$
Similarly
$$E(Y) = \left.\frac{d}{dt}M_Y(t)\right|_{t=0} = \left.\frac{d}{dt}\frac{2}{2-3t+t^2}\right|_{t=0} = \left.\frac{-2(-3+2t)}{(2-3t+t^2)^2}\right|_{t=0} = \frac{6}{4} = \frac{3}{2}$$
Since
$$\frac{\partial^2}{\partial t_1\,\partial t_2}M(t_1,t_2) = \frac{\partial^2}{\partial t_1\,\partial t_2}\frac{2}{(1-t_2)(2-t_1-t_2)}
= \frac{\partial}{\partial t_2}\frac{2}{(1-t_2)(2-t_1-t_2)^2}
= \frac{2}{(1-t_2)^2(2-t_1-t_2)^2} + \frac{4}{(1-t_2)(2-t_1-t_2)^3}$$
we obtain
$$E(XY) = \left.\frac{\partial^2}{\partial t_1\,\partial t_2}M(t_1,t_2)\right|_{(t_1,t_2)=(0,0)} = \frac{2}{4} + \frac{4}{8} = 1$$
Therefore
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y) = 1 - \left(\frac{1}{2}\right)\left(\frac{3}{2}\right) = \frac{1}{4}$$
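These moments can be verified by simulation. One convenient way to draw from $f(x,y)=2e^{-x-y}$, $0<x<y$, is to take the minimum and maximum of two independent Exponential(1) variables; the R sketch below uses that device (a standard fact, stated here only to set up the check).

  # Monte Carlo check of E(X) = 1/2, E(Y) = 3/2 and Cov(X, Y) = 1/4 for Problem 14
  set.seed(330)
  n <- 100000
  e1 <- rexp(n); e2 <- rexp(n)          # two independent Exponential(1) samples
  x <- pmin(e1, e2); y <- pmax(e1, e2)  # (min, max) has joint pdf 2*exp(-x - y), 0 < x < y
  c(mean(x), mean(y), cov(x, y))        # compare with 0.5, 1.5, 0.25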

10.3 Chapter 4
1: We are given that X and Y are independent random variables and we want to show that
U = h (X) and V = g (Y ) are independent random variables. We will assume that X and
Y are continuous random variables. The proof for discrete random variables is obtained by
replacing integrals by sums.
Suppose X has probability density function f1 (x) and support set A1 , and Y has
probability density function f2 (y) and support set A2 . Then the joint probability density
function of X and Y is

f (x; y) = f1 (x) f2 (y) for (x; y) 2 A1 A2

Now for any (u; v) 2 <2

P (U u; V v) = P (h (X) u; g (Y ) v)
ZZ
= f1 (x) f2 (y) dxdy
B

where
B = f(x; y) : h (x) u; g (y) vg
Let B1 = fx : h (x) ug and B2 = fy : g (y) vg. Then B = B1 B2 .
Since
ZZ
P (U u; V v) = f1 (x) f2 (y) dxdy
B1 B2
Z Z
= f1 (x) dx f2 (y) dy
B1 B2
= P (h (X) u) P (g (Y ) v)
= P (U u) P (V v) for all (u; v) 2 <2

therefore U and V are independent random variables.



2:(a) The transformation


S : U = X + Y; V = X
has inverse transformation
X = V; Y =U V
The support set of (X; Y ), pictured in Figure 10.15, is

RXY = f(x; y) : 0 < x + y < 1; 0 < x < 1; 0 < y < 1g

Figure 10.15: R_XY for Problem 2(a)

Under S

(k; 0) ! (k; k) 0<k<1


(k; 1 k) ! (1; k) 0<k<1
(0; k) ! (k; 0) 0<k<1

and thus S maps RXY into

RU V = f(u; v) : 0 < v < u < 1g


which is pictured in Figure 10.16.

Figure 10.16: R_UV for Problem 2(a)

The Jacobian of the inverse transformation is

@ (x; y) 0 1
= = 1
@ (u; v) 1 1

The joint probability density function of U and V is given by

g (u; v) = f (v; u v) j 1j
= 24v (u v) for (u; v) 2 RU V

and 0 otherwise.
2: (b) The marginal probability density function of U is given by
Z1
g1 (u) = g (u; v) dv
1
Zu
= 24v (u v) dv
0
= 12uv 2 8v 3 ju0
= 4u3 for 0 < u < 1

and 0 otherwise.
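A quick rejection-sampling check of this marginal (not part of the solution; the bound $24xy \leq 6$ on the support is used as the envelope) is sketched below in R.

  # Rejection sampling from f(x, y) = 24xy on {x > 0, y > 0, x + y < 1}, then checking U = X + Y
  set.seed(330)
  m <- 400000
  x <- runif(m); y <- runif(m); z <- runif(m, 0, 6)   # 6 bounds 24xy on the support
  keep <- (x + y < 1) & (z < 24 * x * y)
  u <- (x + y)[keep]
  mean(u <= 0.5)     # compare with G(0.5) = 0.5^4 = 0.0625
  mean(u^3)          # compare with E(U^3) = integral of u^3 * 4u^3 du = 4/7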

6:(a)

Figure 10.17: Region of integration for 0 ≤ t ≤ 1

For 0 t 1

G (t) = P (T t) = P (X + Y t)
Z Z Zt Zt x
= 4xydydx = 4xydydx
(x;y) 2At x=0 y=0

Zt
= 2x y 2 jt0 x
dx
0
Zt
= 2x (t x)2 dx
0
Zt
= 2x t2 2tx + x2 dx
0
4 3 1 4t
= x2 t2 tx + x j0
3 2
4 4 1 4
= t4 t + t
3 2
1 4
= t
6

Figure 10.18: Region of integration for 1 ≤ t ≤ 2


For 1 t 2 we use
G (t) = P (T t) = P (X + Y t) = 1 P (X + Y > t)
where
Z Z Z1 Z1 Z1
P (X + Y > t) = 4xydydx = 4xydydx = 2x y 2 j1t x dx
(x;y)2 Bt x=t 1 y=t x t 1

Z1 h i Z1
= 2x 1 (t x)2 dx = 2x 1 t2 + 2tx x2 dx
t 1 t 1
4 1 41
= x 1 t + tx3
2 2
x j
3 2 t 1
4 1 4 1
= 1 t2 + t (t 1)2 1 t2 + t (t 1)3 (t 1)4
3 2 3 2
4 1 1
= 1 t2 + t + (t 1)3 (t + 3)
3 2 6
so
4 1 1
G (t) = t2 t+ (t 1)3 (t + 3) for 1 t 2
3 2 6
The probability density function of T = X + Y is
8
dG (t) < 3 t
2 3
if 0 t 1
g (t) = = h i
dt :2t 4 1
1)2 (t + 3) + 16 (t 1)3
3 2 (t if 1 < t 2
8
< 2 t3 if 0 t 1
3
=
:2 t3 + 6t 4 if 1 < t 2
3
and 0 otherwise.
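As a check (a sketch only), note that in this problem X and Y are independent, each with density 2x on (0, 1) (as confirmed in part (e)), so T = X + Y can be simulated directly and compared with G(t).

  # Simulation check of G(t) = t^4/6 on 0 <= t <= 1 for T = X + Y
  set.seed(330)
  n <- 200000
  x <- sqrt(runif(n))        # inverse cdf method: F(x) = x^2, so F^(-1)(p) = sqrt(p)
  y <- sqrt(runif(n))
  t0 <- 0.8
  mean(x + y <= t0)          # empirical cdf at t0
  t0^4 / 6                   # G(t0) from the solution, about 0.0683
  mean(x + y)                # E(T) = 2*E(X) = 4/3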

6: (c) The transformation


S : U = X 2; V = XY
has inverse transformation p p
X= U; Y = V= U
The support set of (X; Y ), pictured in Figure 10.19, is

RXY = f(x; y) : 0 < x < 1; 0 < y < 1g

Under S

Figure 10.19: Support set of (X, Y) for Problem 6(c)

(k; 0) ! k2 ; 0 0<k<1
(0; k) ! (0; 0) 0<k<1
(1; k) ! (1; k) 0<k<1
2
(k; 1) ! k ;k 0<k<1

and thus S maps RXY into the region


p
RU V = (u; v) : 0 < v < u; 0 < u < 1
2
= (u; v) : v < u < 1; 0 < v < 1

which is pictured in Figure 10.20.



Figure 10.20: Support set of (U, V) for Problem 6(c)

The Jacobian of the inverse transformation is


1
@ (x; y) p
2 u
0 1
= @y =
@ (u; v) p1 2u
@u u

The joint probability density function of U and V is given by

p p 1
g (u; v) = f u; v= u
2u
p p 1
= 4 u v= u
2u
2v
= for (u; v) 2 RU V
u
and 0 otherwise.
6:(d) The marginal probability density function of U is

Z1
g1 (u) = g (u; v) dv
1
p
Zu
2v 1 h 2 pu i
= dv = v j0
u u
0
= 1 for 0 < u < 1

and 0 otherwise. Note that U Uniform(0; 1).



The marginal probability density function of V is


Z1
g2 (v) = g (u; v) du
1
Z1
2v
= du
u
v2
= 2v log uj1v2
= 2v log v 2
= 4v log (v) for 0 < v < 1

and 0 otherwise.
6:(e) The support set of X is A1 = fx : 0 < x < 1g and the support set of Y is
A2 = fy : 0 < y < 1g. Since
f (x; y) = 4xy = 2x (2y)
for all (x; y) 2 RXY = A1 A2 , therefore by the Factorization Theorem for Independence
X and Y are independent random variables. Also

f1 (x) = 2x for x 2 A1

and
f2 (y) = 2y for y 2 A2
so X and Y have the same distribution. Therefore
h i
E V 3 = E (XY )3
= E X3 E Y 3
2
= E X3
2 1 32
Z
= 4 x3 (2x) dx5
0
2
2 51
= x j
5 0
2
2
=
5
4
=
25
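The value E(V³) = 4/25 can be confirmed by a short simulation (a sketch, using the same inverse-cdf device as in the earlier checks):

  # Monte Carlo check of E(V^3) = 4/25 where V = XY and X, Y are iid with pdf 2x on (0, 1)
  set.seed(330)
  x <- sqrt(runif(100000)); y <- sqrt(runif(100000))
  mean((x * y)^3)            # compare with 4/25 = 0.16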

7:(a) The transformation


S : U = X=Y; V = XY
has inverse transformation p p
X= UV ; Y = V =U
The support set of (X, Y), pictured in Figure 10.21, is

RXY = f(x; y) : 0 < x < 1; 0 < y < 1g

Figure 10.21: Support set of (X, Y) for Problem 7(a)

Under S

(k; 0) ! (1; 0) 0<k<1


(0; k) ! (0; 0) 0 < k < 1
1
(1; k) ! ;k 0<k<1
k
(k; 1) ! (k; k) 0 < k < 1

and thus S maps RXY into

1
RU V = (u; v) : v < u < ; 0<v<1
v
which is pictured in Figure 10.22.

Figure 10.22: Support set of (U, V) for Problem 7(a)

The Jacobian of the inverse transformation is


p p
@ (x; y) pv pu 1 1 1
2 pu 2 v
= v = + =
@ (u; v) p 1p 4u 4u 2u
2u3=2 2 u v

The joint probability density function of U and V is given by


p p 1
g (u; v) = f uv; v=u
2u
p p 1
= 4 uv v=u
2u
2v
= for (u; v) 2 RU V
u
and 0 otherwise.
7.(b) The support set of $(U,V)$ is $R_{UV} = \{(u,v): v < u < \frac{1}{v},\ 0 < v < 1\}$ which is not rectangular. The support set of $U$ is $A_1 = \{u: 0 < u < \infty\}$ and the support set of $V$ is $A_2 = \{v: 0 < v < 1\}$.
Consider the point $\left(\frac{1}{2},\frac{3}{4}\right) \notin R_{UV}$, so $g\left(\frac{1}{2},\frac{3}{4}\right) = 0$. Since $\frac{1}{2} \in A_1$ then $g_1\left(\frac{1}{2}\right) > 0$, and since $\frac{3}{4} \in A_2$ then $g_2\left(\frac{3}{4}\right) > 0$. Therefore $g_1\left(\frac{1}{2}\right)g_2\left(\frac{3}{4}\right) > 0$, so
$$g\left(\tfrac{1}{2},\tfrac{3}{4}\right) = 0 \neq g_1\left(\tfrac{1}{2}\right)g_2\left(\tfrac{3}{4}\right)$$
and $U$ and $V$ are not independent random variables.



7:(c) The marginal probability density function of V is given by

Z1
g2 (v) = g (u; v) du
1

Z1=v
1
= 2v du
u
v

= 2v [ln v] j1=v
v
= 4v ln v for 0 < v < 1

and 0 otherwise.
The marginal probability density function of U is given by
Z1
g1 (u) = g (u; v) dv
1
8 Ru
>
> 1
2vdv if 0 < u < 1
>
> u
< 0
= 1=u
>
> 1
R
>
> 2vdv if u 1
: 0
u

8
>
<u if 0 < u < 1
=
>
: u13 if u 1

and 0 otherwise.

8: Since X Uniform(0; ) and Y Uniform(0; ) independently the joint probability


density function of X and Y is
1
f (x; y) = 2 for (x; y) 2 RXY

where
RXY = f(x; y) : 0 < x < ; 0 < y < g
which is pictured in Figure 10.23. The transformation

Figure 10.23: Support set of (X, Y) for Problem 8

S: U =X Y; V = X + Y

has inverse transformation


1 1
X= (U + V ) ; Y = (V U)
2 2
Under S

(k; 0) ! (k; k) 0<k<


(0; k) ! ( k; k) 0<k<
( ; k) ! ( k; + k) 0<k<
(k; ) ! (k ;k + ) 0<k<

S maps RXY into

RU V = f(u; v) : u < v < 2 + u; <u 0 or u < v < 2 u; 0 < u < g

which is pictured in Figure 10.24.



Figure 10.24: Support set of (U, V) for Problem 8

The Jacobian of the inverse transformation is


1 1
@ (x; y) 2 2 1 1 1
= 1 1 = + =
@ (u; v) 2 2 4 4 2

The joint probability density function of U and V is given by

1 1 1
g (u; v) = f (u + v) ; (v u)
2 2 2
1
= for (u; v) 2 RU V
2 2
and 0 otherwise.

The marginal probability density function of U is


Z1
g1 (u) = g (u; v) dv
1
8 2 R+u
>
> 1 1
>
> 2 dv = 2 2
(2 + u + u) for <u 0
<2 u
= 2R u
>
>
>
>
1
dv = 1
(2 u u) for 0 < u <
:2 2
u
2 2

8
u+
>
< 2 for <u 0
=
>
: u
for 0 < u <
2

juj
= 2 for <u<

and 0 otherwise.
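The triangular density of U = X − Y can be checked by simulation; the value of θ below is an arbitrary illustrative choice.

  # Check that U = X - Y has pdf (theta - |u|)/theta^2 on (-theta, theta)
  set.seed(330)
  theta <- 2
  x <- runif(100000, 0, theta); y <- runif(100000, 0, theta)
  u <- x - y
  mean(abs(u) <= theta / 2)  # P(|U| <= theta/2) = 3/4 for the triangular density
  c(mean(u), var(u))         # mean 0 and variance theta^2/6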

9: (a) The joint probability density function of (Z1 ; Z2 ) is


1 2 1 2
f (z1 ; z2 ) = p e z1 =2 p e z2 =2
2 2
1 (z12 +z22 )=2
= e for (z1 ; z2 ) 2 <2
2
The support set of (Z1 ; Z2 ) is <2 .
The transformation
h i
2 1=2
X1 = 1+ 1 Z1 ; X2 = 2+ 2 Z1 + 1 Z2

has inverse transformation


X1 1 1 X2 2 X1 1
Z1 = ; Z2 = p
1 1 2 2 1

The Jacobian of the inverse transformation is


1
@ (z1 ; z2 ) 0 1 1
= @z2
1
1 = p =
@ (x1 ; x2 ) @x2 2 (1
2 )1=2 1 2 1
2 j j1=2

Note that

z12 + z22
" #
2 2 2
x1 1 1 x2 2 x1 1 x2 2 2 x1 1
= + 2)
2 +
1 (1 2 1 2 1
" #
2 2
1 x1 1 x1 1 x2 2 x2 2
= 2)
2 +
(1 1 1 2 2
1 T
= (x ) (x )

where h i h i
x= x1 x2 and = 1 2
h i
Therefore the joint probability density function of X = X1 X2 is

1 1
g (x) = 1=2
exp (x ) 1
(x )T for x 2 <2
2 j j 2

and thus X BVN( ; ).


9.(b) Since $Z_1 \sim \mathrm{N}(0,1)$ and $Z_2 \sim \mathrm{N}(0,1)$ independently, we know $Z_1^2 \sim \chi^2(1)$, $Z_2^2 \sim \chi^2(1)$ and $Z_1^2 + Z_2^2 \sim \chi^2(2)$. From (a), $Z_1^2 + Z_2^2 = (X-\mu)\Sigma^{-1}(X-\mu)^T$. Therefore $(X-\mu)\Sigma^{-1}(X-\mu)^T \sim \chi^2(2)$.

10: (a) The joint moment generating function of U and V is

M (s; t) = E esU +tV


h i
= E es(X+Y )+t(X Y )
h i
= E e(s+t)X e(s t)Y
h i h i
= E e(s+t)X E e(s t)Y since X and Y are independent random variables
= MX (s + t) MY (s t)
(s+t)+ 2 (s+t)2 =2 (s t)+ 2 (s t)2 =2 2 2
= e e since X N ; and Y N ;
2 (2s2 +2t2 )=2
= e2 s+
2 )s2 =2 2 )t2 =2
= e(2 )s+(2
e(2 for s 2 <; t 2 <

10: (b) The moment generating function of U is

MU (s) = M (s; 0)
2 )s2 =2
= e(2 )s+(2
for s 2 <

which is the moment generating function of a N 2 ; 2 2 random variable. The moment


generating function of V is

MV (t) = M (0; t)
2 )t2 =2
= e(2 for t 2 <

which is the moment generating function of a N 0; 2 2 random variable.


Since
M (s; t) = MU (s) MV (t) for all s 2 <; t 2 <
therefore by Theorem 3.8.6, U and V are independent random variables.
Also by the Uniqueness Theorem for Moment Generating Functions U N 2 ;2 2

and V N 0; 2 2 .
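A simulation sketch of part (b), with illustrative values of μ and σ, shows U and V behaving as independent N(2μ, 2σ²) and N(0, 2σ²) variables:

  # Check the distributions of U = X + Y and V = X - Y for independent N(mu, sigma^2) X and Y
  set.seed(330)
  mu <- 1; sigma <- 2
  x <- rnorm(100000, mu, sigma); y <- rnorm(100000, mu, sigma)
  u <- x + y; v <- x - y
  c(mean(u), var(u))            # close to 2*mu = 2 and 2*sigma^2 = 8
  c(mean(v), var(v))            # close to 0 and 8
  cor(u, v)                     # close to 0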

12: The transformation de…ned by


X1 X1 + X2
Y1 = ; Y2 = ; Y3 = X1 + X2 + X3
X1 + X2 X1 + X2 + X3
has inverse transformation

X1 = Y1 Y2 Y3 ; X2 = Y2 Y3 (1 Y1 ) ; X3 = Y3 (1 Y2 )

Let
RX = f(x1 ; x2 ; x3 ) : 0 < x1 < 1; 0 < x2 < 1; 0 < x3 < 1g
and
RY = f(y1 ; y2 ; y3 ) : 0 < y1 < 1; 0 < y2 < 1; 0 < y3 < 1g
The transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) maps RX into RY .
The Jacobian of the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) is
@x1 @x1 @x1
@y1 @y2 @y3
@ (x1 ; x2 ; x3 ) @x2 @x2 @x2
= @y1 @y2 @y3
@ (y1 ; y2 ; y3 ) @x3 @x3 @x3
@y1 @y2 @y3

y2 y3 y1 y3 y1 y2
= y2 y3 y3 (1 y1 ) y2 (1 y1 )
0 y3 1 y2
= y3 (1 y1 ) y22 y3 + y1 y22 y3 + (1 y2 ) y2 y32 (1 y1 ) + y1 y2 y32
= y3 y22 y3 y1 y22 y3 + y1 y22 y3 + (1 y2 ) y2 y32 y1 y2 y32 + y1 y2 y32
= y22 y32 + y2 y32 y22 y32
= y2 y32

Since X1 ; X2 ; X3 are independent Exponential(1) random variables the joint probability


density function of (X1 ; X2 ; X3 ) is
x1 x2 x3
f (x1 ; x2 ; x3 ) = e for (x1 ; x2 ; x3 ) 2 RX

The joint probability density function of (Y1 ; Y2 ; Y3 ) is


y3
g (y1 ; y2 ; y3 ) = e y2 y32
= y2 y32 e y3
for (y1 ; y2 ; y3 ) 2 RY

and 0 otherwise.

Let

A1 = fy1 : 0 < y1 < 1g


A2 = fy2 : 0 < y2 < 1g
A3 = fy3 : 0 < y3 < 1g

g1 (y1 ) = 1 for y1 2 A1
g2 (y2 ) = 2y2 for y2 2 A2

and
1
g3 (y3 ) = y32 exp ( y3 ) for y3 2 A3
2
Since g (y1 ; y2 ; y3 ) = g1 (y1 ) g2 (y2 ) g3 (y3 ) for all (y1 ; y2 ; y3 ) 2 A1 A2 A3 therefore by the
Factorization Theorem for Independence, (Y1 ; Y2 ; Y3 ) are independent random variables.
Since Z
gi (yi ) dyi = 1 for i = 1; 2; 3
Ai

therefore the marginal probability density function of Yi is equal to gi (yi ) ; i = 1; 2; 3.


Note that Y1 Uniform(0; 1) = Beta(1; 1) ; Y2 Beta(2; 1) ; and Y3 Gamma(3; 1)
independently.
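The stated marginal distributions and the independence can be checked by simulation; the Kolmogorov-Smirnov comparisons below are only an informal sketch.

  # Simulation check of Y1 ~ Uniform(0,1), Y2 ~ Beta(2,1), Y3 ~ Gamma(3,1) for Problem 12
  set.seed(330)
  x1 <- rexp(100000); x2 <- rexp(100000); x3 <- rexp(100000)
  y1 <- x1 / (x1 + x2)
  y2 <- (x1 + x2) / (x1 + x2 + x3)
  y3 <- x1 + x2 + x3
  ks.test(y1, "punif")
  ks.test(y2, "pbeta", 2, 1)
  ks.test(y3, "pgamma", shape = 3, rate = 1)
  cor(cbind(y1, y2, y3))        # off-diagonal correlations close to 0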

13: Let
RY = f(y1 ; y2 ; y3 ) : 0 < y1 < 1; 0 < y2 < 2 ; 0 < y3 < g
Consider the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) de…ned by
X1 = Y1 cos Y2 sin Y3; X2 = Y1 sin Y2 sin Y3 ; X3 = Y1 cos Y3
The Jacobian of the transformation from (X1 ; X2 ; X3 ) ! (Y1 ; Y2 ; Y3 ) is
@x1 @x1 @x1
@y1 @y2 @y3
@ (x1 ; x2 ; x3 ) @x2 @x2 @x2
= @y1 @y2 @y3
@ (y1 ; y2 ; y3 ) @x3 @x3 @x3
@y1 @y2 @y3

cos y2 sin y3 y1 sin y2 sin y3 y1 cos y2 cos y3


= sin y2 sin y3 y1 cos y2 sin y3 y1 sin y2 cos y3
cos y3 0 sin y3
cos y2 sin y3 sin y2 cos y2 cos y3
= y12 sin y3 sin y2 sin y3 cos y2 sin y2 cos y3
cos y3 0 sin y3
= y12 sin y3 [cos y3 sin2 y2 cos y3 cos2 y2 cos y3
sin y3 cos2 y2 sin y3 + sin2 y2 sin y3 ]
= y12 sin y3 cos2 y3 sin2 y3
= y12 sin y3
Since the entries in @(x 1 ;x2 ;x3 ) @(x1 ;x2 ;x3 )
@(y1 ;y2 ;y3 ) are all continuous functions for (y1 ; y2 ; y3 ) 2 RY and @(y1 ;y2 ;y3 ) 6=
0 for (y1 ; y2 ; y3 ) 2 RY therefore by the Inverse Mapping Theorem the transformation has
an inverse in the neighbourhood of each point in RY .
Since X1 ; X2 ; X3 are independent N(0; 1) random variables the joint probability density
function of (X1 ; X2 ; X3 ) is
3=2 1 2
f (x1 ; x2 ; x3 ) = (2 ) exp x + x22 + x22 for (x1 ; x2 ; x3 ) 2 <3
2 1
Now
x21 + x22 + x22 = (y1 cos y2 sin y3; )2 + (y1 sin y2 sin y3 )2 + (y1 cos y3 )2
= y12 cos2 y2 sin2 y3; + sin2 y2 sin2 y3 + cos2 y3
= y12 sin2 y3; cos2 y2 + sin2 y2 + cos2 y3
= y12 sin2 y3; + cos2 y3
= y12
The joint probability density function of (Y1 ; Y2 ; Y3 ) is
3=2 1 2
g (y1 ; y2 ; y3 ) = (2 ) exp y y12 sin y3
2 1
3=2 1 2 2
= (2 ) exp y y sin y3 for (y1 ; y2 ; y3 ) 2 RY
2 1 1

and 0 otherwise.
Let

A1 = fy1 : y1 > 0g
A2 = fy2 : 0 < y2 < 2 g
A3 = fy3 : 0 < y3 < g

2 1 2
g1 (y1 ) = p y12 exp y for y1 2 A1
2 2 1
1
g2 (y2 ) = for y2 2 A2
2
and
1
g3 (y3 ) = sin y3 for y3 2 A3
2
Since g (y1 ; y2 ; y3 ) = g1 (y1 ) g2 (y2 ) g3 (y3 ) for all (y1 ; y2 ; y3 ) 2 A1 A2 A3 therefore by the
Factorization Theorem for Independence, (Y1 ; Y2 ; Y3 ) are independent random variables.
Since Z
gi (yi ) dyi = 1 for i = 1; 2; 3
Ai

therefore the marginal probability density function of Yi is equal to gi (yi ) ; i = 1; 2; 3.



15: Since X 2 (n), the moment generating function of X is


1 1
MX (t) = n=2
for t <
(1 2t) 2

Since U = X + Y 2 (m), the moment generating function of U is


1 1
MU (t) = m=2
for t <
(1 2t) 2

The moment generating function of U can also be obtained as

MU (t) = E etU
= E et(X+Y )
= E etX E etY since X and Y are independent random variables
= MX (t) MY (t)

By rearranging MU (t) = MX (t) MY (t) we obtain

MU (t)
MY (t) =
MX (t)
m=2
(1 2t)
= n=2
(1 2t)
1 1
= (m n)=2
for t <
(1 2t) 2

which is the moment generating function of a 2 (m n) random variable. Therefore by


the Uniqueness Theorem for Moment Generating Functions Y 2 (m n).

16:(a) Note that

P
n P
n P
n
Xi X = Xi nX = 0 and (si s) = 0
i=1 i=1 i=1

Therefore
P
n P
n s
ti Xi = si s+ Xi X +X (10.17)
i=1 i=1 n
Pn h s s i
= si Xi X + s Xi X + (si s) X + X
i=1 n n
Pn s P
n P
n
= si Ui + s Xi X +X (si s) + sX
i=1 n i=1 i=1
Pn
= si Ui + sX
i=1

Also since X1 ; X2 ; :::; Xn are independent N ; 2 random variables


n
Y n
Y
P
n 1 2 2
E exp ti Xi = E [exp (ti Xi )] = exp ti + ti (10.18)
i=1 2
i=1 i=1
P
n 1 2 Pn
= exp ti + t2i
i=1 2 i=1

Therefore by (10.17) and (10.18)

P
n P
n
E exp si Ui + sX = E exp ti Xi (10.19)
i=1 i=1
P
n 1 2 P
n
= exp ti + t2i
i=1 2 i=1

16:(b)
P
n P
n s Pn s
ti = si s+ = (si s) + n =0+s=s (10.20)
i=1 i=1 n i=1 n

P
n P
n s 2
t2i = si s+ (10.21)
i=1 i=1 n
Pn s Pn P
n s 2
= (si s)2 + 2 (si s) +
i=1 n i=1 i=1 n
Pn s2
= (si s)2 + 0 +
i=1 n
Pn s2
= (si s)2 +
i=1 n

16:(c)

P
n P
n
M (s1 ; :::; sn ; s) = E exp si Ui + sX = E exp ti Xi
i=1 i=1
P
n 2 P
n
= exp ti + t2 by (10.19)
i=1 2 i=1 i
1 Pn s2
= exp s+ 2
(si s)2 + by (10.20) and (10.21)
2 i=1 n
1 s2 1 2P n
= exp s+ 2
exp (si s)2
2 n 2 i=1

16:(d) Since
1 2 s2
MX (s) = M (0; :::; 0; s) = exp s+
2 n
and
1 P
n
MU (s1 ; :::; sn ) = M (s1 ; :::; sn ; 0) = exp 2
(si s)2
2 i=1

we have
M (s1 ; :::; sn ; s) = MX (s) MU (s1 ; :::; sn )
By the Independence Theorem for Moment Generating Functions, X and U = (U1 ; U2 ; : : : ; Un )
are independent random variables. Therefore by Chapter 4, Problem 1, X and
Pn Pn
2
Ui2 = Xi X are independent random variables.
i=1 i=1

10.4 Chapter 5
1: (a) Since Yi Exponential( ; 1), i = 1; 2; : : : independently then
Z1
(y ) (x )
P (Yi > x) = e dy = e for x > ; i = 1; 2; : : : (10.22)
x

and for x >

Fn (x) = P (Xn x) = P (min (Y1 ; Y2 ; : : : ; Yn ) x) = 1 P (Y1 > x; Y2 > x; : : : ; Yn > x)


Q
n
= 1 P (Yi > x) since Y1 ; Y2 ; : : : ; Yn are independent random variables
i=1
n(x )
= 1 e using (10:22) (10.23)

Since 8
<1 if x >
lim Fn (x) =
n!1 :0 if x <

therefore
Xn !p (10.24)
(b) By (10.24) and the Limit Theorems
Xn
Un = !p 1

(c) Now
v
P (Vn v) = P [n (Xn ) < v] = P Xn +
n
n(v=n+ )
= 1 e using (10:23)
v
= 1 e for v 0

which is the cumulative distribution function of an Exponential(1) random variable. There-


fore Vn Exponential(1) for n = 1; 2; ::: which implies

Vn !D V Exponential (1)

(d) Since
v
P (Wn w) = P n2 (Xn ) < w = P Xn +
n2
n(w=n2 + )
= 1 e
w=n
= 1 e for w 0

therefore
lim P (Wn w) = 0 for all w 2 <
n!1
which is not a cumulative distribution function. Therefore Wn has no limiting distribution.
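Parts (a) and (c) can be illustrated by simulation; θ and n below are arbitrary illustrative choices, and V_n = n(X_n − θ) is in fact exactly Exponential(1) for every n.

  # Simulation check for Problem 1: V_n = n*(X_n - theta) is Exponential(1)
  set.seed(330)
  theta <- 5; n <- 50; nsim <- 100000
  xn <- replicate(nsim, min(theta + rexp(n)))   # X_n = min of n Exponential(theta, 1) variables
  vn <- n * (xn - theta)
  c(mean(vn), var(vn))                          # both close to 1 for an Exponential(1)
  ks.test(vn, "pexp", 1)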

2: We …rst note that

P (Yn y) = P (max (X1 ; X2 ; : : : ; Xn ) y)


= P (X1 y; X2 y : : : ; Xn y)
Qn
= P (Xi y) since X1 ; X2 ; : : : ; Xn are independent random variables
i=1
Q
n
= F (y)
i=1
= [F (y)]n for y 2 < (10.25)

Since F is a cumulative distribution function, F takes on values between 0 and 1. Therefore


the function n [1 F ( )] takes on values between 0 and n. Gn (z) = P (Zn z), the
cumulative distribution function of Zn , equals 0 for z 0 and equals 1 for z n. For
0<z<n

Gn (z) = P (Zn z)
= P (n [1 F (Yn )] z)
z
= P F (Yn ) 1
n
Let A be the support set of Xi . F (x) is an increasing function for x 2 A and therefore has
an inverse, F 1 , which is de…ned on the interval (0; 1). Therefore for 0 < z < n
z
Gn (z) = P F (Yn ) 1
n
1 z
= P Yn F 1
n
z
= 1 P Yn < F 1 1
h z in
n
= 1 F F 1 1 by (10.25)
n
z n
= 1 1
n
Since h z ni
z
lim 1 1 =1 e for z > 0
n!1 n
therefore 8
<0 if z < 0
lim Gn (z) =
n!1 :1 e z if z > 0
which is the cumulative distribution function of a Exponential(1) random variable. There-
fore by the de…nition of convergence in distribution

Zn !D Z Exponential (1)

3: The moment generating function of a Poisson( ) random variable is

M (t) = exp et 1 for t 2 <

The moment generating function of


p 1 P n p
Yn = n X =p Xi n
n i=1

is

Mn (t) = E etYn
p t P n
= E exp n t+ p Xi
n i=1
p t P n
= e n t E exp p Xi
n i=1
p Q
n t
= e n t E exp p Xi
i=1 n
since X1 ; X2 ; : : : ; Xn are independent random variables
p n
t
= e n t M p
n
p h p i
= e n t exp n et= n 1
h p p i
= exp n t + n et= n 1 for t 2 <

and
p p
log Mn (t) = n t+n et= n
1 for t 2 <
By Taylor’s Theorem
x2 ec 3
ex = 1 + x + + x
2 3!
for some c between 0 and x. Therefore
p t 1 t 2 1 t 3 cn
et= n
= 1+ p + p + p e
n 2 n 3! n
t 1 t2 1 t3
= 1+ p + + ecn
n 2 n 3! n3=2
p
for some cn between 0 and t= n.

Therefore
p p
log Mn (t) = n t+n et= n
1
p t 1 t2 1 t3
= n t+n p + + ecn
n 2 n 3! n3=2
1 2 t3 1
= t + p ecn for t 2 <
2 3! n

Since
lim cn = 0
n!1

it follows that
lim ecn = e0 = 1
n!1

Therefore
$$\lim_{n\to\infty}\log M_n(t) = \lim_{n\to\infty}\left[\frac{1}{2}\theta t^2 + \frac{\theta t^3}{3!}\frac{1}{\sqrt{n}}\,e^{c_n}\right] = \frac{1}{2}\theta t^2 + \frac{\theta t^3}{3!}(0)(1) = \frac{1}{2}\theta t^2 \quad \text{for } t\in\mathbb{R}$$
and
$$\lim_{n\to\infty}M_n(t) = e^{\theta t^2/2} \quad \text{for } t\in\mathbb{R}$$
which is the moment generating function of a $\mathrm{N}(0,\theta)$ random variable, where $\theta$ denotes the Poisson mean.
Therefore by the Limit Theorem for Moment Generating Functions
$$Y_n \rightarrow_D Y \sim \mathrm{N}(0,\theta)$$
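A simulation sketch of this limit (writing θ for the Poisson mean, with arbitrary illustrative values of θ and n):

  # Simulation check for Problem 3: sqrt(n)*(Xbar_n - theta) is approximately N(0, theta)
  set.seed(330)
  theta <- 4; n <- 200; nsim <- 50000
  yn <- replicate(nsim, sqrt(n) * (mean(rpois(n, theta)) - theta))
  c(mean(yn), var(yn))           # close to 0 and theta = 4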

4: The moment generating function of an Exponential( ) random variable is


1 1
M (t) = for t <
1 t
Since X1 ; X2 ; : : : ; Xn are independent random variables, the moment generating function
of
1 Pn
Zn = p Xi n
n i=1
1 P n p
= p Xi n
n i=1

is

Mn (t) = E etZn
1 P n p
= E exp t p Xi n
n i=1
p t P n
= E e n t exp p Xi
n i=1
n t Q
p n t
= e E exp p Xi
i=1 n
p n
n t t
= e M p
n
!n
p 1
n t
= e
1 ptn
p n p
t= n t n
= e 1 p for t <
n

By Taylor’s Theorem
x2 ec 3
ex = 1 + x + + x
2 3!
for some c between 0 and x. Therefore
p 2 3
t= n t 1 t 1 t
e = 1+ p + p + p ecn
n 2! n 3! n
2 2 3 3
t 1 t 1 t
= 1+ p + + ecn
n 2 n 3! n3=2
p
for some cn between 0 and t= n.

Therefore
2 2 3 3 n
t 1 t 1 t t
Mn (t) = 1+ p + + ecn 1 p
n 2 n 3! n3=2 n
2 2 3 3
t 1 t 1 t
= [1 + p + + ecn
n 2 n 3! n3=2
2 2 3 3
t t t 1 t t 1 t t n
p p p p ecn p ]
n n n 2 n n 3! n3=2 n
2 2 3 3 3 3 4 4 n
1 t 1 t 1 t t
= 1 + ecn
2 n 2 n3=2 3! n3=2 n2
2 2 n
1 t (n)
= 1 +
2 n n

where
3 3 3 3 4 4
1 t 1 t t
(n) = + p ecn
2 n1=2 3! n n
Since
lim cn = 0
n!1

it follows that
lim ecn = e0 = 1
n!1

Also
3 3 4 4
t t
lim p = 0; lim =0
n!1 n n!1 n
and therefore
lim (n) = 0
n!1

Thus by Theorem 5.1.2


2 2 n
1 t (n)
lim Mn (t) = lim 1 +
n!1 n!1 2 n n
2 2
t =2
= e for t 2 <
2
which is the moment generating function of a N 0; random variable.
Therefore by the Limit Theorem for Moment Generating Functions
2
Zn !D Z N 0;

6: We can rewrite Sn2 as


1 P
n
2
Sn2 = Xi Xn
n 1 i=1
1 Pn
2
= (Xi ) Xn
n 1 i=1
1 Pn P
n
2
= (Xi )2 2 Xn (Xi ) + n Xn
n 1 i=1 i=1
1 Pn Pn
2
= (Xi )2 2 Xn Xi n + n Xn (10.26)
n 1 i=1 i=1
1 Pn
2
= (Xi )2 2 Xn n Xn + n Xn (10.27)
n 1 i=1
1 Pn
2
= (Xi )2 n Xn
n 1 i=1
" #
2 2
2 n 1 Pn Xi Xn
= (10.28)
n 1 n i=1

Since X1 ; X2 ; : : : are independent and identically distributed random variables with


E (Xi ) = and V ar (Xi ) = 2 < 1, then

Xn !p (10.29)

by the Weak Law of Large Numbers.


By (10.29) and the Limit Theorems
2
Xn
!p 0 (10.30)

2
Xi
Let Wi = , i = 1; 2; : : : with
" #
2
Xi
E (Wi ) = E =1

and V ar (Wi ) < 1 since E Xi4 < 1.


Since W1 ; W2 ; : : : are independent and identically distributed random variables with
E (Wi ) = 1 and V ar (Wi ) < 1, then
n
1X Xi 2
Wn = !p 1 (10.31)
n
i=1

by the Weak Law of Large Numbers.


By (10.28), (10.30), (10.31) and the Limit Theorems

Sn2 !p 2
(1) (1 + 0) = 2

and therefore
Sn
!p 1 (10.32)

by the Limit Theorems.


Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = and V ar (Xi ) = 2 < 1, then
p
n Xn
!D Z N (0; 1) (10.33)

by the Central Limit Theorem.


By (10.32), (10.33) and Slutsky’s Theorem
p
p n(Xn )
n Xn Z
Tn = = Sn
!D =Z N (0; 1)
Sn 1

7: (a) Let Y1 ; Y2 ; : : : be independent Binomial(1; ) random variables with

E (Yi ) =

and
V ar (Yi ) = (1 )
for i = 1; 2; : : :
By 4.3.2(1)
P
n
Yi Binomial (n; )
i=1
Since Y1 ; Y2 ; : : : are independent and identically distributed random variables with
E (Yi ) = and V ar (Yi ) = (1 ) < 1, then by the Weak Law of Large Numbers
1 Pn
Yi = Yn !p
n i=1
P
n
Since Xn and Yi have the same distribution
i=1

Xn
Tn = !p (10.34)
n
(b) By (10.34) and the Limit Theorems

Xn Xn
Un = 1 !p (1 ) (10.35)
n n

(c) Since Y1 ; Y2 ; : : : are independent and identically distributed random variables with
E (Yi ) = and V ar (Yi ) = (1 ) < 1, then by the Central Limit Theorem
p
n Yn
p !D Z N (0; 1)
(1 )
P
n
Since Xn and Yi have the same distribution
i=1
p Xn
n n
Sn = p !D Z N (0; 1) (10.36)
(1 )

By (10.36) and Slutsky’s Theorem


p Xn p p
Wn = n = Sn (1 ) !D W = (1 )Z
n
p
Since Z N(0; 1), W = (1 )Z N(0; (1 )) and therefore

Wn !D W N (0; (1 )) (10.37)

(d) By (10.35), (10.37) and Slutsky’s Theorem


p
Wn W Z (1 )
Zn = p !D p = p =Z N (0; 1)
Un (1 ) (1 )

(e) To determine the limiting distribution of


" r ! #
p Xn p
Vn = n arcsin arcsin
n
p
let g (x) = arcsin ( x) and a = . Then

1 1
g 0 (x) = q p
p 2 2 x
1 ( x)
1
= p
2 x (1 x)

and

g 0 (a) = g 0 ( )
1
= p
2 (1 )

By (10.37) and the Delta Method

1 p Z 1
Vn !D p Z (1 )= N 0;
2 (1 ) 2 4

7.(f) The limiting variance of $W_n$ is equal to $\theta(1-\theta)$ which depends on $\theta$. The limiting variance of $Z_n$ is 1 which does not depend on $\theta$. The limiting variance of $V_n$ is $1/4$ which does not depend on $\theta$. The transformation $g(x) = \arcsin(\sqrt{x})$ is a variance-stabilizing transformation for the Binomial distribution.
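The variance-stabilizing property in (f) can be seen numerically: for several values of θ the simulated variance of V_n stays near 1/4. The sketch below uses arbitrary n and θ values.

  # Illustration of the arcsin square-root variance-stabilizing transformation for the Binomial
  set.seed(330)
  n <- 400; nsim <- 50000
  for (theta in c(0.2, 0.5, 0.8)) {
    xn <- rbinom(nsim, n, theta)
    vn <- sqrt(n) * (asin(sqrt(xn / n)) - asin(sqrt(theta)))
    cat("theta =", theta, " var(Vn) =", var(vn), "\n")   # close to 1/4 for every theta
  }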

8: X1 ; X2 ; : : : are independent Geometric( ) random variables with


1
E (Xi ) =

and
1
V ar (Xi ) = 2

i = 1; 2; : : :
(a) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 1 and V ar (Xi ) = 1 2 < 1, then by the Weak Law of Large Numbers
Yn 1 Pn 1
Xn = = Xi !p (10.38)
n n i=1
(b) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 1 and V ar (Xi ) = 1 2 < 1, then by the Central Limit Theorem
p
n Xn 1
q !D Z N (0; 1) (10.39)
1
2

By (10.39) and Slutsky’s Theorem


r p 1
r
p 1 1 n Xn 1
Wn = n Xn = 2
q !D W = 2 Z
1
2

q
1
Since Z N(0; 1), W = 2 Z N 0; 1 2 and therefore

p 1 1
Wn = n Xn !D W N 0; 2 (10.40)

(c) By (10.38) and the Limit Theorems


1 1
Vn = !p = = (10.41)
1 + Xn 1+ 1 +1
(d) To …nd the distribution of p
n (Vn )
Zn = p
Vn2 (1 Vn )
we …rst note that
p 1
Wn = n Xn
p 1
= n Xn 1
p 1
= n Xn + 1
p 1 1
= n
Vn

Therefore by (10.40)
p 1 1 1
n !D W N 0; 2 (10.42)
Vn
Next we determine the limiting distribution of
p
n (Vn )

1 1
which is the numerator of Zn . Let g (x) = x and a = . Then

g 0 (x) = x 2

and
g 0 (a) = g 0 1
= 2

By (10.42) and the Delta Theorem


r
p 2 1 p 2
n (Vn ) !D Z= 1 Z N 0; (1 ) (10.43)

By (10.43) and Slutsky’s Theorem


1 p 1 p
q n (Vn ) !D q 1 Z = Z N (0; 1)
2 2
(1 ) (1 )

or p
n (Vn )
q !D Z N (0; 1) (10.44)
2
(1 )
since if Z N(0; 1) then Z N(0; 1) by symmetry of the N(0; 1) distribution.
Since p
p pn(V n )
n (Vn ) 2
(1 )
Zn = p =q 2 (10.45)
Vn2 (1 Vn ) Vn (1 Vn )
2
(1 )

then by (10.41) and the Limit Theorems


s s
2
Vn2 (1 Vn ) (1 )
2 !p 2 =1 (10.46)
(1 ) (1 )

By (10.44), (10.45), (10.46), and Slutsky’s Theorem

Z
Zn !D =Z N (0; 1)
1

9: X1 ; X2 ; : : : ; Xn are independent Gamma(2; ) random variables with


2
E (Xi ) = 2 and V ar (Xi ) = 2 for i = 1; 2; : : : ; n

(a) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 2 and V ar (Xi ) = 2 2 < 1 then by the Weak Law of Large Numbers

1 Pn
Xn = Xi !p 2 (10.47)
n i=1

By (10.47) and the Limit Theorems

Xn 2 p
p !p p = 2
2 2
and p
2
p !p 1 (10.48)
Xn = 2
(b) Since X1 ; X2 ; : : : are independent and identically distributed random variables with
E (Xi ) = 2 and V ar (Xi ) = 2 2 < 1 then by the Central Limit Theorem
p p
n Xn 2 n Xn 2
Wn = p = p !D Z N (0; 1) (10.49)
2 2
2
By (10.49) and Slutsky’s Theorem
p p 2
Vn = n Xn 2 !D 2 Z N 0; 2 (10.50)

(c) By (10.48), (10.49) and Slutsky’s Theorem


p "p #" p #
n Xn 2 n Xn 2 2
Zn = p = p p !D Z (1) = Z N (0; 1)
Xn = 2 2 Xn = 2

(d) To determine the limiting distribution of


p
Un = n log Xn log (2 )
let g (x) = log x and a = 2 . Then g 0 (x) = 1=x and g 0 (a) = g 0 (2 ) = 1= (2 ). By (10.50)
and the Delta Method
1p Z 1
Un !D 2 Z=p N 0;
2 2 2
9.(e) The limiting variance of $Z_n$ is 1 which does not depend on $\theta$. The limiting variance of $U_n$ is $1/2$ which does not depend on $\theta$. The transformation $g(x) = \log x$ is a variance-stabilizing transformation for the Gamma distribution.
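As with Problem 7, the variance-stabilizing property can be illustrated numerically; the sketch below (arbitrary n and θ values) shows the simulated variance of U_n staying near 1/2.

  # Illustration that the log transformation stabilizes the variance for the Gamma(2, theta) model
  set.seed(330)
  n <- 400; nsim <- 20000
  for (theta in c(0.5, 1, 3)) {
    xbar <- replicate(nsim, mean(rgamma(n, shape = 2, scale = theta)))
    un <- sqrt(n) * (log(xbar) - log(2 * theta))
    cat("theta =", theta, " var(Un) =", var(un), "\n")   # close to 1/2 for every theta
  }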

10.5 Chapter 6
3: (a) If Xi Geometric( ) then

f (x; ) = (1 )x for x = 0; 1; : : : ; 0 < <1

The likelihood function is


Q
n
L( ) = f (xi ; )
i=1
Q
n
= (1 )xi
i=1
= n
(1 )t for 0 < <1

where
P
n
t= xi
i=1
The log likelihood function is

l ( ) = log L ( ) = n log + t log (1 ) for 0 < <1

The score function is


n t
S ( ) = l0 ( ) =
1
n (n + t)
= for 0 < <1
(1 )
n
S ( ) = 0 for =
n+t
Since $S(\theta) > 0$ for $0 < \theta < n/(n+t)$ and $S(\theta) < 0$ for $n/(n+t) < \theta < 1$, therefore by the first derivative test the maximum likelihood estimate of $\theta$ is
$$\hat{\theta} = \frac{n}{n+t}$$
and the maximum likelihood estimator is

^= n P
n
where T = Xi
n+T i=1

The information function is

I( ) = S 0 ( ) = l00 ( )
n t
= 2 + for 0 < <1
(1 )2

Since I ( ) > 0 for all 0 < < 1, the graph of l ( ) is concave down and this also con…rms
that ^ = n= (n + t) is the maximum likelihood estimate.

3: (b) The observed information is

n t n t
I(^) = 2 + = 2 + t 2
^ (1 ^)2 n ( n+t )
n+t

(n + t)2 (n + t)2 (n + t)3


= + =
n t nt
n
=
^2 (1 ^)

The expected information is

n T nE (T )
E 2 + 2 = 2 +
(1 ) (1 )2
n n (1 )=
= 2 +
(1 )2
(1 )+
= n 2
(1 )
n
= 2 for 0 < <1
(1 )

3: (c) Since
(1 )
= E (Xi ) =

then by the invariance property of maximum likelihood estimators the maximum likelihood
estimator of = E (Xi ) is
1 ^ T
^= = =X
^ n
P
20
3: (d) If n = 20 and t = xi = 40 then the maximum likelihood estimate of is
i=1

^= 20 1
=
20 + 40 3
The relative likelihood function of is given by

L( ) 20
(1 )40
R( ) = = for 0 1
L(^) 1 20 2 40
3 3

A graph of $R(\theta)$ is given in Figure 10.25. A 15% likelihood interval is found by solving $R(\theta) = 0.15$. The 15% likelihood interval is $[0.2234, 0.4570]$.
$R(0.5) = 0.03344$ implies that $\theta = 0.5$ is outside a 10% likelihood interval and we would conclude that $\theta = 0.5$ is not a very plausible value of $\theta$ given the data.

Figure 10.25: Relative likelihood function for Problem 3
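The endpoints of the 15% likelihood interval quoted above can be found numerically in R, for example with uniroot (a sketch):

  # Numerical computation of the 15% likelihood interval for the Geometric model, n = 20, t = 40
  n <- 20; t <- 40
  thetahat <- n / (n + t)
  R <- function(theta) (theta / thetahat)^n * ((1 - theta) / (1 - thetahat))^t
  uniroot(function(th) R(th) - 0.15, c(0.01, thetahat))$root   # lower endpoint, about 0.2234
  uniroot(function(th) R(th) - 0.15, c(thetahat, 0.99))$root   # upper endpoint, about 0.4570
  R(0.5)                                                       # about 0.0334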

4: Since (X1 ; X2 ; X3 ) Multinomial n; 2


; 2 (1 ) ; (1 )2 the likelihood function is

n! h in x1 x2
2 x1
L ( 1; 2) = [2 (1 )]x2 (1 )2
x1 !x2 ! (n x1 x2 )!
n!
= 2 x2 2x1 +x2
(1 )2n 2x1 x2
for 0 < <1
x1 !x2 ! (n x1 x2 )!
The log likelihood function is

n!
l ( ) = log + x2 log 2
x1 !x2 ! (n x1 x2 )!
+ (2x1 + x2 ) log + (2n 2x1 x2 ) log (1 ) for 0 < <1

The score function is


2x1 + x2 2n
(2x1 + x2 )
S( ) =
1
(2x1 + x2 ) (1 ) [2n (2x1 + x2 )]
=
(1 )
(2x1 + x2 ) (2x1 + x2 ) 2n + (2x1 + x2 )
=
(1 )
(2x1 + x2 ) 2n
= for 0 < < 1
(1 )
2x1 + x2
S ( ) = 0 if =
2n

Since
2x1 + x2
S ( ) > 0 for 0 < <
2n
and
2x1 + x2
S ( ) < 0 for 1 > >
2n
therefore by the …rst derivative test, l ( ) has an absolute maximum at
= (2x1 + x2 ) = (2n). Thus the maximum likelihood estimate of is

^ = 2x1 + x2
2n
and the maximum likelihood estimator of is
^ = 2X1 + X2
2n
The information function is
2x1 + x2 2n (2x1 + x2 )
I( )= 2 + for 0 < <1
(1 )2
and the observed information is
2x1 + x2 (2n)2 (2x1 + x2 ) (2n)2 [2n (2x1 + x2 )]
I(^) = I = +
2n (2x1 + x2 )2 [2n (2x1 + x2 )]2
(2n)2 (2n)2 2n
= + = 2x1 +x2 2x1 +x2
(2x1 + x2 ) [2n (2x1 + x2 )] 2n 1 2n
2n
=
^ 1 ^

Since
2
X1 Binomial n; and X2 Binomial (n; 2 (1 ))
2
E (2X1 + X2 ) = 2n + n [2 (1 )] = 2n
The expected information is
2X1 + X2 2n
(2X1 + X2 )
J( ) = E 2 +
(1 )2
2n 2n (1 ) 1 1
= 2 + 2 = 2n +
(1 ) 1
2n
= for 0 < < 1
(1 )

6: (a) Given
k
P (k children in family; ) = for k = 1; 2; : : :
1 2 1
P (0 children in family; ) = for 0 < <
1 2

and the observed data


No. of children 0 1 2 3 4 Total
Frequency observed 17 22 7 3 1 50

the appropriate likelihood function for is based on the Multinomial model:


17
50! 1 2 22 2 7 3 3 4 1 5 6 0
L( ) = + +
17!22!7!3!1! 1
17
50! 1 2 49 1
= for 0 < <
17!22!7!3!1! 1 2
or more simply
17
1 2 49 1
L( ) = for 0 < <
1 2
The log likelihood function is
1
l ( ) = 17 log (1 2 ) 17 log (1 ) + 49 log for 0 < <
2
The score function is
34 17 49
S( ) = + +
1 2 1
2
34 + 17 2 2 + 49 1 3 +2 2
=
(1 ) (1 2 )
2
98 164 + 49 1
= for 0 < <
(1 ) (1 2 ) 2
The information function is
68 17 49 1
I( )= + for 0 < <
(1 2 )2 (1 )2 2 2

6.(b) $S(\theta) = 0$ if
$$98\theta^2 - 164\theta + 49 = 0 \quad \text{or} \quad \theta^2 - \frac{82}{49}\theta + \frac{1}{2} = 0$$
Therefore $S(\theta) = 0$ if
$$\theta = \frac{\frac{82}{49} \pm \sqrt{\left(\frac{82}{49}\right)^2 - 4\left(\frac{1}{2}\right)}}{2}
= \frac{41}{49} \pm \frac{1}{98}\sqrt{(82)^2 - 2(49)^2}
= \frac{41}{49} \pm \frac{1}{98}\sqrt{6724 - 4802}
= \frac{41}{49} \pm \frac{1}{98}\sqrt{1922}$$
Since $0 < \theta < \frac{1}{2}$ and $\theta = \frac{41}{49} + \frac{1}{98}\sqrt{1922} > 1$, we choose
$$\theta = \frac{41}{49} - \frac{1}{98}\sqrt{1922}$$
Since $S(\theta) > 0$ for $0 < \theta < \frac{41}{49} - \frac{1}{98}\sqrt{1922}$ and $S(\theta) < 0$ for $\frac{41}{49} - \frac{1}{98}\sqrt{1922} < \theta < \frac{1}{2}$, the maximum likelihood estimate of $\theta$ is
$$\hat{\theta} = \frac{41}{49} - \frac{1}{98}\sqrt{1922} \approx 0.389381424147286 \approx 0.3894$$
The observed information for the given data is
$$I(\hat{\theta}) = \frac{68}{(1-2\hat{\theta})^2} + \frac{17}{(1-\hat{\theta})^2} + \frac{49}{\hat{\theta}^2} \approx 1666.88$$

6.(c) A graph of the relative likelihood function is given in Figure 10.26. A 15% likelihood interval for $\theta$ is $[0.34, 0.43]$.

Figure 10.26: Relative likelihood function for Problem 6


6.(d) Since $R(0.45) \approx 0.0097$, $\theta = 0.45$ is outside a 1% likelihood interval and therefore $\theta = 0.45$ is not a plausible value of $\theta$ for these data.
6: (e) The expected frequencies are calculated using
!
1 2^ k
e0 = 50 and ek = 50^ for k = 1; 2; : : :
1 ^

The observed and expected frequencies are



No. of children            0      1      2     3     4     Total
Observed frequency f_k     17     22     7     3     1     50
Expected frequency e_k     18.12  19.47  7.58  2.95  1.15  50

We see that the agreement between observed and expected frequencies is very good and the model gives a reasonable fit to the data.
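The expected frequencies above can be reproduced with a few lines of R (a sketch using the maximum likelihood estimate from part (b)):

  # Expected frequencies for Problem 6(e)
  thetahat <- 41/49 - sqrt(1922)/98
  c(50 * (1 - 2 * thetahat) / (1 - thetahat),   # expected count for 0 children
    50 * thetahat^(1:4))                        # expected counts for 1 to 4 children
  # rounds to 18.12, 19.47, 7.58, 2.95, 1.15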

7: Show ^ = x(1) is the maximum likelihood estimate. By Example 2.5.3(a) is a location


parameter for this distribution. By Theorem 6.6.4 Q (X; ) = ~ = X(1) is a pivotal
quantity.
P (Q (X; ) q) = P ~ q
= P X(1) q
= 1 P X(1) q +
Qn
= 1 e (q+ ) since P (Xi > x) = e (x )
for x >
i=1
nq
= 1 e for q 0
Since

P ~ + 1 log (1 p) ~
n
~ 1
= P 0 log (1 p)
n
1
= P 0 Q (X; ) log (1 p)
n
1
= P Q (X; ) log (1 p) P (Q (X; ) 0)
n
log(1 p)
= 1 e 0
= 1 (1 p) = p
h i
^+ 1
log (1 p) ; ^ is a 100p% con…dence interval for .
n
Since
1 1 p ~ + 1 log 1 + p
P ~ + log
n 2 n 2
1 1+p ~ 1 1 p
= P log log
n 2 n 2
1 1+p 1 1 p
= P log Q (X; ) log
n 2 n 2
1 p 1+p
= 1 elog( 2 ) 1 elog( 2 )
1 p 1 p
= + + + =p
2 2 2 2
h i
^+ 1
log 1 p
;^ + 1
log 1+p
is a 100p% con…dence interval for .
n 2 n 2
h i
The interval ^ + n1 log (1 p) ; ^ is a better choice since it contains ^ while the interval
h i
^ + 1 log 1 p ; ^ + 1 log 1+p does not.
n 2 n 2

1 1
8:(a) If x1 ; x2 ; : : : ; xn is an observed random sample from the Gamma 2; distribution
then the likelihood function is
Q
n
L( ) = f (xi ; )
i=1
1=2 1=2 xi
Q
n xi e
= 1
i=1 2
1=2 n
Q
n 1 n=2 t
= xi e for >0
i=1 2

where
P
n
t= xi
i=1
or more simply
n=2 t
L( ) = e for >0
The log likelihood function is

l ( ) = log L ( )
n
= log t for >0
2
and the score function is
d n n 2 t
S( )= l( ) = t= for >0
d 2 2
n
S ( ) = 0 for =
2t
Since $S(\theta) > 0$ for $0 < \theta < \frac{n}{2t}$ and $S(\theta) < 0$ for $\theta > \frac{n}{2t}$, by the first derivative test $l(\theta)$ has an absolute maximum at $\theta = \frac{n}{2t}$. Thus

^= n = 1
2t 2x
is the maximum likelihood estimate of and

~= 1
2X

is the maximum likelihood estimator of .


1 1
8:(b) If Xi Gamma 2; ; i = 1; 2; : : : ; n independently then

1 1
E (Xi ) = and V ar (Xi ) =
2 2 2
and by the Weak Law of Large Numbers

1
X !p
2
and by the Limit Theorems
~ = 1 !p 1 =
2X 2 21
as required.
8:(c) By the Invariance Property of Maximum Likelihood Estimates the maximum likeli-
hood estimate of

1
= V ar (Xi ) =
2 2
is
2
1 1 4t2 t
^= 2 = 2 = =2 = 2x2
2^
n
2 2t 2n2 n

8:(d) The moment generating function of Xi is

1
M (t) = for t <
t 1=2
1

P
n
The moment generating function of Q = 2 Xi is
i=1

P
n
MQ (t) = E etQ = E exp 2t Xi
i=1
Q
n Q
n
= E [exp (2tXi )] = M (2t )
i=1 i=1
!n=2
1
= 2t
for 2t <
1
1 1
= n=2
for t <
(1 2t) 2

which is the moment generating function of a 2 (n) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions, Q 2 (n).

To construct a 95% equal tail con…dence interval for we …nd a and b such that
P (Q a) = 0:025 = P (Q > b) so that

P (a < Q < b) = P (a < 2T < b) = 0:95

or
a b
P < < = 0:95
2T 2T
a b
so that 2t ; 2t is a 95% equal tail con…dence interval for .
For n = 20 we have

P (Q 9:59) = 0:025 = P (Q > 34:17)

P
20
For t = xi = 6 a 95% equal tail con…dence interval for is
i=1

9:59 34:17
; = [0:80; 2:85]
2(6) 2(6)

Since = 0:7 is not in the 95% con…dence interval it is not a plausible value of in light of
the data.
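The chi-squared quantiles and the interval can be reproduced in R (a sketch):

  # Reproducing the 95% equal tail confidence interval of 8(d)
  n <- 20; t <- 6
  a <- qchisq(0.025, df = n)
  b <- qchisq(0.975, df = n)
  c(a, b)               # 9.59 and 34.17
  c(a, b) / (2 * t)     # the interval [0.80, 2.85]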
8:(e) The information function is

d n
I( )= S( )= 2 for >0
d 2
The expected information is

n n
J ( ) = E [I ( ; X1 ; : : : ; Xn )] = E = for >0
2 2 2 2
and h i1=2 p p
n ~ n 1
J(~) ~ =p =p
2~ 2~ 2X
By the Central Limit Theorem
p
n X 21 p 1
= 2n X !D Z N (0; 1) (10.51)
p1 2
2

Let
1 1
g(x) = and a =
2x 2
then
1
g 0 (x) =
2x2
1 1
g(a) = g = 1 =
2 2 2

and
1 1
g 0 (a) = g 0 = 2 = 2 2
2 2 1
2
By the Delta Theorem and (10.51)
p 1
2n !D 2 2Z N 0; 4 4
2X
or p
2n ~ !D 2 2Z N 0; 4 4

By Slutsky’s Theorem p
2n ~
!D Z N (0; 1)
2 2
But if Z N(0; 1) then Z N(0; 1) and thus
p
n ~
p !D Z N (0; 1) (10.52)
2

Since ~ !p then by the Limit Theorems


~
!p 1 (10.53)

Thus by (10.52), (10.53) and Slutsky’s Theorem


p ~
h i1=2 n
J(~) ~ = p
2~
p
pn ~
2 Z
= ~
!D =Z N (0; 1)
1
An approximate 95% con…dence interval is given by
2 3
6 1:96 1:96 7
6^ r ; ^+ r 7
4 5
J ^ J ^

P
20
For n = 20 and t = xi = 6;
i=1

^= 1 5 n 20
6 = and J ^ = 2 = 2 = 3:6
2 20
3 2^ 2 53

and an approximate 95% con…dence interval is given by


5 1:96 5 1:96
p ; +p = [0:63; 2:70]
3 3:6 3 3:6

P
20
For n = 20 and t = xi = 6 the relative likelihood function of is given by
i=1

10 6 10
L( ) e 3
R( ) = = = e10 6
for >0
L(^) 5 10 5
3 e 10

A graph of $R(\theta)$ is given in Figure 10.27. A 15% likelihood interval is found by solving $R(\theta) = 0.15$. The 15% likelihood interval is $[0.84, 2.91]$.

Figure 10.27: Relative likelihood function for Problem 8


The exact 95% equal tail con…dence interval [0:80; 2:85], the approximate 95% con…-
dence interval [0:63; 2:70] ; and the 15% likelihood interval [0:84; 2:91] are all approximately
of the same width. The exact con…dence interval and likelihood interval are skewed to the
right while the approximate con…dence interval is symmetric about the maximum likelihood
estimate ^ = 5=3. Approximate con…dence intervals are symmetric about the maximum
likelihood estimate because they are based on a Normal approximation. Since n = 20 the
approximation cannot be completely trusted. Therefore for these data the exact con…dence
interval and the likelihood interval are both better interval estimates for .
R (0:7) = 0:056 implies that = 0:7 is outside a 10% likelihood interval so based on
the likelihood function we would conclude that = 0:7 is not a very plausible value of
given the data. Previously we noted that = 0:7 is also not contained in the exact
95% con…dence interval. Note however that = 0:7 is contained in the approximate 95%
con…dence interval and so based on the approximate con…dence interval we would conclude
that = 0:7 is a reasonable value of given the data. Again the reason for the disagreement
is because n = 20 is not large enough for the approximation to be a good one.

10.6 Chapter 7
1: Since Xi has cumulative distribution function
2
1
F (x; 1; 2) =1 for x 1; 1 > 0; 2 >0
x

the probability density function of Xi is


d
f (x; 1; 2) = F (x; 1; 2)
dx
2
2 1
= for x 1; 1 > 0; 2 >0
x x

The likelihood function is


Q
n
L ( 1; 2) = f (xi ; 1; 2)
i=1
Q
n
2 1
2
= if 0 < 1 xi ; i = 1; 2; : : : ; n and 2 >0
i=1 xi xi
1
n n 2
Q
n 2
= 2 1 xi if 0 < 1 x(1) and 2 >0
i=1

For each value of 2 the likelihood function is maximized over 1 by taking 1 to be as large
as possible subject to 0 < 1 x(1) . Therefore for …xed 2 the likelihood is maximized
for 1 = x(1) . Since this is true for all values of 2 the value of ( 1 ; 2 ) which maximizes
L ( 1 ; 2 ) will necessarily have 1 = x(1) .
To …nd the value of 2 which maximizes L x(1) ; 2 consider the function
1
n n 2 Q
n 2
L2 ( 2 ) = L x(1) ; 2 = 2 x(1) xi for 2 >0
i=1

and its logarithm


P
n
l2 ( 2 ) = log L2 ( 2 ) = n log 2 +n 2 log x(1) ( 2 + 1) log xi
i=1

Now
d n P
n
l2 ( 2 ) = l20 ( 2 ) = + n log x(1) log xi
d 2 2 i=1
n Pn xi
= log
2 i=1 x(1)
n
= t
2
n 2t
=
2

where
P
n xi
t= log
i=1 x(1)
Now l20 ( 2 ) = 0 for 2 = n=t. Since l20 ( 2 ) > 0 for 0 < 2 < n=t and l20 ( 2 ) < 0 for 2 > n=t
therefore by the …rst derivative test l2 ( 2 ) is maximized for 2 = n=t = ^2 . Therefore
L2 ( 2 ) = L x(1) ; 2 is also maximized for 2 = ^2 . Therefore the maximum likelihood
estimates are
^1 = x(1) and ^2 = n
Pn
log xxi
(1)
i=1
and the maximum likelihood estimators are
~1 = X(1) and ~2 = n
P
n
Xi
log X(1)
i=1

4: (a) If the events S and H are independent events then P (S \ H) = P (S)P (H) = ,
P (S \ H) = P (S)P (H) = (1 ), etc.
The likelihood function is
n!
L( ; ) = ( )x11 [ (1 )]x12 [(1 ) ]x21 [(1 ) (1 )]x22
x11 !x12 !x21 !x22 !
or more simply (ignoring constants with respect to and )

L( ; ) = x11 +x12
(1 )x21 +x22 x11 +x21
(1 )x12 +x22 for 0 1, 0 1

The log likelihood is

l( ; ) = (x11 + x12 ) log + (x21 + x22 ) log(1 ) + (x11 + x21 ) log + (x12 + x22 ) log(1 )
for 0 < < 1, 0 < <1

Since
@l x11 + x12 x21 + x22 x11 + x12 n (x11 + x12 ) x11 + x12 n
= = =
@ 1 1 (1 )
@l x11 + x21 x12 + x22 x11 + x21 n (x11 + x21 ) x11 + x21 n
= = =
@ 1 1 (1 )
the score vector is
h i
x11 +x12 n x11 +x21 n
S( ; ) = (1 ) (1 )
for 0 < < 1, 0 < <1

Solving S( ; ) = (0; 0) gives the maximum likelihood estimates


x11 + x12 x11 + x21
^= and ^ =
n n

The information matrix is


2 3
@2l @2l
@ 2 @ @
I( ; ) = 4 5
@2l @2l
@ @ @ 2
2 x11 +x12 n (x11 +x12 ) 3
2 + (1 )2
0
= 4 5
x11 +x21 n (x11 +x21 )
0 2 + (1 )2

4:(b) Since X11 + X12 = the number of times the event S is observed and P (S) = , then
the distribution of X11 + X12 is Binomial(n; ). Therefore E (X11 + X12 ) = n and

X11 + X12 n (X11 + X12 ) n


E + =
2
(1 )2 (1 )

Since X11 + X21 = the number of times the event H is observed and P (H) = , then the
distribution of X11 + X21 is Binomial(n; ). Therefore E (X11 + X21 ) = n and

X11 + X21 n (X11 + X21 ) n


E 2 + =
(1 )2 (1 )

Therefore the expected information matrix is


2 3
n
(1 ) 0
J( ; )=4 5
n
0 (1 )

The inverse matrix is 2 3


(1 )
n 0
[J ( ; )] 1
=4 5
(1 )
0 n

Also V ar (~ ) = (1n ) and V ar( ~ ) = (1n ) so the diagonal entries of [J ( ; )] 1


give us
the variances of the maximum likelihood estimators.

xi 1
7:(a) If Yi Binomial(1; pi ) where pi = 1 + e and x1 ; x2 ; : : : ; xn are known
constants, the likelihood function for ( ; ) is
Q
n 1 yi
L( ; ) = p (1 pi )1 yi
i=1 yi i

or more simply
Q
n
L( ; ) = pyi i (1 pi )1 yi
i=1
The log likelihood function is
P
n
l ( ; ) = log L ( ; ) = [yi log (pi ) + (1 yi ) log (1 pi )]
i=1

Note that
@pi @ 1 e xi
xi
= 1+e =
@ @ (1 + e x i )2

1 e xi
= xi ) (1 + e xi )
= pi (1 pi )
(1 + e
and
@pi @ 1 xi e xi
xi
= 1+e =
@ @ (1 + e xi )2

1 e xi
= xi xi ) (1 xi )
= xi pi (1 pi )
(1 + e +e
Therefore
@l @l @pi Pn yi (1 yi ) @pi
= =
@ @pi @ i=1 pi (1 pi ) @
Pn yi (1 pi ) (1 yi ) pi
= pi (1 pi )
i=1 pi (1 pi )
Pn
= [yi (1 pi ) (1 yi ) pi ]
i=1
Pn
= (yi pi )
i=1

@l @l @pi Pn yi (1 yi ) @pi
= =
@ @pi @ i=1 pi (1 pi ) @
P yi (1 pi ) (1 yi ) pi
n
= xi pi (1 pi )
i=1 pi (1 pi )
Pn
= xi [yi (1 pi ) (1 yi ) pi ]
i=1
Pn
= xi (yi pi )
i=1

The score vector is 2 3


" # P
n
@l (yi pi )
6 7
S( ; )= @
=6
4
i=1 7
5
@l P
n
@ xi (yi pi )
i=1
To obtain the expected information we …rst note that
@2l @ @l @ P
n Pn @p
i P
n
2
= = (yi pi ) = = pi (1 pi )
@ @ @ @ i=1 i=1 @ i=1

@2l @ @l @ P
n Pn @p
i P
n
= = (yi pi ) = = xi pi (1 pi )
@ @ @ @ @ i=1 i=1 @ i=1

and
@2l @ @l @ P
n P
n @pi P
n
2 = @ = xi (yi pi ) = xi = x2i pi (1 pi )
@ @ @ i=1 i=1 @ i=1

The information matrix is


2 3
@2l @2l
@ 2 @ @
I( ; ) = 4 5
@2l @2l
@ @ @ 2
2 3
P
n P
n
6 pi (1 pi ) xi pi (1 pi ) 7
6 i=1 i=1 7
= 6 n 7
4 P Pn 5
xi pi (1 pi ) x2i pi (1 pi )
i=1 i=1

which is a constant function of the random variables Y1 ; Y2 ; :::; Yn and therefore the expected
information is J ( ; ) = I ( ; )

7.(b) The maximum likelihood estimates of $\alpha$ and $\beta$ are found by solving the equations
h i
@l @l
S( ; ) = @ @
P
n P
n
= (yi pi ) xi (yi pi )
i=1 i=1
h i
= 0 0

which must be done numerically.


Newton's method is given by
$$\left[\alpha^{(i+1)},\ \beta^{(i+1)}\right] = \left[\alpha^{(i)},\ \beta^{(i)}\right] + S\!\left(\alpha^{(i)},\beta^{(i)}\right)\left[I\!\left(\alpha^{(i)},\beta^{(i)}\right)\right]^{-1}, \quad i = 0, 1, \ldots \text{ until convergence}$$
where $\left(\alpha^{(0)},\beta^{(0)}\right)$ is an initial estimate of $(\alpha,\beta)$.
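A minimal R sketch of this Newton iteration is given below; the data x, y and the starting value are simulated and purely illustrative, and the final comment only notes that glm() could be used for comparison.

  # Newton's method for the logistic model p_i = 1/(1 + exp(-alpha - beta*x_i))
  set.seed(330)
  x <- rnorm(200)
  y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))        # simulated data with (alpha, beta) = (0.5, 1.2)
  ab <- c(0, 0)                                     # initial estimate of (alpha, beta)
  for (i in 1:20) {
    p <- plogis(ab[1] + ab[2] * x)
    S <- c(sum(y - p), sum(x * (y - p)))            # score vector
    Imat <- rbind(c(sum(p * (1 - p)),     sum(x * p * (1 - p))),
                  c(sum(x * p * (1 - p)), sum(x^2 * p * (1 - p))))   # information matrix
    ab <- ab + solve(Imat, S)                       # Newton update
  }
  ab                                                # compare with glm(y ~ x, family = binomial)$coef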

10.7 Chapter 8
1.(a) The hypothesis $H_0: \theta = \theta_0$ is a simple hypothesis since the model is completely specified.
From Example 6.3.6 the likelihood function is

n Q
n
L( ) = xi for >0
i=1

The log likelihood function is


P
n
l( ) = n log + log xi for >0
i=1

and the maximum likelihood estimate is

^= n
P
n
log xi
i=1

The relative likelihood function is


n ^
L( ) Q
n
R( ) = = xi for 0
L(^) ^ i=1

The likelihood ratio test statistic for H0 : = 0 is

( 0 ; X) = 2 log R ( 0 ; X)
" ~#
n
0 Q
n 0
= 2 log Xi
~ i=1

~ P log Xi
n
0
= 2 n log 0
~ i=1
0 ~ 1Pn
= 2 n log +n 0 log Xi
~ n i=1
P
n
log Xi
= 2n log
0
+ 0
~ 1 since i=1
=
1
~ ~ n ~
0 0
= 2n 1 log
~ ~

The observed value of the likelihood ratio test statistic is

( 0 ; x) = 2 log R ( 0 ; X)
0 0
= 2n 1 log
^ ^

The parameter space is = f : > 0g which has dimension 1 and thus k = 1. The
approximate p-value is
2
p-value P (W ( 0 ; x)) where W (1)
h p i
= 2 1 P Z ( 0 ; x) where Z N (0; 1)

P
20
(b) If n = 20 and log xi = 25 and H0 : = 1 then ^ = 20=25 = 0:8 the observed value
i=1
of the likelihood ratio test statistic is

( 0 ; x) = 2 log R ( 0 ; X)
1 1
= 2 (20) 1 log
0:8 0:8
= 40 [0:25 log (1:25)]
= 1:074258

and
2
p value P (W 1:074258) where W (1)
h p i
= 2 1 P Z 1:074258 where Z N (0; 1)
= 0:2999857

calculated using R. Since the p-value is greater than 0.1 there is no evidence against $H_0: \theta = 1$ based on the data.
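The observed statistic and p-value can be computed directly in R (a sketch):

  # Likelihood ratio statistic and approximate p-value for Problem 1(b)
  n <- 20; theta0 <- 1; thetahat <- 0.8
  lambda <- 2 * n * (theta0 / thetahat - 1 - log(theta0 / thetahat))
  lambda                            # 1.074258
  1 - pchisq(lambda, df = 1)        # approximate p-value, 0.2999857
  2 * (1 - pnorm(sqrt(lambda)))     # equivalent form used above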

4: Since = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension k = 2 and


0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension q = 1 and the hypothesis is
composite.
From Example 6.5.2 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Weibull(2; 1 ) distribution is

2n 1 P
n
L1 ( 1 ) = 1 exp 2 x2i for 1 >0
1 i=1

1=2
P
n
with maximum likelihood estimate ^1 = 1
n x2i .
i=1
Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from a
Weibull(2; 2 ) distribution is

2m 1 P
m
L2 ( 2 ) = 2 exp 2 yi2 for 2 >0
2 i=1

1=2
P
m
with maximum likelihood estimate ^2 = 1
m yi2 .
i=1

Since the samples are independent the likelihood function for ( 1 ; 2) is

L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for 1 > 0; 2 >0

and the log likelihood function


1 P
n 1 P
m
l ( 1; 2) = 2n log 1 2 x2i 2m log 2 2 yi2 for 1 > 0; 2 >0
1 i=1 2 i=1

The independence of the samples implies the maximum likelihood estimators are
1=2 1=2
~1 = 1 Pn
~2 = 1 Pm
X2 Y2
n i=1 i m i=1 i
Therefore
1 Pn 1 Pm
l(~1 ; ~2 ; X; Y) = n log X2 m log Y2 (n + m)
n i=1 i m i=1 i
If 1 = 2 = then the log likelihood function is
1 P
n P
m
l( ) = 2 (n + m) log 2 x2i + yi2 for >0
i=1 i=1

which is only a function of . To determine max l( 1 ; 2 ; X; Y) we note that


( 1 ; 2 )2 0

d 2 (n + m) 2 P
n P
m
l( ) = + 3 x2i + yi2
d i=1 i=1
d
and d l ( ) = 0 for
1=2
1 P
n P
m
= x2i + yi2
n+m i=1 i=1
and therefore
1 P
n P
m
max l( 1 ; 2 ; X; Y) = (n + m) log Xi2 + Yi2 (n + m)
( 1 ; 2 )2 0 n+m i=1 i=1

The likelihood ratio test statistic is

(X; Y; 0)

= 2 l(~1 ; ~2 ; X; Y) max l( 1 ; 2 ; X; Y)
( 1 ; 2 )2 0

1 P
n 1 Pm
= 2[ n log Xi2 m log Y2 (n + m)
n i=1 m i=1 i
1 Pn P
m
+ (n + m) log Xi2 + Yi2 + (n + m) ]
n + m i=1 i=1
1 Pn P
m
= 2[ (n + m) log Xi2 + Yi2
n + m i=1 i=1
1 P n 1 Pm
n log Xi2 m log Y2 ]
n i=1 m i=1 i

with corresponding observed value

1 P
n P
m
(x; y; 0) = 2[ (n + m) log x2i + yi2
n+m i=1 i=1
1 Pn 1 Pm
n log x2 m log y2 ]
n i=1 i m i=1 i

Since k q=2 1=1


2
p-value P [W (x; y; 0 )] where W (1)
h p i
= 2 1 P Z (x; y; 0 ) where Z N (0; 1)
11. Summary of Named Distributions
Summary of Discrete Distributions

Discrete Uniform(a, b), a and b integers, b >= a:
  f(x) = 1/(b - a + 1), x = a, a+1, ..., b
  E(X) = (a + b)/2;  Var(X) = [(b - a + 1)^2 - 1]/12
  M(t) = [1/(b - a + 1)] sum_{x=a}^{b} e^{tx}, t in R

Hypergeometric(N, r, n), N = 1, 2, ..., n = 0, 1, ..., N, r = 0, 1, ..., N:
  f(x) = C(r, x) C(N - r, n - x) / C(N, n), x = max(0, n - N + r), ..., min(r, n)
  E(X) = nr/N;  Var(X) = n (r/N)(1 - r/N)(N - n)/(N - 1)
  M(t): not tractable

Binomial(n, p), 0 <= p <= 1, q = 1 - p, n = 1, 2, ...:
  f(x) = C(n, x) p^x q^(n - x), x = 0, 1, ..., n
  E(X) = np;  Var(X) = npq;  M(t) = (p e^t + q)^n, t in R

Bernoulli(p), 0 <= p <= 1, q = 1 - p:
  f(x) = p^x q^(1 - x), x = 0, 1
  E(X) = p;  Var(X) = pq;  M(t) = p e^t + q, t in R

Negative Binomial(k, p), 0 < p <= 1, q = 1 - p, k = 1, 2, ...:
  f(x) = C(x + k - 1, x) p^k q^x = C(-k, x) p^k (-q)^x, x = 0, 1, ...
  E(X) = kq/p;  Var(X) = kq/p^2;  M(t) = [p/(1 - q e^t)]^k, t < -ln q

Geometric(p), 0 < p <= 1, q = 1 - p:
  f(x) = p q^x, x = 0, 1, ...
  E(X) = q/p;  Var(X) = q/p^2;  M(t) = p/(1 - q e^t), t < -ln q

Poisson(theta), theta >= 0:
  f(x) = theta^x e^(-theta)/x!, x = 0, 1, ...
  E(X) = theta;  Var(X) = theta;  M(t) = e^(theta(e^t - 1)), t in R

Multinomial(n; p_1, ..., p_k), 0 <= p_i <= 1 with p_1 + ... + p_k = 1:
  f(x_1, ..., x_k) = [n!/(x_1! x_2! ... x_k!)] p_1^{x_1} p_2^{x_2} ... p_k^{x_k},
    x_i = 0, 1, ..., n with x_1 + ... + x_k = n
  E(X_i) = n p_i;  Var(X_i) = n p_i (1 - p_i), i = 1, 2, ..., k
  M(t_1, ..., t_{k-1}) = (p_1 e^{t_1} + ... + p_{k-1} e^{t_{k-1}} + p_k)^n, t_i in R
Summary of Continuous Distributions
Probability Moment
Notation and Density Mean Variance Generating
Parameters Function EX VarX Function
fx Mt

e bt −e at
Uniforma, b
1
ab b−a 2 b−at
t≠0
b−a
2 12
ba axb 1 t0
Γab
ΓaΓb
x a−1 1 − x b−1
 k−1
Betaa, b 0x1 a ab 1∑  ai
abi
tk
k!
ab ab1ab 2 k1 i0
a  0, b  0 
Γ   x −1 e −x dx t∈
0

2 2
e −x− /2  2 t 2 /2
N,  2  2   2 e t
 ∈ ,  2  0 x∈ t∈
2 2
e −log x− /2  2
Lognormal,  2  2 /2 e 2
2 x e  DNE
2 2 2
 ∈ ,   0 x0 −e

1 1
Exponential e −x/
  2 1−t
1
0 x≥0 t 

Two Parameter 1 e t
e −x−/ 2 1−t
Exponential, 
  
1
x≥ t 
 ∈ ,   0

Double 1 e t
e −|x−|/ 2 1− 2 t 2 
Exponential, 
2  2
1
x∈ |t|  
 ∈ ,   0

1 x−/   − 
Extreme Value,  
e x−/−e 22 e t Γ1  t
  0. 5772 6
 ∈ ,   0 x∈ t  −1/
Euler’s constant

x −1 e −x/
Gamma,  1 − t −
  Γ   2
1
  0,   0 x0 t 

x −−1 e −1/x 1 1
Inverse Gamma,    Γ −1  2 −1 2 −2 DNE
  0,   0 x0 1 2
Summary of Continuous Distributions Continued
Probability Moment
Notation and Density Mean Variance Generating
Parameters Function EX VarX Function
fx Mt
x k/2−1 e −x/2
 2 k 2 k/2 Γk/2 k 2k 1 − 2t −k/2
1
k  1, 2, … x0 t 2

 
Weibull,  x −1 e −x/ 1  2 Γ1  2

 Γ1     Not tractable
2 1
  0,   0 x0 −Γ 1   

   2
Pareto,  x 1 −1 −1 2 −2 DNE
  0,   0 x 1 2
e −x−/
Logistic,   1e −x−/
2

22
e t Γ1 − tΓ1  t
3
 ∈ ,   0 x∈
1
Cauchy,   1x−/ 2 DNE DNE DNE
 ∈ ,   0 x∈
2 −k1/2
Γ k1 1 xk k
tk 2
k
0 k−2 DNE
k Γ 2
k  1, 2, … k  2, 3, … k  3, 4, …
x∈
k1
k1 2 k 1 k 2
k2
Γ 2
Fk 1 , k 2  k1 k2
 k2 2k 22 k 1 k 2 −2
Γ 2
Γ 2
k 1  1, 2, … k 1 k 2
k 2 −2 k 1 k 2 −2 2 k 2 −4 DNE
k1 −
x 2
−1
1 k1
x 2 k2  2 k2  4
k 2  1, 2, … k2

x0
X1
X X2
 BVN, 
1
 2

 1 ∈ ,  2 ∈  fx 1 , x 2   Mt 1 , t 2 
1
e − 12 x− T  −1 x−    e
T t 1 t T t
 21  1  2 2|| 1/2
2


 1  2  22 x 1 ∈ , x 2 ∈  t 1 ∈ , t 2 ∈ 

 1  0,  2  0
−1    1
12. Distribution Tables

N(0,1) Cumulative
Distribution Function

This table gives values of F(x) = P(X ≤ x) for X ~ N(0,1) and x ≥ 0


x 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.50000 0.50399 0.50798 0.51197 0.51595 0.51994 0.52392 0.52790 0.53188 0.53586
0.1 0.53983 0.54380 0.54776 0.55172 0.55567 0.55962 0.56356 0.56749 0.57142 0.57535
0.2 0.57926 0.58317 0.58706 0.59095 0.59483 0.59871 0.60257 0.60642 0.61026 0.61409
0.3 0.61791 0.62172 0.62552 0.62930 0.63307 0.63683 0.64058 0.64431 0.64803 0.65173
0.4 0.65542 0.65910 0.66276 0.66640 0.67003 0.67364 0.67724 0.68082 0.68439 0.68793
0.5 0.69146 0.69497 0.69847 0.70194 0.70540 0.70884 0.71226 0.71566 0.71904 0.72240
0.6 0.72575 0.72907 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75175 0.75490
0.7 0.75804 0.76115 0.76424 0.76730 0.77035 0.77337 0.77637 0.77935 0.78230 0.78524
0.8 0.78814 0.79103 0.79389 0.79673 0.79955 0.80234 0.80511 0.80785 0.81057 0.81327
0.9 0.81594 0.81859 0.82121 0.82381 0.82639 0.82894 0.83147 0.83398 0.83646 0.83891
1.0 0.84134 0.84375 0.84614 0.84849 0.85083 0.85314 0.85543 0.85769 0.85993 0.86214
1.1 0.86433 0.86650 0.86864 0.87076 0.87286 0.87493 0.87698 0.87900 0.88100 0.88298
1.2 0.88493 0.88686 0.88877 0.89065 0.89251 0.89435 0.89617 0.89796 0.89973 0.90147
1.3 0.90320 0.90490 0.90658 0.90824 0.90988 0.91149 0.91309 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96856 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98124 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98778 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.99010 0.99036 0.99061 0.99086 0.99111 0.99134 0.99158
2.4 0.99180 0.99202 0.99224 0.99245 0.99266 0.99286 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99430 0.99446 0.99461 0.99477 0.99492 0.99506 0.99520
2.6 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.99621 0.99632 0.99643
2.7 0.99653 0.99664 0.99674 0.99683 0.99693 0.99702 0.99711 0.99720 0.99728 0.99736
2.8 0.99744 0.99752 0.99760 0.99767 0.99774 0.99781 0.99788 0.99795 0.99801 0.99807
2.9 0.99813 0.99819 0.99825 0.99831 0.99836 0.99841 0.99846 0.99851 0.99856 0.99861
3.0 0.99865 0.99869 0.99874 0.99878 0.99882 0.99886 0.99889 0.99893 0.99896 0.99900
3.1 0.99903 0.99906 0.99910 0.99913 0.99916 0.99918 0.99921 0.99924 0.99926 0.99929
3.2 0.99931 0.99934 0.99936 0.99938 0.99940 0.99942 0.99944 0.99946 0.99948 0.99950
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983

N(0,1) Quantiles: This table gives values of F-1(p) for p ≥ 0.5


p 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.075 0.08 0.09 0.095
0.5 0.0000 0.0251 0.0502 0.0753 0.1004 0.1257 0.1510 0.1764 0.1891 0.2019 0.2275 0.2404
0.6 0.2533 0.2793 0.3055 0.3319 0.3585 0.3853 0.4125 0.4399 0.4538 0.4677 0.4959 0.5101
0.7 0.5244 0.5534 0.5828 0.6128 0.6433 0.6745 0.7063 0.7388 0.7554 0.7722 0.8064 0.8239
0.8 0.8416 0.8779 0.9154 0.9542 0.9945 1.0364 1.0803 1.1264 1.1503 1.1750 1.2265 1.2536
0.9 1.2816 1.3408 1.4051 1.4758 1.5548 1.6449 1.7507 1.8808 1.9600 2.0537 2.3263 2.5758
Chi‐Squared Quantiles
This table gives values of x for p = P(X ≤ x) = F(x)
df\p 0.005 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99 0.995
1 0.000 0.000 0.001 0.004 0.016 2.706 3.842 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.992 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.146 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.054 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.391 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.430 104.210
80 51.172 53.540 57.153 60.391 64.278 96.578 101.880 106.630 112.330 116.320
90 59.196 61.754 65.647 69.126 73.291 107.570 113.150 118.140 124.120 128.300
100 67.328 70.065 74.222 77.929 82.358 118.500 124.340 129.560 135.810 140.170
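A minimal check of one entry of the Chi-squared table, again assuming scipy is available:

    # Quantile x with P(X <= x) = 0.95 when X ~ Chi-squared(10) (assumes scipy)
    from scipy.stats import chi2
    print(f"{chi2.ppf(0.95, df=10):.3f}")   # prints 18.307, matching the df = 10 row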
Student t Quantiles
This table gives values of x for which p = P(X ≤ x) = F(x), where X has a Student t distribution with df degrees of freedom, for p ≥ 0.6

df \ p 0.6 0.7 0.8 0.9 0.95 0.975 0.99 0.995 0.999 0.9995
1 0.3249 0.7265 1.3764 3.0777 6.3138 12.7062 31.8205 63.6567 318.3088 636.6192
2 0.2887 0.6172 1.0607 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.5844 0.9785 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.5686 0.9410 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.5594 0.9195 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.5534 0.9057 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.5491 0.8960 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.5459 0.8889 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.5435 0.8834 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.5415 0.8791 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.5399 0.8755 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.5386 0.8726 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178
13 0.2586 0.5375 0.8702 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.5366 0.8681 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.5357 0.8662 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.5350 0.8647 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.5344 0.8633 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.5338 0.8620 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.5333 0.8610 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.5329 0.8600 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.5325 0.8591 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.5321 0.8583 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.5317 0.8575 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.5314 0.8569 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.5312 0.8562 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.5309 0.8557 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.5306 0.8551 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.5304 0.8546 1.3125 1.7011 2.0484 2.4671 2.7633 3.4082 3.6739
29 0.2557 0.5302 0.8542 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2556 0.5300 0.8538 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
40 0.2550 0.5286 0.8507 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
50 0.2547 0.5278 0.8489 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960
60 0.2545 0.5272 0.8477 1.2958 1.6706 2.0003 2.3901 2.6603 3.2317 3.4602
70 0.2543 0.5268 0.8468 1.2938 1.6669 1.9944 2.3808 2.6479 3.2108 3.4350
80 0.2542 0.5265 0.8461 1.2922 1.6641 1.9901 2.3739 2.6387 3.1953 3.4163
90 0.2541 0.5263 0.8456 1.2910 1.6620 1.9867 2.3685 2.6316 3.1833 3.4019
100 0.2540 0.5261 0.8452 1.2901 1.6602 1.9840 2.3642 2.6259 3.1737 3.3905
>100 0.2535 0.5247 0.8423 1.2832 1.6479 1.9647 2.3338 2.5857 3.1066 3.3101
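A minimal check of one entry of the Student t table, again assuming scipy is available:

    # Quantile x with P(X <= x) = 0.975 when X has a t distribution with 10 degrees of freedom
    from scipy.stats import t
    print(f"{t.ppf(0.975, df=10):.4f}")   # prints 2.2281, matching the df = 10 row (assumes scipy)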
