
The University of Hong Kong

Department of Statistics and Actuarial Science


STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 1: Sampling Distributions

1 Moment-generating Functions
Definition 1.1. The moment-generating function (MGF) of a random variable X is defined as

    M_X(t) = E(e^{tX}),

provided the expectation on the right-hand side exists for t in an open interval containing 0. We may also call M_X(t) the moment-generating function of the distribution followed by X.
Remarks:
• M_X(0) = E(e^0) = 1.

• The moment-generating function of a discrete random variable X is

    M_X(t) = \sum_x e^{tx} f(x),

  and that of a continuous random variable X is

    M_X(t) = \int_{−∞}^{∞} e^{tx} f(x) dx.

• The subscript X indicates that it is the moment-generating function of the random variable X; it may be dropped when no ambiguity arises.
Example 1.1. The moment-generating function of a random variable X having a Poisson distribution with mean λ is

    M_X(t) = E(e^{tX}) = \sum_{x=0}^{∞} e^{tx} P(X = x) = \sum_{x=0}^{∞} e^{tx} \frac{e^{−λ} λ^x}{x!} = e^{−λ} \sum_{x=0}^{∞} \frac{(λe^t)^x}{x!} = e^{−λ} e^{λe^t}
           = e^{λ(e^t − 1)}.
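As a quick numerical sanity check (a sketch, not part of the original notes), the closed form e^{λ(e^t − 1)} can be compared against a truncated version of the defining series; the values λ = 2.5 and t = 0.3 below, and the function names, are arbitrary illustrative choices.

```python
import math

def poisson_mgf_series(lam, t, terms=100):
    """Truncated defining series: sum over x of e^{tx} e^{-lam} lam^x / x!."""
    total = 0.0
    term = math.exp(-lam)                    # x = 0 term: e^{-lam}(lam e^t)^0 / 0!
    for x in range(terms):
        total += term
        term *= lam * math.exp(t) / (x + 1)  # move from term x to term x + 1
    return total

def poisson_mgf_closed(lam, t):
    """Closed form e^{lam(e^t - 1)} derived in Example 1.1."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam, t = 2.5, 0.3
print(poisson_mgf_series(lam, t), poisson_mgf_closed(lam, t))  # the two agree
```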

Theorem 1.1. Suppose M_X(t) exists. Then,

1. M_X(t) = \sum_{r=0}^{∞} \frac{t^r}{r!} E(X^r);

2. \frac{d^r}{dt^r} M_X(t) |_{t=0} = E(X^r) for r = 1, 2, ...;

3. For constants a and b,

    M_{aX+b}(t) = e^{bt} M_X(at).

STAT2602B TST23/24 Topic 1

Proof. 1. For discrete random variables,

    M_X(t) = \sum_x e^{tx} P(X = x) = \sum_x \sum_{r=0}^{∞} \frac{(tx)^r}{r!} P(X = x) = \sum_{r=0}^{∞} \frac{t^r}{r!} \sum_x x^r P(X = x)
           = \sum_{r=0}^{∞} \frac{t^r}{r!} E(X^r).

For continuous random variables, use integrals instead of sums.

2. Consider the Taylor series expansion of the exponential function e^{tx},

    e^{tx} = \sum_{k=0}^{∞} \frac{(tx)^k}{k!} = 1 + tx + \frac{1}{2!} t^2 x^2 + \frac{1}{3!} t^3 x^3 + ... + \frac{1}{r!} t^r x^r + ... .

By definition,

    M_X(t) = 1 + t E(X) + \frac{1}{2!} t^2 E(X^2) + \frac{1}{3!} t^3 E(X^3) + ... + \frac{1}{r!} t^r E(X^r) + ... .

Differentiating with respect to t,

    \frac{d}{dt} M_X(t) = E(X) + t E(X^2) + \frac{1}{2!} t^2 E(X^3) + ... + \frac{1}{(r−1)!} t^{r−1} E(X^r) + ...,

    \frac{d^2}{dt^2} M_X(t) = E(X^2) + t E(X^3) + ... + \frac{1}{(r−2)!} t^{r−2} E(X^r) + ... .

In general,

    \frac{d^r}{dt^r} M_X(t) = E(X^r) + sum of terms in t.

Substituting t = 0, we obtain, for example,

    \frac{d}{dt} M_X(t) |_{t=0} = E(X),    \frac{d^2}{dt^2} M_X(t) |_{t=0} = E(X^2).

In general, we have

    \frac{d^r}{dt^r} M_X(t) |_{t=0} = E(X^r).

3. By definition,

    M_{aX+b}(t) = E(e^{(aX+b)t}) = e^{bt} E(e^{atX}) = e^{bt} M_X(at).

Example 1.2. Suppose that X is a continuous random variable with a probability density function given by

    f(x) = e^{−x} for x > 0, and f(x) = 0 otherwise.


Find the moment-generating function of X and hence find the mean and the variance of X.

The moment-generating function of X is

    M_X(t) = E(e^{tX}) = \int_{−∞}^{∞} e^{tx} f(x) dx = \int_0^{∞} e^{tx} e^{−x} dx = \int_0^{∞} e^{tx−x} dx
           = \int_0^{∞} e^{−(1−t)x} dx = [ −\frac{1}{1−t} e^{−(1−t)x} ]_0^{∞}
           = \frac{1}{1−t},

which is valid if 1 − t > 0, that is, t < 1. Differentiating M_X(t) with respect to t,

    \frac{d}{dt} M_X(t) = \frac{1}{(1−t)^2},    \frac{d^2}{dt^2} M_X(t) = \frac{2}{(1−t)^3}.

Recall that

    \frac{d}{dt} \frac{1}{(1−t)^n} = \frac{d}{d(1−t)} (1−t)^{−n} × \frac{d(1−t)}{dt} = −n(1−t)^{−n−1} × (−1) = \frac{n}{(1−t)^{n+1}}.

Substituting t = 0 gives

    \frac{d}{dt} M_X(t) |_{t=0} = \frac{1}{(1−0)^2} = 1,    \frac{d^2}{dt^2} M_X(t) |_{t=0} = \frac{2}{(1−0)^3} = 2.

Hence, the first two moments of X and its mean and variance are, respectively,

    E(X) = \frac{d}{dt} M_X(t) |_{t=0} = 1,    E(X^2) = \frac{d^2}{dt^2} M_X(t) |_{t=0} = 2,
    μ = E(X) = 1,    σ^2 = E(X^2) − μ^2 = 2 − 1^2 = 1.
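The mean and variance just obtained can be double-checked numerically (a sketch, not from the notes): finite differences of M(t) = 1/(1 − t) at t = 0 approximate the first two derivatives, hence the first two moments; the step size h is an arbitrary small choice.

```python
# Finite-difference check of Example 1.2: differentiating M(t) = 1/(1 - t)
# at t = 0 should recover E(X) = 1 and E(X^2) = 2, hence variance 1.

def M(t):
    return 1.0 / (1.0 - t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)              # central difference ~ M'(0)
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # second difference ~ M''(0)
variance = second - first**2
print(first, second, variance)   # close to 1, 2, 1
```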


Theorem 1.2. If X_1, X_2, ..., X_n are independent random variables, M_{X_i}(t) exists for i = 1, 2, ..., n, and Y = X_1 + X_2 + ... + X_n, then M_Y(t) exists and

    M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t).

Proof. Suppose all X_i's are continuous random variables with marginal probability density functions f_1(x_1), f_2(x_2), ..., f_n(x_n), respectively. By independence, the joint density is the product of the marginals. Then,

    M_Y(t) = E(e^{(X_1+X_2+...+X_n)t})
           = \int_{−∞}^{∞} ... \int_{−∞}^{∞} e^{(x_1+x_2+...+x_n)t} f_1(x_1) f_2(x_2) ... f_n(x_n) dx_1 dx_2 ... dx_n
           = \int_{−∞}^{∞} e^{x_1 t} f_1(x_1) dx_1 \int_{−∞}^{∞} e^{x_2 t} f_2(x_2) dx_2 ... \int_{−∞}^{∞} e^{x_n t} f_n(x_n) dx_n
           = M_{X_1}(t) M_{X_2}(t) ... M_{X_n}(t).

The proof for the discrete case is similar and is omitted.


Theorem 1.3. For those probability distributions whose moment-generating functions exist, there is a one-to-one correspondence between moment-generating functions and probability distributions. That is,

    M_X(t) = M_Y(t) for all t  ⇒  f_X(x) = f_Y(y).

Hence, M_X(t) completely determines the distribution of X.

Example 1.3. Find the probability distribution of the sum of n independent random variables X_1, X_2, ..., X_n having Poisson distributions with means λ_1, λ_2, ..., λ_n respectively.

Let Y = X_1 + X_2 + ... + X_n. Then,

    M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} e^{λ_i(e^t − 1)} = e^{(e^t − 1) \sum_{i=1}^{n} λ_i},

which is the moment-generating function of a Poisson random variable with mean \sum_{i=1}^{n} λ_i. Therefore, by Theorem 1.3, Y has the Poisson distribution with mean \sum_{i=1}^{n} λ_i.
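This conclusion can also be verified directly for n = 2 (a sketch outside the notes; the rates 1.2 and 0.8 are arbitrary): convolving the two Poisson pmfs reproduces the Poisson pmf with the summed mean.

```python
import math

def pois_pmf(lam, x):
    """Poisson pmf e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam1, lam2 = 1.2, 0.8
for y in range(12):
    # P(X1 + X2 = y) as a convolution of the two individual pmfs
    conv = sum(pois_pmf(lam1, x) * pois_pmf(lam2, y - x) for x in range(y + 1))
    direct = pois_pmf(lam1 + lam2, y)
    assert math.isclose(conv, direct, rel_tol=1e-12)
print("P(X1 + X2 = y) matches the Poisson(2.0) pmf for y = 0, ..., 11")
```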

Example 1.4. For positive numbers α and λ, find the moment-generating function of a gamma distribution Gamma(α, λ), whose probability density function is

    f(x) = \frac{λ^α x^{α−1} e^{−λx}}{Γ(α)} for x > 0, and f(x) = 0 otherwise.

The moment-generating function of the gamma distribution is

    M_X(t) = E(e^{tX}) = \int_0^{∞} e^{tx} \frac{λ^α x^{α−1} e^{−λx}}{Γ(α)} dx = \int_0^{∞} \frac{λ^α}{Γ(α)} x^{α−1} e^{−(λ−t)x} dx
           = \frac{λ^α}{(λ−t)^α} \int_0^{∞} \frac{(λ−t)^α}{Γ(α)} x^{α−1} e^{−(λ−t)x} dx
           = \frac{λ^α}{(λ−t)^α}

for t < λ, where

    \int_0^{∞} \frac{(λ−t)^α}{Γ(α)} x^{α−1} e^{−(λ−t)x} dx = 1

is due to the fact that

    \frac{(λ−t)^α}{Γ(α)} x^{α−1} e^{−(λ−t)x} for x > 0

is the probability density function of a Gamma(α, λ − t) distribution.


Example 1.5. Show that the sum of n independent random variables X_1, X_2, ..., X_n, each having a Bernoulli distribution with parameter p, has Bin(n, p), the binomial distribution with parameters n and p.

For each i = 1, 2, ..., n, the moment-generating function of X_i is

    M_{X_i}(t) = e^{0t} P(X_i = 0) + e^{1t} P(X_i = 1) = (1 − p) + e^t p = 1 − p + pe^t,

and hence the moment-generating function of X_1 + X_2 + ... + X_n is

    \prod_{i=1}^{n} M_{X_i}(t) = (1 − p + pe^t)^n.

The moment-generating function of Bin(n, p) is

    \sum_{x=0}^{n} e^{tx} C_x^n p^x (1 − p)^{n−x} = \sum_{x=0}^{n} C_x^n (pe^t)^x (1 − p)^{n−x} = (pe^t + 1 − p)^n.

Therefore, X_1 + X_2 + ... + X_n has a Bin(n, p) distribution by Theorem 1.3.
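The same identity can be checked by direct computation (a sketch, not from the notes; p = 0.3 and n = 6 are illustrative choices): convolving the Bernoulli(p) pmf with itself n times gives the Bin(n, p) pmf, mirroring the MGF identity (1 − p + pe^t)^n.

```python
import math

p, n = 0.3, 6
dist = {0: 1.0}                                   # pmf of an empty sum
for _ in range(n):
    new = {}
    for s, pr in dist.items():
        new[s] = new.get(s, 0.0) + pr * (1 - p)   # add X_i = 0
        new[s + 1] = new.get(s + 1, 0.0) + pr * p # add X_i = 1
    dist = new

for x in range(n + 1):
    binom_pmf = math.comb(n, x) * p**x * (1 - p)**(n - x)
    assert math.isclose(dist[x], binom_pmf, rel_tol=1e-12)
print("sum of 6 Bernoulli(0.3) variables follows Bin(6, 0.3)")
```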

2 Simple Random Sampling


Descriptive statistics are tabular, graphical and numerical methods used to summarise data, whereas inferential statistics are statistical methods that use data from a small group of elements (the sample) to make decisions or predictions about the characteristics of the whole group of elements of interest (the population). A numerical characteristic of a population (for example, the population mean or the population variance) is called a parameter.
Definition 1.2. For a population consisting of data x_1, x_2, ..., x_N, where N is a positive integer, the population mean, denoted by μ, is defined as

    μ = \frac{x_1 + x_2 + ... + x_N}{N} ≡ \frac{1}{N} \sum_{i=1}^{N} x_i,

and N is called the population size.


Definition 1.3. For a sample consisting of data x_1, x_2, ..., x_n, where n is a positive integer, the sample mean, denoted by x̄, is defined as

    x̄ = \frac{x_1 + x_2 + ... + x_n}{n} ≡ \frac{1}{n} \sum_{i=1}^{n} x_i,

and n is called the sample size.


Definition 1.4. For a population, the population variance, denoted by σ^2, is defined as

    σ^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i − μ)^2 = \frac{1}{N} ( \sum_{i=1}^{N} x_i^2 − Nμ^2 ) = \frac{1}{N} \sum_{i=1}^{N} x_i^2 − μ^2.


Definition 1.5. The population standard deviation σ is the non-negative square


root of the population variance.
Definition 1.6. For a sample of size n ≥ 2, the sample variance, denoted by s^2, is defined as

    s^2 = \frac{1}{n − 1} \sum_{i=1}^{n} (x_i − x̄)^2,

which equals

    \frac{1}{n − 1} ( \sum_{i=1}^{n} x_i^2 − n x̄^2 ).

Definition 1.7. The sample standard deviation s is the non-negative square root
of the sample variance.
Remarks: The standard deviation is measured in the same units as the data, which makes it, unlike the variance, directly comparable to the mean.
Example 1.6. Suppose we want to know the mean income of all households in a
city. The population is therefore the incomes of all households in the city. We then
select randomly a household from the city and record its income. In this random
experiment, the sample space is the city and the sampling units are the households in
the city. The income corresponds to the outcome (of a household randomly selected)
of the experiment. Therefore we say that the income of a household randomly
selected is a random variable. For convenience, we define the distribution of the
population as the distribution of this random variable.
Definition 1.8. Simple random sampling is a process in which all samples of the same size are equally likely to be chosen. A simple random sample is a sample selected using a simple random sampling plan.
Example 1.7. To conduct random sampling, assign a number to each element of the
chosen population (or use already given numbers). Then, randomly select numbers
using a random number table or a software package.
Remarks: In this course, sampling means simple random sampling. Two common sampling schemes are
• sampling with replacement: replacing (putting back) each sampled element before selecting subsequent elements;
• sampling without replacement: not replacing any sampled element before selecting subsequent elements.
Example 1.8. Suppose we have a population: −1, 1, 5 and 11. Then,

    μ = \frac{−1 + 1 + 5 + 11}{4} = 4,

    σ^2 = \frac{(−1 − 4)^2 + (1 − 4)^2 + (5 − 4)^2 + (11 − 4)^2}{4} = 21.

The following table shows all possible samples and their means and variances for sampling without replacement when the sample size is 2.


Sample Sample mean Sample variance


1 {−1, 1} 0 2
2 {−1, 5} 2 18
3 {−1, 11} 5 72
4 {1, −1} 0 2
5 {1, 5} 3 8
6 {1, 11} 6 50
7 {5, −1} 2 18
8 {5, 1} 3 8
9 {5, 11} 8 18
10 {11, −1} 5 72
11 {11, 1} 6 50
12 {11, 5} 8 18
Average 4 28

Below are some details of the calculations. For sample 1,

    sample mean = \frac{−1 + 1}{2} = 0,

    sample variance = \frac{(−1 − 0)^2 + (1 − 0)^2}{2 − 1} = 2.

For sample 2,

    sample mean = \frac{−1 + 5}{2} = 2,

    sample variance = \frac{(−1 − 2)^2 + (5 − 2)^2}{2 − 1} = 18.
The next table shows all possible samples and their means and variances for
sampling with replacement when the sample size is 2.


Sample Sample mean Sample variance


1 {−1, −1} −1 0
2 {−1, 1} 0 2
3 {−1, 5} 2 18
4 {−1, 11} 5 72
5 {1, −1} 0 2
6 {1, 1} 1 0
7 {1, 5} 3 8
8 {1, 11} 6 50
9 {5, −1} 2 18
10 {5, 1} 3 8
11 {5, 5} 5 0
12 {5, 11} 8 18
13 {11, −1} 5 72
14 {11, 1} 6 50
15 {11, 5} 8 18
16 {11, 11} 11 0
Average 4 21
Remarks:
• For a particular population and a fixed sample size n, the sample mean depends on the sample, which is drawn randomly. Therefore the sample mean may be treated as a random variable. When the sample mean is treated as a random variable, it is denoted by X̄. The value of the mean of a particular sample is denoted by x̄, which is only one of the realizations of (possible values taken on by) X̄. For the same reason, the sample variance and the sample standard deviation are denoted respectively by S^2 and S when treated as random variables, and the values of the sample variance and the sample standard deviation of a particular sample are denoted respectively by s^2 and s.
• Sampling without replacement is the procedure used most often. If the population size is very large, the results of sampling with and without replacement will differ very little.
• From now on, unless otherwise specified, all theoretical results in this course are based on simple random sampling with replacement, and we will assume in numerical examples that population sizes are very large, so that these theoretical results can be applied even without mentioning whether the sampling is carried out with or without replacement.

3 Sampling Distributions of Statistics


Definition 1.9. A random sample of size n taken from a distribution (or population, respectively) is a set of n independent random variables X_1, X_2, ..., X_n, each having that distribution (or the same distribution as the population, respectively).


Remarks: This definition corresponds to sampling with replacement.

Definition 1.10. A statistic is a function of X_1, X_2, ..., X_n that does not depend on any unknown parameter.
Example 1.9. The sample mean

    X̄ = \frac{1}{n} \sum_{i=1}^{n} X_i

is an example of a statistic. The sample variance

    S^2 = \frac{1}{n − 1} \sum_{i=1}^{n} (X_i − X̄)^2 = \frac{1}{n − 1} ( \sum_{i=1}^{n} X_i^2 − n X̄^2 )

is another example of a statistic.


Theorem 1.4. If X_1, X_2, ..., X_n constitute a random sample from a population with mean μ and variance σ^2, then

    E(X̄) = μ,    Var(X̄) = \frac{σ^2}{n},    and    E(S^2) = σ^2.

Proof.

    E(X̄) = E( \frac{1}{n} \sum_{i=1}^{n} X_i ) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} nμ = μ.

    Var(X̄) = Var( \frac{1}{n} \sum_{i=1}^{n} X_i ) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{nσ^2}{n^2} = \frac{σ^2}{n}.

By symmetry, for i = 1, 2, ..., n,

    E[(X_i − X̄)^2] = E[(X_1 − X̄)^2] = Var(X_1 − X̄)    (since E(X_1 − X̄) = 0)
    = Var( X_1 − \frac{X_1 + X_2 + ... + X_n}{n} )
    = Var( \frac{(n−1)X_1}{n} − \sum_{i=2}^{n} \frac{X_i}{n} )
    = \frac{(n−1)^2}{n^2} σ^2 + (n−1) \frac{σ^2}{n^2}
    = \frac{n−1}{n} σ^2.

Therefore,

    E(S^2) = E[ \frac{1}{n−1} \sum_{i=1}^{n} (X_i − X̄)^2 ] = \frac{1}{n−1} \sum_{i=1}^{n} E[(X_i − X̄)^2]
           = \frac{1}{n−1} · n · \frac{n−1}{n} σ^2
           = σ^2.


Remarks: For convenience, E(X̄) and Var(X̄) may be written as μ_X̄ and σ_X̄^2, respectively. σ_X̄, the square root of σ_X̄^2, is called the standard error of the sample mean.
Example 1.10. Recall Example 1.8:

    E(X̄) = 4 = μ for sampling with or without replacement,
    E(S^2) = 28 ≠ σ^2 for sampling without replacement,
    E(S^2) = 21 = σ^2 for sampling with replacement.

The variance of the sample mean is

    Var(X̄) = \frac{(−1 − 4)^2 + (0 − 4)^2 + (2 − 4)^2 + ... + (8 − 4)^2 + (11 − 4)^2}{16} = 10.5 = \frac{21}{2} = \frac{σ^2}{n}

for sampling with replacement.
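The with-replacement numbers in Examples 1.8 and 1.10 can be reproduced by brute-force enumeration (a sketch, not part of the notes):

```python
from itertools import product
from statistics import mean

# Enumeration check: for sampling with replacement from {-1, 1, 5, 11}
# with n = 2, the 16 equally likely ordered samples give
# E(mean) = 4 = population mean, E(S^2) = 21 = population variance,
# and Var(mean) = 10.5 = 21/2.

population = [-1, 1, 5, 11]
samples = list(product(population, repeat=2))        # all 16 ordered samples
sample_means = [(a + b) / 2 for a, b in samples]
sample_vars = [(a - m) ** 2 + (b - m) ** 2           # divisor n - 1 = 1
               for (a, b), m in zip(samples, sample_means)]

mu = mean(sample_means)
e_s2 = mean(sample_vars)
var_mean = mean((m - mu) ** 2 for m in sample_means)
print(mu, e_s2, var_mean)   # → 4.0 21.0 10.5
```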
Theorem 1.5. If a population has N(μ, σ^2), then the mean X̄ of a sample of size n drawn from the population will have N(μ, σ^2/n), that is,

    \frac{X̄ − μ}{σ/\sqrt{n}} ∼ N(0, 1^2).

Proof. The moment-generating function of N(μ, σ^2) is

    E(e^{tX}) = \int_{−∞}^{∞} e^{tx} \frac{1}{\sqrt{2π}σ} e^{−(x−μ)^2/(2σ^2)} dx
              = \int_{−∞}^{∞} \frac{1}{\sqrt{2π}σ} exp( \frac{−2σ^2 tx + x^2 − 2μx + μ^2}{−2σ^2} ) dx
              = exp( \frac{−2μσ^2 t − σ^4 t^2}{−2σ^2} ) × \int_{−∞}^{∞} \frac{1}{\sqrt{2π}σ} exp( \frac{(x − μ − σ^2 t)^2}{−2σ^2} ) dx
              = exp( μt + \frac{1}{2} σ^2 t^2 )

because

    \frac{1}{\sqrt{2π}σ} exp( \frac{(x − μ − σ^2 t)^2}{−2σ^2} )

is the probability density function of N(μ + σ^2 t, σ^2).

Let {X_1, X_2, ..., X_n} be a random sample drawn from the normal population. Then the moment-generating function of X̄ = \frac{1}{n} \sum_{i=1}^{n} X_i is (by Theorem 1.1 and Theorem 1.2)

    M_X̄(t) = M_{\sum_{i=1}^{n} X_i}( \frac{t}{n} ) = [ M_{X_1}( \frac{t}{n} ) ]^n = [ exp( μ \frac{t}{n} + \frac{1}{2} σ^2 ( \frac{t}{n} )^2 ) ]^n
            = exp( μt + \frac{1}{2} \frac{σ^2}{n} t^2 ),

which is the moment-generating function of N(μ, σ^2/n). Therefore (by Theorem 1.3) X̄ has N(μ, σ^2/n).
Example 1.11. Suppose a population has N(μ, 20^2). Let n be the size of a sample.

Figure 1: Probability density functions of the sample mean for different sample sizes.

Example 1.12. A soft-drink vending machine is set so that the amount of drink dispensed is a normal random variable with a mean of 200 millilitres and a standard deviation of 15 millilitres. What is the probability that the average amount dispensed in a random sample of size 36 is at least 204 millilitres?

By Theorem 1.5, the sample mean X̄ has N(200, 15^2/36), that is, N(200, 2.5^2).

    P(X̄ ≥ 204) = P( \frac{X̄ − 200}{2.5} ≥ \frac{204 − 200}{2.5} ) = P(Z ≥ 1.6)
               = P(Z > 0) − P(0 < Z < 1.6) = 0.5 − 0.4452
               = 0.0548.

Remarks: When n = 1, X has N(200, 15^2) and

    P(X ≥ 204) = P( \frac{X − 200}{15} ≥ \frac{204 − 200}{15} ) = P(Z ≥ 0.27)
               = P(Z > 0) − P(0 < Z < 0.27) = 0.5 − 0.1064
               = 0.3936.
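The table value 0.4452 can be bypassed by computing the standard normal cdf directly from the error function (a sketch using only the standard library). The small discrepancy in the second probability comes from the table computation rounding z = 4/15 up to 0.27.

```python
import math

# Standard normal cdf via the error function: Φ(z) = (1 + erf(z/√2))/2.
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_36 = 1.0 - phi((204 - 200) / 2.5)   # sample of size 36: 1 - Φ(1.6)
p_1 = 1.0 - phi((204 - 200) / 15)     # single serving (n = 1): 1 - Φ(4/15)
print(round(p_36, 4), round(p_1, 4))  # → 0.0548 0.3949
```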
Theorem 1.6. Suppose X_1, X_2, ..., X_n are independent random variables, each having N(0, 1). Then, \sum_{i=1}^{n} X_i^2 has Gamma(n/2, 1/2), or χ^2(n).

Proof. The moment-generating function of X_1^2 is

    E(e^{tX_1^2}) = \int_{−∞}^{∞} e^{tx^2} \frac{1}{\sqrt{2π}} e^{−x^2/2} dx = \int_{−∞}^{∞} \frac{1}{\sqrt{2π}} exp( −\frac{(1 − 2t)x^2}{2} ) dx
                  = \frac{1}{\sqrt{1 − 2t}} \int_{−∞}^{∞} \frac{1}{\sqrt{2π(1 − 2t)^{−1}}} exp( −\frac{x^2}{2(1 − 2t)^{−1}} ) dx
                  = \frac{1}{\sqrt{1 − 2t}}

for t < 1/2 because

    \frac{1}{\sqrt{2π(1 − 2t)^{−1}}} exp( −\frac{x^2}{2(1 − 2t)^{−1}} )

is the probability density function of N(0, (1 − 2t)^{−1}).

By Theorem 1.2, the moment-generating function of \sum_{i=1}^{n} X_i^2 is

    ( \frac{1}{\sqrt{1 − 2t}} )^n = \frac{1}{(1 − 2t)^{n/2}} = \frac{(1/2)^{n/2}}{(1/2 − t)^{n/2}} for t < 1/2,

which is the moment-generating function of Gamma(n/2, 1/2) (refer to Example 1.4). Therefore, \sum_{i=1}^{n} X_i^2 has Gamma(n/2, 1/2) by Theorem 1.3.

Gamma(n/2, 1/2) is also called the chi-squared distribution with n degrees of freedom (denoted by χ^2(n) or χ^2_n); its mean is n and its variance is 2n.
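A small simulation can illustrate Theorem 1.6 and the stated moments (a sketch; the seed, n = 4, and the replication count are arbitrary choices):

```python
import random

# Sums of n squared N(0, 1) draws should have mean near n = 4 and
# variance near 2n = 8, as stated for the chi-squared distribution.

random.seed(1)
n, reps = 4, 50_000
sums = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]
emp_mean = sum(sums) / reps
emp_var = sum((s - emp_mean) ** 2 for s in sums) / reps
print(emp_mean, emp_var)   # close to 4 and 8
```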
Example 1.13. Figure 2 shows the probability density functions of some χ^2(df).

Figure 2: Probability density functions of χ^2(df).

Theorem 1.7. Suppose there are two independent random variables X and Y, with X having N(0, 1) and Y having χ^2(n). Then the following gives a probability density function of T = \frac{X}{\sqrt{Y/n}}:

    f_T(t) = \frac{Γ(\frac{n+1}{2})}{\sqrt{πn} Γ(\frac{n}{2})} ( 1 + \frac{t^2}{n} )^{−\frac{n+1}{2}},

for −∞ < t < ∞, where Γ(y) = \int_0^{∞} u^{y−1} e^{−u} du for y > 0 is the gamma function.

Remarks: We say that \frac{X}{\sqrt{Y/n}} has the Student's t-distribution with n degrees of freedom (denoted by t(n) or t_n).

Proof. Exercise question 16.

Example 1.14. Figure 3 shows the probability density functions of N(0, 1) and some t(n).

Figure 3: Probability density functions of N(0, 1) and t(n).

Theorem 1.8. Suppose there are two independent random variables U and V, where U has χ^2(m) and V has χ^2(n). Then the following gives a probability density function of W = \frac{U/m}{V/n}:

    f_W(w) = \frac{Γ(\frac{m+n}{2})}{Γ(\frac{m}{2}) Γ(\frac{n}{2})} ( \frac{m}{n} )^{\frac{m}{2}} w^{\frac{m}{2}−1} ( 1 + \frac{m}{n} w )^{−\frac{m+n}{2}} for w > 0, and f_W(w) = 0 otherwise.

Remarks:
• We say that \frac{U/m}{V/n} has the F-distribution with m numerator degrees of freedom and n denominator degrees of freedom (denoted by F(m, n) or F_{m,n}).
• If T has t(n), then T^2 has F(1, n).

Proof. Exercise question 18.

Example 1.15. Figure 4 shows the probability density functions of some F(m, n).

Lemma 1. Suppose X and Y are independent, X has χ2 (m) and (X + Y ) has


χ2 (n) with n > m, where m and n are positive integers. Then Y has χ2 (n − m).


Figure 4: Probability density functions of some F (m, n).

Proof. Assume that M_Y(t) exists. We make use of the following moment-generating functions:

    M_X(t) = \frac{(1/2)^{m/2}}{(1/2 − t)^{m/2}} for t < 1/2,

    M_{X+Y}(t) = \frac{(1/2)^{n/2}}{(1/2 − t)^{n/2}} for t < 1/2.

By Theorem 1.2, M_{X+Y}(t) = M_X(t) M_Y(t), so

    M_Y(t) = \frac{M_{X+Y}(t)}{M_X(t)} = \frac{(1/2)^{n/2}}{(1/2 − t)^{n/2}} / \frac{(1/2)^{m/2}}{(1/2 − t)^{m/2}} = \frac{(1/2)^{(n−m)/2}}{(1/2 − t)^{(n−m)/2}}

for t < 1/2. Therefore, Y has χ^2(n − m).


Lemma 2. Suppose x_1, x_2, ..., x_n and c are real numbers and x̄ = \sum_{i=1}^{n} x_i / n. Then,

    \sum_{i=1}^{n} (x_i − c)^2 = \sum_{i=1}^{n} (x_i − x̄)^2 + n(x̄ − c)^2.

Proof.

    \sum_{i=1}^{n} (x_i − c)^2 = \sum_{i=1}^{n} [(x_i − x̄) + (x̄ − c)]^2
    = \sum_{i=1}^{n} (x_i − x̄)^2 + 2(x̄ − c) \sum_{i=1}^{n} (x_i − x̄) + \sum_{i=1}^{n} (x̄ − c)^2
    = \sum_{i=1}^{n} (x_i − x̄)^2 + 0 + n(x̄ − c)^2.


Theorem 1.9. Suppose X_1, X_2, ..., X_n constitute a random sample from a population having N(μ, σ^2). Let X̄ and S^2 be the sample mean and the sample variance respectively. Then,

1. X̄ and S^2 are independent;

2. \frac{(n − 1)S^2}{σ^2} has χ^2(n − 1);

3. \frac{X̄ − μ}{S/\sqrt{n}} has t(n − 1).

Proof. 1. Exercise question 15 (a special case only).

2. By Lemma 2,

    \sum_{i=1}^{n} (X_i − μ)^2 = \sum_{i=1}^{n} (X_i − X̄)^2 + n(X̄ − μ)^2 = (n − 1)S^2 + n(X̄ − μ)^2.

Then,

    \sum_{i=1}^{n} ( \frac{X_i − μ}{σ} )^2 = \frac{(n − 1)S^2}{σ^2} + ( \frac{X̄ − μ}{σ/\sqrt{n}} )^2.

Since \frac{X_i − μ}{σ} has N(0, 1) for i = 1, 2, ..., n and \frac{X̄ − μ}{σ/\sqrt{n}} has N(0, 1), we know that \sum_{i=1}^{n} ( \frac{X_i − μ}{σ} )^2 has χ^2(n) and ( \frac{X̄ − μ}{σ/\sqrt{n}} )^2 has χ^2(1) by Theorem 1.6. Also, ( \frac{X̄ − μ}{σ/\sqrt{n}} )^2 and \frac{(n − 1)S^2}{σ^2} are independent by the first result. Therefore, \frac{(n − 1)S^2}{σ^2} has χ^2(n − 1) by Lemma 1.

3. Consider

    \frac{X̄ − μ}{S/\sqrt{n}} = \frac{(X̄ − μ)/(σ/\sqrt{n})}{S/σ} = \frac{(X̄ − μ)/(σ/\sqrt{n})}{\sqrt{ \frac{(n−1)S^2/σ^2}{n−1} }},

where \frac{X̄ − μ}{σ/\sqrt{n}} and \frac{(n − 1)S^2}{σ^2} are independent by the first result, \frac{X̄ − μ}{σ/\sqrt{n}} has N(0, 1) (by Theorem 1.5) and \frac{(n − 1)S^2}{σ^2} has χ^2(n − 1) (by (2)). Therefore, \frac{X̄ − μ}{S/\sqrt{n}} has t(n − 1) by Theorem 1.7.
S/ n

Theorem 1.10. Suppose X_1, X_2, ..., X_{n_1} constitute a random sample from a population having N(μ_1, σ_1^2), Y_1, Y_2, ..., Y_{n_2} constitute a random sample from a population having N(μ_2, σ_2^2), and the (n_1 + n_2) random variables X_1, ..., X_{n_1}, Y_1, ..., Y_{n_2} are independent. Let S_1^2 be the sample variance of the first sample and S_2^2 be that of the second sample. Then, \frac{S_1^2/σ_1^2}{S_2^2/σ_2^2} has F(n_1 − 1, n_2 − 1).

Proof. By Theorem 1.9, \frac{(n_1 − 1)S_1^2}{σ_1^2} has χ^2(n_1 − 1) and \frac{(n_2 − 1)S_2^2}{σ_2^2} has χ^2(n_2 − 1). Then, by Theorem 1.8,

    \frac{S_1^2/σ_1^2}{S_2^2/σ_2^2} = \frac{ [(n_1 − 1)S_1^2/σ_1^2] / (n_1 − 1) }{ [(n_2 − 1)S_2^2/σ_2^2] / (n_2 − 1) }

has F(n_1 − 1, n_2 − 1).


Lemma 3 (Convergence in distribution / weak convergence). If

1. M_{X_n}(t), the moment-generating function of X_n, exists for n = 1, 2, ...,

2. lim_{n→∞} M_{X_n}(t) exists and equals the moment-generating function of a random variable Y,

then

    lim_{n→∞} F_{X_n}(x) = F_Y(x) for all x at which F_Y(x) is continuous,

where F_{X_n}(x) is the distribution function of X_n, n = 1, 2, ..., and F_Y(x) is the distribution function of Y.

Theorem 1.11 (Central limit theorem). Suppose there is a population with mean μ and variance σ^2 > 0. Let X̄ be the mean of a random sample of size n drawn from the population. Then, for every real number x, P( \frac{X̄ − μ}{σ/\sqrt{n}} ≤ x ) (that is, the distribution function of \frac{X̄ − μ}{σ/\sqrt{n}}) tends to the distribution function of N(0, 1) as n → ∞.

Remarks: How large should n be before one can say that \frac{X̄ − μ}{σ/\sqrt{n}} is approximately N(0, 1)? The criterion varies from case to case. Some books say that a sample of size 30 is large enough, but this is actually not appropriate in many cases.
Proof. Let Y_i = \frac{X_i − μ}{σ}; then E(Y_i) = 0 and Var(Y_i) = 1, and suppose the moment-generating function M_{Y_i}(t) exists. A Taylor series expansion of M_{Y_i}(t) around t = 0 gives

    M_{Y_i}(t) = M_{Y_i}(0) + t M'_{Y_i}(0) + \frac{t^2}{2} M''_{Y_i}(ϵ), for some 0 ≤ ϵ ≤ t.

Let

    Z_n = \frac{\sqrt{n}(X̄ − μ)}{σ} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i;

then the moment-generating function of Z_n is given by

    M_{Z_n}(t) = \prod_{i=1}^{n} M_{Y_i}( \frac{t}{\sqrt{n}} ) = [ M_{Y_i}( \frac{t}{\sqrt{n}} ) ]^n = [ M_{Y_i}(0) + \frac{t}{\sqrt{n}} M'_{Y_i}(0) + \frac{(t/\sqrt{n})^2}{2} M''_{Y_i}(ϵ) ]^n
               = [ 1 + \frac{t}{\sqrt{n}} E(Y_i) + \frac{t^2}{2n} M''_{Y_i}(ϵ) ]^n
               = [ 1 + \frac{t^2}{2n} M''_{Y_i}(ϵ) ]^n.

As n → ∞, ϵ → 0 and M''_{Y_i}(ϵ) → M''_{Y_i}(0) = E(Y_i^2) = 1. Hence,

    lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} [ 1 + \frac{t^2}{2n} M''_{Y_i}(ϵ) ]^n = lim_{n→∞} ( 1 + \frac{t^2}{2n} )^n
                         = exp( \frac{t^2}{2} ) = exp( 0 × t + \frac{1}{2} × 1 × t^2 ),

which is the moment-generating function of a standard normal random variable. In other words, Z_n converges in distribution to Z, where Z ∼ N(0, 1).
Example 1.16. Consider a Bernoulli random variable X such that

    P(X = 1) = p and P(X = 0) = 1 − p.

Then μ = p and σ^2 = p(1 − p). Let {X_1, X_2, ..., X_n} be the random sample of size n drawn from the population of X. It is suggested that np and n(1 − p) should both be greater than 5 in order to have the following result:

    \frac{X̄ − μ}{σ/\sqrt{n}} = \frac{X̄ − p}{\sqrt{p(1 − p)/n}} = \frac{nX̄ − np}{\sqrt{np(1 − p)}}

has approximately N(0, 1). Note that now nX̄ = X_1 + X_2 + ... + X_n has the binomial distribution with parameters n and p. Therefore we may also say that the normal distribution can be used as an approximation to the binomial distribution when np and n(1 − p) are both greater than 5.
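A quick check of this rule of thumb (a sketch, not from the notes; n = 100, p = 0.5, and the interval 45..55 are illustrative, with np = n(1 − p) = 50 > 5): the normal approximation, with a continuity correction of 0.5, lands close to the exact binomial probability.

```python
import math

n, p = 100, 0.5
# Exact Bin(100, 0.5) probability of 45 <= nX̄ <= 55
exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(45, 56))

# Normal approximation N(np, np(1-p)) with continuity correction
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
approx = phi((55.5 - mu) / sigma) - phi((44.5 - mu) / sigma)

print(exact, approx)   # both are about 0.729
```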
Lemma 4 (Chebyshev's theorem, Chebyshev's inequality). For a random variable X with mean μ and variance σ^2,

    P(|X − μ| ≥ c) ≤ \frac{σ^2}{c^2} for any c > 0.

Proof. Suppose X is a continuous random variable and its probability density function is f(x). Then,

    σ^2 = E[(X − μ)^2] = \int_{−∞}^{∞} (x − μ)^2 f(x) dx
        = \int_{−∞}^{μ−c} (x − μ)^2 f(x) dx + \int_{μ−c}^{μ+c} (x − μ)^2 f(x) dx + \int_{μ+c}^{∞} (x − μ)^2 f(x) dx
        ≥ c^2 \int_{−∞}^{μ−c} f(x) dx + 0 + c^2 \int_{μ+c}^{∞} f(x) dx
        = c^2 P(|X − μ| ≥ c).

The proof for the discrete case is similar and is omitted.

Theorem 1.12 (Weak law of large numbers). Let X̄ be the mean of a random sample of size n from a population with mean μ and variance σ^2. Then,

    lim_{n→∞} P(|X̄ − μ| ≥ c) = 0 for any c > 0.

Proof. For any c > 0, we have, from Lemma 4, that

    P(|X̄ − μ| ≥ c) ≤ \frac{1}{c^2} Var(X̄) = \frac{σ^2}{nc^2}.

Hence,

    lim_{n→∞} P(|X̄ − μ| ≥ c) = 0 for any c > 0.

Remarks:

• We may state this result as "X̄ converges in probability to μ".

• If X_n converges in distribution to X, it means that the behaviour of X_n gets closer and closer to that of X. It does not guarantee that the observed value of X_n will often be close to the observed value of X.

• If X_n converges in probability to X, it means that the observed value of X_n is very likely to be arbitrarily close to the observed value of X.

• If X_n converges in probability to X, then X_n converges in distribution to X, but not the other way around.

Example 1.17. Consider a Bernoulli random variable X such that

    P(X = 1) = p and P(X = 0) = 1 − p.

Then μ = p, and therefore X̄ converges in probability to p, since

    lim_{n→∞} P(|X̄ − p| ≥ c) ≤ lim_{n→∞} \frac{1}{c^2} \frac{p(1 − p)}{n} = 0 for any c > 0.
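This convergence can be tabulated exactly (a sketch; p = 0.5 and c = 0.1 are illustrative choices): since nX̄ has Bin(n, p), the probability P(|X̄ − p| ≥ c) can be computed exactly and compared with the Chebyshev bound p(1 − p)/(nc^2) used in the argument.

```python
import math

p, c = 0.5, 0.1
for n in (25, 100, 400):
    # Exact tail probability via the Bin(n, p) pmf of nX̄
    exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
                for x in range(n + 1) if abs(x / n - p) >= c)
    bound = p * (1 - p) / (n * c**2)
    assert exact <= bound          # Chebyshev's inequality holds
    print(n, exact, bound)         # both shrink as n grows
```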

4 Order Statistics

Consider a random sample {X_1, X_2, ..., X_n} from a population having a probability density function. Note that

    P(X_i = X_j) = 0 for i ≠ j, i, j = 1, 2, ..., n.

Order statistics are important in non-parametric inference, which was developed to deal with violations of the normality assumption. Suppose that we arrange in ascending order the values of X_1, X_2, ..., X_n in a random sample of size n from a population with distribution function F(x); we let

    X_(1) < X_(2) < ... < X_(r) < ... < X_(n)

denote the order statistics of this sample. Here X_(r) is called the r-th order statistic from smallest to largest for r = 1, 2, ..., n, with X_(1) = min{X_1, X_2, ..., X_n} and X_(n) = max{X_1, X_2, ..., X_n}.

Example 1.18. Suppose {X1 , X2 , . . . , Xn } is a random sample from a population


with distribution function F (x). Find the distribution functions of

X(1) = min{X1 , X2 , . . . , Xn } and X(n) = max{X1 , X2 , . . . , Xn }.

Let U = min{X1 , X2 , . . . , Xn } and V = max{X1 , X2 , . . . , Xn }.

P (U ≤ u) = 1 − P (U > u) = 1 − P (X1 > u, X2 > u, . . . , Xn > u)


= 1 − P (X1 > u) P (X2 > u) · · · P (Xn > u)
= 1 − [1 − F (u)]n .
P (V ≤ v) = P (X1 ≤ v, X2 ≤ v, . . . , Xn ≤ v)
= P (X1 ≤ v) P (X2 ≤ v) · · · P (Xn ≤ v)
= [F (v)]n .
Example 1.19. Suppose that X_1, X_2 are i.i.d. Uni(0, 1); then the order statistic X_(2) is given by

    X_(2) = X_2 if X_1 ≤ X_2, and X_(2) = X_1 if X_1 > X_2.

Find the probability density function of X_(2).

The distribution function of Y = X_(2) is

    G(y) = P(X_(2) ≤ y)
         = P(X_2 ≤ y and X_1 ≤ X_2) + P(X_1 ≤ y and X_1 > X_2)
         = P(X_1 ≤ X_2 ≤ y) + P(X_2 < X_1 ≤ y)
         = \int_0^y \int_0^{x_2} 1 dx_1 dx_2 + \int_0^y \int_0^{x_1} 1 dx_2 dx_1
         = \int_0^y x_2 dx_2 + \int_0^y x_1 dx_1
         = y^2 for 0 < y < 1.

Hence, the probability density function of X_(2) is obtained as

    g(y) = G'(y) = 2y, for 0 < y < 1.
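A short simulation agrees with G(y) = y^2 (a sketch; the seed and replication count are arbitrary):

```python
import random

# Empirical cdf of X_(2) = max(X_1, X_2) for two independent Uni(0, 1)
# draws, compared with the theoretical G(y) = y^2 from Example 1.19.

random.seed(0)
reps = 200_000
maxima = [max(random.random(), random.random()) for _ in range(reps)]
for y in (0.25, 0.5, 0.75):
    emp = sum(m <= y for m in maxima) / reps
    print(y, emp, y**2)   # empirical cdf vs theoretical y^2
```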


Theorem 1.13. Suppose that {X_1, X_2, ..., X_n} is a random sample of size n from a continuous distribution with distribution function F(x) and probability density function f(x) for a < x < b. Let X_(1) < X_(2) < ... < X_(n) be the order statistics of the sample. Then for k = 1, 2, ..., n, the probability density function of the k-th order statistic X_(k) is given by

    g_k(y) = \frac{n!}{(k − 1)!(n − k)!} [F(y)]^{k−1} [1 − F(y)]^{n−k} f(y), for a < y < b.

In particular, the probability density function of X_(n) is

    g_n(y) = n[F(y)]^{n−1} f(y), for a < y < b;

and the probability density function of X_(1) is

    g_1(y) = n[1 − F(y)]^{n−1} f(y), for a < y < b.

Proof. According to the multinomial distribution, we have

    g_k(y) = lim_{h→0} \frac{P(y < X_(k) < y + h)}{h}
           = lim_{h→0} \frac{1}{h} \frac{n!}{(k − 1)!1!(n − k)!} [F(y)]^{k−1} [F(y + h) − F(y)] [1 − F(y)]^{n−k}
           = \frac{n!}{(k − 1)!1!(n − k)!} [F(y)]^{k−1} lim_{h→0} \frac{F(y + h) − F(y)}{h} [1 − F(y)]^{n−k}
           = \frac{n!}{(k − 1)!(n − k)!} [F(y)]^{k−1} [1 − F(y)]^{n−k} f(y).

Remarks: Using similar arguments as in Theorem 1.13, the joint probability density function of X_(i) and X_(j) (i < j) can be obtained as

    g_{i,j}(x, y) = \frac{n!}{(i − 1)!(j − i − 1)!(n − j)!} [F(x)]^{i−1} [F(y) − F(x)]^{j−i−1} [1 − F(y)]^{n−j} f(x) f(y),

where x ≤ y, and the joint probability density function of (X_(1), X_(2), ..., X_(n)) is

    g_{1,2,...,n}(x_1, x_2, ..., x_n) = n! f(x_1) f(x_2) ... f(x_n) if x_1 ≤ x_2 ≤ ... ≤ x_n, and 0 otherwise.
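As a numerical sanity check of Theorem 1.13 (a sketch; the Uni(0, 1) population with n = 5, k = 3 is an illustrative choice, giving F(y) = y and f(y) = 1), the order-statistic density g_3(y) = 30 y^2 (1 − y)^2 should integrate to 1:

```python
import math

n, k = 5, 3
coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))

def g(y):
    """g_k(y) for a Uni(0, 1) population: F(y) = y, f(y) = 1 on (0, 1)."""
    return coef * y**(k - 1) * (1 - y)**(n - k)

steps = 100_000                  # midpoint rule on (0, 1)
total = sum(g((i + 0.5) / steps) for i in range(steps)) / steps
print(coef, total)               # → 30.0 and approximately 1.0
```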

The University of Hong Kong
Department of Statistics and Actuarial Science
STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 1 Summary

1 Moment-generating Functions

(a) M_X(t) = E(e^{tX});
    M_X(t) = \sum_x e^{tx} f(x) if X is discrete, and M_X(t) = \int_{−∞}^{∞} e^{tx} f(x) dx if X is continuous.

(b) Properties

    (i) M_{aX+b}(t) = e^{bt} M_X(at)

    (ii) E(X^r) = \frac{d^r}{dt^r} M_X(t) |_{t=0}

    (iii) M_X(t) = M_Y(t) ⇔ f_X(x) = f_Y(y)

    (iv) If the X_i's are independent for i = 1, ..., n, then M_{\sum_{i=1}^{n} X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t)

2 Sampling Distribution of the Sample Mean

(a) Any population distribution with finite mean μ and variance σ^2:

    E(X̄) = μ,    Var(X̄) = \frac{σ^2}{n}.

    By the central limit theorem,

    \frac{X̄ − μ}{σ/\sqrt{n}} ∼ N(0, 1^2) approximately.

(b) The normal population with known variance σ^2:

    X̄ ∼ N( μ, \frac{σ^2}{n} ) ⇔ \frac{X̄ − μ}{σ/\sqrt{n}} ∼ N(0, 1^2).

(c) The normal population with unknown variance σ^2:

    \frac{X̄ − μ}{S/\sqrt{n}} ∼ t_{n−1}.


3 Sampling Distribution of the Sample Variance

(a) The normal population with unknown variance σ^2:

    \frac{n − 1}{σ^2} S^2 ∼ χ^2_{n−1}.

(b) Two normal populations with unknown variances σ_1^2 and σ_2^2:

    \frac{S_1^2/σ_1^2}{S_2^2/σ_2^2} = \frac{σ_2^2 S_1^2}{σ_1^2 S_2^2} ∼ F_{n_1−1, n_2−1}.

The University of Hong Kong
Department of Statistics and Actuarial Science
STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 1 Exercise

1. Two random variables X and Y are independent and have probabilities given
by

P (X = −1) = 1/4, P (X = 0) = 1/2, P (X = 2) = 1/4,


P (Y = 3) = 1/3, P (Y = 4) = 2/3.

Let Z = X + Y .

(a) Find the probability distribution of Z.


(b) Find the moment-generating functions of X, Y and Z.
(c) Compare the products of the moment-generating functions of X and Y ,
and the moment-generating function of Z.

2. The moments of X are given by

    E(X^r) = 0.8, r = 1, 2, ... .

(a) Find the moment-generating function of X.


(b) What is the probability distribution of X?

3. Let M_X(t) be the moment-generating function of the random variable X and let R_X(t) = ln M_X(t). Show that R'_X(0) = μ and R''_X(0) = σ^2. Use these results to find the mean and the variance of a random variable X having the moment-generating function

    M_X(t) = e^{4(e^t − 1)}.

4. Let X_1, X_2, ..., X_n be independent random variables. Show that

    M_{\sum_{i=1}^{n} X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t).

5. Given the moment-generating functions of two discrete random variables X and Y:

    M_X(t) = 0.05e^t + 0.15e^{2t} + 0.30e^{3t} + 0.30e^{4t} + 0.15e^{5t} + 0.05e^{6t},

    M_Y(t) = 0.05e^t + 0.20e^{2t} + 0.15e^{3t} + 0.45e^{4t} + 0.10e^{5t} + 0.05e^{6t}.

(a) Find the probability distributions of X and Y .


(b) Obtain the mean, the variance, and the skewness of X and Y .

6. Given the probability distributions of two discrete random variables, X and
   Y.

       x     −3    −2    −1     0     1     2     3
       f(x)  0.06  0.09  0.10  0.50  0.10  0.09  0.06

       y     −3    −2    −1     0     1     2     3
       f(y)  0.04  0.11  0.20  0.30  0.20  0.11  0.04

Obtain the mean, the variance, and the kurtosis of each of X and Y .

7. A random sample of size n = 100 is taken from a population with the mean
µ = 75 and the variance σ² = 256. Based on the central limit theorem, with
what probability can we assert that the value we obtain for X̄ will fall between
67 and 83?
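A sketch of the computation using only the standard library (the numbers come from the exercise): σ/√n = √(256/100) = 1.6, so 67 and 83 lie five standard errors either side of µ = 75.

```python
import math
from statistics import NormalDist

mu, sigma2, n = 75, 256, 100
se = math.sqrt(sigma2 / n)        # σ/√n = 1.6
z = NormalDist()                  # standard normal
prob = z.cdf((83 - mu) / se) - z.cdf((67 - mu) / se)
print(prob)   # essentially 1, since both bounds are 5 standard errors from µ
```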

8. A random sample of size n = 225 is to be taken from the exponential popula-


tion with λ = 0.25. Based on the central limit theorem, what is the probability
that the mean of the sample will exceed 4.5?

9. Independent random samples of size 400 are taken from each of two populations
   having equal means and the standard deviations σ₁ = 20 and σ₂ = 30. Using
   the central limit theorem, what can we assert with a probability of at least
   0.99 about the value we will get for X̄₁ − X̄₂?

10. The actual proportion of families in a certain city who own, rather than rent,
their home is 0.70. If 84 families in this city are interviewed at random and
their responses to the question of whether they own their home are looked
upon as values of independent random variables having identical Bernoulli
distributions with the parameter p = 0.70, with what probability can we assert
that the value we obtain for the sample proportion will fall between 0.64 and
0.76, using the central limit theorem?
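A sketch of the computation using only the standard library: the standard error of the sample proportion is √(0.70 · 0.30/84) = 0.05, so 0.64 and 0.76 lie 1.2 standard errors either side of p = 0.70.

```python
import math
from statistics import NormalDist

p, n = 0.70, 84
se = math.sqrt(p * (1 - p) / n)   # exactly 0.05 here
z = NormalDist()
prob = z.cdf((0.76 - p) / se) - z.cdf((0.64 - p) / se)
print(round(prob, 4))   # about 0.77
```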

11. Suppose that Z follows the standard normal distribution. Find the probability
density function of Y = Z 2 by the transformation technique.

12. If X has the standard normal distribution, use the distribution function tech-
nique to find the probability density function of Z = X 2 .
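Both exercises lead to the same conclusion, Z² ∼ χ²₁. One consequence that is easy to check by simulation (an illustration, not a proof; the seed and sample count are arbitrary) is P(Z² ≤ 1) = P(−1 ≤ Z ≤ 1) ≈ 0.6827.

```python
import random
from statistics import NormalDist

# Monte Carlo sketch: if Z² ∼ χ²₁ then P(Z² ≤ 1) = P(−1 ≤ Z ≤ 1).
random.seed(3)
reps = 50000
hits = sum(1 for _ in range(reps) if random.gauss(0.0, 1.0) ** 2 <= 1.0)
frac = hits / reps
target = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)   # about 0.6827
print(round(frac, 3), round(target, 3))
```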

13. Let X₁, X₂, . . . , Xₙ be independent random variables, where each Xᵢ follows
    the normal distribution Xᵢ ∼ N(µᵢ, σᵢ²) for i = 1, 2, . . . , n. Consider
    Y = X₁ + X₂ + . . . + Xₙ.

    (a) Show that Y has the normal distribution, where

            Y ∼ N( ∑_{i=1}^{n} µᵢ , ∑_{i=1}^{n} σᵢ² ).


    (b) Show that the mean Y/n has the normal distribution, where

            Y/n ∼ N( (1/n)∑_{i=1}^{n} µᵢ , (1/n²)∑_{i=1}^{n} σᵢ² ).

    (c) Further assume that X₁, X₂, . . . , Xₙ are identically distributed, that
        is, Xᵢ ∼ N(µ, σ²) for i = 1, 2, . . . , n. Using (a) and (b), show that
        Y and Y/n have the normal distributions where Y ∼ N(nµ, nσ²) and
        Y/n ∼ N(µ, σ²/n).
14. Suppose that each of the independent random variables Z₁, Z₂, . . . , Zₙ follows
    the standard normal distribution, where Zᵢ ∼ N(0, 1²) for i = 1, 2, . . . , n.
    Show that Y = Z₁² + Z₂² + . . . + Zₙ² has the chi-squared distribution with
    degrees of freedom equal to n. That is, Y ∼ χ²ₙ.
15. If X₁ and X₂ are independent random variables both having the standard
    normal distribution, show that

    (a) the joint probability density function of X₁ and X̄ is

            f(x₁, x̄) = (1/π) e^{−x̄²} e^{−(x₁−x̄)²}

        for −∞ < x₁ < ∞ and −∞ < x̄ < ∞;

    (b) the joint probability density function of U = X₁ − X̄ and X̄ is

            f(u, x̄) = (2/π) e^{−(x̄²+u²)}

        for u > 0 and −∞ < x̄ < ∞;

    (c) the sample variance is S² = 2(X₁ − X̄)² = 2U²;

    (d) the joint probability density function of X̄ and S² is

            f(x̄, s²) = (1/√(2π)) (s²)^{−1/2} e^{−s²/2} × (1/√π) e^{−x̄²}

        for s² > 0 and −∞ < x̄ < ∞.
16. Suppose that Y and Z are independent random variables, that Y follows the
    chi-squared distribution with n degrees of freedom, and that Z follows the
    standard normal distribution. Show that the distribution of

        X = Z/√(Y/n)

    is the Student's t-distribution, where the probability density function of X
    is given by

        f_X(x) = Γ((n+1)/2) / (√(πn) Γ(n/2)) · (1 + x²/n)^{−(n+1)/2},
            for −∞ < x < ∞.


    Hints: The probability density functions of Y and Z are, respectively,

        f_Y(y) = 1/(Γ(n/2) 2^{n/2}) · y^{n/2−1} e^{−y/2},  for y > 0;

        f_Z(z) = (1/√(2π)) e^{−z²/2},  for −∞ < z < ∞.

    The gamma function given below is also involved:

        Γ(t) = ∫₀^∞ x^{t−1} e^{−x} dx.

17. The claim that the variance of a normal population is σ² = 4 is to be rejected
    if the variance of a random sample of size 9 exceeds 7.7535. What is the
    probability that this claim will be rejected even though σ² = 4?
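A sketch of the computation (it relies on result 3(a) of the summary): under σ² = 4 the statistic (n − 1)S²/σ² = 8S²/4 follows χ²₈, and for an even number of degrees of freedom the upper-tail probability has the closed form P(χ²_{2k} > x) = e^{−x/2} ∑_{j=0}^{k−1} (x/2)^j/j!.

```python
import math

def chi2_sf_even(x, df):
    # Upper-tail probability of the chi-squared distribution, valid only
    # for even df: e^{-x/2} * sum over j < df/2 of (x/2)^j / j!
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

n, sigma2, cutoff = 9, 4.0, 7.7535
x = (n - 1) * cutoff / sigma2          # observed chi-squared value, 15.507
prob = chi2_sf_even(x, n - 1)
print(round(prob, 4))   # about 0.05: the claim is wrongly rejected 5% of the time
```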

18. If Y and Z are independent random variables having the chi-squared
    distributions with m and n degrees of freedom, respectively, show that the
    random variable

        X = (Y/m)/(Z/n)

    has the F distribution with degrees of freedom m and n.

    Hints: The probability density function of Y is

        f_Y(y) = 1/(Γ(m/2) 2^{m/2}) · y^{m/2−1} e^{−y/2},  for y > 0.

19. If S₁² and S₂² are the variances of independent random samples of sizes n₁
    and n₂ from the normal populations with the variances σ₁² and σ₂²,
    respectively, show that

        X = (S₁²/σ₁²)/(S₂²/σ₂²)

    has the F distribution with (n₁ − 1) and (n₂ − 1) degrees of freedom.

20. If S₁ and S₂ are the standard deviations of independent random samples of
    sizes n₁ = 61 and n₂ = 31 from normal populations with σ₁² = 12 and
    σ₂² = 18, find P(S₁²/S₂² > 1.16).
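By Exercise 19, (S₁²/σ₁²)/(S₂²/σ₂²) ∼ F₆₀,₃₀ here, so P(S₁²/S₂² > 1.16) = P(F₆₀,₃₀ > 1.16 · 18/12) = P(F₆₀,₃₀ > 1.74). A Monte Carlo sketch (illustrative; an exact answer would use F tables, and the seed and replication count are arbitrary) estimates this tail by simulating the F ratio from sums of squared normals.

```python
import random

# Simulate F(60, 30) as a ratio of scaled chi-squared variables, each
# built as a mean of squared standard normals (see Exercises 14 and 18).
random.seed(4)
m, n, reps = 60, 30, 20000
threshold = 1.16 * 18.0 / 12.0          # 1.74
count = 0
for _ in range(reps):
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m)) / m
    z = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
    if y / z > threshold:
        count += 1
frac = count / reps
print(frac)   # close to 0.05, since 1.74 is near the upper 5% point of F(60, 30)
```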

