Stat2602b Topic 1
1 Moment-generating Functions
Definition 1.1. The moment-generating function (MGF) of a random variable $X$
is defined as
$$M_X(t) = E\left(e^{tX}\right),$$
provided that the right-hand side exists for $t \in \mathbb{R}$. We may also call $M_X(t)$ the
moment-generating function of the distribution followed by $X$.
Remarks:
1. $M_X(0) = E(e^0) = 1$.
2. $\left.\dfrac{d^r}{dt^r} M_X(t)\right|_{t=0} = E(X^r)$ for $r = 1, 2, \ldots$.
STAT2602B TST23/24 Topic 1
By definition,
$$M_X(t) = 1 + tE(X) + \frac{1}{2!}t^2E(X^2) + \frac{1}{3!}t^3E(X^3) + \cdots + \frac{1}{r!}t^rE(X^r) + \cdots.$$
Differentiating with respect to $t$,
$$\frac{d}{dt}M_X(t) = E(X) + tE(X^2) + \frac{1}{2!}t^2E(X^3) + \cdots + \frac{1}{(r-1)!}t^{r-1}E(X^r) + \cdots,$$
$$\frac{d^2}{dt^2}M_X(t) = E(X^2) + tE(X^3) + \cdots + \frac{1}{(r-2)!}t^{r-2}E(X^r) + \cdots.$$
In general,
$$\frac{d^r}{dt^r}M_X(t) = E(X^r) + \text{sum of terms in } t.$$
Substituting $t = 0$, we obtain, for example,
$$\left.\frac{d}{dt}M_X(t)\right|_{t=0} = E(X), \qquad \left.\frac{d^2}{dt^2}M_X(t)\right|_{t=0} = E(X^2).$$
In general, we have
$$\left.\frac{d^r}{dt^r}M_X(t)\right|_{t=0} = E(X^r).$$
3. By definition,
Find the moment-generating function of $X$ and hence find the mean and the
variance of $X$.
Proof. Suppose all $X_i$'s are continuous random variables and their marginal
probability density functions are $f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$, respectively. Then,
$$\begin{aligned}
M_Y(t) &= E\left(e^{(X_1+X_2+\cdots+X_n)t}\right) \\
&= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{(x_1+x_2+\cdots+x_n)t} f_1(x_1)f_2(x_2)\cdots f_n(x_n)\,dx_1\,dx_2\cdots dx_n \\
&= \int_{-\infty}^{\infty} e^{x_1 t}f_1(x_1)\,dx_1 \int_{-\infty}^{\infty} e^{x_2 t}f_2(x_2)\,dx_2 \cdots \int_{-\infty}^{\infty} e^{x_n t}f_n(x_n)\,dx_n \\
&= M_{X_1}(t)M_{X_2}(t)\cdots M_{X_n}(t).
\end{aligned}$$
The proof for the discrete case is similar and is omitted.
Example 1.3. Find the probability distribution of the sum of $n$ independent random
variables $X_1, X_2, \ldots, X_n$ having Poisson distributions with means
$\lambda_1, \lambda_2, \ldots, \lambda_n$ respectively.
Let $Y = X_1 + X_2 + \cdots + X_n$. Then,
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} e^{\lambda_i(e^t-1)} = e^{(e^t-1)\sum_{i=1}^{n}\lambda_i},$$
which is the moment-generating function of a Poisson random variable with mean
$\sum_{i=1}^{n}\lambda_i$. Therefore, by Theorem 1.3, $Y$ has the Poisson distribution with mean
$\sum_{i=1}^{n}\lambda_i$.
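The conclusion of Example 1.3 can be checked numerically for two Poisson variables. The sketch below is only an illustration, not part of the notes: the means 1.5 and 2.5 and the helper names are assumptions chosen for the demonstration. It convolves the two probability mass functions directly and compares the result with the Poisson probability mass function whose mean is the sum of the two means.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pmf_of_sum(k, lam1, lam2):
    """P(X1 + X2 = k) by direct convolution of two independent Poisson pmfs."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

# Illustrative means (assumed values, not taken from the notes).
lam1, lam2 = 1.5, 2.5

# Largest absolute gap between the convolution and Poisson(lam1 + lam2) over k = 0..29.
max_gap = max(
    abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2))
    for k in range(30)
)
print(max_gap)  # expected to be at floating-point rounding level
```

The gap is pure rounding error, which agrees with the moment-generating function argument above.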
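Example 1.4. For positive numbers $\alpha$ and $\lambda$, find the moment-generating function
of a gamma distribution $\mathrm{Gamma}(\alpha, \lambda)$ of which the probability density function is
$$f(x) = \begin{cases} \dfrac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}, & \text{for } x > 0; \\ 0, & \text{otherwise.} \end{cases}$$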
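As a numerical companion to Example 1.4, the sketch below compares a midpoint-rule approximation of $E(e^{tX})$ with the standard closed form $\left(\frac{\lambda}{\lambda - t}\right)^{\alpha}$ for $t < \lambda$. The closed form is quoted here as a well-known result rather than derived, and the parameter values, integration limit and function names are assumptions made for this illustration.

```python
import math

def gamma_pdf(x, alpha, lam):
    """Density of Gamma(alpha, lam) at x > 0 (rate parametrisation, as in Example 1.4)."""
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)

def mgf_numeric(t, alpha, lam, upper=40.0, steps=200_000):
    """Midpoint-rule approximation of the integral of e^{tx} f(x) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.exp(t * x) * gamma_pdf(x, alpha, lam)
    return total * h

def mgf_closed_form(t, alpha, lam):
    """Standard closed form (lam / (lam - t))^alpha, valid for t < lam."""
    return (lam / (lam - t)) ** alpha

# Illustrative values (assumed): alpha = 2, lam = 3, t = 1 < lam.
alpha, lam, t = 2.0, 3.0, 1.0
numeric = mgf_numeric(t, alpha, lam)
closed = mgf_closed_form(t, alpha, lam)
print(numeric, closed)
```

The upper limit 40 truncates the integral; for these values the neglected tail is far below the tolerance of the comparison.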
Definition 1.7. The sample standard deviation s is the non-negative square root
of the sample variance.
Remarks: The standard deviation is measured in the same units as the data, so it is
easier to compare with the mean than the variance is.
Example 1.6. Suppose we want to know the mean income of all households in a
city. The population is therefore the incomes of all households in the city. We then
randomly select a household from the city and record its income. In this random
experiment, the sample space is the city and the sampling units are the households in
the city. The income of the randomly selected household is the outcome of the
experiment. Therefore we say that the income of a randomly selected household
is a random variable. For convenience, we define the distribution of the
population as the distribution of this random variable.
Definition 1.8. Simple random sampling is a process in which all samples of
the same size are equally likely to be chosen. A simple random sample is a sample
selected using a simple random sampling plan.
Example 1.7. To conduct random sampling, assign a number to each element of the
chosen population (or use already given numbers). Then, randomly select numbers
using a random number table or a software package.
Remarks: In this course, sampling means simple random sampling. Two common
sampling schemes are
1. sampling with replacement: replacing (putting back) each sampled element
before selecting subsequent elements;
2. sampling without replacement: not replacing any sampled element before
selecting subsequent elements.
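The two sampling schemes can be tried with a software package, as Example 1.7 suggests. The following is a minimal sketch using Python's standard `random` module; the household labels and the seed are hypothetical and exist only for this illustration.

```python
import random

# Hypothetical population of labelled sampling units.
population = ["H01", "H02", "H03", "H04", "H05", "H06"]

# A seeded generator so that the illustration is reproducible (seed chosen arbitrarily).
rng = random.Random(2602)

# Sampling with replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=3)

# Sampling without replacement: every drawn unit is distinct.
without_replacement = rng.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

`choices` draws each element independently from the full population, while `sample` never returns the same position twice, which mirrors the two schemes described above.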
Example 1.8. Suppose we have a population: $-1$, $1$, $5$ and $11$. Then,
$$\mu = \frac{-1 + 1 + 5 + 11}{4} = 4,$$
$$\sigma^2 = \frac{(-1-4)^2 + (1-4)^2 + (5-4)^2 + (11-4)^2}{4} = 21.$$
The following table shows all possible samples and their means and variances for
sampling without replacement when the sample size is 2.
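The samples of size 2 from this population can also be enumerated by machine. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, samples without replacement are listed as 6 unordered pairs, and samples with replacement as 16 ordered pairs, each treated as equally likely. It averages the sample means and sample variances over all samples.

```python
from fractions import Fraction
from itertools import combinations, product

population = [-1, 1, 5, 11]
n = 2  # sample size

def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def average(values):
    """Average over equally likely samples."""
    values = list(values)
    return sum(values) / len(values)

without_replacement = list(combinations(population, n))  # 6 unordered samples
with_replacement = list(product(population, repeat=n))   # 16 ordered samples

mu = mean(population)
results = {
    "E(mean), without": average(mean(s) for s in without_replacement),
    "E(mean), with": average(mean(s) for s in with_replacement),
    "E(S^2), without": average(sample_variance(s) for s in without_replacement),
    "E(S^2), with": average(sample_variance(s) for s in with_replacement),
    "Var(mean), with": average((mean(s) - mu) ** 2 for s in with_replacement),
}
for name, value in results.items():
    print(name, value)
```

Exact fractions are used so that the averages can be compared with $\mu = 4$ and $\sigma^2 = 21$ without rounding.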
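By symmetry, for $i = 1, 2, \ldots, n$,
$$\begin{aligned}
E\left[(X_i - \bar{X})^2\right] &= E\left[(X_1 - \bar{X})^2\right] = \mathrm{Var}(X_1 - \bar{X}) = \mathrm{Var}\left(X_1 - \frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
&= \mathrm{Var}\left(\frac{(n-1)X_1}{n} - \sum_{i=2}^{n}\frac{X_i}{n}\right) = \frac{(n-1)^2}{n^2}\sigma^2 + (n-1)\frac{\sigma^2}{n^2} \\
&= \frac{n-1}{n}\sigma^2.
\end{aligned}$$
Therefore,
$$\begin{aligned}
E(S^2) &= E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1}\sum_{i=1}^{n}E\left[(X_i - \bar{X})^2\right] \\
&= \frac{1}{n-1}\cdot n\cdot\frac{n-1}{n}\sigma^2 \\
&= \sigma^2.
\end{aligned}$$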
Remarks: For convenience, $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$ may be written as $\mu_{\bar{X}}$ and
$\sigma^2_{\bar{X}}$, respectively. $\sigma_{\bar{X}}$, the square root of $\sigma^2_{\bar{X}}$, is called the standard error
of the sample mean.
Example 1.10. Recall Example 1.8:
$E(\bar{X}) = 4 = \mu$ for sampling with or without replacement,
$E(S^2) = 28 \neq \sigma^2$ for sampling without replacement,
$E(S^2) = 21 = \sigma^2$ for sampling with replacement.
The variance of the sample mean is
$$\mathrm{Var}(\bar{X}) = \frac{(-1-4)^2 + (0-4)^2 + (2-4)^2 + \cdots + (8-4)^2 + (11-4)^2}{16} = 10.5 = \frac{21}{2} = \frac{\sigma^2}{n}$$
for sampling with replacement.
Theorem 1.5. If a population has $N(\mu, \sigma^2)$, then the mean $\bar{X}$ of a sample of size
$n$ drawn from the population will have $N\left(\mu, \frac{\sigma^2}{n}\right)$, that is,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1^2).$$
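Proof. The moment-generating function of $N(\mu, \sigma^2)$ is
$$\begin{aligned}
E(e^{tX}) &= \int_{-\infty}^{\infty} e^{tx}\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \\
&= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[\frac{-2\sigma^2tx + x^2 - 2\mu x + \mu^2}{-2\sigma^2}\right]dx \\
&= \exp\left[\frac{-2\mu\sigma^2t - \sigma^4t^2}{-2\sigma^2}\right]\times\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[\frac{(x-\mu-\sigma^2t)^2}{-2\sigma^2}\right]dx \\
&= \exp\left(\mu t + \frac{1}{2}\sigma^2t^2\right)
\end{aligned}$$
because
$$\frac{1}{\sqrt{2\pi}\sigma}\exp\left[\frac{(x-\mu-\sigma^2t)^2}{-2\sigma^2}\right]$$
is the probability density function of $N(\mu + \sigma^2t, \sigma^2)$.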
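Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample drawn from the normal population.
Then the moment-generating function of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i$ is (by Theorem 1.1 and
Theorem 1.2)
$$\begin{aligned}
M_{\bar{X}}(t) &= M_{\sum_{i=1}^{n}X_i}\left(\frac{t}{n}\right) = \left[M_{X_1}\left(\frac{t}{n}\right)\right]^n = \left[\exp\left(\mu\frac{t}{n} + \frac{1}{2}\sigma^2\left(\frac{t}{n}\right)^2\right)\right]^n \\
&= \exp\left(\mu t + \frac{1}{2}\frac{\sigma^2}{n}t^2\right),
\end{aligned}$$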
which is the moment-generating function of $N\left(\mu, \frac{\sigma^2}{n}\right)$. Therefore (by Theorem 1.3)
$\bar{X}$ has $N\left(\mu, \frac{\sigma^2}{n}\right)$.
Example 1.11. Suppose a population has N(µ, 202 ). Let n be the size of a sample.
Figure 1: Probability density functions of the sample mean for different sample sizes.
Example 1.12. A soft-drink vending machine is set so that the amount of drink
dispensed is a normal random variable with a mean of 200 millilitres and a
standard deviation of 15 millilitres. What is the probability that the average amount
dispensed in a random sample of size 36 is at least 204 millilitres?
By Theorem 1.5, the sample mean $\bar{X}$ has $N\left(200, \frac{15^2}{36}\right)$, that is, $N(200, 2.5^2)$.
$$\begin{aligned}
P(\bar{X} \ge 204) &= P\left(\frac{\bar{X} - 200}{2.5} \ge \frac{204 - 200}{2.5}\right) = P(Z \ge 1.6) \\
&= P(Z > 0) - P(0 < Z < 1.6) = 0.5 - 0.4452 \\
&= 0.0548.
\end{aligned}$$
Remarks: When $n = 1$, $\bar{X}$ has $N(200, 15^2)$ and
$$\begin{aligned}
P(\bar{X} \ge 204) &= P\left(\frac{\bar{X} - 200}{15} \ge \frac{204 - 200}{15}\right) = P(Z \ge 0.27) \\
&= P(Z > 0) - P(0 < Z < 0.27) = 0.5 - 0.1064 \\
&= 0.3936.
\end{aligned}$$
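The two probabilities in Example 1.12 can be reproduced without tables. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name is a hypothetical helper introduced for the illustration. For $n = 1$ it uses the unrounded $z = 4/15$, so the result differs slightly from the table-based value, which rounds $z$ to 0.27.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 200.0, 15.0  # population mean and standard deviation from Example 1.12

def prob_mean_at_least(threshold, n):
    """P(sample mean >= threshold) when the mean of n draws has N(mu, sigma^2 / n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return 1.0 - sampling_dist.cdf(threshold)

p_n36 = prob_mean_at_least(204.0, 36)  # standard error 2.5, z = 1.6
p_n1 = prob_mean_at_least(204.0, 1)    # standard error 15, z = 4/15

print(round(p_n36, 4))
print(round(p_n1, 4))
```

The comparison shows how much the standard error $\sigma/\sqrt{n}$ shrinks the upper-tail probability as the sample size grows from 1 to 36.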
Theorem 1.6. Suppose $X_1, X_2, \ldots, X_n$ are independent random variables each
having $N(0, 1)$. Then, $\sum_{i=1}^{n}X_i^2$ has $\mathrm{Gamma}(n/2, 1/2)$ or $\chi^2(n)$.
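Proof. The moment-generating function of $X_1^2$ is
$$\begin{aligned}
E\left(e^{tX_1^2}\right) &= \int_{-\infty}^{\infty} e^{tx^2}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(1-2t)x^2}{2}\right]dx \\
&= \frac{1}{\sqrt{1-2t}}\times\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi(1-2t)^{-1}}}\exp\left[-\frac{x^2}{2(1-2t)^{-1}}\right]dx \\
&= \frac{1}{\sqrt{1-2t}}
\end{aligned}$$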
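Since $X_1, X_2, \ldots, X_n$ are independent with the same distribution, the
moment-generating function of $\sum_{i=1}^{n}X_i^2$ is the product
$\left[(1-2t)^{-1/2}\right]^n = (1-2t)^{-n/2}$ for $t < \frac{1}{2}$,
which is the moment-generating function of $\mathrm{Gamma}(n/2, 1/2)$ (refer to Example
1.4). Therefore, $\sum_{i=1}^{n}X_i^2$ has $\mathrm{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right)$ by Theorem 1.3.
$\mathrm{Gamma}(n/2, 1/2)$ is also called the chi-squared distribution with $n$ degrees of
freedom (denoted by $\chi^2(n)$ or $\chi^2_n$); its mean is $n$ and its variance is $2n$.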
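A quick simulation gives a sanity check on the stated mean $n$ and variance $2n$. The sketch below is only an illustration: the degrees of freedom, number of draws, seed and function name are all assumed values, and the simulated figures will be close to, not equal to, the theoretical ones.

```python
import random
from statistics import fmean, variance

def simulate_chi_squared(n, draws, seed=2602):
    """Simulate `draws` values of X_1^2 + ... + X_n^2 with X_i independent N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(draws)]

n = 5  # degrees of freedom, chosen for illustration
samples = simulate_chi_squared(n, draws=20_000)

sim_mean = fmean(samples)    # should be near n
sim_var = variance(samples)  # should be near 2n

print(sim_mean, sim_var)
```

With 20,000 draws the Monte Carlo error of the mean is roughly $\sqrt{2n/20000}$, so agreement to about one decimal place is what can be expected.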
Example 1.13. Figure 2 shows the probability density functions of some χ2 (df ).
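Theorem 1.7. Suppose there are two independent random variables $X$ and $Y$,
with $X$ having $N(0, 1)$ and $Y$ having $\chi^2(n)$. Then the following gives a probability
density function of $T = \dfrac{X}{\sqrt{Y/n}}$:
$$f_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}},$$
for $-\infty < t < \infty$, where $\Gamma(y) = \int_0^{\infty}u^{y-1}e^{-u}\,du$ for $y > 0$ is the gamma function.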
Remarks: We say that $\dfrac{X}{\sqrt{Y/n}}$ has the Student's t-distribution with $n$ degrees of
freedom (denoted by $t(n)$ or $t_n$).
Proof. Exercise question 16.
Example 1.14. Figure 3 shows the probability density functions of N(0, 1) and
some t (n).
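Theorem 1.8. Suppose there are two independent random variables $U$ and $V$, where $U$
has $\chi^2(m)$ and $V$ has $\chi^2(n)$. Then the following gives a probability density function
of $W = \dfrac{U/m}{V/n}$:
$$f_W(w) = \begin{cases} \dfrac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\dfrac{m}{n}\right)^{\frac{m}{2}}w^{\frac{m}{2}-1}\left(1 + \dfrac{m}{n}w\right)^{-\frac{m+n}{2}}, & \text{for } w > 0; \\ 0, & \text{otherwise.} \end{cases}$$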
Remarks:
We say that $\dfrac{U/m}{V/n}$ has the F-distribution with $m$ numerator degrees of freedom
and $n$ denominator degrees of freedom (denoted by $F(m, n)$ or $F_{m,n}$).
Example 1.15. Figure 4 shows the probability density functions of some $F(m, n)$.
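Proof. Assume that $M_Y(t)$ exists. We make use of the following moment-generating
functions:
$$M_X(t) = \frac{\left(\frac{1}{2}\right)^{m/2}}{\left(\frac{1}{2} - t\right)^{m/2}} \quad \text{for } t < \frac{1}{2},$$
$$M_{X+Y}(t) = \frac{\left(\frac{1}{2}\right)^{n/2}}{\left(\frac{1}{2} - t\right)^{n/2}} \quad \text{for } t < \frac{1}{2}.$$
By Theorem 1.2,
$$M_Y(t) = \frac{M_{X+Y}(t)}{M_X(t)} = \frac{\left(\frac{1}{2}\right)^{n/2}\Big/\left(\frac{1}{2} - t\right)^{n/2}}{\left(\frac{1}{2}\right)^{m/2}\Big/\left(\frac{1}{2} - t\right)^{m/2}} = \frac{\left(\frac{1}{2}\right)^{(n-m)/2}}{\left(\frac{1}{2} - t\right)^{(n-m)/2}},$$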
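Proof.
$$\begin{aligned}
\sum_{i=1}^{n}(x_i - c)^2 &= \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - c)\right]^2 \\
&= \sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - c)\sum_{i=1}^{n}(x_i - \bar{x}) + \sum_{i=1}^{n}(\bar{x} - c)^2 \\
&= \sum_{i=1}^{n}(x_i - \bar{x})^2 + 0 + n(\bar{x} - c)^2.
\end{aligned}$$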
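The decomposition can be checked on a small data set. The numbers and the helper name below are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
from fractions import Fraction

def sum_sq_about(xs, c):
    """Sum of squared deviations of the values xs about the point c."""
    return sum((x - c) ** 2 for x in xs)

xs = [2, 4, 9]  # illustrative data (assumed)
c = 1           # illustrative reference point (assumed)
x_bar = Fraction(sum(xs), len(xs))  # sample mean, here 5

lhs = sum_sq_about(xs, c)                                   # 1 + 9 + 64
rhs = sum_sq_about(xs, x_bar) + len(xs) * (x_bar - c) ** 2  # (9 + 1 + 16) + 3 * 16

print(lhs, rhs)
```

The cross term vanishes because the deviations about the sample mean sum to zero, which is the only property of $\bar{x}$ that the proof uses.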
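2. By Lemma 2,
$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2 = (n-1)S^2 + n(\bar{X} - \mu)^2.$$
Then,
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2.$$
Since $\dfrac{X_i - \mu}{\sigma}$ has $N(0, 1)$ for $i = 1, 2, \ldots, n$ and $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has $N(0, 1)$, we know
that $\displaystyle\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$ has $\chi^2(n)$ and $\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ has $\chi^2(1)$ by Theorem 1.6.
Also, $\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ and $\dfrac{(n-1)S^2}{\sigma^2}$ are independent by the first result. Therefore,
$\dfrac{(n-1)S^2}{\sigma^2}$ has $\chi^2(n-1)$ by Lemma 1.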
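3. Consider
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{S/\sigma} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}},$$
where $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $\dfrac{(n-1)S^2}{\sigma^2}$ are independent by the first result, $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has
$N(0, 1)$ (by Theorem 1.5) and $\dfrac{(n-1)S^2}{\sigma^2}$ has $\chi^2(n-1)$ (by (2)). Therefore,
$\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has $t(n-1)$ by Theorem 1.7.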
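are independent. Let $S_1^2$ be the sample variance of the first sample and $S_2^2$ be that
of the second sample. Then, $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ has $F(n_1 - 1, n_2 - 1)$.
Proof. By Theorem 1.9, $\dfrac{(n_1 - 1)S_1^2}{\sigma_1^2}$ has $\chi^2(n_1 - 1)$ and $\dfrac{(n_2 - 1)S_2^2}{\sigma_2^2}$ has $\chi^2(n_2 - 1)$.
Then, by Theorem 1.8,
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\dfrac{(n_1 - 1)S_1^2}{\sigma_1^2}\Big/(n_1 - 1)}{\dfrac{(n_2 - 1)S_2^2}{\sigma_2^2}\Big/(n_2 - 1)}$$
has $F(n_1 - 1, n_2 - 1)$.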
2. $\lim_{n\to\infty} M_{X_n}(t)$ exists and equals the moment-generating function of a random
variable $Y$,
then
$$\lim_{n\to\infty} F_{X_n}(x) = F_Y(x) \quad \text{for all } x \text{ at which } F_Y(x) \text{ is continuous},$$
Theorem 1.11. Suppose there is a population with mean $\mu$ and variance $\sigma^2 > 0$.
Let $\bar{X}$ be the mean of a random sample of size $n$ drawn from the population. Then,
for every real number $x$, $P\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le x\right)$ (that is, the distribution function of $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$)
tends to the distribution function of $N(0, 1)$ when $n \to \infty$.
Remarks: How large should $n$ be so that one can say that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has approximately
$N(0, 1)$? The criterion varies from case to case. Some books say that a sample of
size 30 is large enough, but this is actually not appropriate for many cases.
Proof. Let $Y_i = \dfrac{X_i - \mu}{\sigma}$; then $E(Y_i) = 0$ and $\mathrm{Var}(Y_i) = 1$. Suppose the
moment-generating function $M_{Y_i}(t)$ exists. A Taylor series expansion of $M_{Y_i}(t)$ around $t = 0$
gives:
$$M_{Y_i}(t) = M_{Y_i}(0) + tM'_{Y_i}(0) + \frac{t^2}{2}M''_{Y_i}(\epsilon), \quad \text{for some } 0 \le \epsilon \le t.$$
Let
$$Z_n = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}Y_i,$$
has approximately $N(0, 1)$. Note that now $n\bar{X} = X_1 + X_2 + \cdots + X_n$ has the binomial
distribution with parameters $n$ and $p$. Therefore we may also say that the normal
distribution can be used as an approximation to the binomial distribution when $np$
and $n(1 - p)$ are both greater than 5.
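To see how the normal approximation to the binomial behaves in practice, the sketch below compares an exact binomial probability with its central-limit approximation. The values $n = 84$, $p = 0.7$ and the interval $[0.64, 0.76]$ for the sample proportion are illustrative choices, and no continuity correction is applied, so the two figures are expected to be close but not identical.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 84, 0.7  # illustrative values; n*p = 58.8 and n*(1-p) = 25.2 both exceed 5

# Counts k whose proportion k/n lies in [0.64, 0.76]: 54/84 is about 0.643 and 63/84 = 0.75.
lo, hi = 54, 63

# Exact probability from the binomial probability mass function.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# Normal approximation: the sample proportion is roughly N(p, p(1-p)/n).
std_normal = NormalDist()
se = sqrt(p * (1 - p) / n)
approx = std_normal.cdf((0.76 - p) / se) - std_normal.cdf((0.64 - p) / se)

print(round(exact, 4), round(approx, 4))
```

Here the standard error is 0.05, so the interval corresponds to $|Z| \le 1.2$ on the standard normal scale.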
Lemma 4 (Chebyshev's theorem, Chebyshev's inequality). For a random variable
$X$ with mean $\mu$ and variance $\sigma^2$,
$$P(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2} \quad \text{for any } c > 0.$$
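Proof. Suppose $X$ is a continuous random variable and its probability density
function is $f(x)$. Then,
$$\begin{aligned}
\sigma^2 = E\left[(X - \mu)^2\right] &= \int_{-\infty}^{\infty}(x - \mu)^2f(x)\,dx \\
&= \int_{-\infty}^{\mu-c}(x - \mu)^2f(x)\,dx + \int_{\mu-c}^{\mu+c}(x - \mu)^2f(x)\,dx + \int_{\mu+c}^{\infty}(x - \mu)^2f(x)\,dx \\
&\ge \int_{-\infty}^{\mu-c}c^2f(x)\,dx + 0 + \int_{\mu+c}^{\infty}c^2f(x)\,dx \\
&= c^2P(|X - \mu| \ge c).
\end{aligned}$$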
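Chebyshev's inequality also holds for discrete distributions, which makes it easy to check by enumeration. The sketch below uses a random variable that takes the values $-1$, $1$, $5$ and $11$ with probability $1/4$ each (so $\mu = 4$ and $\sigma^2 = 21$); the choice of $c$ values and the helper names are assumptions made for the illustration.

```python
from fractions import Fraction

values = [-1, 1, 5, 11]  # each value has probability 1/4

mu = Fraction(sum(values), len(values))                  # mean, 4
var = sum((v - mu) ** 2 for v in values) / len(values)   # variance, 21

def tail_prob(c):
    """Exact P(|X - mu| >= c) for the equally likely values above."""
    return Fraction(sum(1 for v in values if abs(v - mu) >= c), len(values))

def chebyshev_bound(c):
    """Upper bound sigma^2 / c^2 from Chebyshev's inequality."""
    return var / Fraction(c) ** 2

checks = {c: (tail_prob(c), chebyshev_bound(c)) for c in (5, 6, 7)}
for c, (prob, bound) in checks.items():
    print(c, prob, bound, prob <= bound)
```

The bound is loose in every case, which is typical: it uses only the mean and variance and nothing else about the distribution.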
Theorem 1.12 (Weak law of large numbers). Let $\bar{X}$ be the mean of a random
sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$. Then,
$$\lim_{n\to\infty}P\left(|\bar{X} - \mu| \ge c\right) = 0 \quad \text{for any } c > 0.$$
Remarks:
$P(X = 1) = p$ and $P(X = 0) = 1 - p$.
$$\lim_{n\to\infty}P\left(|\bar{X} - p| \ge c\right) \le \lim_{n\to\infty}\frac{1}{c^2}\frac{p(1-p)}{n} = 0 \quad \text{for any } c > 0.$$
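For Bernoulli observations the probability $P(|\bar{X} - p| \ge c)$ can be computed exactly from the binomial distribution and compared with the Chebyshev bound $\frac{p(1-p)}{c^2 n}$. The sketch below does this for a few sample sizes; the values $p = 1/2$, $c = 1/10$, the sample sizes and the function names are assumptions chosen for the illustration.

```python
from fractions import Fraction
from math import comb

def exact_tail(n, p, c):
    """P(|X_bar - p| >= c) when n * X_bar ~ Binomial(n, p), in exact rational arithmetic."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) >= c:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

def chebyshev_bound(n, p, c):
    """Upper bound p(1 - p) / (c^2 n) on the same probability."""
    return p * (1 - p) / (c ** 2 * n)

p, c = Fraction(1, 2), Fraction(1, 10)  # illustrative values
rows = [
    (n, float(exact_tail(n, p, c)), float(chebyshev_bound(n, p, c)))
    for n in (10, 100, 400)
]
for n, tail, bound in rows:
    print(n, tail, bound)
```

Both columns shrink as $n$ grows, and the exact probability falls much faster than the bound, which only needs to tend to zero for the theorem to follow.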
4 Order Statistics
Consider a random sample $\{X_1, X_2, \ldots, X_n\}$ from a population having a probability
density function. Note that
$$P(X_i = X_j) = 0 \quad \text{for } i \ne j,\ i, j = 1, 2, \ldots, n.$$
denote the order statistics of this sample. Here $X_{(r)}$ is called the $r$-th order statistic
from smallest to largest for $r = 1, 2, \ldots, n$, with $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and
$X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$.
$$g_k(y) = \frac{n!}{(k-1)!(n-k)!}[F(y)]^{k-1}[1 - F(y)]^{n-k}f(y), \quad \text{for } a < y < b.$$
$$\begin{aligned}
&= \frac{n!}{(k-1)!\,1!\,(n-k)!}[F(y)]^{k-1}\lim_{h\to 0}\frac{F(y+h) - F(y)}{h}[1 - F(y)]^{n-k} \\
&= \frac{n!}{(k-1)!(n-k)!}[F(y)]^{k-1}[1 - F(y)]^{n-k}f(y)
\end{aligned}$$
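The density $g_k(y)$ can be checked numerically for a concrete population. The sketch below assumes a Uniform(0, 1) population, so that $F(y) = y$ and $f(y) = 1$ on $(0, 1)$, with $n = 5$ and $k = 2$; these choices and the helper names are illustrative. It verifies by midpoint-rule integration that the density integrates to 1 and that its mean matches the known value $k/(n+1)$ for uniform order statistics.

```python
from math import factorial

def order_stat_pdf(y, k, n, F, f):
    """Density of the k-th order statistic at y, given population cdf F and pdf f."""
    coeff = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return coeff * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over (a, b)."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def uniform_cdf(y):
    return y

def uniform_pdf(y):
    return 1.0

n, k = 5, 2  # illustrative sample size and rank

total = integrate(lambda y: order_stat_pdf(y, k, n, uniform_cdf, uniform_pdf), 0.0, 1.0)
mean = integrate(lambda y: y * order_stat_pdf(y, k, n, uniform_cdf, uniform_pdf), 0.0, 1.0)

print(total, mean)
```

For these values the density reduces to $20y(1-y)^3$, which is the Beta(2, 4) density.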
Remarks: Using similar arguments as in Theorem 1.13, the joint probability density
function of $X_{(i)}$ and $X_{(j)}$ ($i < j$) can be obtained as
$$g_{i,j}(x, y) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!}[F(x)]^{i-1}[F(y) - F(x)]^{j-i-1}[1 - F(y)]^{n-j}f(x)f(y),$$
where $x \le y$, and the joint probability density function of $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is
$$g_{1,2,\ldots,n}(x_1, x_2, \ldots, x_n) = \begin{cases} n!\,f(x_1)f(x_2)\cdots f(x_n), & \text{if } x_1 \le x_2 \le \cdots \le x_n; \\ 0, & \text{otherwise.} \end{cases}$$
The University of Hong Kong
Department of Statistics and Actuarial Science
STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 1 Summary
1 Moment-generating Functions
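(a) $M_X(t) = E\left(e^{tX}\right)$
$$M_X(t) = \begin{cases} \sum_x e^{tx}f(x), & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty}e^{tx}f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$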
(b) Properties
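$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1^2).$$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \Leftrightarrow \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1^2).$$
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$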
STAT2602B TST23/24 Topic 1 Summary
(b) Two normal populations with unknown variances $\sigma_1^2$ and $\sigma_2^2$:
The University of Hong Kong
Department of Statistics and Actuarial Science
STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 1 Exercise
1. Two random variables X and Y are independent and have probabilities given
by
Let Z = X + Y .
$E(X^r) = 0.8$, $r = 1, 2, \ldots$.
STAT2602B TST23/24 Topic 1 Exercise
(b) Obtain the mean, the variance, and the skewness of X and Y .
x -3 -2 -1 0 1 2 3
f (x) 0.06 0.09 0.10 0.50 0.10 0.09 0.06
y -3 -2 -1 0 1 2 3
f (y) 0.04 0.11 0.20 0.30 0.20 0.11 0.04
Obtain the mean, the variance, and the kurtosis of each of X and Y .
7. A random sample of size $n = 100$ is taken from a population with the mean
$\mu = 75$ and the variance $\sigma^2 = 256$. Based on the central limit theorem, with
what probability can we assert that the value we obtain for $\bar{X}$ will fall between
67 and 83?
9. Independent random samples of size 400 are taken from each of two populations
having equal means and the standard deviations $\sigma_1 = 20$ and $\sigma_2 = 30$. Using
the central limit theorem, what can we assert with a probability of at least
0.99 about an interval about the mean we will get for $\bar{X}_1 - \bar{X}_2$?
10. The actual proportion of families in a certain city who own, rather than rent,
their home is 0.70. If 84 families in this city are interviewed at random and
their responses to the question of whether they own their home are looked
upon as values of independent random variables having identical Bernoulli
distributions with the parameter p = 0.70, with what probability can we assert
that the value we obtain for the sample proportion will fall between 0.64 and
0.76, using the central limit theorem?
11. Suppose that $Z$ follows the standard normal distribution. Find the probability
density function of $Y = Z^2$ by the transformation technique.
12. If $X$ has the standard normal distribution, use the distribution function
technique to find the probability density function of $Z = X^2$.
(b) Show that the mean $Y/n$ has the normal distribution, where
$$\frac{Y}{n} \sim N\left(\frac{1}{n}\sum_{i=1}^{n}\mu_i,\ \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2\right).$$
18. If $Y$ and $Z$ are independent random variables having the chi-square
distributions with $m$ and $n$ degrees of freedom, respectively, show that the random
variable
$$X = \frac{Y/m}{Z/n}$$
has the F distribution with degrees of freedom $m$ and $n$.
19. If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1$ and
$n_2$ from the normal populations with the variances $\sigma_1^2$ and $\sigma_2^2$, respectively,
show that
$$X = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has the F distribution with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.