Math 5846 Chapter 2
UNSW Sydney
OPEN LEARNING
CHAPTER 2
Outline:
2.1 Introduction
2.2 Discrete Random Variables and Probability Mass Functions
2.3 Cumulative Distribution Functions
2.4 Continuous Random Variables and Probability Density Functions
2.5 Expectation and Moments
2.6 Expectation of Transformed Random Variables
2.7 Expectation of a Random Variable Under Changes of Scale
2.8 Standard Deviation and Variance
2.9 Moment Generating Functions
2.10 Properties of Moment Generating Functions
2.11 Chebychev's Inequality
2.1 Introduction
Consider a situation where we want to measure something
of interest across multiple subjects. For example, we may
be interested in measuring the weight loss achieved by
participants in a weight loss program.
2.2 Discrete Random Variables and
Probability Mass Functions
Definition
The probability mass function, P (X = x) is the
probability that the random variable X takes the value of x.
Properties
The probability mass function of a discrete random variable X has the following properties:
1. P(X = x) ≥ 0 for all x ∈ R
2. ∑_{all x} P(X = x) = 1.
Example
Suppose a fair coin is tossed three times, and let X denote the number of heads observed. The sample space S is

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT},

and each of the eight outcomes is equally likely. Counting heads gives

P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8.

Observe that ∑_{x=0}^{3} P(X = x) = 1.
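As a quick check, this pmf can be obtained by brute-force enumeration; the following Python sketch uses only the standard library.

```python
# Enumerate all 8 equally likely outcomes of three tosses, tabulate
# X = number of heads, and confirm the probabilities sum to 1.
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))            # the sample space S
counts = Counter(o.count("H") for o in outcomes)    # frequency of each x
pmf = {x: n / len(outcomes) for x, n in sorted(counts.items())}

print(pmf)                # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(pmf.values()))  # 1.0
```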
2.3 Cumulative Distribution Functions
Definition
The cumulative distribution function (cdf) of a random variable X is defined as
F_X(x) = P(X ≤ x).
Result
Let X be a random variable with cdf F_X(x). Then, for any a ≤ b,
P(a < X ≤ b) = F_X(b) − F_X(a).

Proof.
F_X(b) − F_X(a) = P(X ≤ b) − P(X ≤ a) = P(a < X ≤ b).
The cdf F_X(x) of a random variable X has the following properties:
1. F_X is non-decreasing: if a ≤ b, then F_X(a) ≤ F_X(b).
2. lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1.
3. F_X is right-continuous.
Example
For the three coin toss example above, the cumulative distribution function is given by

F_X(x) = 0     for x < 0
       = 1/8   for 0 ≤ x < 1
       = 4/8   for 1 ≤ x < 2
       = 7/8   for 2 ≤ x < 3
       = 1     for x ≥ 3.
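The cdf of a discrete random variable is a step function; a small Python sketch (values taken from the pmf above) makes the jumps explicit.

```python
# Build F_X(x) = P(X <= x) from the three-coin-toss pmf and evaluate it
# at a few points, including non-integer points between the jumps.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def F(x):
    return sum(p for k, p in pmf.items() if k <= x)

print([F(v) for v in (-1, 0, 0.5, 1, 2.7, 3)])
# [0, 0.125, 0.125, 0.5, 0.875, 1.0]
```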
Example
Let p be the probability of a head on a single coin toss. Suppose this coin is tossed until a head turns up for the first time. Let X denote the number of tosses required to get the first head. Find the probability mass function and the cumulative distribution function of X.

Solution:
Assuming that each toss is independent, for n = 1, 2, 3, . . ., the probability mass function of X is
P(X = n) = (1 − p)^{n−1} p.
Example
Solution - continued:
The cumulative distribution function of X is

F_X(n) = ∑_{i=1}^{n} P(X = i)
       = ∑_{i=1}^{n} (1 − p)^{i−1} p
       = p ∑_{i=1}^{n} (1 − p)^{i−1}
       = p [ 1 + (1 − p) + (1 − p)² + ⋯ + (1 − p)^{n−1} ]
       = 1 − (1 − p)^n    (summing the geometric series).
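A numerical check of the closed form (the value p = 0.3 below is an arbitrary choice for illustration):

```python
# Compare the partial sums of the geometric pmf with 1 - (1 - p)**n.
p = 0.3  # illustrative value only
for n in (1, 2, 5, 10):
    direct = sum((1 - p) ** (i - 1) * p for i in range(1, n + 1))
    closed = 1 - (1 - p) ** n
    print(n, abs(direct - closed) < 1e-12)   # True for each n
```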
2.4 Continuous Random Variables and
Probability Density Functions
When a random variable has a continuum of possible
values, it is continuous.
Definition
The probability density function (pdf) of a continuous random variable X is a real-valued function f_X on R with the property

∫_A f_X(x) dx = P(X ∈ A)

for any (measurable) set A ⊆ R.

Results
1. f_X(x) ≥ 0 for all x ∈ R
2. ∫_{−∞}^{+∞} f_X(x) dx = 1.
Regardless of whether a random variable X is continuous or discrete, its cumulative distribution function (cdf) is defined by
F_X(x) = P(X ≤ x).
Continuous random variables X have the property
P(X = a) = 0 for all a ∈ R.   (1)
Results
➌ For any continuous random variable X and pair of numbers a ≤ b,

P(a ≤ X ≤ b) = P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b)
             = ∫_a^b f_X(x) dx
             = the area under f_X between a and b.
These results demonstrate the importance of f_X(x) and F_X(x).
If you know (or can derive) either f_X(x) or F_X(x), you can derive any probability you want to know about X (and indeed any property of X).
Example
The lifetime (in thousands of hours) of a light bulb, X, has density f_X(x) = e^{−x}, x > 0.
Find P(2 ≤ X ≤ 3).
Example
Solution:
There are several ways to obtain this probability.

1. P(2 ≤ X ≤ 3) = ∫_2^3 f_X(x) dx = ∫_2^3 e^{−x} dx = [−e^{−x}]_2^3 = e^{−2} − e^{−3} ≈ 0.085.

2. Using the cdf of X, F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(y) dy = ∫_0^x e^{−y} dy = 1 − e^{−x}, x > 0. Therefore,

P(2 ≤ X ≤ 3) = P(X ≤ 3) − P(X ≤ 2)
             = F_X(3) − F_X(2) = (1 − e^{−3}) − (1 − e^{−2})
             = e^{−2} − e^{−3} ≈ 0.085.
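Both routes can be checked numerically; the sketch below assumes scipy is available for the quadrature.

```python
# Compare the closed form e^{-2} - e^{-3} with numerical integration
# of the density over [2, 3].
import math
from scipy.integrate import quad

closed = math.exp(-2) - math.exp(-3)
numeric, _ = quad(lambda x: math.exp(-x), 2, 3)
print(closed, numeric)   # both approximately 0.0855
```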
Definition
If F_X is strictly increasing in some interval, then F_X^{−1} is well defined there, and for 0 < p < 1 the pth quantile of X is x_p = F_X^{−1}(p), the value satisfying F_X(x_p) = p.
For example,
x_{0.5} is the median of F_X, and
x_{0.25} and x_{0.75} are the lower and upper quantiles of F_X, respectively.
Example
Let X be a random variable with cumulative distribution function F_X(x) = 1 − e^{−x}, x > 0.
Find the median and lower and upper quantiles of X.

Solution:
Firstly, we observe that the cdf F_X(x) is strictly increasing, so we can easily find F_X^{−1}. Setting p = F_X(x_p) = 1 − e^{−x_p} and solving gives x_p = −ln(1 − p). For the median,
x_{0.5} = −ln(1 − 0.5) = ln 2 ≈ 0.693.
Example
Solution - continued:
For the lower and upper quantiles of X,
x_{0.25} = −ln(1 − 0.25) = −ln(0.75) = ln(4/3) ≈ 0.288, and
x_{0.75} = −ln(1 − 0.75) = −ln(0.25) = ln 4 ≈ 1.386.
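Since F_X^{−1}(p) = −ln(1 − p) here, the three quantiles are one line of Python each:

```python
# Median and lower/upper quantiles of F_X(x) = 1 - e^{-x}.
import math
for p in (0.25, 0.5, 0.75):
    print(p, -math.log(1 - p))
# 0.25 -> 0.2877 (= ln 4/3), 0.5 -> 0.6931 (= ln 2), 0.75 -> 1.3863 (= ln 4)
```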
2.5 Expectation and Moments
The mean or average of the numbers a_1, a_2, . . . , a_n is
(a_1 + a_2 + ⋯ + a_n)/n = a_1 · (1/n) + a_2 · (1/n) + ⋯ + a_n · (1/n).
Definition
The expected value or mean of a discrete random variable X is
E(X) = ∑_{all x} x · P(X = x).   (2)

Definition
The expected value or mean of a continuous random variable X is
E(X) = ∫_{−∞}^{+∞} x · f_X(x) dx.   (3)
In both cases, E(X) has the interpretation of being
the long-run average of X.
Example
Let X be the number of females in a committee with three
members. Assume that there is a 50 : 50 chance of each
committee member being female, and the committee
members are chosen independently.
Find E(X).
Example
Solution:
First, we need to write out the sample space S and
determine the probability mass function of X, the number
of females in a committee with three members. We see that
S = {MMM, MMF, MFM, FMM, MFF, FMF, FFM, FFF},

with each of the eight outcomes equally likely.

Example
Solution - continued:
The corresponding probability mass function is

x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8

By Equation (2),
E(X) = 0 · 1/8 + 1 · 3/8 + 2 · 3/8 + 3 · 1/8 = 12/8 = 3/2.

The interpretation of E(X) = 3/2 is not that you expect X to be 3/2 on any one committee; however, if you repeated the experiment, say, 100 times, then the average of the 100 numbers observed should be about 150/100 = 3/2. We expect to observe about 150 females in total in 100 committees. We don't expect to see exactly 1.5 females on each committee!
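The long-run-average interpretation is easy to see by simulation; the sketch below repeats the committee experiment many times (the seed is arbitrary).

```python
# Simulate many three-member committees with P(female) = 0.5 per member;
# the average count of females should approach E(X) = 1.5.
import random
random.seed(0)  # arbitrary seed for reproducibility

trials = 100_000
total = sum(sum(random.random() < 0.5 for _ in range(3)) for _ in range(trials))
print(total / trials)   # close to 1.5
```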
Example
Suppose X is a standard uniform random number
generator (such a generator can be found on most
hand-held calculators).
Find E(X).
Example
Solution:
A standard uniform random variable has density f_X(x) = 1 for 0 ≤ x ≤ 1 (and 0 otherwise). By Equation (3), the expectation of X is

E(X) = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_0^1 x · 1 dx = [x²/2]_0^1 = 1/2.

Note that in the last two examples, the probability mass function and probability density function of X are symmetric, and symmetric about E(X). As we will see in the next few examples, that is not always true.
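The same long-run-average reading applies to the uniform generator:

```python
# Average of many standard uniform draws approaches E(X) = 1/2.
import random
random.seed(1)  # arbitrary seed
print(sum(random.random() for _ in range(100_000)) / 100_000)  # ~ 0.5
```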
Example
Suppose X has probability mass function
P(X = x) = (1 − p)^{x−1} p, x = 1, 2, 3, . . . , where 0 < p < 1.
Find E(X).
Example
Solution:
Let q = 1 − p. By Equation (2),

E(X) = ∑_{x=1}^{∞} x P(X = x) = ∑_{x=1}^{∞} x q^{x−1} p = p ∑_{x=1}^{∞} x q^{x−1}
     = p ∑_{x=1}^{∞} (d/dq) q^x    (since (d/dq) q^x = x q^{x−1})
     = p (d/dq) ∑_{x=1}^{∞} q^x    (since we can interchange the derivative and the summation)
     = p (d/dq) [ q/(1 − q) ] = p · [1 · (1 − q) − q · (−1)]/(1 − q)²
     = p · 1/(1 − q)² = p · 1/p² = 1/p.
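A simulation check of E(X) = 1/p (taking p = 0.25 purely for illustration):

```python
# Toss a p-coin until the first head and record the number of tosses;
# the sample mean should approach 1/p.
import random
random.seed(2)  # arbitrary seed

def tosses_until_head(p):
    n = 1
    while random.random() >= p:   # tails: toss again
        n += 1
    return n

p = 0.25  # illustrative value only
samples = [tosses_until_head(p) for _ in range(100_000)]
print(sum(samples) / len(samples), 1 / p)   # both near 4
```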
Example
Suppose X has probability density function
f_X(x) = e^{−x}, x > 0.
Find E(X).
Example
Solution:
By Equation (3),

E(X) = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_0^{+∞} x e^{−x} dx
     = [−x e^{−x}]_0^{+∞} − ∫_0^{+∞} (−e^{−x}) dx    (by integration by parts)
     = ∫_0^{+∞} e^{−x} dx = [−e^{−x}]_0^{+∞} = 0 − (−1) = 1.
Example
If X is degenerate, that is, X = c with probability 1 for some constant c, then X is in fact a constant, and
E(X) = ∑_{all x} x P(X = x) = c · 1 = c.
2.6 Expectation of Transformed Random
Variables
Sometimes, we are interested in a transformation of a random variable. For example,
the circumference X of a tree trunk is measured, but we want to know the cross-sectional area of the trunk;
the random variable of interest is then π (X/(2π))².
Transformations are also of interest when studying the
properties of a random variable.
Result
The expected value of a function g(X) of a random variable X is

E[ g(X) ] = ∑_{all x} g(x) P(X = x)        if X is discrete
E[ g(X) ] = ∫_{−∞}^{+∞} g(x) f_X(x) dx    if X is continuous.
Example
Let I denote the electric current through a particular circuit, where I has pdf given by
f_I(x) = 1/2 if 1 ≤ x ≤ 3, and 0 otherwise.
Let P = 3 I². Find E[P].
Example
Solution:

E[P] = E[3I²] = ∫_1^3 3x² f_I(x) dx = ∫_1^3 3x² · (1/2) dx
     = (3/2) [x³/3]_1^3
     = (1/2)(27 − 1) = 13.
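A Monte Carlo check of the result (I is uniform on [1, 3], since its pdf is constant there):

```python
# Average 3*I**2 over many draws of I ~ Uniform(1, 3); should approach 13.
import random
random.seed(3)  # arbitrary seed

n = 200_000
print(sum(3 * random.uniform(1, 3) ** 2 for _ in range(n)) / n)  # ~ 13
```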
Result
In most situations,
E[ g(X) ] ≠ g[ E(X) ].
2.7 Expectation of a Random Variable Under Changes of Scale
Often, a change of scale is required when studying random variables. For example, when a measurement change is required, say grams → kilograms or °F → °C.

Results
If a is a constant and X is a discrete random variable, then

E(X + a) = ∑_{all x} (x + a) P(X = x) = E(X) + a
E(aX) = ∑_{all x} (a x) P(X = x) = a E(X).
Results
If a is a constant and X is a continuous random variable, then

E(X + a) = ∫_{−∞}^{+∞} (x + a) f_X(x) dx = E(X) + a
E(aX) = ∫_{−∞}^{+∞} (a x) f_X(x) dx = a E(X).
Result
If X is a continuous or discrete random variable, then for constants a and b,
E(aX + b) = a E(X) + b.
2.8 Standard Deviation and Variance
The standard deviation of a random variable is a measure
of its spread. It is closely tied to the variance of a random
variable.
Definition
If we let µ = E(X), then the second moment of X about µ, E[ (X − µ)² ], is the variance of X, denoted by Var(X).

Definition
The standard deviation of a random variable X is the square root of its variance:
standard deviation of X = √Var(X).
Standard deviations are more readily interpreted because
they are measured in the same units as the original variable
X.
Result
Var(X) = E[ (X − µ)² ] = E(X²) − (E(X))² = E(X²) − µ².

Proof.
Var(X) = E[ (X − µ)² ] = E[ X² − 2µX + µ² ] = E(X²) − 2µ E(X) + µ² = E(X²) − 2µ² + µ² = E(X²) − µ².
Example
Assume the lifetime of a light bulb (in thousands of hours) has probability density function f_X(x) = e^{−x}, x > 0.
Calculate Var(X).
Example
Solution:
We will use the formula Var(X) = E(X²) − (E(X))². Recall that we found E(X) = 1 earlier. So,

E(X²) = ∫_0^{+∞} x² e^{−x} dx
      = [−x² e^{−x}]_0^{+∞} − ∫_0^{+∞} (−2x e^{−x}) dx    (using the integration by parts formula)
      = 0 + 2 ∫_0^{+∞} x e^{−x} dx = 2 E(X) = 2.

Therefore, Var(X) = E(X²) − (E(X))² = 2 − 1² = 1.
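A numerical confirmation (again assuming scipy for the improper integral):

```python
# E(X^2) for f(x) = e^{-x} on (0, inf) should be 2, giving Var(X) = 1.
import math
from scipy.integrate import quad

second_moment, _ = quad(lambda x: x**2 * math.exp(-x), 0, math.inf)
print(second_moment, second_moment - 1**2)   # ~ 2.0 and ~ 1.0
```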
Example
Consider two random variables A and B whose probability mass functions are given by

A:  x          1     2     3     4     5
    P(A = x)   0.15  0.25  0.20  0.25  0.15

B:  x          1     2     3     4     5
    P(B = x)   0.10  0.10  0.60  0.10  0.10

Which random variable has the larger variance?
Example
Solution:
We are asked to find which random variable has the larger variance.

E(A) = 1 × 0.15 + 2 × 0.25 + 3 × 0.20 + 4 × 0.25 + 5 × 0.15 = 3
E(B) = 1 × 0.10 + 2 × 0.10 + 3 × 0.60 + 4 × 0.10 + 5 × 0.10 = 3

Solution - continued:
E(A²) = 1 × 0.15 + 4 × 0.25 + 9 × 0.20 + 16 × 0.25 + 25 × 0.15 = 10.7, so Var(A) = 10.7 − 3² = 1.7.
E(B²) = 1 × 0.10 + 4 × 0.10 + 9 × 0.60 + 16 × 0.10 + 25 × 0.10 = 10.0, so Var(B) = 10.0 − 3² = 1.0.
Hence A has the larger variance: its probability mass is spread further from the mean, whereas B concentrates most of its mass at 3.
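The two variances can also be computed directly from the tables:

```python
# Var = E(X^2) - (E(X))^2 for each pmf table in the example.
xs = [1, 2, 3, 4, 5]
pA = [0.15, 0.25, 0.20, 0.25, 0.15]
pB = [0.10, 0.10, 0.60, 0.10, 0.10]

def variance(xs, ps):
    mean = sum(x * p for x, p in zip(xs, ps))
    return sum(x**2 * p for x, p in zip(xs, ps)) - mean**2

print(variance(xs, pA), variance(xs, pB))   # 1.7 and 1.0
```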
Results
Let a be a constant. We have
Var(X + a) = Var(X)
Var(aX) = a² Var(X).

Proof.
Var(X + a) = E[ (X + a − E(X + a))² ] = E[ (X − E(X))² ] = Var(X), and
Var(aX) = E[ (aX − a E(X))² ] = a² E[ (X − E(X))² ] = a² Var(X).
2.9 Moment Generating Functions
Definition
The moment generating function (mgf) of a random variable X is
m_X(u) = E(e^{uX}).

Result
In general,
E(X^r) = m_X^{(r)}(0) for r = 0, 1, 2, . . .
where m_X^{(r)}(0) denotes the rth derivative of the moment generating function with respect to u, evaluated at u = 0.
Proof.
First, we observe that

m_X(u) = E( e^{uX} )    (by definition)
       = E( 1 + uX/1! + (uX)²/2! + ⋯ )    (using the exponential series definition)
       = 1 + E(X) · u/1! + E(X²) · u²/2! + E(X³) · u³/3! + ⋯ .

Differentiating r times with respect to u and then setting u = 0 removes every term except the one involving E(X^r), giving m_X^{(r)}(0) = E(X^r).
Example
Suppose the random variable X has the following moment-generating function:
m_X(u) = (1 − u)^{−1} if u < 1, and +∞ if u ≥ 1.
Find the moments E(X^r) of X.
Example
Solution:
We see that (1 − u)^{−1} = 1/(1 − u) = 1 + u + u² + ⋯ when |u| < 1. (Recall that this is a geometric series.) Also,

m_X(u) = E(e^{uX}) = 1 + m′_X(0) u/1! + m″_X(0) u²/2! + ⋯ + m_X^{(r)}(0) u^r/r! + ⋯ .

Comparing the two series, we see that m_X^{(r)}(0) u^r/r! = u^r. Therefore, m_X^{(r)}(0)/r! = 1 and m_X^{(r)}(0) = r!.
Thus the rth moment of X is E(X^r) = m_X^{(r)}(0) = r!.
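The conclusion E(X^r) = r! can be checked against the density f_X(x) = e^{−x}, which (as derived later in this chapter) has exactly this mgf; scipy is assumed for the quadrature.

```python
# r-th moments of the density e^{-x} on (0, inf) should equal r!.
import math
from scipy.integrate import quad

for r in range(5):
    moment, _ = quad(lambda x: x**r * math.exp(-x), 0, math.inf)
    print(r, round(moment, 6), math.factorial(r))
```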
Example
Suppose the random variable X has the following probability mass function:
P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, . . . ; λ > 0.
Find the moment-generating function of X.
Furthermore, find E(X) and Var(X).
Example
Solution:
By the definition of moment-generating functions, we have

m_X(u) = E[e^{uX}] = ∑_{x=0}^{∞} e^{ux} e^{−λ} λ^x / x! = e^{−λ} ∑_{x=0}^{∞} (e^u λ)^x / x!
       = e^{−λ} e^{λ e^u}    (by the Taylor series expansion of the exponential)
       = e^{λ(e^u − 1)}, λ > 0.

We have
E(X) = m′_X(0) = [ (d/du) e^{λ(e^u − 1)} ]_{u=0} = [ (λ e^u) e^{λ(e^u − 1)} ]_{u=0} = λ.

Differentiating again, m″_X(u) = (λ e^u + λ² e^{2u}) e^{λ(e^u − 1)}, so E(X²) = m″_X(0) = λ + λ² and
Var(X) = E(X²) − (E(X))² = λ + λ² − λ² = λ.
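A finite-difference check of these derivatives at u = 0 (λ = 2 chosen only for illustration):

```python
# Numerically differentiate the Poisson mgf m(u) = exp(lam*(e^u - 1)) at 0:
# the first derivative gives E(X) = lam, the second gives E(X^2), so
# Var(X) = E(X^2) - E(X)**2 should also be lam.
import math

lam, h = 2.0, 1e-5   # illustrative lambda; small step for the differences
m = lambda u: math.exp(lam * (math.exp(u) - 1))

first = (m(h) - m(-h)) / (2 * h)            # ~ lam
second = (m(h) - 2 * m(0) + m(-h)) / h**2   # ~ lam + lam**2
print(first, second - first**2)             # ~ 2.0 and ~ 2.0
```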
Example
Suppose the random variable X has the binomial probability mass function
P(X = x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, . . . , n,
where C(n, x) = n!/(x! (n − x)!) and 0 < p < 1.
➊ Show that P(X = x) is a probability mass function. ➋ Find the moment generating function of X. ➌ Find E(X) and Var(X).

Solution:
➊ To show that P(X = x) is a probability mass function, we need to show that ∑_{x=0}^{n} P(X = x) = 1. That is,

∑_{x=0}^{n} P(X = x) = ∑_{x=0}^{n} C(n, x) p^x (1 − p)^{n−x}
                     = (p + (1 − p))^n    (by the binomial theorem with a = p and b = 1 − p)
                     = 1.
Example
Solution:
➋ The moment generating function is given by

m_X(u) = ∑_{x=0}^{n} e^{ux} C(n, x) p^x (1 − p)^{n−x}
       = ∑_{x=0}^{n} C(n, x) (p e^u)^x (1 − p)^{n−x}
       = (p e^u + 1 − p)^n    (by the binomial theorem with a = p e^u and b = 1 − p).
Example
Solution:
➌ If we differentiate the moment generating function with respect to u, we get

d m_X(u)/du = n (p e^u + 1 − p)^{n−1} p e^u = n p e^u (p e^u + 1 − p)^{n−1}.

Evaluating at u = 0 gives E(X) = m′_X(0) = n p.
Example
Solution - continued:
➌ To find the second moment, we use the product rule
d(yz)/du = y (dz/du) + z (dy/du)
to get

d² m_X(u)/du² = n p e^u { (n − 1) (p e^u + 1 − p)^{n−2} p e^u } + (p e^u + 1 − p)^{n−1} { n p e^u }
              (using y = n p e^u and z = (p e^u + 1 − p)^{n−1})
              = n p e^u (p e^u + 1 − p)^{n−2} { n p e^u + 1 − p }.

Solution - continued:
➌ From this, we see that E(X²) = m″_X(0) = n p (n p + 1 − p), and therefore
Var(X) = E(X²) − (E(X))² = n p (n p + 1 − p) − (n p)² = n p (1 − p).
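The same finite-difference check works for the binomial mgf (n = 10, p = 0.3 chosen only for illustration):

```python
# Differentiate m(u) = (p*e^u + 1 - p)**n numerically at u = 0.
import math

n, p, h = 10, 0.3, 1e-5   # illustrative parameters
m = lambda u: (p * math.exp(u) + 1 - p) ** n

first = (m(h) - m(-h)) / (2 * h)            # ~ n*p = 3.0
second = (m(h) - 2 * m(0) + m(-h)) / h**2   # ~ E(X^2)
print(first, second - first**2)             # ~ 3.0 and ~ 2.1 = n*p*(1-p)
```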
Example
Suppose the random variable X has probability density function
f_X(x) = e^{−x}, x > 0.
Find the moment-generating function of X.

Solution:
Using the definition of the moment-generating function, we have

m_X(u) = E[ e^{uX} ] = ∫_0^{∞} e^{ux} e^{−x} dx = ∫_0^{∞} e^{−x(1−u)} dx
       = [ −e^{−x(1−u)}/(1 − u) ]_0^{∞}
       = 1/(1 − u), u < 1.
2.10 Properties of Moment Generating
Functions
The following results on uniqueness and convergence for moment generating functions will be particularly important later.

Result
Let X and Y be two random variables whose moments exist. If
m_X(u) = m_Y(u)
for all u in a neighbourhood of zero (i.e. for all |u| < ϵ for some ϵ > 0), then X and Y have the same distribution; that is, F_X(x) = F_Y(x) for all x.

See the proof of this result in Casella, G. and Berger, R. L. (1990). Statistical Inference, Duxbury.
The proofs of these results rely on the theory of Laplace transforms.
2.11 Chebychev’s Inequality
Chebychev’s inequality is a fundamental result concerning tail
probabilities of general random variables.
Chebychev’s Inequality
If X is any random variable with E(X) = µ and
V ar(X) = σ 2 , then
1
P X − µ > k σ ≤ 2.
k
89 / 100
Note that Chebychev’s Inequality makes no assumptions
about the distribution of X.
Proof.
We will provide proof for the continuous random variable case only.

σ² = Var(X) = ∫_{−∞}^{∞} (x − µ)² f_X(x) dx
   ≥ ∫_{|x−µ| > kσ} (x − µ)² f_X(x) dx
   ≥ ∫_{|x−µ| > kσ} (k σ)² f_X(x) dx
Therefore, we have

σ² ≥ k² σ² ∫_{|x−µ| > kσ} f_X(x) dx = k² σ² P( |X − µ| > k σ ).

By re-arranging, we get
P( |X − µ| > k σ ) ≤ 1/k²,
the desired result.
Example
The number of items produced by a factory in one day has mean 500 and variance 100.
What can be said about the probability that between 400 and 600 items will be produced tomorrow?
Example
Solution:
Let X be the number of items produced tomorrow. We are given
E(X) = µ = 500 and σ² = Var(X) = 100, so σ = 10.
Note that 400 = 500 − 100 = µ − 10σ, and 600 = 500 + 100 = µ + 10σ.
Thus, P(400 ≤ X ≤ 600) = P( |X − µ| ≤ 10 σ ).
Example
Solution - continued:
Chebychev's Inequality states that P( |X − µ| > 10 σ ) ≤ 1/10².
Therefore,
P(400 ≤ X ≤ 600) = 1 − P( |X − µ| > 10 σ ) ≥ 1 − 1/100 = 0.99.
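Chebychev's bound is distribution-free, so any distribution with this mean and standard deviation must respect it; the sketch below picks a normal distribution purely for illustration.

```python
# Fraction of draws outside (400, 600) for one choice of distribution with
# mean 500 and sd 10; Chebychev guarantees it is at most 1/10**2 = 0.01.
import random
random.seed(4)  # arbitrary seed

xs = [random.gauss(500, 10) for _ in range(100_000)]
outside = sum(not (400 < x < 600) for x in xs) / len(xs)
print(outside, "<= 0.01")   # essentially 0 here; well within the bound
```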
Supplementary Material

Supplementary Material - Integration by Parts
∫_a^b u dv = [uv]_a^b − ∫_a^b v du,
or equivalently, for differentiable functions f and g,
∫_a^b f(x) g′(x) dx = [f(x) g(x)]_a^b − ∫_a^b f′(x) g(x) dx.

Supplementary Material - Exponential Series
e^x = 1 + x/1! + x²/2! + x³/3! + ⋯

Supplementary Material - Binomial Theorem
(a + b)^n = ∑_{x=0}^{n} C(n, x) a^x b^{n−x}, where C(n, x) = n!/(x! (n − x)!).