Math 5846 Chapter 2

School of Mathematics and Statistics

UNSW Sydney

Introduction to Probability and Stochastic Processes

OPEN LEARNING
CHAPTER 2

2 / 100
Outline:
2.1 Introduction

2.2 Discrete Random Variables and Probability Mass Functions

2.3 Cumulative Distribution Functions

2.4 Continuous Random Variables and Probability Density Functions

2.5 Expectation and Moments

2.6 Expectation of Transformed Random Variables

2.7 Expectation of Random Variables under Changes of Scale

2.8 Standard Deviation and Variance

2.9 Moment Generating Functions

2.10 Properties of Moment Generating Functions

2.11 Chebychev’s Inequality

2.12 Supplementary Material

3 / 100
2.1 Introduction

4 / 100
Consider a situation where we want to measure something
of interest across multiple subjects. For example, we may
be interested in measuring the weight loss achieved by
participants in a weight loss program.

Inevitably, our measurements always vary from subject to
subject due to factors beyond our control or knowledge.
For this reason, we treat the measurements as random
variables - variables that have some random component.

In this chapter, we will explore random variables and some
properties of random variables that are important in a
study or experiment.

5 / 100
2.2 Discrete Random Variables and
Probability Mass Functions

7 / 100
Definition
The probability mass function, P(X = x), is the
probability that the random variable X takes the value of x.

For a discrete sample space S, a random variable X is a
function defined on S with

P(X = x) = Σ_{s : X(s) = x} P({s})

being the probability that X takes the value of x.

8 / 100
Properties
The probability mass function of a discrete random variable
X has the following properties:

1. P(X = x) ≥ 0 for all x ∈ R
2. Σ_{all x} P(X = x) = 1.

9 / 100
Example
Suppose a fair coin is tossed three times. The sample space
S is

S = {HHH, HHT, HT H, T HH, T T H, T HT, HT T, T T T }.

Let the random variable X denote the number of heads
that turned up. The probability mass function of X is given by

P(X = 0) = P({TTT}) = 1/8
P(X = 1) = P({TTH, THT, HTT}) = 3/8
P(X = 2) = P({HHT, HTH, THH}) = 3/8
P(X = 3) = P({HHH}) = 1/8

Observe that Σ_{x=0}^{3} P(X = x) = 1.

10 / 100
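As an aside (not part of the original slides), the pmf above can be checked by brute-force enumeration. The following Python sketch, assuming a standard Python 3 environment, tabulates X over the eight equally likely outcomes and confirms that the probabilities sum to one.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses
# and tabulate X = number of heads.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)

pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}
print(pmf)                     # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(sum(pmf.values()) == 1)  # True: the probabilities sum to one
```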
2.3 Cumulative Distribution Functions

11 / 100
Definition
The cumulative distribution function (cdf ) of a
random variable X is defined as

FX (x) = P (X ≤ x).

12 / 100
Result
Let X be a random variable with cdf FX (x). Then,

FX (b) − FX (a) = P (a < X ≤ b).

13 / 100
Proof.

FX (b) − FX (a) = P (X ≤ b) − P (X ≤ a) (by the definition of cdf)


= P ({a < X ≤ b} ∪ {X ≤ a}) − P (X ≤ a)
= P (a < X ≤ b) + P (X ≤ a) − P (X ≤ a)
(since {a < X ≤ b} ∪ {X ≤ a} are mutually exclusive events)

= P (a < X ≤ b).

14 / 100
The cdf FX (x) of a random variable X has the following
properties:

The cdf FX (x) is a non-decreasing function of x for all
x ∈ R.
The cdf FX (x) ranges from zero to one, which is
reasonable since it is a probability.
If X is a discrete random variable whose minimum value is
a, then FX (a) = P (X ≤ a) = P (X = a). If c < a, then
FX (c) = 0.
If the maximum value of X is b, then
FX (b) = P (X ≤ b) = 1. If d ≥ b, then FX (d) = 1.
All the probabilities of X can be stated in terms of the cdf
FX (x).

15 / 100
Example
For the three coin toss example, the cumulative
distribution function is given by

         | 0,    x < 0
         | 1/8,  0 ≤ x < 1
FX (x) = | 4/8,  1 ≤ x < 2
         | 7/8,  2 ≤ x < 3
         | 1,    x ≥ 3.

16 / 100
Example
Let p be the probability of a head on a single coin toss.
Suppose this coin is tossed until a head turns up for the
first time. Let X denote the number of tosses required to
get the first head.

Find the probability mass function and the
cumulative distribution function of X.

17 / 100
Example

Solution:
Assuming that each toss is independent, for n = 1, 2, 3, . . . ,
the probability mass function of X is

P(X = n) = P({T, T, . . . , T, H}) = (1 − p)^(n−1) p,

where the outcome consists of (n − 1) tails followed by a head,
and for a single toss, P(T) = 1 − p and P(H) = p.

18 / 100
Example

Solution - continued:
The cumulative distribution function of X is

FX (n) = Σ_{i=1}^{n} P(X = i)
       = Σ_{i=1}^{n} (1 − p)^(i−1) p
       = p Σ_{i=1}^{n} (1 − p)^(i−1)
       = p [ 1 + (1 − p) + (1 − p)² + · · · + (1 − p)^(n−1) ]
         (This is a geometric sum.)
       = p [ (1 − (1 − p)^n) / (1 − (1 − p)) ]
       = p [ (1 − (1 − p)^n) / p ]
       = 1 − (1 − p)^n,    n = 1, 2, 3, . . . .

19 / 100
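As an illustrative check of this derivation (the value p = 0.3 below is an assumption, not from the slides), the summed pmf can be compared with the closed form FX (n) = 1 − (1 − p)^n:

```python
# Check F_X(n) = 1 - (1 - p)**n against the partial sum of the pmf
# for the number of tosses X until the first head.
p = 0.3  # illustrative value of P(head); any 0 < p < 1 works

def pmf(i, p):
    return (1 - p) ** (i - 1) * p

for n in (1, 2, 5, 10):
    cdf_sum = sum(pmf(i, p) for i in range(1, n + 1))
    cdf_closed = 1 - (1 - p) ** n
    print(n, round(cdf_sum, 10), round(cdf_closed, 10))  # the two columns agree
```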
2.4 Continuous Random Variables and
Probability Density Functions

20 / 100
When a random variable has a continuum of possible
values, it is continuous.

An example of a continuous random variable is a
lightbulb’s lifetime, with possible values in [0, ∞).

The analogue of the probability mass function in the
discrete case is the probability density function (pdf).

21 / 100
Definition
The probability density function (pdf) of a
continuous random variable X is a real-valued function fX on R
with the property

∫_A fX (x) dx = P(X ∈ A)

for any (measurable) set A ⊂ R.

Results
1. fX (x) ≥ 0 for all x ∈ R
2. ∫_{−∞}^{+∞} fX (x) dx = 1.

22 / 100
Regardless of whether a random variable X is continuous
or discrete, its cumulative distribution function (cdf)
is defined by
FX (x) = P (X ≤ x).

The next set of results shows how FX may be found from
fX and vice versa.

23 / 100
Continuous random variables X have the property

P (X = a) = 0 for all a ∈ R.     (1)

It only makes sense to talk about the probability of X
lying in some subset of R.
A consequence of Equation (1) is that, with continuous
random variables, we do not worry about distinguishing
between < and ≤ signs. The probabilities are not affected.
For example, if X is a continuous random variable, then

P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b).

This is not the case for discrete random variables. The
inclusion and/or exclusion of the endpoints will change your
answer.
24 / 100
Results
1. The cumulative distribution function (cdf) FX of a
continuous random variable X can be found from the
probability density function fX via

FX (x) = ∫_{−∞}^{x} fX (y) dy.

2. The probability density function fX of a continuous
random variable X can be found from the cumulative
distribution function (cdf) FX via

fX (x) = F′X (x).

25 / 100
Results
3. For any continuous random variable X and pair of
numbers a ≤ b,

P (a ≤ X ≤ b) = P (a < X < b)
             = P (a ≤ X < b)
             = P (a < X ≤ b)
             = ∫_a^b fX (x) dx
             = the area under fX between a and b.

26 / 100
These results demonstrate the importance of fX (x) and
FX (x).

If you know (or can derive) either fX (x) or FX (x), you
can derive any probability you want to know about X (and
indeed any property of X).

Some other important properties of the continuous random
variable X will be discussed later.

27 / 100
Example
The lifetime (in thousands of hours) of a light bulb, X, has
density fX (x) = e−x , x > 0.

Find the probability that the light bulb lasts


between two thousand and three thousand hours.

28 / 100
Example

Solution:
There are several ways to obtain this probability.

1. P (2 ≤ X ≤ 3) = ∫_2^3 fX (x) dx = ∫_2^3 e−x dx = [−e−x]_2^3
   = e−2 − e−3 ≈ 0.085.

2. Using the cdf of X, FX (x) = P (X ≤ x) = ∫_{−∞}^{x} fX (y) dy
   = ∫_0^x e−y dy = 1 − e−x, x > 0. Therefore,

   P (2 ≤ X ≤ 3) = P (X ≤ 3) − P (X ≤ 2)
                 = FX (3) − FX (2) = (1 − e−3) − (1 − e−2)
                 = e−2 − e−3 ≈ 0.085.

29 / 100
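A hedged numerical check of the worked answer (not part of the slides): the closed form e−2 − e−3 can be compared with a crude midpoint-rule integration of the density over [2, 3].

```python
import math

# P(2 <= X <= 3) for f_X(x) = exp(-x), x > 0, two ways.
closed = math.exp(-2) - math.exp(-3)   # closed form from the solution

n = 100_000                            # midpoint-rule integration of the pdf over [2, 3]
h = (3 - 2) / n
riemann = sum(math.exp(-(2 + (k + 0.5) * h)) for k in range(n)) * h

print(round(closed, 6), round(riemann, 6))   # both approximately 0.085
```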
Definition
If FX is strictly increasing in some interval, then FX⁻¹ is
well defined.

For a specified p ∈ (0, 1), the pth quantile of FX is xp,
where
FX (xp) = p, or xp = FX⁻¹(p).

For example,
x0.5 is the median of FX , and
x0.25 and x0.75 are the lower and upper quartiles of
FX , respectively.

30 / 100
Example
Let X be a random variable with cumulative distribution
function FX (x) = 1 − e−x , x > 0.
Find the median and the lower and upper quartiles of
X.

31 / 100
Example

Solution:
Firstly, we observe that the cdf FX (x) is strictly increasing,
so we can easily find FX⁻¹. From the definition of the pth quantile,

p = 1 − e−xp  =⇒  xp = − ln(1 − p)  =⇒  FX⁻¹(p) = − ln(1 − p).

To compute the median, put p = 0.5 in the above
equation and compute FX⁻¹(p). That is, the median
= x0.5 = − ln(1 − 0.5) = − ln(0.5) = ln 2.

31 / 100
Example

Solution - continued:
For the lower and upper quartiles of X,

x0.25 = − ln(1 − 0.25) = − ln(0.75) = ln(4/3)

x0.75 = − ln(1 − 0.75) = − ln(0.25) = ln(4) = 2 ln(2).

32 / 100
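Since FX is invertible here, the quantiles can also be read off numerically. A small Python sketch (illustrative only) evaluates the inverse cdf xp = −ln(1 − p) at p = 0.25, 0.5, 0.75 and compares with the closed forms above.

```python
import math

def quantile(p):
    """Inverse cdf of F_X(x) = 1 - exp(-x): x_p = -ln(1 - p)."""
    return -math.log(1 - p)

print(quantile(0.50), math.log(2))        # median = ln 2
print(quantile(0.25), math.log(4 / 3))    # lower quartile = ln(4/3)
print(quantile(0.75), 2 * math.log(2))    # upper quartile = 2 ln 2
```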
2.5 Expectation and Moments

33 / 100
The mean or average of the numbers a1, a2, . . . , an is

(a1 + a2 + · · · + an)/n = a1 · 1/n + a2 · 1/n + · · · + an · 1/n.

Consider a random variable X with probability mass
function P (X = 5) = 1/5 and P (X = 10) = 4/5.
If we observe the values of, say, 100 random variables with the
same distribution as X, we would expect to observe about
20 5’s and about 80 10’s, so the mean or the average of the
100 numbers should be about

(5 × 20 + 10 × 80)/100 = 5 · 1/5 + 10 · 4/5 = 9.

That is, the mean or average is the sum of the possible
values of X weighted by their probabilities.

34 / 100
Definition
The expected value or mean of a discrete random
variable X is

E(X) = Σ_{all x} x · P (X = x).     (2)

By analogy, for the continuous random variable case, we
have

Definition
The expected value or mean of a continuous random
variable X is

E(X) = ∫_{−∞}^{+∞} x · fX (x) dx.     (3)

35 / 100
In both cases, E(X) has the interpretation of being
the long-run average of X.

That is, in the long run, as you observe an increasing
number of values of X, the average of these values
approaches E(X).

In both cases, E(X) has the physical interpretation of the
centre of gravity of the function fX .

So if a piece of thick wood or stone were carved in the shape
of fX , it would balance on a fulcrum placed at E(X).

36 / 100
Example
Let X be the number of females in a committee with three
members. Assume that there is a 50 : 50 chance of each
committee member being female, and the committee
members are chosen independently.

Find E(X).

37 / 100
Example

Solution:
First, we need to write out the sample space S and
determine the probability mass function of X, the number
of females in a committee with three members. We see that

S = {M M M, M M F, M F M, F M M, M F F, F M F, F F M, F F F },

where M and F represent male and female, respectively.

38 / 100
Example

Solution:
The corresponding probability mass function is

x            0     1     2     3
P (X = x)   1/8   3/8   3/8   1/8

By Equation (2), the expectation of X is

E(X) = Σ_{x=0}^{3} x P (X = x) = 0 · 1/8 + 1 · 3/8 + 2 · 3/8 + 3 · 1/8 = 3/2.

The interpretation of 3/2 is not what you expect X to be; however, if you
repeated the experiment, say, 100 times, then the average of the 100 numbers
observed should be 150/100 = 3/2. We expect to observe about 150 females in
total in 100 committees. We don’t expect to see exactly 1.5 females on each
committee!

39 / 100
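The long-run-average interpretation can be illustrated by simulation. The sketch below is an illustrative addition; the committee size 3 and the 50:50 chance come from the example, while the number of replications is an arbitrary choice.

```python
import random

random.seed(1)
n_committees = 100_000
total_females = sum(
    sum(random.random() < 0.5 for _ in range(3))   # X for one committee of 3
    for _ in range(n_committees)
)
print(total_females / n_committees)   # close to E(X) = 1.5
```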
Example
Suppose X is a number produced by a standard uniform random
number generator (such a generator can be found on most
hand-held calculators).

The probability density function of X is given by

         | 1   if 0 < x < 1
fX (x) = |
         | 0   otherwise

Find E(X).

40 / 100
Example

Solution:
By Equation (3), the expectation of X is

E(X) = ∫_{−∞}^{+∞} x fX (x) dx = ∫_0^1 x · 1 dx = [x²/2]_0^1 = 1/2.

Note that in the last two examples, the probability mass function and
the probability density function of X are symmetric, and they are symmetric
about E(X). As we will see in the next few examples, that is not
always true.

41 / 100
Example
Suppose X has probability mass function

P (X = x) = (1 − p)^(x−1) p,    x = 1, 2, . . . ;  0 < p < 1.

Find E(X).

42 / 100
Example

Solution:
Let q = 1 − p. By Equation (2),

E(X) = Σ_{x=1}^{∞} x P (X = x) = Σ_{x=1}^{∞} x q^(x−1) p = p Σ_{x=1}^{∞} x q^(x−1)

     = p Σ_{x=1}^{∞} d/dq (q^x)    (since d/dq (q^x) = x q^(x−1))

     = p d/dq ( Σ_{x=1}^{∞} q^x )  (since we can interchange the derivative and the summation)

     = p d/dq ( q / (1 − q) ) = p [1 · (1 − q) − q · (−1)] / (1 − q)²

     = p · 1/(1 − q)² = p · 1/p² = 1/p.

43 / 100
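As an informal check (not from the slides), the answer E(X) = 1/p can be confirmed by truncating the infinite sum for a few illustrative values of p:

```python
# Truncated sum of x * (1-p)^(x-1) * p, which should approach 1/p.
for p in (0.2, 0.5, 0.9):
    approx = sum(x * (1 - p) ** (x - 1) * p for x in range(1, 10_000))
    print(p, round(approx, 6), 1 / p)
```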
Example
Suppose X has probability density function

fX (x) = e−x , x > 0.

Find E(X).

44 / 100
Example

Solution:
By Equation (3),

E(X) = ∫_{−∞}^{+∞} x fX (x) dx = ∫_0^{+∞} x e−x dx

     = [−x e−x]_0^{+∞} − ∫_0^{+∞} (−e−x) dx
       (by integration by parts)

     = ∫_0^{+∞} e−x dx = [−e−x]_0^{+∞} = 0 − (−1) = 1.

45 / 100
Example
If X is degenerate, that is, X = c with probability 1 for
some constant c, then X is in fact a constant, and
E(X) = Σ_{all x} x P (X = x) = c · P (X = c) = c · 1 = c.

Thus, the expected value of a constant is the
constant, i.e.,
E(c) = c.

This is a special case.

46 / 100
2.6 Expectation of Transformed Random
Variables

47 / 100
Sometimes, we are interested in a transformation of random
variables.

Examples are
The circumference X of a tree trunk is measured, but we
want to know the cross-sectional area of the trunk.
The random variable of interest is π (X/(2π))².

The number of smartphones X sold in an electronics shop
is recorded, and revenue is of interest. The shop has
bought twenty smartphones at $500 each, sells them
for $650, and can redeem unsold smartphones for $350.
The variable of interest is 650 X + 350 (20 − X) − 20 × 500.

48 / 100
Transformations are also of interest when studying the
properties of a random variable.

For example, in order to understand X, it is often useful to
look at the rth moment of X about some constant a,
defined as E[(X − a)^r].

This is another example of a transformation of X, for
which we wish to find E[ g(X) ] for some function g(x).

49 / 100
Result
The expected value of a function g(X) of a random
variable X is

             | Σ_{all x} g(x) P (X = x)       if X is discrete
E[ g(X) ] =  |
             | ∫_{−∞}^{+∞} g(x) fX (x) dx     if X is continuous

50 / 100
Example
Let I denote the electric current through a particular
circuit, where I has pdf given by

          | 1/2   if 1 ≤ x ≤ 3
fI (x) =  |
          | 0     otherwise.

The power P is a function of I and the resistance. For
example, for a circuit with resistance three Ohms,

P = 3 I².

What is the expected value of P through this
circuit?

51 / 100
Example

Solution:

E[P] = E[3I²] = ∫_1^3 3x² fI (x) dx = ∫_1^3 3x² · 1/2 dx

     = (3/2) [x³/3]_1^3

     = (1/2) (27 − 1) = 13.

52 / 100
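A quick numerical illustration (assumed values only, not part of the slides): integrating 3x² · fI (x) over [1, 3] reproduces E[P] = 13, while g(E[I]) = 3 · 2² = 12 is a different number, anticipating the result below that E[g(X)] need not equal g(E[X]).

```python
# Midpoint-rule approximation of E[3 I^2] for I with density 1/2 on [1, 3].
n = 200_000
h = (3 - 1) / n
E_P = sum(3 * (1 + (k + 0.5) * h) ** 2 * 0.5 for k in range(n)) * h

print(round(E_P, 4))   # approximately 13, as in the worked solution
print(3 * 2 ** 2)      # g(E[I]) = 12: transforming the mean gives a different answer
```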
Result
In most situations,

E[ g(X) ] ≠ g[ E(X) ].

53 / 100
2.7 Expectation of Random Variables
Under Changes of Scale

54 / 100
Often, a change of scale is required when studying random
variables. For example, when a change of measurement units is
required, say grams → kilograms or °F → °C.

Results
If a is a constant and X is a discrete random variable,
then

E(X + a) = Σ_{all x} (x + a) P (X = x) = E(X) + a

E(aX) = Σ_{all x} (a x) P (X = x) = a E(X).

55 / 100
Results
If a is a constant and X is a continuous random
variable, then

E(X + a) = ∫ (x + a) fX (x) dx = E(X) + a

E(aX) = ∫ (a x) fX (x) dx = a E(X).
56 / 100
Result
If X is a continuous or discrete random variable, then

E[ g1 (X) + · · · + gn (X) ] = E[ g1 (X) ] + · · · + E[ gn (X) ].

57 / 100
2.8 Standard Deviation and Variance

58 / 100
The standard deviation of a random variable is a measure
of its spread. It is closely tied to the variance of a random
variable.

Definition
If we let µ = E(X), then the second moment of X about µ,
E[ (X − µ)² ], is the variance of X, denoted by V ar(X).

Definition
The standard deviation of a random variable X is the
square root of its variance:

standard deviation of X = √V ar(X).

59 / 100
Standard deviations are more readily interpreted because
they are measured in the same units as the original variable
X.

So standard deviations are more commonly used as
measures of spread in applied statistics and in reporting
quantitative research results.

Variances are a bit easier to work with theoretically and are
commonly used in mathematical statistics (and this course).

60 / 100
Result

V ar(X) = E[ (X − µ)² ]
        = E(X²) − (E(X))²
        = E(X²) − µ².

61 / 100
Proof.

V ar(X) = E[ (X − µ)² ]             (by definition)
        = E( X² − 2 µ X + µ² )
        = E(X²) − 2 µ E(X) + E(µ²)
          (since expectation is a linear functional)
        = E(X²) − 2 µ² + µ²          (since E(X) = µ and E(µ²) = µ²)
        = E(X²) − µ².

62 / 100
Example
Assume the lifetime of a light bulb (in thousands of hours)
has probability density function fX (x) = e−x , x > 0.

Calculate Var(X).

63 / 100
Example

Solution:
We will use the formula V ar(X) = E(X²) − (E(X))².
Recall that we found that E(X) = 1 earlier. So,

E(X²) = ∫_0^{+∞} x² e−x dx

      = −[x² e−x]_0^{∞} − ∫_0^{+∞} (−2x e−x) dx
        (using the integration by parts formula)

      = 0 + 2 E(X) = 2

V ar(X) = E(X²) − (E(X))² = 2 − 1² = 1.

64 / 100
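The moments used above can be checked numerically; the sketch below is illustrative only, and the truncation of the integral at x = 50 is an assumption that makes the neglected tail negligible.

```python
import math

# Numerical E(X) and E(X^2) for f_X(x) = exp(-x) on (0, infinity),
# truncated at 50 where the density is essentially zero.
n = 200_000
h = 50 / n
xs = [(k + 0.5) * h for k in range(n)]
EX  = sum(x * math.exp(-x) for x in xs) * h
EX2 = sum(x * x * math.exp(-x) for x in xs) * h

print(round(EX, 4), round(EX2, 4), round(EX2 - EX ** 2, 4))   # ~1, ~2, ~1
```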
Example
Consider two random variables A and B whose probability
mass functions are given by

     x           1     2     3     4     5
A:   P (A = x)  0.15  0.25  0.20  0.25  0.15

     x           1     2     3     4     5
B:   P (B = x)  0.10  0.10  0.60  0.10  0.10

Which of A and B is more variable?

65 / 100
Example

Solution:
We are asked to find which random variable has the larger
variance.

E(A) = 1 × 0.15 + 2 × 0.25 + 3 × 0.20 + 4 × 0.25 + 5 × 0.15 = 3

E(A²) = 1² × 0.15 + 2² × 0.25 + 3² × 0.20
        + 4² × 0.25 + 5² × 0.15 = 10.7

V ar(A) = E(A²) − (E(A))² = 10.7 − 3² = 1.7

66 / 100
Example

Solution - continued:

E(B) = 1 × 0.10 + 2 × 0.10 + 3 × 0.60 + 4 × 0.10 + 5 × 0.10 = 3

E(B²) = 1² × 0.10 + 2² × 0.10 + 3² × 0.60
        + 4² × 0.10 + 5² × 0.10 = 10.0

V ar(B) = E(B²) − (E(B))² = 10.0 − 3² = 1

We see that V ar(A) > V ar(B), so A is more variable.

67 / 100
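The same arithmetic can be packaged as a small helper working directly from the pmf tables (an illustrative sketch only):

```python
pmf_A = {1: 0.15, 2: 0.25, 3: 0.20, 4: 0.25, 5: 0.15}
pmf_B = {1: 0.10, 2: 0.10, 3: 0.60, 4: 0.10, 5: 0.10}

def mean_var(pmf):
    """Return (E(X), Var(X)) for a finite pmf given as {value: probability}."""
    m = sum(x * p for x, p in pmf.items())
    m2 = sum(x * x * p for x, p in pmf.items())
    return m, m2 - m ** 2

print(mean_var(pmf_A))   # (3.0, 1.7) up to floating-point rounding
print(mean_var(pmf_B))   # (3.0, 1.0) up to floating-point rounding
```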
Results
Let a be a constant. We have

V ar(X + a) = V ar(X)

V ar(aX) = a² V ar(X).

68 / 100
Proof.

V ar(X + a) = E[ ( (X + a) − E(X + a) )² ]   (definition)
            = E[ ( (X + a) − E(X) − a )² ]
            = E[ ( X − E(X) )² ]
            = V ar(X)

V ar(aX) = E[ ( (aX) − E(aX) )² ]   (definition)
         = E[ ( (aX) − a E(X) )² ]
         = a² E[ ( X − E(X) )² ]
         = a² V ar(X)
69 / 100
2.9 Moment Generating Functions

70 / 100
Definition
The moment generating function (mgf) of a random
variable X is
mX (u) = E(e^(uX)).

The name moment generating function comes from the following
result concerning the rth moment of X about zero, E(X^r):

Result
In general,

E(X^r) = mX^(r)(0)   for r = 0, 1, 2, . . . ,

where mX^(r)(0) denotes the moment generating function
differentiated r times with respect to u and evaluated
at u = 0.

71 / 100
Proof.
First, we observe that

mX (u) = E( e^(uX) )    (by definition)
       = E( 1 + uX/1! + (uX)²/2! + · · · )
         (using the exponential series definition)
       = 1 + E(X) · u/1! + E(X²) · u²/2! + E(X³) · u³/3! + · · · .

Thus, mX (0) = 1 = E(X⁰). Next,

mX′ (u) = E(X) + E(X²) · 2u/2! + E(X³) · 3u²/3! + · · ·
  =⇒  mX′ (0) = E(X)

mX^(2)(u) = E(X²) · 2/2! + E(X³) · 3 · 2u/3! + · · ·
  =⇒  mX^(2)(0) = E(X²),

and so on.
72 / 100
Example
Suppose the random variable X has the following
moment-generating function
          | (1 − u)⁻¹   if u < 1
mX (u) =  |
          | +∞          if u ≥ 1.

Find an expression for the rth moment of X.

73 / 100
Example

Solution:
We see that (1 − u)⁻¹ = 1/(1 − u) = 1 + u + u² + · · · when |u| < 1.
(Recall that this is a geometric series.) So

mX (u) = E(e^(uX)) = 1 + mX′ (0) u/1! + mX′′ (0) u²/2! + · · · + mX^(r)(0) u^r/r! + · · · .

Comparing the two series, we see that mX^(r)(0) u^r/r! = u^r. Therefore,
mX^(r)(0)/r! = 1 and mX^(r)(0) = r!.

Thus the rth moment of X is E(X^r) = mX^(r)(0) = r!

74 / 100
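A symbolic check of this example (assuming the sympy package is available; this is not part of the slides): differentiating (1 − u)⁻¹ r times and evaluating at u = 0 returns r!.

```python
import sympy as sp

u = sp.symbols("u")
m = 1 / (1 - u)          # the mgf from the example, valid for u < 1

for r in range(5):
    # r-th derivative of the mgf at u = 0 should equal r!
    print(r, sp.diff(m, u, r).subs(u, 0), sp.factorial(r))
```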
Example
Suppose the random variable X has the following
probability mass function.
P (X = x) = e−λ λ^x / x!,    x = 0, 1, 2, . . . ;  λ > 0.
Find the moment-generating function of X.
Furthermore, find E(X) and V ar(X).

75 / 100
Example

Solution:
By the definition of moment-generating functions, we have

mX (u) = E[e^(uX)] = Σ_{x=0}^{∞} e^(ux) e−λ λ^x / x! = e−λ Σ_{x=0}^{∞} (e^u λ)^x / x!

       = e−λ e^(λ e^u)    (by the Taylor series expansion of the exponential)

       = e^(λ (e^u − 1)),    λ > 0.

We have

E(X) = mX′ (0) = [ d/du e^(λ (e^u − 1)) ]_{u=0} = [ (λ e^u) e^(λ (e^u − 1)) ]_{u=0} = λ.

Similarly, we see that E(X²) = λ + λ². Hence,

V ar(X) = E(X²) − (E(X))² = λ + λ² − λ² = λ.


76 / 100
Example
Suppose the random variable X has a probability mass
function

P (X = x) = (n choose x) p^x (1 − p)^(n−x),    x = 0, 1, 2, . . . , n;  0 < p < 1.

The Binomial Theorem states that

(a + b)^n = Σ_{x=0}^{n} (n choose x) a^x b^(n−x).     (4)

Use the Binomial Theorem to

1. show that P (X = x) is a probability mass function;
2. find the moment-generating function of X;
3. find E(X) and V ar(X) using the moment-generating
   function.

77 / 100
Example

Solution
1. To show that P (X = x) is a probability mass function, we need
to show that Σ_{x=0}^{n} P (X = x) = 1. That is,

Σ_{x=0}^{n} P (X = x) = Σ_{x=0}^{n} (n choose x) p^x (1 − p)^(n−x)
                      = (p + (1 − p))^n    (by the binomial theorem
                                            with a = p and b = 1 − p)
                      = 1.

Indeed, P (X = x) is a probability mass function.

78 / 100
Example

Solution:
➋ The moment generating function is given by

mX (u) = Σ_{x=0}^{n} e^(ux) (n choose x) p^x (1 − p)^(n−x)

       = Σ_{x=0}^{n} (n choose x) (p e^u)^x (1 − p)^(n−x)

       = (p e^u + 1 − p)^n    (by the binomial theorem
                               with a = p e^u and b = 1 − p).

79 / 100
Example

Solution
➌ If we differentiate the moment generating function with respect
to u, we get

d mX (u) / du = n (p e^u + 1 − p)^(n−1) p e^u
              = n p e^u (p e^u + 1 − p)^(n−1).

Evaluating this at u = 0 gives

E(X) = n p (p + 1 − p)^(n−1) = n p.

80 / 100
Example

Solution - continued:
➌ To find the second moment, we use the product rule

d(yz)/du = y dz/du + z dy/du

to get

d² mX (u) / du² = n p e^u { (n − 1) (p e^u + 1 − p)^(n−2) p e^u }
                  + (p e^u + 1 − p)^(n−1) { n p e^u }
                  (using y = n p e^u and z = (p e^u + 1 − p)^(n−1))

                = n p e^u (p e^u + 1 − p)^(n−2) { n p e^u + 1 − p }.

Evaluating this at u = 0 gives

E(X²) = n p (p + 1 − p)^(n−2) { n p + 1 − p }
      = n p { n p + 1 − p }.
81 / 100
Example

Solution - continued:
➌ From this, we see that

V ar(X) = E(X²) − (E(X))²
        = n p { n p + 1 − p } − (n p)²
        = n p (1 − p).

82 / 100
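As a numerical sanity check on E(X) = np and Var(X) = np(1 − p) (the values n = 10 and p = 0.3 below are assumptions for illustration), the moments can be computed directly from the binomial pmf:

```python
from math import comb

n, p = 10, 0.3
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

EX  = sum(x * q for x, q in pmf.items())
EX2 = sum(x * x * q for x, q in pmf.items())

print(round(EX, 6), n * p)                        # both 3.0
print(round(EX2 - EX ** 2, 6), n * p * (1 - p))   # both 2.1
```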
Example
Suppose the random variable X has a probability density
function
fX (x) = e−x , x > 0.
Find the moment-generating function of X.

83 / 100
Example

Solution:
Using the definition of the moment-generating function, we
have

mX (u) = E[ e^(uX) ]
       = ∫_0^∞ e^(ux) e−x dx = ∫_0^∞ e^(−x(1−u)) dx
       = [ −e^(−x(1−u)) / (1 − u) ]_0^∞
       = 1 / (1 − u),    u < 1.
83 / 100
2.10 Properties of Moment Generating
Functions

84 / 100
The following results on uniqueness and convergence for moment
generating functions will be particularly important later.

Result
Let X and Y be two random variables whose moments
exist. If
mX (u) = mY (u)
for all u in a neighbourhood of zero (i.e. for all |u| < ϵ for
some ϵ > 0), then

FX (x) = FY (x) for all x ∈ R.

This result tells us that the moment generating function of
a random variable is unique.
The proof of this result can be found in Casella, G. and Berger, R.L.
(1990) Statistical Inference, Duxbury.
85 / 100
Result
Let { Xn : n = 1, 2, 3, . . . } be a sequence of random
variables, each with moment generating function mXn (u).
Furthermore, suppose that

lim_{n→∞} mXn (u) = mX (u)   for all u in a neighbourhood of zero

and mX (u) is the moment generating function of a random
variable X. Then

lim_{n→∞} FXn (x) = FX (x)   for all x ∈ R.

The convergence of moment-generating functions implies
the convergence of cumulative distribution functions.

See the proof of this result in Casella, G. and Berger, R.L. (1990).
Statistical Inference, Duxbury.

86 / 100
The proofs of the last two results rely on the theory of
Laplace transforms.

However, this is not covered in Casella and Berger (1990).

The reader should consult

Widder, D.V. (1946) The Laplace Transform. Princeton,


New Jersey: Princeton University Press.

87 / 100
2.11 Chebychev’s Inequality

88 / 100
Chebychev’s inequality is a fundamental result concerning tail
probabilities of general random variables.

It is useful for the derivation of convergence results discussed
later.

Chebychev’s Inequality
If X is any random variable with E(X) = µ and
V ar(X) = σ², then

P ( |X − µ| > k σ ) ≤ 1/k².

The probability statement in Chebychev’s Inequality is often
stated verbally as
the probability that X is more than k standard
deviations from its mean is bounded by 1/k².

89 / 100
Note that Chebychev’s Inequality makes no assumptions
about the distribution of X.

This is a particularly handy result.

In practice, we usually do not know the distribution of X.

By using Chebychev’s Inequality, we can make specific
probabilistic statements about a random variable based
only on its mean and standard deviation.

90 / 100
Proof.
We will provide a proof for the continuous random variable
case only.

σ² = V ar(X) = ∫_{−∞}^{∞} (x − µ)² fX (x) dx

   ≥ ∫_{|x−µ| > kσ} (x − µ)² fX (x) dx

   ≥ ∫_{|x−µ| > kσ} (k σ)² fX (x) dx

since |x − µ| > k σ  =⇒  (x − µ)² fX (x) > (k σ)² fX (x).

91 / 100
Proof.
Therefore, we have

σ² ≥ k² σ² ∫_{|x−µ| > kσ} fX (x) dx

   = k² σ² P ( |X − µ| > k σ ).

By re-arranging, we get

P ( |X − µ| > k σ ) ≤ 1/k²,

the desired result.

92 / 100
Example
The number of items produced by a factory in one day has
mean 500 and variance 100.

What is the lower bound for the probability that
between 400 and 600 items will be produced
tomorrow?

93 / 100
Example

Solution:
Let X be the number of items produced tomorrow. We are given
E(X) = µ = 500 and σ² = V ar(X) = 100, so σ = 10.

Note that 400 = 500 − 100 = µ − 10σ, and 600 = 500 + 100 = µ + 10σ.
Thus,

P (400 ≤ X ≤ 600) = P (400 − 500 ≤ X − 500 ≤ 600 − 500)
                  = P (−100 ≤ X − 500 ≤ 100)
                  = P (|X − 500| ≤ 100)
                  = P (|X − µ| ≤ 10 σ)
                  = 1 − P (|X − µ| > 10 σ).

94 / 100
Example

Solution - continued:
Chebychev’s Inequality states that P (|X − µ| > 10 σ) ≤ 1/10².
Therefore,

P (400 ≤ X ≤ 600) = 1 − P (|X − µ| > 10 σ)
                  ≥ 1 − 1/10² = 0.99.

95 / 100
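To see how conservative Chebychev's bound can be, the following simulation sketch (illustrative only; the exponential distribution and sample size are assumptions, not from the slides) compares the empirical tail probability P(|X − µ| > kσ) with 1/k² for an exponential(1) random variable, whose mean and standard deviation are both 1.

```python
import random

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(200_000)]   # exponential(1) sample
mu, sigma = 1.0, 1.0                                     # exact mean and sd of exponential(1)

for k in (1.5, 2, 3):
    tail = sum(abs(x - mu) > k * sigma for x in xs) / len(xs)
    print(k, round(tail, 4), round(1 / k ** 2, 4))       # empirical tail vs Chebychev bound
```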
Supplementary Material

96 / 100
Supplementary Material - Integration by Parts

Integration by parts formula

∫ u dv = uv − ∫ v du.

For our example, we choose u = x and dv = e−x dx.
So du = dx and v = −e−x.

97 / 100
Supplementary Material - Integration by Parts

Integration by parts formula

∫ u dv = uv − ∫ v du.

For our example, we choose u = x² and dv = e−x dx.
So du = 2x dx and v = −e−x.

98 / 100
Supplementary Material - Exponential Series

Recall the exponential series is given by

e^x = 1 + x/1! + x²/2! + x³/3! + · · ·

99 / 100
Supplementary Material - Binomial Theorem

The Binomial Theorem states that

(a + b)^n = Σ_{x=0}^{n} (n choose x) a^x b^(n−x).     (5)

100 / 100
