
10510134: Probability Theory and Mathematical Statistics Fall 2023

Recitation 3

3.1 Discrete Random Variables and Distributions

3.1.1 Review

Definition 3.1 (Discrete Random Variable). A discrete random variable is a random variable
whose possible values constitute either a finite set (e.g., x_1, x_2, \ldots, x_n) or a countably infinite set (e.g., x_1, x_2, \ldots, x_n, \ldots).

Definition 3.2 (Probability mass function). The probability mass function (PMF) of a discrete
r.v. X is the function pX given by

pX (x) = P (X = x) := P ({s ∈ S : X(s) = x}) , for every x ∈ R.

The set of all values x such that pX (x) > 0 is called the support of X, which we denote as Supp(X).

Definition 3.3 (Cumulative distribution function). The cumulative distribution function (CDF)
FX (x) of a discrete r.v. X is defined for every real number x ∈ R by

FX (x) = P (X ≤ x) := P ({s ∈ S : X(s) ≤ x}) .

In other words, for any number x ∈ R, FX (x) is the probability that the observed value of X will be
at most x.

3.1.2 Exercises

Exercise 1. Let X be a r.v. whose possible values are 0, 1, 2, . . ., with CDF F . In some countries,
rather than using a CDF, the convention is to use the function G defined by G(x) = P (X < x) to
specify a distribution. Find a way to convert from F to G, i.e., if F is a known function, show how to
obtain G(x) for all real x.

Answer:

According to the definition of CDF F (x) = P (X ≤ x), we have

G(x) = P (X < x) = P (X ≤ x) − P (X = x) = F (x) − P (X = x).


If x is not a non-negative integer, then P(X = x) = 0. For a non-negative integer x, P(X = x) = F(x) − F(x − 1). Thus

G(x) = \begin{cases} F(x-1), & x \in \{0, 1, 2, \ldots\} \\ F(x), & x \notin \{0, 1, 2, \ldots\} \end{cases}
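As a quick sanity check of this conversion, here is a numerical sketch assuming SciPy is available; the Poisson(2) distribution is just an illustrative choice.

from scipy.stats import poisson

lam = 2.0
for x in [0, 1, 3, 2.5, -1]:
    F = poisson.cdf(x, lam)
    # G(x) = P(X < x): subtract the point mass only at non-negative integers
    G = F - (poisson.pmf(x, lam) if float(x).is_integer() and x >= 0 else 0.0)
    # direct check: for a discrete r.v., P(X < x) = P(X <= x - eps)
    assert abs(G - poisson.cdf(x - 1e-9, lam)) < 1e-9
    print(f"x={x}: F(x)={F:.4f}, G(x)={G:.4f}")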

Exercise 2. An investment firm offers its customers municipal bonds that mature after varying numbers of years. Given that the cumulative distribution function of T, the number of years to maturity for a randomly selected bond, is

F(t) = \begin{cases} 0, & t < 1 \\ \frac{1}{4}, & 1 \le t < 3 \\ \frac{1}{2}, & 3 \le t < 5 \\ \frac{3}{4}, & 5 \le t < 7 \\ 1, & t \ge 7 \end{cases}
Find (1) P (T = 5); (2) P (T > 3); and (3) P (1.4 < T < 6). Give your reasoning.

Answer:

1. P(T = 5) = F(5) − \lim_{x \to 5^-} F(x) = 3/4 − 1/2 = 1/4;

2. P(T > 3) = 1 − P(T ≤ 3) = 1 − F(3) = 1/2;

3. P(1.4 < T < 6) = P(T < 6) − P(T ≤ 1.4) = \lim_{x \to 6^-} F(x) − F(1.4) = 3/4 − 1/4 = 1/2.
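These three values can be reproduced by encoding the step CDF directly; a minimal sketch (the function name F is ours):

def F(t):
    if t < 1: return 0.0
    if t < 3: return 0.25
    if t < 5: return 0.5
    if t < 7: return 0.75
    return 1.0

eps = 1e-9
print(F(5) - F(5 - eps))      # P(T = 5): jump at t = 5 -> 0.25
print(1 - F(3))               # P(T > 3) -> 0.5
print(F(6 - eps) - F(1.4))    # P(1.4 < T < 6) -> 0.5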

3.2 Famous Discrete Distributions

3.2.1 Review

Definitions of famous discrete distributions

• An r.v. X is said to have the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p, where 0 < p < 1. We write this as X ∼ Bern(p).

• Suppose that n independent Bernoulli trials are performed, each with the same success probability
p. Let X be the total number of successes. The distribution of X is called the Binomial
distribution with parameters n and p. We write X ∼ Bin(n, p) to mean that X has the Binomial
distribution with parameters n and p, where n is a positive integer and 0 < p < 1.

If X ∼ Bin(n, p), then the PMF of X is

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

for k = 0, 1, \ldots, n (and P(X = k) = 0 otherwise).

• Consider an urn with w white balls and b black balls. We draw n balls out of the urn at
random without replacement. Let X be the number of white balls in the sample. Then X is
said to have the Hypergeometric distribution with parameters w, b, and n; we denote this by
X ∼ HGeom(w, b, n).
If X ∼ HGeom(w, b, n), then the PMF of X is

P(X = k) = \frac{\binom{w}{k} \binom{b}{n-k}}{\binom{w+b}{n}},

for integers k satisfying 0 ≤ k ≤ w and 0 ≤ n − k ≤ b, and P(X = k) = 0 otherwise.

• An r.v. X has the Poisson distribution with parameter λ, where λ > 0, if the PMF of X is

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots

We write this as X ∼ Pois(λ).

• Consider a sequence of independent Bernoulli trials, each with the same success probability p ∈
(0, 1), with trials performed until a success occurs. Let X be the number of failures before the
first successful trial. Then X has the Geometric distribution with parameter p; we denote this
by X ∼ Geom(p).
If X ∼ Geom(p), then the PMF of X is

P(X = k) = q^k p, \quad k = 0, 1, 2, \ldots, where q = 1 − p.

The Geometric distribution satisfies the memoryless property: if X ∼ Geom(p), then for any non-negative integers m and n,

P(X ≥ m + n | X ≥ m) = P(X ≥ n).

(A numerical check of this property follows these definitions.)

• In a sequence of independent Bernoulli trials with success probability p, if X is the number of failures before the r-th success, then X is said to have the Negative Binomial distribution with parameters r and p, denoted X ∼ NBin(r, p).
If X ∼ NBin(r, p), then the PMF of X is

P(X = n) = \binom{n+r-1}{r-1} p^r q^n

for n = 0, 1, 2, \ldots, where q = 1 − p.
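A brief numerical check of the memoryless property, assuming SciPy; note that scipy.stats.geom counts trials rather than failures, so it is shifted by one relative to our Geom(p).

from scipy.stats import geom

p, m, n = 0.3, 4, 6
sf = lambda k: (1 - p) ** k        # P(X >= k) = q^k for X = failures before first success
lhs = sf(m + n) / sf(m)            # P(X >= m+n | X >= m)
assert abs(lhs - sf(n)) < 1e-12    # memoryless: equals P(X >= n)
# cross-check against scipy's trial-counting geometric: P(trials > n) = P(X >= n)
assert abs(sf(n) - geom.sf(n, p)) < 1e-12
print(lhs, sf(n))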

Connections between distributions

• If X ∼ Bin(n, p), viewed as the number of successes in n independent Bernoulli trials with success probability p, then we can write X = X_1 + · · · + X_n, where the X_i are independent random variables distributed as Bern(p).

• If X ∼ HGeom(w, b, n), then we can write X = X_1 + · · · + X_n where X_1, \ldots, X_n are random variables distributed as Bern(w/(w + b)), but X_1, \ldots, X_n are not independent.

• (Poisson Approximation) Consider a random variable X_n with a binomial distribution Bin(n, p). If n → ∞, p → 0 and np → λ, then for k = 0, 1, 2, \ldots,

P(X_n = k) \to \frac{e^{-\lambda} \lambda^k}{k!}.

(A numerical illustration of this limit follows this list.)

• Let X ∼ NBin(r, p), viewed as the number of failures before the r-th success in a sequence of independent Bernoulli trials with success probability p. Then we can write X = X_1 + · · · + X_r, where the X_i are independent r.v.s distributed as Geom(p).
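A numerical illustration of the Poisson approximation, as a sketch assuming SciPy (λ = 2 is an arbitrary choice): the Bin(n, λ/n) PMF values approach the Pois(λ) PMF as n grows.

from scipy.stats import binom, poisson

lam = 2.0
for n in [10, 100, 10_000]:
    p = lam / n                     # so that np = lam exactly
    err = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(20))
    print(f"n={n:>6}: max |Bin(n,p) - Pois(lam)| PMF gap = {err:.2e}")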

3.2.2 Exercises

Exercise 3. Twenty percent of all telephones of a certain type are submitted for service while under
warranty. Of these, 60% can be repaired, whereas the other 40% must be replaced with new units. If a
company purchases ten of these telephones, what is the probability that exactly two will end up being
replaced under warranty?

Answer: Let “success” correspond to a telephone that is submitted for service while under warranty and must be replaced. Then p = P(success) = P(replaced | submitted) · P(submitted) = (.40)(.20) = .08. Thus X, the number among the company’s 10 phones that must be replaced, has a binomial distribution with n = 10 and p = .08. Therefore,
P(X = 2) = \binom{10}{2} (.08)^2 (.92)^8 = .1478
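The same number can be checked with SciPy in one line (a quick verification of the arithmetic):

from scipy.stats import binom
print(binom.pmf(2, 10, 0.08))   # ~0.1478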

Exercise 4. Let X ∼ Bin(n, p) and Y ∼ Bin(m, p), independent of X. (Independence of r.v.s will be rigorously defined later in the course; for now you can interpret it as independence of the underlying Bernoulli trials, namely, the outcome of any trial provides no information about the outcomes of other trials.)

1. Show that n − X ∼ Bin(n, q) with q = 1 − p.



2. Show that X + Y ∼ Bin(n + m, p).


3. Show that X − Y is not Binomial.
4. Find P (X = k | X + Y = j). How does this relate to capture-recapture example (Example 2.5)
in Lecture notes?

Answer:

1. Note that for any integer k between 0 and n,

P(n - X = k) = P(X = n - k) = \binom{n}{n-k} p^{n-k} (1-p)^k = \binom{n}{k} q^k (1-q)^{n-k}.

The last display is the PMF of Bin(n, q).


Intuition: n − X is the total number of failures in the n independent Bernoulli trials, where the
probability of failure in each trial is equal to q = 1 − p. We can relabel the original failure as
“success” and the original success as “failure”, then n − X is the total number of “successes” in
n independent Bernoulli trials with “success” probability q. Thus n − X ∼ Bin(n, q).
2. Denote Z = X + Y. Note that the event \{Z = z\} = \bigcup_{x=0}^{z} \{X = x, Y = z - x\}, where the events \{X = x, Y = z - x\} for different x are mutually exclusive. Thus the PMF of the random variable Z is

P(Z = z) = \sum_{x=0}^{z} P(X = x, Y = z - x)   (Axiom 3)
= \sum_{x=0}^{z} P(X = x) P(Y = z - x)   (independence of \{X = x\} and \{Y = z - x\})
= \sum_{x=0}^{z} \binom{n}{x} \binom{m}{z-x} p^{x+(z-x)} q^{(n-x)+(m-(z-x))} = p^z q^{n+m-z} \sum_{x=0}^{z} \binom{n}{x} \binom{m}{z-x} = \binom{n+m}{z} p^z q^{n+m-z},
where the last equality holds using Vandermonde’s identity in Recitation 1. Thus, we conclude
that X + Y ∼ Bin(n + m, p).
Intuition: X is the total number of successes in n independent Bernoulli trials and Y is the total
number of successes in m independent Bernoulli trials, where the success probability is equal
to p in all trials. Because of the independence between X, Y , the Bernoulli trials underlying X
and Y are also independent. Thus X + Y is the total number of successes among the m + n
independent Bernoulli trials with success probability p. According to the story of the Binomial,
we have X + Y ∼ Bin(m + n, p).
3. A Binomial can’t be negative, but X − Y is negative with positive probability.

4. By definition of conditional probability,


P(X = k | X + Y = j) = \frac{P(X = k, X + Y = j)}{P(X + Y = j)}
= \frac{P(X = k, Y = j - k)}{P(X + Y = j)}
= \frac{P(X = k) P(Y = j - k)}{P(X + Y = j)}
= \frac{\binom{n}{k} p^k q^{n-k} \cdot \binom{m}{j-k} p^{j-k} q^{m-(j-k)}}{\binom{n+m}{j} p^j q^{n+m-j}}
= \frac{\binom{n}{k} \binom{m}{j-k}}{\binom{n+m}{j}}.
Note that the p disappeared! This is exactly the same distribution as in the capture-recapture problem (it is called the Hypergeometric distribution). To see why, imagine that there are n
tagged animals and m untagged animals, and a random sample of j of these animals is recap-
tured, each of which is recaptured with probability p (independently). Suppose we then want
to know how many of the tagged animals are recaptured, given that a total of j animals have
been recaptured. For this, p is no longer relevant, and this problem is equivalent to the original
capture-recapture problem.
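Parts 1, 2 and 4 above can all be verified numerically; a sketch assuming SciPy, where the parameter values n = 5, m = 7, p = 0.3, j = 6 are arbitrary illustrations.

from scipy.stats import binom, hypergeom

n, m, p, q, j = 5, 7, 0.3, 0.7, 6
# part 1: P(n - X = k) matches the Bin(n, q) PMF
for k in range(n + 1):
    assert abs(binom.pmf(n - k, n, p) - binom.pmf(k, n, q)) < 1e-12
# part 2: the convolution of Bin(n, p) and Bin(m, p) matches Bin(n + m, p)
for z in range(n + m + 1):
    conv = sum(binom.pmf(x, n, p) * binom.pmf(z - x, m, p) for x in range(z + 1))
    assert abs(conv - binom.pmf(z, n + m, p)) < 1e-12
# part 4: P(X = k | X + Y = j) matches HGeom; scipy's hypergeom(M, K, N)
# draws N items from a population of M containing K "tagged" items
for k in range(j + 1):
    cond = binom.pmf(k, n, p) * binom.pmf(j - k, m, p) / binom.pmf(j, n + m, p)
    assert abs(cond - hypergeom.pmf(k, n + m, n, j)) < 1e-9
print("all three identities hold numerically")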

Exercise 5. Let the number of chocolate chips in a certain type of cookie have a Poisson distribution. We want the probability that a randomly chosen cookie has at least two chocolate chips to be greater than .99. Find the smallest value of the parameter of the Poisson distribution that ensures this probability.

Answer: Let X be the number of chocolate chips in one cookie. Then X ∼ Pois(λ) with PMF

P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots
We want the probability that a randomly chosen cookie has at least two chocolate chips to be greater
than 0.99, i.e., P(X ≥ 2) > 0.99. Equivalently, we require

P(X < 2) = P(X = 0) + P(X = 1) = e^{-\lambda}(1 + \lambda) < 0.01,

which yields λ ≥ 6.64 (an approximate numerical solution).
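The threshold can be pinned down numerically; a sketch assuming SciPy's root finder (the bracket [1, 20] is an arbitrary safe choice):

import math
from scipy.optimize import brentq

# solve e^{-lam} (1 + lam) = 0.01 for lam
lam = brentq(lambda L: math.exp(-L) * (1 + L) - 0.01, 1, 20)
print(lam)   # ~6.64, the smallest parameter value that works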

Exercise 6. An electronics store has received a shipment of 20 table radios that have connections for
an iPod or iPhone. Twelve of these have two slots (so they can accommodate both devices), and the
other eight have a single slot. Suppose that 6 of the 20 radios are randomly selected to be stored under
a shelf where radios are displayed, and the remaining ones are placed in a storeroom. Let X = the
number among the radios stored under the display shelf that have two slots.

1. What kind of a distribution does X have (name and values of all parameters)?
2. Compute P (X = 2) and P (X ≤ 2).

Answer:

1. It is an HGeom(12, 8, 6) distribution, i.e., Hypergeometric with w = 12 two-slot radios, b = 8 single-slot radios, and n = 6 radios selected.
2. We have

P(X = 2) = \frac{\binom{12}{2} \binom{8}{4}}{\binom{20}{6}},

P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{\binom{12}{0} \binom{8}{6}}{\binom{20}{6}} + \frac{\binom{12}{1} \binom{8}{5}}{\binom{20}{6}} + \frac{\binom{12}{2} \binom{8}{4}}{\binom{20}{6}}.
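SciPy evaluates both expressions directly; note that scipy.stats.hypergeom takes the population size M = 20, the number of two-slot radios 12, and the sample size 6:

from scipy.stats import hypergeom

rv = hypergeom(20, 12, 6)
print(rv.pmf(2))   # P(X = 2)  ~0.119
print(rv.cdf(2))   # P(X <= 2) ~0.137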

Exercise 7. A couple decides to keep having children until they have at least one boy and at least one
girl, and then stop. Assume they never have twins, that the “trials” are independent with probability
1/2 of a boy, and that they are fertile enough to keep producing children indefinitely. What is the
distribution of the number of children they have?

Answer: Let X be the number of children they finally have. By the law of total probability,

P(X = k) = P(X = k | first child is a girl) P(first child is a girl) + P(X = k | first child is a boy) P(first child is a boy)
= \frac{1}{2} P(X = k | first child is a girl) + \frac{1}{2} P(X = k | first child is a boy).
If the couple's first child is a boy (girl), then they keep having children until they have a girl (boy), which corresponds to the story of the Geometric distribution. With Y ∼ Geom(1/2) counting the subsequent same-sex children (failures) before the first child of the opposite sex, we have X = Y + 2 in either case, so

P(X = k | first child is a girl) = P(X = k | first child is a boy) = P(Y = k - 2) = \frac{1}{2^{k-1}}.
Therefore,

P(X = k) = \frac{1}{2^{k-1}}, \quad k = 2, 3, 4, \ldots
This shows that X − 2 ∼ Geom(1/2).
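A Monte Carlo sketch of the process (standard library only; 100,000 trials is an arbitrary simulation size) confirms P(X = k) ≈ 1/2^{k−1}:

import random

def family_size():
    sexes, count = set(), 0
    while len(sexes) < 2:              # stop once both a boy and a girl appear
        sexes.add(random.random() < 0.5)
        count += 1
    return count

trials = 100_000
counts = [family_size() for _ in range(trials)]
for k in [2, 3, 4, 5]:
    emp = sum(c == k for c in counts) / trials
    print(k, emp, 1 / 2 ** (k - 1))    # empirical frequency vs 1/2^{k-1}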

Exercise 8. Players A and B take turns in answering trivia questions, starting with player A answering
the first question. Each time A answers a question, she has probability p1 of getting it right. Each
time B plays, he has probability p2 of getting it right. Assume independence in whether they get the
questions right.

1. If A answers m questions, what is the PMF of the number of questions she gets right?

2. If A answers m times and B answers n times, what is the PMF of the total number of questions
they get right (you can leave your answer as a sum)? Describe exactly when/whether this is a
Binomial distribution.
3. Suppose that the first player to answer correctly wins the game (with no predetermined maximum
number of questions that can be asked). Find the probability that A wins the game.

Answer:

1. The r.v. is Bin(m, p_1), so the PMF is \binom{m}{k} p_1^k (1 - p_1)^{m-k} for k ∈ \{0, 1, \ldots, m\}.
2. Let T be the total number of questions they get right, X the number of questions A gets right,
and Y the number of questions B gets right. We have X ∼ Binomial(m, p1 ), Y ∼ Binomial(n, p2 )
and they are independent. To get a total of k questions right, it must be that A got 0 and B got
k, or A got 1 and B got k − 1, etc. Following the analysis for question 2 in Exercise 4, the PMF
of T is
P(T = k) = \sum_{j=0}^{k} P(X = j, Y = k - j) = \sum_{j=0}^{k} P(X = j) P(Y = k - j)
= \sum_{j=0}^{k} \binom{m}{j} p_1^j (1 - p_1)^{m-j} \binom{n}{k-j} p_2^{k-j} (1 - p_2)^{n-(k-j)}

for k ∈ \{0, 1, \ldots, m + n\}, with the usual convention that \binom{n}{k} is 0 for k > n.
This is the Bin(m + n, p) distribution if p_1 = p_2 = p, using the story for the Binomial, or using the analysis for question 2 in Exercise 4. For p_1 ≠ p_2, it is not a Binomial distribution, since the trials have different probabilities of success.
3. Let r = P(A wins). Conditioning on the result of the first question for each player and applying the law of total probability, we have

r = P(A wins | A gets the first Q right) · P(A gets the first Q right)
  + P(A wins | A gets the first Q wrong, B gets the second right) · P(A gets the first Q wrong, B gets the second right)
  + P(A wins | A gets the first Q wrong, B gets the second wrong) · P(A gets the first Q wrong, B gets the second wrong),

where the three conditional probabilities are 1, 0, and r, respectively, and the corresponding event probabilities are p_1, (1 − p_1)p_2, and (1 − p_1)(1 − p_2).

Therefore,

r = p_1 + (1 - p_1)(1 - p_2) r,

which gives r = \frac{p_1}{p_1 + p_2 - p_1 p_2}.

Comment 1: Here we condition on the first step of a process to simplify the problem. This shows the power of conditioning as a problem-solving tool.

Comment 2: Here is another way to apply first-step analysis, inspired by a student who attended this course in Fall 2022.
We denote the probability that the first player wins by r(p_1, p_2) (where p_1 and p_2 are the probabilities of the first and second player answering correctly). Then we obtain the following equation by conditioning on the outcome of the first player's answer:

r(p_1, p_2) = P(A wins as the first player)
= P(A gets the first Q right) + P(A gets the first Q wrong but A wins later)
= p_1 + P(A wins later | A gets the first Q wrong) P(A gets the first Q wrong)
= p_1 + P(B does not win starting from the second Q | A gets the first Q wrong) × (1 − p_1)
= p_1 + P(B does not win as the first player) × (1 − p_1)
= p_1 + (1 − r(p_2, p_1))(1 − p_1).

Here the fifth equality holds because when A gets the first question wrong, the game passes to B and we can view B as the first player. This is why P(B does not win starting from the second Q | A gets the first Q wrong) = P(B does not win as the first player) = 1 − r(p_2, p_1).
Symmetrically, we have

r(p_2, p_1) = p_2 + (1 − p_2)(1 − r(p_1, p_2)),

and we can solve for r(p_1, p_2) from these two equations, which again gives p_1/(p_1 + p_2 − p_1 p_2). Moreover, as a special case, when p = p_1 = p_2, we only need one equation:

r = p + (1 − p)(1 − r)

which gives r = 1/(2 − p).
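Both formulas are easy to confirm by simulation; a sketch using only the standard library (p_1 = 0.3, p_2 = 0.5 are arbitrary test values):

import random

def a_wins(p1, p2):
    while True:
        if random.random() < p1: return True    # A answers correctly: A wins
        if random.random() < p2: return False   # B answers correctly: A loses

p1, p2, trials = 0.3, 0.5, 200_000
emp = sum(a_wins(p1, p2) for _ in range(trials)) / trials
print(emp, p1 / (p1 + p2 - p1 * p2))   # simulated vs closed form (~0.4615 here)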

3.3 Continuous Random Variables and Probability Density

3.3.1 Review

Definition 3.4 (Continuous random variable). A random variable has a continuous distribution
if its cumulative distribution function is continuous everywhere and differentiable almost everywhere
(i.e., differentiable everywhere except at countably many points). A continuous random variable
is a r.v. with a continuous distribution.

Definition 3.5 (Probability density function). For a continuous r.v. X with CDF F_X, the probability density function (PDF) of X is the derivative f_X of the CDF, given by f_X(x) = F_X'(x). The support of X, and of its distribution, is the set

Supp(X) = \{x ∈ R : f_X(x) > 0\}.

Theorem 3.6. Let X be a continuous r.v. with PDF f_X. For any region A ⊆ R,

P(X ∈ A) = \int_A f_X(x) \, dx.

In particular, for any x ∈ R,

F_X(x) = P(X ≤ x) = \int_{-\infty}^{x} f_X(t) \, dt.
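A quick numeric illustration of Theorem 3.6, as a sketch assuming SciPy; the Exponential(1) density is our example, with closed-form CDF 1 − e^{−x}:

import math
from scipy.integrate import quad

f = lambda t: math.exp(-t)             # Exponential(1) PDF on [0, inf); zero below 0
for x in [0.5, 1.0, 3.0]:
    F_num, _ = quad(f, 0, x)           # integrate the PDF over the support up to x
    print(x, F_num, 1 - math.exp(-x))  # numeric integral vs closed-form CDF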

3.3.2 Exercises

Exercise 9. A family of pdfs that has been used to approximate the distribution of income, city
population size, and size of firms is the Pareto family. The family has two parameters, k and θ, both
> 0, and the pdf is
f(x; k, θ) = \frac{k \theta^k}{x^{k+1}}, \quad x ≥ θ.
1. Verify that the above is a legitimate PDF.
2. If the r.v. X has pdf f(x; k, θ), for any fixed b > θ, obtain an expression for P(X ≤ b).
3. For θ < a < b, obtain an expression for the probability P (a ≤ X ≤ b).

Answer:

1. It’s clear that the PDF f(x; k, θ) = kθ^k/x^{k+1} ≥ 0 for all x ≥ θ, and

\int_{\theta}^{\infty} f(x; k, \theta) \, dx = \int_{\theta}^{\infty} \frac{k\theta^k}{x^{k+1}} \, dx = \left[ -\frac{\theta^k}{x^k} \right]_{\theta}^{\infty} = 1.

2. For any fixed b > θ,

P(X \le b) = \int_{\theta}^{b} \frac{k\theta^k}{x^{k+1}} \, dx = \left[ -\frac{\theta^k}{x^k} \right]_{\theta}^{b} = 1 - \frac{\theta^k}{b^k}.

3. For θ < a < b,

P(a \le X \le b) = \int_{a}^{b} \frac{k\theta^k}{x^{k+1}} \, dx = \left[ -\frac{\theta^k}{x^k} \right]_{a}^{b} = \frac{\theta^k}{a^k} - \frac{\theta^k}{b^k}.
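These expressions agree with SciPy's Pareto distribution, which uses the same shape k and scale θ parameterization (the test values below are arbitrary):

from scipy.stats import pareto

k, theta, a, b = 3.0, 2.0, 3.0, 5.0
print(pareto.cdf(b, k, scale=theta), 1 - (theta / b) ** k)   # P(X <= b)
print(pareto.cdf(b, k, scale=theta) - pareto.cdf(a, k, scale=theta),
      (theta / a) ** k - (theta / b) ** k)                   # P(a <= X <= b)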

Exercise 10. A grocery store sells X hundred kilograms of rice every day, where the distribution of X is of the following form:

F(x) = \begin{cases} 0, & x < 0 \\ kx^2, & 0 \le x < 3 \\ k(-x^2 + 12x - 3), & 3 \le x < 6 \\ 1, & x \ge 6 \end{cases}

Suppose this grocery store's total sales of rice never reach 600 kilograms on any given day.

1. Find the value of k;


2. Does this correspond to the distribution of a continuous random variable? A discrete random
variable?
3. What is the probability that the store sells between 200 and 400 kilograms of rice next Thursday?
4. What is the probability that the store sells over 300 kilograms of rice next Thursday?
5. We are given that the store sold at least 300 kilograms of rice last Friday. What is the probability
that it did not sell more than 400 kilograms on that day?

Answer:

1. Since this grocery store's total sales of rice never reach 600 kilograms on any given day, we have P(X = 6) = 0, so

1 = F(6) = \lim_{x \to 6^-} F(x) = k(-36 + 72 - 3) = 33k,

which yields k = 1/33.


2. It cannot correspond to the distribution of a continuous random variable because F(x) is not a continuous function: \lim_{x \to 3^-} kx^2 = 9k = 3/11, which is not equal to F(3) = k(-3^2 + 12 \cdot 3 - 3) = 24k = 8/11, so F is not continuous at x = 3.
It does not correspond to the distribution of a discrete random variable either, because X can take any value in (0, 6), which is not a countable set.
3. P (2 ≤ X ≤ 4) = F (4) − F (2) = 25/33.
4. P (X > 3) = 1 − F (3) = 3/11.
5. P(X ≤ 4 | X ≥ 3) = \frac{P(3 ≤ X ≤ 4)}{P(X ≥ 3)} = \frac{F(4) - \lim_{x \to 3^-} F(x)}{1 - \lim_{x \to 3^-} F(x)} = \frac{29/33 - 9/33}{1 - 9/33} = \frac{5}{6}.
