
Chapter 3

Discrete Random Variables and Probability Distributions

Note: This module is prepared from the text/reference books of the Probability and Statistics course
(MATH F113) to help the students. The study material is expected to be useful but not
exhaustive. For a detailed study, the students are advised to attend the lecture/tutorial classes
regularly, and consult the text/reference books.

Dr. Suresh Kumar, Associate Professor,


Department of Mathematics, BITS Pilani, Pilani Campus, Rajasthan-333031, INDIA.
E-mail: [email protected], [email protected]

Appeal: Please do not print this e-module unless it is really necessary.

Contents

3 Discrete Random Variables and Probability Distributions
3.1 Definitions
    3.1.1 Expectation
    3.1.2 Variance
    3.1.3 Standard Deviation
    3.1.4 Moments and moment generating function
3.2 Geometric Distribution
    3.2.1 Negative Binomial Distribution
3.3 Binomial Distribution
    3.3.1 Multinomial Distribution
3.4 Hypergeometric Distribution
    3.4.1 Binomial distribution as a limiting case of hypergeometric distribution
    3.4.2 Generalization of the hypergeometric distribution
3.5 Poisson Distribution
3.6 Uniform Distribution

3.1 Definitions
Discrete Random Variable
Suppose a random experiment results in a finite or countably infinite number of outcomes, with sample space
S. Then a variable X taking real values x corresponding to each outcome of the random experiment
(or each element of S) is called a discrete random variable. In other words, the discrete random
variable X is a function from the sample space S to the set of real numbers. So, in principle, the
discrete random variable X, being a function on S, can be defined in any convenient way.

Probability Mass Function (pmf )


A function f is said to be the probability mass function of a discrete random variable X if it
satisfies the following three conditions:
(i) f(x) ≥ 0 for each value x of X.
(ii) f(x) = P(X = x), that is, f(x) provides the probability for each value x of X.
(iii) ∑_x f(x) = 1, that is, the sum of the probabilities of all values x of X is 1.

Cumulative Distribution Function (cdf )


A function F defined by

F(x) = ∑_{X ≤ x} f(x)

is called the cumulative distribution function of X. Therefore, F(x) is the sum of the probabilities of
all the values of X from its lowest value up to the value x.

Example with finite sample space


Consider the random experiment of tossing of two fair coins. Then the sample space is

S = {HH, HT, T H, T T }.

Let X denote the number of heads. Then X is a discrete random variable, namely the function from
the sample space S onto the set {0, 1, 2}, that is,

X : S = {HH, HT, TH, TT} → {0, 1, 2},

since X(HH) = 2, X(HT) = 1, X(TH) = 1 and X(TT) = 0. In tabular form, it can be displayed
as

Outcome    HH   HT   TH   TT
X = x       2    1    1    0

So here the discrete random variable X assumes only three values, X = 0, 1, 2.

We find that P(X = 0) = 1/4, P(X = 1) = 1/2 and P(X = 2) = 1/4. It is easy to see that the
function f given by

X = x               0     1     2
f(x) = P(X = x)    1/4   1/2   1/4

is the pmf of X. It gives the probability distribution of X.


The cumulative distribution function F of X is given by

X = x               0     1     2
F(x) = P(X ≤ x)    1/4   3/4    1

Remark: Note that X is a function with domain the sample space S. So, in the above example,
X could also be defined as the number of tails, and accordingly we could write its pmf and cdf.
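The following short Python sketch (not part of the original notes) tabulates the pmf and cdf of this two-coin example directly from the sample space; exact fractions are used so the output matches the tables above.

from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))        # sample space {HH, HT, TH, TT}
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")                      # value of X (number of heads)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

cdf, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

for x in sorted(pmf):
    print(x, pmf[x], cdf[x])   # 0 1/4 1/4,  1 1/2 3/4,  2 1/4 1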

Example with countably infinite sample space


Suppose a fair coin is tossed again and again till head appears. Then the sample space is
S = {H, T H, T T H, T T T H, . . . }.
The outcome H corresponds to getting a head in the first toss. The outcome TH
corresponds to getting a tail in the first toss and a head in the second toss. Likewise,
TTH corresponds to getting tails in the first two tosses and a head in the third toss, and so on.
If X denotes the number of tosses in this experiment, then X is a function from the sample
space S to the set of natural numbers, and is given by

Outcome H TH TTH ...


X=x 1 2 3 ...

So here the discrete random variable X assumes countably infinite values X = 1, 2, 3, . . . .

The pmf of X is given by

X = x               1      2         3        ...
f(x) = P(X = x)    1/2   (1/2)^2   (1/2)^3    ...

It can also be written in the closed form

f(x) = (1/2)^x,  x = 1, 2, 3, ...

Notice that f(x) ≥ 0 for all x and

∑_x f(x) = ∑_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1   (∵ the sum of the infinite G.P. a + ar + ar² + ... is a/(1 − r)).

The cumulative distribution function F of X is given by

F(x) = ∑_{X ≤ x} f(x) = (1/2)(1 − (1/2)^x)/(1 − 1/2) = 1 − (1/2)^x,  where x = 1, 2, 3, ...

Note. Determining the cdf can be very useful. For instance, in the above example, suppose it
is required to calculate P(10 ≤ X ≤ 30). Here, one option is to sum all the probabilities from
P(X = 10) to P(X = 30). Instead, we use the cdf to obtain
P(10 ≤ X ≤ 30) = F(30) − F(9) = (1 − 1/2^30) − (1 − 1/2^9) = 1/2^9 − 1/2^30.
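As a quick check (a sketch, not part of the notes, assuming the fair-coin setup so p = 1/2), the cdf shortcut above can be compared with directly summing the pmf:

def f(x):                        # pmf: f(x) = (1/2)**x
    return 0.5 ** x

def F(x):                        # cdf: F(x) = 1 - (1/2)**x
    return 1 - 0.5 ** x

direct = sum(f(x) for x in range(10, 31))   # P(X = 10) + ... + P(X = 30)
via_cdf = F(30) - F(9)
print(direct, via_cdf)                      # both ≈ 0.0019531 = 1/2**9 - 1/2**30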


Some more illustrative examples


Ex. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective.
If a school makes a random purchase of 2 of these computers, find the probability distribution for
the number of defectives.

Sol. f (0) = 68/95, f (1) = 51/190 and f (2) = 3/190.

Ex. Find the probability distribution of the number of heads in a toss of four coins. Also, plot
the probability mass function and probability histogram.

Sol. The total number of points in the sample space is 16. The numbers of points in the sample space
with 0, 1, 2, 3 and 4 heads are C(4, 0), C(4, 1), C(4, 2), C(4, 3) and C(4, 4), respectively. So
f(0) = C(4, 0)/16 = 1/16, f(1) = C(4, 1)/16 = 1/4, f(2) = C(4, 2)/16 = 3/8, f(3) = C(4, 3)/16 = 1/4 and f(4) = C(4, 4)/16 = 1/16.
Thus, f(x) = C(4, x)/16, x = 0, 1, 2, 3, 4.
The probability mass function plot and probability histogram are shown in Figure 3.1.

Figure 3.1: Probability mass function plot and probability histogram

Ex. If a car agency sells 50% of its inventory of a certain foreign car equipped with side airbags,
find a formula for the probability distribution of the number of cars with side airbags among the
next 4 cars sold by the agency.
Sol. f(x) = C(4, x)/16, x = 0, 1, 2, 3, 4.

3.1.1 Expectation
Let X be a random variable with pmf f. Then the expectation of X, denoted by E(X), is defined
as

E(X) = ∑_x x f(x).

More generally, if H(X) is a function of the random variable X, then we define

E(H(X)) = ∑_x H(x) f(x).

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values
0, 1 and 2 with probabilities 1/4, 1/2 and 1/4, respectively. So E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1.

Note: (1) The expectation E(X) of the random variable X is the theoretical average or mean
value of X. In a statistical setting, the average value, mean value and expected value are
synonyms. The mean value is denoted by µ. So E(X) = µ.

(2) If X is a random variable, then it is easy to verify the following:


(i) E(c) = c
(ii) E(cX) = cE(X)
(iii) E(cX + d) = cE(X) + d
(iv) E(cH(X) + dG(X)) = cE(H(X)) + dE(G(X))
where c, d are constants, and H(X) and G(X) are functions of X. Thus, expectation respects the
linearity property.

(3) The expected or the mean value of the random variable X is a measure of the location of the
center of values of X.

Some illustrative examples


Ex. A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good
components and 3 defective components. A sample of 3 is taken by the inspector. Find the ex-
pected value of the number of good components in this sample.

Sol. Let X represent the number of good components in the sample. Then the probability distribution of X is

f(x) = C(4, x) C(3, 3 − x) / C(7, 3),  x = 0, 1, 2, 3.

Simple calculations yield f(0) = 1/35, f(1) = 12/35, f(2) = 18/35, and f(3) = 4/35. Therefore,

µ = E(X) = ∑_{x=0}^{3} x f(x) = 12/7 ≈ 1.7.

Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good
components and 3 defective components, it will contain, on average, 1.7 good components.

Remark (from high school mathematics): if we have n distinct values x_1, x_2, ..., x_n with frequencies
f_1, f_2, ..., f_n respectively and ∑_{i=1}^{n} f_i = N, then the mean value is

µ = ∑_{i=1}^{n} (f_i x_i)/N = ∑_{i=1}^{n} (f_i/N) x_i = ∑_{i=1}^{n} f(x_i) x_i,

where f(x_i) = f_i/N is the probability of occurrence of x_i in the given data set. Obviously, the final expression for µ
is the expectation of a random variable X assuming the values x_i with probabilities f(x_i).

Ex. A salesperson for a medical device company has two appointments on a given day. At the
first appointment, he believes that he has a 70% chance to make the deal, from which he can earn
$ 1000 commission if successful. On the other hand, he thinks he only has a 40% chance to make
the deal at the second appointment, from which, if successful, he can make $1500. What is his
expected commission based on his own probability belief? Assume that the appointment results
are independent of each other.

Sol. First, we know that the salesperson, for the two appointments, can have 4 possible commis-
sion totals: $0, $1000, $1500, and $2500. We then need to calculate their associated probabilities.
By independence, we obtain
f (0) = (1 − 0.7)(1 − 0.4) = 0.18,
f (2500) = (0.7)(0.4) = 0.28,
f (1000) = (0.7)(1 − 0.4) = 0.42,
f (1500) = (1 − 0.7)(0.4) = 0.12.

Therefore, the expected commission for the salesperson is


E(X) = (0)(0.18) + (1000)(0.42) + (1500)(0.12) + (2500)(0.28) = $1300.

Ex. Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and 5:00
P.M. on any sunny Friday has the following probability distribution:

X = x               4      5      6     7     8     9
f(x) = P(X = x)    1/12   1/12   1/4   1/4   1/6   1/6

Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the
manager. Find the attendant’s expected earnings for this particular time period.

Sol. We find

E(g(X)) = E(2X − 1) = ∑_{x=4}^{9} (2x − 1) f(x) = $12.67.
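A small numerical check of this example (a sketch, not from the notes), computing E(2X − 1) with the pmf tabulated above:

from fractions import Fraction

f = {4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 4),
     7: Fraction(1, 4), 8: Fraction(1, 6), 9: Fraction(1, 6)}

expected_earning = sum((2 * x - 1) * p for x, p in f.items())   # E(2X - 1)
print(expected_earning, float(expected_earning))                # 38/3 ≈ 12.67 dollars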

3.1.2 Variance
Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6. We observe
that both variables have the same mean value, µ_X = µ_Y = 5. However, we see that
the values of X are farther away from the mean (the central value 5) than the values of
Y. Thus, the mean value of a random variable does not account for its variability. In this regard,
we define a new parameter known as the variance. It is defined as follows.

If X is a random variable with mean µ, then its variance, denoted by V(X), is defined as the
expectation of (X − µ)². So, we have
V(X) = E((X − µ)²) = E(X²) + µ² − 2µE(X) = E(X²) + E(X)² − 2E(X)E(X) = E(X²) − E(X)².

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values
0, 1 and 2 with probabilities 1/4, 1/2 and 1/4, respectively. So
E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,
E(X²) = 0² × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2.
∴ V(X) = 3/2 − 1 = 1/2.

Note: (i) The variance V(X) of the random variable X is also denoted by σ². So V(X) = σ².
(ii) If X is a random variable and c is a constant, then it is easy to verify that V(c) = 0 and
V(cX) = c² V(X).

Some illustrative examples


Ex. Let the random variable X represent the number of automobiles that are used for official
business purposes on any given workday. The probability distribution for company A is

x 1 2 3
f (x) 0.3 0.4 0.3

and that for company B is

x 0 1 2 3 4
f (x) 0.2 0.1 0.3 0.3 0.1

Show that the variance of the probability distribution for company B is greater than that for
company A.

Sol. µ_A = 2.0, σ²_A = 0.6, µ_B = 2.0 and σ²_B = 1.6.

Ex. Calculate the variance of g(X) = 2X + 3, where X is a random variable with probability
distribution

x        0     1     2     3
f(x)    1/4   1/8   1/2   1/8

Sol. µ_{2X+3} = 6, σ²_{2X+3} = 4.
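A quick check of this example (a sketch, not from the notes): the mean and variance of g(X) = 2X + 3 computed directly from the pmf.

from fractions import Fraction

f = {0: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 2), 3: Fraction(1, 8)}

mu_g = sum((2 * x + 3) * p for x, p in f.items())                  # E(2X + 3)
var_g = sum(((2 * x + 3) - mu_g) ** 2 * p for x, p in f.items())   # V(2X + 3)
print(mu_g, var_g)   # 6 and 4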

Ex. Find the mean and variance of a random variable X with the pmf given by

f (x) = cx, x = 1, 2, 3, ...., n

where c is a constant and n is some fixed natural number.


Sol. Using the condition ∑_{x=1}^{n} f(x) = 1, we get c(1 + 2 + ... + n) = 1, or c = 2/(n(n + 1)).

Now µ = E(X) = ∑_{x=1}^{n} x f(x) = ∑_{x=1}^{n} c x² = c · n(n + 1)(2n + 1)/6 = (2n + 1)/3.

E(X²) = ∑_{x=1}^{n} x² f(x) = ∑_{x=1}^{n} c x³ = c · n²(n + 1)²/4 = n(n + 1)/2.

σ² = E(X²) − E(X)² = n(n + 1)/2 − ((2n + 1)/3)².
Ex. Consider a random variable X with the pmf given by

f(x) = c 2^{−|x|},  x = ±1, ±2, ±3, ...,

where c is a constant. If g(X) = (−1)^{|X|−1} · 2^{|X|}/(2|X| − 1), then show that E(g(X)) exists but E(|g(X)|)
does not exist.

Sol. Using the condition ∑_x f(x) = 1, we find c = 1/2.

Now E(g(X)) = ∑_x g(x) f(x) = ∑_{x=±1,±2,...} (−1)^{|x|−1}/(2(2|x| − 1)), which is an alternating and convergent
series. So E(g(X)) exists. But E(|g(X)|) = ∑_{x=±1,±2,...} 1/(2(2|x| − 1)) is a divergent series, so
E(|g(X)|) does not exist.

3.1.3 Standard Deviation


The variance of a random variable is, by definition, the expected value of the squared deviation of the
random variable from its mean. So the variance carries the squared units of the original
data, and hence often lacks a direct physical interpretation. To overcome this problem,
a second measure of variability is employed, known as the standard deviation, defined as follows.
Let X be a random variable with variance σ². Then the standard deviation of X, denoted by
σ, is the non-negative square root of V(X), that is,

σ = √V(X).
Note: A large standard deviation implies that the random variable X is rather inconsistent and
somewhat hard to predict. On the other hand, a small standard deviation is an indication of
consistency and stability.

3.1.4 Moments and moment generating function
Let X be a random variable and k be any positive integer. Then E(X^k) defines the kth ordinary
moment of X.
Obviously, E(X) = µ is the first ordinary moment, E(X²) is the second ordinary moment, and
so on. Further, the ordinary moments can be obtained from the function E(e^{tX}). For, the ordinary
moments E(X^k) are the coefficients of t^k/k! in the expansion

E(e^{tX}) = 1 + t E(X) + (t²/2!) E(X²) + ...

Also, we observe that

E(X^k) = [d^k/dt^k E(e^{tX})]_{t=0}.

Thus, the function E(e^{tX}) generates all the ordinary moments. That is why it is known as the
moment generating function and is denoted by m_X(t). Thus, m_X(t) = E(e^{tX}).
In general, the kth moment of a random variable X about any point a is defined as E((X − a)^k).
Obviously, a = 0 for the ordinary moments. Further, E(X − µ_X) = 0 and E((X − µ_X)²) = σ²_X.
So the first moment about the mean is 0, while the second moment about the mean yields the variance.
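As an illustration (a sketch, not part of the notes, assuming sympy is available), the mgf m_X(t) = 1/4 + e^t/2 + e^{2t}/4 of the earlier two-coin example (X = number of heads in two fair tosses) can be differentiated symbolically to recover E(X), E(X²) and the variance:

import sympy as sp

t = sp.symbols('t')
m = sp.Rational(1, 4) + sp.Rational(1, 2) * sp.exp(t) + sp.Rational(1, 4) * sp.exp(2 * t)

EX = sp.diff(m, t, 1).subs(t, 0)    # first ordinary moment E(X)
EX2 = sp.diff(m, t, 2).subs(t, 0)   # second ordinary moment E(X^2)
print(EX, EX2, EX2 - EX**2)         # 1, 3/2, 1/2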

3.2 Geometric Distribution


The geometric distribution arises under the following conditions:
(i) The random experiment consists of a series of independent trials.
(ii) Each trial results in one of two outcomes, namely success (S) and failure (F), which have constant
probabilities p and q = 1 − p, respectively. (Such trials are called Bernoulli trials.)
(iii) X denotes the number of trials needed to obtain the first success.
Then the sample space of the random experiment is
S = {S, F S, F F S, ..........},
and X is a discrete random variable with countably infinite values: X = 1, 2, 3, ......... such that

Outcome      S     FS     FFS    ...
X = x        1     2      3      ...
P(X = x)     p     qp     q²p    ...

Thus, the pmf of X, denoted by g(x; p), is given by

g(x; p) = q^{x−1} p,  x = 1, 2, 3, ...

The random variable X with this pmf is called a geometric random variable. The name 'geometric'
arises because the probabilities p, qp, q²p, ... in succession constitute a geometric progression.
Given the value of the parameter p, the probability distribution of the geometric random variable
X is uniquely described.

Mean, variance and mgf of geometric random variable
For the geometric random variable X, we have

(i) µ_X = E(X) = ∑_{x=1}^{∞} x g(x; p)
= ∑_{x=1}^{∞} x q^{x−1} p
= p ∑_{x=1}^{∞} x q^{x−1}
= p d/dq ( ∑_{x=1}^{∞} q^x )
(∵ term-by-term differentiation is permissible for the convergent power series ∑_{x=1}^{∞} q^x within its interval of convergence |q| < 1)
= p d/dq ( q/(1 − q) )
= p · 1/(1 − q)²
= 1/p.

(ii) σ²_X = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)²
= p ∑_{x=1}^{∞} x(x − 1) q^{x−1} + 1/p − 1/p²
= pq ∑_{x=1}^{∞} x(x − 1) q^{x−2} − q/p²
= pq d²/dq² ( ∑_{x=1}^{∞} q^x ) − q/p²
= pq d²/dq² ( q/(1 − q) ) − q/p²
= pq d/dq ( 1/(1 − q)² ) − q/p²
= pq · 2/(1 − q)³ − q/p²
= 2q/p² − q/p²
= q/p².

(iii) m_X(t) = E(e^{tX}) = ∑_{x=1}^{∞} e^{tx} g(x; p)
= p ∑_{x=1}^{∞} e^{tx} q^{x−1}
= (p/q) ∑_{x=1}^{∞} (q e^t)^x
= (p/q) · q e^t/(1 − q e^t)   (for t < −ln q)
= p e^t/(1 − q e^t).
Remark: Note that we can easily obtain E(X) and E(X²) from the moment generating function
m_X(t) by using

E(X^k) = [d^k/dt^k m_X(t)]_{t=0},

for k = 1 and k = 2, respectively. In other words, the first and second t-derivatives of m_X(t) at
t = 0 provide E(X) and E(X²), respectively. Hence we easily get the mean and variance from the
moment generating function. Verify!

Some illustrative examples


Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in
this experiment, then X is a geometric random variable with the pmf g(x) = (1/2)^x, x = 1, 2, 3, ...
Here p = 1/2.

Ex. For a certain manufacturing process, it is known that, on the average, 1 in every 100 items is
defective. What is the probability that the fifth item inspected is the first defective item found?

Sol. Here p = 1/100 = 0.01 and x = 5. So the required probability is (0.01)(0.99)^4 = 0.0096.

Ex. At a busy time, a telephone exchange is very near capacity, so callers have difficulty placing
their calls. It may be of interest to know the number of attempts necessary in order to make a
connection. Suppose that we let p = 0.05 be the probability of a connection during a busy time.
Find the probability of a successful call in the fifth attempt.

Sol. Here p = 0.05 and x = 5. So the required probability is (0.05)(0.95)^4 = 0.041.
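A short check of the last two examples (a sketch, not from the notes, with the stated values of p):

def geometric_pmf(x, p):
    # P(first success occurs on trial x): g(x; p) = (1 - p)**(x - 1) * p
    return (1 - p) ** (x - 1) * p

print(round(geometric_pmf(5, 0.01), 4))   # fifth item is the first defective: 0.0096
print(round(geometric_pmf(5, 0.05), 3))   # first successful call on attempt 5: 0.041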

3.2.1 Negative Binomial Distribution


In the geometric distribution, X is the number of trials to obtain the first success. A more general
version is obtained by letting X be the number of trials to obtain the kth success. Then X is called a
negative binomial random variable with the values X = k, k + 1, k + 2, ... Since the final trial
among the x trials results in a success, the remaining k − 1 successes can occur
in C(x − 1, k − 1) ways among the first x − 1 trials. Hence, the pmf of the negative binomial random variable X,
denoted by nb(x; k, p), is given by

nb(x; k, p) = C(x − 1, k − 1) p^k q^{x−k},  x = k, k + 1, k + 2, ...

If we make a change of variable via y = x − k, then

nb(y; k, p) = C(k + y − 1, k − 1) p^k q^y = C(k + y − 1, y) p^k q^y,  y = 0, 1, 2, ...

Here we have used the well-known result C(n, x) = C(n, n − x).

Note that (1 − q)^{−k} = ∑_{y=0}^{∞} C(k + y − 1, y) q^y is a negative binomial series.

Now, let us show that ∑_{x=k}^{∞} nb(x; k, p) = 1. For,

∑_{x=k}^{∞} nb(x; k, p) = ∑_{x=k}^{∞} C(x − 1, k − 1) p^k q^{x−k}
= ∑_{y=0}^{∞} C(y + k − 1, k − 1) p^k q^y,  where y = x − k
= p^k ∑_{y=0}^{∞} C(y + k − 1, y) q^y
= p^k (1 − q)^{−k}
= p^k p^{−k}
= 1.

Mean, variance and mgf of negative binomial random variable


For the negative binomial random variable X, we have

(i) µ_X = E(X) = ∑_{x=k}^{∞} x nb(x; k, p)
= ∑_{x=k}^{∞} x C(x − 1, k − 1) p^k q^{x−k}
= ∑_{x=k}^{∞} k C(x, k) p^k q^{x−k}
= (k/p) ∑_{x=k}^{∞} C(x, k) p^{k+1} q^{x−k}
= (k/p) ∑_{y=k+1}^{∞} C(y − 1, (k + 1) − 1) p^{k+1} q^{y−(k+1)},  where y = x + 1
= (k/p) ∑_{y=k+1}^{∞} nb(y; k + 1, p)
= (k/p) · 1
= k/p.

(ii) E((X + 1)X) = ∑_{x=k}^{∞} (x + 1)x nb(x; k, p)
= ∑_{x=k}^{∞} (x + 1)x C(x − 1, k − 1) p^k q^{x−k}
= ∑_{x=k}^{∞} (k + 1)k C(x + 1, k + 1) p^k q^{x−k}
= (k(k + 1)/p²) ∑_{x=k}^{∞} C(x + 1, k + 1) p^{k+2} q^{x−k}
= (k(k + 1)/p²) ∑_{y=k+2}^{∞} C(y − 1, (k + 2) − 1) p^{k+2} q^{y−(k+2)},  where y = x + 2
= (k(k + 1)/p²) ∑_{y=k+2}^{∞} nb(y; k + 2, p)
= (k(k + 1)/p²) · 1
= k(k + 1)/p².

So V(X) = E((X + 1)X) − E(X) − E(X)² = k(k + 1)/p² − k/p − k²/p² = kq/p².
(iii) m_X(t) = E(e^{tX}) = ∑_{x=k}^{∞} e^{tx} nb(x; k, p) = ∑_{x=k}^{∞} e^{tx} C(x − 1, k − 1) p^k q^{x−k}
= ∑_{x=k}^{∞} C(x − 1, k − 1) p^k q^{x−k} e^{tx}
= ∑_{y=0}^{∞} C(y + k − 1, k − 1) p^k q^y e^{t(y+k)},  where y = x − k
= p^k e^{tk} ∑_{y=0}^{∞} C(y + k − 1, y) (q e^t)^y
= (p e^t)^k (1 − q e^t)^{−k}
= ( p e^t/(1 − q e^t) )^k.
Ex. In an NBA (National Basketball Association) championship series, the team that wins four
games out of seven is the winner. Suppose that teams A and B face each other in the championship
games and that team A has probability 0.55 of winning a game over team B.
(a) What is the probability that team A will win the series in 6 games?
(b) What is the probability that team A will win the series?

Sol. (a) Here x = 6, k = 4, p = 0.55. So the required probability is

nb(6; 4, 0.55) = C(6 − 1, 4 − 1) (0.55)^4 (1 − 0.55)^{6−4} = 0.1853.

(b) Team A can win the championship series in the 4th, 5th, 6th or 7th game. So the
required probability is

nb(4; 4, 0.55) + nb(5; 4, 0.55) + nb(6; 4, 0.55) + nb(7; 4, 0.55) = 0.6083.
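A numerical check of the NBA example (a sketch, not from the notes; it assumes p = 0.55 per game, as stated above):

from math import comb

def nb_pmf(x, k, p):
    # P(k-th success occurs on trial x): C(x-1, k-1) * p**k * q**(x-k)
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

print(round(nb_pmf(6, 4, 0.55), 4))                             # series won in exactly 6 games: 0.1853
print(round(sum(nb_pmf(x, 4, 0.55) for x in range(4, 8)), 4))   # series won at all: 0.6083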

3.3 Binomial Distribution


The binomial distribution arises under the following conditions:
(i) The random experiment consists of a finite number n of independent trials.
(ii) Each trial results in one of two outcomes, namely success (S) and failure (F), which have constant
probabilities p and q = 1 − p, respectively, in each trial.
(iii) X denotes the number of successes in the n trials.
Then the sample space of the random experiment is
S = S_0 ∪ S_1 ∪ S_2 ∪ ... ∪ S_n,
where the sets
S_0 = {FF···F},
S_1 = {SF···F, FS···F, ..., FF···S},
...
S_n = {SS···S}
carry C(n, 0), C(n, 1), ..., C(n, n) elements (outcomes), respectively, since out of n trials, no success can take
place in C(n, 0) ways, one success can take place in C(n, 1) ways, and so on. Thus, the sample space S
carries C(n, 0) + C(n, 1) + ... + C(n, n) = (1 + 1)^n = 2^n outcomes.
The random variable X, being the number of successes in the n trials, takes the values X =
0, 1, 2, ..., n such that

Outcome      S_0          S_1                .....   S_n
X = x        0            1                  .....   n
P(X = x)     C(n, 0) q^n   C(n, 1) q^{n−1} p   .....   C(n, n) p^n

So, the pmf of X, denoted by b(x; n, p), is given by

b(x; n, p) = C(n, x) q^{n−x} p^x,  x = 0, 1, 2, ..., n.

The random variable X with this pmf is called a binomial random variable. The name 'binomial'
arises because the probabilities C(n, 0) q^n, C(n, 1) q^{n−1} p, ..., C(n, n) p^n in succession are the terms in the binomial
expansion of (q + p)^n. Once the values of the parameters n and p are given/determined, the pmf
uniquely describes the binomial distribution of X.

Mean, variance and mgf of binomial random variable


For the binomial random variable X, we have
(i) µ_X = E(X) = ∑_{x=0}^{n} x b(x; n, p)
= ∑_{x=0}^{n} x C(n, x) q^{n−x} p^x
= np ∑_{x=1}^{n} C(n − 1, x − 1) q^{n−x} p^{x−1}
= np ∑_{y=0}^{n−1} C(n − 1, y) q^{n−1−y} p^y   (where y = x − 1)
= np (p + q)^{n−1} = np.

(ii) E(X(X − 1)) = ∑_{x=0}^{n} x(x − 1) b(x; n, p)
= ∑_{x=0}^{n} x(x − 1) C(n, x) q^{n−x} p^x
= n(n − 1)p² ∑_{x=2}^{n} C(n − 2, x − 2) q^{n−x} p^{x−2}
= n(n − 1)p² ∑_{y=0}^{n−2} C(n − 2, y) q^{n−2−y} p^y   (where y = x − 2)
= n(n − 1)p² (p + q)^{n−2} = n(n − 1)p².

So σ²_X = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)² = n(n − 1)p² + np − n²p² = npq.

(iii) m_X(t) = E(e^{tX}) = ∑_{x=0}^{n} e^{tx} b(x; n, p)
= ∑_{x=0}^{n} e^{tx} C(n, x) q^{n−x} p^x
= ∑_{x=0}^{n} C(n, x) q^{n−x} (p e^t)^x
= (q + p e^t)^n.

Note: In the particular case n = 1, the binomial distribution is called Bernoulli distribution:

b(x; 1, p) = q^{1−x} p^x,  x = 0, 1.

Some illustrative examples


Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Sol. Here n = 5, p = 1/6, x = 2, and therefore

P(X = 2) = b(2; 5, 1/6) = C(5, 2) (1 − 1/6)^{5−2} (1/6)^2 = 0.161.

Ex. The probability that a certain kind of component will survive a shock test is 3/4. Find the
probability that exactly 2 of the next 4 components tested survive.

Sol. Here n = 4, p = 3/4, x = 2, and therefore

P(X = 2) = b(2; 4, 3/4) = C(4, 2) (1 − 3/4)^{4−2} (3/4)^2 = 27/128.

Ex. The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are
known to have contracted this disease, what is the probability that (a) at least 10 survive, (b)
from 3 to 8 survive, and (c) exactly 5 survive?

Sol. (a) 0.0338 (b) 0.8779 (c) 0.1859
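These three answers can be reproduced with a few lines of Python (a sketch, not from the notes), using X ~ binomial(n = 15, p = 0.4):

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_at_least_10 = sum(binom_pmf(x, 15, 0.4) for x in range(10, 16))   # (a) P(X >= 10)
p_3_to_8 = sum(binom_pmf(x, 15, 0.4) for x in range(3, 9))          # (b) P(3 <= X <= 8)
p_exactly_5 = binom_pmf(5, 15, 0.4)                                 # (c) P(X = 5)
print(round(p_at_least_10, 4), round(p_3_to_8, 4), round(p_exactly_5, 4))   # 0.0338 0.8779 0.1859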

Ex. A large chain retailer purchases a certain kind of electronic device from a manufacturer. The
manufacturer indicates that the defective rate of the device is 3%.
(a) The inspector randomly picks 20 items from a shipment. What is the probability that there
will be at least one defective item among these 20?
(b) Suppose that the retailer receives 10 shipments in a month and the inspector randomly tests
20 devices per shipment. What is the probability that there will be exactly 3 shipments each con-
taining at least one defective device among the 20 that are selected and tested from the shipment?

Sol. (a) Denote by X the number of defective devices among the 20. Then X follows a binomial
distribution with n = 20 and p = 0.03. Hence, P (X ≥ 1) = 1 − P (X = 0) = 0.4562.

(b) In this case, each shipment can either contain at least one defective item or not. Hence,
testing of each shipment can be viewed as a Bernoulli trial with p = 0.4562 from part (a). As-
suming independence from shipment to shipment and denoting by Y the number of shipments
containing at least one defective item, Y follows another binomial distribution with n = 10 and
p = 0.4562. Therefore,
P (Y = 3) = 0.1602.

Ex. In a bombing attack, there is a 50% chance that any bomb will strike the target. At least
two direct hits are required to destroy the target. What is the minimum number of bombs that must be
dropped so that the probability of hitting the target at least twice is more than 0.99?

Sol. Let n bombs be dropped so that there is at least a 99% chance of hitting the target at least
twice. Let X be the random variable representing the number of bombs striking the target. Then
X = 0, 1, 2, ..., n follows a binomial distribution with p = 1/2, and therefore
P(X ≥ 2) ≥ 0.99, or 1 − P(X = 0) − P(X = 1) ≥ 0.99.
This simplifies to 2^n ≥ 100 + 100n. This inequality is satisfied if n ≥ 11. So at least 11
bombs must be dropped so that there is at least a 99% chance of hitting the target at least twice.
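The inequality can also be checked by a brute-force search (a sketch, not from the notes): for each n, P(X = 0) + P(X = 1) = (1 + n)/2^n, and we look for the smallest n making P(X ≥ 2) exceed 0.99.

n = 1
while True:
    p_less_than_2 = (1 + n) * 0.5 ** n      # P(X = 0) + P(X = 1) for p = 1/2
    if 1 - p_less_than_2 > 0.99:
        break
    n += 1
print(n)   # 11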

3.3.1 Multinomial Distribution
The binomial experiment becomes a multinomial experiment if we let each trial have more than
two possible outcomes. For example, the drawing of a card from a deck with replacement is a
multinomial experiment if the 4 suits are the outcomes of interest.
In general, if a given trial can result in any one of k possible outcomes o1 , o2 ,. . . , ok with
probabilities p1 , p2 ,. . . , pk , then the multinomial distribution gives the probability that o1 occurs
x1 times, o2 occurs x2 times, . . . , and ok occurs xk times in n independent trials, as follows:
 
f(x_1, x_2, ..., x_k) = [n!/(x_1! x_2! ... x_k!)] p_1^{x_1} p_2^{x_2} ... p_k^{x_k},

where n!/(x_1! x_2! ... x_k!) is the multinomial coefficient, and

x_1 + x_2 + ... + x_k = n,  p_1 + p_2 + ... + p_k = 1.
Clearly, when k = 2, the multinomial distribution reduces to the binomial distribution.

Ex. The probabilities that a person goes to office by car, bus and train are 1/2, 1/4 and 1/4,
respectively. Find the probability that the person will go to office 2 days by car, 3 days by bus
and 1 day by train in the 6 days.

Sol. [6!/(2! 3! 1!)] (1/2)^2 (1/4)^3 (1/4)^1.

Ex. The complexity of arrivals and departures of planes at an airport is such that computer
simulation is often used to model the “ideal” conditions. For a certain airport with three runways,
it is known that in the ideal setting the following are the probabilities that the individual runways
are accessed by a randomly arriving commercial jet:
Runway 1: p1 = 2/9,
Runway 2: p2 = 1/6,
Runway 3: p3 = 11/18.
What is the probability that 6 randomly arriving airplanes are distributed in the following fashion?
Runway 1: 2 airplanes,
Runway 2: 1 airplane,
Runway 3: 3 airplanes

Sol. [6!/(2! 1! 3!)] (2/9)^2 (1/6)^1 (11/18)^3.
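A quick evaluation of this multinomial probability (a sketch, not from the notes):

from math import factorial

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)               # multinomial coefficient n!/(x1! x2! ... xk!)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coeff * prob

print(round(multinomial_pmf([2, 1, 3], [2/9, 1/6, 11/18]), 4))   # ≈ 0.1127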

3.4 Hypergeometric Distribution
The hypergeometric distribution arises under the following conditions:
(i) The random experiment consists of choosing n objects without replacement from a lot of N objects
given that r objects possess a trait or property of our interest in the lot of N objects.
(ii) X denotes the number of objects possessing the trait or property in the selected sample of size n.
See the following Venn-diagram illustration: of the N objects in the lot, r possess the trait and N − r do not; of the n objects in the sample, x possess the trait and n − x do not.

It is easy to see that the x objects with the trait (by definition of X) are to be chosen from the
r objects in C(r, x) ways, while the remaining n − x objects are to be chosen from the N − r objects
in C(N − r, n − x) ways. So the n objects carrying x items with the trait can be chosen from the N objects
in C(r, x) C(N − r, n − x) ways, while C(N, n) is the total number of ways in which n objects can be chosen from the N
objects. Therefore, the pmf of X, denoted by h(x; N, r, n), is given by

h(x; N, r, n) = P(X = x) = C(r, x) C(N − r, n − x) / C(N, n).

The random variable X with this pmf is called hypergeometric random variable. The hypergeo-
metric distribution is characterized by the three parameters N , r and n. Note that X lies in the
range max(0, n + r − N ) ≤ x ≤ min(n, r). So minimum value of x could be n + r − N instead
of 0. To understand this, let N = 30, r = 20 and n = 15. Then the minimum value of x is
n + r − N = 15 + 20 − 30 = 5. For, there are only N − r = 10 objects without the trait in the
30 items. So a sample of 15 items certainly contains at least 5 objects with the trait. So in this
case, the random variable X takes the values 5, 6, ..., 15. Notice that the maximum value of x
is min(n, r) = min(20, 15) = 15. Similarly, if we choose n = 25, the random variable X takes the
values 15, 16, 17, 18, 19 and 20. In case, we choose n = 8, the random variable X takes the values
0, 1, 2 ..., 8.
Next, let us check whether h(x; N, r, n) is a valid pmf. Note that x ∈ [max(0, n + r − N), min(n, r)].
But we can take x ∈ [0, n], because in situations where this range is not [0, n],
we have h(x; N, r, n) = 0. Also, recall Vandermonde's identity:

∑_{x=0}^{n} C(a, x) C(b, n − x) = C(a + b, n),   or equivalently   ∑_{x=0}^{n} C(a, x) C(b, n − x) / C(a + b, n) = 1.

This identity is understandable in view of the following example.

Suppose a team of n persons is chosen from a group of a men and b women. The number of
ways of choosing the team of n persons from the group of a + b persons is C(a + b, n), the right hand
side of Vandermonde's identity. We can count the same number of ways by considering that in
the team of n persons, x persons are men and the remaining n − x persons are women; summing over x then
gives the left hand side of Vandermonde's identity.
Now from Vandermonde's identity, it follows that

∑_{x=0}^{n} h(x; N, r, n) = ∑_{x=0}^{n} C(r, x) C(N − r, n − x) / C(N, n) = 1.

Thus, h(x; N, r, n) is a valid pmf.

Mean, variance and mgf of hypergeometric random variable


For the hypergeometric random variable X, it can be shown that

µ_X = E(X) = n(r/N)   and   σ²_X = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

For,

µ_X = E(X) = ∑_x x h(x; N, r, n)
= ∑_x x C(r, x) C(N − r, n − x) / C(N, n)
= n(r/N) ∑_x C(r − 1, x − 1) C((N − 1) − (r − 1), (n − 1) − (x − 1)) / C(N − 1, n − 1)
(using x C(r, x) = r C(r − 1, x − 1) and C(N, n) = (N/n) C(N − 1, n − 1))
= n(r/N),

since ∑_x C(r − 1, x − 1) C((N − 1) − (r − 1), (n − 1) − (x − 1)) / C(N − 1, n − 1) = 1, being the sum of the probabilities for a hypergeometric
distribution with parameters N − 1, r − 1 and n − 1.

Likewise, it is easy to find that E(X(X − 1)) = n(n − 1) (r/N) ((r − 1)/(N − 1)). Hence, we have

σ²_X = E(X(X − 1)) + E(X) − E(X)² = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).
Just for the sake of completeness, in line with the other distributions, some details of the moment
generating function of the hypergeometric distribution are given below. It is given by

m_X(t) = E(e^{tX}) = ∑_x e^{tx} C(r, x) C(N − r, n − x) / C(N, n) = C(N − r, n) 2F1(−n, −r; N − r − n + 1; e^t) / C(N, n).

Here 2F1 is the hypergeometric function defined as the sum of the infinite series

2F1(a, b; c; z) = 1 + (ab/c)(z/1!) + (a(a + 1)b(b + 1)/(c(c + 1)))(z²/2!) + ...,

where a, b, c are constants, and z is the variable of the hypergeometric function.

Also, note that

d/dz [2F1(a, b; c; z)] = (ab/c) 2F1(a + 1, b + 1; c + 1; z),

d²/dz² [2F1(a, b; c; z)] = (a(a + 1)b(b + 1)/(c(c + 1))) 2F1(a + 2, b + 2; c + 2; z).

Following this, it can be shown that

µ_X = E(X) = [d/dt m_X(t)]_{t=0} = n(r/N).

Similarly, by calculating the second derivative of m_X(t) at t = 0, the variance can be found as

σ²_X = E(X²) − E(X)² = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

Some illustrative examples


Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards.
What is the probability of getting exactly 2 red cards?

Sol. Here N = 52, r = 26, n = 5, x = 2, and therefore P (X = 2) = h(2; 52, 26, 5) = 0.3251.

Ex. Lots of 40 components each are deemed unacceptable if they contain 3 or more defectives.
The procedure for sampling a lot is to select 5 components at random and to reject the lot if a
defective is found. What is the probability that exactly 1 defective is found in the sample if there
are 3 defectives in the entire lot?

Sol. Here N = 40, r = 3, n = 5, x = 1, and therefore P(X = 1) = h(1; 40, 3, 5) = 0.3011.
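Both hypergeometric answers can be verified with math.comb (a sketch, not from the notes):

from math import comb

def hypergeom_pmf(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

print(round(hypergeom_pmf(2, 52, 26, 5), 4))   # exactly 2 red cards: 0.3251
print(round(hypergeom_pmf(1, 40, 3, 5), 4))    # exactly 1 defective in the sample: 0.3011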

3.4.1 Binomial distribution as a limiting case of hypergeometric distribution
There is an interesting relationship between the hypergeometric and the binomial distributions.
It can be shown that if the population size N → ∞ in such a way that the proportion of
successes r/N → p, and n is held constant, then the hypergeometric probability mass function
approaches the binomial probability mass function.
Proof: We have

h(x; N, r, n) = C(r, x) C(N − r, n − x) / C(N, n)
= [r!/(x!(r − x)!)] · [(N − r)!/((n − x)!(N − r − (n − x))!)] · [n!(N − n)!/N!]
= C(n, x) · [r!/(r − x)!] · [(N − r)!/(N − r − (n − x))!] · [(N − n)!/N!]
= C(n, x) · ∏_{k=1}^{x} (r − x + k)/(N − x + k) · ∏_{m=1}^{n−x} (N − r − (n − x) + m)/(N − n + m).

Now taking the large-N limit for fixed r/N, n and x, we get the binomial pmf

b(x; n, p) = C(n, x) p^x q^{n−x},

since

lim_{N→∞} (r − x + k)/(N − x + k) = lim_{N→∞} r/N = p

and

lim_{N→∞} (N − r − (n − x) + m)/(N − n + m) = lim_{N→∞} (N − r)/N = 1 − p = q.

In practice, this means that we can approximate the hypergeometric probabilities with binomial
probabilities, provided N ≫ n. As a rule of thumb, if the population size is more than 20
times the sample size (N > 20n, or N/n > 20), then we may use binomial probabilities in place of
hypergeometric probabilities.

Ex. A manufacturer of automobile tires reports that among a shipment of 5000 sent to a local
distributor, 1000 are slightly blemished. If one purchases 10 of these tires at random from the
distributor, what is the probability that exactly 3 are blemished?

Sol. We find P (X = 3) = 0.2013 from binomial distribution, and P (X = 3) = 0.2015 from


hypergeometric distribution.
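The comparison can be reproduced directly (a sketch, not from the notes), using N = 5000, r = 1000, n = 10, x = 3 and p = r/N = 0.2:

from math import comb

exact = comb(1000, 3) * comb(4000, 7) / comb(5000, 10)   # hypergeometric probability
approx = comb(10, 3) * 0.2 ** 3 * 0.8 ** 7               # binomial approximation
print(round(exact, 4), round(approx, 4))                 # ≈ 0.2015 and 0.2013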

3.4.2 Generalization of the hypergeometric distribution


Consider a lot of N objects given that r1 , r2 , ....., rk objects possess different traits of our interest
such that r1 +r2 +....+rk = N . Suppose a lot of n objects is randomly chosen (without replacement)
where x1 , x2 , ..., xk objects have the traits as in the r1 , r2 , ....., rk objects, respectively, such that
x1 + x2 + .... + xk = n. Then the probability of the random selection is
f(x_1, x_2, ..., x_k) = C(r_1, x_1) C(r_2, x_2) ... C(r_k, x_k) / C(N, n).

Ex. Ten cards are randomly chosen without replacement from a deck of 52 playing cards. Find
the probability of getting 2 spades, 3 clubs, 4 diamonds and 1 heart.

Sol. Here N = 52, r_1 = 13, r_2 = 13, r_3 = 13, r_4 = 13, n = 10, x_1 = 2, x_2 = 3, x_3 = 4, x_4 = 1. So
the required probability is

C(13, 2) C(13, 3) C(13, 4) C(13, 1) / C(52, 10).

3.5 Poisson Distribution
Consider the pmf of the binomial random variable X:

b(x; n, p) = C(n, x) p^x (1 − p)^{n−x},  x = 0, 1, 2, ..., n.

Let us calculate the limiting form of the binomial distribution as n → ∞, p → 0, with np = k held
constant. We have

b(x; n, p) = C(n, x) p^x (1 − p)^{n−x}
= [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}
= [n(n − 1)...(n − x + 1)/x!] p^x (1 − p)^{n−x}
= [(np)(np − p)...(np − xp + p)/x!] (1 − p)^{n−x}
= [(np)(np − p)...(np − xp + p)/x!] (1 − p)^{−x} (1 − p)^n
= [(k)(k − p)...(k − xp + p)/x!] (1 − p)^{−x} (1 − p)^{k/p}   (using np = k).

Thus, in the limit p → 0 (so that (1 − p)^{k/p} → e^{−k}), we get

p(x; k) = [(k)(k − 0)...(k − 0)/x!] (1 − 0)^{−x} e^{−k} = e^{−k} k^x / x!,

known as the pmf of the Poisson distribution.

Notice that the conditions n → ∞, p → 0 and np = k, intuitively refer to a situation where


the sample space of the random experiment is a continuous interval or medium (thus carrying
infinitely many points, n → ∞); the probability p of discrete occurrences of an event of interest is
very small (p → 0) such that the mean number of occurrences np of the event remains constant k.
Thus, formally the Poisson distribution arises under the following conditions:
(i) The random experiment consists of counting or observing discrete occurrences of an event in
a continuous region or time interval of some given size s; this is called a Poisson process or Poisson
experiment. (The specified region can take many forms: a length, an area, a volume, a period of time, etc.)
For example, counting the number of airplanes landing at Delhi airport between 9 am and 11 am, or observing
the white blood cells in a sample of blood, are Poisson experiments.
(ii) λ denotes the mean number of occurrences of the event of interest per unit measurement of the given
region. Then k = λs is the expected or mean number of occurrences of the event in a region of size s.
(iii) X denotes the number of occurrences of the event in the region of size s.
Then X is called a Poisson random variable, and its pmf can be proved to be

p(x; k) = e^{−k} k^x / x!,  x = 0, 1, 2, ...
The Poisson distribution is characterized by the single parameter k.

Mean, variance and mgf of Poisson random variable

(i) µ_X = E(X) = ∑_{x=0}^{∞} x p(x; k)
= ∑_{x=1}^{∞} x e^{−k} k^x / x!
= k e^{−k} ∑_{x=1}^{∞} k^{x−1}/(x − 1)!
= k e^{−k} e^k
= k.

(ii) σ²_X = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)²
= ∑_{x=2}^{∞} x(x − 1) e^{−k} k^x / x! + k − k²
= k² e^{−k} ∑_{x=2}^{∞} k^{x−2}/(x − 2)! + k − k²
= k² e^{−k} e^k + k − k²
= k.

We notice that µ_X = k = σ²_X.

(iii) m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} p(x; k)
= ∑_{x=0}^{∞} e^{tx} e^{−k} k^x / x!
= e^{−k} ∑_{x=0}^{∞} (k e^t)^x / x!
= e^{−k} e^{k e^t}
= e^{k(e^t − 1)}.

Some illustrative examples


Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is
tested for white blood cell count by collecting a blood sample of size 0.001 ml. Find the probability
that the collected blood sample will carry exactly 3 white blood cells.

Sol. Here λ = 6000, s = 0.001, k = λs = 6 and x = 3, and therefore P(X = 3) = p(3; 6) = e^{−6} 6^3 / 3!.

Ex. In the last 5 years, 10 students of BITS-Pilani were placed with a package of more than one
crore. Find the probability that exactly 7 students will be placed with a package of more than one
crore in the next 3 years.

Sol. Here λ = 10/5 = 2, s = 3, k = λs = 6 and x = 7, and therefore P(X = 7) = p(7; 6) = e^{−6} 6^7 / 7!.

Ex. During a laboratory experiment, the average number of radioactive particles passing through
a counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given
millisecond?

Sol. Here k = 4 and x = 6, so the required probability is P(X = 6) = p(6; 4) = e^{−4} 4^6 / 6! ≈ 0.1042.

Ex. Ten is the average number of oil tankers arriving each day at a certain port. The facilities
at the port can handle at most 15 tankers per day. What is the probability that on a given day
tankers have to be turned away?

Sol. Here k = 10 and the required probability is

P(X > 15) = 1 − P(X ≤ 15) = 1 − ∑_{x=0}^{15} P(X = x) = 1 − 0.9513 = 0.0487.
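A short check of the tanker example (a sketch, not from the notes), with X ~ Poisson(k = 10):

from math import exp, factorial

def poisson_pmf(x, k):
    return exp(-k) * k ** x / factorial(x)

p_turned_away = 1 - sum(poisson_pmf(x, 10) for x in range(16))   # 1 - P(X <= 15)
print(round(p_turned_away, 4))                                   # 0.0487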

Note: We proved that the Binomial distribution tends to the Poisson distribution as n → ∞,
p → 0 and np = k remains constant. Thus, we may use Poisson distribution to approximate
binomial probabilities when n is large and p is small. As a rule of thumb this approximation can
safely be applied if n > 50 and np < 5.

Some illustrative examples


Ex. In a certain industrial facility, accidents occur infrequently. It is known that the probability
of an accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on one
day?
(b) What is the probability that there are at most three days with an accident?

Sol. Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2. Using the
Poisson approximation,
(a) P(X = 1) = e^{−2} 2^1 = 0.271, and
(b) P(X ≤ 3) = ∑_{x=0}^{3} e^{−2} 2^x / x! = 0.857.

Ex. In a manufacturing process where glass products are made, defects or bubbles occur, occa-
sionally rendering the piece undesirable for marketing. It is known that, on average, 1 in every
1000 of these items produced has one or more bubbles. What is the probability that a random
sample of 8000 will yield fewer than 7 items possessing bubbles?

Sol. This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to
0 and n is quite large, we shall approximate with the Poisson distribution using k = (8000)(0.001) =
8. Hence, if X represents the number of bubbles, the required probability is P(X < 7) = 0.3134.

3.6 Uniform Distribution
A random variable X is said to follow the (discrete) uniform distribution if it assumes a finite number of values,
all with the same chance of occurrence (equal probabilities). For instance, if the random variable X
assumes n values x_1, x_2, ..., x_n with equal probabilities P(X = x_i) = 1/n, then it is a uniform
random variable with pmf given by

u(x) = 1/n,  x = x_1, x_2, ..., x_n.
The moment generating function, mean and variance of the uniform random variable respectively
read as

m_X(t) = (1/n) ∑_{i=1}^{n} e^{t x_i},   µ = (1/n) ∑_{i=1}^{n} x_i,   σ² = (1/n) ∑_{i=1}^{n} x_i² − ( (1/n) ∑_{i=1}^{n} x_i )².
Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X
is a discrete random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P(X = 1) = P(X = 2) =
P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6. Thus, X is a uniform random variable.
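A final check of the die example (a sketch, not from the notes), applying the mean and variance formulas above:

values = [1, 2, 3, 4, 5, 6]
n = len(values)

mu = sum(values) / n                                  # (1/n) * sum of x_i
var = sum(v ** 2 for v in values) / n - mu ** 2       # (1/n) * sum of x_i^2, minus mu^2
print(mu, var)   # 3.5 and 2.9166...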

