Discrete Random Variables and Probability Distributions
DC-1
Semester-II
Paper-III: Statistical Methods in Economics-I
Lesson: Discrete Random Variables And Probability
Distributions
Lesson Developer: Chandra Goswami
College/Department: Department of Economics, Dyal
Singh College, University of Delhi
TABLE OF CONTENTS
Learning Objectives
1. Random Experiments
2. Random Variables
Practice Questions
Reference
Jay L. Devore: Probability and Statistics for Engineering and the Sciences, Cengage
Learning, 8th edition [Chapter 3]
Learning Objectives:
In this chapter you will learn what a random variable is and the two fundamentally
different types of random variables. You will learn how to arrive at the probability
distributions of discrete random variables and how to represent them graphically as
well as by summary expressions. This provides the tool for evaluating the probability
that the random variable takes on a specific value or a range of values. You will also
learn how the probability distribution can be used to specify a mathematical model for
the population distribution, which will help you identify the characteristics of the
population. The chapter ends with practice questions so that you can test your
understanding of the chapter contents.
Chapter Outline
1. Random experiments
2. Random variables
3. Probability distributions for discrete random variables
4. Graphical presentations of probability distributions
5. Parameters of a probability distribution
6. The cumulative distribution function for discrete random variables
7. Deriving probability mass function from cumulative distribution function
1. RANDOM EXPERIMENTS
A random or chance experiment is an experiment which yields different possible
outcomes. These outcomes may be qualitative or quantitative. In case of qualitative
outcomes, we observe a specific attribute of the variable. Quantitative outcomes result
when we observe a number describing the attribute of the variable. Until the outcome is
observed there is uncertainty about which particular outcome will be the result of the
experiment. If the experiment is repeated under identical conditions different outcomes
are likely to be observed at each trial.
Example 1.1
If a balanced coin is tossed there are two equally possible (qualitative) outcomes, a head
(H) or a tail (T).
Example 1.2
It is known that wind speed and direction affect the time taken by aircraft to reach their
destination. The three possible outcomes for arrival time on any day are: before time, on
time, or delayed.
Example 1.3
If an unbiased die is tossed it will result in one of six possible outcomes, depending on
which face shows up: 1, 2, 3, 4, 5, or 6.
Example 1.4
If example 1.2 is restated to measure the extent of time delay in the aircraft reaching its
destination, we can denote the possible outcomes as x = 0 for on-time arrival (i.e., as per
the scheduled time), x < 0 as a measure of before-time arrival (e.g., x = −5 indicates
arrival 5 minutes ahead of the scheduled time), and x > 0 for late arrival (e.g., x = 22
represents arrival 22 minutes after the scheduled time). We obtain an infinite number of
possible outcomes since time is a continuous variable. Here the extent of time delay (in
minutes) is the variable, where x can be negative, zero, or positive.
Example 1.5
A bottling plant fills cold drinks in 200 ml bottles for its client. Although the machine is
calibrated to dispense 200 ml per fill, it is noted that the fill amount varies from bottle to
bottle by small amounts. If we denote X = amount filled in a bottle (in ml), since volume
is a continuous variable, the possible values of the variable are the real numbers in an
interval around x = 200.
The outcomes in examples 1.1 and 1.2 are qualitative, and quantitative in examples 1.3,
1.4 and 1.5. There are a finite number of outcomes in examples 1.1, 1.2 and 1.3, whereas
the number of outcomes is infinite in examples 1.4 and 1.5. In methods of statistical
analysis we often need some numerical aspects of experimental outcomes. The mean, for
instance, is a numerical function of the outcomes.
2. RANDOM VARIABLES
If the exhaustive set of all possible outcomes of a random experiment is known, then
probabilities of occurrence can be assigned to the different outcomes. The concept of a
random variable allows us to attach numbers, and hence probabilities, to the outcomes
of an experiment.
Definition 1
For a given sample space S of some experiment, a random variable is any rule that
associates a number with each outcome in S.
A random variable (rv) is thus a function defined over the elements of S. The domain of
the rv is the sample space and the range is a set of real numbers. A random variable is,
therefore, a variable that takes on numerical values determined by the outcome of a
random experiment. Thus, the value of the random variable will vary according to the
observed outcome of a random experiment. In general, random variables are functions
that associate numbers with some specific attribute of an experimental outcome. Random
variables will be denoted by uppercase letters, such as X and Y, and their values by the
corresponding lowercase letters, such as x and y.
Since the outcomes of a random experiment can be designated as a random variable, any
numerical function of the outcomes is also a random variable. It is random since its value
depends on which particular outcomes are observed. It is a variable since different
numerical values are possible. We can, therefore, assign probabilities to its possible
values. Therefore we can say that a random variable is a variable which can take one of
the different possible values in the sample space with an assigned probability. If X
denotes the rv and s the sample outcome, then X(s) = q where q is a real number.
Example 2.1
If X is a rv with m possible values x1, x2, x3,….xm and Y is a rv with n possible values
y1, y2,….yn then the linear function X + Y is also a random variable since x + y = xi + yj
where i = 1,2,….,m, and j = 1, 2,…..,n.
Exercise 1
A balanced coin and a fair die are tossed simultaneously. List the different possible
outcomes.
Solution:
Two possible outcomes of the coin are head (H) or tail (T). Six possible outcomes of the
die are 1, 2, 3, 4, 5, and 6. Since the coin and die are tossed simultaneously the possible
outcomes are as follows:
(H,1); (H,2); (H,3); (H,4); (H,5); (H,6); (T,1); (T,2); (T,3); (T,4); (T,5); (T,6)
Exercise 2
In Exercise 1, if a head is denoted by 1 and a tail by 0, so that x = 0, 1 and
y = 1, 2, 3, 4, 5, 6, then list the different possible outcomes for the linear function X + Y.
Solution:
x + y = 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7 according to the combinations listed in exercise 1
Exercise 3
Assigning appropriate probabilities to the values of the random variables X and Y in
exercise 2, determine the probabilities of x + y.
Solution:
Since the coin is balanced, P(x=0) = p(0) = ½ and P(x=1) = p(1) = ½. Similarly, for the
fair die, p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6.
Since X and Y are independent, there are 12 possible equally likely outcomes. Therefore,
p(1) = p(7) = 1/12 and p(2) = p(3) = p(4) = p(5) = p(6) = 2/12
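The probabilities assigned in exercises 1 to 3 can be checked by direct enumeration. The following Python sketch (the variable names are illustrative, not from the text) tallies the probability of each value of X + Y over the 12 equally likely (coin, die) outcomes, multiplying probabilities because the coin and die are independent:

```python
from fractions import Fraction
from collections import defaultdict

# tail = 0, head = 1 for the balanced coin; 1..6 for the fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {v: Fraction(1, 6) for v in range(1, 7)}

# Enumerate all (coin, die) outcomes and accumulate P(X + Y = s).
pmf = defaultdict(Fraction)
for x, px in coin.items():
    for y, py in die.items():
        pmf[x + y] += px * py  # independence: multiply probabilities

for s in sorted(pmf):
    print(s, pmf[s])
```

The run reproduces the solution above: the extreme sums 1 and 7 each have probability 1/12, while every sum from 2 to 6 has probability 2/12 = 1/6.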
Definition 2
Any random variable whose only possible values are 0 and 1 is called a Bernoulli
random variable
If an unbiased coin is tossed repeatedly, on each toss there are only two possible
outcomes so it is a Bernoulli rv. If an experiment can result in only two possible
outcomes – success or failure – in each trial, we have a Bernoulli random variable.
There are fundamentally two different types of random variables: discrete random
variables and continuous random variables. The distinction between discrete and
continuous random variables lies in the number of possible values the rv can take. If the
rv can have a finite number or a countably infinite number of possible values it is a
discrete rv. If, on the other hand, the outcome can be any real number in a given interval,
the number of possibilities is uncountably infinite, and the rv is said to be continuous.
Definition 3
A discrete random variable is a rv whose possible values either constitute a finite set or
else can be listed in an infinite sequence which is “countably” infinite, where there is a
first element, a second element, a third element and so on.
Examples 1.1, 1.2 and 1.3 have possible values which constitute a finite set. The same is
true of exercises 1 and 2. In all these cases the possible outcomes can be counted.
Example 2.2
A new company wishes to establish its brand image. For this purpose it runs a series of
weekly newspaper advertisements until sales of its products reach the target level.
Reaching the level of target sales is considered a success. Success may be achieved in 1
week or 2 weeks or 3 weeks and so on. If we denote success by S and failure by F then
the sample space is {S, FS, FFS, FFFS, …}. We can define the random variable
X = number of weeks before the advertising campaign ends. Then, X(S) = 1, X(FS) = 2,
X(FFS) = 3, X(FFFS) =4, and so on. Any positive integer is a possible value of X. Thus,
the set of possible values of the rv X is countably infinite.
Definition 4
A random variable is continuous if both the following conditions apply
1. Its set of possible values consists either of all numbers in an interval on the
number line or all numbers in a disjoint union of such intervals.
2. No possible value of the random variable has a positive probability.
Condition 1 implies that there is no way to create a listing of all the infinite number of
possible values of the variable. Condition 2 implies that intervals of values have positive
probability. As the width of the interval diminishes, probability of the interval decreases.
In the limit, probability of the interval is zero as the width of the interval reduces to zero.
Example 2.3
A university team is scheduled to visit at any minute during a three-hour-long
examination starting at 9 am. We may want to find the probability that the team visits at a
given time, or we may be interested in the probability that the visit takes place during a
given time interval. The sample space is from 0 to 180 minutes. The probability that the
team arrives at any single exact minute is zero, while the probability that it arrives within
a given interval of time is positive.
Variables such as time, height, distance, temperature, area, volume, weight, etc that
require measurement are continuous. In practice, however, limitations of measurement
instruments often do not allow measurement on a continuous scale. Yet it is useful to
study models of continuous variables as they often reflect real world situations.
3. PROBABILITY DISTRIBUTIONS FOR DISCRETE RANDOM VARIABLES
The probability distribution of a discrete random variable describes how probabilities are
assigned to the various possible values of the rv X. The probability assigned to any value
x of the rv will be denoted by p(x).
Definition 5
The probability distribution or probability mass function (pmf) of a discrete random
variable is defined for every number x by p(x) = P(X = x) for each x within the range of
X.
Based on the postulates of probability, a function can serve as the pmf of X if and only if
p(x) satisfies the following two conditions:
1. 0 ≤ p(x) ≤ 1 for each value within its domain
2. Σx p(x) = 1, where the summation is over all values within its domain.
The first condition states that probability cannot be negative or exceed 1. The second
condition follows from the fact that all possible values of X are mutually exclusive and
collectively exhaustive so that the sum of the probabilities must equal 1. Thus, any
function which satisfies both properties can serve as the pmf of a discrete random
variable. Examples of pmf are Bernoulli Distribution, discrete Uniform Distribution,
Binomial Distribution, Negative Binomial Distribution, Hypergeometric Distribution and
Poisson Distribution.
Note that a function which satisfies the two conditions for one set of values of X may not
do so for another set of values. In the latter case the function cannot serve as a pmf of X.
To test whether a function is a pmf we need to check whether both conditions are
satisfied for the given X values.
Exercise 4
A balanced coin is tossed three times. Let X denote the rv that is defined as the total
number of heads. List the elements of the sample space and obtain the probability
distribution of the total number of heads observed. Find a formula for the pmf of the total
number of heads observed in three tosses of a fair coin.
Solution:
Denoting H = head and T = tail, elements of the sample space are
TTT, TTH, THT, HTT, THH, HTH, HHT, HHH.
Let the rv X = total number of heads observed in 3 tosses of a balanced coin. For a
balanced coin a head and a tail are equally likely outcomes so that P(H) = P(T) = ½. It
can be assumed that the outcome of any toss is independent of the outcomes of the other
two tosses of the coin. Then,
P(TTT) = P(X = 0) = p(0) = (1/2)(1/2)(1/2) = 1/8
P(TTH or THT or HTT) = P(X = 1) = p(1) = 3/8
P(THH or HHT or HTH) = P(X = 2) = p(2) = 3/8
P(HHH) = P(X = 3) = p(3) = 1/8
The probability distribution or pmf of X is given in the following table:

x      0     1     2     3
p(x)  1/8   3/8   3/8   1/8

Both conditions for a pmf are satisfied, since 0 ≤ p(x) ≤ 1 for x = 0, 1, 2 and 3, and
Σx p(x) = 1.
Based on the probabilities we observe that the numerators of the four fractions 1/8, 3/8,
3/8 and 1/8 are the binomial coefficients C(3, 0), C(3, 1), C(3, 2) and C(3, 3), where
C(n, x) = n!/[x!(n − x)!]. The formula for the pmf can, therefore, be written as
p(x) = C(3, x)/8 for x = 0, 1, 2 and 3.
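As a minimal sketch of this formula in Python (the helper name heads_pmf is our own), the pmf of the number of heads in n tosses of a fair coin is p(x) = C(n, x)/2^n; with n = 3 it reproduces the table above:

```python
from math import comb
from fractions import Fraction

# pmf of X = total heads in n tosses of a fair coin: p(x) = C(n, x) / 2**n
def heads_pmf(n):
    return {x: Fraction(comb(n, x), 2 ** n) for x in range(n + 1)}

for x, p in heads_pmf(3).items():
    print(x, p)
```

The same function with n = 4 gives the pmf required in exercise 6 below.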
Exercise 5
A computer shop sells desktops, laptops, notebooks and tablets. A prospective buyer
enters the shop. The random variable can take five possible values. X = 0 if no purchase
is made, X = 1 if a tablet is purchased, X = 2 if a notebook is purchased, X = 3 if a laptop
is bought, and X = 4 if a desktop is bought. If 40% of buyers purchase a tablet, 35%
buyers opt for a notebook, 20% a laptop and 5% a desktop, what is the probability
distribution of X?
Solution:
The pmf is as follows:

x      0     1     2     3     4
p(x)   0    0.40  0.35  0.20  0.05
Exercise 6
A balanced coin is tossed four times. Use the formula derived in exercise 4 to obtain the
pmf of X = total number of heads in four tosses of the coin.
Solution:
The total number of possible outcomes is 2^4 = 16, as the result of each toss is
independent of the remaining three tosses. Using the formula p(x) = C(4, x)/16, the pmf
is as follows:
x      0     1     2     3     4
p(x)  1/16  4/16  6/16  4/16  1/16
Exercise 7
Check whether the function given by f(x) = (x + 4)/30 for x = 0, 1, 2, 3, 4 can serve as
the probability distribution of a discrete random variable.
Solution:
For given values of x the value of the function is as follows:
f(0) = 4/30, f(1) = 5/30, f(2) = 6/30, f(3) = 7/30, f(4) = 8/30
Each of the above values is a positive fraction less than 1, so the first condition for a
pmf is satisfied. The sum of all the values of f(x) is Σf(x) = (4 + 5 + 6 + 7 + 8)/30 = 1,
so the second condition is also satisfied. Since both required conditions hold, the given
function can serve as a pmf for a rv taking the values 0, 1, 2, 3, and 4.
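The two-condition check used in this solution can be mechanized. The Python sketch below (the function name is illustrative) tests both pmf conditions for a list of candidate probabilities, using exact fractions so the sum-to-one test is not affected by rounding:

```python
from fractions import Fraction

# A function can serve as a pmf iff every value lies in [0, 1]
# and the values sum to exactly 1.
def satisfies_pmf_conditions(probs):
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# f(x) = (x + 4)/30 for x = 0, 1, 2, 3, 4 (exercise 7)
f = [Fraction(x + 4, 30) for x in range(5)]
print(satisfies_pmf_conditions(f))
```

The same check answers questions of the form "could p(y) = y^2/50 for y = 1, …, 5 be a pmf?" in the practice questions: there the values sum to 55/50, so the second condition fails.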
4. GRAPHICAL PRESENTATIONS OF PROBABILITY DISTRIBUTIONS
The pmf tells us how the total probability of 1 is distributed at various points on the
number line. The pmf can be presented graphically in probability histograms.
For a probability histogram, above each x with p(x) > 0 construct a rectangle centered at
x. The height of each rectangle is proportional to p(x). The area of the rectangle equals
p(xi) for X = xi. If the base of each rectangle is of unit width, then the height will be
equal to p(xi) for X = xi.
Example 4.1
The pmf of exercise 5 is

x      0     1     2     3     4
p(x)   0    0.40  0.35  0.20  0.05

For all x > 4, p(x) = 0. The probability histogram is drawn by representing 1 with the
interval 0.5 to 1.5, 2 with the interval 1.5 to 2.5, 3 with the interval 2.5 to 3.5, and so on.
The line graph and bar chart are also referred to as histograms. The line graph is drawn
by drawing lines of height p(x) for corresponding x values. The bar chart is drawn with
each rectangle centered at the x value with a height equal to the probability of the
corresponding value of the rv. The line graph and bar chart for the pmf of ex. 5 are
illustrated in Fig 2 and Fig 3 respectively.
5. PARAMETERS OF A PROBABILITY DISTRIBUTION

Definition 6
Suppose p(x) depends on a quantity that can be assigned any one of a number of possible
values, with each different value determining a different probability distribution. Such a
quantity is called a parameter of the distribution.
The collection of all probability distributions for different values of the parameter is
called a family of probability distributions.
Example 5.1
We consider a random experiment that can give rise to just two possible mutually
exclusive and exhaustive outcomes 0 and 1. Then p(0) + p(1) = 1. Such a rv is called a
Bernoulli random variable. If we select α such that 0 < α < 1, the pmf of the Bernoulli rv
can then be expressed as
p(x) = 1 − α   if x = 0
       α       if x = 1
       0       otherwise
For each of the possible values of α in the interval between 0 and 1, we obtain a different
probability distribution. We thus obtain a family of Bernoulli distributions with each pmf
determined by a particular value of α. Since the pmf depends on the particular value of α
we often write the pmf of the Bernoulli distribution as p(x; α) rather than just p(x). The
quantity α in the Bernoulli pmf is a parameter. The value of the parameter α distinguishes
one Bernoulli distribution from another. If α can take any value in the interval 0 to 1, we
obtain an infinite number of Bernoulli distributions, each for a different value of α.
The value of the parameter may be unknown. If the population size is very large it may
not be possible to examine all the population values to ascertain the value of α. We can
then use sample data to infer about the parameter value α, where the sample is a
representative subset of the population.
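As a sketch, the Bernoulli family can be written in Python as a function of both x and the parameter (the function name and the identifier alpha are our own); each admissible value of alpha picks out a different member of the family:

```python
# Bernoulli pmf p(x; alpha): each alpha in (0, 1) determines a different
# probability distribution, so alpha is a parameter of the family.
def bernoulli_pmf(x, alpha):
    if x == 1:
        return alpha
    if x == 0:
        return 1 - alpha
    return 0  # p(x) = 0 for any value other than 0 or 1

# three different members of the Bernoulli family
for alpha in (0.2, 0.5, 0.8):
    print(alpha, bernoulli_pmf(0, alpha), bernoulli_pmf(1, alpha))
```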
Example 5.2
If the discrete rv X can take any value x1, x2, x3, ……… xn with equal probability we
have a discrete Uniform Distribution. We can denote the minimum value x1 = α, and the
maximum value xn = β. Then the pmf of the Uniform Distribution can be expressed as
p(x) = 1/n   for x = x1, x2, …, xn
       0     otherwise
We obtain a family of uniform distributions with the pmf of each distribution determined
by a particular set of values for α and β. The pmf can be denoted by p(x; α, β), where α
and β are the parameters of the distribution. For different combinations of values of α
and β we obtain different uniform distributions.
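A corresponding sketch for the discrete uniform family (the helper name is illustrative): every one of the n listed values receives probability 1/n, so the set of values itself determines the member of the family:

```python
from fractions import Fraction

# Discrete uniform pmf: each of the n listed values has probability 1/n.
def uniform_pmf(values):
    n = len(values)
    return {x: Fraction(1, n) for x in values}

# the fair-die distribution of example 1.3 is the member with n = 6
die = uniform_pmf([1, 2, 3, 4, 5, 6])
print(die)
```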
6. THE CUMULATIVE DISTRIBUTION FUNCTION FOR DISCRETE RANDOM VARIABLES

Definition 7
The cumulative distribution function F(x) of a discrete random variable X with pmf p(x)
is defined for every number x by F(x) = P(X ≤ x) = Σ_{y: y ≤ x} p(y)
Thus, the cdf is obtained by summing the pmf p(y) over all possible values y of X
satisfying y ≤ x. We use F(x) to calculate the probability that the observed value of X
does not exceed x. It follows that P(X < x) ≤ P(X ≤ x), since the value x is included in
P(X ≤ x) but not in P(X < x). Only if P(X = x) = 0 does P(X < x) = P(X ≤ x). In all
other cases, where P(X = x) > 0, the strict inequality P(X < x) < P(X ≤ x) holds.
Two properties of F(x) follow from the definition: (1) 0 ≤ F(x) ≤ 1, and
(2) F(a) ≤ F(b) whenever a < b. The first property states that F(x) is non-negative.
F(x) = 0 for any value of x that is less than the smallest possible X value of the pmf,
since p(x) = 0 for all such values. It follows that once all possible values of X have been
included, F(x) = 1; for higher values of x we again have p(x) = 0, so F(x) remains
unchanged at 1. The second property implies that if p(b) = 0 then F(a) = F(b);
otherwise F(a) < F(b) when a < b.
The graph of F(x) is a step function. If X is a discrete rv whose set of possible values is
x1, x2, …, where x1 < x2 < x3 < …, the value of F(x) is constant in the interval between
two successive values xi-1 and xi, and then increases by p(xi) at xi. F(x) again remains
flat between xi and xi+1, then jumps up (takes a step) by p(xi+1) at xi+1. This is
illustrated in Figure 4.
Since F(xi-1) < F(xi) and F(xi) < F(xi+1), at all points of discontinuity the cdf takes on the
greater of the two values. This is indicated by heavy dots in Figure 4. It can be seen that
as x increases, the cdf will change values only at those points that can be taken by the rv
with positive probability.
Example 6.1
Using the pmf in exercise 5,

x      0     1     2     3     4
p(x)   0    0.40  0.35  0.20  0.05

F(0) = P(X ≤ 0) = 0
F(1) = P(X = 0 or 1) = 0 + 0.40 = 0.40
F(2) = P(X = 0, 1 or 2) = 0.40 + 0.35 = 0.75
F(3) = P(X = 0, 1, 2 or 3) = 0.75 + 0.20 = 0.95
F(4) = P(X = 0, 1, 2, 3 or 4) = 0.95 + 0.05 = 1
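The cumulative sums in this example are just a running total of the pmf. A short Python sketch of the same computation (variable names are our own; exact fractions avoid floating-point rounding):

```python
from fractions import Fraction
from itertools import accumulate

# pmf from exercise 5: p(0) = 0, p(1) = 0.40, p(2) = 0.35, p(3) = 0.20, p(4) = 0.05
xs = [0, 1, 2, 3, 4]
p = [Fraction(0), Fraction(40, 100), Fraction(35, 100),
     Fraction(20, 100), Fraction(5, 100)]

# F(x) = sum of p(y) over all y <= x: a running (cumulative) sum
F = dict(zip(xs, accumulate(p)))
for x in xs:
    print(x, F[x])
```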
7. DERIVING PROBABILITY MASS FUNCTION FROM CUMULATIVE
DISTRIBUTION FUNCTION

Example 7.1
Given the cdf obtained in example 6.1,

F(x) = 0     for x < 1
       0.40  for 1 ≤ x < 2
       0.75  for 2 ≤ x < 3
       0.95  for 3 ≤ x < 4
       1     for 4 ≤ x

we get
p(0) = 0
p(1) = 0.4 -0 = 0.4
p(2) = 0.75 – 0.4 = 0.35
p(3) = 0.95 – 0.75 = 0.20
p(4) = 1 – 0.95 = 0.05
To obtain the probability that the value of X falls in the interval [a, b], where a ≤ b and
both a and b are included in the interval, we compute P(a ≤ X ≤ b) = F(b) − F(a−),
where a− denotes the largest possible X value that is strictly less than a. If the only
possible values of X are integers, so that a and b are both integers, then
P(a ≤ X ≤ b) = P(X = a or a + 1 or a + 2 or … or b) = F(b) − F(a − 1)
This principle can be used to find the probability that X takes the value a. By setting
b = a we obtain P(X = a) = p(a) = F(a) − F(a − 1).
This method is used to derive the pmf from the cdf.
We can similarly compute P(a < X < b) = F(b − 1) − F(a) when a and b are integers
and neither is included in the interval.
Note that F(b) − F(a) gives us P(a < X ≤ b), where b is included in the interval but a is
not included.
Example 7.2
Given the cdf obtained in example 6.1,
P(1 ≤ X ≤ 3) = F(3) − F(0) = 0.95 − 0 = 0.95
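For integer-valued X this calculation is a direct application of P(a ≤ X ≤ b) = F(b) − F(a − 1). A minimal Python sketch (the helper name prob_between is our own), using the cdf of example 6.1:

```python
from fractions import Fraction

# cdf values from example 6.1, for integer-valued X with support 0..4
F = {0: Fraction(0), 1: Fraction(40, 100), 2: Fraction(75, 100),
     3: Fraction(95, 100), 4: Fraction(1)}

def prob_between(a, b):
    # P(a <= X <= b) = F(b) - F(a - 1); F is 0 below the support
    return F[b] - F.get(a - 1, Fraction(0))

print(prob_between(1, 3))  # F(3) - F(0)
```

Setting b = a recovers the pmf, e.g. prob_between(2, 2) gives F(2) − F(1) = 0.35 = p(2), the differencing method of example 7.1.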
Exercise 8
A study of number of delayed flights in an hour (X) at an airport due to fog in winter
revealed the following probability distribution of the rv X.
x 0 1 2 3 4 5 6
PRACTICE QUESTIONS
1. Suppose one die has spots 1, 2, 2, 3, 3, 4 and a second die has spots 1, 3, 4, 5, 6,
8. If both dice are rolled, list the sample space (all possible outcomes). Let the rv
X = total number of spots showing. What is the pmf of X? Show that this pmf is
the same as that for two normal dice, each having 1, 2, 3, 4, 5, 6 spots.
3. Urn 1 and urn 2 each have two red balls and two white balls. Two balls are drawn
simultaneously from each urn. Let
X1 = number of red balls in the sample from first urn, and
X2 = number of red balls in the sample from the second urn.
Find the pmf of X1 + X2
4. An urn contains four balls numbered 1, 2, 3, and 4. If two balls are drawn from
the urn at random and Z is the sum of the numbers on the two balls, find
(a) the probability distribution of Z and draw the histogram
(b) the cdf of Z and draw its graph
5. A coin is biased so that heads is twice as likely as tails. For three independent
tosses of the coin, find
(a) the probability distribution of X, the total number of heads
(b) the probability of getting at most two heads, using the cdf of X
(c) P(1 < X < 3) and P(X > 2), using the cdf
6. The amount of coffee (in grams) in a 230-gm jar filled by a certain machine is a
random variable whose probability density is given by
f(x) = 0     for x < 227.5
       1/5   for 227.5 ≤ x ≤ 232.5
       0     for x > 232.5
Find the probabilities that a 230-gram jar filled by this machine will contain
(a) at most 228.65 gm of coffee
(b) anywhere from 229.34 to 231.66 gm of coffee
(c) at least 229.85 gm of coffee
8. Given the following cdf, derive the pmf of Y and draw the
(a) histogram of the pmf
(b) graph of the cdf
F(y) = 0     for y < 1
       0.05  for 1 ≤ y < 2
       0.15  for 2 ≤ y < 4
       0.50  for 4 ≤ y < 8
       0.90  for 8 ≤ y < 16
       1     for 16 ≤ y
(d) Could p(y) = y^2/50 for y = 1, …, 5 be the pmf of Y?
x 1 3 4 6 12
p(x) 0.30 0.10 0.05 0.15 0.40
(i) Derive the cumulative distribution function (cdf) of X and draw the graph
of this cdf
(ii) Using the cdf, compute P(3 ≤ X < 6), P(3 < X < 6), and P(4 ≤ X).