Chapter 3: Random Variables and Probability Distributions
Statistics is concerned with making inferences about populations and population characteristics. Experiments are conducted with results that are subject to chance. The testing of a number of electronic components is an example of a statistical experiment, a term that is used to describe any process by which several chance observations are generated. It is often important to assign a numerical description to the outcome. For example, the sample space giving a detailed description of each possible outcome when three electronic components are tested (D = defective, N = nondefective) may be written

S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD}.
Definition
A random variable is a function that associates a real number with each element in
the sample space.
In this course, we’ll deal with two types of random variables, namely, discrete and continuous.
Definition
A random variable is a discrete random variable if it can take on no more than a countable (finitely many or countably infinite) number of values.
Definition
A random variable is a continuous random variable if it can take any value in an
interval.
Example. Consider the experiment of flipping a coin three times and let X be the
number of heads. Find the values of the random variable X and state whether it is
discrete or continuous.
Solution.
The sample space is S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}, so X takes the values 0, 1, 2, and 3. Since X takes only finitely many values, X is a discrete random variable.
Example. Consider the experiment of flipping a coin repeatedly until the first occurrence
of a head and let Y be the number of flips. Find the values of the random variable Y
and state whether it is discrete or continuous.
Solution.
The sample space is S = {H, TH, TTH, TTTH, . . .}, where the k-th outcome consists of k − 1 tails followed by a head. Hence Y takes the values 1, 2, 3, . . .; since this set is countably infinite, Y is a discrete random variable.
Example. Consider the experiment of filling 12-ounce cans with coffee and let Z be the amount of coffee in a randomly chosen can. Find the values of the random variable Z
and state whether it is discrete or continuous.
Solution.
Z can take any value between 0 and 12, that is, 0 ≤ z ≤ 12. Hence, Z is a continuous random variable.
A discrete random variable assumes each of its values with a certain probability.
Definition
The probability mass function (pmf) or probability distribution (function), f (x), of
a discrete random variable X represents the probability that X takes the value x. That
is,
f (x) = P {X = x} for all values of x.
Example. Consider the experiment of flipping a coin three times and let X be the
number of heads. Find f (x) for all values of x and show the result in a table, a graph,
and a function representation.
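As a quick cross-check of the table (a Python sketch, not part of the original notes): the 8 outcomes are equally likely, so the pmf can be tabulated by direct enumeration.

```python
from itertools import product

# Illustrative sketch: enumerate the 8 equally likely outcomes of three
# fair coin flips and tabulate the pmf of X = number of heads.
outcomes = list(product("HT", repeat=3))
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, 0) + 1 / len(outcomes)

for x in sorted(pmf):
    print(x, pmf[x])  # f(0) = f(3) = 1/8, f(1) = f(2) = 3/8
```

The same counts, 1, 3, 3, 1 out of 8, also follow from the binomial coefficients C(3, x).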
Example. Check whether the following functions can serve as probability distribution functions of appropriate random variables:

a) f(x) = C(3, x)/8, x = 0, 1, 2, 3, where C(3, x) denotes the binomial coefficient,

b) f(x) = (x + 2)/12, x = 1, 2, 3,

c) f(x) = (x² − 1)/25, x = 0, 1, 2, 3, 4.
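A mechanical check (a Python sketch, not part of the notes) of the two pmf requirements, f(x) ≥ 0 for every x and Σ f(x) = 1, using exact fractions; the candidates are taken to be C(3, x)/8, (x + 2)/12, and (x² − 1)/25 as in the example.

```python
from fractions import Fraction
from math import comb

# Candidate pmf values over their stated supports (exact arithmetic).
candidates = {
    "a": [Fraction(comb(3, x), 8) for x in range(4)],
    "b": [Fraction(x + 2, 12) for x in range(1, 4)],
    "c": [Fraction(x**2 - 1, 25) for x in range(5)],
}

def is_valid_pmf(values):
    # Both requirements: nonnegativity and total probability one.
    return all(f >= 0 for f in values) and sum(values) == 1

results = {name: is_valid_pmf(vals) for name, vals in candidates.items()}
print(results)  # c) fails since f(0) = -1/25 < 0
```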
Definition
The cumulative (probability) distribution function (cdf), F(x), of a discrete random variable X represents the probability that X does not exceed the value x. That is,

F(x) = P{X ≤ x} = Σ_{t ≤ x} f(t) for −∞ < x < ∞.
Example. A car agency sells 60% of its inventory of a certain foreign car equipped with side airbags.
a) Find a formula for the probability distribution of the number of cars with side airbags
among the next 4 cars sold by the agency.
b) Find the cumulative distribution function of the random variable defined in part a)
and show the result in a table and a graph representation.
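A numerical sketch (not part of the notes, and it assumes the four sales are independent): X then follows a binomial model with n = 4 and p = 0.6, so f(x) = C(4, x)(0.6)^x(0.4)^(4−x) and F(x) is its running sum.

```python
from math import comb

# Binomial sketch: X = number of cars with side airbags among the next
# 4 sold, each sale independently having probability p = 0.6.
n, p = 4, 0.6

def f(x):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def F(x):
    # cdf as the cumulative sum of the pmf
    return sum(f(t) for t in range(0, x + 1))

for x in range(n + 1):
    print(x, round(f(x), 4), round(F(x), 4))
```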
A continuous random variable has probability 0 of assuming exactly any of its values. Consequently, its probability distribution cannot be given in table form.
Although the probability distribution of a continuous random variable cannot be presented in table form, it can be stated as a formula. Such a formula would necessarily be a function of the numerical values of the continuous random variable X and as such will be represented by the functional notation f(x).
Definition
The probability density function (pdf) for the continuous random variable X is
constructed so that the area under its curve bounded by the x axis is equal to 1 when
computed over the range of X for which f (x) is defined.
The probability that X assumes a value between a and b is equal to the area under the probability density function between the ordinates at x = a and x = b, and is given by

P(a < X < b) = ∫_a^b f(x) dx.
Note: When X is continuous, it does not matter whether we include an endpoint of the interval or not, that is,

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b).

Let X be a continuous random variable with probability density function f(x). Then

1. f(x) ≥ 0 for all x ∈ R, and
2. ∫_{−∞}^{∞} f(x) dx = 1.
Definition
The cumulative distribution function (cdf), F(x), of a continuous random variable X with density function f(x) is

F(x) = P{X ≤ x} = P{X < x} = ∫_{−∞}^{x} f(t) dt for −∞ < x < ∞,

and

f(x) = dF(x)/dx, if the derivative exists.
Example. Suppose that the error in the reaction temperature, in ◦C, for a controlled laboratory experiment is a continuous random variable X having the probability density function

f(x) = { cx², −1 < x < 2,
         0, elsewhere.

a) Evaluate c.

b) Find F(x) and use it to calculate P(1/4 < X < 1/2).
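As a numerical cross-check (a Python sketch, not part of the notes): normalization gives ∫_{−1}^{2} cx² dx = 3c = 1, so c = 1/3, and then F(x) = (x³ + 1)/9 on −1 < x < 2, which yields P(1/4 < X < 1/2) = 7/576 ≈ 0.0122. A midpoint-rule integration confirms both numbers.

```python
# Midpoint-rule check of the temperature-error density.
def f(x, c=1/3):
    # density cx^2 on (-1, 2), zero elsewhere; c = 1/3 from normalization
    return c * x**2 if -1 < x < 2 else 0.0

def integrate(g, a, b, n=100_000):
    # composite midpoint rule
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, -1, 2)     # total probability, should be ~1
prob = integrate(f, 0.25, 0.5)  # P(1/4 < X < 1/2) ~ 7/576
print(round(total, 6), round(prob, 6))
```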
Example. The Department of Energy (DOE) puts projects out on bid and generally estimates what a reasonable bid should be. Call the estimate b. The DOE has determined that the density function of the winning (low) bid is

f(y) = { 5/(8b), (2/5)b ≤ y ≤ 2b,
         0, otherwise.
Find F (y ) and use it to determine the probability that the winning bid is less than the
DOE’s preliminary estimate b.
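Integrating the constant density gives F(y) = 5y/(8b) − 1/4 on (2/5)b ≤ y ≤ 2b, so P(Y < b) = F(b) = 5/8 − 1/4 = 3/8 regardless of b. A small sketch (not from the notes; the values of b below are arbitrary) verifies this.

```python
# Sketch: cdf of the winning-bid density and P(Y < b) for several b.
def F(y, b):
    lo, hi = 2 * b / 5, 2 * b
    if y < lo:
        return 0.0
    if y > hi:
        return 1.0
    # F(y) = integral of 5/(8b) from (2/5)b to y
    return 5 * y / (8 * b) - 0.25

for b in (1.0, 10.0, 250.0):
    print(b, F(b, b))  # always 3/8 = 0.375
```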
Our study of random variables and their probability distributions in the preceding sections is restricted to one-dimensional sample spaces, in that we recorded outcomes of an experiment as values assumed by a single random variable. There will be situations, however, where we may find it desirable to record the simultaneous outcomes of several random variables.
If X and Y are two discrete random variables, the probability distribution for their simultaneous occurrence can be represented by a function with values fX,Y(x, y) for any pair of values (x, y) within the range of the random variables X and Y. We refer to this function as the joint probability distribution of X and Y. Hence, in the discrete case,
fX ,Y (x, y ) = P {X = x, Y = y } ,
that is, the values fX ,Y (x, y ) give the probability that outcomes x and y occur at the
same time.
Definition
For the discrete case, the marginal distributions of X alone and of Y alone are
fX(x) = Σ_y fX,Y(x, y) and fY(y) = Σ_x fX,Y(x, y).
Let X and Y be discrete random variables with joint probability density function fX ,Y (x, y ).
Then
1. fX,Y(x, y) ≥ 0 for all (x, y), and
2. Σ_x Σ_y fX,Y(x, y) = 1.
Example(*). Two ballpoint pens are selected at random from a box that contains 3 blue
pens, 2 red pens, and 3 green pens. If X is the number of blue pens selected and Y is
the number of red pens selected, find
a) the joint probability function fX ,Y (x, y ) and show it in a table representation,
b) the marginal densities of X and Y ,
c) P {(X , Y ) ∈ A}, where A is the region {(x, y ) |x + y ≤ 1}.
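A counting sketch (not part of the notes): with 2 pens drawn from 8, the joint probabilities follow a hypergeometric-style count, f(x, y) = C(3, x)C(2, y)C(3, 2 − x − y)/C(8, 2), and the marginals and P{X + Y ≤ 1} follow by summation.

```python
from math import comb

# Joint pmf for X = # blue and Y = # red among 2 pens drawn from
# 3 blue, 2 red, 3 green (8 pens total).
def f(x, y):
    if x < 0 or y < 0 or x + y > 2:
        return 0.0
    return comb(3, x) * comb(2, y) * comb(3, 2 - x - y) / comb(8, 2)

# Marginal distributions by summing over the other variable.
fX = {x: sum(f(x, y) for y in range(3)) for x in range(3)}
fY = {y: sum(f(x, y) for x in range(3)) for y in range(3)}

# P{(X, Y) in A} with A = {(x, y) : x + y <= 1}
p_A = sum(f(x, y) for x in range(3) for y in range(3) if x + y <= 1)
print(fX, fY, round(p_A, 4))  # p_A = 9/14
```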
When X and Y are two continuous random variables, the joint density function (fX ,Y (x, y ))
is a surface lying above the xy plane and P {(X , Y ) ∈ A}, where A is any region in the
xy plane, is equal to the volume of the right cylinder bounded by the base A and the
surface.
Definition
For the continuous case, the marginal distributions of X alone and of Y alone are
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy and fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Let X and Y be continuous random variables with joint probability density function
fX ,Y (x, y ). Then
1. fX,Y(x, y) ≥ 0 for all (x, y), and
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.
Example. A privately owned business operates both a drive-in facility and a walk-in
facility. On a randomly selected day, let X and Y , respectively, be the proportions of
the time that the drive-in and the walk-in facilities are in use, and suppose that the joint
density function of these random variables is
fX,Y(x, y) = { (2/5)(2x + 3y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
               0, elsewhere.
a) Verify that ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1,

b) Find the marginal densities of X and Y,

c) Find P{(X, Y) ∈ A}, where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.
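A numerical sketch (not part of the notes): a midpoint-rule double integral confirms that the density has total mass 1 over the unit square and that the probability over the region 0 < x < 1/2, 1/4 < y < 1/2 comes out to 13/160 = 0.08125 (the rule is exact here because the integrand is linear in each variable).

```python
# Midpoint-rule double integral for the drive-in/walk-in joint density.
def f(x, y):
    # joint density (2/5)(2x + 3y) on the unit square
    return 0.4 * (2 * x + 3 * y)

def integrate2d(g, ax, bx, ay, by, n=400):
    hx, hy = (bx - ax) / n, (by - ay) / n
    return sum(
        g(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
        for i in range(n)
        for j in range(n)
    ) * hx * hy

total = integrate2d(f, 0, 1, 0, 1)         # total probability
p_A = integrate2d(f, 0, 0.5, 0.25, 0.5)    # probability over A
print(round(total, 6), round(p_A, 6))
```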
Statistical Independence
Definition
Let X and Y be two random variables, discrete or continuous, with joint probability
distribution fX ,Y (x, y ) and marginal distributions fX (x) and fY (y ), respectively. The
random variables X and Y are said to be statistically independent if and only if
fX,Y(x, y) = fX(x) fY(y) for all (x, y) within their range.
Example. Show that the random variables X and Y in Example(*) are not statistically
independent.
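A sketch of the check (not part of the notes), reusing the counting formula from the two-pen example: independence fails as soon as one pair (x, y) has fX,Y(x, y) ≠ fX(x)fY(y), and (0, 1) is such a pair.

```python
from math import comb

# Joint pmf for X = # blue and Y = # red among 2 pens drawn from
# 3 blue, 2 red, 3 green.
def f(x, y):
    if x < 0 or y < 0 or x + y > 2:
        return 0.0
    return comb(3, x) * comb(2, y) * comb(3, 2 - x - y) / comb(8, 2)

fX = lambda x: sum(f(x, y) for y in range(3))
fY = lambda y: sum(f(x, y) for x in range(3))

# Compare the joint value with the product of marginals at (0, 1).
print(f(0, 1), fX(0) * fY(1))  # 6/28 vs (10/28)(12/28): not equal
```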
Definition
Let X1 , X2 , . . . , Xn be n random variables, discrete or continuous, with joint probability
distribution fX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) and marginal distributions fX1 (x1 ), fX2 (x2 ), . . . ,
fXn (xn ), respectively. The random variables X1 , X2 , . . . , Xn are said to be mutually
statistically independent if and only if
fX1,X2,...,Xn(x1, x2, . . . , xn) = fX1(x1) fX2(x2) · · · fXn(xn).
Example. Suppose that the shelf life, in years, of a certain perishable food product packaged in cardboard containers is a random variable whose probability density function is given by

f(x) = { e^{−x}, x > 0,
         0, elsewhere.
Let X1, X2, and X3 represent the shelf lives for three of these containers selected independently and find P(X1 < 2, 1 < X2 < 3, X3 > 2).
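A sketch of the computation (not part of the notes): for this density, P(X < a) = 1 − e^{−a}, P(a < X < b) = e^{−a} − e^{−b}, and P(X > a) = e^{−a}; by mutual independence the joint probability is the product of the three marginal probabilities.

```python
from math import exp

# Independent exponential(1) shelf lives: multiply marginal probabilities.
p1 = 1 - exp(-2)        # P(X1 < 2)
p2 = exp(-1) - exp(-3)  # P(1 < X2 < 3)
p3 = exp(-2)            # P(X3 > 2)
print(round(p1 * p2 * p3, 4))  # ~0.0372
```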