04_DiscreteRVs
ECE 3077 Notes by M. Davenport, J. Romberg, C. Rozell, and M. Wakin. Last updated 23:38, September 5, 2023
Discrete random variables
Until now, we have discussed probability almost entirely in the context
of events, in which there are only two possibilities: either they happen
or they don't, with some associated probabilities. This is limiting in
two ways. First, we often want to talk about numerical measurements
associated with these events (e.g., the probability of getting 2 heads
in 3 coin tosses, regardless of the exact sequence). Second, we often
want to think simultaneously about the probabilities associated with
all possible values of these measurements (e.g., the three probabilities
associated with getting 1, 2, or 3 heads in 3 coin tosses). In short, we
want a mathematical framework that lets us quantitatively consider
a variable that can take many values but is random in some way.
Definition: A random variable X is a mapping from the sample
space Ω to the real line:

X : Ω → R.
For example, if we toss a coin three times, we previously had to reason
about separate events: the event that there were no heads, the event
that there was one head, the event that there were two heads, the event
that there were three heads, etc. Now we can simply
talk about "the number of heads" as a (random) variable.
The examples above are discrete random variables because the values
they take are from a discrete set. Continuous random variables also
abound: the current temperature in the room, the amount of time
until a light bulb burns out, etc. Since the discrete case is easier, we
will start by developing some essential tools here and then generalize.
A word on notation
Random variables are just that — random; all of the interesting dis-
cussion about them occurs before we know their value. We will use
capital letters to denote random variables, e.g., X for the number of
heads in five flips.
Where this will really start to get important (and confusing) is when
we talk about multiple random variables, in which case it is easier (in
the long run) to let x denote the particular value that X might take, y
the value that Y might take, etc. This will seem a little unnatural at
first, but just remember: X (or Y or Z ...) is a random variable, while
x (or y or z) is just an old-fashioned regular variable representing an
(unknown) concrete value that X might take.
Probability mass functions (pmfs) for discrete
random variables
A random variable is completely described by the probabilities of the
values it can take. These are encapsulated in the probability mass
function for X, which we denote as pX(k); it answers the question
"what is the probability that a random variable has some particular
value?" For discrete random variables, the definition is straightforward:

pX(k) = P(X = k).

(Notice that we have adopted the notation P(X = k) over the strictly-
more-correct-but-also-more-clunky P({X = k}).)
Examples.
For an experiment involving the roll of one fair six-sided die,
let X be the number of “pips” facing upwards at the end of the
roll (i.e., the numerical value of the result of the roll). Then
pX(k) = 1/6,  k = 1, 2, . . . , 6,
        0,    otherwise.
[Figure: plot of pX(k) versus k, with bars of equal height 1/6 at k = 1, . . . , 6.]
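The pmf view lines up with relative frequencies over many trials. Here is a minimal simulation sketch comparing the empirical frequencies of a fair die roll against pX(k) = 1/6 (the sample size N = 100,000 is an arbitrary choice):

```python
# A minimal sketch: estimate the pmf of a fair die roll by simulation
# and compare with the exact value pX(k) = 1/6.
import random
from collections import Counter

N = 100_000
counts = Counter(random.randint(1, 6) for _ in range(N))

for k in range(1, 7):
    print(f"k = {k}: empirical {counts[k] / N:.4f}, exact {1/6:.4f}")
```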
For an experiment involving the roll of two fair six-sided dice,
let X be the sum of the values of the two rolls. Then
pX(k) = (k − 1)/36,   2 ≤ k ≤ 7,
        (13 − k)/36,  8 ≤ k ≤ 12,
        0,            otherwise.
This pmf is illustrated below:
[Figure: "sum of two dice pmf" — plot of pX(k) versus k; the bars rise linearly from k = 2 to a peak of 1/6 at k = 7, then fall to k = 12.]
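This pmf can also be obtained by brute-force enumeration of the 36 equally likely outcomes. A short sketch, using exact fractions for clarity:

```python
# A sketch computing the pmf of the sum of two fair dice by direct
# enumeration of the 36 equally likely outcomes.
from fractions import Fraction
from collections import defaultdict

pmf = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        pmf[d1 + d2] += Fraction(1, 36)

for k in range(2, 13):
    print(k, pmf[k])  # e.g., pmf[7] == Fraction(1, 6)
```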
An unfair coin is flipped 10 times; the probability that it lands
on heads on any one flip is 0.4, and the flips are independent of
one another. Let X be the total number of heads. Then

pX(k) = (10 choose k) (0.4)^k (0.6)^(10−k)   for k = 0, 1, 2, . . . , 10.
This pmf is illustrated below:
[Figure: "# of heads pmf" — plot of pX(k) versus k for k = 0, . . . , 10; the peak is at k = 4.]
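For a numerical sanity check, the sketch below evaluates this pmf directly using math.comb for the binomial coefficient; the parameters n = 10 and p = 0.4 are those of the example:

```python
# A sketch evaluating pX(k) = C(10, k) (0.4)^k (0.6)^(10-k) directly.
from math import comb

n, p = 10, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(pmf[4])    # the most likely count, P(X = 4) ≈ 0.2508
print(sum(pmf))  # sanity check: should be 1.0 (up to rounding)
```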
You open a book written in English to a random page and place
your finger down at a random location. Let X be the position in
the alphabet of the letter nearest your finger (k = 1 for "a" through
k = 26 for "z"). The pmf of X reflects the relative frequencies of
the letters in English text:
[Figure: plot of pX(k) versus k for k = 1, . . . , 26; the most common letters have probability near 0.12.]
Properties of pmfs
Every pmf obeys two properties which follow immediately from the
Kolmogorov axioms:
1. Positivity: pX(k) ≥ 0 for all k.

2. Normalization: Σ_k pX(k) = 1.
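These two properties are easy to check mechanically. A small helper, sketched under the assumption that a pmf is stored as a dict mapping values k to probabilities pX(k):

```python
# A sketch: verify the positivity and normalization properties of a pmf
# represented as a dict {k: pX(k)}.
def is_valid_pmf(pmf, tol=1e-9):
    nonnegative = all(prob >= 0 for prob in pmf.values())  # positivity
    normalized = abs(sum(pmf.values()) - 1.0) < tol        # normalization
    return nonnegative and normalized

print(is_valid_pmf({k: 1/6 for k in range(1, 7)}))  # True (fair die)
print(is_valid_pmf({0: 0.5, 1: 0.4}))               # False (sums to 0.9)
```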
Examples of important discrete pmfs
Bernoulli random variables.
These are the simplest random variables of all. They only take two
values, 1 or 0. The pmf is defined by a single parameter p, the prob-
ability that X = 1:
pX(k) = p,      k = 1,
        1 − p,  k = 0.
Bernoulli random variables are useful for things like modeling coin
flips, bits, yes/no decisions, wins/losses, makes/misses, Republican/Democrat,
etc.
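Simulating a Bernoulli random variable is a one-liner. A sketch, with the arbitrary choice p = 0.4, connecting the parameter p to the long-run fraction of 1s:

```python
# A sketch drawing Bernoulli(p) samples: 1 ("success") with probability p,
# 0 otherwise.
import random

p = 0.4
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
print(sum(samples) / len(samples))  # fraction of 1s, ≈ p
```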
Binomial random variables.
We consider a fixed number n of independent Bernoulli random vari-
ables (with parameter p), then set X equal to their sum, i.e., the
total number of 1s ("successes") among the n trials. Then

pX(k) = (n choose k) p^k (1 − p)^(n−k),   k = 0, 1, . . . , n.

(The unfair-coin example above is exactly this pmf with n = 10 and
p = 0.4.)
How many trials to get the first success?
Geometric random variables.
A geometric random variable models the number of Bernoulli trials it
will take to have our first success. We consider a (possibly infinitely
long) series of Bernoulli trials with parameter p, and let
pX(k) = (1 − p)^(k−1) p,   k = 1, 2, . . . .
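A quick way to build intuition for this pmf is to simulate it: run Bernoulli(p) trials until the first success and tally how long that took. A sketch (p = 0.3 and N = 100,000 samples are arbitrary choices):

```python
# A sketch simulating a geometric random variable: count Bernoulli(p)
# trials until the first success, then compare empirical frequencies of
# {X = k} with the formula (1 - p)^(k-1) * p.
import random

def trials_until_success(p):
    k = 1
    while random.random() >= p:  # failure with probability 1 - p
        k += 1
    return k

p, N = 0.3, 100_000
samples = [trials_until_success(p) for _ in range(N)]
for k in range(1, 6):
    empirical = samples.count(k) / N
    exact = (1 - p)**(k - 1) * p
    print(f"k = {k}: empirical {empirical:.4f}, exact {exact:.4f}")
```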
How many events happened over a period of time?
Poisson random variables.
Poisson random variables are useful for modeling the number of events
that occurred over a certain stretch of time:
• How many packets will be routed to this server in the next minute?

• How many cars will pass under the 5th street bridge between 2:00p and 2:07p this afternoon?

• How many photons will hit this detector in the next 5 ms?
The pmf for a Poisson random variable X is given by

pX(k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, . . . ,

where λ ≥ 0 is an intensity parameter (i.e., the larger λ, the more
events we can expect to happen in a given interval).
It is clear that pX(k) ≥ 0; we can check the normalization property
by recalling the Taylor series expansion for e^λ:

e^λ = 1 + λ + λ²/2! + λ³/3! + · · · = Σ_{k=0}^{∞} λ^k / k!.

Thus

Σ_{k=0}^{∞} pX(k) = Σ_{k=0}^{∞} e^(−λ) λ^k / k! = e^(−λ) e^λ = 1.
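The same normalization can be checked numerically by truncating the infinite sum. A small sketch (λ = 5 is an arbitrary choice):

```python
# A numerical sanity check of the Poisson normalization: the truncated
# sum of e^{-lam} lam^k / k! should be very close to 1.
from math import exp, factorial

lam = 5.0
partial = sum(exp(-lam) * lam**k / factorial(k) for k in range(50))
print(partial)  # ≈ 1.0
```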
Functions of a random variable
Random variables are just that: variables. They can be manipulated
using algebraic rules just like standard variables. In particular, if you
plug a random variable into a function, the output will be another
random variable.
Here is a simple example. Suppose that X is a random variable
modeling the Atlanta rainfall each day measured in inches (rounded
to the nearest inch). I might be more interested in the rainfall in
centimeters, which I can compute as
Y = 2.54 X.
Example: Suppose that X is a discrete random variable with pmf

pX(k) = 1/6,  when k is an integer with −3 ≤ k ≤ 2,
        0,    otherwise.
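In general, the pmf of Y = g(X) is found by adding up pX(k) over all k that map to the same output value g(k). A sketch using the pmf above; the choice g(k) = k² here is just an illustrative assumption, not part of the original example:

```python
# A sketch of computing the pmf of Y = g(X) from the pmf of X: sum the
# probabilities of all values k that map to the same g(k).
from collections import defaultdict

pX = {k: 1/6 for k in range(-3, 3)}  # the pmf above: k = -3, ..., 2

def pmf_of_function(pX, g):
    pY = defaultdict(float)
    for k, prob in pX.items():
        pY[g(k)] += prob
    return dict(pY)

print(pmf_of_function(pX, lambda k: k**2))
# {9: 0.1667, 4: 0.3333, 1: 0.3333, 0: 0.1667} up to float rounding;
# note how the pairs ±k collapse to a single value of Y
```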
Expectation of a random variable
Since random variables give us a way to talk quantitatively about un-
certain quantities, they should also give us a way to make predictions
about future outcomes. After seeing a lot of trials we could find their
average. But is it possible to compute what we expect this average to
be before seeing any data? This will lead us to our most basic way to
describe a random variable: the expectation (or expected value).
Example: Suppose you are playing a game where you roll a (fair)
die and get paid in dollars the amount shown on the die. If we let X
denote the number of dollars you win on a particular roll of the die,
then X is a random variable with pmf given by
pX(k) = 1/6,  for k = 1, . . . , 6,
        0,    otherwise.
How much would you pay to play this game (i.e., how much money
can you expect to make “per roll”)?
If we roll the die N times and let nk denote the number of rolls that
come up k, the total amount of money you make is

1 · n1 + 2 · n2 + 3 · n3 + 4 · n4 + 5 · n5 + 6 · n6,
and so the average winnings per roll are Σ_k k · (nk/N). When N is
large, we expect the fraction of rolls that come up k to satisfy
nk/N ≈ pX(k) for all k. (After all, what do probabilities mean if not
this?) This motivates the definition of the expected payout as

E[X] = Σ_k k pX(k) = 1 · (1/6) + 2 · (1/6) + · · · + 6 · (1/6) = 3.5.
This is really just a “weighted average” of all the values X can take,
where the weights are given by the probabilities that X takes each of
those values. Think of it as a center of mass for the pmf.
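Computationally, this weighted average is a one-liner. A minimal sketch computing E[X] for the die game and checking it against the empirical average of many simulated rolls (N = 100,000 is arbitrary):

```python
# A sketch: compute E[X] as the weighted average sum_k k * pX(k), then
# compare with the empirical average of simulated die rolls.
import random

pX = {k: 1/6 for k in range(1, 7)}
expectation = sum(k * prob for k, prob in pX.items())
print(expectation)  # 3.5

N = 100_000
print(sum(random.randint(1, 6) for _ in range(N)) / N)  # ≈ 3.5
```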
Important Points:

• The expectation is also often referred to as the expected value
or the mean.

• Although X is a random variable, the expectation of X is not
random; it is a completely deterministic function of the pmf of X.

• The expectation of X is not necessarily the same thing as the
"expected outcome". In the example above, the expected pay-
out is $3.50, but on no actual roll of the die will you ever win
$3.50. Similarly, the average number of children in a family in
the US used to be 2.1, but no one ever had 0.1 children.
Exercise:
Suppose that X has the pmf
pX(k) = k/10,  for k = 1, 2, 3, 4,
        0,     otherwise.
Calculate E[X].
Ans.

E[X] = Σ_k k pX(k) = 1 · (1/10) + 2 · (2/10) + 3 · (3/10) + 4 · (4/10) = 3.
Exercise:
Suppose that X is a Poisson random variable with parameter λ, mean-
ing that

pX(k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, . . . .
Calculate E[X].
Ans.

E[X] = Σ_{k=0}^{∞} k e^(−λ) λ^k / k!
     = λ Σ_{k=1}^{∞} e^(−λ) λ^(k−1) / (k − 1)!
     = λ Σ_{k′=0}^{∞} e^(−λ) λ^(k′) / k′!
     = λ e^(−λ) e^λ
     = λ.
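A numerical check of this result, again truncating the infinite sum (λ = 5 is an arbitrary choice):

```python
# A sketch verifying E[X] = lam for a Poisson random variable by
# truncating the infinite sum defining the expectation.
from math import exp, factorial

lam = 5.0
mean = sum(k * exp(-lam) * lam**k / factorial(k) for k in range(100))
print(mean)  # ≈ 5.0
```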
Exercise:
Google is interested in hiring you to do some consulting work for
them. They have two projects they would like your help with, but
they will only pay you if they are satisfied with your work.
• Project 1 pays $1000, and you believe that the probability you
will complete the project to Google's satisfaction is 0.8.

• Project 2 pays $2000, and you believe that the probability you
will complete the project to Google's satisfaction is 0.5.
If Google is happy with your work on the project you choose to do
first, they will pay you and then give you the chance to do the second
project, but if they don’t like your work they will send you on your
way. Which should you do first to maximize your expected earnings?
Ans. Suppose you decide to do Project 1 first. There are three
outcomes: you complete both jobs successfully (probability 0.8 ·
0.5 = 0.4) for a total payout of $3000; you complete the first job
successfully but not the second (probability 0.8 · 0.5 = 0.4) for a
total payout of $1000; or you do not complete the first job successfully
(probability 0.2) for a payout of $0. So in this case the expected
payout P is

E[P] = 0.4 · 3000 + 0.4 · 1000 + 0 = $1600.
Now suppose you decide to do Project 2 first. The three outcomes
are: you complete both jobs successfully (probability 0.5 · 0.8 = 0.4)
for a payout of $3000; you complete the first job successfully but not
the second (probability 0.5 · 0.2 = 0.1) for a payout of $2000; or you
do not complete the first job successfully (probability 0.5) for a
payout of $0. In this case, the expected payout P is

E[P] = 0.4 · 3000 + 0.1 · 2000 + 0 = $1400.
Thus, you should attempt Project 1 first.
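The same comparison is easy to script. A sketch, where representing each project as a (pay, probability) pair is our own convention, not from the original exercise:

```python
# A sketch of the expected-payout comparison: given (pay, prob) for each
# project, compute E[P] when attempting them in a chosen order.
def expected_payout(first, second):
    pay1, p1 = first
    pay2, p2 = second
    # succeed on both, succeed only on the first, or fail the first (payout 0)
    return p1 * p2 * (pay1 + pay2) + p1 * (1 - p2) * pay1

project1 = (1000, 0.8)
project2 = (2000, 0.5)
print(expected_payout(project1, project2))  # 1600.0
print(expected_payout(project2, project1))  # 1400.0
```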
Note:
It is possible that the pmf of a random variable is well-defined, but
the expectation is not well-defined. For example, say X has pmf
pX(k) = (6/π²) · (1/k²),   k = 1, 2, . . . .

This is a proper pmf, since it is a fact that Σ_{k≥1} 1/k² = π²/6. But

E[X] = (6/π²) Σ_{k=1}^{∞} k · (1/k²) = (6/π²) Σ_{k=1}^{∞} 1/k = ∞.
This is still a valid pmf; we just can't talk about its expectation.
This occurs when we encounter random variables with "heavy tails",
meaning that the probability that they take on very big values is
rather large.
When Y = g(X), the expectation of Y can be computed directly from
the pmf of X as E[g(X)] = Σ_k g(k) pX(k). But be careful: in general,

E[g(X)] ≠ g(E[X]).

I cannot stress this enough. Treating these two things as equal is one
of the most common mistakes that probability students make.
Exercise:
Suppose that X is a discrete random variable with pmf
pX(k) = 1/9,  when k is an integer with −4 ≤ k ≤ 4,
        0,    otherwise.
1. Let Y = g(X), where g(k) = |k|. Compute E[Y ].
Ans.

E[Y] = Σ_{k=−4}^{4} |k|/9 = (4 + 3 + 2 + 1 + 0 + 1 + 2 + 3 + 4)/9 = 20/9.
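This exercise makes the warning above concrete: g(E[X]) = |0| = 0, while E[g(X)] = 20/9. A quick sketch:

```python
# A quick illustration that E[g(X)] != g(E[X]), using X uniform on
# {-4, ..., 4} and g(k) = |k| from the exercise above.
pX = {k: 1/9 for k in range(-4, 5)}

E_X = sum(k * p for k, p in pX.items())          # ≈ 0
E_absX = sum(abs(k) * p for k, p in pX.items())  # 20/9 ≈ 2.22

print(abs(E_X), E_absX)  # g(E[X]) = 0, but E[g(X)] ≈ 2.22
```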
Is this the same as if you calculated your expected velocity and used
that to calculate your travel time?
Ans. The expected velocity is E[V] = 6.6. Then

distance / E[V] = 1/(2 · 6.6) = 0.0758 hours ≈ 4.55 minutes.
So no, it is not the same at all. To summarize:

E[T] = E[1/(2V)] = (1/2) E[1/V] ≠ (1/2) · 1/E[V].
Variance
While the expectation tells us something about the average outcome,
we are also interested in quantifying how likely X is to be close to
that expectation. For example, consider the pmfs:
pX(k) = 1/2 for k = −1, 1/2 for k = 1, 0 otherwise,   and
pX(k) = 1/2 for k = −100, 1/2 for k = 100, 0 otherwise.
In both cases, E[X] = 0, but in the second case typical values of X
are much farther from the expectation.
To quantify this, another important quantity associated with a ran-
dom variable is its variance:

var(X) = E[(X − E[X])²].

When we compute the variance, we are calculating the expected value
of (X − E[X])² to tell us roughly how X varies from its expectation
E[X] on average. Notice that since (X − E[X])² ≥ 0, the variance is
always non-negative: var(X) ≥ 0.
There is nothing particularly sacred about measuring how X varies
from E[X] via (X − E[X])². For example, we could also measure
this via something like |X − E[X]| (which is called the "absolute
deviation"). However, the particular choice of (X − E[X])² has a
very special role in probability theory.
Closely related to the variance is the standard deviation:

σX = √var(X).

The variance and the standard deviation are measures of the disper-
sion of X around its mean. We will use both, but σX is often easier
to interpret since it has the same units as X. (For example, if X is
in "feet", then var(X) is in "feet²" while σX is also in "feet".)
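Both quantities follow mechanically from the pmf. A sketch that computes the mean, variance, and standard deviation of the two pmfs from the beginning of this section:

```python
# A sketch computing E[X], var(X) = E[(X - E[X])^2], and the standard
# deviation directly from a pmf stored as a dict {k: pX(k)}.
from math import sqrt

def mean_var_std(pX):
    mu = sum(k * p for k, p in pX.items())
    var = sum((k - mu)**2 * p for k, p in pX.items())
    return mu, var, sqrt(var)

# same mean, very different spread
print(mean_var_std({-1: 0.5, 1: 0.5}))      # (0.0, 1.0, 1.0)
print(mean_var_std({-100: 0.5, 100: 0.5}))  # (0.0, 10000.0, 100.0)
```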
Example. Let X be a Bernoulli random variable, with
pX(k) = 1 − p,  k = 0,
        p,      k = 1.

Then

E[X] = p,   var(X) = E[(X − p)²] = p(1 − p).
Exercise:
Suppose X has pmf
pX(k) = 1/3,  for k = 1, 2, 3,
        0,    otherwise.

Calculate var(X).
Exercise:
Suppose X has pmf
pX(k) = 1/9,  for k = −4, . . . , 4,
        0,    otherwise.
Calculate var(X).
Ans. We see that E[X] = 0, and hence

var(X) = E[X²] = (16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16)/9 = 60/9 = 20/3.
Exercise:
Suppose X has pmf
pX(k) = 1/(2N + 1),  for k = −N, . . . , N,
        0,           otherwise.
Calculate var(X).
It may be helpful to recall that Σ_{k=1}^{N} k² = N³/3 + N²/2 + N/6.
Ans. Again, it is clear that E[X] = 0, and thus

var(X) = E[X²] = Σ_{k=−N}^{N} k² · 1/(2N + 1)
       = (2/(2N + 1)) Σ_{k=1}^{N} k²
       = (2/(2N + 1)) (N³/3 + N²/2 + N/6).
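A quick numerical check that this closed form matches the direct sum, for a few values of N:

```python
# A sketch checking the closed form 2/(2N+1) * (N^3/3 + N^2/2 + N/6)
# against the direct sum E[X^2] = sum_k k^2 / (2N+1).
for N in (1, 5, 10):
    direct = sum(k**2 for k in range(-N, N + 1)) / (2 * N + 1)
    closed = 2 / (2 * N + 1) * (N**3 / 3 + N**2 / 2 + N / 6)
    print(N, direct, closed)  # the two columns agree
```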
Properties of mean and variance
Below, X is a random variable, and a, b ∈ R are constants.

1. E[X + b] = E[X] + b

2. E[aX] = a E[X]

3. E[aX + b] = a E[X] + b

4. var(X) = E[X²] − (E[X])², since

   var(X) = E[(X − E[X])²] = E[X² − 2X E[X] + (E[X])²]
          = E[X²] − 2 E[X] E[X] + (E[X])² = E[X²] − (E[X])²,

   where we have used the fact that since E[X] is not random at
   all, E[E[X]] = E[X], etc.

5. var(X + b) = var(X).
   (You can prove that at home.)

6. var(aX) = a² var(X)
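These properties can be sanity-checked by simulation. A sketch using fair die rolls for X and the arbitrary constants a = 3, b = 7 (recall E[X] = 3.5 and var(X) = 35/12 for a fair die):

```python
# A simulation sketch: for Y = aX + b we expect E[Y] = a E[X] + b and
# var(Y) = a^2 var(X).
import random
import statistics

a, b = 3.0, 7.0
xs = [random.randint(1, 6) for _ in range(100_000)]  # fair die samples
ys = [a * x + b for x in xs]

print(statistics.mean(ys))       # ≈ a * 3.5 + b = 17.5
print(statistics.pvariance(ys))  # ≈ a**2 * (35/12) = 26.25
```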