
Discrete Random Variables

ECE 3077 Notes by M. Davenport, J. Romberg, C. Rozell, and M. Wakin. Last updated 23:38, September 5, 2023
Discrete random variables
Until now, we have discussed probability almost entirely in the context
of events, in which there are only two possibilities: either they happen
or they don’t, with some associated probabilities. This is limiting in
two ways. First, we often want to talk about numerical measurements
associated with these events (i.e., the probability of getting 2 heads
in 3 coin tosses, regardless of the exact sequence). Second, we often
want to think simultaneously about the probabilities associated with
all possible values of these measurements (i.e., the three probabilities
associated with getting 1, 2 or 3 heads in 3 coin tosses). In short, we
want a mathematical framework that lets us quantitatively consider
a variable that can take many values but is random in some way.
Definition: A random variable X is a mapping from the sample
space Ω to the real line¹:

X : \Omega \to \mathbb{R},

i.e., X assigns a real number to every possible outcome in the sample
space.

¹As you might expect, we can also define random variables that are complex-
valued, vector-valued, matrix-valued, etc.
Examples.
1. In an experiment involving drawing M&Ms from a bag, the
number of candies you have to draw before repeating a color.
2. In a complex system, the number of days until a part failure,
the number of parts that failed today, the number of customers
affected by a failure, etc.
Random variables give us a succinct way to talk about numerical
outcomes; before, we talked about the event that there were two
heads, the event that there were three heads, etc. Now we can simply
talk about “the number of heads” as a (random) variable.
The examples above are discrete random variables because the values
they take are from a discrete set. Continuous random variables also
abound: the current temperature in the room, the amount of time
until a light bulb burns out, etc. Since the discrete case is easier, we
will start by developing some essential tools here and then generalize.

A word on notation

Random variables are just that — random; all of the interesting dis-
cussion about them occurs before we know their value. We will use
capital letters to denote random variables, e.g., X for the number of
heads in five flips.

We will use lowercase letters to denote particular outcomes a random
variable might take. For example, we might ask about

P ({X = k}) for k = 2, 3, 4.

Where this will really start to get important (and confusing) is when
we talk about multiple random variables, in which case it is easier (in
the long run) to let x denote the particular value that X might take, y
the value that Y might take, etc. This will seem a little unnatural at
first, but just remember: X (or Y or Z ...) is a random variable, while
x (or y or z) is just an old-fashioned regular variable representing an
(unknown) concrete value that X might take.

Probability mass functions (pmfs) for discrete
random variables
A random variable is completely described by the probabilities of the
values it can take. These are encapsulated in the probability mass
function for X, which we denote as p_X(k); it answers the question
“what is the probability that a random variable has some particular
value?” For discrete random variables, the definition is straightforward:

p_X(k) = P(X = k).

(Notice that we have adopted the notation P(X = k) over the strictly-
more-correct-but-also-more-clunky P({X = k}).)
Examples.
For an experiment involving the roll of one fair six-sided die,
let X be the number of “pips” facing upwards at the end of the
roll (i.e., the numerical value of the result of the roll). Then
p_X(k) = \begin{cases} 1/6 & k = 1, 2, \dots, 6 \\ 0 & \text{otherwise.} \end{cases}

This is called a discrete uniform random variable, and the pmf
is illustrated below:

[Figure: “one die roll pmf”, a stem plot of p_X(k) versus k with height 1/6 at each of k = 1, ..., 6.]

For an experiment involving the roll of two fair six-sided dice,
let X be the sum of the values of the two rolls. Then

p_X(k) = \begin{cases} \frac{k-1}{36} & 2 \le k \le 7 \\ \frac{13-k}{36} & 8 \le k \le 12 \\ 0 & \text{otherwise.} \end{cases}
This pmf is illustrated below:

[Figure: “sum of two dice pmf”, a triangular stem plot of p_X(k) for k = 2, ..., 12, peaking at 1/6 for k = 7.]
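The triangular shape is easy to verify by brute force. Here is a minimal Python sketch (our own illustration, not part of the original notes) that enumerates all 36 equally likely outcomes and tallies each sum:

```python
from collections import Counter
from fractions import Fraction

# enumerate all 36 equally likely (die 1, die 2) outcomes and tally the sums
sums = Counter(a + b for a in range(1, 7) for b in range(1, 7))
pmf = {k: Fraction(n, 36) for k, n in sorted(sums.items())}

print(pmf[2], pmf[7], pmf[12])  # 1/36, 1/6, 1/36 -- matching (k-1)/36 and (13-k)/36
```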
An unfair coin is flipped 10 times; the probability that it lands
on heads on any one flip is 0.4, and the flips are independent of
one another. Let X be the total number of heads. Then

p_X(k) = \binom{10}{k} (0.4)^k (0.6)^{10-k} \quad \text{for } k = 0, 1, 2, \dots, 10.
This pmf is illustrated below:

[Figure: “# of heads pmf”, a stem plot of p_X(k) for k = 0, ..., 10, peaking at k = 4.]

You open a book written in English to a random page and place
your finger down at a random location. Let

X = 1, if the closest letter is an “a” (upper- or lower-case)
X = 2, if the closest letter is a “b”
...
X = 26, if the closest letter is a “z”.

The corresponding pmf is illustrated below:

[Figure: “English letters pmf”, a stem plot of p_X(k) for k = 1, ..., 26, with heights given by English letter frequencies.]
Properties for pmfs
Every pmf obeys two properties which follow immediately from the
Kolmogorov axioms:
1. Positivity: p_X(k) ≥ 0 for all k

2. Normalization: \sum_k p_X(k) = 1

When we combine these we also see that p_X(k) ≤ 1 for all k.
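These properties are easy to check numerically. Below is a minimal Python sketch (the `is_valid_pmf` helper and the dictionary representation are our own illustration, not part of the original notes) that verifies both properties for the one-die-roll pmf above:

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check positivity and normalization for a pmf given as {value: probability}."""
    positive = all(p >= 0 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return positive and normalized

# pmf of one fair die roll; Fractions avoid floating-point roundoff in the sum
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
print(is_valid_pmf(die_pmf))  # True
```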

Examples of important discrete pmfs
Bernoulli random variables.
These are the simplest random variables of all. They only take two
values, 1 or 0. The pmf is defined by a single parameter p, the prob-
ability that X = 1:
p_X(k) = \begin{cases} p, & k = 1 \\ 1-p, & k = 0. \end{cases}

Bernoulli random variables are useful for things like modeling coin
flips, bits, yes/no decisions, win/loss, make/miss, Republican/Democrat,
etc.

See also http://bit.ly/UDt7Wc (wikipedia).

Binomial random variables.
We consider a fixed number n of independent Bernoulli random vari-
ables (with parameter p), then set

X = the number of the n trials that had value 1

As we have seen already


p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.

While it is not immediately obvious, it is possible to check that p_X(k)
is a valid pmf in the sense that \sum_{k=0}^{n} p_X(k) = 1. (This is a conse-
quence of something called the “Binomial Theorem”.)

Binomial random variables are useful for modeling the number of
“successes” (or failures) over a series of independent trials.

What is the probability that there is an error in k of the n bits
I transmit?
What is the probability that LeBron James makes k out of the
next n free throws he shoots?
What is the probability that k out of the next n visitors to my
web site click on a certain link?
What is the probability that 11 out of 12 regulation footballs
will suddenly lose air on their own?
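The binomial pmf is easy to tabulate in code. The following Python sketch (our own, using `math.comb` from the standard library) reproduces the 10-flip, p = 0.4 example from the previous page and confirms the normalization numerically:

```python
from math import comb

def binomial_pmf(n, p):
    """Return [p_X(0), ..., p_X(n)] with p_X(k) = C(n,k) p^k (1-p)^(n-k)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf = binomial_pmf(10, 0.4)
print(max(range(11), key=lambda k: pmf[k]))  # 4: the most likely number of heads
print(abs(sum(pmf) - 1) < 1e-12)             # True: the Binomial Theorem at work
```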

See also http://bit.ly/UDtkIW (wikipedia).

Geometric random variables.
A geometric random variable models the number of Bernoulli trials it
will take to have our first success. We consider a (possibly infinitely
long) series of Bernoulli trials with parameter p, and let

X = the number of trials until I see the first 1.

Then the pmf for X is

p_X(k) = (1-p)^{k-1} \, p, \quad k = 1, 2, \dots.

(This is simply the probability that we get k − 1 zeros in a row and
then a one.) This pmf satisfies the required normalization property
since

\sum_{k=1}^{\infty} p_X(k) = \sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{k=0}^{\infty} (1-p)^k = p \cdot \frac{1}{1-(1-p)} = 1.

Geometric random variables are good models for discrete “waiting”


processes:
How many times will I flip this coin before I see a “heads”?
How many pages can I print out before the printer jams?
How many attempts until LeBron James hits his next three-
pointer?
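A quick simulation makes the geometric pmf concrete. This Python sketch (our own illustration) repeatedly counts Bernoulli(p) trials up to and including the first success, then compares the empirical frequencies against (1 − p)^{k−1} p:

```python
import random

def geometric_sample(p):
    """Count Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:  # failure with probability 1 - p
        k += 1
    return k

p, n_trials = 0.5, 100_000
samples = [geometric_sample(p) for _ in range(n_trials)]
for k in (1, 2, 3):
    print(k, samples.count(k) / n_trials, (1 - p)**(k - 1) * p)  # empirical vs. exact
```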

See also http://bit.ly/TAjUQm (wikipedia).

Poisson random variables.
Poisson random variables are useful for modeling the number of events
that occurred over a certain stretch of time:
How many packets will be routed to this server in the next
minute?
How many cars will pass under the 5th street bridge between
2:00p and 2:07p this afternoon?
How many photons will hit this detector in the next 5 ms?
The pmf for a Poisson random variable X is given by

p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots,

where λ ≥ 0 is an intensity parameter (i.e., the larger λ, the more
events we can expect to happen in a given interval).
It is clear that p_X(k) ≥ 0; we can check the normalization property
by recalling the Taylor series expansion for e^λ:

e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}.

Thus

\sum_{k=0}^{\infty} p_X(k) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.
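Since the support is infinite, code can only check the normalization up to a truncation. The sketch below (ours, standard library only) sums the Poisson pmf far enough out that the neglected tail is negligible:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """p_X(k) = e^(-lambda) * lambda^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0
partial_sum = sum(poisson_pmf(k, lam) for k in range(100))
print(abs(partial_sum - 1) < 1e-12)  # True: the first 100 terms already sum to ~1
```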

See also http://bit.ly/UDtLTF (wikipedia).

Functions of a random variable
Random variables are just that: variables. They can be manipulated
using algebraic rules just like standard variables. In particular, if you
plug a random variable into a function, the output will be another
random variable.
Here is a simple example. Suppose that X is a random variable
modeling the Atlanta rainfall each day measured in inches (rounded
to the nearest inch). I might be more interested in the rainfall in
centimeters, which I can compute as

Y = 2.54 X.

Y is another discrete random variable, and we can compute its pmf
from the pmf of X.
In general, if Y = g(X) is a function of a random variable X, we can
compute the pmf of Y using

p_Y(y) = \sum_{\{k \,|\, g(k) = y\}} p_X(k).

(Now that we are starting to talk about multiple random variables,


we will start using notation like p_Y(y) and p_X(x). Remember that
X and Y are random variables, but x and y stand for the particular
values X and Y might take. This might seem a little confusing now,
but it is much easier in the long run since it helps you keep track of
which value goes with which random variable. This will all get a bit
clearer as we see more examples.)
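This formula translates directly into code. Here is a minimal Python sketch (the helper name and the example rainfall pmf are our own, purely for illustration) that pushes a pmf through a function g by summing p_X(k) over all k with g(k) = y:

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(pmf_x, g):
    """Return p_Y for Y = g(X), grouping all k with the same value g(k)."""
    pmf_y = defaultdict(Fraction)  # Fraction() == 0, so += accumulates exactly
    for k, prob in pmf_x.items():
        pmf_y[g(k)] += prob
    return dict(pmf_y)

# rainfall in inches -> centimeters: values are relabeled, probabilities unchanged
pmf_x = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}  # made-up pmf
print(pmf_of_function(pmf_x, lambda k: 2.54 * k))
```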

Example: Suppose that X is a discrete random variable with pmf

p_X(k) = \begin{cases} 1/6, & \text{when } k \text{ is an integer with } -3 \le k \le 2 \\ 0, & \text{otherwise.} \end{cases}

1. Sketch the pmf of X.

2. Let Y = |X|. Calculate and sketch the pmf of Y .

3. Let Y = X². Calculate and sketch the pmf of Y.
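No answers are worked out here, but the grouping idea from the previous sketch makes them easy to generate. A possible self-check in Python (our own, assuming the pmf stated above):

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {k: Fraction(1, 6) for k in range(-3, 3)}  # uniform on the integers -3, ..., 2

for g in (abs, lambda k: k * k):  # part 2: Y = |X|;  part 3: Y = X^2
    pmf_y = defaultdict(Fraction)
    for k, prob in pmf_x.items():
        pmf_y[g(k)] += prob  # values of X that map to the same y pool their mass
    print(dict(pmf_y))       # e.g. {3: Fraction(1, 6), 2: Fraction(1, 3), ...}
```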

Expectation of a random variable
Since random variables give us a way to talk quantitatively about un-
certain quantities, they should also give us a way to make predictions
about future outcomes. After seeing a lot of trials we could find their
average. But is it possible to compute what we expect this average to
be before seeing any data? This will lead us to our most basic way to
describe a random variable: the expectation (or expected value).
Example: Suppose you are playing a game where you roll a (fair)
die and get paid in dollars the amount shown on the die. If we let X
denote the number of dollars you win on a particular roll of the die,
then X is a random variable with pmf given by
p_X(k) = \begin{cases} \frac{1}{6} & \text{for } k = 1, \dots, 6 \\ 0 & \text{otherwise} \end{cases}

How much would you pay to play this game (i.e., how much money
can you expect to make “per roll”)?
If we roll the die N times, the total amount of money you make is

1 \cdot n_1 + 2 \cdot n_2 + 3 \cdot n_3 + 4 \cdot n_4 + 5 \cdot n_5 + 6 \cdot n_6,

where n_k is the number of times the die landed on k. We could then
compute the average amount earned per roll as

M = \frac{1 \cdot n_1 + 2 \cdot n_2 + 3 \cdot n_3 + 4 \cdot n_4 + 5 \cdot n_5 + 6 \cdot n_6}{N}.

As N gets large, we expect that

\frac{n_k}{N} \approx P(X = k) = p_X(k) = \frac{1}{6}

for all k. (After all, what do probabilities mean if not this?) This
motivates the definition of the expected payout as

E[X] = 1 \cdot p_X(1) + 2 \cdot p_X(2) + 3 \cdot p_X(3) + 4 \cdot p_X(4) + 5 \cdot p_X(5) + 6 \cdot p_X(6) = \frac{1+2+3+4+5+6}{6} = 3.5.

Definition: The expectation of a random variable X with pmf
p_X(k) is

E[X] = \sum_{k} k \, p_X(k).

This is really just a “weighted average” of all the values X can take,
where the weights are given by the probabilities that X takes each of
those values. Think of it as a center of mass for the pmf.
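In code, the definition is a one-line weighted sum. This sketch (our own) recomputes the expected payout of the die game exactly:

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over k of k * p_X(k): a weighted average of the values."""
    return sum(k * p for k, p in pmf.items())

die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
print(expectation(die_pmf))  # 7/2, i.e., 3.5
```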

Important Points:
The expectation is also often referred to as the expected value
or the mean.
Although X is a random variable, the expectation of X is not
random; it is a completely deterministic function of the pmf of
X.
The expectation of X is not necessarily the same thing as the
“expected outcome”. In the example above, the expected pay-
out is $3.50, but on no actual roll of the die will you ever win
$3.50. Similarly, the average number of children in a family in
the US used to be 2.1, but no one ever had 0.1 children.

Exercise:
Suppose that X has the pmf

p_X(k) = \begin{cases} \frac{k}{10} & \text{for } k = 1, 2, 3, 4 \\ 0 & \text{otherwise.} \end{cases}

Calculate E[X].
Ans.

E[X] = \sum_{k} k \, p_X(k) = 1 \cdot \frac{1}{10} + 2 \cdot \frac{2}{10} + 3 \cdot \frac{3}{10} + 4 \cdot \frac{4}{10} = 3.

Exercise:
Suppose that X is a Poisson random variable with parameter λ, mean-
ing that

p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots.
Calculate E[X].
Ans.

E[X] = \sum_{k=0}^{\infty} k \, e^{-\lambda} \frac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{k'=0}^{\infty} \frac{\lambda^{k'}}{k'!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.

Exercise:
Google is interested in hiring you to do some consulting work for
them. They have two projects they would like your help with, but
they will only pay you if they are satisfied with your work.
Project 1 pays $1000 and you believe that the probability you
will complete the project to Google’s satisfaction is 0.8.
Project 2 pays $2000 and you believe that the probability you
will complete the project to Google’s satisfaction is 0.5.
If Google is happy with your work on the project you choose to do
first, they will pay you and then give you the chance to do the second
project, but if they don’t like your work they will send you on your
way. Which should you do first to maximize your expected earnings?
Ans. Suppose you decide to do Project 1 first. There are three
outcomes: you complete both jobs successfully (probability of 0.8 ·
0.5 = 0.4) for a total payout of $3000; you complete the first job
successfully but not the second (probability of 0.8 · 0.5 = 0.4) for a
total payout of $1000; or you do not complete the first job successfully
(probability of 0.2) for a payout of $0. So in this case the expected
payout P is

E[P] = 0.4 · 3000 + 0.4 · 1000 + 0 = $1600.

Now suppose you decide to do Project 2 first. The three outcomes
are: you complete both jobs successfully (probability of 0.5 · 0.8 = 0.4)
for a payout of $3000; you complete the first job successfully but not
the second (probability of 0.5 · 0.2 = 0.1) for a payout of $2000; or you
do not complete the first job successfully (probability of 0.5) for a
payout of $0. In this case, the expected payout P is

E[P] = 0.4 · 3000 + 0.1 · 2000 + 0 = $1400.

Thus, you should attempt Project 1 first.
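The same comparison in code, as a quick check of the arithmetic (the `expected_payout` helper is our own, not from the notes):

```python
def expected_payout(first, second):
    """Expected earnings when (pay, prob) jobs are attempted in the given order."""
    (pay1, p1), (pay2, p2) = first, second
    # success on both, or success on the first only; failing the first pays 0
    return p1 * p2 * (pay1 + pay2) + p1 * (1 - p2) * pay1

project1, project2 = (1000, 0.8), (2000, 0.5)
print(expected_payout(project1, project2))  # 1600.0
print(expected_payout(project2, project1))  # 1400.0
```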

Note:
It is possible that the pmf of a random variable is well-defined, but
the expectation is not well-defined. For example, say X has pmf
p_X(k) = \frac{6}{\pi^2} \frac{1}{k^2}, \quad k = 1, 2, \dots

This is a proper pmf, since it is a fact that \sum_{k \ge 1} 1/k^2 = \pi^2/6. But

E[X] = \frac{6}{\pi^2} \sum_{k=1}^{\infty} k \cdot \frac{1}{k^2} = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k} = \infty.

This is still a valid pmf; we just can’t talk about its expectation.
This occurs when we encounter random variables with “heavy tails”,
meaning that the probability that they take on very big values is
rather large.

Expectations of functions of a random variable

By recalling our discussion about functions of random variables, it
is straightforward to define the expectation of a function of a random
variable g(X) via:

E[g(X)] = \sum_{k} g(k) \, p_X(k).

It is important to note that in general:

E[g(X)] ≠ g(E[X]).

I cannot stress this enough. Making these two things equal is one of
the most common mistakes that probability students make.
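A two-line numerical check drives the point home. This sketch (our own illustration) compares E[X²] with (E[X])² for a single fair die roll:

```python
from fractions import Fraction

die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
E_X = sum(k * p for k, p in die_pmf.items())      # E[X]   = 7/2
E_X2 = sum(k**2 * p for k, p in die_pmf.items())  # E[X^2] = 91/6
print(E_X2, E_X**2, E_X2 == E_X**2)               # 91/6, 49/4, False
```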

Exercise:
Suppose that X is a discrete random variable with pmf
p_X(k) = \begin{cases} \frac{1}{9}, & \text{when } k \text{ is an integer with } -4 \le k \le 4 \\ 0, & \text{otherwise.} \end{cases}
1. Let Y = g(X), where g(k) = |k|. Compute E[Y ].
Ans.

E[Y] = \sum_{k=-4}^{4} \frac{|k|}{9} = \frac{4+3+2+1+0+1+2+3+4}{9} = \frac{20}{9}.

2. In this case, does E[g(X)] = g(E[X])?


Ans. Here E[X] = 0, so g(E[X]) = |0| = 0, while E[g(X)] = 20/9; this is clearly not true.
Exercise:
If the weather is good, which happens with probability 0.7, I walk the
half mile from Tech Square to Klaus at a speed of V = 3 miles per
hour; otherwise, I take the Tech Trolley. Assume that when I take
the trolley my average speed is V = 15 miles per hour. What is the
expected value of the time T it takes me to get to class?
Ans. If I walk (which happens with probability 0.7), the travel time is
0.5/3 hrs. If I take the trolley (which happens with probability 0.3),
the travel time is 0.5/15 hrs. Thus the expected travel time T is

E[T] = 0.7 \cdot \frac{0.5}{3} + 0.3 \cdot \frac{0.5}{15} = \frac{3.8}{30} = 0.1267 \text{ hours} = 7.6 \text{ minutes}.

Is this the same as if you calculated your expected velocity and used
that to calculate your travel time?
Ans. The expected velocity V is

E[V ] = 0.7 · 3 + 0.3 · 15 = 6.6.

Then

\frac{\text{distance}}{E[V]} = \frac{1}{2 \cdot 6.6} = 0.0758 \text{ hours} = 4.55 \text{ minutes}.

So no, it is not the same at all. To summarize:

E[T] = E\left[\frac{1}{2V}\right] = \frac{1}{2} E\left[\frac{1}{V}\right] \ne \frac{1}{2} \cdot \frac{1}{E[V]}.

Variance
While the expectation tells us something about the average outcome,
we are also interested in quantifying how likely X is to be close to
that expectation. For example, consider the pmfs:
p_X(k) = \begin{cases} \frac{1}{2}, & k = 1 \\ \frac{1}{2}, & k = -1 \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad p_X(k) = \begin{cases} \frac{1}{2}, & k = 100 \\ \frac{1}{2}, & k = -100 \\ 0, & \text{otherwise} \end{cases}
In both cases, E[X] = 0, but in the second case typical values of X
are much farther from the expectation.
To quantify this, another important quantity associated with a ran-
dom variable is its variance:

var(X) = E[(X - E[X])^2].

When we compute the variance, we are calculating the expected value
of (X - E[X])^2 to tell us roughly how X varies from its expectation
E[X] on average. Notice that since (X - E[X])^2 ≥ 0, the variance is
always non-negative: var(X) ≥ 0.
There is nothing particularly sacred about measuring how X varies
from E[X] via (X - E[X])^2. For example, we could also measure
this via something like |X - E[X]| (which is called the “absolute
deviation”). However, the particular choice of (X - E[X])^2 has a
very special role in probability theory.
Closely related to the variance is the standard deviation:

\sigma_X = \sqrt{\text{var}(X)}.

The variance and the standard deviation are measures of the disper-
sion of X around its mean. We will use both, but σ_X is often easier
to interpret since it has the same units as X. (For example, if X is
in “feet”, then var(X) is in “feet²” while σ_X is also in “feet”.)
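Variance and standard deviation follow the same weighted-sum pattern as the expectation. A minimal sketch (ours), applied to the ±100 pmf from the start of this section:

```python
from math import sqrt

def variance(pmf):
    """var(X) = E[(X - E[X])^2], computed directly from the definition."""
    mean = sum(k * p for k, p in pmf.items())
    return sum((k - mean)**2 * p for k, p in pmf.items())

pmf = {-100: 0.5, 100: 0.5}  # mean 0, but typical values are far from it
print(variance(pmf))         # 10000.0
print(sqrt(variance(pmf)))   # sigma_X = 100.0, in the same units as X
```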

Example. Let X be a Bernoulli random variable, with
p_X(k) = \begin{cases} 1-p & k = 0 \\ p & k = 1. \end{cases}

Then

E[X] = p, \qquad \text{var}(X) = E[(X-p)^2] = p(1-p).

Exercise:
Suppose X has pmf

p_X(k) = \begin{cases} \frac{1}{3} & \text{for } k = 1, 2, 3 \\ 0 & \text{otherwise,} \end{cases}

and Y has pmf

p_Y(k) = \begin{cases} \frac{1}{3} & \text{for } k = 0, 2, 4 \\ 0 & \text{otherwise.} \end{cases}
Calculate var(X) and var(Y ).
Ans. It is clear that E[X] = E[Y] = 2. Thus

\text{var}(X) = \frac{1}{3}(1-2)^2 + \frac{1}{3}(2-2)^2 + \frac{1}{3}(3-2)^2 = \frac{2}{3}.

Similarly, for Y,

\text{var}(Y) = \frac{1}{3}(0-2)^2 + \frac{1}{3}(2-2)^2 + \frac{1}{3}(4-2)^2 = \frac{8}{3}.

Exercise:
Suppose X has pmf

p_X(k) = \begin{cases} \frac{1}{9} & \text{for } k = -4, \dots, 4 \\ 0 & \text{otherwise.} \end{cases}

Calculate var(X).
Ans. We see that E[X] = 0, and hence

\text{var}(X) = E[X^2] = \frac{16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16}{9} = \frac{20}{3}.

Exercise:
Suppose X has pmf

p_X(k) = \begin{cases} \frac{1}{2N+1} & \text{for } k = -N, \dots, N \\ 0 & \text{otherwise.} \end{cases}

Calculate var(X).
It may be helpful to recall that \sum_{k=1}^{N} k^2 = \frac{N^3}{3} + \frac{N^2}{2} + \frac{N}{6}.
Ans. Again, it is clear that E[X] = 0, and thus

\text{var}(X) = E[X^2] = \sum_{k=-N}^{N} \frac{1}{2N+1} k^2 = \frac{2}{2N+1} \sum_{k=1}^{N} k^2 = \frac{2}{2N+1} \left( \frac{N^3}{3} + \frac{N^2}{2} + \frac{N}{6} \right).
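The closed form is easy to sanity-check. This sketch (ours) compares the direct definition against the formula, exactly, for a few values of N:

```python
from fractions import Fraction

def var_direct(N):
    """var(X) for X uniform on -N, ..., N, straight from the definition (E[X] = 0)."""
    return Fraction(sum(k**2 for k in range(-N, N + 1)), 2 * N + 1)

def var_formula(N):
    N = Fraction(N)
    return 2 / (2 * N + 1) * (N**3 / 3 + N**2 / 2 + N / 6)

for N in (1, 4, 10):
    print(N, var_direct(N), var_direct(N) == var_formula(N))  # e.g. N=4 gives 20/3, True
```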

Properties of mean and variance
Below, X is a random variable, and a, b ∈ ℝ are constants.

1. E[X + b] = E[X] + b

2. E[aX] = a E[X]

3. We can collect the two results above into one statement:

E[aX + b] = a E[X] + b.

So if g has the form g(x) = ax + b, then we actually do have
E[g(X)] = g(E[X]) — but again, this is not true in general.

4. var(X) = E[X²] − (E[X])².


It is easy to see why this is true:

\text{var}(X) = E[(X - E[X])^2] = E[X^2] - 2\,E[X\,E[X]] + E[(E[X])^2] = E[X^2] - 2(E[X])^2 + (E[X])^2 = E[X^2] - (E[X])^2,

where we have used the fact that since E[X] is not random at
all, E[X E[X]] = (E[X])^2, E[(E[X])^2] = (E[X])^2, etc.

5. var(X + b) = var(X).
(You can prove that at home.)

6. var(aX) = a² var(X)
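All of these properties can be spot-checked numerically. The sketch below (our own, with an arbitrary made-up test pmf) verifies properties 3, 5, and 6 at once by transforming the pmf of aX + b:

```python
def mean(pmf):
    return sum(k * p for k, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((k - m)**2 * p for k, p in pmf.items())

pmf = {1: 0.2, 2: 0.5, 7: 0.3}                    # arbitrary test pmf
a, b = 3, -5
shifted = {a * k + b: p for k, p in pmf.items()}  # pmf of aX + b

print(abs(mean(shifted) - (a * mean(pmf) + b)) < 1e-12)  # property 3
print(abs(var(shifted) - a**2 * var(pmf)) < 1e-12)       # properties 5 and 6
```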
