Ch3 Random Variables

The document provides an introduction to probability, focusing on discrete random variables and their properties, including definitions, distributions, and operations. It covers specific distributions such as Bernoulli, binomial, and geometric distributions, along with examples and probability mass functions. Additionally, it discusses the algebra of random variables, including operations like addition and multiplication, and the concept of joint distribution.


Probability
Stefano Bonaccorsi
Table of Contents
Introduction

▶ Introduction

▶ Properties of random variables

▶ Specific distributions

▶ Operations on random variables

Discrete random variables
Introduction

Learning Goals
1. Know the definition of a discrete random variable.
2. Know the Bernoulli, binomial, and geometric distributions and examples of what they model.
3. Be able to describe the probability mass function and cumulative distribution function using tables and formulas.
4. Be able to construct new random variables from old ones.

Random Variables
This topic is largely about introducing some useful terminology, building on the notions of
sample space and probability function. The key words are
• Random variable X
• Probability distribution function (pdf, p(x))
• Cumulative distribution function (cdf, F(x))
Discrete random variables
Introduction

Discrete random variables


A discrete random variable X is a mapping X : Ω → R, where R is a discrete (finite or countable) subset of the real numbers R.
In other words: it is a function that assigns a number x to each outcome in Ω.
The set R is called the range of X. The term 'discrete' refers to the restriction on R (in a later chapter we will study continuous random variables, where this restriction is removed). In many of the examples we will study below, R will be a subset of the natural numbers N.
Note
In a more general framework, we should restrict the notion of random variable to those mappings satisfying an additional property called measurability. If, as in many of the examples given below, we have F = P(Ω), this is automatically satisfied. We will return to this point in a later chapter.

Example
Two fair dice are thrown. What is the probability that the sum of the numbers appearing on the two upturned faces equals seven?

Solution
We write each outcome as a pair of numbers: (score on die 1, score on die 2).
As each die has six faces, the basic principle of counting tells us that Ω has 36 members.
By the principle of symmetry, each outcome has probability 1/36. We now introduce a random variable X = sum of scores of the two dice.
Clearly, X can take any whole number between 2 and 12 as its value. We are interested in the event that X takes the value 7. Writing this simply as (X = 7), we have

(X = 7) = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

so P(X = 7) = 6/36 = 1/6.
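The counting argument can be checked by enumerating the sample space directly. A minimal Python sketch (variable names are ours, not from the slides), using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes (die 1, die 2) of two fair dice
# and keep those whose scores sum to 7.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 7]

p_seven = Fraction(len(favourable), len(outcomes))  # 6/36 = 1/6
```

Exhaustive enumeration is feasible here precisely because Ω is finite and small, which is the setting of this chapter.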

Example
Two fair dice are thrown. Which sum of scores on the two dice has the maximum probability?

Solution
To establish this, we calculate all the probabilities P(X = r), where r runs from 2 to 12, imitating the process of the previous example. We find

r =        2     3     4     5     6     7     8     9     10    11    12
P(X = r) = 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

so we see that X = 7 is the best bet.
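The whole table can be generated rather than computed case by case. A sketch, again with exact fractions (names are ours):

```python
from fractions import Fraction
from collections import Counter
from itertools import product

# Tabulate P(X = r) for X = sum of scores of two fair dice, r = 2, ..., 12.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {r: Fraction(c, 36) for r, c in sorted(counts.items())}

best = max(pmf, key=pmf.get)  # the sum with maximum probability: 7
```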

Table of Contents
Properties of random variables

▶ Introduction

▶ Properties of random variables

▶ Specific distributions

▶ Operations on random variables

Let X be a random variable on Ω taking values in the set R = {x1 , x2 , . . . , xn } ⊂ R. Now define the numbers

p(xj ) = P(X = xj ), for 1 ≤ j ≤ n.

p is called the probability distribution (or probability law) of X. For example, in the two-dice example above, R = {2, 3, . . . , 12} and p(2) = 1/36, p(3) = 1/18, etc.

Now consider the Boolean algebra P(R) and for A ∈ P(R) define

p(A) = ∑_{xj ∈ A} p(xj ),

where the sum is over those xj 's which lie in A.


Lemma
p is a probability measure on P(R).

Proof.
p(A) ≥ 0 since p(xj ) ≥ 0 for each atom xj . Moreover,

p(R) = ∑_{xj ∈ R} p(xj ) = ∑_{xj ∈ R} P(X = xj ) = P( ∪_{xj ∈ R} (X = xj ) ) = P(X ∈ R) = 1,

so p is a probability measure.

For those who have studied mappings on sets we can characterize p as the probability
measure on P(R) given by p = P ◦ X−1 .

Although, in the above example, the sum which defines p(A) is finite, if R has an infinite
number of elements then the corresponding sum will be infinite and should thus be
interpreted as a convergent series.

Cumulative distribution function
A very useful function that we can associate with a random variable X is its cumulative
distribution F. This is defined by

F(x) = P(X ≤ x) = ∑_{y ∈ R, y ≤ x} P(X = y)

Lemma
Assume that the elements in R are given in increasing order: xi < xi+1 if xi , xi+1 ∈ R. Then

F(xi+1 ) − F(xi ) = p(xi+1 )

In particular, F is an increasing function, that is, F(xi+1 ) ≥ F(xi ).


If xn is the last element in R then F(xn ) = 1.

Returning to the setting of the two-dice example, we have the following table:

r =     2     3     4     5     6     7     8     9     10    11    12
F(r) =  1/36  3/36  6/36  10/36 15/36 21/36 26/36 30/36 33/36 35/36 1

Remark
Notice that P(X > x) = 1 − F(x) and P(x < X ≤ y) = F(y) − F(x).

Table of Contents
Specific distributions

▶ Introduction

▶ Properties of random variables

▶ Specific distributions

▶ Operations on random variables

Uniform
Specific distributions

This is a random variable X for which


p(x1 ) = p(x2 ) = . . . = p(xn ) = 1/n
Uniform random variables arise naturally when we apply the principle of symmetry.
An example is throwing a single fair die and letting X = number on the uppermost face of the die.
In this case R = {1, 2, 3, 4, 5, 6} and n = 6.

Bernoulli random variables
Specific distributions

These are named after Jacob Bernoulli, whose important manuscript on probability theory was published in the eighteenth century. We take R = {0, 1} and define p(1) = p, p(0) = 1 − p, where 0 ≤ p ≤ 1.
Examples of such random variables include the tossing of a biased coin, performing a trial that may result in 'success' or 'failure', or emitting symbols into a binary symmetric channel.
A Bernoulli random variable is uniform if and only if p = 1/2, so that both 0 and 1 are equiprobable. This special case is often called the symmetric Bernoulli distribution.
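A Bernoulli(p) draw is easy to simulate. A minimal sketch (the function name is ours; the fixed seed is only for reproducibility):

```python
import random

def bernoulli(p):
    """One Bernoulli(p) draw: 1 ('success') with probability p, 0 ('failure') otherwise."""
    return 1 if random.random() < p else 0

random.seed(0)  # fixed seed so the sketch is reproducible
draws = [bernoulli(0.5) for _ in range(10_000)]
freq = sum(draws) / len(draws)  # empirical frequency of 1s, close to p = 1/2
```

For the symmetric case p = 1/2, the empirical frequency of successes settles near 1/2 as the number of draws grows.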

‘Certain’ variables
Specific distributions

Suppose that out of a range of possibilities {x1 , x2 , . . . , xn }, we know for certain that a
particular value, xj , say, will occur. It is often useful to consider xj as the value of a random
variable X whose probability law is the Dirac measure δxj at xj .

Remark: Someone may wonder why we haven't specified the probability space in the above examples. This is because it has, to some extent, become irrelevant: all the information we require is contained in the random variable and its distribution.

Binomial distribution
Specific distributions
The binomial distribution Binomial(n, p), or Bin(n, p), models the number of successes in
n independent Bernoulli(p) trials.
There is a hierarchy here:
• A single Bernoulli trial is, for example, one toss of a coin.
• A single binomial trial consists of n Bernoulli trials.

For coin flips, the sample space for a Bernoulli trial is {H, T}. The sample space for a
binomial trial consists of all sequences of heads and tails of length n. Similarly:
• A Bernoulli random variable takes values in {0, 1}.
• A binomial random variable takes values in {0, 1, 2, . . . , n}.

Remark: Binomial(1, p) is equivalent to Bernoulli(p).


Remark: The number of heads in n flips of a coin with probability p of heads follows a Binomial(n, p) distribution.
Binomial Probability Distribution Function
Specific distributions

We describe X ∼ Bin(n, p) by giving its values and probabilities. Let k be an arbitrary integer in {0, 1, . . . , n}.
The binomial coefficient, also known as "n choose k", is given by:

C(n, k) = n! / (k! (n − k)!).

The probability distribution function (pdf) of a Binomial(n, p) random variable is:

Values a   0          1                       ···  k                         ···  n
pdf p(a)   (1 − p)^n  C(n,1) p (1 − p)^(n−1)  ···  C(n,k) p^k (1 − p)^(n−k)  ···  p^n
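The general term of this table translates directly into code. A sketch using Python's built-in `math.comb` for the binomial coefficient (the function name is ours):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = Fraction(1, 2)
row = [binomial_pmf(5, p, k) for k in range(6)]  # the pmf row for Bin(5, 1/2)
total = sum(row)                                 # a pmf must sum to 1
```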

Example. Probability of 3 or More Heads for a Fair Coin
Consider X ∼ Bin(5, p) where p = 1/2.
The binomial coefficients for n = 5 are:
C(5,0) = 1, C(5,1) = 5, C(5,2) = 10, C(5,3) = 10, C(5,4) = 5, C(5,5) = 1.
The pmf table for X ∼ Bin(5, p) is:

a      0          1            2                3                4             5
p(a)   (1 − p)^5  5p(1 − p)^4  10p^2 (1 − p)^3  10p^3 (1 − p)^2  5p^4 (1 − p)  p^5

Calculating P(X ≥ 3):

P(X ≥ 3) = C(5,3) (1/2)^3 (1/2)^2 + C(5,4) (1/2)^4 (1/2)^1 + C(5,5) (1/2)^5
         = 10 · 1/32 + 5 · 1/32 + 1 · 1/32 = 16/32 = 1/2.
Think: Why is the value 1/2 not surprising?
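The tail probability can be double-checked by summing the pmf over k = 3, 4, 5. A sketch with exact fractions:

```python
from fractions import Fraction
from math import comb

# P(X >= 3) for X ~ Bin(5, 1/2): sum C(5, k) p^k (1 - p)^(5 - k) over k = 3, 4, 5.
p = Fraction(1, 2)
tail = sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3, 6))
```

The answer 1/2 reflects the symmetry of the fair coin: P(X ≥ 3) equals P(X ≤ 2) by swapping heads and tails.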
Geometric Distributions
Specific distributions

A geometric distribution models the number of tails before the first head in a sequence of
coin flips (Bernoulli trials).
We may define tails as failures and heads as successes. Then, X is the number of failures
before the first success.
This model applies to many different scenarios. The most neutral language describes it as
the number of tails before the first head.

Formal Definition of a Geometric Distribution
Specific distributions
The random variable X follows a geometric distribution with parameter p if:
• X takes values 0, 1, 2, 3, . . .
• Its probability mass function (pmf) is given by:

P(X = k) = (1 − p)^k p.

We denote this by X ∼ Geometric(p) or Geo(p). In table form:


Values a   0   1          2            3            ···
pmf p(a)   p   (1 − p)p   (1 − p)^2 p  (1 − p)^3 p  ···
The geometric distribution is a discrete distribution with an infinite number of possible
values. When working with successes and failures, one must be clear whether modeling
the number of successes before the first failure or the number of failures before the first
success. To avoid confusion, use the neutral language of counting tails before the first
head.
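The pmf formula is a one-liner; since the range is infinite, the total probability 1 is only reached in the limit, as a partial sum makes concrete. A sketch (the function name is ours):

```python
from fractions import Fraction

def geometric_pmf(p, k):
    """P(X = k) = (1 - p)^k * p: probability of k tails before the first head."""
    return (1 - p)**k * p

p = Fraction(1, 2)
partial = sum(geometric_pmf(p, k) for k in range(50))
# The full (infinite) sum is 1; this partial sum falls short by exactly (1/2)^50.
```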
Example. Computing Geometric Probabilities
Suppose that the inhabitants of an island plan their families by having children until the
first girl is born. Assume:
• The probability of having a girl with each pregnancy is 0.5, independent of other
pregnancies.
• All babies survive, and there are no multiple births.
What is the probability that a family has k boys?

Solution: Using neutral language, consider boys as tails and girls as heads. Then, the
number of boys in a family is the number of tails before the first head.
Let X be the number of boys in a randomly chosen family. Since X is a geometric random
variable, we compute: P(X = k) = (1/2)^k · (1/2) = (1/2)^(k+1).
Think: What is the ratio of boys to girls on the island?

Table of Contents
Operations on random variables

▶ Introduction

▶ Properties of random variables

▶ Specific distributions

▶ Operations on random variables

Operations on random variables
Operations on random variables
Next, we consider the ‘algebra of random variables’. Random variables can be added,
multiplied by ‘scalars’ and multiplied together to form new random variables.
More precisely, let X and Y be two random variables and α be a real number; then we can
form new random variables X + Y, αX, X + α and XY.
To understand the meaning of these operations, we consider some examples. For
example:
(i) Let X be the number of boys born in your city next year and Y be the number of girls;
then X + Y is the number of children born next year.
(ii) Let X denote the length (in feet) of a randomly chosen wooden plank removed from a lorry load. Let α = 0.3048, the number of meters in a foot; then αX denotes the same length in meters.
(iii) If X is as in (ii) and α is the known length of a steel cap to be placed on the edge of
each plank, then X + α is the length of a (randomly chosen) capped plank.
(iv) Let X be a random force acting on a particle and Y be a random distance that the
particle might move; then XY is the (random) work done on the particle.

Operations on random variables
Operations on random variables

In general, we can form the random variable f(X), where f is any function on R: if X has range {x1 , x2 , . . . , xn }, then f(X) has range {f(x1 ), f(x2 ), . . . , f(xn )} (whose elements need not be distinct!)

Example
Suppose that X is uniformly distributed in {0, 1, 2}. Then Y = cos(πX/2) is uniformly distributed in {−1, 0, 1}.
However, if X is uniformly distributed in {0, 1, . . . , 4}, then Z = cos(πX/2) has the same range as Y but a different distribution: pZ(−1) = 1/5, pZ(0) = pZ(1) = 2/5.
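The distribution of f(X) is obtained by summing the probabilities of all x that map to the same value, as in the cosine example. A sketch (the helper name `pushforward` is ours; rounding removes floating-point noise since cos(πx/2) takes only the values −1, 0, 1 here):

```python
from fractions import Fraction
from math import cos, pi
from collections import Counter

def pushforward(pmf, f):
    """Distribution of f(X) from the pmf of X; values of f may collide."""
    out = Counter()
    for x, p in pmf.items():
        out[f(x)] += p
    return dict(out)

f = lambda x: round(cos(pi * x / 2))  # exact values are -1, 0, 1
pY = pushforward({x: Fraction(1, 3) for x in range(3)}, f)  # X uniform on {0,1,2}
pZ = pushforward({x: Fraction(1, 5) for x in range(5)}, f)  # X uniform on {0,...,4}
```

Same function f, same range, yet pY and pZ differ: the distribution of f(X) depends on the distribution of X, not just on f.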

Joint distribution
Operations on random variables

Let X be a random variable with range RX = {x1 , x2 , . . . , xn } and probability law p and let
Y be a random variable (with the same probability space as X) with range
RY = {y1 , y2 , . . . , ym } and probability law q. We define the joint distribution of X and Y by

pij = P(X = xi , Y = yj )

for 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Note that the following equalities hold:

∑_{j=1}^{m} pij = pi ,   ∑_{i=1}^{n} pij = qj ,   ∑_{i=1}^{n} ∑_{j=1}^{m} pij = 1.
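These three identities (row sums give the law of X, column sums the law of Y, the grand total is 1) can be checked on a concrete joint distribution, e.g. two independent fair dice. A sketch (names are ours):

```python
from fractions import Fraction

# Joint pmf of two fair dice: X = first die, Y = second die, p_ij = 1/36.
joint = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

# Marginals: sum the joint pmf over the other index.
pX = {i: sum(p for (a, b), p in joint.items() if a == i) for i in range(1, 7)}
pY = {j: sum(p for (a, b), p in joint.items() if b == j) for j in range(1, 7)}
```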

Thank you for listening!
Any questions?
