Introduction To Probability Distributions
Random Variables
Random Variable (RV): A numeric outcome that results
from an experiment
For each element of an experiment’s sample space, the
random variable can take on exactly one value
Discrete Random Variable: An RV that can take on only
a finite or countably infinite set of outcomes
Continuous Random Variable: An RV that can take on
any value along a continuum (but may be reported
“discretely”)
Random Variables are denoted by upper case letters (Y)
Individual outcomes for an RV are denoted by lower
case letters (y)
Random Variable
• A random variable x takes on a defined set
of values with different probabilities.
• For example, if you roll a die, the outcome is random
(not fixed) and there are 6 possible outcomes, each of
which occurs with probability one-sixth.
• For example, if you poll people about their voting
preferences, the percentage of the sample that responds
“Yes on Proposition 100” is also a random variable (the
percentage will be slightly different every time you
poll).
2.1.1 Definition of a Random Variable
Probability: $P(X = x_i) = p_i$, where $0 \le p_i \le 1$ and $\sum_i p_i = 1$
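2.1.2 Probability Mass Function
Example 1: Machine Breakdowns
P(cost = 50) = 0.3, P(cost = 200) = 0.2, P(cost = 350) = 0.5
x_i   50    200   350
p_i   0.3   0.2   0.5
0.3 + 0.2 + 0.5 = 1
[Figure: probability mass function f(x), with bars of height 0.3, 0.2, and 0.5 at x = 50, 200, and 350]
Abbreviation: c.d.f.
[Figure: cumulative distribution function F(x), a step function that jumps to 0.3 at x = 50, to 0.5 at x = 200, and to 1.0 at x = 350]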
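A small Python sketch of the machine-breakdown example: it checks that the probabilities form a valid pmf and builds the c.d.f. by adding up the $p_i$ for every cost value at or below $x$. The function name `F` simply mirrors the slide's notation.

```python
# Machine-breakdown example: repair costs and their probabilities
costs = [50, 200, 350]
probs = [0.3, 0.2, 0.5]

# A valid pmf: each p_i is in [0, 1] and the p_i sum to 1
assert all(0 <= p <= 1 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-12

def F(x):
    """c.d.f. F(x) = P(cost <= x): add up p_i for every cost value x_i <= x."""
    return sum(p for c, p in zip(costs, probs) if c <= x)

for x in [0, 50, 100, 200, 349.99, 350, 1000]:
    print(x, round(F(x), 10))
```

F stays flat between the cost values and jumps by $p_i$ at each $x_i$, which is the step shape described for the c.d.f. plot.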
[Figure: pmf of a fair die, p(x) = 1/6 at each of x = 1, 2, 3, 4, 5, 6]
$\sum_{\text{all } x} P(x) = 1$
Probability mass function (pmf)
x p(x)
1 p(x=1)=1/6
2 p(x=2)=1/6
3 p(x=3)=1/6
4 p(x=4)=1/6
5 p(x=5)=1/6
6 p(x=6)=1/6
Total 1.0
Cumulative distribution function (CDF)
[Figure: step plot of the CDF P(x) for a fair die, rising by 1/6 at each of x = 1, …, 6, from 1/6 up to 1.0]
Cumulative distribution function
A P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
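The CDF table is a running sum of the pmf. A minimal Python sketch for a fair die, using exact fractions so the values match the table (Python prints them in lowest terms, e.g. 2/6 as 1/3):

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}   # fair die: p(x) = 1/6

# Running sums of the pmf give the CDF P(x <= A)
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for a, p in cdf.items():
    print(f"P(x <= {a}) = {p}")
```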
Practice Problem:
The number of patients seen in any given hour is a random
variable represented by x. The probability distribution for x is:
x 10 11 12 13 14
P(x) 0.4 0.2 0.2 0.1 0.1
Find the probability that in a given hour:
a. Exactly 14 patients arrive: P(x = 14) = 0.1
b. At least 12 patients arrive: P(x ≥ 12) = 0.2 + 0.1 + 0.1 = 0.4
c. At most 11 patients arrive: P(x ≤ 11) = 0.4 + 0.2 = 0.6
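Each answer above is a sum of p(x) over the values that satisfy the event. A short Python sketch of that idea; the helper name `prob` is ours, for illustration only:

```python
from fractions import Fraction

# Distribution of x = number of patients seen in an hour (from the table)
dist = {10: Fraction(4, 10), 11: Fraction(2, 10), 12: Fraction(2, 10),
        13: Fraction(1, 10), 14: Fraction(1, 10)}

def prob(event):
    """Sum p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in dist.items() if event(x))

exactly_14 = prob(lambda x: x == 14)
at_least_12 = prob(lambda x: x >= 12)
at_most_11 = prob(lambda x: x <= 11)
print(exactly_14, at_least_12, at_most_11)  # 1/10 2/5 3/5
```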
Review Question 1
If you toss a die, what’s the probability that you
roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0
Review Question 2
Two dice are rolled and the sum of the face
values is six. What is the probability that at
least one of the dice came up a 3?
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0
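Review Question 2 is a conditional probability, so one way to check it is brute-force enumeration. A sketch, assuming two fair, independent dice so that all 36 ordered outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

# Condition on the event "sum of the faces is six"
sum_six = [o for o in outcomes if sum(o) == 6]
has_three = [o for o in sum_six if 3 in o]

# P(at least one 3 | sum = 6) = |both events| / |conditioning event|
p = Fraction(len(has_three), len(sum_six))
print(sum_six)   # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(p)         # 1/5
```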
$\int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = -e^{-\infty} + e^{0} = 0 + 1 = 1$
2.2 Continuous Random Variables
2.2.1 Example of Continuous Random Variables
$\int_{\text{state space}} f(x)\,dx = 1$
2.2.2 Probability Density Function
$\int_{49.5}^{50.5} \left(1.5 - 6(x - 50.0)^2\right) dx = \left[1.5x - 2(x - 50.0)^3\right]_{49.5}^{50.5}$
$= \left[1.5 \times 50.5 - 2(50.5 - 50.0)^3\right] - \left[1.5 \times 49.5 - 2(49.5 - 50.0)^3\right]$
$= 75.5 - 74.5 = 1.0$
$P(49.8 \le X \le 50.1) = \int_{49.8}^{50.1} \left(1.5 - 6(x - 50.0)^2\right) dx = \left[1.5x - 2(x - 50.0)^3\right]_{49.8}^{50.1}$
$= \left[1.5 \times 50.1 - 2(50.1 - 50.0)^3\right] - \left[1.5 \times 49.8 - 2(49.8 - 50.0)^3\right]$
$= 75.148 - 74.716 = 0.432$
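Both integrals can also be checked numerically. This sketch uses composite Simpson's rule (our choice of method, not one from the slides), which is exact for a quadratic integrand up to floating-point rounding; the density is taken as zero outside $[49.5, 50.5]$.

```python
def f(x):
    """Density from the example: 1.5 - 6(x - 50)^2 on [49.5, 50.5], else 0."""
    return 1.5 - 6 * (x - 50.0) ** 2 if 49.5 <= x <= 50.5 else 0.0

def simpson(g, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

print(round(simpson(f, 49.5, 50.5), 6))   # 1.0
print(round(simpson(f, 49.8, 50.1), 6))   # 0.432
```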
$f(x) = \frac{dF(x)}{dx}$
$P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$
$P(a \le X \le b) = P(a < X < b)$
$F(x) = P(X \le x) = \int_{49.5}^{x} \left(1.5 - 6(y - 50.0)^2\right) dy$
$P(X \le 50.0) = 0.5$
$P(X \le 49.7) = 0.104$
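Evaluating the integral in closed form gives $F(x) = 1.5x - 2(x - 50.0)^3 - 74.5$ on the support, since the antiderivative at 49.5 equals 74.5. A Python sketch of that CDF (clamped to 0 and 1 outside $[49.5, 50.5]$), used to reproduce the values above and the interval probability via $F(b) - F(a)$:

```python
def F(x):
    """CDF of the example density f(x) = 1.5 - 6(x - 50)^2 on [49.5, 50.5].

    Inside the support, F(x) = [1.5y - 2(y - 50)^3] evaluated from 49.5 to x,
    which simplifies to 1.5x - 2(x - 50)^3 - 74.5.
    """
    if x < 49.5:
        return 0.0
    if x > 50.5:
        return 1.0
    return 1.5 * x - 2 * (x - 50.0) ** 3 - 74.5

print(round(F(50.0), 3))               # 0.5
print(round(F(49.7), 3))               # 0.104
print(round(F(50.1) - F(49.8), 3))     # 0.432, i.e. P(49.8 <= X <= 50.1)
```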
$p(x) = e^{-x}$
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left[-e^{-x}\right]_1^2 = -e^{-2} + e^{-1} = -0.135 + 0.368 \approx 0.23$
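Because $-e^{-x}$ is an antiderivative of $e^{-x}$, interval probabilities reduce to a difference of exponentials. A short Python sketch, assuming the density is defined for $x \ge 0$; `prob_between` is a hypothetical helper name:

```python
import math

# Density p(x) = e^{-x} for x >= 0; its antiderivative is -e^{-x}
def prob_between(a, b):
    """P(a <= X <= b) for 0 <= a <= b, from the antiderivative -e^{-x}."""
    return math.exp(-a) - math.exp(-b)

print(round(prob_between(1, 2), 3))         # 0.233
print(prob_between(0, math.inf))            # 1.0, the density integrates to 1
```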
Expected Value and Variance
Then, E ( X ) (why?)
Discrete case:
$E(X) = \sum_{\text{all } x} x_i\, p(x_i)$
Continuous case:
$E(X) = \int_{\text{all } x} x\, p(x)\,dx$
Symbol Interlude
E(X) = µ
these symbols are used interchangeably
Example: expected value
x 10 11 12 13 14
P(x) 0.4 0.2 0.2 0.1 0.1
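Applying $E(X) = \sum x_i\, p(x_i)$ to this table: a minimal Python sketch with exact fractions.

```python
from fractions import Fraction

# Patients-per-hour distribution from the example table
dist = {10: Fraction(4, 10), 11: Fraction(2, 10), 12: Fraction(2, 10),
        13: Fraction(1, 10), 14: Fraction(1, 10)}

# E(X) = sum over all x of x * p(x)
mean = sum(x * p for x, p in dist.items())
print(mean, float(mean))   # 113/10 11.3
```

That is, $E(X) = 10(.4) + 11(.2) + 12(.2) + 13(.1) + 14(.1) = 11.3$.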
Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left(\frac{1}{n}\right)$
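P(win) = $\frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} \approx 7.2 \times 10^{-8}$
“49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.
The probability function (note, sums to 1.0):
x$           p(x)
-1           .999999928
+2,000,000   7.2 × 10⁻⁸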
Expected Value
E(X) = P(win)*$2,000,000 + P(lose)*-$1.00
= 2.0 × 10⁶ × 7.2 × 10⁻⁸ + .999999928 × (−1) = .144 − .999999928 ≈ −$.86
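The same expected-value calculation in Python (3.8+ for `math.comb`), assuming, as the slide does, a $2,000,000 payoff on a win and a $1 loss otherwise. With the unrounded P(win) the answer still rounds to −$.86:

```python
import math

combos = math.comb(49, 6)          # "49 choose 6" = 49! / (43! 6!)
p_win = 1 / combos
p_lose = 1 - p_win

# Payoffs as in the slide: +$2,000,000 on a win, -$1 on a loss
ev = 2_000_000 * p_win + (-1) * p_lose
print(combos)                # 13983816
print(f"{p_win:.2e}")        # 7.15e-08
print(round(ev, 2))          # -0.86
```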
A roulette wheel has the numbers 1 through 36, as well as 0 and 00. If you
bet $1 that an odd number comes up, you win or lose $1 according to
whether or not that event occurs. If random variable X denotes your net
gain, X=1 with probability 18/38 and X= -1 with probability 20/38.
E(X) = 1(18/38) – 1 (20/38) = -$.053
On average, the casino wins (and the player loses) 5 cents per game.
The casino rakes in even more if the stakes are higher:
E(X) = 10(18/38) – 10 (20/38) = -$.53
If the cost is $10 per game, the casino wins an average of 53 cents per game.
If 10,000 games are played in a night, that’s a cool $5300.
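A Python sketch of the roulette calculation: exact expected gains for $1 and $10 even-money bets, plus a seeded simulation (our addition, with an arbitrary seed of 0) showing that the long-run average gain per $1 game settles near the expected value. The exact nightly figure, 10,000 × $10/19 ≈ $5,263, is what the slide rounds to $5,300.

```python
import random
from fractions import Fraction

p_win, p_lose = Fraction(18, 38), Fraction(20, 38)

def expected_gain(stake):
    """Exact E(X) for an even-money bet of `stake` dollars on odd."""
    return stake * p_win - stake * p_lose

print(float(expected_gain(1)))                 # about -0.0526
print(float(expected_gain(10)))                # about -0.526
print(float(10_000 * -expected_gain(10)))      # casino's expected take, about 5263

# Seeded simulation: the average gain per $1 game settles near -0.053
rng = random.Random(0)
n_games = 100_000
total = sum(1 if rng.randrange(38) < 18 else -1 for _ in range(n_games))
print(total / n_games)
```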
Expected value isn’t
everything though…
Take the hit new show “Deal or No Deal”
Does everyone know the rules?
Let’s say you are down to two cases: $1
and $400,000. The banker offers you
$200,000.
So, Deal or No Deal?
Deal or No Deal…
This could really be represented as a
probability distribution (No Deal) and a
non-random variable (Deal):
No Deal:
x$           p(x)
+$1          0.50
+$400,000    0.50
Deal:
x$           p(x)
+$200,000    1.0
Expected value doesn’t help…
E(X) ≈ $200,000 for both (No Deal: 0.5 × $1 + 0.5 × $400,000 = $200,000.50; Deal: $200,000)
How to decide?
Variance!
• If you take the deal, the variance/standard
deviation is 0.
• If you don’t take the deal, what is the average
deviation from the mean?
• What’s your gut guess?
Variance/standard deviation
$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
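Applying this definition to the Deal or No Deal example: a small Python sketch (the helper `mean_and_sd` is hypothetical) that computes the mean and the standard deviation $\sqrt{\sum (x_i - \mu)^2 p(x_i)}$ of each option.

```python
import math

def mean_and_sd(dist):
    """Mean and SD of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())   # sum of (x_i - mu)^2 p(x_i)
    return mu, math.sqrt(var)

no_deal = {1: 0.5, 400_000: 0.5}
deal = {200_000: 1.0}

print(mean_and_sd(no_deal))   # (200000.5, 199999.5)
print(mean_and_sd(deal))      # (200000.0, 0.0)
```

The means are essentially equal, but refusing the deal carries a standard deviation of about $200,000 while taking it carries none.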
Variance, continuous
Discrete case:
$\mathrm{Var}(X) = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
Continuous case:
$\mathrm{Var}(X) = \int_{\text{all } x} (x - \mu)^2\, p(x)\,dx$
Symbol Interlude
Var(X) = σ²
SD(X) = σ
these symbols are used interchangeably
Similarity to empirical variance
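Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \left(\frac{1}{n - 1}\right)$
Variance of a random variable: $\sigma^2 = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$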
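The parallel is that the sample variance weights each squared deviation by $1/(n-1)$ where the variance of a random variable weights it by $p(x_i)$. A Python sketch on a made-up sample (hypothetical data, for illustration only), cross-checked against `statistics.variance`, which also uses the $n - 1$ denominator:

```python
import statistics

# Hypothetical sample, made up for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n

# Sample variance: each squared deviation is weighted by 1/(n - 1)
s2 = sum((x - xbar) ** 2 * (1 / (n - 1)) for x in data)

print(xbar, s2)                        # 5.0 and about 4.571 (= 32/7)
print(statistics.variance(data))       # same value, computed by the standard library
```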
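For the $1 roulette bet on odd (μ ≈ −.053):
σ² = Σ (xᵢ − μ)² p(xᵢ) = (1 + .053)²(18/38) + (−1 + .053)²(20/38) ≈ .997, so σ = √.997 ≈ .999
Standard deviation is about $1 ($.999). Interpretation: On average, you’re
either 1 dollar above or 1 dollar below the mean, which is just
under zero. Makes sense!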
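The roulette variance can be computed exactly. A Python sketch using fractions; it gives $\sigma^2 = 360/361 \approx .997$ and $\sigma \approx .9986$:

```python
import math
from fractions import Fraction

# Net gain on a $1 bet on odd: +1 w.p. 18/38, -1 w.p. 20/38
dist = {1: Fraction(18, 38), -1: Fraction(20, 38)}

mu = sum(x * p for x, p in dist.items())               # -1/19, about -0.053
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # sum of (x_i - mu)^2 p(x_i)
sd = math.sqrt(float(var))

print(float(mu), float(var), sd)   # about -0.053, 0.997, 0.999
```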
Review Question 3
The expected value and variance of a
coin toss (H=1, T=0) are?
a. .50, .50
b. .50, .25
c. .25, .50
d. .25, .25
Important discrete probability
distribution: The binomial
Bernoulli Trials
Two Possible Outcomes
Success, with probability p
Failure, with probability q = 1 − p
Probability of exactly k successes in n trials: $P(X = k) = \binom{n}{k} p^k q^{n-k}$
As k varies with fixed n and p, the binomial
probabilities define a binomial probability
distribution over {0, 1, 2, …, n}.
Mean and Mode of a Binomial
Mean: The expected number of
successes in n trials.
$\mu = np$
Mode: The most likely number of
successes in n trials.
$m = \operatorname{int}(np + p)$
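A Python sketch (3.8+ for `math.comb`) that checks both formulas against the pmf itself. The parameters n = 10, p = 0.3 are our own illustrative choice, not from the slides:

```python
from math import comb

n, p = 10, 0.3   # example parameters chosen for illustration

pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
mode_by_search = max(range(n + 1), key=lambda k: pmf[k])
mode_by_formula = int(n * p + p)

print(round(mean, 10), n * p)            # 3.0 3.0
print(mode_by_search, mode_by_formula)   # 3 3
```

When $(n + 1)p$ is a whole number, two adjacent values of k tie for the highest probability, so the mode formula picks only one of them.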
Binomial Probability
Distribution
A fixed number of observations (trials), n
e.g., 15 tosses of a coin; 20 patients; 1000 people
surveyed
A binary outcome
e.g., head or tail in each toss of a coin; disease or no
disease
Generally called “success” and “failure”
Probability of success is p, probability of failure is 1 – p
Constant probability for each observation
e.g., Probability of getting a tail is the same each time we
toss the coin
Binomial distribution
Take the example of 5 coin tosses.
What’s the probability that you flip
exactly 3 heads in 5 coin tosses?
Binomial distribution
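Solution:
One way to get exactly 3 heads: HHHTT, which has probability (½)⁵ = 1/32.
There are $\binom{5}{3} = 10$ equally likely orderings of 3 heads and 2 tails, so
P(exactly 3 heads) = 10 × (½)⁵ = 31.25%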
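The counting argument can be confirmed by listing every sequence of 5 fair coin tosses. A brute-force Python sketch:

```python
from fractions import Fraction
from itertools import product

# Every sequence of 5 tosses is equally likely, with probability (1/2)^5 = 1/32
sequences = list(product("HT", repeat=5))
three_heads = [s for s in sequences if s.count("H") == 3]

print(len(sequences), len(three_heads))               # 32 10
print(Fraction(len(three_heads), len(sequences)))     # 5/16, i.e. 31.25%
```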
Binomial distribution function:
X = the number of heads tossed in 5 coin tosses
[Figure: bar chart of p(x) against x = number of heads, for x = 0, 1, 2, 3, 4, 5]
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X “successes” is
$P(X) = \binom{n}{X} p^X (1 - p)^{n - X}$
where
n = number of trials
X = # successes out of n trials
p = probability of success
1 − p = probability of failure
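The general formula translates directly into a function. A Python sketch (3.8+ for `math.comb`); `binom_pmf` is a hypothetical name, and the checks use the 5-coin-toss example:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(exactly x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The probabilities over x = 0, 1, ..., n sum to 1
print(sum(binom_pmf(x, 5, 0.5) for x in range(6)))    # 1.0
print(binom_pmf(3, 5, 0.5))                           # 0.3125
```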
Binomial distribution: example
Probability of exactly 10 heads in 20 coin tosses:
$\binom{20}{10} (.5)^{10} (.5)^{10} \approx .176$
Binomial distribution: example
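If I toss a coin 20 times, what’s the probability of getting 2 or
fewer heads?
$P(X = 0) = \binom{20}{0} (.5)^0 (.5)^{20} = \frac{20!}{20!\,0!} (.5)^{20} = 9.5 \times 10^{-7}$
$P(X = 1) = \binom{20}{1} (.5)^1 (.5)^{19} = \frac{20!}{19!\,1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$
$P(X = 2) = \binom{20}{2} (.5)^2 (.5)^{18} = \frac{20!}{18!\,2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$
$P(X \le 2) = 9.5 \times 10^{-7} + 1.9 \times 10^{-5} + 1.8 \times 10^{-4} \approx 2.0 \times 10^{-4}$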
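Both 20-toss examples can be computed exactly with integer counts and fractions. A Python sketch (3.8+ for `math.comb`):

```python
from fractions import Fraction
from math import comb

n = 20
half_pow = Fraction(1, 2) ** n          # (.5)^20, about 9.5e-7

p_10 = comb(n, 10) * half_pow           # exactly 10 heads
p_le_2 = sum(comb(n, k) for k in range(3)) * half_pow   # 0, 1, or 2 heads

print(f"{float(p_10):.3f}")     # 0.176
print(f"{float(p_le_2):.1e}")   # 2.0e-04
```

The exact count for 2 or fewer heads is (1 + 20 + 190) / 2²⁰ = 211 / 1,048,576.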
All probability distributions are
characterized by an expected value and a
variance. For the binomial, X ~ Binomial(n, p):
$E(X) = np \qquad \mathrm{Var}(X) = np(1 - p)$
Review Question 4
In your case-control study of smoking and lung-
cancer, 60% of cases are smokers versus only 10%
of controls. What is the odds ratio between smoking
and lung cancer?
a. 2.5
b. 13.5
c. 15.0
d. 6.0
e. .05
$OR = \frac{.6/.4}{.1/.9} = \frac{3}{2} \times \frac{9}{1} = \frac{27}{2} = 13.5$
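The odds ratio is the odds of smoking among cases divided by the odds among controls, where odds = p / (1 − p). A Python sketch with exact fractions:

```python
from fractions import Fraction

p_cases, p_controls = Fraction(6, 10), Fraction(1, 10)   # proportion who smoke

odds_cases = p_cases / (1 - p_cases)          # .6/.4 = 3/2
odds_controls = p_controls / (1 - p_controls) # .1/.9 = 1/9
odds_ratio = odds_cases / odds_controls
print(odds_ratio, float(odds_ratio))          # 27/2 13.5
```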
Review Question 5
What’s the probability of getting exactly 5
heads in 10 coin tosses?
a. $\binom{10}{0} (.50)^5 (.50)^5$
b. $\binom{10}{5} (.50)^5 (.50)^5$
c. $\binom{10}{5} (.50)^{10} (.50)^5$
d. $\binom{10}{10} (.50)^{10} (.50)^0$
Review Question 6
A coin toss can be thought of as an example of a binomial
distribution with N=1 and p=.5. What are the expected
value and variance of a coin toss?
a. .5, .25
b. 1.0, 1.0
c. 1.5, .5
d. .25, .5
e. .5, .5
Review Question 7
If I toss a coin 10 times, what is the expected value
and variance of the number of heads?
a. 5, 5
b. 10, 5
c. 2.5, 5
d. 5, 2.5
e. 2.5, 10
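A Python sketch that computes the mean and variance of a binomial directly from its pmf (exact fractions), for a single coin toss and for 10 tosses, and compares them with np and np(1 − p); `binom_mean_var` is a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

def binom_mean_var(n, p):
    """Mean and variance computed directly from the binomial pmf."""
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(k * pk for k, pk in pmf.items())
    var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())
    return mean, var

half = Fraction(1, 2)
print(binom_mean_var(1, half))    # (Fraction(1, 2), Fraction(1, 4)): np and np(1-p) for n = 1
print(binom_mean_var(10, half))   # (Fraction(5, 1), Fraction(5, 2)): np = 5, np(1-p) = 2.5
```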
Review Question 8
In a randomized trial with n=150, every subject has a 50%
chance of being randomized to treatment. The number of
people randomized to treatment is a random variable X.
What is the probability distribution of X?
a. X~Normal(μ=75, σ=10)
b. X~Exponential(μ=75)
c. X~Uniform
d. X~Binomial(N=150, p=.5)
e. X~Binomial(N=75, p=.5)
Review Question 9
In the same RCT with n=150, if 69
end up in the treatment group and 81
in the control group, how far off is
that from expected?
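For the count randomized to treatment:
$\sigma_x = \sqrt{np(1 - p)}$
For the proportion, which differs by a factor of n ($\hat{p} = x/n$):
$\sigma^2_{\hat{p}} = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$, so $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
P-hat ($\hat{p}$) stands for “sample proportion.”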
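Plugging in the trial's numbers (n = 150, p = .5, 69 in treatment): a Python sketch that measures how far 69 is from the expected 75 in standard-deviation units, once on the count scale and once on the proportion scale. The two z-values agree because both the deviation and the SD shrink by the same factor of n.

```python
import math

n, p, observed = 150, 0.5, 69

# Count scale: mean np and SD sqrt(np(1 - p))
mu_x = n * p
sd_x = math.sqrt(n * p * (1 - p))

# Proportion scale: p-hat = x/n, mean p and SD sqrt(p(1 - p)/n)
p_hat = observed / n
sd_p_hat = math.sqrt(p * (1 - p) / n)

z_count = (observed - mu_x) / sd_x
z_prop = (p_hat - p) / sd_p_hat

print(mu_x, round(sd_x, 2))      # 75.0 6.12
print(round(z_count, 2))         # -0.98 SDs from expected
print(round(z_prop, 2))          # same z on the proportion scale
```

So 69 is about 6 people, or roughly one standard deviation, below what we would expect.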
It all comes back to normal…
Statistics for proportions are based on a
normal distribution, because the
binomial can be approximated as
normal if np>5
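The approximation can be checked for the randomized-trial example. A Python sketch (3.8+ for `math.comb` and `statistics.NormalDist`) that compares the exact binomial probability of 69 or fewer people in treatment (n = 150, p = .5) with a normal curve of mean np and SD √(np(1 − p)). Evaluating the normal CDF at 69.5, a continuity correction, is our addition and not something the slides use.

```python
from math import comb
from statistics import NormalDist

n, p, k = 150, 0.5, 69

# Exact binomial probability P(X <= 69), using integer arithmetic for the counts
exact = sum(comb(n, j) for j in range(k + 1)) / 2**n

# Normal approximation with mean np, SD sqrt(np(1-p)), and a continuity correction
approx = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5).cdf(k + 0.5)

print(round(exact, 3), round(approx, 3))
```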