Introduction to Probability Distributions
Distributions
Abdul Rauf Siddiqi
MS Bioinformatics France
PhD Bioinformatics France
Postdoc France, MBA LUMS
Random Variable
• A random variable x takes on a defined set of
values with different probabilities.
• For example, if you roll a die, the outcome is random
(not fixed) and there are 6 possible outcomes, each of
which occurs with probability one-sixth.
• For example, if you poll people about their voting
preferences, the percentage of the sample that responds
“Yes on Proposition 100” is also a random variable (the
percentage will be slightly different every time you
poll).
[Figure: bar chart of p(x) against x for a die roll; every outcome x = 1, ..., 6 has height 1/6]
\( \sum_{\text{all } x} P(x) = 1 \)
Probability mass function (pmf)
x      p(x)
1      p(x=1) = 1/6
2      p(x=2) = 1/6
3      p(x=3) = 1/6
4      p(x=4) = 1/6
5      p(x=5) = 1/6
6      p(x=6) = 1/6
Total  1.0
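Cumulative distribution function (CDF)
[Figure: step plot of the cumulative probability P(x) against x for a die roll, rising from 1/6 at x = 1 through 1/3, 1/2, 2/3 and 5/6 to 1.0 at x = 6]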
Cumulative distribution function
x P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
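The two die tables above can be reproduced in a few lines. This is only an illustrative sketch; the names `faces`, `pmf` and `cdf` are chosen here and are not from the slides.

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}

# A valid probability function sums to 1 over all x.
total = sum(pmf.values())

# The CDF is the running sum of the pmf: P(x <= A).
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(x, pmf[x], cdf[x])
```

Exact fractions are used so the running totals match the table entries (2/6, 3/6, ...) without rounding.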
Practice Problem:
The number of patients seen in the ER in any given hour is a
random variable represented by x. The probability distribution
for x is:
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
Review Question 1
If you toss a die, what’s the probability that you
roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0
Review Question 2
Two dice are rolled and the sum of the face
values is six. What is the probability that at
least one of the dice came up a 3?
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0
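Continuous case: “probability density function” (pdf)
\( p(x) = e^{-x} \), for \( x \ge 0 \)
[Figure: plot of p(x) = e^{-x} against x]
The total area under the curve is 1:
\( \int_0^{\infty} e^{-x}\,dx = \left[ -e^{-x} \right]_0^{\infty} = 0 + 1 = 1 \)
The probability that x falls between 1 and 2 is the area under the curve from 1 to 2:
\( P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left[ -e^{-x} \right]_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23 \)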
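As a rough numerical check of the two integrals for the density e^{-x}, the sketch below compares the closed-form answer with a simple midpoint-rule approximation. The helper `integrate` and the cutoff of 50 used in place of infinity are assumptions made for illustration.

```python
import math

def pdf(x):
    """Density p(x) = e^(-x), defined for x >= 0."""
    return math.exp(-x)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(1 <= x <= 2): closed form versus numerical approximation.
exact = math.exp(-1) - math.exp(-2)
approx = integrate(pdf, 1.0, 2.0)

# Total area; 50 stands in for infinity because e^(-50) is negligible.
area = integrate(pdf, 0.0, 50.0)

print(round(exact, 3), round(approx, 3), round(area, 3))
```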
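Example 2: Uniform distribution
The uniform distribution: all values are equally likely.
\( f(x) = 1 \), for \( 0 \le x \le 1 \)
[Figure: plot of p(x) = 1 for x between 0 and 1]
The area under the curve is 1:
\( \int_0^1 1\,dx = x \,\big|_0^1 = 1 - 0 = 1 \)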
Example: Uniform distribution
What’s the probability that x is between 0 and ½?
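For a uniform density of height 1 on the interval from 0 to 1, a probability is just the length of the interval in question. The helper below is a hypothetical illustration of that idea, not part of the slides.

```python
def uniform_prob(lo, hi):
    """P(lo <= x <= hi) for x uniform on [0, 1]: length of the overlap with [0, 1]."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return max(hi - lo, 0.0)

# Probability that x is between 0 and 1/2.
print(uniform_prob(0.0, 0.5))
```

Because the density is constant at 1, the area (height times width) equals the width, which gives ½ here.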
Discrete case:
\( E(X) = \sum_{\text{all } x} x_i\,p(x_i) \)
Continuous case:
\( E(X) = \int_{\text{all } x} x_i\,p(x_i)\,dx \)
Symbol Interlude
E(X) = µ
these symbols are used interchangeably
Example: expected value
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
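The expected value for this table can be worked out by applying the discrete formula directly. A minimal sketch, with variable names chosen here for illustration:

```python
# Number of ER patients per hour and their probabilities (from the table above).
values = [10, 11, 12, 13, 14]
probs = [0.4, 0.2, 0.2, 0.1, 0.1]

# Sanity check: the probabilities must sum to 1.
assert abs(sum(probs) - 1.0) < 1e-9

# E(X) = sum over all x of x * p(x)
expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 2))
```

Term by term this is 10(.4) + 11(.2) + 12(.2) + 13(.1) + 14(.1) = 4.0 + 2.2 + 2.4 + 1.3 + 1.4 = 11.3 patients per hour.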
Sample mean:
\( \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left( \frac{1}{n} \right) \)
The probability (frequency) of each
person in the sample is 1/n.
Expected Value
Expected value is an extremely useful
concept for good decision-making!
Example: the lottery
The Lottery (also known as a tax on people
who are bad at math…)
A certain lottery works by picking 6 numbers
from 1 to 49. It costs $1.00 to play the
lottery, and if you win, you win $2 million
after taxes.
\( \frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} = 7.2 \times 10^{-8} \)
“49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.
The probability function (note, sums to 1.0):
x$            p(x)
-1            .999999928
+2,000,000    7.2 x 10^-8
Expected Value
E(X) = P(win) * $2,000,000 + P(lose) * (-$1.00)
= (2.0 x 10^6)(7.2 x 10^-8) + (.999999928)(-1) = .144 - .999999928 = -$.86
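The lottery arithmetic can be checked with the standard library. This sketch follows the slide's setup (a win pays $2,000,000, a loss costs $1.00); the variable names are illustrative.

```python
import math

# Number of distinct ways to choose 6 numbers out of 49.
combos = math.comb(49, 6)
p_win = 1 / combos

# Payoffs as stated on the slide: +$2,000,000 on a win, -$1.00 otherwise.
expected = p_win * 2_000_000 + (1 - p_win) * (-1.0)

print(combos, p_win, round(expected, 2))
```

With the unrounded win probability the result still rounds to about -$.86, matching the hand calculation.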
A roulette wheel has the numbers 1 through 36, as well as 0 and 00.
If you bet $1 that an odd number comes up, you win or lose $1
according to whether or not that event occurs. If random variable X
denotes your net gain, X=1 with probability 18/38 and X= -1 with
probability 20/38.
E(X) = 1(18/38) - 1(20/38) = -$.053
On average, the casino wins (and the player loses) about 5 cents per game.
If the bet is $10 per game, the casino wins an average of 53 cents per
game. If 10,000 games are played in a night, that’s a cool $5300.
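The roulette numbers follow from the same expected-value formula. The sketch below uses exact fractions; scaling to a $10 bet and 10,000 games follows the slide, while the variable names are assumptions.

```python
from fractions import Fraction

# Bet $1 on odd: 18 of the 38 slots win, the other 20 (including 0 and 00) lose.
p_win = Fraction(18, 38)
p_lose = Fraction(20, 38)

# Player's expected net gain per $1 bet.
ev_per_dollar = 1 * p_win + (-1) * p_lose

# Casino's average take for a $10 bet, and over 10,000 such games.
bet = 10
games = 10_000
casino_per_game = -ev_per_dollar * bet
casino_per_night = casino_per_game * games

print(float(ev_per_dollar), float(casino_per_game), float(casino_per_night))
```

The exact nightly figure is 100,000/19, a little under $5,300, which the slide rounds.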
Expected value isn’t
everything though…
Take the hit new show “Deal or No Deal”
Everyone know the rules?
Let’s say you are down to two cases left. $1
and $400,000. The banker offers you
$200,000.
So, Deal or No Deal?
Deal or No Deal…
This could really be represented as a
probability distribution and a non-
random variable:
No deal (a probability distribution):
x$           p(x)
+$1          .50
+$400,000    .50
Deal (a non-random variable):
x$           p(x)
+$200,000    1.0
Expected value doesn’t help…
No deal: E(X) = .50($1) + .50($400,000) ≈ $200,000
Deal: E(X) = $200,000
How to decide?
Variance!
• If you take the deal, the variance/standard
deviation is 0.
• If you don’t take the deal, what is the average
deviation from the mean?
• What’s your gut guess?
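To put a number on the gut guess, the sketch below computes the mean and standard deviation of both options, using the usual definition of variance as the probability-weighted squared deviation from the mean. The helper name `mean_and_sd` is illustrative.

```python
import math

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution given as (value, prob) pairs."""
    mean = sum(x * p for x, p in dist)
    var = sum((x - mean) ** 2 * p for x, p in dist)
    return mean, math.sqrt(var)

no_deal = [(1, 0.5), (400_000, 0.5)]   # two cases left: $1 or $400,000
deal = [(200_000, 1.0)]                # banker's offer, taken for certain

print(mean_and_sd(no_deal))
print(mean_and_sd(deal))
```

Both options have essentially the same mean, but refusing the deal carries a standard deviation of roughly $200,000, while taking it carries none.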
Variance/standard deviation
\( \sigma^2 = Var(x) = E(x - \mu)^2 \)
\( \sigma^2 = Var(x) = E[(x - \mu)^2] = \sum_{\text{all } x} (x_i - \mu)^2\,p(x_i) \)
Variance, continuous
Discrete case:
\( Var(X) = \sum_{\text{all } x} (x_i - \mu)^2\,p(x_i) \)
Continuous case?:
\( Var(X) = \int_{\text{all } x} (x_i - \mu)^2\,p(x_i)\,dx \)
Symbol Interlude
Var(X) = \( \sigma^2 \)
SD(X) = \( \sigma \)
these symbols are used interchangeably
Similarity to empirical variance
\( s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{n - 1} = \sum_{i=1}^{N} (x_i - \bar{x})^2 \left( \frac{1}{n - 1} \right) \)
\( Var(X) = \sigma^2 = \sum_{\text{all } x} (x_i - \mu)^2\,p(x_i) \)
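Answer (roulette example, \( \mu = -.053 \)):
\( \sigma^2 = \sum_{\text{all } x} (x_i - \mu)^2\,p(x_i) \)
\( = (+1 - (-.053))^2 (18/38) + (-1 - (-.053))^2 (20/38) \)
\( = (1.053)^2 (18/38) + (-1 + .053)^2 (20/38) \)
\( = (1.053)^2 (18/38) + (-.947)^2 (20/38) \)
\( = .997 \)
\( \sigma = \sqrt{.997} = .9985 \approx 1.00 \)
Standard deviation is about $1.00. Interpretation: On average, you’re
either 1 dollar above or 1 dollar below the mean, which is just
under zero. Makes sense!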
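The roulette variance can be recomputed without rounding the mean to -.053. This sketch keeps everything as exact fractions until the final square root; the names are illustrative.

```python
import math
from fractions import Fraction

# Net gain on a $1 bet on odd: +1 with probability 18/38, -1 with probability 20/38.
outcomes = {1: Fraction(18, 38), -1: Fraction(20, 38)}

# Mean, then variance as the probability-weighted squared deviation from the mean.
mu = sum(x * p for x, p in outcomes.items())
variance = sum((x - mu) ** 2 * p for x, p in outcomes.items())
sd = math.sqrt(variance)

print(mu, variance, round(float(variance), 3), round(sd, 4))
```

The exact variance is 360/361, which rounds to the .997 obtained by hand, and the standard deviation is just under one dollar.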
Review Question 3
The expected value and variance of a
coin toss (H=1, T=0) are?
a. .50, .50
b. .50, .25
c. .25, .50
d. .25, .25
Important discrete probability distribution: The binomial
Binomial Probability Distribution
• A fixed number of observations (trials), n
  e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
• A binary outcome
  e.g., head or tail in each toss of a coin; disease or no disease
  Generally called “success” and “failure”
  Probability of success is p, probability of failure is 1 – p
• Constant probability for each observation
  e.g., probability of getting a tail is the same each time we toss the coin
Binomial distribution
Take the example of 5 coin tosses.
What’s the probability that you flip
exactly 3 heads in 5 coin tosses?
Binomial distribution
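Solution:
One way to get exactly 3 heads: HHHTT, which has probability (½)^3 x (½)^2 = (½)^5
Number of ways to arrange 3 heads in 5 tosses: 5C3 = 5!/(3!2!) = 10
P(exactly 3 heads) = 10 x (½)^5 = 31.25%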
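The counting argument can be confirmed by brute force: list every possible sequence of 5 tosses and count those with exactly 3 heads. A small sketch, with names chosen for illustration:

```python
from itertools import product
from math import comb

# All 2^5 equally likely sequences of heads (H) and tails (T).
sequences = list(product("HT", repeat=5))

# Keep the sequences with exactly 3 heads.
favorable = [s for s in sequences if s.count("H") == 3]

prob = len(favorable) / len(sequences)
print(len(favorable), len(sequences), prob)
```

The number of favorable sequences agrees with `comb(5, 3)`, so the enumeration and the formula give the same probability.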
Binomial distribution function:
X = the number of heads tossed in 5 coin tosses
[Figure: bar chart of p(x) against the number of heads, x = 0 to 5]
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X “successes” =
\( \binom{n}{X} p^X (1-p)^{n-X} \)
where:
n = number of trials
X = # successes out of n trials
p = probability of success
1-p = probability of failure
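The general binomial formula translates almost directly into code. A minimal sketch, where the function name `binomial_pmf` is an assumption made here:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))

# The probabilities over k = 0..n should sum to 1.
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))
```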
Binomial distribution: example
Probability of exactly 10 successes in 20 trials with p = .5:
\( \binom{20}{10} (.5)^{10} (.5)^{10} = .176 \)
Binomial distribution: example
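If I toss a coin 20 times, what’s the
probability of getting 2 or fewer heads?
\( \binom{20}{0} (.5)^0 (.5)^{20} = \frac{20!}{20!\,0!} (.5)^{20} = 9.5 \times 10^{-7} \)
\( \binom{20}{1} (.5)^1 (.5)^{19} = \frac{20!}{19!\,1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5} \)
\( \binom{20}{2} (.5)^2 (.5)^{18} = \frac{20!}{18!\,2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4} \)
Sum: \( 9.5 \times 10^{-7} + 1.9 \times 10^{-5} + 1.8 \times 10^{-4} \approx 2.0 \times 10^{-4} \)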
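"2 or fewer heads" means adding the probabilities for 0, 1 and 2 heads. The sketch below does that sum directly; the function name `binomial_cdf` is illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the individual probabilities."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 2 or fewer heads in 20 tosses of a fair coin.
p_two_or_fewer = binomial_cdf(2, 20, 0.5)
print(p_two_or_fewer)
```

Since every term shares the factor (.5)^20, the sum is (1 + 20 + 190) x (.5)^20 = 211/1,048,576, about 2.0 x 10^-4.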
**All probability distributions are
characterized by an expected value and a
variance:
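For the binomial distribution, the standard results are E(X) = np and Var(X) = np(1-p). The sketch below checks them numerically by applying the general expected-value and variance formulas to the binomial probabilities; the function name and the choice of n = 20, p = .5 are illustrative.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and variance of Binomial(n, p), computed directly from its probabilities."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

n, p = 20, 0.5
mean, var = binomial_moments(n, p)

# Compare with the shortcut formulas np and np(1-p).
print(mean, n * p)
print(var, n * p * (1 - p))
```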
\( \binom{500}{0} (.05)^0 (.95)^{500} + \binom{500}{1} (.05)^1 (.95)^{499} + \binom{500}{2} (.05)^2 (.95)^{498} + \dots + \binom{500}{10} (.05)^{10} (.95)^{490} < .01 \)
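Reading the sum above as the probability of at most 10 successes in 500 trials with p = .05 (an interpretation of the flattened slide, not something it states outright), the total is tedious by hand but quick to compute:

```python
from math import comb

n, p = 500, 0.05

# P(X <= 10): add the binomial probabilities for k = 0, 1, ..., 10.
p_at_most_10 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(11))

# For context: expected count np and standard deviation sqrt(np(1-p)).
mean = n * p
sd = (n * p * (1 - p)) ** 0.5

print(p_at_most_10, mean, round(sd, 2))
```

The expected count is 25 with a standard deviation near 4.9, so 10 or fewer is about three standard deviations below the mean, which is why the probability is so small.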
Practice Problem:
You are conducting a case-control study of
smoking and lung cancer. If the probability of
being a smoker among lung cancer cases is .6,
what’s the probability that in a group of 8 cases
you have:
Review Question 4
In your case-control study of smoking and
lung-cancer, 60% of cases are smokers versus
only 10% of controls. What is the odds ratio
between smoking and lung cancer?
a. 2.5
b. 13.5
c. 15.0
d. 6.0
e. .05
\( OR = \frac{.6 / .4}{.1 / .9} = \frac{3}{2} \times \frac{9}{1} = \frac{27}{2} = 13.5 \)
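The odds ratio is the odds of smoking among cases divided by the odds of smoking among controls, where odds are p/(1-p). A short sketch with hypothetical helper names:

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p_cases, p_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return odds(p_cases) / odds(p_controls)

# 60% of cases smoke versus 10% of controls.
print(round(odds_ratio(0.6, 0.1), 2))
```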
Review Question 5
What’s the probability of getting exactly 5
heads in 10 coin tosses?
a. \( \binom{10}{0} (.50)^5 (.50)^5 \)
b. \( \binom{10}{5} (.50)^5 (.50)^5 \)
c. \( \binom{10}{5} (.50)^{10} (.50)^5 \)
d. \( \binom{10}{10} (.50)^{10} (.50)^0 \)
Review Question 6
A coin toss can be thought of as an example of
a binomial distribution with N=1 and p=.5.
What are the expected value and variance of a
coin toss?
a. .5, .25
b. 1.0, 1.0
c. 1.5, .5
d. .25, .5
e. .5, .5
Review Question 7
If I toss a coin 10 times, what is the expected
value and variance of the number of heads?
a. 5, 5
b. 10, 5
c. 2.5, 5
d. 5, 2.5
e. 2.5, 10
Review Question 8
In a randomized trial with n=150, the goal is to
randomize half to treatment and half to
control. The number of people randomized to
treatment is a random variable X. What is the
probability distribution of X?
a. X~Normal(μ=75, σ=10)
b. X~Exponential(μ=75)
c. X~Uniform
d. X~Binomial(N=150, p=.5)
e. X~Binomial(N=75, p=.5)
Review Question 8
In a randomized trial with n=150, every
subject has a 50% chance of being randomized
to treatment. The number of people
randomized to treatment is a random variable
X. What is the probability distribution of X?
a. X~Normal(μ=75, σ=10)
b. X~Exponential(μ=75)
c. X~Uniform
d. X~Binomial(N=150, p=.5)
e. X~Binomial(N=75, p=.5)
Review Question 9
In the same RCT with n=150, if 69
end up in the treatment group and 81
in the control group, how far off is
that from expected?
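\( \sigma_x = \sqrt{np(1-p)} \)
For proportion (differs by a factor of n):
\( \mu_{\hat{p}} = p \)
\( \sigma_{\hat{p}}^2 = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n} \)
\( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)
P-hat stands for “sample proportion.”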
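For the trial with n = 150 and a 50% chance of treatment for each subject, the standard deviation formulas above give a yardstick for how unusual a 69/81 split is. The sketch below is one way to set up that comparison; the variable names are illustrative.

```python
import math

n, p = 150, 0.5
observed = 69  # number who ended up in the treatment group

# Expected count and standard deviation of a Binomial(n, p) count.
expected = n * p
sd_count = math.sqrt(n * p * (1 - p))

# How many standard deviations the observed count is from the expected count.
z = (observed - expected) / sd_count

# Same calculation on the proportion scale: the SD shrinks by a factor of n.
sd_prop = math.sqrt(p * (1 - p) / n)

print(expected, round(sd_count, 2), round(z, 2), round(sd_prop, 4))
```

A count of 69 is 6 below the expected 75, which is about one standard deviation, so the imbalance is well within ordinary chance variation.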
It all comes back to normal…
Statistics for proportions are based on a
normal distribution, because the
binomial can be approximated as
normal if np > 5 (the usual rule of thumb also asks for n(1-p) > 5).