Data Science
Lecture 08:
Probability Theory
Anirudh Wodeyar & Tim Dick
Recap
• Sampling distribution: Distribution of a statistic (e.g. a
mean)
• Confidence Limits
• Bootstrapping
The origins of probability theory: Gambling
• Let’s bet.
• Can we turn expected frequencies into a probability tree? How?
Figure: outcomes of two coin flips (HH, HT, TH, TT)
Some useful jargon
Experiment
An action whose outcome is determined by chance
Example: Flipping a coin
Sample Space
The set of possible outcomes of an experiment (denoted by $\Omega$)
Example: the number of spots on each side of the die
Event
A subset of the sample space (denoted by $A$)
Example: the outcome of the experiment has two heads.
Thinking of probability as expected frequency
• Can turn expected
frequencies into a
probability tree
• Now, the probability
of any path is the
product of the
probabilities along its branches
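The multiplication rule along a tree can be sketched in a few lines of Python. This is a minimal illustration that assumes a fair coin flipped twice; the names `branch`, `paths`, and `expected_counts`, and the choice of 1000 plays, are illustrative rather than taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so each branch of the tree has probability 1/2.
branch = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Each path through the two-level tree is one outcome (HH, HT, TH, TT).
# Its probability is the product of the branch probabilities along it.
paths = {}
for first, second in product(branch, repeat=2):
    paths[first + second] = branch[first] * branch[second]

for outcome, prob in paths.items():
    print(outcome, prob)

# Expected frequencies if the experiment were repeated 1000 times.
expected_counts = {outcome: prob * 1000 for outcome, prob in paths.items()}
print(expected_counts)
```

Exact fractions are used instead of floats so that the leaves of the tree visibly sum to 1.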
Thinking of probability as expected frequency
• What if we first
flipped a coin, then
rolled a fair die?
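One way to explore the coin-then-die experiment is to enumerate the combined sample space directly. The sketch below assumes a fair coin and a fair die; the event "heads, then an even number" is a hypothetical example chosen only to show how leaves of the tree are added.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sample space of the combined experiment: 2 * 6 = 12 outcomes.
# Each leaf probability is the product of the two branch probabilities.
joint = {(c, d): coin[c] * die[d] for c, d in product(coin, die)}

# Hypothetical event: "heads, then an even number". Add the matching leaves.
p_heads_even = sum(p for (c, d), p in joint.items() if c == "H" and d % 2 == 0)

print(len(joint), p_heads_even)
```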
Thinking of probability as expected frequency
Experiment: Draw a card from a full deck of cards
(52 outcomes)
Figure: highlighted
Overlapping Event Probabilities
Intersection of events $A$ and $B$ (denoted by $A \cap B$)
Example: Rolling a single die
Figure: $A \cap B$ highlighted
Mutually exclusive/Disjoint Event Probabilities can be added
(OR rule)
3. Probabilities of events that are mutually exclusive
(no overlapping outcomes) can be added
together to get the total probability.
If $A \cap B = \emptyset$, then: $P(A \cup B) = P(A) + P(B)$
Independent event probabilities can be multiplied
4. Multiply the probabilities of independent events to get the total probability of a
sequence of events (that are happening
together).
Events are sets, as a consequence we also have
Union of events $A$ and $B$ (denoted by $A \cup B$)
Example: Rolling a single die
Figure: $A \cup B$ highlighted
Mutual Exclusion vs Independence
• What is the difference?
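A small check on a single fair die may help separate the two ideas. In this sketch the events `even`, `odd`, and `low` are hypothetical examples, and the helper names are illustrative. Mutual exclusion is a statement about the sets (no shared outcomes), while independence is a statement about the probabilities (the joint probability factorises).

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

def mutually_exclusive(a, b):
    """True if the events share no outcomes."""
    return len(a & b) == 0

def independent(a, b):
    """True if P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even, odd = {2, 4, 6}, {1, 3, 5}
low = {1, 2}  # hypothetical event: "the roll is 1 or 2"

print(mutually_exclusive(even, odd), independent(even, odd))
print(mutually_exclusive(even, low), independent(even, low))
```

In this example the mutually exclusive pair is not independent: knowing the roll is even rules out odd entirely.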
The origins of probability theory: Gambling
• Can you now calculate what the optimal odds should be
for the following? Assuming we played it 1000 times.
Probability of an event
The probability of an event $A$ is the sum of the probabilities of the
elements in $A$ (denoted by $P(A)$)
Probability measures (example)
A fair die
An unfair die
Probability Theory Basics
Rules of calculating probabilities
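• $P(A) = \frac{|A|}{|\Omega|}$, if each element in $\Omega$ has equal probability, where $|\cdot|$ denotes the
cardinality (how many?) of the set.
• Given events $A$ and $B$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• For disjoint events $A$ and $B$: $P(A \cup B) = P(A) + P(B)$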
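• For any event $A$: $P(A^c) = 1 - P(A)$
Proof: $A$ and $A^c$ are disjoint and $A \cup A^c = \Omega$, so $P(A) + P(A^c) = P(\Omega) = 1$.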
Calculating Probabilities (example)
Example: Rolling a fair die
, events:
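The events used on the original slide are not recoverable here, so the sketch below checks the calculation rules on a fair die with two hypothetical events, `A` (even roll) and `B` (roll greater than 3). The function name `P` is also just an illustrative choice.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {2, 4, 6}                # hypothetical event: even roll
B = {4, 5, 6}                # hypothetical event: roll greater than 3

def P(event):
    """Equally likely outcomes: cardinality of the event over cardinality of omega."""
    return Fraction(len(event), len(omega))

# Addition rule for events that may overlap.
p_union = P(A | B)
assert p_union == P(A) + P(B) - P(A & B)

# Complement rule.
p_not_A = P(omega - A)
assert p_not_A == 1 - P(A)

print(P(A), P(B), P(A & B), p_union, p_not_A)
```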
Everything is Conditional!
• Probability of heads being 0.5 is conditional on a truly
fair coin toss.
• Probability of any given face of a die being 1/6 is conditional on a fair die.
Figure: events $A$ and $B$
Everything is Conditional! Or Independent.
Independent events
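Events $A$ and $B$ are called independent if
a) $P(A \mid B) = P(A)$ or
b) $P(B \mid A) = P(B)$
These are equivalent if $P(A) > 0$ and $P(B) > 0$, since
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Concluding:
$A$ and $B$ are independent $\iff P(A \cap B) = P(A)\,P(B)$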
Conditional Probabilities
• When screening for breast cancer, mammography is roughly 90%
accurate, in the sense that 90% of women with cancer, and 90% of
women without cancer, will be correctly classified.
• $P(M^+ \mid \text{cancer}) = 0.9$
• $P(M^- \mid \text{no cancer}) = 0.9$
Conditional Probabilities
• Suppose 1% of women being screened actually have cancer: what is the
probability that a randomly chosen woman will have a positive
mammogram, and if she does, what is the chance she really has cancer?
• …
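Expected counts per 1000 women screened:
                      Cancer   No cancer   Total
Positive test (M+)         9          99     108
Negative test (M-)         1         891     892
Total                     10         990    1000
• $P(\text{cancer} \mid M^+) = \frac{9}{108} \approx 0.083$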
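The mammography numbers can be reproduced as expected frequencies. This is a minimal sketch that assumes a cohort of 1000 women (an illustrative size) and treats "90% accurate" as both sensitivity and specificity, as the slide describes; the variable names are not from the lecture.

```python
n_women = 1000
prevalence = 0.01     # 1% of screened women have cancer
sensitivity = 0.90    # P(M+ | cancer)
specificity = 0.90    # P(M- | no cancer)

# Expected counts, rounded to whole women.
cancer = round(n_women * prevalence)
no_cancer = n_women - cancer
true_pos = round(cancer * sensitivity)
false_pos = round(no_cancer * (1 - specificity))

# Probability of a positive mammogram, and of cancer given a positive one.
p_positive = (true_pos + false_pos) / n_women
p_cancer_given_positive = true_pos / (true_pos + false_pos)

print(true_pos, false_pos, p_positive, round(p_cancer_given_positive, 3))
```

Most positives come from the much larger group of women without cancer, which is why the conditional probability is so low.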
Conditional Probabilities
• Despite “90% accuracy”, this is a pretty bad overall result.
• A positive mammogram here only gives an 8% chance
of having breast cancer. What use is that?
• At what percentage would this test be useful?
• If accuracy is 99%, then we have:
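To explore how the answer changes with accuracy, the same calculation can be wrapped in a function. The sketch assumes, as in the slides, that "accuracy" applies equally to women with and without cancer and that 1% of women screened have cancer; the function name and the 99.9% case are illustrative additions.

```python
def prob_cancer_given_positive(accuracy, prevalence=0.01):
    """P(cancer | positive test) when sensitivity = specificity = accuracy."""
    true_pos = accuracy * prevalence                # has cancer and tests positive
    false_pos = (1 - accuracy) * (1 - prevalence)   # no cancer but tests positive
    return true_pos / (true_pos + false_pos)

for accuracy in (0.90, 0.99, 0.999):
    print(accuracy, round(prob_cancer_given_positive(accuracy), 3))
```

Under these assumptions, a 99% accurate test gives a positive result that is right only about half the time, because true and false positives are then equally common.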
Exercise: Build both the tree and contingency table.
• The Triple Blood Test screens a pregnant woman and provides an
estimated risk of her baby being born with the genetic disorder
Down syndrome.
• A study of 5282 women aged 35 or over analyzed the Triple
Blood Test to test its accuracy. It was reported that of the 5282
women, “48 of the 54 cases of Down syndrome would have
been identified using the test and 25 percent of the unaffected
pregnancies would have been identified as being at high risk
(positive) for Down syndrome.”
• What is the probability of Down syndrome given a positive test?
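One possible way to check an answer to the exercise is to fill in the contingency table from the quoted study figures. The sketch reads "25 percent of the unaffected pregnancies" as a false positive rate applied to the 5282 - 54 unaffected pregnancies; the variable names are illustrative.

```python
n_women = 5282
down_cases = 54
true_pos = 48                  # Down syndrome cases identified by the test
false_positive_rate = 0.25     # share of unaffected pregnancies flagged as high risk

unaffected = n_women - down_cases
false_pos = false_positive_rate * unaffected
false_neg = down_cases - true_pos
true_neg = unaffected - false_pos

# Contingency table rows: positive test, negative test.
print("positive:", true_pos, false_pos)
print("negative:", false_neg, true_neg)

p_down_given_positive = true_pos / (true_pos + false_pos)
print(round(p_down_given_positive, 4))
```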
Bayes’ rule
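Statistical Interpretation of Bayes’ rule
For a hypothesis $H$ and an observation $O$:
$P(H \mid O) = \frac{P(H)\, P(O \mid H)}{P(O)}$
• $P(H)$: prior probability of the hypothesis
• $P(O \mid H)$: how expected the observation is given the hypothesis
• $P(O)$: how expected the observation is without the hypothesis
• $P(H \mid O)$: probability of the hypothesis given the observation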
Bayes’ rule
Example
Let and denote the events that a randomly chosen person has cancer and is
diagnosed as having cancer, respectively.
Data: , , , , , and
Question: Calculate
Solution:
Derived information: and
Question (2): What is the probability that someone diagnosed with cancer
actually has the disease?
Solution:
Question (3): What is the probability that someone who is diagnosed as not
having cancer actually has the disease?
Solution:
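The three questions follow the same pattern, which can be sketched in Python. The lecture's actual data values are missing from this text, so every number below is a hypothetical placeholder, and the symbols `C` (has cancer) and `D` (diagnosed as having cancer) are assumed names.

```python
# Hypothetical inputs (NOT the lecture's data, which is not recoverable here).
p_c = 0.05               # P(C): a randomly chosen person has cancer
p_d_given_c = 0.78       # P(D | C): diagnosed, given cancer
p_d_given_not_c = 0.06   # P(D | not C): diagnosed, given no cancer

# Law of total probability: overall chance of being diagnosed.
p_d = p_d_given_c * p_c + p_d_given_not_c * (1 - p_c)

# Bayes' rule: has cancer, given a diagnosis.
p_c_given_d = p_d_given_c * p_c / p_d

# Bayes' rule again: has cancer, given a "no cancer" diagnosis.
p_c_given_not_d = (1 - p_d_given_c) * p_c / (1 - p_d)

print(round(p_d, 4), round(p_c_given_d, 4), round(p_c_given_not_d, 4))
```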
Conditional Probability
• Used when “adjusting for” confounds
• We condition on, for example, summer:
• P(Ice Cream Sales | Summer, Shark Attacks) = 0.
• Conditioning simply means that we have altered
the subset of events over which we measure
probability.
Prosecutor’s fallacy
• Assuming that the probability P(A|B) is the same as P(B|A)
• In forensic science, the statement:
• “If the accused is innocent, there is only a 1 in a billion
chance that the DNA at the crime scene matches them”
• Is understood to mean:
• “Given the DNA evidence, there is only 1 in a billion
chance the accused is innocent”
Prosecutor’s fallacy
• “If the accused is innocent, there is only a 1 in a billion
chance that the DNA at the crime scene matches them”
• P(DNA match | innocent) = 1/1000000000
• “Given the DNA evidence, there is only 1 in a billion
chance the accused is innocent”
• P(innocent | DNA match) = ?
• = P(DNA match | innocent) * P(innocent)/P(DNA match)
• = We don’t know!
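To see why the answer depends on what we do not know, the sketch below plugs several hypothetical priors into Bayes' rule. It assumes the true culprit always matches (P(DNA match | guilty) = 1) and that the accused is one of `population` equally likely suspects; both are illustrative assumptions, not claims from the slides.

```python
from fractions import Fraction

def p_innocent_given_match(p_match_given_innocent, prior_guilty, p_match_given_guilty=1):
    """Bayes' rule for P(innocent | DNA match)."""
    prior_innocent = 1 - prior_guilty
    # Law of total probability for P(DNA match).
    p_match = (p_match_given_guilty * prior_guilty
               + p_match_given_innocent * prior_innocent)
    return p_match_given_innocent * prior_innocent / p_match

one_in_a_billion = Fraction(1, 10**9)

# Hypothetical priors: the accused is one of `population` equally likely suspects.
for population in (10**3, 10**6, 10**9):
    prior = Fraction(1, population)
    print(population, float(p_innocent_given_match(one_in_a_billion, prior)))
```

With the same "1 in a billion" match probability, the chance of innocence ranges from tiny to roughly one half as the pool of possible suspects grows, so P(DNA match | innocent) alone does not settle the question.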
Prosecutor’s fallacy
• Easier to see the error when you frame it as:
• “Given you’re a student, you don’t earn money”
• Vs
• “Given you don’t earn money, you’re a student”
Understanding the nature of probability
• Let’s philosophise!
• Probability is tricky.
Example 1
square:
Introduction to Random Variables
Random Variable (RV)
$X$ is a random variable for the sample space $\Omega$ if it assigns a real
number to each element of $\Omega$, i.e. $X: \Omega \to \mathbb{R}$
Example 2
Rolling a die until a comes up where denotes an outcome of or
number of rolls required
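The "number of rolls required" random variable can be sketched as follows. The target face is missing from this text, so the sketch assumes we roll a fair die until a six comes up; the function name and the choice of six are assumptions.

```python
from fractions import Fraction

def p_first_success_on_roll(k, p_success=Fraction(1, 6)):
    """P(X = k): k - 1 failed rolls followed by one successful roll."""
    return (1 - p_success) ** (k - 1) * p_success

# The random variable X maps each outcome (a sequence of rolls) to its length.
pmf = {k: p_first_success_on_roll(k) for k in range(1, 4)}
for k, p in pmf.items():
    print(k, p)

# The probabilities over all possible k approach 1 as more terms are added.
partial_sum = sum(p_first_success_on_roll(k) for k in range(1, 201))
print(float(partial_sum))
```

Here the set of possible values of X is countably infinite (1, 2, 3, ...), which still counts as a discrete random variable.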
Introduction to Random Variables
Discrete vs Continuous Random Variables
$X$ is discrete by default if $\Omega$ is finite or countable.
Example
(notice: is uncountable)
is continuous
is discrete
Discrete Random Variables
Probability distribution
The probability distribution of a discrete random variable (RV) $X$ is
defined as $p(x) = P(X = x)$
• One way to do this is to count how many days had more than 7 events
in the past 3 years and divide by the total number of days, leaning on the
long-run frequency as a measure of probability.
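The long-run frequency estimate can be sketched as below. No real event data accompanies this text, so the daily counts are simulated placeholders (uniform integers from 0 to 12 over three 365-day years); only the counting and dividing step reflects the method described.

```python
import random

random.seed(0)  # fixed seed so the simulated data is reproducible

n_days = 3 * 365
# Hypothetical data: one event count per day, NOT real observations.
daily_events = [random.randint(0, 12) for _ in range(n_days)]

# Count the days with more than 7 events and divide by the number of days.
days_over_7 = sum(1 for count in daily_events if count > 7)
p_more_than_7 = days_over_7 / n_days

print(days_over_7, round(p_more_than_7, 3))
```

With real data, `daily_events` would be replaced by the observed daily counts; the estimate is only as good as the assumption that the past three years are representative.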