Lecture 8: Probability Theory

This lecture covers the fundamentals of probability theory, including concepts such as sampling distribution, confidence limits, and bootstrapping. It discusses the origins of probability through gambling scenarios, introduces key terminology, and explains various rules and interpretations of probability, including conditional probabilities and Bayes' rule. The lecture emphasizes the complexities and nuances of understanding probability in real-world contexts.


KEN1435: Principles of

Data Science
Lecture 08:
Probability Theory
Anirudh Wodeyar & Tim Dick
Recap
• Sampling distribution: Distribution of a statistic (e.g. a
mean)
• Confidence Limits
• Bootstrapping
The origins of probability theory: Gambling
• Let’s bet.

• What odds would you give me for:


- Getting a six when you throw a fair die at most 4 times?
- Getting two sixes when you throw a pair of dice 24 times?

• Which is more likely? Which one would you put your money on?
Chevalier de Mere’s guess
• In Game 1, he thought that since each throw gives a 1 in 6 chance of a six, the probability over 4 throws was 4 × 1/6 = 2/3.
• And for Game 2, he thought that since the chance of two sixes at once is 1 in 36, the probability over 24 throws was 24 × 1/36 = 2/3, so that both games looked the same to him.
• Now what would you put your money on?
The Actual Probability
• P(at least one six in 4 throws) = 1 − (5/6)⁴ ≈ 0.518
• P(at least one double six in 24 throws of a pair) = 1 − (35/36)²⁴ ≈ 0.491
• So Game 1 is the better bet, even though de Méré’s reasoning made them look equal.
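The two tail probabilities can be checked with a few lines of Python, a quick sketch using the complement rule ("at least one" = 1 − "none"):

```python
# Chevalier de Méré's two games, computed via the complement rule.
# P(at least one six in 4 rolls) = 1 - P(no six in 4 rolls)
p_game1 = 1 - (5/6) ** 4        # ≈ 0.5177
# P(at least one double six in 24 rolls of a pair) = 1 - (35/36)^24
p_game2 = 1 - (35/36) ** 24     # ≈ 0.4914
print(round(p_game1, 4), round(p_game2, 4))
```

Game 1 is favourable to the bettor; Game 2 is not, which is what de Méré observed at the gambling table.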
Probability is Tricky and Unintuitive
• How do you understand probability? Can you measure it
out in the world like it is a real thing?

• We can measure distances, temperatures, voltages; where are the probabilities?
Thinking of probability as expected frequency
• What is the
probability of 2
heads when flipping
2 coins?
Thinking of probability as expected frequency
• Can turn expected frequencies into a probability tree: How?

Sample Space: HH, HT, TH, TT
Some useful jargon
Experiment
An action whose outcome is determined by chance
Example: Flipping a coin

Sample Space
The set of possible outcomes of an experiment (denoted by Ω)
Example: the number of spots on each side of the die

Event
A subset of the sample space (denoted by E ⊆ Ω)
Example: the outcome of the experiment has two heads.
Thinking of probability as expected frequency
• Can turn expected
frequencies into a
probability tree

• Now, the probability along any branch is the product of the probabilities along it
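The tree's counting argument can be verified by brute-force enumeration of the sample space; a minimal Python sketch:

```python
from itertools import product

# Enumerate the sample space for two fair coin flips.
sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT

# Each of the 4 outcomes is equally likely, so P(E) = |E| / |Ω|.
p_two_heads = sum(1 for o in sample_space if o == ("H", "H")) / len(sample_space)
p_one_head = sum(1 for o in sample_space if o.count("H") == 1) / len(sample_space)
print(p_two_heads, p_one_head)   # 0.25 0.5
```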
Thinking of probability as expected frequency
• What if we first
flipped a coin, then
rolled a fair die?
Thinking of probability as expected frequency
Experiment: Draw a card out of a full deck of cards

Sample space Ω: the cards in a full deck of cards (52 outcomes)

Event E: the card drawn from the deck is a queen.
What rules of probability do we see from the tree?
1. The probability of an event E (e.g. one head = {HT, TH}) is a number
between 0 and 1. 0 is only for impossible events (e.g. 0 heads, 0 tails)
and 1 for certain events (at least one head or tail).
We represent this as 0 ≤ P(E) ≤ 1.
Events and their Complements
2. Given the sample space of events (flipping 2 coins), the probability of
an event not happening is 1 - the probability of it happening. Probability
of not two tails = 1 – ¼ = ¾ .
Events and their Complements
Complement of an Event E (denoted by Eᶜ, E′, or Ē)
Example: Rolling a single die

Figure: highlighted
Overlapping Event Probabilities
Intersection of Events A and B (denoted by A ∩ B)
Example: Rolling a single die

Figure: highlighted
Mutually exclusive/Disjoint Event Probabilities can be added
(OR rule)
3. Probabilities of events that are mutually exclusive (no overlapping outcomes) can be added together to get the total probability.
If A ∩ B = ∅, then: P(A ∪ B) = P(A) + P(B)
Independent event probabilities can be multiplied
4. Multiply probabilities to get the total probability of a sequence of independent events (that are happening together): P(A ∩ B) = P(A) ⋅ P(B)
Events are sets, as a consequence we also have
Union of Events A and B (denoted by A ∪ B)
Example: Rolling a single die

Figure: highlighted
Mutual Exclusion vs Independence
• What is the difference?
The origins of probability theory: Gambling
• Can you now calculate what the optimal odds should be
for the following? Assuming we played it 1000 times.

Getting a six when you throw a fair die at most 4 times


vs
Getting two sixes when you throw a pair of dice 24
times?
Coin flipping is always biased!
Probability Theory Basics
Probability measure
A function P that maps each possible event to a number in [0, 1].
Properties of probability measures: P(Ω) = 1, and P(A ∪ B) = P(A) + P(B) for disjoint events A and B.

Probability of an event
The probability of an event E is the sum of the probabilities of the elements in E (denoted by P(E))
Probability measures (example)
A fair die

An unfair die
Probability Theory Basics
Rules of calculating probabilities
• P(E) = |E| / |Ω|, if each element in Ω has equal probability, where |⋅| denotes the cardinality (how many?) of the set.
• Given events A and B: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• For disjoint events A and B: P(A ∪ B) = P(A) + P(B)
Probability Theory Basics
Rules of calculating probabilities
• P(E) = |E| / |Ω|, if each element in Ω has equal probability, where |⋅| denotes the cardinality of the set.
• Given events A and B: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• For disjoint events A and B: P(A ∪ B) = P(A) + P(B)
• For any event E: P(Eᶜ) = 1 − P(E)

Proof: E and Eᶜ are disjoint and E ∪ Eᶜ = Ω, so P(E) + P(Eᶜ) = P(Ω) = 1.
Calculating Probabilities (example)
Example: Rolling a fair die
, events:
Everything is Conditional!
• Probability of heads being 0.5 is conditional on a truly
fair coin toss.
• Probability of each die face being 1/6 is conditional on a fair die.

• What is a conditional probability?


• For two events A and B, the conditional probability is P(A ∣ B) = P(A ∩ B) / P(B)
Everything is Conditional!
• For two events A and B, the conditional probability when A and B are independent is: P(A ∣ B) = P(A)
Everything is Conditional! Or Independent.
Independent events
Events A and B are called independent if
a) P(A ∣ B) = P(A), or
b) P(A ∩ B) = P(A) ⋅ P(B)
These are equivalent if P(A) > 0 and P(B) > 0, since
P(A ∣ B) = P(A ∩ B) / P(B)
Concluding:
P(A ∣ B) = P(A) ⟺ P(A ∩ B) = P(A) ⋅ P(B) ⟺ A and B are independent
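A quick sketch contrasting the two notions: the two flips of a coin are independent (the product rule holds), but they are clearly not mutually exclusive:

```python
from itertools import product

# Sample space for two fair coin flips, each outcome equally likely.
omega = list(product("HT", repeat=2))

def P(pred):
    return sum(pred(o) for o in omega) / len(omega)

A = lambda o: o[0] == "H"   # first flip is heads
B = lambda o: o[1] == "H"   # second flip is heads

p_joint = P(lambda o: A(o) and B(o))
assert p_joint == P(A) * P(B)   # independent: product rule holds
assert p_joint != 0             # but NOT mutually exclusive: they can co-occur
print(p_joint)                  # 0.25
```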
Conditional Probabilities
• When screening for breast cancer, mammography is roughly 90%
accurate, in the sense that 90% of women with cancer, and 90% of
women without cancer, will be correctly classified.

• Suppose 1% of women being screened actually have cancer: what is the


probability that a randomly chosen woman will have a positive
mammogram, and if she does, what is the chance she really has cancer?
Conditional Probabilities
• Suppose 1% of women being screened actually have cancer: what is the
probability that a randomly chosen woman will have a positive
mammogram, and if she does, what is the chance she really has cancer?
• …

• Another way to think about this is in terms of expected frequencies. For 1000 women, how many have breast cancer? What percentage of them will test positive?
Conditional Probabilities
Conditional Probabilities from Contingency Table
                    Cancer (BC+)   No Cancer (BC−)   Total
Positive Test (M+)        9               99           108
Negative Test (M−)        1              891           892
Total                    10              990          1000

• P(BC+ ∣ M+) = 9 / 108 ≈ 8.3%
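The table's numbers follow directly from the expected-frequency argument; a short Python sketch of the same arithmetic:

```python
# Expected frequencies for 1000 screened women:
# 90% test accuracy (both ways), 1% prevalence.
n = 1000
cancer = 0.01 * n                  # 10 women with cancer
true_pos = 0.9 * cancer            # 9 correctly flagged
false_pos = 0.1 * (n - cancer)     # 99 healthy women incorrectly flagged

# Of the 108 positive tests, only 9 are real cancers.
p_cancer_given_pos = true_pos / (true_pos + false_pos)
print(round(p_cancer_given_pos, 3))   # 0.083
```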
Conditional Probabilities
• Despite “90% accuracy”, this is a pretty bad overall result.
• A positive mammography here only gives an 8% chance of having breast cancer. What use is that?
• At what percentage would this test be useful?
• If accuracy is 99%, then for 1000 women we expect 9.9 true positives and 9.9 false positives, so P(BC+ ∣ M+) = 50%.
Exercise: Build both the tree and contingency table.
• The Triple Blood Test screens a pregnant woman and provides an estimated risk of her baby being born with the genetic disorder Down syndrome.
• A study of 5282 women aged 35 or over analyzed the Triple
Blood Test to test its accuracy. It was reported that of the 5282
women, “48 of the 54 cases of Down syndrome would have
been identified using the test and 25 percent of the unaffected
pregnancies would have been identified as being at high risk
(positive) for Down syndrome.”
• What is the probability of Down syndrome if you had a positive test?
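As a hint for the exercise, the arithmetic can be sketched in a few lines; this assumes, per the quoted study, that 25% of the 5228 unaffected pregnancies are flagged as high risk:

```python
# Triple Blood Test contingency arithmetic (study of 5282 women).
n, down = 5282, 54
true_pos = 48                      # Down cases correctly flagged positive
false_pos = 0.25 * (n - down)      # 25% of the 5228 unaffected pregnancies

# P(Down | positive test): true positives over all positives.
p_down_given_pos = true_pos / (true_pos + false_pos)
print(round(p_down_given_pos, 3))
```

As with the mammography example, the low base rate means a positive test still leaves the probability of Down syndrome quite small.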
Bayes’ rule
Statistical Interpretation of Bayes’ rule

P(hypothesis ∣ observation) = P(observation ∣ hypothesis) ⋅ P(hypothesis) / P(observation)

where:
• P(hypothesis) is the prior probability of the hypothesis
• P(observation ∣ hypothesis) is how expected the observation is given the hypothesis
• P(observation) is how expected the observation is overall, with or without the hypothesis
• P(hypothesis ∣ observation) is the probability of the hypothesis given the observation
Bayes’ rule
Example
Let and denote the events that a randomly chosen person has cancer and is
diagnosed as having cancer, respectively.
Data: , , , , , and
Question: Calculate
Solution:
Bayes’ rule
Example
Let and denote the events that a randomly chosen person has cancer and is
diagnosed as having cancer, respectively.
Data: , , , , , and
Derived information: and
Question (2): What is the probability that someone diagnosed with cancer
actually has the disease?
Solution:
Bayes’ rule
Example
Let and denote the events that a randomly chosen person has cancer and is
diagnosed as having cancer, respectively.
Data: , , , , , and
Derived information: and
Question (3): What is the probability that someone who is diagnosed as not
having cancer actually has the disease?
Solution:
Conditional Probability
• Used when “adjusting for” confounds
• We condition on, for example, summer:
• P(Ice Cream Sales | Summer, Shark Attacks) = 0.
• Conditioning simply means that we have altered the subset of events over which we measure probability.
Prosecutor’s fallacy
• Assuming that the probability P(A|B) is the same as P(B|A)
• In forensic science, the statement:
• “If the accused is innocent, there is only a 1 in a billion chance that the DNA at the crime scene matches them”
• is understood to mean:
• “Given the DNA evidence, there is only a 1 in a billion chance the accused is innocent”
Prosecutor’s fallacy
• “If the accused is innocent, there is only a 1 in a billion
chance that the DNA at the crime scene matches them”
• P(DNA match | innocent) = 1/1000000000
• “Given the DNA evidence, there is only a 1 in a billion chance the accused is innocent”
• P(innocent | DNA match) = ?
• = P(DNA match | innocent) * P(innocent)/P(DNA match)
• = We don’t know!
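To see why the base rate is the missing ingredient, here is a sketch with a made-up assumption of 10 million innocent people who could plausibly be tested, and a guilty person assumed to match with certainty:

```python
# Why P(innocent | match) differs from P(match | innocent).
p_match_given_innocent = 1e-9      # "1 in a billion" from the testimony
population = 10_000_000            # hypothetical pool of innocent suspects

# Expected number of innocent people who match by chance:
innocent_matches = population * p_match_given_innocent   # 0.01

# Alongside the one guilty person who matches for certain,
# the posterior probability of innocence given a match is:
p_innocent_given_match = innocent_matches / (1 + innocent_matches)
print(round(p_innocent_given_match, 4))   # 0.0099
```

Here the posterior is small, but it is about 1 in 100, not 1 in a billion, and it changes entirely with the assumed suspect pool.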
Prosecutor’s fallacy
• Easier to see the error when you frame it as:
• “Given you’re a student, you don’t earn money”
• Vs
• “Given you don’t earn money, you’re a student”
Understanding the nature of probability
• Let’s philosophise!

• Probability is tricky.

• How can we understand what it means?


Understanding the nature of probability
• Classical probability: Dice, coins, packs of cards – derived based
on symmetries. We assume fairness, and don’t think about the
conditionality on fairness.
• Enumerative or combinatorial probability: We calculate the
expected frequencies of events when we have categories. Like
picking a green ball out of a bag of 4 green and 5 blue balls. This
too assumes something about fairness and so again, under this
approach we have to pretend everything is fair.
- Consider if the bag is shaped weirdly, or if you naturally put your hand in
a certain way. Or as happened with the Vietnam war draft, the balls are
badly randomized.
Understanding the nature of probability
• Long-run frequency probability: We don’t make a
definition of equally likely. Instead, we do 387500 coin
tosses and assess the long run probability. How do we
understand the probability of rain in tomorrow’s
weather?
• Propensity or “chance”: The idea of probability being out in the world. As though it is truly measurable – if you were all-knowing, then you would know it.
Understanding the nature of probability
• Subjective probability: The idea that every probability is
essentially saying something about a person’s judgement
about that specific occasion. This can definitely be based
on background knowledge, but in essence says that the
probability isn’t out in the world, instead it appears out
of a desire to try to make sense of the world.
• For example, a weatherman has a better sense for the
possibility of rain tomorrow than you or I.
Understanding the nature of probability
• Subjective probability: Calling it subjective doesn’t mean
it isn’t useful or replicable across individuals. We can
both come to the same value for the probability of dice
because we both have the same knowledge about it.
• So when working with probability, even if we think it is
subjective, we can pretend it represents something
objective in the world.
• Indeed, “as if” thinking is how we do science!
Connecting probability, data, and statistical learning
Probability is natural in situation 1:
1. Data is generated by a randomizing device like flipping dice or coins, or a randomized control trial.

Generally, we also face the following situation:

2. The pre-existing data point is chosen by a randomizing device, like when people take part in a survey.
Connecting probability, data, and statistical learning
Or we have situation 3, which is how a lot of data is
generally generated:
3. There is no randomness at all but we act as if the data is
in fact generated by a random process. Like how salty the
spoon of soup was and pretending it came from a bell
curve of saltiness.

Note: Under the subjective interpretation of probability, all three situations become situation 3.
Connecting probability, data, and statistical learning
Consider situations 1 and 2: with dice, or whenever a randomizing device governs selection, we have a random variable.
Definition
Random variable: A source of data that is believed to be produced from a
probability distribution. For example, outcomes of a coin can be assumed
to come from a probability distribution with weight 0.5 at heads and 0.5
at tails. Every time we flip the coin, the probabilities collapse into an outcome.
Introduction to Random Variables
Random Variable (RV)
X is a random variable for the sample space Ω if it assigns a real number X(s) to each element s of Ω,

Example 1

square:
Introduction to Random Variables
Random Variable (RV)
X is a random variable for the sample space Ω if it assigns a real number X(s) to each element s of Ω,

Example 2
Rolling a die until a 6 comes up, where X denotes the number of rolls required
Introduction to Random Variables
Discrete vs Continuous Random Variables
X is discrete by default if Ω is finite or countable.
Example
(notice: is uncountable)
is continuous
is discrete
Discrete Random Variables
Probability distribution
The probability distribution of a discrete random variable (RV) X is defined as p(x) = P(X = x)

Example: Rolling a die until a 6 comes up

X = number of rolls required, i.e. P(X = k) = (5/6)^(k−1) ⋅ (1/6), k = 1, 2, …
Discrete Random Variables
Probability distribution
The probability distribution of a discrete random variable (RV) X is defined as p(x) = P(X = x)

Example: Rolling a die

p(s) = P(X = s) = 1/6, s = 1, …, 6, where s is the outcome of the roll
Discrete Random Variables
Cumulative distribution (denoted by F)
F(x) = P(X ≤ x)

Example: Rolling a die

F(s) = P(X ≤ s) = s/6, s = 1, …, 6, where s is the outcome of the roll
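For the "rolls until the first six" example, the pmf and cdf can be written out and checked numerically; a sketch of the standard geometric distribution:

```python
# Number of rolls until the first six: geometric distribution.
def pmf(k):
    # P(X = k): k-1 failures (prob 5/6 each), then a six (prob 1/6).
    return (5/6) ** (k - 1) * (1/6)

def cdf(k):
    # F(k) = P(X <= k) = 1 - P(no six in the first k rolls).
    return 1 - (5/6) ** k

# The pmf sums to 1 (up to numerical truncation), and the cdf
# matches the running sum of the pmf.
assert abs(sum(pmf(k) for k in range(1, 200)) - 1) < 1e-12
assert abs(cdf(3) - (pmf(1) + pmf(2) + pmf(3))) < 1e-12
print(round(pmf(1), 4), round(cdf(3), 4))
```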
Connecting probability, data, and statistical learning
Consider situation 3: There is no randomness at all but we act as if the data is in fact generated by a random process.

For example, consider the case where we have data about homicide events. This is not coming from a randomizing device, or any sort of random process we can immediately identify.

Can we connect this to probability theory to help answer the following question: How likely is it that tomorrow there are 7 homicide events?
Modeling Homicide Events
• What is the data? We have the number of homicide events on every day for the past year.

• We could make a prediction for tomorrow based on the mean: note that this is already connecting to probability theory. We are assuming that the highest probability for the number of homicide events tomorrow is the mean for the past year.

• Mean = 1.41/day – this seems a reasonable guess for tomorrow.
Modeling Homicide events
• Suppose we wanted to answer the question: how likely is it that there are
over 7 homicide events tomorrow?

• One way to do this is to calculate how many days had more than 7 events
in the past 3 years and divide by the number of days. And lean on the
long-run frequency as a measure of probability.

• Alternatively, we can use a discrete probability distribution (as opposed to


continuous): the Poisson distribution.
• Why use a distribution? They tend to allow us to generalize our
understanding of the data and to connect it to other similar situations.
And they are less susceptible to being influenced by outliers.
Modeling Homicide
A Poisson distribution is fully
characterized by the mean: all we
need is the mean to draw
samples and make a distribution.

We should compare the real data


to the expected distribution to
make sure they fit (more on this
in Simulation and Statistical
Analysis).
Modeling Homicide
Ok, the distribution seems
reasonable.

Then we can calculate the


probability based on the
distribution.

This means summing over the


probability that the number of
homicides is > 7.
Modeling Homicide Events
P(Homicide events > 7| Poisson)
= 0.07 %

Can we check this with the past


data?

Yes, we can calculate the


expected frequency directly.
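A quick check with the stated mean of 1.41: the strict tail P(X > 7) comes out around 0.01%, while P(X ≥ 7) is roughly 0.06%, in the ballpark of the slide's 0.07% (the exact figure depends on the unrounded mean):

```python
import math

# Poisson tail probabilities for daily homicide counts, mean 1.41.
lam = 1.41

def poisson_pmf(k):
    # P(X = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

p_gt_7 = 1 - sum(poisson_pmf(k) for k in range(8))   # P(X > 7)
p_ge_7 = 1 - sum(poisson_pmf(k) for k in range(7))   # P(X >= 7)
print(f"P(X > 7)  = {p_gt_7:.4%}")
print(f"P(X >= 7) = {p_ge_7:.4%}")
```

Either way, the model agrees with the data that more than seven homicides in a day is an extremely rare event.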
Modeling Homicide events
• Despite having all the data, we could pretend the data came
from a random distribution.
• The function for the distribution seems to capture some
underlying structure to the data.
• Like the central limit theorem, there simply are distributions that do a good job of summarizing data whose generating process is otherwise very difficult to understand.
Modeling Homicide events
• For example, another way to predict how many homicides
happen tomorrow is to implement some way to read people’s
minds.
• Now we know how many people are feeling murderous and
based on that make a prediction for how many homicide events
happen tomorrow.
• This is not something we can do.
• The Poisson captures the doubt that comes from not having all
the information about every individual and gives us a way to
quantify that.
Summary
• Probability is a useful way to quantify uncertainty.
• Probability is most intuitively understood from the place of
expected frequency.
• We can think about probability in several ways.
• Having access to probability theory allows us to do statistics in a
more formal way, that is, not simply relying on long-run
frequencies.
