


Course Map (diagram): Foundation (Probability, Statistical Inference, Bayesian) → Simple and Multiple Linear Regression (continuous outcome) → Logistic Regression (binary outcome) → Poisson Regression (count outcome) → Correlated Outcome, alongside topics such as Inference, Dummy Variables, Interactions, Confounding, Model Selection, Goodness of Fit, Two-way Tables, and the Global Test.
26 Introduction to Bayesian Inference
Qihuang Zhang

EPIB 621: Data Analysis in Health Sciences


4
Basics of Bayesian inference
Conjugate priors?

▪ Result in a posterior from the same distribution family as the prior;

▪ Result in a posterior that is analytically tractable;
▪ i.e., $\int P(\theta)\,P(y \mid \theta)\,d\theta$ can be obtained analytically.

→ We can choose the prior according to the likelihood (the data-generating model);
the choice depends on the type of outcome we have.

5
Conjugate priors – example 1
Normal-Normal

For continuous outcomes where we can reasonably assume

$Y_i \sim N(\theta, \sigma^2), \quad i = 1, \ldots, n,$

a normal prior

$\theta \sim N(\mu, \tau^2)$

(with known hyper-parameters $\mu$ and $\tau$) results in a normal posterior

$\theta \mid y \sim N\!\left( \dfrac{\frac{\sigma^2}{n}\,\mu + \tau^2\,\bar{y}}{\frac{\sigma^2}{n} + \tau^2},\ \left(\dfrac{n}{\sigma^2} + \dfrac{1}{\tau^2}\right)^{-1} \right)$
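A minimal R sketch of this update (the sample summaries n, ybar, sigma and the hyper-parameters mu, tau below are made-up illustrative values, not numbers from the lecture):

n     <- 30      # sample size (illustrative)
ybar  <- 5.2     # sample mean (illustrative)
sigma <- 2       # known outcome SD (illustrative)
mu    <- 4       # prior mean (illustrative)
tau   <- 1       # prior SD (illustrative)

post_mean <- ((sigma^2 / n) * mu + tau^2 * ybar) / (sigma^2 / n + tau^2)
post_var  <- 1 / (n / sigma^2 + 1 / tau^2)
c(post_mean, sqrt(post_var))   # a larger tau pushes the posterior mean toward ybar, as in the poll below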

6
What will happen if we increase τ?

We have higher confidence in the prior

We have lower confidence in the prior

We expect the posterior to give higher weight to the sample data

We expect the posterior to give higher weight to the prior



Point estimates
The posterior is the distribution of the parameter. To facilitate comparison with
frequentist results, we would like a single value as our estimate:
▪ Posterior mean:
$\hat{\theta} = E(\theta \mid y)$
▪ Posterior median:
$\hat{\theta}_{med}$ is the point such that $P(\theta < \hat{\theta}_{med} \mid y) = 0.5$
▪ Posterior mode (aka maximum a posteriori, MAP):
$\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta \mid y)$
→ Often chosen when the distribution is skewed

8
For Normal prior + Normal Outcome, what is the
preferred form of the reported point estimates?

Posterior mean
Posterior median
Posterior mode

It doesn't matter



Conjugate priors – example 2
Beta-Binomial
▪ For binary outcomes where

$Y_i \sim \text{Bernoulli}(\pi), \quad i = 1, \ldots, n,$

a Beta prior $\pi \sim \text{beta}(a, b)$

results in a Beta posterior

$\pi \mid y \sim \text{beta}\!\left(a + \sum_{i=1}^{n} y_i,\ b + n - \sum_{i=1}^{n} y_i\right)$

10
Conjugate priors – example 2
About the beta distribution:

$\pi \sim \text{beta}(a, b), \quad 0 \le \pi \le 1$

Intuition: "Out of $a + b - 2$ experiments, we expected $a - 1$ successes and $b - 1$ failures; what is the distribution of the success proportion $\pi$?"

→ $E(\pi) = \dfrac{a}{a+b}$

→ The mode of $\pi$:
$\text{Mode}(\pi) = \dfrac{a-1}{a+b-2}$
(for $a > 1$ and $b > 1$)

11
Conjugate priors – example 2
About the beta distribution: what would happen if the prior is Beta(1,1)?

The posterior is given by:

$\pi \mid y \sim \text{beta}\!\left(1 + \sum_{i=1}^{n} y_i,\ 1 + n - \sum_{i=1}^{n} y_i\right)$

The posterior mode is given by:

$\text{mode}(\pi) = \dfrac{1 + \sum_{i=1}^{n} y_i - 1}{\left(1 + \sum_{i=1}^{n} y_i\right) + \left(1 + n - \sum_{i=1}^{n} y_i\right) - 2} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

→ Beta(1,1) expresses the lowest prior confidence; the posterior mode equals the sample proportion.

12
Beta-Binomial example
▪ Consider a mysterious infectious disease that I know nothing about; I am
interested in estimating the risk of infection among McGill students/faculty/staff.
→ I draw a random sample of 50 individuals at McGill and test them; among
these, 16 are infected.

▪ How do I estimate the risk in the Bayesian framework?


We have no information now → Consider a beta(1,1) prior:

$\pi \sim \text{beta}(1, 1), \quad Y_i \sim \text{Bin}(1, \pi)$

$\pi \mid Y \sim \text{beta}(1 + 16,\ 1 + 50 - 16)$

Mean: $E(\pi \mid Y) = \dfrac{16 + 1}{2 + 50}$

Mode: $\text{Mode}(\pi \mid Y) = \dfrac{16}{50}$
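A quick R check of these numbers (base R only), using the posterior beta(1 + 16, 1 + 50 − 16) = beta(17, 35):

a <- 1 + 16              # posterior shape1
b <- 1 + 50 - 16         # posterior shape2
a / (a + b)              # posterior mean = 17/52 ≈ 0.327
(a - 1) / (a + b - 2)    # posterior mode = 16/50 = 0.32
qbeta(0.5, a, b)         # posterior median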

13
Beta-Binomial example (Cont’d)
Prior knowledge:
▪ Later I learn that my colleague has performed a similar study with a sample size
of 35 with 8 infected individuals. How can I incorporate this information in
my analysis?
Using the interpretation "out of $a + b - 2$ experiments, $a - 1$ successes and $b - 1$ failures": $a - 1 = 8$ and $b - 1 = 35 - 8 = 27$, so $a = 9$ and $b = 28$.

$\pi \sim \text{beta}(9, 28)$

$\pi \mid Y \sim \text{beta}(9 + 16,\ 28 + 34)$
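In R, incorporating the colleague's study only changes the beta shape parameters; a short sketch:

a0 <- 9;  b0 <- 28        # prior built from the colleague's study (8 infected out of 35)
a1 <- a0 + 16             # posterior shape1 = 25
b1 <- b0 + 50 - 16        # posterior shape2 = 62
a0 / (a0 + b0)            # prior mean     ≈ 0.243
a1 / (a1 + b1)            # posterior mean ≈ 0.287, between the prior mean and 16/50 = 0.32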

14
Prior*
• Noninformative priors:
a prior that can be guaranteed to play a minimal role
Rationale: "Let the data speak for themselves."
aka reference priors, vague, flat, or diffuse priors
e.g., beta(1,1), N(0, ∞)

• Weakly informative priors:

contain enough information to 'regularize' the posterior
e.g., beta(3, 8), N(0, 3)

15
Non-conjugate framework*
▪ Conjugate priors aren't always available;
▪ Most realistic models fall outside the conjugate family of models;
▪ These are cases where, recalling Bayes' theorem,

$P(\theta \mid y) = \dfrac{P(y \mid \theta)\,P(\theta)}{P(y)}$

the normalizing constant (a useless but annoying term)

$p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta$

→ cannot be obtained analytically;

▪ Explicit calculation of the posterior distribution isn't possible.
An overview of Bayesian computation*
1. Numerical integration
⏵ approximate integrals by computing values of a continuous function at a finite number of points
⏵ stochastic (e.g., Monte Carlo)
2. Distributional approximations
⏵ approximate the posterior with a simpler parametric distribution
⏵ e.g., variational Bayes methods
3. Sampling techniques
⏵ approximate the posterior by a set of samples drawn from it
⏵ e.g., Markov chain Monte Carlo (see the sketch below)
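As a toy illustration of the sampling idea, here is a minimal random-walk Metropolis sketch in R for the Beta-Binomial example used in this lecture (16 infected out of 50, flat beta(1,1) prior); the proposal standard deviation and number of iterations are arbitrary illustrative choices:

set.seed(1)
log_post <- function(p) {                  # log posterior up to a constant (flat prior)
  if (p <= 0 || p >= 1) return(-Inf)
  16 * log(p) + 34 * log(1 - p)
}
n_iter <- 10000
draws  <- numeric(n_iter)
p_cur  <- 0.5
for (t in 1:n_iter) {
  p_prop <- p_cur + rnorm(1, sd = 0.05)    # propose a nearby value
  if (log(runif(1)) < log_post(p_prop) - log_post(p_cur)) p_cur <- p_prop
  draws[t] <- p_cur
}
mean(draws)                                # close to the exact posterior mean 17/52
quantile(draws, c(0.025, 0.975))           # approximate 95% credible interval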

17
Posterior Estimates vs Posterior Probabilities
• The posterior estimates (posterior mean/median/mode) are used to estimate the
parameters in the Bayesian framework.

• For decision-making, the posterior probability might be useful.


→ The probability that the parameter value is within a certain range, or is less
than or greater than a given value, given the observed data:
𝑃(𝑎 < 𝜃 < 𝑏 | 𝑦)
𝑃(𝜃 > 𝑎 | 𝑦)
𝑃(𝜃 < 𝑏 | 𝑦)

18
Posterior probabilities: example
▪ In the mysterious disease example, what is the posterior probability that
the risk is below 20%, from my study?

$P(\pi < 0.2 \mid y)$:

pbeta(0.2, shape1 = a, shape2 = b)

▪ What is the posterior probability that the risk is between 20% and 30%,
from both studies?

$P(0.2 < \pi < 0.3 \mid y)$:

pbeta(0.3, a, b) - pbeta(0.2, a, b)
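With the posteriors obtained earlier, beta(17, 35) from my study alone and beta(25, 62) after adding the colleague's study, these calls become:

pbeta(0.2, shape1 = 17, shape2 = 35)      # P(pi < 0.2 | y), my study only
pbeta(0.3, 25, 62) - pbeta(0.2, 25, 62)   # P(0.2 < pi < 0.3 | y), both studies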

19
Uncertainty estimates: Bayesian credible intervals
▪ ... are defined based on the posterior probabilities: they are simply the range
of most credible parameter values under the posterior (i.e., given the data)

▪ A 100(1 − α)% credible interval is the interval $(x_1, x_2)$ such that

$P(x_1 < \theta < x_2 \mid y) = 1 - \alpha$

$x_1$ and $x_2$ are therefore the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior
distribution.

What is the credible interval for $\pi$?


→ Check out R lab
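For a beta posterior, the equal-tailed 95% credible interval is given directly by the posterior quantiles; for example, with the beta(17, 35) posterior from my study (the R lab goes through this in detail):

a <- 17; b <- 35
qbeta(c(0.025, 0.975), a, b)   # the 2.5% and 97.5% posterior quantiles of pi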

20
Bayesian hypothesis testing
▪ Much more intuitive than classical hypothesis testing;
▪ Relies on the probabilities of the statistical hypotheses, 𝐻0 and 𝐻𝐴 given the
observed data:
𝑃(𝐻0 |𝑦), 𝑃(𝐻𝐴 |𝑦)
→ The first one is exactly what we are told to not confuse p-values with! 😂

▪ Posterior probability of the alternative hypothesis is sometimes used to define


“statistical significance”:
𝑃(𝐻𝐴 |𝑦) > 0.95

21
Bayesian hypothesis testing

▪ Inference based on the posterior probability of the alternative is OK, but we might be
concerned about the influence of the priors if they are too strong or imprecise.

▪ A more common "test statistic" is the Bayes factor:

$BF = \dfrac{P(H_A \mid y)}{P(H_A)} \bigg/ \dfrac{P(H_0 \mid y)}{P(H_0)}$

22
Bayesian hypothesis testing

▪ 𝐵𝐹 = 1 No evidence in favour of alternative


▪ 𝐵𝐹 > 10 Strong evidence in favour of alternative
▪ 𝐵𝐹 > 100 Decisive evidence in favour of alternative
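As a hedged sketch of how a Bayes factor could be computed for the disease example, take H_A: π > 0.2 versus H_0: π ≤ 0.2 (hypotheses chosen here only for illustration), with the beta(1,1) prior and beta(17, 35) posterior:

prior_A <- 1 - pbeta(0.2, 1, 1)       # P(H_A) under the beta(1,1) prior
prior_0 <- pbeta(0.2, 1, 1)           # P(H_0)
post_A  <- 1 - pbeta(0.2, 17, 35)     # P(H_A | y)
post_0  <- pbeta(0.2, 17, 35)         # P(H_0 | y)
(post_A / prior_A) / (post_0 / prior_0)   # BF; > 10 would be strong evidence for H_A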

23
Some Life Lessons from Statistics

24
Panic about something?

Expectation = cost × probability of happening.

▪ We often exaggerate either the cost or the probability of it happening, but forget

that the other component can be extremely small.

▪ Do you worry about car accidents on a daily basis? If the expected cost of what
you are panicking about is lower than that, then why worry about it every day?

25
About Confidence Interval

[ est. - margin of error ,


est. + margin of error ]

▪ Allow the existence of error and allow yourself for a larger margin of error.

▪ As a reward, you will get more confidence.

26
About Poisson Distribution
In one year, you will hear about 8 pieces of good news. When will they happen?
Randomly sample 8 points over the year?

[timeline figure: 1 year]

27
About Poisson Distribution
However, think of the good news per day as following a Poisson distribution:
Number of pieces of good news on day $i \sim \text{Poisson}(8/365)$

What it really looks like would be:

[timeline figure: 1 year]

28
What it really looks like would be: [timeline figure: 1 year]

The period with no good news will take up the majority of our time.
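A quick R simulation of this picture (assuming, as on the slide, an average of 8 pieces of good news spread over 365 days):

set.seed(2025)
good_news <- rpois(365, lambda = 8 / 365)   # pieces of good news on each day
sum(good_news)                              # roughly 8 over the year
which(good_news > 0)                        # the few days with good news, sometimes clustered
mean(good_news == 0)                        # most days have none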

It’s absolutely normal that multiple bad things can happen at the
same time. You will get over it anyway.

29
What do we know about Final Exams
Mainly focus on the content after midterm (after and including lecture
12).
General Questions: Similar to the warm-up questions
Data analysis 1: Logistic regression
Data analysis 2: Poisson regression
Data analysis 3: Correlated outcome
Data analysis 4 (small): Two-way Table
Data analysis 5 (small): Bayesian Statistics

What is NOT covered:


- Conditional logistic regression (but you should know the idea of
matching)
- This Lecture (lecture 25) after slide 15
30
• Reminder:
Course evaluation

Thank you!
