Analytics of Observational data lec 10
5/8/2024
BAYES’ THEOREM
CONDITIONAL PROBABILITY
COMPARE
Why Bayes?
Because Bayes answers the questions we really
care about.
Pr(I have disease | test +) vs Pr(test + | disease)
Probability
Limiting relative frequency:
P(a) = lim_{n→∞} (#times a happens in n trials) / n
A nice definition mathematically, but not so great
in practice. (So, we often appeal to symmetry…)
What if you can’t get all the way to infinity today?
What if there is only 1 trial? E.g., P(snow tomorrow)
What if appealing to symmetry fails? E.g., I take a
penny out of my pocket and spin it. What is P(H)?
IT142IU - ANALYTICS FOR OBSERVATIONAL DATA 5/8/2024 14
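The limiting-relative-frequency idea is easy to demonstrate by simulation. This is an illustrative sketch (the event, seed, and trial counts are our own choices, not from the slides): the relative frequency of "even #" on a fair die settles toward 1/2 as n grows.

```python
import random

random.seed(0)

# Track the relative frequency of rolling an even number on a fair die.
hits = 0
for n in range(1, 100_001):
    if random.randint(1, 6) % 2 == 0:
        hits += 1
    if n in (10, 1_000, 100_000):
        print(f"n = {n:>6}:  relative frequency = {hits / n:.4f}")
```

Of course, this only shows convergence for as far as we run it; as the slide notes, we can never actually "get all the way to infinity".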
Subjective probability
Prob = odds / (1 + odds)        Odds = prob / (1 − prob)
Fair die:
Event      Prob   Odds
even #     1/2    1     [or 1:1]
X > 2      2/3    2     [or 2:1]
roll a 2   1/6    1/5   [or 1/5:1 or 1:5]
Persi Diaconis: "Probability isn't a fact about the world;
probability is a fact about an observer's knowledge."
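The prob/odds conversions can be checked mechanically. A small sketch (function names are our own) using exact fractions, so the round trip through both formulas is exact for the fair-die table:

```python
from fractions import Fraction

def prob_to_odds(p):
    """Odds = prob / (1 - prob)."""
    return p / (1 - p)

def odds_to_prob(o):
    """Prob = odds / (1 + odds)."""
    return o / (1 + o)

# The fair-die events from the table above.
for event, p in [("even #", Fraction(1, 2)),
                 ("X > 2", Fraction(2, 3)),
                 ("roll a 2", Fraction(1, 6))]:
    o = prob_to_odds(p)
    assert odds_to_prob(o) == p          # the two maps are inverses
    print(f"{event:8s}  prob = {p}, odds = {o}")
```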
Bayes’ Theorem
P(a|b) = P(a,b)/P(b)
P(a|b) = P(a) P(b|a) / Σ_{a*} P(a*) P(b|a*)
P(dis | test+) = P(dis) P(test+ | dis) / [P(dis) P(test+ | dis) + P(¬dis) P(test+ | ¬dis)]
               = (0.01)(0.95) / [(0.01)(0.95) + (0.99)(0.03)]
               = 0.0095 / (0.0095 + 0.0297) ≈ 0.24
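The screening-test calculation above can be reproduced in a few lines. A hypothetical helper (the function name is our own; the 1% prevalence, 95% sensitivity, and 3% false-positive rate are the slide's figures):

```python
def posterior_disease(prior, sensitivity, false_pos):
    """Bayes' theorem for P(disease | test+)."""
    num = prior * sensitivity                 # P(dis) P(test+ | dis)
    den = num + (1 - prior) * false_pos       # ... + P(not dis) P(test+ | not dis)
    return num / den

p = posterior_disease(prior=0.01, sensitivity=0.95, false_pos=0.03)
print(round(p, 4))   # 0.2423
```

Even with a fairly accurate test, a positive result gives only about a 24% chance of disease, because the disease is rare.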
VISUALIZE BAYES' THEOREM
• https://towardsdatascience.com/visualizing-bayesian-theorem-
ccea9e3696a2
What is MCMC?
We want to find P(θ | data), which is equal to
P(θ)*P(data |θ)/P(data) and is proportional to
P(θ)*P(data | θ), which is "prior * likelihood".
But to use Bayesian methods we need to be
able to evaluate the denominator, which is the
integral of the numerator, integrated over the
parameter space. In general, this integral is
very hard to evaluate.
Note: There are two special cases in which the
mathematics works out: for a beta-prior with binomial-
likelihood (which gives a beta posterior) and for a
normal prior with normal likelihood (which gives a
normal posterior).
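To make the point concrete, here is a minimal random-walk Metropolis sketch. It samples p(μ | data) for the normal-normal heights example used later in this lecture (prior N(170, 9), σ² = 50, n = 10, x̄ = 165.52) using only prior × likelihood; the hard integral in the denominator is never evaluated. The proposal scale, burn-in, seed, and iteration count are arbitrary illustrative choices:

```python
import math
import random

random.seed(1)

def log_post(mu, data, mu0=170.0, s0sq=9.0, ssq=50.0):
    """log(prior * likelihood), up to an additive constant --
    the intractable normalizer P(data) cancels in the accept ratio."""
    lp = -(mu - mu0) ** 2 / (2 * s0sq)                    # N(mu0, s0sq) prior
    lp += sum(-(x - mu) ** 2 / (2 * ssq) for x in data)   # N(mu, ssq) likelihood
    return lp

# Stand-in data with the same n and mean as the heights example (x-bar = 165.52).
data = [165.52] * 10

mu, samples = 170.0, []
for step in range(20_000):
    prop = mu + random.gauss(0, 2.0)          # random-walk proposal
    if math.log(random.random()) < log_post(prop, data) - log_post(mu, data):
        mu = prop                             # accept; otherwise keep current mu
    if step >= 2_000:                         # discard burn-in
        samples.append(mu)

print(sum(samples) / len(samples))   # close to the conjugate answer, 167.12
```

Because this example is one of the conjugate special cases, we can verify the chain against the exact posterior; in general MCMC is used precisely when no such closed form exists.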
REMARKS
• The mode of the posterior distribution is between the mode of the prior
distribution and the mode of the likelihood function, but the posterior mode is
closer to the likelihood mode for larger sample size.
• The width of the posterior highest density intervals (HDI) is smaller for the larger
sample size.
• The larger sample implies a smaller range of credible underlying biases in the coin.
• The more data we have, the more precise is the estimate of the parameter(s) in
the model.
PRIOR DISTRIBUTIONS
POSTERIOR DISTRIBUTIONS
p(μ | x) ∝ exp(−(μ − μ0)² / (2σ0²)) ∏_{i=1}^{n} (2πσ²)^(−1/2) exp(−(x_i − μ)² / (2σ²))
         ∝ exp(−½ μ² (1/σ0² + n/σ²) + μ (μ0/σ0² + Σ_i x_i/σ²) + const)
p(μ | x) ∝ exp(−½ μ² (1/σ0² + n/σ²) + μ (μ0/σ0² + Σ_i x_i/σ²) + const)
→ σ1² = (1/σ0² + n/σ²)^(−1)  and  μ1 = σ1² (μ0/σ0² + Σ_i x_i/σ²)
As n → ∞:
Posterior precision 1/σ1² = 1/σ0² + n/σ² → n/σ²
So posterior variance σ1² → σ²/n
Posterior mean μ1 = σ1² (μ0/σ0² + x̄/(σ²/n)) → x̄
And so the posterior distribution p(μ | x) → N(x̄, σ²/n),
compared to p(x̄ | μ) = N(μ, σ²/n) in the frequentist setting.
CONSTRUCTING POSTERIOR 1
CONSTRUCTING POSTERIOR 2
• Again to construct the posterior we use the earlier formulae we have just
calculated
• From the prior, μ0 = 170, σ0² = 9
• From the data, x̄ = 165.52, σ² = 50, n = 10
• The posterior is therefore
  p(μ | x) ~ N(μ2, σ2²)
  where σ2² = (1/9 + 10/50)^(−1) = 3.214,
  μ2 = σ2² (170/9 + 1655.2/50) = 167.12.
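The conjugate normal-normal update is easy to reproduce numerically. An illustrative helper (the function name is our own; the N(165, 4) prior 1 is taken from the Bayes-factor slide later in the lecture):

```python
def posterior(mu0, s0sq, xbar, ssq, n):
    """Conjugate normal-normal update:
    variance = (1/s0sq + n/ssq)^-1, mean = variance * (mu0/s0sq + n*xbar/ssq)."""
    var = 1.0 / (1.0 / s0sq + n / ssq)
    mean = var * (mu0 / s0sq + n * xbar / ssq)
    return mean, var

# Prior 2: N(170, 9) with xbar = 165.52, sigma^2 = 50, n = 10
print(posterior(170, 9, 165.52, 50, 10))   # ≈ (167.12, 3.214)
# Prior 1: N(165, 4) gives the N(165.23, 2.222) posterior used below
print(posterior(165, 4, 165.52, 50, 10))   # ≈ (165.23, 2.222)
```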
PRIOR 2 COMPARISON
Note that this prior is not as close to the data as prior 1, and hence the posterior is somewhere between the prior and the likelihood.
CREDIBLE INTERVALS
• If we consider the heights example with our first prior then our posterior is
P(μ|x)~ N(165.23,2.222),
and a 95% credible interval for μ is
165.23±1.96×sqrt(2.222) =
(162.31,168.15).
Similarly, prior 2 results in a 95% credible interval for μ of
(163.61, 170.63).
Note that credible intervals can be interpreted in the more natural way that
there is a probability of 0.95 that the interval contains μ rather than the
frequentist conclusion that 95% of such intervals contain μ.
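Both intervals can be recomputed directly from the posterior means and variances. A small sketch (the helper name is our own; 1.96 is the usual 97.5% normal quantile):

```python
import math

def credible_interval_95(mean, var):
    """Central 95% interval for a normal posterior: mean +/- 1.96 * sd."""
    half = 1.96 * math.sqrt(var)
    return mean - half, mean + half

lo1, hi1 = credible_interval_95(165.23, 2.222)   # posterior from prior 1
print(f"({lo1:.2f}, {hi1:.2f})")                 # (162.31, 168.15)
lo2, hi2 = credible_interval_95(167.12, 3.214)   # posterior from prior 2
print(f"({lo2:.2f}, {hi2:.2f})")                 # (163.61, 170.63)
```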
HYPOTHESIS TESTING
BAYES FACTORS
Let us assume that H0 is μ > 165 and hence H1 is μ ≤ 165. Then π0 = π1 = 0.5 under the N(165, 4) prior.
The posterior is N(165.23, 2.222), which gives p0 = 0.561 and p1 = 0.439, and hence a Bayes factor of 0.561/0.439 = 1.278 (since the prior odds are 1, the Bayes factor equals the posterior odds). Here the Bayes factor is close to 1, and so the data have not much altered our beliefs about the hypothesis under discussion.
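A sketch of the Bayes-factor arithmetic, using the standard-library erf for the normal CDF (the helper name is our own; prior and posterior are the slide's):

```python
import math

def norm_cdf(x, mean, var):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

# Prior N(165, 4): prior odds of H0: mu > 165 vs H1: mu <= 165 are exactly 1.
pi0 = 1 - norm_cdf(165, 165, 4)              # 0.5
# Posterior N(165.23, 2.222):
p0 = 1 - norm_cdf(165, 165.23, 2.222)        # P(H0 | data)
p1 = 1 - p0
bf = (p0 / p1) / (pi0 / (1 - pi0))           # posterior odds / prior odds
print(round(p0, 3), round(bf, 2))            # 0.561 1.28
```

A Bayes factor this close to 1 is conventionally read as "barely worth mentioning" evidence either way.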