
Bayesian Inference

and Decision
Theory
Unit 1: A Brief Tour of Bayesian
Inference and Decision Theory

©Kathryn Blackmond Laskey Spring 2023 Unit 1 v2- 1 -


What this Course is About

• You will learn a way of thinking about problems of inference and decision-making under uncertainty
• You will learn to construct mathematical models for inference and
decision problems
• You will learn how to apply these models to draw inferences from
data and to make decisions
• These methods are based on Bayesian Decision Theory, a formal
theory for rational inference and decision making



Logistics
• Web site
• http://seor.vse.gmu.edu/~klaskey/SYST664/SYST664.html
• Blackboard site: http://mymason.gmu.edu
• Textbook and Software
• Hoff, A First Course in Bayesian Statistical Methods, Springer, 2009 (Free softcopy from Mason library)
• Other recommended texts on course web site
• We will use R, a free open-source statistical computing environment: http://www.r-project.org/. R code for many textbook examples is on the authorʼs web site
• Later in the semester we will use JAGS, an open-source package for Markov Chain Monte Carlo simulation (interfaces with R): http://mcmc-jags.sourceforge.net/
• Requirements
• Regular assignments (30%); take-home midterm (35%); take-home final (35%)
• Office hours
• Official office hours are 3:00-4:00 PM Mondays (by appointment only), 4:00-5:00 PM Wednesdays (via Zoom)
• I respond to questions by email and am available by appointment
• Course delivery
• 4:30-7:10 Mondays, in person ENT 276 and online via Zoom; all classes recorded
• Policies and Resources
• Academic integrity policy
• Read the policies and resources section of the syllabus



Course Outline

• Unit 1: A Brief Tour of Bayesian Inference and Decision Theory


• Unit 2: Random Variables, Parametric Models, and Inference from Observation
• Unit 3: Statistical Models with a Single Parameter
• Unit 4: Monte Carlo Approximation
• Unit 5: The Normal Model
• Unit 6: Markov Chain Monte Carlo
• Unit 7: Hierarchical Bayesian Models
• Unit 8: Bayesian Regression and Analysis of Variance
• Unit 9: Multinomial Distribution and Latent Groups
• Unit 10: Hypothesis Tests, Bayes Factors, and Bayesian Model Averaging

(There may be changes, especially in the second half; we may not make it to the last unit)



Learning Objectives for Unit 1

• Describe the elements of a decision model


• Refresh knowledge of probability
• Apply Bayes rule for simple inference problems and interpret the results
• Explain why Bayesians believe inference cannot be separated from
decision-making
• Compare Bayesian and frequentist philosophies of statistical inference
• Compute and interpret the expected value of information (VOI) for a
decision problem with an option to collect information
• Download, install and use R statistical software



Bayesian Inference

• Bayesians use probability to quantify rational degrees of belief
• Bayesians view inference as belief dynamics
• Use evidence to update prior beliefs to posterior
beliefs
• Posterior beliefs become prior beliefs for future
evidence
• Inference problems are usually embedded
in decision problems
• We will learn to build models of inference
and decision problems

“All models are wrong but some are useful” - George Box
Decision Theory

• Decision theory is a formal theory of decision making under uncertainty


• A decision problem consists of:
• Possible actions: {a}, a ∈ A
• States of the world (usually uncertain): {s}, s ∈ S
• Possible consequences: {c}, c ∈ C (depends on action and state)

• Question: What is the best action?


• Answer (according to decision theory):
• Measure “goodness” of consequences with a utility function u(c)
• Measure likelihood of states with probability distribution p(s) (more generally, p(s|a))
• Best action with respect to model maximizes expected utility:

a* = argmax_a E_{p(s)}[u(c(s, a)) | a]        For brevity, we may write E[u(a)] for E_{p(s)}[u(c(s, a)) | a]
• Caveat emptor:
• How good it is for you depends on fidelity of model to your beliefs and preferences
Illustrative Example:
Highly Oversimplified Decision Problem

• Decision problem: Should patient be treated for disease?


• We suspect she may have disease but do not know
• Without treatment the disease will lead to long illness
• Treatment has unpleasant side effects
• Decision model:
• Actions: aT (treat) and aN (don’t treat)
• States of world: sD (disease now) and sW (well now)
• Consequences: c(sD, aT)= c(sW, aT)=cWS (well shortly, side effects), c(sW, aN)=cWN (well shortly, no
side effects), c(sD, aN)=cDN (disease for long time, no side effects)
• Probabilities and Utilities:
• P(sD) = 0.3
• u(cWN) = 100, u(cWS) = 90, u(cDN) = 0
• Expected utility:
• Treat: 0.3 × 90 + 0.7 × 90 = 90
• Don't treat: 0.3 × 0 + 0.7 × 100 = 70
• Best action is aT (treat patient)
[Figure: utility scale showing u(cWN) = 100, u(cWS) = 90 = EU(aT), EU(aN) = 70, u(cDN) = 0, with P(sD) = 0.3 and P(sW) = 0.7]
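The arithmetic above can be sketched in code (a Python sketch for illustration; the course itself uses R, and all names here are illustrative):

```python
# Expected utility of each action under the slide's numbers.

def expected_utility(p_disease, u):
    p_well = 1 - p_disease
    return {
        "treat": p_disease * u["WS"] + p_well * u["WS"],     # side effects either way
        "no_treat": p_disease * u["DN"] + p_well * u["WN"],  # long illness if diseased
    }

u = {"WN": 100, "WS": 90, "DN": 0}   # utilities u(cWN), u(cWS), u(cDN)
eu = expected_utility(0.3, u)        # P(sD) = 0.3
best = max(eu, key=eu.get)           # "treat": expected utility 90 beats 70
```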



Decision Model: Summary

• To model a decision problem we specify:


• Possible actions [aT, aN in medical example]
• Consequences [cWN, cWS, cDN in medical example]
• States [sD, sW in medical example]
• Probabilities of states [P(sD)=0.3, P(sW)=0.7 in medical example]
• Utilities for consequences [u(cWN) = 100, u(cWS) = 90; u(cDN) = 0 in medical example]
• To find the best decision we calculate the expected utility for each action
and choose the best
• E[u(aT)] = 90 ; E[u(aN)] = 70 in medical example; best decision is to treat
• Notation: for brevity we write E[u(a)] for E_{P(s)}[u(c(s, a)) | a]
• Sometimes we minimize expected loss (negative utility) instead of
maximizing expected utility
Sensitivity Analysis:
How Optimal Decision Varies with Sickness Probability

• Expected utility of not treating depends on the probability p = P(sD) of having the disease
• E[U | aT] = 90 (no dependence on p)
• E[U | aN] = 0·p + 100(1 − p) = 100(1 − p) (decreases as p increases)
• We should treat if p > 0.1, not treat if p < 0.1
• When we are unsure about the value of p, we may want to explore how the optimal decision changes as we vary p
• If our estimate is near the crossover point, we may want to gather information to refine our estimate of p
• We will use Bayesian inference to update our estimate of the probability
[Figure: Expected Utility for Disease Problem. E[U(Treat)] is flat at 90; E[U(NoTreat)] falls from 100 to 0 as P(Sick) goes from 0 to 1, crossing at p = 0.1; the gap at p = 0.3 is the expected gain from treatment]
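The crossover calculation can be sketched as follows (Python for illustration; the course uses R, and the names are illustrative):

```python
# Crossover probability where the two expected-utility lines meet:
# solving 90 = 100*(1 - p) gives p = 0.1.

EU_TREAT = 90.0

def eu_no_treat(p):
    return 100 * (1 - p)

crossover = 1 - EU_TREAT / 100   # p = 0.1

def best_action(p):
    return "treat" if p > crossover else "no_treat"
```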
Interlude: Review of Probability Basics

• Probability is a mathematical representation for uncertainty


• We assign probability to events
• An event 𝐴 is a subset of the sample space Ω
• A probability distribution is a function on events that satisfies:
• P(A) ≥ 0 for all events A        (Kolmogorov's axioms)
• P(Ω) = 1
• If Ai ∩ Aj = ∅ for i ≠ j, then P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯
• From these properties we can derive others, e.g.:
• P(A) ≤ 1 for all events A
• P(∅) = 0
• If A ⊂ B then P(A) ≤ P(B)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B) for events A and B



Conditional Probability

• The conditional probability P(A|B) satisfies:
• P(A|B) P(B) = P(A ∩ B)
• If P(B) > 0 then P(A|B) = P(A ∩ B) / P(B)
• A and B are independent if P(A|B) = P(A)
• This implies P(A ∩ B) = P(A) P(B)
• The law of total probability is:
• If Bi ∩ Bj = ∅ for i ≠ j and Ω = B1 ∪ B2 ∪ ⋯ then
P(A) = Σi P(A ∩ Bi) = Σi P(A|Bi) P(Bi)
[Figure: Venn diagram illustrating conditioning on B]



Bayes Rule: The Law of Belief Dynamics

• Objective: use evidence to update beliefs
• H1, …, Hn: exclusive and exhaustive hypotheses (Hi ∩ Hj = ∅, Ω = H1 ∪ H2 ∪ …)
• E: evidence (with positive probability, P(E) > 0)
• Procedure: apply Bayes Rule:
P(Hi | E) = P(Hi ∩ E) / P(E) = P(E|Hi) P(Hi) / P(E) = P(E|Hi) P(Hi) / Σj P(E|Hj) P(Hj)
• Bayes Rule (odds likelihood form):
P(Hi | E) / P(Hj | E) = [P(E|Hi) / P(E|Hj)] × [P(Hi) / P(Hj)]        (P(E) > 0, P(Hj) > 0)
[Figure: Venn diagram of hypotheses H1, H2 and evidence E]

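The update rule can be sketched generically (Python for illustration; the course uses R, and the hypothesis names are illustrative):

```python
# Bayes rule over an exhaustive set of hypotheses:
# posterior is proportional to likelihood times prior, normalized by P(E).

def bayes(priors, likelihoods):
    unnorm = {h: likelihoods[h] * priors[h] for h in priors}
    p_e = sum(unnorm.values())                   # predictive probability P(E)
    return {h: v / p_e for h, v in unnorm.items()}

# evidence 4 times as likely under H1 as under H2, equal priors:
post = bayes({"H1": 0.5, "H2": 0.5}, {"H1": 0.8, "H2": 0.2})
# posterior odds H1:H2 = (0.8/0.2) x (0.5/0.5) = 4, so P(H1|E) = 0.8
```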


Interpreting Bayes Rule

• Bayes Rule (odds likelihood form):
P(Hi | E) / P(Hj | E) = [P(E|Hi) / P(E|Hj)] × [P(Hi) / P(Hj)]
• Terminology:
• P(H) - the prior probability of H
• P(E|H) - the likelihood for E given H
• P(E) - the predictive probability of E
• P(H|E) - the posterior probability of H given E
• P(E|Hi) / P(E|Hj) - the likelihood ratio for Hi versus Hj
• P(Hi) / P(Hj) - the prior odds ratio for Hi versus Hj

• The posterior probability of Hi increases relative to Hj if the evidence is more likely given Hi than given Hj



Probability Review: Summary

• Events: subsets of sample space


• Probability:
• Maps event to number between 0 and 1
• Measures how likely event is to occur
• Satisfies basic rules
• Conditional probabilities measure how likely an event is given that
another event has occurred
• Bayes rule tells us how probabilities change when we get new
evidence



Extending the Disease Example:
Gathering Information
• We can perform a test before deciding whether to treat the patient
• Test has two outcomes: tP (positive) and tN (negative)
• Quality of test is characterized by two numbers:
• Sensitivity: Probability that test is positive if patient has disease
• Specificity: Probability that test is negative if patient does not have disease

• Test characteristics:
• Sensitivity: P(tP | sD) = 0.95
• Specificity: P(tN | sW) = 0.85
• How does the model change if test results are available?
• Take test, observe outcome t
• Revise prior beliefs P(sD) to obtain posterior beliefs P(sD|t)
• Re-compute optimal decision using P(sD|t)
Disease Example with Test
• Review of problem ingredients:
• P(sD) = 0.3 (prior probability of disease)
• P(tP | sD) = 0.95; P(tN | sW) = 0.85 (sensitivity & specificity of test)
• u(cWN) = 100, u(cWS) = 90, u(cDN) = 0 (utilities)
• If negative test: P(sD | tN) = P(tN|sD) P(sD) / [P(tN|sD) P(sD) + P(tN|sW) P(sW)]
= (0.3 × 0.05)/(0.3 × 0.05 + 0.7 × 0.85) = 0.025
• EU(aN | tN) = 0.025 × 0 + (1 − 0.025) × 100 = 97.5
• EU(aT | tN) = 0.025 × 90 + (1 − 0.025) × 90 = 90
• Best action is not to treat
• If positive test: P(sD | tP) = P(tP|sD) P(sD) / [P(tP|sD) P(sD) + P(tP|sW) P(sW)]
= (0.3 × 0.95)/(0.3 × 0.95 + 0.7 × 0.15) = 0.731
• EU(aN | tP) = 0.731 × 0 + (1 − 0.731) × 100 = 26.9
• EU(aT | tP) = 0.731 × 90 + (1 − 0.731) × 90 = 90
• Best action is to treat
• Optimal policy is to treat if positive; don't treat if negative
• We will call this strategy aF (FollowTest)
[Figure: utility scale showing EU(aN | tN) = 97.5, EU(aT) = 90, EU(aN) = 70, EU(aN | tP) = 26.9]
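The per-outcome posteriors and decisions can be sketched in code (Python for illustration; the course uses R, and all names are illustrative):

```python
# Posterior P(disease | test result) by Bayes rule, and the resulting decision.

def posterior_disease(prior, sens, spec, result):
    """Binary test: result is 'pos' or 'neg'; sens = P(tP|sD), spec = P(tN|sW)."""
    if result == "pos":
        num, alt = sens * prior, (1 - spec) * (1 - prior)
    else:
        num, alt = (1 - sens) * prior, spec * (1 - prior)
    return num / (num + alt)

def decide(p_sick):
    """Treat iff EU of treating (90) beats not treating (100*(1 - p))."""
    return "treat" if 90 > 100 * (1 - p_sick) else "no_treat"

p_pos = posterior_disease(0.3, 0.95, 0.85, "pos")  # about 0.731
p_neg = posterior_disease(0.3, 0.95, 0.85, "neg")  # about 0.025
# decide(p_pos) is "treat"; decide(p_neg) is "no_treat" -- the FollowTest policy
```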
Decision Model with Test: Summary
• To model a decision problem we specify:
• Possible actions [aT, aN in medical example]
• Consequences [cWN, cWS, cDN in medical example]
• States [sD, sW in medical example]
• Probabilities of states depend on test outcome
• P(sD | tN) = 0.025, P(sW| tN) = 0.975 with negative test
• P(sD | tP) = 0.731, P(sW| tP) = 0.269 with positive test
• Utilities for consequences [u(cWN) = 100, u(cWS) = 90; u(cDN) = 0 in medical example]
• To find the best decision we calculate the expected utility for each action
given the test result and choose the best
• E[u(aT | tN)] = 90 ; E[u(aN | tN)] = 97.5; best decision is not to treat if test is negative
• E[u(aT | tP)] = 90 ; E[u(aN | tP)] = 26.9; best decision is to treat if test is positive
• We always make our decision based on the information we have at the time
of the decision, so if a test result is available, we use the probability given
the test result
Should We Gather Information?
• Reminder of problem ingredients:
• P(sD) = 0.3 (prior probability of disease)
• u(cWN) = 100, u(cWS) = 90, u(cDN) = 0 (utilities)
• P(tP | sD) = 0.95; P(tN | sW) = 0.85 (sensitivity & specificity of test)
• Expected utility after doing test:
• If test is positive we should treat, with EU(aT) = EU(aT | tP) = 90
• If test is negative we should not treat, with EU(aN | tN) = 97.54098
• Probability test will be positive (use law of total probability):
• P(tP) = P(tP | sD) P(sD) + P(tP | sW) P(sW) = 0.95 × 0.3 + 0.15 × 0.7 = 0.39
• Expected utility of FollowTest strategy (treat if test is positive, otherwise not):
• EU(aF) = P(tP) EU(aT | tP) + P(tN) EU(aN | tN) = 0.39 × 90 + (1 − 0.39) × 97.54098 = 94.6
• EU(aF) is larger than EU(aT) = 90, so we should do the test
[Figure: utility scale showing EU(aN | tN) = 97.5, EU(aF) = 94.6, EU(aT) = EU(aT|tP) = EU(aT|tN) = 90, EU(aN) = 70, EU(aN | tP) = 26.9]
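The FollowTest calculation can be sketched as follows (Python for illustration; the course uses R, and the names are illustrative):

```python
# Expected utility of the FollowTest strategy via the law of total probability.
prior, sens, spec = 0.3, 0.95, 0.85
p_pos = sens * prior + (1 - spec) * (1 - prior)   # P(tP) = 0.39
p_sick_neg = (1 - sens) * prior / (1 - p_pos)     # P(sD | tN), about 0.0246
eu_follow = p_pos * 90 + (1 - p_pos) * 100 * (1 - p_sick_neg)  # about 94.6
do_test = eu_follow > 90       # True: following the test beats treating outright
```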
Decision Tree for Disease Model with Test
[Figure: decision tree. The test branch has expected utility 94.6: on a positive result, treat (EU 90); on a negative result, don't treat (EU 97.5). Without the test, treating yields 90 and not treating 70]


Expected Value of Information
• Expected Value of Sample Information (EVSI) is gain in expected utility
from doing a test
• EVSI for our medical example is 94.6 – 90 = 4.6
• Expected Value of Perfect Information (EVPI) is gain in expected utility
from perfect knowledge of an uncertain variable
• For medical example:
• Suppose an oracle will tell us whether patient is sick (sensitivity = specificity = 1)
• 30% chance we discover she is sick and treat - utility 90
• 70% chance we discover she is well and don’t treat - utility 100
• Expected utility if we ask the oracle is 0.3 × 90 + 0.7 × 100 = 97
• Therefore EVPI = 97 - 90 = 7
• EVPI ≥ EVSI ≥ 0
• EVSI = 0 if information won’t change your decision
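Both quantities can be computed in a few lines (a Python sketch of the slide's numbers; the course uses R):

```python
# EVPI and EVSI for the medical example.
p = 0.3
eu_no_info = max(90.0, 100 * (1 - p))   # best action without any test: treat, EU 90
eu_oracle = p * 90 + (1 - p) * 100      # perfect information: 97
evpi = eu_oracle - eu_no_info           # 7
eu_follow_test = 94.6                   # EU of FollowTest from the previous slide
evsi = eu_follow_test - eu_no_info      # 4.6
```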
Should We Collect Information?

• General Principle: Free information can never hurt


• Whether we should do the test depends on whether utility gain
EVSI=4.6 is greater than cost of information
• To analyze decision of whether to collect information:
• Find maximum expected utility option if we don't collect information
• Compute its expected utility U0
• Find EVPI
• Compare EVPI with cost of information
• If EVPI is too small in relation to cost then stop; otherwise, compute EVSI
• Compare EVSI with cost of information
• Collect information if expected utility gain is greater than cost of information



Strategy Regions for Medical Decision
(Without Test)
• Expected utility of not treating depends on the probability p = P(sD) of having the disease
• E[U | aT] = 90
• E[U | aN] = 0·p + 100(1 − p) = 100(1 − p)
• The strategy regions for the decision (without test):
• aT if p > 0.1
• aN if p < 0.1
• What are the strategy regions if we do a test?
[Figure: Expected Utility for Disease Problem. E[U(Treat)] flat at 90, E[U(NoTreat)] = 100(1 − p), crossing at p = 0.1; the gap at p = 0.3 is the expected gain from treatment]
Expected Utility of FollowTest Policy
as Function of Prior Probability p = P(sD)
• FollowTest strategy treats if test is positive and otherwise not
• Before doing the test, there are four possibilities for disease status and test result, with probabilities P(s, t) = P(t|s) P(s) = P(s|t) P(t):

World state        Probability P(t|s)P(s)   Action    Utility
Sick, Positive     0.95p                    Treat     90
Sick, Negative     0.05p                    NoTreat   0
Well, Positive     0.15(1−p)                Treat     90
Well, Negative     0.85(1−p)                NoTreat   100

• We treat if the test is positive and don't treat if it is negative, with utilities shown in the last column
• Multiplying probability by utility for each world state and summing gives the expected utility of FollowTest:

E[U | aF] = 0.95p × 90 + 0.05p × 0 + 0.15(1−p) × 90 + 0.85(1−p) × 100 = 98.5 − 13p

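The table's probability-times-utility sum can be checked in code (Python for illustration; the course uses R):

```python
# EU of FollowTest as a function of the prior p, summing over the four
# (state, test) combinations; it should reduce to 98.5 - 13p.

def eu_follow(p, sens=0.95, spec=0.85):
    return (sens * p * 90                 # sick, positive -> treat (90)
            + (1 - sens) * p * 0          # sick, negative -> no treat (0)
            + (1 - spec) * (1 - p) * 90   # well, positive -> treat (90)
            + spec * (1 - p) * 100)       # well, negative -> no treat (100)
```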


Strategy Regions for Medical Decision
(With Test)
• FollowTest: EU(aF) = 98.5 − 13p
• AlwaysTreat: EU(aT) = 90
• FollowTest is better when 98.5 − 13p > 90, i.e. p < 8.5/13 = 0.654
• NeverTreat: EU(aN) = 100(1 − p)
• FollowTest is better when 98.5 − 13p > 100(1 − p), i.e. p > 1.5/87 = 0.017
• EVSI is positive for 0.017 < p < 0.654; otherwise EVSI = 0
[Figure: strategy regions and sensitivity analysis for a costless test: expected utility of each strategy (E[U(Treat)], E[U(NoTreat)], E[U(FollowTest)]) versus P(Sick), with EVSI shown as the gap above the best no-test strategy]

Region                 Optimal Strategy
p < 0.017              NeverTreat
0.017 < p < 0.654      FollowTest
p > 0.654              AlwaysTreat



Decision Model for Whether to Test: Summary

• To model a decision problem we specify:


• Possible policies* [aT, aN, aF in medical example with test]
• Consequences [cWN, cWS, cDN as previously]
• States [(sD, tP), (sD, tN), (sW, tP), (sW, tN), in medical example with test]
• Probabilities of states
• P(sD, tP)=0.95p, P(sD, tN)=0.05p, P(sW, tP)=0.15(1-p), P(sW, tN)=0.85(1-p)
• For p=0.3, P(sD, tP)=0.285, P(sD, tN)=0.015, P(sW, tP)=0.105, P(sW, tN)=0.595
• Utilities for consequences [u(cWN) = 100, u(cWS) = 90; u(cDN) = 0 as previously]
• To find the best decision we calculate the expected utility for each
policy
• E[u(aT)] = 90 ; E[u(aN)] = 100(1 – p); E[u(aF)] = 98.5 – 13p in medical example with test
• Best policy is aN if p ≤ 0.017; aF if 0.017 < p < 0.654; aT if p ≥ 0.654
• For p=0.3, best policy is aF
*We use the word policy for a sequence of actions taken over time that can depend on information we acquire over time
Strategy Regions for Costly Test:

• FollowTest: EU(aF) = 98.5 − 13p − c (c is cost of test)
• AlwaysTreat: EU(aT) = 90
• FollowTest is better when 98.5 − 13p − c > 90, i.e. p < (8.5 − c)/13
• NeverTreat: EU(aN) = 100(1 − p)
• FollowTest is better when 98.5 − 13p − c > 100(1 − p), i.e. p > (1.5 + c)/87
• Test is worth doing if gain is larger than cost
• Range of values for which test is worth doing: (1.5 + c)/87 < p < (8.5 − c)/13
[Figure: expected utility of the optimal strategy versus P(Sick) for test costs c = 0, 1, 4, and c ≥ 7.2; the gain from doing a test with c = 1 at p = 0.3 is marked]
EVSI and Costly Test
• Information collection is optimal when EVSI is greater than cost of test
• Probability range where testing is optimal depends on cost of test
• In our example, for a test with cost c:
• Testing is optimal if (1.5 + c)/87 < p < (8.5 − c)/13
[Figure: expected utility of the optimal strategy with costly test (c = 0, 1, 4, ≥ 7.2) versus P(Sick)]
[Figure: EVSI as a function of prior probability, with the range of optimality of a test with c = 1 marked]
Summary : Value of Information and Strategy
Regions
• Collecting information may have value if it might change your decision
• Expected value of perfect information (EVPI) is utility gain from knowing true value of
uncertain variable
• Expected value of sample information (EVSI) is utility gain from available information
• In our example, EVSI is positive for 0.017 < p < 0.654
• If 0.017 ≤ p ≤ 0.1 EVSI is 87p - 1.5
• If 0.1 ≤ p ≤ 0.654 EVSI is 8.5 – 13p
• If p = 0.3 EVSI is 8.5 – 13p = 4.6 (testing is optimal)
• Costly information has value when EVSI is greater than cost of information
• In our example:
• If 0.017 ≤ p ≤ 0.1 Test if 87p - 1.5 > c (where c is cost of test)
• If 0.1 ≤ p ≤ 0.654 Test if 8.5 – 13p > c
• If p = 0.3 Test if 4.6 > c (test if c is less than 4.6)
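The piecewise EVSI formulas above can be expressed directly (Python for illustration; the course uses R):

```python
# EVSI as a function of the prior p: EU of the best strategy with a free test
# minus EU of the best strategy without it, floored at zero.

def evsi(p):
    eu_follow = 98.5 - 13 * p
    eu_best_no_test = max(90, 100 * (1 - p))
    return max(eu_follow - eu_best_no_test, 0.0)

# evsi(0.05) = 87*0.05 - 1.5 = 2.85; evsi(0.3) = 8.5 - 13*0.3 = 4.6
```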
What if a Probability is Unknown?
• The model for our medical example depends on several parameters
• Prior probability of disease
• Sensitivity of test
• Specificity of test
• Usually these probabilities are estimated from data and/or expert
judgment
• “Randomized clinical trials have established that Test T has sensitivity 0.95 and
specificity 0.85 for Disease D”
• “Given the presenting symptoms and my clinical judgment, I estimate a 30%
probability that the patient has Disease D.”
• How does a Bayesian combine data and expert judgment?
• Use clinical judgment to quantify uncertainty about the unknown probability as a probability distribution
• Gather data
• Use Bayes rule to obtain posterior distribution for the unknown probability
• If appropriate, use clinical judgment to adjust results of studies to apply to a
particular patient
Example: Bayesian Inference about a
Probability (with a very small sample)
• Assign prior distribution to possible values of disease probability p
• Although p can be any real number between zero and 1, we pretend there are only 20 equally spaced possible values
• (The unknown probability actually has a continuous range of values; we will treat continuous distributions later. For now we approximate with a finite set of values.)
• Our prior distribution is consistent with our estimate p = 0.3
• Observe 10 independent and identically distributed (iid) cases
• (X1, X2, X3, X4, X5, X6, X7, X8, X9, X10) = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1)
• Cases 2, 7, and 10 have disease; the rest do not
• How do we find the posterior distribution of the unknown probability?
Posterior Distribution of Disease Parameter

• Applying Bayes Rule
• We observed 3 cases of disease in 10 trials
• Likelihood of data is p³(1 − p)⁷
• Multiply prior g(p) times likelihood p³(1 − p)⁷ and divide by sum:

g(p | x) = g(p) p³(1 − p)⁷ / Σ_{p′} g(p′) p′³(1 − p′)⁷

• Notice that the posterior distribution depends only on the number of cases with and without the disease
• The counts of cases with and without the disease are sufficient statistics for inference about p
Underscore indicates a vector: x = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)
Bayesian Inference Example:
R Code

[Figure: R code and the resulting plot. Horizontal axis is p = P(Sick); height of bar is probability that P(Sick) = p]
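The slide's R code is not preserved in this transcript; an equivalent grid-approximation sketch (in Python for illustration, with a uniform prior assumed for simplicity, whereas the slides use a prior centered near 0.3):

```python
# Grid posterior for 3 diseased cases in 10 trials on 20 equally spaced values.
grid = [(i + 0.5) / 20 for i in range(20)]     # 0.025, 0.075, ..., 0.975
prior = [1 / 20] * 20                          # uniform prior (assumption)
like = [p**3 * (1 - p)**7 for p in grid]       # likelihood of the data
unnorm = [g * l for g, l in zip(prior, like)]
posterior = [u / sum(unnorm) for u in unnorm]  # normalize so bars sum to 1
# the posterior peaks near p = 0.3
```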
Bayesian Learning and Sample Size
• When the sample size is very large:
• The posterior distribution will be concentrated around the maximum likelihood estimate and is relatively
insensitive to the prior distribution
• We wonʼt go too far wrong if we act as if the parameter is equal to the maximum likelihood estimate
• When the sample size is very small:
• The posterior distribution is highly dependent on the prior distribution
• Reasonable people may disagree on the value of the parameter
• When the sample size is moderate, Bayesian learning can be a big improvement on either expert
judgment alone or data alone
• Achieving the benefit requires careful modeling
• This course will teach methods for constructing Bayesian models
• A powerful characteristic of the Bayesian approach is the flexibility to tailor results to moderate-
sized sub-populations
• Bayesian estimate “shrinks” estimates of sub-population parameters toward population average
• Amount of shrinkage depends on sample size and similarity of sub-population to overall population
• Shrinkage improves estimates for small to moderate sized sub-populations



Effect of Sample Size on Posterior Distribution
• These plots show the posterior distribution for Θ when:
• Prior distribution is uniform
• 20% of patients in sample have the disease
• Posterior distribution becomes more concentrated around 1/5 as sample size gets larger
[Figure: three histograms: sample size 5 (1 with, 4 without), sample size 20 (4 with, 16 without), sample size 80 (16 with, 64 without). Horizontal axis is θ = P(sD); height of bar is the probability that Θ = θ]


Sample Size and Impact of the Prior
Distribution
• Prior distribution favors low probabilities:
[Figure: prior distribution; posterior distribution for 1 case in 5 samples; posterior distribution for 10 cases in 50 samples]
• Prior distribution favors high probabilities:
[Figure: prior distribution; posterior distribution for 1 case in 5 samples; posterior distribution for 10 cases in 50 samples. Horizontal axis is θ = P(sD); height of bar is the probability that Θ = θ]
• Bayesian inference “shrinks” posterior distribution toward prior expectations
• Posterior distribution for smaller sample is more sensitive to prior distribution
• Posterior distribution for larger sample is less sensitive to prior distribution


Some Concepts of Probability
• Classical - Probability is a ratio of favorable cases to total (equipossible) cases
• Frequency - Probability is the limiting value as the number of trials becomes infinite of the
frequency of occurrence of a type of event
• Logical - Probability is a logical property of one’s state of information about a phenomenon
• Propensity - Probability is a propensity for certain kinds of physical event to occur
• Subjective - Probability is an ideal rational agent’s degree of belief about an uncertain event
• Algorithmic - The algorithmic probability of a finite sequence is the probability that a universal
computer fed a random input will give the sequence as output (related to Kolmogorov complexity)
• Game Theoretic - Probability is an agent’s optimal “announced certainty” for an event in a multi-
agent game in which agents receive rewards that depend on both forecasts and outcomes
Probability really is none of these things.
Probability can represent all of these things.
The Frequentist
• A frequentist believes:
• Probability can be legitimately applied only to repeatable problems
• Probability is an objective property in the real world
• Probability applies only to random processes
• Probabilities are associated only with collectives not individual events

• Frequentist Inference
• Data are drawn from a distribution of known form but with an unknown parameter (this includes
“nonparametric” statistics in which the unknown parameter is the distribution itself)
• Distribution may arise from explicit randomization or may be considered “close enough” to random
• Inference treats data as random and parameter as fixed
• For example: A sample X1,…XN is drawn from a normal distribution with mean Θ . A 95% confidence
interval is constructed. The interpretation is:
If an experiment like this were performed many times we would expect in 95% of the cases that an interval calculated
by the procedure we applied would include the true value of Θ .
• A frequentist can say nothing about θ in any individual experiment!
The Subjectivist

• A subjectivist believes:
• Probability is an expression of a rational agent’s degrees of belief about uncertain propositions.
• Rational agents may disagree. There is no “one correct probability.”
• If the agent receives feedback her assessed probabilities will in the limit converge to
observed frequencies
• Subjectivist Inference:
• Probability distributions are assigned to unknowns (parameters and observations).
• Condition on knowns; use probability to express uncertainty about unknowns
• For example: A sample X1,…,XN is drawn from a normal distribution with mean θ having prior distribution g(θ). A 95% posterior credible interval is constructed, and the result is the interval (3.7, 4.9). The interpretation is:
Given the prior distribution for θ and the observed data, the probability that θ lies between 3.7 and 4.9 is 95%.
• A subjectivist can draw conclusions about what we should believe about 𝜃 and
about what we should expect on the next trial



The Bayesian Resurgence
• Bayesian inference is as old as probability
• Subjective view fell into disfavor in 19th and early 20th centuries
• Positivism, empiricism, and quest for objectivity in science
• “Paradoxes” and systematic deviation of human judgment from Bayesian
“norm”
• There has been a recent resurgence
• Computational advances make calculation possible for complex models
• Bayesian models can coherently integrate many different kinds of information
• Physical cause and effect
• Logical implication
• Informed expert judgment
• Empirical observation
• Unified theory and methods for data-rich and data-poor problems
• Clear connection to decision making
Comparison: Understandability,
Subjectivity and Honest Reporting
• Often the Bayesian answer is what the decision maker really wants to hear.
• Untrained people often interpret results in the Bayesian way.
• Frequentists are disturbed by dependence of the posterior interval on “subjective” prior
distribution.
It is more important that stochastics provides a means of communication among researchers
whose personal beliefs about the phenomena under study may differ. If these beliefs are
allowed to contaminate the reporting of results, … how are the results of different researchers
to be compared?
- H. Dinges
• Bayesians say the prior distribution is not the only subjective element in an analysis.
• Bayesian probability statements are always subjective, but statistical analyses are often
done for public consumption. Whose probability distribution should be reported?
• For large samples, a good Bayesian analysis and a good frequentist analysis are usually
similar
• If results are sensitive to the prior distribution, a Bayesian analyst should report this sensitivity
and present a range of results obtained from a range of prior distributions
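Both points above, large-sample agreement and small-sample prior sensitivity, can be seen in a two-line conjugate example. A hypothetical Python sketch (the Beta priors and binomial data are assumptions chosen for illustration):

```python
def posterior_mean(a, b, successes, n):
    """Beta(a, b) prior + n binomial trials with s successes
    -> Beta(a+s, b+n-s) posterior; return its mean."""
    return (a + successes) / (a + b + n)

priors = [(1, 1), (5, 5), (20, 2)]      # three quite different prior beliefs
small = [posterior_mean(a, b, successes=3, n=5) for a, b in priors]
large = [posterior_mean(a, b, successes=600, n=1000) for a, b in priors]
print("n=5:   ", [round(m, 3) for m in small])    # priors matter a lot
print("n=1000:", [round(m, 3) for m in large])    # priors barely matter
```

Reporting the spread of the `small` results across priors is one concrete way to present the sensitivity analysis recommended above.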



Comparison: Generality

• Subjectivists can handle problems the frequentist approach cannot (in particular, problems with not enough data for sound frequentist inference).
• Frequentist statisticians say this comes at a price -- when there are
not enough data the result will be highly dependent on the prior
distribution.
• Subjectivists often apply frequentist techniques but with a Bayesian
interpretation
• Frequentists often apply Bayesian methods if they have good
frequency properties



Coherence and Rationality

• In the mid 20th century, several authors proposed systems of axioms intended to characterize rational behavior
• Proofs that decision-makers satisfying these axioms must be expected utility
maximizers
• Proofs that decision-makers not satisfying these axioms are vulnerable to
exploitation (“Dutch book”)
• Well-documented systematic departures of human decision-making from
expected utility maximization
• A decision-maker is called coherent if she behaves as a maximizer of
expected utility
• Should coherence be equated with rationality?
Axioms for Probability
De Groot, 1970

There is a qualitative relation of relative likelihood ≽ ("at least as likely as", with ≻ its strict part and ~ its equivalence part) that operates on pairs of events and satisfies the following conditions:

SP1 (Comparability). For any two uncertain events A and B, exactly one of the following relations holds: A ≻ B, A ≺ B, or A ~ B.
SP2 (Union of disjoint events). If A1, A2, B1, and B2 are four events such that A1∩A2 = ∅, B1∩B2 = ∅, and Ai ≽ Bi for i = 1, 2, then A1∪A2 ≽ B1∪B2. If in addition Ai ≻ Bi for either i = 1 or i = 2, then A1∪A2 ≻ B1∪B2.
SP3 (Null event). If A is any event, then ∅ ≼ A. Furthermore, there is some event A0 for which ∅ ≺ A0.
SP4 (Decreasing sequences). If A1 ⊃ A2 ⊃ … is a decreasing sequence of events, and B is some event such that Ai ≽ B for i = 1, 2, …, then ⋂i Ai ≽ B.
SP5 (Existence of a uniform distribution). There is an experiment with a numerical outcome x between the values of 0 and 1 such that, if Ai is the event that the outcome lies in the interval ai ≤ x ≤ bi for i = 1, 2, then A1 ≼ A2 if and only if (b1-a1) ≤ (b2-a2).
Axioms for Utility
Watson and Buede, 1987

A reward is a prize the decision maker cares about. A lottery is a situation in which the decision maker will receive one of the possible rewards, where the reward to be received is governed by a probability distribution. There is a qualitative relation of relative preference ≽* (with ≻* its strict part and ~* its equivalence part) that operates on lotteries and satisfies the following conditions:

SU1 (Comparability and transitivity). For any two lotteries L1 and L2, either L1 ≻* L2, L1 ≺* L2, or L1 ~* L2. Furthermore, if L1, L2, and L3 are any lotteries such that L1 ≽* L2 and L2 ≽* L3, then L1 ≽* L3.
SU2 (Lottery equivalence). If r1, r2, and r3 are rewards such that r1 ≽* r2 ≽* r3, then there exists a probability p such that [r1: p; r3: (1-p)] ~* r2, where [r1: p; r3: (1-p)] is a lottery that pays r1 with probability p and r3 with probability (1-p).
SU3 (Substitutability of equivalent rewards). If r1 ~* r2 are rewards, then for any probability p and any reward r3, [r1: p; r3: (1-p)] ~* [r2: p; r3: (1-p)].
SU4 (Higher chance of better reward). If r1 ≻* r2 are rewards, then [r1: p; r2: (1-p)] ≻* [r1: q; r2: (1-q)] if and only if p > q.
SU5 (Compound lottery). Consider three lotteries Li = [r1: pi; r2: (1-pi)], i = 1, 2, 3, giving different probabilities of the two rewards r1 and r2. Suppose lottery M gives entry to lottery L2 with probability q and to lottery L3 with probability 1-q. Then L1 ~* M if and only if p1 = qp2 + (1-q)p3.
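SU5's reduction of a compound lottery to a simple one can be checked numerically. A small Python sketch (the probabilities are made up, and assigning utilities 1 and 0 to r1 and r2 is just a convenient normalization):

```python
def lottery_eu(p, u1, u2):
    """Expected utility of the two-outcome lottery [r1: p; r2: (1-p)]."""
    return p * u1 + (1 - p) * u2

u1, u2 = 1.0, 0.0                       # normalized utilities for r1, r2
q, p2, p3 = 0.4, 0.9, 0.25              # hypothetical probabilities
p1 = q * p2 + (1 - q) * p3              # SU5's condition for L1 ~* M

eu_L1 = lottery_eu(p1, u1, u2)
# M enters lottery L2 with probability q and L3 with probability 1-q
eu_M = q * lottery_eu(p2, u1, u2) + (1 - q) * lottery_eu(p3, u1, u2)
print(p1, eu_L1, eu_M)
```

With p1 chosen by SU5's formula, the simple lottery L1 and the compound lottery M have identical expected utility, which is exactly why a coherent agent is indifferent between them.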
Probabilities and Utilities
• If your beliefs satisfy SP1-SP5, then there is a probability distribution Pr(⋅) over events such that for any two events A1 and A2, Pr(A1) ≥ Pr(A2) if and only if A1 ≽ A2.
• If your preferences satisfy SU1-SU5, then there is a utility function u(⋅) defined on rewards such that for any two lotteries L1 and L2, L1 ≽* L2 if and only if E[u(L1)] ≥ E[u(L2)], where E[⋅] denotes the expected value with respect to the probability distribution Pr(⋅).
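These representation theorems say coherent choice reduces to ranking lotteries by expected utility. A Python sketch of that ranking (the logarithmic utility and the two lotteries are illustrative assumptions, not anything from the slides):

```python
import math

def expected_utility(lottery, u):
    """lottery: list of (reward, probability) pairs; probabilities sum to 1."""
    return sum(p * u(r) for r, p in lottery)

u = lambda x: math.log1p(x)             # an assumed concave (risk-averse) utility
L1 = [(100, 0.5), (0, 0.5)]             # risky lottery: expected value 50
L2 = [(45, 1.0)]                        # sure thing: 45
best = max([("L1", L1), ("L2", L2)], key=lambda t: expected_utility(t[1], u))
print(best[0])
```

Note that under a risk-neutral (linear) utility L1 would win on expected value alone; the concave u encodes risk aversion, so the sure thing is preferred.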



Why be a Bayesian?
• Arguments from theory
• A coherent decision maker uses probability to represent uncertainty, uses
utility to represent value, and maximizes expected utility
• If you are not coherent then someone can make "Dutch book" on you (turn
you into a "money pump")
• Pragmatic arguments
• Useful and principled methodology for modeling inference, decision and
learning
• Analyze engineering tradeoffs between accuracy, complexity and cost
• Represent and incorporate both empirical data and informed engineering
judgment
• Handle small, moderate and large sample sizes and parameter sets
• Interpretability of results and understandability of model
• Arguments from experience
• Successful applications attributed to decision theory
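A Dutch book in miniature: if someone's prices for a bet on A and a bet on not-A sum to more than 1, a bookie can sell both bets and lock in a profit no matter which event occurs. The numbers below are hypothetical:

```python
# Incoherent prices: bets on A and on not-A each cost 0.6 per unit stake,
# so the implied Pr(A) + Pr(not A) is 1.2 > 1.
price_A, price_notA, stake = 0.6, 0.6, 1.0

collected = (price_A + price_notA) * stake   # premiums from selling both bets
payout = stake                               # exactly one of the two bets pays off
guaranteed_profit = collected - payout       # positive whatever happens
print(guaranteed_profit)
```

Repeating the trade turns the incoherent bettor into a "money pump," which is the exploitation the Dutch book argument refers to.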
What do you think?



Unit 1: Summary and Synthesis

• Bayesian statistics is a theory of rational belief dynamics


• We took a broad-brush tour of Bayesian methodology
• We applied Bayesian thinking to a simplified medical example that illustrates
many of the concepts we will be learning this semester
• Bayesian decision theory provides a methodology for rational choice under
uncertainty
• The twentieth century saw a resurgence of interest in subjective probability and
an increased understanding of the appropriate role of subjectivity in science
• Most statistics texts and courses take a frequentist approach but this is changing
• The inventors of probability theory thought of it as a logic of enlightened rational reasoning.
In the nineteenth century this was replaced by a view of probability as measuring “objective”
propensities of “intrinsically random” phenomena
• Bayesian methods often require more computational power than traditional frequentist
methods
• The computer revolution has enabled the Bayesian resurgence



References for Unit 1
• Bashir, S.A., Getting Started in R, http://www.luchsinger-mathematics.ch/Bashir.pdf
• Bayes, T., An Essay towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society of London, 53:370–418, 1763.
• Dawid, A.P. and Vovk, V.G. (1999), Prequential Probability: Principles and Properties, Bernoulli, 5:125–162.
• de Finetti, B., Theory of Probability: A Critical Introductory Treatment. New York: Wiley, 1974.
• DeGroot, M.H., Optimal Statistical Decisions, McGraw-Hill, 1970.
• Gelman, A., Carlin, J., Stern, H. and Rubin, D., Bayesian Data Analysis (2nd edition), Chapman & Hall, 2004. Chapter 1.
• Hájek, A., "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Summer 2003 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/sum2003/entries/probability-interpret/>.
• Jaynes, E., Probability Theory: The Logic of Science, Cambridge University Press, 2003.
• Lee, P., Bayesian Statistics: An Introduction, 4th ed., Springer, 2012. Chapter 1.
• Li, M. and Vitányi, P., An Introduction to Kolmogorov Complexity and Its Applications (2nd ed.), Springer-Verlag, 2005.
• Nau, R.F. (1999), Arbitrage, Incomplete Models, and Interactive Rationality, working paper, Fuqua School of Business, Duke University.
• Neapolitan, R., Learning Bayesian Networks, Prentice Hall, 2003.
• Savage, L.J., The Foundations of Statistics. Dover, 1972.
• Shafer, G., Probability and Finance: It's Only a Game, Wiley, 2001.
• von Mises, R., Probability, Statistics and Truth, revised English edition, New York: Macmillan, 1957.
• Watson, S.R. and Buede, D.M., Decision Synthesis: The Principles and Practice of Decision Analysis, Cambridge University Press, 1987.

