Bayes, Expected Utility, and Decision Theory
Unit 1: A Brief Tour of Bayesian Inference and Decision Theory
(There may be changes, especially in the second half; we may not make it to the last unit)
a* = argmax_a E_P(s)[u(c(s, a)) | a]     For brevity, we may write E[u(a)] for E_P(s)[u(c(s, a)) | a]
• Caveat emptor:
• How good it is for you depends on fidelity of model to your beliefs and preferences
©Kathryn Blackmond Laskey Spring 2023 Unit 1 v2- 7 -
Illustrative Example:
Highly Oversimplified Decision Problem
• P(sD) = 0.3 (sick), P(sW) = 0.7 (well)
• Expected utility:
• Treat: 0.3×90 + 0.7×90 = 90
• Don't treat: 0.3×0 + 0.7×100 = 70
• Best action is aT (treat patient)
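The expected-utility comparison above can be sketched in a few lines; the probabilities and utilities come from the slide, and the function name is illustrative.

```python
# Expected utility of each action under the slide's two-state model.
P_SICK, P_WELL = 0.3, 0.7

def expected_utility(u_if_sick, u_if_well):
    """E[u(a)] = sum over states of P(state) * u(consequence of a in that state)."""
    return P_SICK * u_if_sick + P_WELL * u_if_well

eu_treat = expected_utility(90, 90)      # treat: utility 90 in either state
eu_no_treat = expected_utility(0, 100)   # no treat: 0 if sick, 100 if well
best = "treat" if eu_treat > eu_no_treat else "no treat"
```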
[Figure: E[U(Treat)] and E[U(NoTreat)] as functions of P(Sick)]
Interlude: Review of Probability Basics
• Bayes rule: P(H2|E) = P(E|H2) P(H2) / P(E), where P(E) > 0, P(H2) > 0
• Odds form: P(H2|E) / P(H1|E) = [P(E|H2) P(H2)] / [P(E|H1) P(H1)]
• Terminology:
• P(H) - the prior probability of H
• P(E|H) - the likelihood for E given H
• P(E) - the predictive probability of E
• P(H|E) - the posterior probability of H given E
• P(E|Hi) / P(E|Hj) - the likelihood ratio for Hi versus Hj
• P(Hi) / P(Hj) - the prior odds ratio for Hi versus Hj
• Test characteristics:
• Sensitivity: P(tP | sD) = 0.95
• Specificity: P(tN | sW) = 0.85
• How does the model change if test results are available?
• Take test, observe outcome t
• Revise prior beliefs P(sD) to obtain posterior beliefs P(sD|t)
• Re-compute optimal decision using P(sD|t)
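The revise-then-recompute steps above can be sketched as a small Bayes-rule function; the sensitivity, specificity, and prior are the slide's values, and the function name is illustrative.

```python
# Posterior probability of disease given a test result, by Bayes rule.
def posterior_sick(prior, sensitivity, specificity, test_positive):
    # Likelihood of the observed result under each disease state
    p_t_given_sick = sensitivity if test_positive else 1 - sensitivity
    p_t_given_well = (1 - specificity) if test_positive else specificity
    p_t = p_t_given_sick * prior + p_t_given_well * (1 - prior)  # predictive P(t)
    return p_t_given_sick * prior / p_t                          # P(sD | t)

p_sick_pos = posterior_sick(0.3, 0.95, 0.85, True)   # posterior after tP
p_sick_neg = posterior_sick(0.3, 0.95, 0.85, False)  # posterior after tN
```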
Disease Example with Test
• If the test is positive we should treat: EU(aT | tP) = 90 > EU(aN | tP) = 26.9
• If the test is negative we should not treat: EU(aN | tN) = 97.5 > EU(aT | tN) = 90
• Expected utility of testing and following the result: 0.39×90 + 0.61×97.5 = 94.6
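The decision-tree numbers above (26.9, 97.5, 94.6) follow from the stated prior 0.3, sensitivity 0.95, and specificity 0.85; here is a sketch of the recomputation, with illustrative variable names.

```python
# Recompute the with-test expected utilities from the model parameters.
prior, sens, spec = 0.3, 0.95, 0.85

p_pos = sens * prior + (1 - spec) * (1 - prior)   # P(tP) = 0.39
p_sick_pos = sens * prior / p_pos                 # P(sD | tP), about 0.731
p_sick_neg = (1 - sens) * prior / (1 - p_pos)     # P(sD | tN), about 0.025

def eu(p_sick, treat):
    # Utilities from the example: treat -> 90 either way; no treat -> 0 sick, 100 well
    return 90.0 if treat else 100.0 * (1 - p_sick)

eu_notreat_pos = eu(p_sick_pos, False)   # about 26.9: treat if the test is positive
eu_notreat_neg = eu(p_sick_neg, False)   # about 97.5: don't treat if negative
eu_with_test = p_pos * eu(p_sick_pos, True) + (1 - p_pos) * eu_notreat_neg  # about 94.6
```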
Expected utility (without test):
• aT if p > 0.1
• aN if p < 0.1
• What are the strategy regions if we do a test?
[Figure: E[U(Treat)] and E[U(NoTreat)] versus P(Sick), crossing at p = 0.1]
Expected Utility of FollowTest Policy
as Function of Prior Probability p = P(sD)
• FollowTest strategy treats if test is positive and otherwise not
World State       Probability P(t|s)P(s)   Action    Utility
Sick, Positive    0.95p                    Treat     90
Sick, Negative    0.05p                    NoTreat   0
Well, Positive    0.15(1-p)                Treat     90
Well, Negative    0.85(1-p)                NoTreat   100

Before doing the test, we think:
• There are four possibilities for disease status and test results; their probabilities are shown in the table: P(s, t) = P(t|s) P(s) = P(s|t) P(t)
• We treat if the test is positive and don't treat if the test is negative, with utilities shown in the last column
• We multiply probability times utility for each world state and sum to get the expected utility of FollowTest
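The multiply-and-sum recipe above can be sketched directly from the table; row order and names are illustrative.

```python
# Expected utility of the FollowTest policy as a function of the prior p = P(sD).
def eu_follow_test(p):
    rows = [
        (0.95 * p,       90),   # sick, positive  -> treat
        (0.05 * p,        0),   # sick, negative  -> no treat
        (0.15 * (1 - p), 90),   # well, positive  -> treat
        (0.85 * (1 - p), 100),  # well, negative  -> no treat
    ]
    return sum(prob * util for prob, util in rows)
```

Algebraically this sum collapses to 98.5 - 13p; at p = 0.3 it is 94.6.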
• EU(FollowTest) = 0.95p(90) + 0.05p(0) + 0.15(1-p)(90) + 0.85(1-p)(100) = 98.5 - 13p
• AlwaysTreat: EU(aT) = 90
• FollowTest is better when 98.5 - 13p > 90, or p < 8.5/13 = 0.654
• NeverTreat: EU(aN) = 100(1 - p)
• FollowTest is better when 98.5 - 13p > 100(1 - p), or p > 1.5/87 = 0.017
[Figure: E[U(Treat)], E[U(NoTreat)], and E[U(FollowTest)] versus P(Sick); the gap between FollowTest and the best no-test action is the EVSI]
With a costly test (c is cost of test):
• NeverTreat: EU(aN) = 100(1 - p)
• FollowTest is better when 98.5 - 13p - c > 100(1 - p), or p > (1.5 + c)/87
• If c ≥ 7.2, testing is never worthwhile (the maximum EVSI is 7.2, attained at p = 0.1)
[Figure: expected utilities versus P(Sick), showing the gain from doing the test with c = 1 at p = 0.3]
EVSI as a Function of Prior Probability
[Figure: EVSI versus P(Sick), with the range of optimality of the test with c = 1 marked]
Summary: Value of Information and Strategy Regions
• Collecting information may have value if it might change your decision
• Expected value of perfect information (EVPI) is utility gain from knowing true value of
uncertain variable
• Expected value of sample information (EVSI) is utility gain from available information
• In our example, EVSI is positive for 0.017 < p < 0.654
• If 0.017 ≤ p ≤ 0.1 EVSI is 87p - 1.5
• If 0.1 ≤ p ≤ 0.654 EVSI is 8.5 – 13p
• If p = 0.3 EVSI is 8.5 - 13(0.3) = 4.6 (testing is optimal)
• Costly information has value when EVSI is greater than cost of information
• In our example:
• If 0.017 ≤ p ≤ 0.1 Test if 87p - 1.5 > c (where c is cost of test)
• If 0.1 ≤ p ≤ 0.654 Test if 8.5 – 13p > c
• If p = 0.3 Test if 4.6 > c (test if c is less than 4.6)
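The strategy regions above can be checked numerically. This sketch uses the EU(FollowTest) = 98.5 - 13p expression derived earlier and treats EVSI as the gain of FollowTest over the best no-test action; function names are illustrative.

```python
# EVSI as the value of following the test over the best no-test action.
def evsi(p):
    eu_follow = 98.5 - 13 * p                     # EU(FollowTest), derived earlier
    eu_best_no_test = max(90.0, 100.0 * (1 - p))  # AlwaysTreat vs NeverTreat
    return max(0.0, eu_follow - eu_best_no_test)

def should_test(p, cost):
    # Costly information has value when EVSI exceeds the cost of the test.
    return evsi(p) > cost
```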
What if a Probability is Unknown?
• The model for our medical example depends on several parameters
• Prior probability of disease
• Sensitivity of test
• Specificity of test
• Usually these probabilities are estimated from data and/or expert
judgment
• “Randomized clinical trials have established that Test T has sensitivity 0.95 and
specificity 0.85 for Disease D”
• “Given the presenting symptoms and my clinical judgment, I estimate a 30%
probability that the patient has Disease D.”
• How does a Bayesian combine data and expert judgment?
• Use clinical judgment to quantify uncertainty about the unknown probability as a probability distribution
• Gather data
• Use Bayes rule to obtain posterior distribution for the unknown probability
• If appropriate, use clinical judgment to adjust results of studies to apply to a
particular patient
Example: Bayesian Inference about a Probability (with a very small sample)
• Assign a prior distribution to possible values of the disease probability p
• Although p can be any real number between zero and 1, we pretend there are only 20 equally spaced possible values
• (The unknown probability actually has a continuous range of values. We will treat continuous distributions later. For now we approximate with a finite set of values.)
• Our prior distribution is consistent with our estimate p = 0.3
• Observe 10 independent and identically distributed (iid) cases
• (X1, X2, X3, X4, X5, X6, X7, X8, X9, X10) = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1)
• Cases 2, 7, and 10 have disease; the rest do not
• How do we find the posterior distribution of the
unknown probability?
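The grid approximation described above can be sketched as follows. The slide's exact prior is not fully specified, so a uniform prior over the 20 grid points stands in here; the update mechanics, not the prior, are the point.

```python
# Grid approximation: 20 equally spaced values for the disease probability.
thetas = [(2 * k + 1) / 40 for k in range(20)]   # 0.025, 0.075, ..., 0.975
prior = [1 / 20] * 20                            # placeholder uniform prior

data = [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]            # cases 2, 7, 10 have disease
k, n = sum(data), len(data)

# Binomial likelihood of 3 diseased in 10 iid cases, times prior, renormalized.
likelihood = [t ** k * (1 - t) ** (n - k) for t in thetas]
unnorm = [pr * lk for pr, lk in zip(prior, likelihood)]
posterior = [u / sum(unnorm) for u in unnorm]

theta_map = thetas[posterior.index(max(posterior))]  # posterior mode, near 0.3
```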
Posterior Distribution of Disease Parameter
• Horizontal axis is θ = P(sD); height of bar is probability that Θ = θ
• Prior distribution favors high probabilities
• 20% of patients in each sample have the disease
• Posterior distribution becomes more concentrated around 1/5 as sample size gets larger
[Figures: prior distribution; posterior distribution for 1 case in 5 samples; posterior distribution for 10 cases in 50 samples; sample size 20 (4 with, 16 without); sample size 80 (16 with, 64 without)]
• Frequentist Inference
• Data are drawn from a distribution of known form but with an unknown parameter (this includes
“nonparametric” statistics in which the unknown parameter is the distribution itself)
• Distribution may arise from explicit randomization or may be considered “close enough” to random
• Inference treats data as random and parameter as fixed
• For example: A sample X1, …, XN is drawn from a normal distribution with mean Θ. A 95% confidence interval is constructed. The interpretation is:
If an experiment like this were performed many times, we would expect that in 95% of the cases an interval calculated by the procedure we applied would include the true value of Θ.
• In any individual experiment, a frequentist can say nothing about Θ!
©Kathryn Blackmond Laskey Spring 2023 Unit 1 v2- 38 -
The Subjectivist
• A subjectivist believes:
• Probability is an expression of a rational agent's degrees of belief about uncertain propositions.
• Rational agents may disagree. There is no “one correct probability.”
• If the agent receives feedback her assessed probabilities will in the limit converge to
observed frequencies
• Subjectivist Inference:
• Probability distributions are assigned to unknowns (parameters and observations).
• Condition on knowns; use probability to express uncertainty about unknowns
• For example: A sample X1, …, XN is drawn from a normal distribution with mean θ having prior distribution g(θ). A 95% posterior credible interval is constructed, and the result is the interval (3.7, 4.9). The interpretation is:
Given the prior distribution for θ and the observed data, the probability that θ lies between 3.7 and 4.9 is 95%.
• A subjectivist can draw conclusions about what we should believe about 𝜃 and
about what we should expect on the next trial
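To make the subjectivist calculation concrete, here is an illustrative sketch (not taken from the slides) of a conjugate normal-normal update with known sampling variance; the prior parameters and data values are made up for the example.

```python
import math

# Conjugate normal-normal update: normal prior on the mean, known data variance.
def normal_posterior(prior_mean, prior_var, data, data_var):
    n, xbar = len(data), sum(data) / len(data)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)            # combine precisions
    post_mean = post_var * (prior_mean / prior_var + n * xbar / data_var)
    return post_mean, post_var

def credible_interval_95(mean, var):
    half = 1.96 * math.sqrt(var)   # central 95% of a normal posterior
    return mean - half, mean + half

m, v = normal_posterior(0.0, 100.0, [4.1, 4.5, 3.9, 4.6, 4.2], 1.0)
lo, hi = credible_interval_95(m, v)
# Subjectivist reading: given this prior and data, P(lo < theta < hi) = 0.95.
```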
SP1. For any two uncertain events A and B, one of the following relations holds: A ≻ B, B ≻ A, or A ~ B. (Comparability)
SP2. If A1, A2, B1, and B2 are four events such that A1∩A2 = ∅, B1∩B2 = ∅, and if Ai ≽ Bi for i = 1, 2, then A1∪A2 ≽ B1∪B2. If in addition Ai ≻ Bi for either i = 1 or i = 2, then A1∪A2 ≻ B1∪B2. (Union of disjoint events)
SP3. If A is any event, then A ≽ ∅. Furthermore, there is some event A0 for which A0 ≻ ∅. (Null lottery)
SP4. If A1 ⊃ A2 ⊃ … is a decreasing sequence of events, and B is some event such that Ai ≽ B for i = 1, 2, …, then ∩_{i=1}^∞ Ai ≽ B. (Decreasing sequences)
SP5. There is an experiment, with a numerical outcome between the values of 0 and 1, such that if Ai is the event that the outcome x lies within the interval ai ≤ x ≤ bi, for i = 1, 2, then A1 ≼ A2 if and only if (b1-a1) ≤ (b2-a2). (Existence of uniform distribution)
Axioms for Utility
(Watson and Buede, 1987)
A reward is a prize the decision maker cares about. A lottery is a situation in which the decision maker will receive one of the possible rewards, where the reward to be received is governed by a probability distribution. There is a qualitative relationship of relative preference ≽* that operates on lotteries and satisfies the following conditions:
SU1. For any two lotteries L1 and L2, either L1 ≻* L2, L2 ≻* L1, or L1 ~* L2. Furthermore, if L1, L2, and L3 are any lotteries such that L1 ≽* L2 and L2 ≽* L3, then L1 ≽* L3. (Comparability and transitivity)
SU2. If r1, r2, and r3 are rewards such that r1 ≽* r2 ≽* r3, then there exists a probability p such that [r1: p; r3: (1-p)] ~* r2, where [r1: p; r3: (1-p)] is a lottery that pays r1 with probability p and r3 with probability (1-p). (Lottery equivalence)
SU3. If r1 ~* r2 are rewards, then for any probability p and any reward r3, [r1: p; r3: (1-p)] ~* [r2: p; r3: (1-p)]. (Substitutability of equivalent rewards)
SU4. If r1 ≻* r2 are rewards, then [r1: p; r2: (1-p)] ≻* [r1: q; r2: (1-q)] if and only if p > q. (Higher chance of better reward)
SU5. Consider three lotteries, Li = [r1: pi; r2: (1-pi)], i = 1, 2, 3, giving different probabilities of the two rewards r1 and r2. Suppose lottery M gives entry to lottery L2 with probability q and to L3 with probability 1-q. Then L1 ~* M if and only if p1 = q·p2 + (1-q)·p3. (Compound lottery)
Probabilities and Utilities
• If your beliefs satisfy SP1-SP5, then there is a probability distribution Pr(⋅) over events such that for any two events A1 and A2, Pr(A1) ≥ Pr(A2) if and only if A1 ≽ A2.
• If your preferences satisfy SU1-SU5, then there is a utility function u(⋅) defined on rewards such that for any two lotteries L1 and L2, L1 ≽* L2 if and only if E[u(L1)] ≥ E[u(L2)], where E[⋅] denotes the expected value with respect to the probability distribution Pr(⋅).