
Signal & Data Analysis in Neuroscience

Bayesian Decisions

Izhar Bar-Gad
Room: 408 Phone: 7141 Email: [email protected]

Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor

Taken (almost) entirely from the course Visual Recognition (236875) at the Technion.

Decision theory

• Decision theory is an interdisciplinary area of study concerned with:
  1. How decision-makers make decisions.
  2. How optimal decisions can be reached.

• Decoding of neural information (and other types of encodings) relies heavily on decision theory.

Simple decision example

• Suppose that we know (via prior knowledge) that 25% of the newborns on April 1st are male and 75% are female.

• Our friend just had a newborn baby on that day, but we forgot to ask about his/her gender. Should we buy the baby a pink or a blue shirt? (Yes, I know that colors don't matter, but to this specific mother they do.)

• Thus, we need to guess the value of the variable X, reflecting the state of nature, using the a priori probabilities.

Decision error

• Decision error: the probability of picking one possibility when the state of nature is actually different.

• The decision is made so as to minimize this error.

• In this example:

  P(error) = P(girl)   if we decide boy
  P(error) = P(boy)    if we decide girl

Simple decision example – adding features

• Some features may give us information about the state of nature.

• Assuming that we know the weight distributions of boys (blue) and girls (red), and the happy mother told us that the baby weighs 4 kg, which shirt should we bring?
[Figure: class-conditional weight PDFs for boys (blue) and girls (red); x-axis: weight (kg), 0–8; y-axis: PDF, 0–0.7]

Simple decision example – conditional probability

• Assume that the weight is represented by the random variable Y. The distribution of the weights given the gender is described by the class-conditional probability p(y|x).

• So now the question becomes: what is the probability of a specific gender given the weight, p(x|y)?

Conditional probability

• When two variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other.

• This is expressed by the conditional probability of x given y:

  P(x|y) = P(x,y) / P(y)

• If x and y are statistically independent, then

  P(x|y) = P(x)

Bayes’ rule I

• The law of total probability: if the event y can occur jointly with any of m mutually exclusive states x1, x2, …, xm, then the probability of y is the sum of the joint probabilities over all of them:

  P(y) = Σ_x P(x,y)

• From the definition of conditional probability:

  P(x,y) = P(x|y) P(y) = P(y|x) P(x)

Bayes’ rule II

P(x|y) = P(y|x) P(x) / P(y)

P(x|y) = P(y|x) P(x) / Σ_x P(x,y) = P(y|x) P(x) / Σ_x P(y|x) P(x)

posterior = (likelihood × prior) / evidence
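A minimal sketch of this inversion for a discrete state of nature (the function name and the example likelihood values below are illustrative, not from the lecture):

```python
import numpy as np

def posterior(prior, likelihood):
    """Discrete Bayes inversion: P(x|y) = P(y|x) P(x) / sum_x P(y|x) P(x).

    prior      : array of priors P(x) over the states of nature
    likelihood : array of P(y|x), the probability of the observed y under each state
    """
    joint = likelihood * prior      # P(y|x) P(x) for every state x
    evidence = joint.sum()          # P(y) by the law of total probability
    return joint / evidence         # posterior P(x|y)

# Example with the newborn priors from the earlier slide (likelihoods are made up).
print(posterior(np.array([0.25, 0.75]), np.array([0.6, 0.2])))
```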


Bayes' rule – continuous case

• For continuous random variables we refer to densities rather than probabilities; in particular,

  p(x|y) = p(x,y) / p(y)

• Bayes' rule for densities becomes:

  p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx


Bayes’ rule – importance

• x is termed the cause and y the effect. Assuming x is present, we know the likelihood of observing y.

• Bayes' rule allows us to determine the likelihood of a cause x given an observation y. Note: there may be many causes producing y.

• Bayes' rule shows how the probability of x changes from the prior p(x), before we observe anything, to the posterior p(x|y), once we have observed y.

Bayes’ decision rule

• Decision:
  boy  : if P(boy|weight) > P(girl|weight)
  girl : otherwise
  or, equivalently,
  boy  : if P(weight|boy) P(boy) > P(weight|girl) P(girl)
  girl : otherwise

• Error:
  P(error|weight) = P(girl|weight)   if we decide boy
  P(error|weight) = P(boy|weight)    if we decide girl
  P(error|weight) = min[ P(boy|weight), P(girl|weight) ]
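A small sketch of this rule in Python, assuming (hypothetically) Gaussian class-conditional weight densities; the means and standard deviations below are made up for illustration, since the lecture's figure only sketches the distributions:

```python
from math import exp, pi, sqrt

def normal_pdf(y, mean, std):
    """Gaussian density, used here as a stand-in for the class-conditional p(weight|gender)."""
    return exp(-0.5 * ((y - mean) / std) ** 2) / (std * sqrt(2 * pi))

# Assumed (illustrative) parameters; priors are the ones given in the slides.
prior = {"boy": 0.25, "girl": 0.75}
mean  = {"boy": 3.8, "girl": 3.2}     # kg
std   = {"boy": 0.5, "girl": 0.5}     # kg

def decide(weight):
    """Bayes' decision rule: pick the gender with the larger p(weight|x) P(x)."""
    score = {x: normal_pdf(weight, mean[x], std[x]) * prior[x] for x in prior}
    return max(score, key=score.get)

print(decide(4.0))   # decision for a 4 kg newborn under the assumed densities
```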


Loss function

• A problem arises when different decisions have different consequences (for example, a pink shirt for a boy is less acceptable in many cultures than a blue one for a girl).

• A loss (or cost) function states exactly how costly each action is, and is used to convert a probability determination into a decision. Loss functions let us treat situations in which some kinds of classification mistakes are more costly than others.


Expected loss

• Suppose that we observe a particular y and contemplate taking action α_i.

• If the true state of nature is x_j, the loss is λ(α_i|x_j).

• Before any observation is made, the expected loss is

  R(α_i) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j)

• After the observation, the expected loss, now called the conditional risk, is given by

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

Bayes’ decision rule

• Compute the conditional risk for each action:

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

• Select the action α_i for which R(α_i|y) is minimal.

• The resulting minimum risk is called the Bayes risk, denoted R*, and is the best performance that can be achieved.
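A minimal sketch of this selection step; the toy posterior and loss values below are illustrative only:

```python
import numpy as np

def bayes_decision(posterior, loss):
    """Pick the action that minimizes the conditional risk R(a_i|y).

    posterior : array of shape (C,), the posteriors P(x_j|y)
    loss      : array of shape (A, C), loss[i, j] = lambda(a_i|x_j)
    Returns (index of the best action, its conditional risk).
    """
    risks = loss @ posterior           # R(a_i|y) = sum_j lambda(a_i|x_j) P(x_j|y)
    i = int(np.argmin(risks))
    return i, risks[i]

# Toy numbers: two actions, two states of nature.
post = np.array([0.3, 0.7])
loss = np.array([[0.0, 10.0],
                 [1.0, 0.0]])
print(bayes_decision(post, loss))      # -> (1, 0.3)
```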


Optimal Bayes Decision Strategies

• A strategy or decision function δ(y) is a mapping from observations to actions.

• The total risk of a decision function is given by

  E_{p(y)}[ R(δ(y)|y) ] = Σ_y p(y) R(δ(y)|y)

• A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.


Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor

The student dilemma

• A student needs to decide which courses to take, based only on the first lecture of each.

• From his previous experience, he knows the prior probabilities:

  Quality of the course   good   fair   bad
  P(x_j) (prior)           0.2    0.4    0.4


The student dilemma

• The student also knows the class-conditionals P(y|x_j):

  P(y|x_j)              good   fair   bad
  Interesting lecture   0.8    0.5    0.1
  Boring lecture        0.2    0.5    0.9

• The loss function is given by the matrix λ(a_i|x_j):

  λ(a_i|x_j)              good course   fair course   bad course
  Taking the course       0             5             10
  Not taking the course   20            5             0


The student dilemma

• The student wants to make an optimal decision about taking the course based on the first lecture.

• The probability of hearing an interesting lecture:

  P(interesting) = P(interesting|good) P(good)
                 + P(interesting|fair) P(fair)
                 + P(interesting|bad) P(bad)
                 = 0.8*0.2 + 0.5*0.4 + 0.1*0.4 = 0.4

  P(boring) = 1 - P(interesting) = 1 - 0.4 = 0.6

• Assuming that the lecture was interesting, what are the posterior probabilities of each of the three possible "states of nature"?

The student dilemma

• P(good course|interesting lecture)
  = P(interesting|good) P(good) / P(interesting) = 0.8*0.2 / 0.4 = 0.4

• P(fair|interesting)
  = P(interesting|fair) P(fair) / P(interesting) = 0.5*0.4 / 0.4 = 0.5

• We can get P(bad|interesting) = 0.1 either by the same method, or by noting that the three posteriors must sum to 1.


The student dilemma

• The student needs to minimize the conditional risk:

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

• In this case there are only two possible actions: taking or not taking the course.

• R(taking|interesting) = P(good|interesting) λ(taking|good)
                        + P(fair|interesting) λ(taking|fair)
                        + P(bad|interesting) λ(taking|bad)
                        = 0.4*0 + 0.5*5 + 0.1*10 = 3.5

• R(not taking|interesting) = P(good|interesting) λ(not taking|good)
                            + P(fair|interesting) λ(not taking|fair)
                            + P(bad|interesting) λ(not taking|bad)
                            = 0.4*20 + 0.5*5 + 0.1*0 = 10.5


The student dilemma

• So, if the first lecture was interesting, the student minimizes the conditional risk by taking the course.

• In order to construct the full decision function, we also need to determine the risk-minimizing action for the case of a boring lecture; the sketch below carries the same calculation through both observations.
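A short sketch that runs the lecture's numbers through both observations (interesting and boring) using NumPy; variable names are my own:

```python
import numpy as np

# Priors over course quality: good, fair, bad (from the slides).
prior = np.array([0.2, 0.4, 0.4])

# Class-conditionals P(lecture | quality); rows: interesting, boring.
likelihood = np.array([[0.8, 0.5, 0.1],
                       [0.2, 0.5, 0.9]])

# Loss matrix lambda(action | quality); rows: take, don't take.
loss = np.array([[0.0, 5.0, 10.0],
                 [20.0, 5.0, 0.0]])

actions = ["take the course", "do not take the course"]

for y, name in enumerate(["interesting", "boring"]):
    evidence = likelihood[y] @ prior                 # P(y) by total probability
    posterior = likelihood[y] * prior / evidence     # P(x_j|y) by Bayes' rule
    risks = loss @ posterior                         # R(a_i|y)
    best = int(np.argmin(risks))
    print(f"{name}: P(y)={evidence:.2f}, posteriors={np.round(posterior, 2)}, "
          f"risks={np.round(risks, 2)} -> {actions[best]}")
```

For an interesting lecture this reproduces the risks 3.5 and 10.5 computed above.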

Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor


The Bayesian Doctor Example

• A person doesn't feel well and goes to the doctor. Assume two states of nature:
  x1 : the person has a common flu.
  x2 : the person has a vicious bacterial infection.

• The doctor's prior is: p(x1) = 0.9, p(x2) = 0.1

• The doctor has two possible actions:
  a1 = prescribe hot tea.
  a2 = prescribe antibiotics.

• Using only the prior, the optimal prediction is always "flu"; therefore the doctor will always prescribe hot tea.


The Bayesian Doctor Example

• But this carries a very high risk: although the doctor can diagnose with a very high success rate using the prior, (s)he may lose a patient once in a while.

• Denote the two possible actions:
  a1 = prescribe hot tea
  a2 = prescribe antibiotics

• Now assume the following cost (loss) matrix λ_{i,j}:

  λ_{i,j}   x1   x2
  a1        0    10
  a2        1    0

The Bayesian Doctor Example

• Choosing a1 results in an expected risk of

  R(a1) = p(x1) λ_{1,1} + p(x2) λ_{1,2} = 0 + 0.1*10 = 1

• Choosing a2 results in an expected risk of

  R(a2) = p(x1) λ_{2,1} + p(x2) λ_{2,2} = 0.9*1 + 0 = 0.9

• So, considering the costs, it is much better (and optimal!) to always give antibiotics.
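A minimal numeric check of these prior-only risks (illustrative NumPy sketch):

```python
import numpy as np

prior = np.array([0.9, 0.1])          # p(flu), p(infection)
loss = np.array([[0.0, 10.0],         # lambda(a1 = tea | x1, x2)
                 [1.0, 0.0]])         # lambda(a2 = antibiotics | x1, x2)

risks = loss @ prior                  # R(a_i) = sum_j lambda(a_i|x_j) p(x_j)
print(risks)                          # [1.0, 0.9] -> antibiotics minimize the prior risk
```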


The Bayesian Doctor Example

• However, the doctor can also make observations, such as performing a blood test.

• The possible results of the blood test are:
  y1 = negative (no bacterial infection)
  y2 = positive (infection)

• Blood tests are never conclusive, which leads to the following class-conditional probabilities:

  p(y1|x1) = 0.8    p(y2|x1) = 0.2
  p(y1|x2) = 0.3    p(y2|x2) = 0.7


The Bayesian Doctor Example

• Define the conditional risk given the observation:

  R(a_i|y) = Σ_j p(x_j|y) λ_{i,j}

• We would like to compute the conditional risk for each action and observation so that the doctor can choose an optimal action that minimizes the risk.

• How can we compute p(x_j|y)?
  We use the class-conditional probabilities and Bayes' inversion rule.

The Bayesian Doctor Example

• The results of the blood test follow the probabilities:

  p(y1) = p(y1|x1) p(x1) + p(y1|x2) p(x2)
        = 0.8*0.9 + 0.3*0.1
        = 0.75
  p(y2) = 1 - p(y1) = 0.25


The Bayesian Doctor Example

R(a1|y1) = p(x1|y1) λ_{1,1} + p(x2|y1) λ_{1,2}
         = 0 + p(x2|y1) * 10
         = 10 * p(y1|x2) p(x2) / p(y1)
         = 10 * (0.3*0.1) / 0.75 = 0.4

R(a2|y1) = p(x1|y1) λ_{2,1} + p(x2|y1) λ_{2,2}
         = p(x1|y1) * 1 + p(x2|y1) * 0
         = p(y1|x1) p(x1) / p(y1)
         = (0.8*0.9) / 0.75 = 0.96


The Bayesian Doctor Example

R(a1|y2) = p(x1|y2) λ_{1,1} + p(x2|y2) λ_{1,2}
         = 0 + p(x2|y2) * 10
         = 10 * p(y2|x2) p(x2) / p(y2)
         = 10 * (0.7*0.1) / 0.25 = 2.8

R(a2|y2) = p(x1|y2) λ_{2,1} + p(x2|y2) λ_{2,2}
         = p(x1|y2) * 1 + p(x2|y2) * 0
         = p(y2|x1) p(x1) / p(y2)
         = (0.2*0.9) / 0.25 = 0.72

The Bayesian Doctor Example

• To summarize:
  R(a1|y1) = 0.4
  R(a2|y1) = 0.96
  R(a1|y2) = 2.8
  R(a2|y2) = 0.72

• Given an observation y, we can minimize the expected loss by minimizing the conditional risk.

• The doctor chooses:
  hot tea if the blood test is negative,
  antibiotics otherwise.


Optimal Bayes Decision Strategies

• The total risk of a decision function is given by

  E_{p(y)}[ R(δ(y)|y) ] = Σ_y p(y) R(δ(y)|y)

• A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.

• In the Bayesian doctor example:
  The prior risk (the doctor always gives antibiotics): 0.9
  The Bayes risk: 0.75*0.4 + 0.25*0.72 = 0.48
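A short sketch that reproduces the doctor example end to end, choosing the risk-minimizing action for each test result and accumulating the Bayes risk; variable names are illustrative:

```python
import numpy as np

prior = np.array([0.9, 0.1])              # p(flu), p(infection)
likelihood = np.array([[0.8, 0.3],        # p(y1 = negative | x1, x2)
                       [0.2, 0.7]])       # p(y2 = positive | x1, x2)
loss = np.array([[0.0, 10.0],             # lambda(a1 = tea | x1, x2)
                 [1.0, 0.0]])             # lambda(a2 = antibiotics | x1, x2)

total_risk = 0.0
for y, name in enumerate(["negative", "positive"]):
    evidence = likelihood[y] @ prior                  # p(y)
    posterior = likelihood[y] * prior / evidence      # p(x_j|y)
    risks = loss @ posterior                          # R(a_i|y)
    best = int(np.argmin(risks))
    total_risk += evidence * risks[best]              # accumulate p(y) R(delta(y)|y)
    print(f"{name}: p(y)={evidence:.2f}, risks={np.round(risks, 2)}, "
          f"choose {'tea' if best == 0 else 'antibiotics'}")

print(f"Bayes risk: {total_risk:.2f}")                # 0.75*0.4 + 0.25*0.72 = 0.48
```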
