
Signal & Data Analysis in Neuroscience

Bayesian Decisions

Izhar Bar-Gad
Room: 408 Phone: 7141 Email: [email protected]

Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor

Taken (almost) entirely from the course Visual Recognition (236875) at the Technion.

Decision theory

• Decision theory is an interdisciplinary area of study concerned with:
  1. How decision-makers make decisions.
  2. How optimal decisions can be reached.

• Decoding of neural information (and other types of encodings) relies heavily on decision theory.

Simple decision example

• Suppose that we know (via prior knowledge) that 25% of the newborns on April 1st are male and 75% are female.

• Our friend just had a newborn baby on that day, but we forgot to ask about his/her gender. Should we buy the baby a pink or a blue shirt? (Yes, I know that colors don't matter, but to this specific mother they do.)

• Thus, we need to guess the value of the variable X, reflecting the state of nature, using the a priori probabilities.

Decision error

• Decision error: the probability of picking one possibility when the state of nature is actually different.

• The decision is made so as to minimize this error.

• In this example:

  P(error) = P(girl)   if we decide boy
  P(error) = P(boy)    if we decide girl

Simple decision example – adding features

• Some features may give us information about the state of nature.

• Assuming that we know the weight distributions of boys (blue) and girls (red), and the happy mother told us that the baby weighs 4 kg, which shirt should we bring?
[Figure: class-conditional weight PDFs for boys (blue) and girls (red); x-axis: weight (kg), 0–8; y-axis: PDF, 0–0.7]

Simple decision example – conditional probability

• Assume that the weight is represented by the random variable Y. The distribution of the weights given the gender is described by the class-conditional probability p(y|x).

• So now the question becomes: what is the probability of a specific gender given the weight, p(x|y)?

Conditional probability

• When two variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other.

• This is expressed by the conditional probability of x given y:

  P(x|y) = P(x,y) / P(y)

• If x and y are statistically independent, then

  P(x|y) = P(x)

Bayes’ rule I

• The law of total probability: if the event y can occur jointly with any of m mutually exclusive states x1, x2, …, xm, then the probability of y is the sum of the joint probabilities over all of them:

  P(y) = Σ_x P(x,y)

• From the definition of conditional probability:

  P(x,y) = P(x|y) P(y) = P(y|x) P(x)

Bayes’ rule II

P(x|y) = P(y|x) P(x) / P(y)

P(x|y) = P(y|x) P(x) / Σ_x P(x,y) = P(y|x) P(x) / Σ_x P(y|x) P(x)

posterior = (likelihood × prior) / evidence
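A minimal sketch of this inversion for a discrete state of nature (the function name and the example likelihood values below are illustrative, not from the lecture):

```python
import numpy as np

def posterior(prior, likelihood):
    """Discrete Bayes inversion: P(x|y) = P(y|x) P(x) / sum_x P(y|x) P(x).

    prior      : array of priors P(x) over the states of nature
    likelihood : array of P(y|x), the probability of the observed y under each state
    """
    joint = likelihood * prior      # P(y|x) P(x) for every state x
    evidence = joint.sum()          # P(y) by the law of total probability
    return joint / evidence         # posterior P(x|y)

# Example with the newborn priors from the earlier slide (likelihoods are made up).
print(posterior(np.array([0.25, 0.75]), np.array([0.6, 0.2])))
```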


Bayes' rule – continuous case

• For continuous random variables we refer to densities rather than probabilities; in particular,

  p(x|y) = p(x,y) / p(y)

• Bayes' rule for densities becomes:

  p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx


Bayes’ rule – importance

• x is termed the cause and y the effect. Assuming x is present, we know the likelihood of observing y.

• Bayes' rule allows us to determine the likelihood of a cause x given an observation y. Note: there may be many causes producing y.

• Bayes' rule shows how the probability of x changes from the prior p(x), before we observe anything, to the posterior p(x|y), once we have observed y.

Bayes’ decision rule

• Decision:
  boy  : if P(boy|weight) > P(girl|weight)
  girl : otherwise
  or, equivalently,
  boy  : if P(weight|boy) P(boy) > P(weight|girl) P(girl)
  girl : otherwise

• Error:
  P(error|weight) = P(girl|weight)   if we decide boy
  P(error|weight) = P(boy|weight)    if we decide girl
  P(error|weight) = min[ P(boy|weight), P(girl|weight) ]
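A small sketch of this rule in Python, assuming (hypothetically) Gaussian class-conditional weight densities; the means and standard deviations below are made up for illustration, since the lecture's figure only sketches the distributions:

```python
from math import exp, pi, sqrt

def normal_pdf(y, mean, std):
    """Gaussian density, used here as a stand-in for the class-conditional p(weight|gender)."""
    return exp(-0.5 * ((y - mean) / std) ** 2) / (std * sqrt(2 * pi))

# Assumed (illustrative) parameters; priors are the ones given in the slides.
prior = {"boy": 0.25, "girl": 0.75}
mean  = {"boy": 3.8, "girl": 3.2}     # kg
std   = {"boy": 0.5, "girl": 0.5}     # kg

def decide(weight):
    """Bayes' decision rule: pick the gender with the larger p(weight|x) P(x)."""
    score = {x: normal_pdf(weight, mean[x], std[x]) * prior[x] for x in prior}
    return max(score, key=score.get)

print(decide(4.0))   # decision for a 4 kg newborn under the assumed densities
```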


Loss function

• A problem arises when different decisions have different consequences (for example, a pink shirt for a boy is less acceptable in many cultures than a blue one for a girl).

• A loss (or cost) function states exactly how costly each action is, and is used to convert a probability determination into a decision. Loss functions let us treat situations in which some kinds of classification mistakes are more costly than others.


Expected loss

• Suppose that we observe a particular y and contemplate taking action α_i.

• If the true state of nature is x_j, the loss is λ(α_i|x_j).

• Before any observation is made, the expected loss is

  R(α_i) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j)

• After the observation, the expected loss, now called the conditional risk, is given by

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

Bayes’ decision rule

• Compute the conditional risk for each action:

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

• Select the action α_i for which R(α_i|y) is minimal.

• The resulting minimum risk is called the Bayes risk, denoted R*, and is the best performance that can be achieved.
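A minimal sketch of this selection step; the toy posterior and loss values below are illustrative only:

```python
import numpy as np

def bayes_decision(posterior, loss):
    """Pick the action that minimizes the conditional risk R(a_i|y).

    posterior : array of shape (C,), the posteriors P(x_j|y)
    loss      : array of shape (A, C), loss[i, j] = lambda(a_i|x_j)
    Returns (index of the best action, its conditional risk).
    """
    risks = loss @ posterior           # R(a_i|y) = sum_j lambda(a_i|x_j) P(x_j|y)
    i = int(np.argmin(risks))
    return i, risks[i]

# Toy numbers: two actions, two states of nature.
post = np.array([0.3, 0.7])
loss = np.array([[0.0, 10.0],
                 [1.0, 0.0]])
print(bayes_decision(post, loss))      # -> (1, 0.3)
```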


Optimal Bayes Decision Strategies

• A strategy or decision function δ(y) is a mapping from observations to actions.

• The total risk of a decision function is given by

  E_{p(y)}[ R(δ(y)|y) ] = Σ_y p(y) R(δ(y)|y)

• A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.


Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor

The student dilemma

• A student needs to decide which courses to take, based only on the first lecture of each.

• From his previous experience, he knows the prior probabilities:

  Quality of the course   good   fair   bad
  P(x_j) (prior)           0.2    0.4    0.4


The student dilemma

• The student also knows the class-conditionals P(y|x_j):

  P(y|x_j)              good   fair   bad
  Interesting lecture   0.8    0.5    0.1
  Boring lecture        0.2    0.5    0.9

• The loss function is given by the matrix λ(a_i|x_j):

  λ(a_i|x_j)              good course   fair course   bad course
  Taking the course       0             5             10
  Not taking the course   20            5             0


The student dilemma

• The student wants to make an optimal decision about taking the course based on the first lecture.

• The probability of hearing an interesting lecture:

  P(interesting) = P(interesting|good) P(good)
                 + P(interesting|fair) P(fair)
                 + P(interesting|bad) P(bad)
                 = 0.8*0.2 + 0.5*0.4 + 0.1*0.4 = 0.4

  P(boring) = 1 - P(interesting) = 1 - 0.4 = 0.6

• Assuming that the lecture was interesting, what are the posterior probabilities of each of the three possible "states of nature"?

The student dilemma

• P(good course|interesting lecture)
  = P(interesting|good) P(good) / P(interesting) = 0.8*0.2 / 0.4 = 0.4

• P(fair|interesting)
  = P(interesting|fair) P(fair) / P(interesting) = 0.5*0.4 / 0.4 = 0.5

• We can get P(bad|interesting) = 0.1 either by the same method, or by noting that the three posteriors must sum to 1.


The student dilemma

• The student needs to minimize the conditional risk:

  R(α_i|y) = Σ_{j=1}^{C} λ(α_i|x_j) P(x_j|y)

• In this case there are only two possible actions: taking or not taking the course.

• R(taking|interesting) = P(good|interesting) λ(taking|good)
                        + P(fair|interesting) λ(taking|fair)
                        + P(bad|interesting) λ(taking|bad)
                        = 0.4*0 + 0.5*5 + 0.1*10 = 3.5

• R(not taking|interesting) = P(good|interesting) λ(not taking|good)
                            + P(fair|interesting) λ(not taking|fair)
                            + P(bad|interesting) λ(not taking|bad)
                            = 0.4*20 + 0.5*5 + 0.1*0 = 10.5


The student dilemma

• So, if the first lecture was interesting, the student minimizes the conditional risk by taking the course.

• In order to construct the full decision function, we also need to determine the risk-minimizing action for the case of a boring lecture; the sketch below carries the same calculation through both observations.
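A short sketch that runs the lecture's numbers through both observations (interesting and boring) using NumPy; variable names are my own:

```python
import numpy as np

# Priors over course quality: good, fair, bad (from the slides).
prior = np.array([0.2, 0.4, 0.4])

# Class-conditionals P(lecture | quality); rows: interesting, boring.
likelihood = np.array([[0.8, 0.5, 0.1],
                       [0.2, 0.5, 0.9]])

# Loss matrix lambda(action | quality); rows: take, don't take.
loss = np.array([[0.0, 5.0, 10.0],
                 [20.0, 5.0, 0.0]])

actions = ["take the course", "do not take the course"]

for y, name in enumerate(["interesting", "boring"]):
    evidence = likelihood[y] @ prior                 # P(y) by total probability
    posterior = likelihood[y] * prior / evidence     # P(x_j|y) by Bayes' rule
    risks = loss @ posterior                         # R(a_i|y)
    best = int(np.argmin(risks))
    print(f"{name}: P(y)={evidence:.2f}, posteriors={np.round(posterior, 2)}, "
          f"risks={np.round(risks, 2)} -> {actions[best]}")
```

For an interesting lecture this reproduces the risks 3.5 and 10.5 computed above.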

Outline

• Bayesian decisions

• The Bayesian student

• The Bayesian doctor


The Bayesian Doctor Example

• A person doesn't feel well and goes to the doctor. Assume two states of nature:
  x1 : the person has a common flu.
  x2 : the person has a vicious bacterial infection.

• The doctor's prior is: p(x1) = 0.9, p(x2) = 0.1

• The doctor has two possible actions:
  a1 = prescribe hot tea.
  a2 = prescribe antibiotics.

• Using only the prior, the optimal prediction is always "flu"; therefore the doctor will always prescribe hot tea.


The Bayesian Doctor Example

• But this carries a very high risk: although the doctor can diagnose with a very high success rate using the prior, (s)he may lose a patient once in a while.

• Denote the two possible actions:
  a1 = prescribe hot tea
  a2 = prescribe antibiotics

• Now assume the following cost (loss) matrix λ_{i,j}:

  λ_{i,j}   x1   x2
  a1        0    10
  a2        1    0

The Bayesian Doctor Example

• Choosing a1 results in an expected risk of

  R(a1) = p(x1) λ_{1,1} + p(x2) λ_{1,2} = 0 + 0.1*10 = 1

• Choosing a2 results in an expected risk of

  R(a2) = p(x1) λ_{2,1} + p(x2) λ_{2,2} = 0.9*1 + 0 = 0.9

• So, considering the costs, it is much better (and optimal!) to always give antibiotics.
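A minimal numeric check of these prior-only risks (illustrative NumPy sketch):

```python
import numpy as np

prior = np.array([0.9, 0.1])          # p(flu), p(infection)
loss = np.array([[0.0, 10.0],         # lambda(a1 = tea | x1, x2)
                 [1.0, 0.0]])         # lambda(a2 = antibiotics | x1, x2)

risks = loss @ prior                  # R(a_i) = sum_j lambda(a_i|x_j) p(x_j)
print(risks)                          # [1.0, 0.9] -> antibiotics minimize the prior risk
```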


The Bayesian Doctor Example

• However, the doctor can also make observations, such as performing a blood test.

• The possible results of the blood test are:
  y1 = negative (no bacterial infection)
  y2 = positive (infection)

• Blood tests are never conclusive, which leads to the following class-conditional probabilities:

  p(y1|x1) = 0.8    p(y2|x1) = 0.2
  p(y1|x2) = 0.3    p(y2|x2) = 0.7


The Bayesian Doctor Example

• Define the conditional risk given the observation:

  R(a_i|y) = Σ_j p(x_j|y) λ_{i,j}

• We would like to compute the conditional risk for each action and observation so that the doctor can choose an optimal action that minimizes the risk.

• How can we compute p(x_j|y)?
  We use the class-conditional probabilities and Bayes' inversion rule.

The Bayesian Doctor Example

• The results of the blood test follow the probabilities:

  p(y1) = p(y1|x1) p(x1) + p(y1|x2) p(x2)
        = 0.8*0.9 + 0.3*0.1
        = 0.75
  p(y2) = 1 - p(y1) = 0.25


The Bayesian Doctor Example

R(a1|y1) = p(x1|y1) λ_{1,1} + p(x2|y1) λ_{1,2}
         = 0 + p(x2|y1) * 10
         = 10 * p(y1|x2) p(x2) / p(y1)
         = 10 * (0.3*0.1) / 0.75 = 0.4

R(a2|y1) = p(x1|y1) λ_{2,1} + p(x2|y1) λ_{2,2}
         = p(x1|y1) * 1 + p(x2|y1) * 0
         = p(y1|x1) p(x1) / p(y1)
         = (0.8*0.9) / 0.75 = 0.96


The Bayesian Doctor Example

R(a1|y2) = p(x1|y2) λ_{1,1} + p(x2|y2) λ_{1,2}
         = 0 + p(x2|y2) * 10
         = 10 * p(y2|x2) p(x2) / p(y2)
         = 10 * (0.7*0.1) / 0.25 = 2.8

R(a2|y2) = p(x1|y2) λ_{2,1} + p(x2|y2) λ_{2,2}
         = p(x1|y2) * 1 + p(x2|y2) * 0
         = p(y2|x1) p(x1) / p(y2)
         = (0.2*0.9) / 0.25 = 0.72

The Bayesian Doctor Example

• To summarize:
  R(a1|y1) = 0.4
  R(a2|y1) = 0.96
  R(a1|y2) = 2.8
  R(a2|y2) = 0.72

• Given an observation y, we can minimize the expected loss by minimizing the conditional risk.

• The doctor chooses:
  hot tea if the blood test is negative,
  antibiotics otherwise.


Optimal Bayes Decision Strategies

• The total risk of a decision function is given by

  E_{p(y)}[ R(δ(y)|y) ] = Σ_y p(y) R(δ(y)|y)

• A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.

• In the Bayesian doctor example:
  The prior risk (the doctor always gives antibiotics): 0.9
  The Bayes risk: 0.75*0.4 + 0.25*0.72 = 0.48
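A short sketch that reproduces the doctor example end to end, choosing the risk-minimizing action for each test result and accumulating the Bayes risk; variable names are illustrative:

```python
import numpy as np

prior = np.array([0.9, 0.1])              # p(flu), p(infection)
likelihood = np.array([[0.8, 0.3],        # p(y1 = negative | x1, x2)
                       [0.2, 0.7]])       # p(y2 = positive | x1, x2)
loss = np.array([[0.0, 10.0],             # lambda(a1 = tea | x1, x2)
                 [1.0, 0.0]])             # lambda(a2 = antibiotics | x1, x2)

total_risk = 0.0
for y, name in enumerate(["negative", "positive"]):
    evidence = likelihood[y] @ prior                  # p(y)
    posterior = likelihood[y] * prior / evidence      # p(x_j|y)
    risks = loss @ posterior                          # R(a_i|y)
    best = int(np.argmin(risks))
    total_risk += evidence * risks[best]              # accumulate p(y) R(delta(y)|y)
    print(f"{name}: p(y)={evidence:.2f}, risks={np.round(risks, 2)}, "
          f"choose {'tea' if best == 0 else 'antibiotics'}")

print(f"Bayes risk: {total_risk:.2f}")                # 0.75*0.4 + 0.25*0.72 = 0.48
```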
