Bayesian Brain: Probabilistic Approaches to Neural Coding (Computational Neuroscience)
Author(s): Kenji Doya, Shin Ishii, Alexandre Pouget, Rajesh P. N. Rao
ISBN(s): 9780262042383, 026204238X
Edition: Kindle
File details: PDF, 67.15 MB
Year: 2007
Language: English
Bayesian Brain
PROBABILISTIC APPROACHES TO
NEURAL CODING
edited by
KENJI DOYA, SHIN ISHII,
ALEXANDRE POUGET,
AND RAJESH P. N. RAO
Bayesian Brain
Computational Neuroscience
Terrence J. Sejnowski and Tomaso A. Poggio, editors
MIT Press books may be purchased at special quantity discounts for business
or sales promotional use. For information, please email [email protected]
or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cam-
bridge, MA 02142.
This book was typeset by the authors using LaTeX2e and the fbook style file by Christopher Manning. Printed and bound in the United States of America.
Series Foreword ix
Preface xi
I Introduction 1
1 A Probability Primer
Kenji Doya and Shin Ishii 3
1.1 What Is Probability? 3
1.2 Bayes Theorem 6
1.3 Measuring Information 6
1.4 Making an Inference 8
1.5 Learning from Data 10
1.6 Graphical Models and Other Bayesian Algorithms 13
Index 324
Series Foreword
Terrence J. Sejnowski
Tomaso Poggio
Preface
When we perceive the physical world, make a decision, and take an action, a
critical issue that our brains must deal with is uncertainty: there is uncertainty
associated with the sensory system, the motor apparatus, one's own knowl-
edge, and the world itself. The Bayesian framework of statistical estimation
provides a coherent way of dealing with these uncertainties. Bayesian meth-
ods are becoming increasingly popular not only in building artificial systems
that can handle uncertainty but also in efforts to develop a theory of how the
brain works in the face of uncertainty.
At the core of the Bayesian way of thinking is the Bayes theorem, which
maintains, in its simplest interpretation, that one's belief about the world should
be updated according to the product of what one believed in before and what
evidence has come to light since. The strength of the Bayesian approach comes
from the fact that it offers a mathematically rigorous computational mechanism
for combining prior knowledge with incoming evidence.
A classical example of Bayesian inference is the Kalman filter, which has
been extensively used in engineering, communication, and control over the
past few decades. The Kalman filter utilizes knowledge about the noise in
sensory observations and the dynamics of the observed system to keep track
of the best estimate of the system's current state and its variance. Although
Kalman filters assume linear dynamics and Gaussian noise, recent Bayesian
filters such as particle filters have extended the basic idea to nonlinear, non-
Gaussian systems. However, despite much progress in signal processing and
pattern recognition, no artificial system can yet match the brain's capabilities
in tasks such as speech and natural scene recognition. Understanding how
the brain solves such tasks could offer considerable insights into engineering
artificial systems for similar tasks.
A Bayesian approach can contribute to an understanding of the brain at mul-
tiple levels. First, it can make normative predictions about how an ideal per-
ceptual system combines prior knowledge with sensory observations, enabling
principled interpretations of data from behavioral and psychophysical exper-
iments. Second, algorithms for Bayesian estimation can provide mechanistic
interpretations of neural circuits in the brain. Third, Bayesian methods can
be used to optimally decode neural data such as spike trains. Lastly, a better
understanding of the brain's computational mechanisms should have a synergis-
tic impact on the development of new algorithms for Bayesian computation,
leading to new applications and technologies.
Acknowledgments
We wish to thank The Cabinet Office, Japanese Neural Network Society, Tama-
gawa University, and Kyushu Institute of Technology for sponsoring Okinawa
Computational Neuroscience Course 2004, and Sydney Brenner, President of
Okinawa Institute of Science and Technology, for advocating the course.
...
The course and the subsequent book publication would have been impossi-
ble without all the secretarial work by Izumi Nagano. Emiko Asato and Ya-
suhiro Inamine also helped us in manuscript preparation. Our thanks go to
MIT Press editors Barbara Murphy and Kate Blakinger and to Terry Sejnowski
for supporting our book proposal.
AP would like to thank the National Science Foundation (NSF) for their
support. RPNR would like to acknowledge the support of the National Sci-
ence Foundation (NSF), the ONR Adaptive Neural Systems program, the Sloan
Foundation, and the Packard Foundation.
PART I
Introduction
1 A Probability Primer
Kenji Doya and Shin Ishii
Here the integral is taken over the whole range of the random variable X .
Despite these differences, we often use the same notation P ( X ) for both
probability distribution and density functions, and call them just probability for
convenience. This is because many of the mathematical formulas and deriva-
tions are valid for both discrete and continuous cases.
or a density P(X) as

P(\mathrm{hypothesis} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{hypothesis})\, P(\mathrm{hypothesis})}{P(\mathrm{data})}.
Here, the Bayes theorem dictates how we should update our belief in a certain hypothesis, P(hypothesis), based on how well the acquired data were predicted from the hypothesis, P(data | hypothesis). In this context, the terms in the Bayes theorem (1.10) have conventional names: P(X) is called the prior probability and P(X | Y) is called the posterior probability of X given Y. P(Y | X) is a generative model of observing Y under hypothesis X, but after a particular observation is made it is called the likelihood of hypothesis X given the data Y.
The marginal probability P(Y) serves as a normalizing denominator so that the sum of P(X | Y) over all possible hypotheses becomes unity. It appears as if the marginal distribution is there just for the sake of bookkeeping, but as we will see later, it sometimes gives us insightful information about the quality of our inference.
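For readers who like to see the arithmetic, here is a minimal sketch of this update for a single binary hypothesis; the prior and likelihood values are illustrative assumptions, not numbers from the text.

```python
# Minimal sketch of Bayes' theorem for a binary hypothesis.
# The numbers below are illustrative assumptions, not values from the text.

prior = {"H": 0.01, "not_H": 0.99}          # P(hypothesis)
likelihood = {"H": 0.9, "not_H": 0.05}      # P(data | hypothesis), the generative model

# Marginal probability of the data (the normalizing denominator P(data)).
evidence = sum(prior[h] * likelihood[h] for h in prior)

# Posterior P(hypothesis | data) = P(data | hypothesis) P(hypothesis) / P(data).
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)   # e.g. {'H': 0.1538..., 'not_H': 0.8461...}
```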
that we can measure the information of two independent events x and y, with joint probability P(x, y) = P(x)P(y), by the sum of the information of each event, i.e.,

\log \frac{1}{P(x, y)} = \log \frac{1}{P(x) P(y)} = \log \frac{1}{P(x)} + \log \frac{1}{P(y)}.
It is often convenient to use a binary logarithm, and in this case the unit of
information is called a bit.
1.3.1 Entropy
By observing repeatedly, x should follow P(X), so the average information we have from observing this variable is

H(X) = E[-\log P(X)] = -\sum_{X} P(X) \log P(X),   (1.12)
I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y).   (1.16)
\log \frac{1}{Q(x)} - \log \frac{1}{P(x)} = \log \frac{P(x)}{Q(x)}.
If x turns out to follow distribution P(X), then the average difference is

D(P \| Q) = \sum_{X} P(X) \log \frac{P(X)}{Q(X)}.
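As a compact illustration of equations (1.12), (1.16), and the average log-ratio above, the following sketch computes entropy, Kullback-Leibler divergence, and mutual information for small discrete distributions; the distributions themselves are made up for the example.

```python
import math

def entropy(p):
    """H(X) = -sum_x P(x) log2 P(x), in bits (equation (1.12))."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) log2 (P(x) / Q(x)), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) (equation (1.16)); joint is a 2-D table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy

# Illustrative (made-up) distributions.
print(entropy([0.5, 0.5]))                       # 1.0 bit
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))     # > 0: P differs from Q
print(mutual_information([[0.4, 0.1], [0.1, 0.4]]))
```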
Here, P(X \mid y_1, \ldots, y_{t-1}) is the posterior of X given the sensory inputs up to time t-1, and it serves as the prior for further estimation at time t.
So far we have assumed that the world state X stays the same, but what happens if the state changes while we make sequential observations? If we have knowledge about how the world state changes, for example, as a state transition probability P(X_t \mid X_{t-1}), then we can use the posterior at time t-1 multiplied by this transition probability as the new prior at time t:
Table: representative probability distributions, with their densities, ranges, means, and variances.

Binomial:  P(x) = \frac{N!}{(N-x)!\,x!} p^x (1-p)^{N-x},  x = 0, 1, \ldots, N;  mean Np,  variance Np(1-p)
Poisson:  P(x) = \frac{a^x e^{-a}}{x!},  x = 0, 1, 2, \ldots;  mean a,  variance a
Gaussian (normal):  P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}},  x \in \mathbb{R};  mean \mu,  variance \sigma^2
Gamma:  P(x) = \frac{b^a x^{a-1} e^{-bx}}{\Gamma(a)},  x > 0;  mean \frac{a}{b},  variance \frac{a}{b^2}

where \Gamma(a) = \int_0^\infty x^{a-1} e^{-x}\, dx  (\Gamma(a) = (a-1)! if a is a positive integer).
Thus the sequence of the Bayesian estimation of the state is given by the following iteration:

P(X_t \mid y_1, \ldots, y_t) \propto P(y_t \mid X_t) \sum_{X_{t-1}} P(X_t \mid X_{t-1})\, P(X_{t-1} \mid y_1, \ldots, y_{t-1}).

This iterative estimation is practically very useful and is in general called the Bayes filter. The best-known classical example of the Bayes filter is the Kalman filter, which assumes linear dynamics and Gaussian noise. More recently, a method called the particle filter has been commonly used for tasks like visual tracking and mobile robot localization [5].
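To make the iteration concrete, here is a minimal sketch of one Bayes filter step on a discrete state space (a grid-based filter rather than the Kalman or particle filters mentioned above); the transition and observation probabilities are illustrative assumptions.

```python
import numpy as np

def bayes_filter_step(prior, transition, likelihood):
    """One step of the Bayes filter on a discrete state space.

    prior:      P(X_{t-1} | y_1..y_{t-1}), shape (n,)
    transition: P(X_t | X_{t-1}),          shape (n, n), rows indexed by X_{t-1}
    likelihood: P(y_t | X_t),              shape (n,)
    """
    predicted = transition.T @ prior        # prediction: sum over X_{t-1}
    posterior = likelihood * predicted      # update with the new observation
    return posterior / posterior.sum()      # normalize by the marginal P(y_t)

# Illustrative three-state example (made-up numbers).
prior = np.array([1/3, 1/3, 1/3])
transition = np.array([[0.8, 0.2, 0.0],
                       [0.1, 0.8, 0.1],
                       [0.0, 0.2, 0.8]])
likelihood = np.array([0.1, 0.7, 0.2])      # P(y_t | X_t) for the observed y_t
print(bayes_filter_step(prior, transition, likelihood))
```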
\log P(y_1, \ldots, y_T \mid x_1, \ldots, x_T, \theta) = \sum_{t=1}^{T} \log P(y_t \mid x_t, \theta) = -\sum_{t=1}^{T} \frac{(y_t - w x_t)^2}{2\sigma^2} - T \log \sqrt{2\pi}\sigma.
From this, we can see that finding the ML estimate of the linear weight w is the same as finding the least mean-squared error (LMSE) estimate that minimizes the mean-squared error \frac{1}{T}\sum_{t=1}^{T} (y_t - w x_t)^2.
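A short sketch of this equivalence, assuming synthetic data generated from y_t = w x_t plus Gaussian noise; the closed-form least-squares solution below is the ML estimate under that model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y_t = w * x_t + Gaussian noise (illustrative values).
w_true, sigma = 0.7, 0.3
x = rng.uniform(-1.0, 1.0, size=100)
y = w_true * x + sigma * rng.normal(size=100)

# Maximizing the Gaussian log-likelihood in w is the same as minimizing
# sum_t (y_t - w x_t)^2, whose closed-form solution is:
w_ml = np.sum(x * y) / np.sum(x * x)
print(w_ml)   # close to w_true
```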
I_F(\theta) = E_Y\!\left[\left(\frac{\partial \log P(Y \mid \theta)}{\partial \theta}\right)^{2}\right] = E_Y\!\left[-\frac{\partial^{2} \log P(Y \mid \theta)}{\partial \theta^{2}}\right].
A theorem called the Cramér-Rao inequality gives a limit on how small the variance of an unbiased estimate \hat{\theta} can be, namely,

\mathrm{Var}(\hat{\theta}) \geq \frac{1}{I_F(\theta)}.
For example, after some calculation we can see that the Fisher information matrix for a Gaussian distribution with parameters \theta = (\mu, \sigma^2) is

I_F(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}.
In the above linear example, if we have prior knowledge that the slope is not so steep, we can assume a Gaussian prior for w,

P(w) = \frac{1}{\sqrt{2\pi\alpha^2}} e^{-\frac{w^2}{2\alpha^2}}.

The marginal probability of the data measures how well the chosen prior is consistent with the observed data, so it is also called the evidence. A parameter like \alpha of the prior probability for a parameter w is called a hyperparameter, and the evidence is used for selection of the prior probability, or hyperparameter tuning.
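A minimal sketch of the effect of such a prior, assuming a zero-mean Gaussian prior on w with hyperparameter \alpha (its standard deviation); under that assumption the MAP estimate takes a ridge-like closed form. The data values are made up.

```python
import numpy as np

def map_weight(x, y, sigma, alpha):
    """MAP estimate of w with a zero-mean Gaussian prior of standard deviation alpha.

    Maximizing  -sum_t (y_t - w*x_t)^2 / (2*sigma^2)  -  w^2 / (2*alpha^2)
    in w gives the ridge-like closed form below.
    """
    return np.sum(x * y) / (np.sum(x * x) + sigma**2 / alpha**2)

# Illustrative data (made up); a tighter prior (smaller alpha) shrinks w toward 0.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([-0.9, -0.3, 0.1, 0.4, 0.8])
for alpha in (10.0, 1.0, 0.1):
    print(alpha, map_weight(x, y, sigma=0.3, alpha=alpha))
```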
The same mechanism can also be used for selecting one of a set of discrete candidate probabilistic models M_1, M_2, \ldots. In this case the marginal probability of the data under a model, P(y \mid M_i), can be used for model selection, and is then called the Bayesian criterion for model selection.
When the joint distribution of variables X_1, \ldots, X_N is factorized over a directed acyclic graph (DAG) as P(X_1, \ldots, X_N) = \prod_i P(X_i \mid \mathrm{Pa}_i), where \mathrm{Pa}_i denotes the parent variables of the variable X_i in the DAG, such a model is called a Bayesian network. For estimation of any missing variable in a Bayesian network, various belief propagation algorithms, the most famous one being the message-passing algorithm, have been devised in recent years, and there are excellent textbooks to refer to when it becomes necessary for us to use one.
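As a toy illustration of the factorization over parents (not one of the belief propagation algorithms mentioned above), the following sketch infers a missing variable in a made-up three-node network by brute-force enumeration.

```python
# A tiny made-up Bayesian network over binary variables A -> B -> C:
# P(A, B, C) = P(A) P(B | A) P(C | B).
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P_B_given_A[a][b]
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P_C_given_B[b][c]

def joint(a, b, c):
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# Posterior over the missing variable A given an observation C = 1,
# obtained by summing the joint over the other unobserved variable B.
unnormalized = {a: sum(joint(a, b, 1) for b in (0, 1)) for a in (0, 1)}
z = sum(unnormalized.values())
posterior_A = {a: p / z for a, p in unnormalized.items()}
print(posterior_A)
```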
References
[1] Papoulis A (1991) Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill.
[2] Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370-418.
[3] Cox RT (1946) Probability, frequency and reasonable expectation. American Journal of Physics, 14, 1-13.
[4] Shannon CE (1948) A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.
[5] Doucet A, de Freitas N, Gordon N, eds. (2001) Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag.
[6] MacKay DJC (2003) Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press.
PART II
Figure 2.1 Simultaneously recorded spike trains from twenty retinal ganglion cells
firing in response to a repeated natural movie. Figure courtesy of J. Puchalla and M.
Berry.
rons to show how stimuli are encoded in the arrival times of single spikes, and
conversely, how one may go about decoding the meaning of single spikes as a
representation of stimulus features. We will show how information theory has
been applied to address questions of precision and reliability in spike trains,
and to evaluate the role of multiple-spike symbols.
H[P(X)] = -\sum_{x \in X} P(x) \log_2 P(x).
This is simply an average over the Shannon information, or the average num-
ber of bits required to specify the variable. Here we will mostly be concerned
with the mutual information, the amount by which knowing about one random
variable reduces uncertainty about another. For our purposes, one of these
variables will be the stimulus, S, and the other, the response, R. The response
R must clearly be variable (that is, have nonzero entropy) in order to encode
a variable stimulus. However, the variability observed in the response might
reflect variability in the input, or it might be due to noise. Mutual information
quantifies the amount of useful variability, the amount of entropy that is asso-
ciated with changes in the stimulus rather than changes that are not correlated
with the stimulus, i.e., noise.
The mutual information is defined as
I(S; R) = \sum_{s \in S,\, r \in R} P(s, r) \log_2 \frac{P(s, r)}{P(s) P(r)}.
This representation shows that the mutual information is simply the Kullback-
Leibler divergence between the joint distribution of S and R and their marginal
distributions, and therefore measures how far S and R are from independence.
Another way to write the mutual information is

I(S; R) = H[P(r)] - \sum_{s \in S} P(s)\, H[P(r \mid s)].
In this form, it is clear that the mutual information is the difference between the
total entropy of the response and the entropy of the response to a fixed input,
averaged over all inputs. This averaged entropy is called the noise entropy,
and quantifies the "blurring" of the responses due to noise. We emphasize
that "noise" here is not necessarily noise in the physical sense; it is simply any
nonreproducibility in the mapping from S to R. This "noise" may in fact be
informative about some other variable not included in our definition of S or
R. Mutual information is symmetric with respect to S and R. It represents the
amount of information about S encoded in R, or equivalently, the amount of
information about R that can be predicted from a known S.
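The following sketch checks this decomposition numerically for a made-up stimulus-response table, computing the mutual information both as total response entropy minus noise entropy and from the Kullback-Leibler form given earlier.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint distribution P(s, r): rows are stimuli, columns are responses.
P_sr = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

P_s = P_sr.sum(axis=1)          # P(s)
P_r = P_sr.sum(axis=0)          # P(r)

# Total response entropy minus the noise entropy (response entropy at fixed s,
# averaged over stimuli).
H_r = entropy_bits(P_r)
H_noise = sum(P_s[i] * entropy_bits(P_sr[i] / P_s[i]) for i in range(len(P_s)))
I_sr = H_r - H_noise

# The same quantity from the Kullback-Leibler form of the definition above.
I_kl = sum(P_sr[i, j] * np.log2(P_sr[i, j] / (P_s[i] * P_r[j]))
           for i in range(2) for j in range(2) if P_sr[i, j] > 0)
print(I_sr, I_kl)   # the two values agree
```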
enough to contain a maximum of one spike only. Knowing nothing about the
stimulus, the probability to observe either a spike, r = 1, or no spike, r = 0, in
this time bin is
Here a time average, \frac{1}{T}\int_0^T dt, has been substituted for the average over the ensemble of stimuli, \int ds\, P(s), or its discrete equivalent appearing in equation (2.3). This is valid if the random sequence s(t) is sufficiently long. Assuming that p \ll 1, we may expand \log(1 - p) \approx -p and use \frac{1}{T}\int_0^T dt\, p(t) \to \bar{p} for T \to \infty, to obtain
To obtain information per spike rather than information per second, we divide by the mean number of spikes per second, \bar{r}, and truncate to first order:
While p \ll 1, p(t) may be large, and one might worry that one is discarding an important component of the information; this truncation amounts to computing only the information in spikes, and neglecting the information contributed by the lack of spikes, or silences. For salamander retinal ganglion cells, this contribution turns out to be very small: we found it to be less than 1% of the total information (A. L. Fairhall and M. J. Berry II, unpublished observations). This result may break down for neurons with higher firing rates.
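The sketch below shows how such a single-spike information estimate can be computed from a time-varying firing rate, assuming the derivation above arrives at the standard form (1/T)\int dt\, (r(t)/\bar{r}) \log_2 (r(t)/\bar{r}); that exact expression is an assumption on my part, since the intermediate equation is omitted here, and the rate profile is made up.

```python
import numpy as np

def info_per_spike_bits(rate):
    """Single-spike information, assuming I = mean over time of (r/r_bar) log2 (r/r_bar).

    rate: firing rate r(t) in spikes/s, sampled on a uniform time grid.
    """
    rate = np.asarray(rate, dtype=float)
    ratio = rate / rate.mean()
    # r log r -> 0 as r -> 0, so zero-rate bins contribute nothing.
    safe = np.where(ratio > 0, ratio, 1.0)
    return np.mean(np.where(ratio > 0, ratio * np.log2(safe), 0.0))

# Illustrative rate profiles (made up): a strongly modulated rate carries more
# information per spike than a constant one, which carries none.
t = np.arange(0.0, 10.0, 0.001)
modulated = 20.0 * (1.0 + np.sin(2 * np.pi * 2.0 * t))
constant = np.full_like(t, 20.0)
print(info_per_spike_bits(modulated), info_per_spike_bits(constant))
```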
Figure 2.2 (a). A spike train and its representation in terms of binary "letters." A word consists of some number of sequential binary letters. (b). A randomly varying Gaussian stimulus (here, a velocity stimulus) and the spike-train responses from fly visual neuron H1 for many repetitions. Adapted from Strong et al. [73].
symbols. Let us consider the symbol defined by the joint occurrence of some pair of output events, E_1 and E_2. The synergy [13] is defined as the difference between the mutual information between output and stimulus obtained from the joint event compared with that obtained if the two events were observed independently,

\mathrm{Syn}(E_1, E_2; s) = I(\{E_1, E_2\}; s) - I(E_1; s) - I(E_2; s).

A positive value for \mathrm{Syn}(E_1, E_2; s) indicates that E_1 and E_2 encode the stimulus synergistically; a negative value implies that the two events are redundant.
This and related quantities [27, 48, 57] have been applied to multiple-spike outputs to assess the contribution to stimulus encoding of timing relationships among spikes in the fly visual neuron H1 [13] and the LGN [57]. Petersen et al. [47, 52] found that, for responses in rat barrel cortical neurons, the first spike of a burst conveyed most of the information about the stimulus.
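A small sketch of how the synergy measure can be evaluated in practice, assuming binary events E_1, E_2 and a binary stimulus with a made-up joint distribution; the mutual information terms are computed by brute force from the joint table.

```python
import numpy as np

def mi_bits(joint):
    """Mutual information (bits) from a joint probability table P(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2((joint / (pa * pb))[nz])))

# Made-up joint distribution P(s, e1, e2) over a binary stimulus and two
# binary output events, indexed as P[s, e1, e2].
P = np.array([[[0.20, 0.05],
               [0.05, 0.20]],
              [[0.05, 0.20],
               [0.20, 0.05]]])

# I({E1, E2}; s): treat the event pair as a single four-valued symbol.
I_joint = mi_bits(P.reshape(2, 4))
# I(E1; s) and I(E2; s) from the corresponding marginals.
I_e1 = mi_bits(P.sum(axis=2))
I_e2 = mi_bits(P.sum(axis=1))

synergy = I_joint - I_e1 - I_e2
print(synergy)   # > 0: synergistic, < 0: redundant
```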
Figure 2.3 (a). The probability distribution P(w) of all words w generated by a long random stimulus. (b). The word distribution P(w|s(t)) generated at a particular time t by a given repeated stimulus sequence s(t).
Figure 2.4 Information rate in spikes per second as a function of parameters of the spike-train representation, the bin width \Delta t, and the inverse total word length 1/L. Reproduced with permission from Reinagel and Reid [55] (see color insert).
tems, and for neural populations, we are only beginning to have the available
experimental and theoretical methods to address it. Here we have seen that
information-theoretic tools allow us to address the question in a quantitative
way.
"No," was Ali's reply; "I fled from the palace as soon as
I had secured about me a large sum in gold, and some of
my more portable treasures. Hassan, who followed me a
day or two afterwards, brought me many more things of
value. I made it worth his while to keep silent, and began a
series of journeys in various parts of the world, partly to
carry on trade in horses and jewels, partly—as I once said
to you before—to flee from myself."
"I will never divulge it; but would that I had never
heard it!" was Robin's reply.
Ali was not far from the tent, and in the midst of his
gloomy reflections, his ear caught low sounds of distress
issuing from it. He went nearer and listened. The Persian
heard Robin pouring out the anguish of his young loving
heart in tones that Ali had never before heard bursting from
human lips. The words were uttered between broken sobs,
for Robin was too weak to restrain his emotions, and he
thought himself quite alone. Ali could distinguish such
sentences as these:
Robin's tears were falling fast; his were not the only
tears that fell. Ali's eyes, that had never wept since the
days of his childhood, were moistened now; the knee that
had never been bent in real supplication for mercy was now
on the earth, the hard heart was throbbing, and what had
been but stern remorse was softening into repentance.
"The Feringhee is pleading for me, God will hear him!
The boy calls me brother, the name which he denied to me
before, he gives me now! If the disciple think me yet within
reach of mercy, will the Master cast me out?"
CHAPTER XIX.
A BITTER CUP.
"We have won poor spoil this time," said the chief of the
Shararat band. "These kafirs had hardly a piastre amongst
them, the jewels are tinsel and glass; two of the party are
dead already, and two of those left are not worth a handful
of date-stones."
* A maund is 80 lbs.
"I would not give a lame ass for the two," quoth the
chief.
CHAPTER XX.
DESERT DANGERS.
But the stick did not descend, nor was the double
burden lifted by the pale-faced captive.
CHAPTER XXI.
ONLY ONE LAMB.
"'God is love.'"
CHAPTER XXII.
SLAVERY.
The Arab would not have dared to have declared all this
had he not thought that, the bridal party being on the point
of starting for a place distant hundreds of miles from Djauf,
there was no danger of detection. The sinfulness of fraud
and falsehood never troubled the conscience of the Arab, for
he could not be said to possess one. He had been nurtured
on lies, and felt rather pride than shame at success in
cheating his employer.
"After all," thought Harold, "I am not the first one of the
Lord's people to have to endure the humiliation of having a
price put upon me." Harold remembered Joseph; he
remembered One far more exalted than Israel's son, for
whose sacred person pieces of silver had been counted
down. It is only in sin that there is shame.
CHAPTER XXIII.
A PROMISE.
"I do not understand you, boy!" said Ali, and very deep
grew the furrow on his brow. "I have taken one life, and I
cannot restore it; God does not require an impossibility."
"If you were in my place what would you do?" asked the
Amir.
Ali did not lose his temper, but his voice sounded harsh
as, after a pause of some minutes, he expressed himself as
follows: