
Prefrontal executive function and adaptive behavior in complex environments

Etienne Koechlin

The prefrontal cortex (PFC) subserves higher cognitive abilities such as planning, reasoning and creativity. Here we review recent findings from both empirical and theoretical studies providing new insights into these cognitive abilities and their neural underpinnings in the PFC as overcoming key adaptive limitations in reinforcement learning. We outline a unified theoretical framework describing PFC function as implementing an algorithmic solution approximating statistically optimal, but computationally intractable, adaptive processes. The resulting PFC functional architecture combines learning, planning, reasoning and creativity processes for balancing exploitation and exploration behaviors and optimizing behavioral adaptations in uncertain, variable and open-ended environments.

Address: INSERM, Département d'Etudes Cognitives, Ecole Normale Supérieure, 29 rue d'Ulm, 75005 Paris, France
Corresponding author: Koechlin, Etienne ([email protected])

Current Opinion in Neurobiology 2016, 37:1–6
This review comes from a themed issue on Neurobiology of cognitive behavior
Edited by Alla Karpova and Roozbeh Kiani
For a complete overview see the Issue and the Editorial
Available online 11th December 2015
http://dx.doi.org/10.1016/j.conb.2015.11.004
0959-4388/Published by Elsevier Ltd.

Introduction
Adaptive behavior is critical for organisms to survive in real-world situations that are often changing. The basal ganglia in vertebrates are subcortical nuclei, including the striatum, that are thought to implement basic adaptive processes akin to what is usually referred to as (temporal-difference) reinforcement learning (RL) [1–4]. RL consists of adjusting stimulus–action associations online according to the rewarding/punishing values of action outcomes. Importantly, RL is both a very simple and robust process, endowing the animal with the ability to learn optimal behavioral strategies even in complex and uncertain situations [5]. In mammals, the basal ganglia further form loop circuits with the prefrontal cortex (PFC) [6] that extend the flexibility and complexity of the behavioral repertoire, in essence overcoming critical limitations of RL processes. Here, we review recent findings from both empirical and computational studies and outline a general theoretical framework describing PFC function as implementing adaptive processes devoted to overcoming key RL limitations.

From reinforcement learning to adaptive planning
A first critical limitation of basic RL (also named model-free RL) is that behavior cannot adjust to internal changes in the subjective values of action outcomes [7,8]. Consider, for instance, that in a given situation action A leads to water and action B leads to food. If you are thirsty but replete, RL will reinforce action A relative to B in this situation. When the situation reoccurs, you will then select action A rather than B. If you are now hungry rather than thirsty, however, this is certainly maladaptive behavior. The problem arises because basic RL makes no distinction between the rewarding values of action outcomes and the action outcomes per se.

Overcoming this limitation requires learning an internal model that specifies the outcomes resulting from actions, regardless of rewarding values. Learning this model is simply based on outcome likelihoods given actions and current states. This predictive model is thus learned alongside the stimulus–action associations learned through RL (collectively named the selective model here). The predictive model notably enables RL to be emulated internally without physically acting [5]: this model predicts the outcomes of actions derived from the selective model, so that their rewarding values may be internally experienced according to the current motivational state of the agent (e.g. thirsty or hungry). Stimulus–action associations are then adjusted accordingly through standard RL algorithms. This emulation is commonly referred to as model-based RL [5]. Behavior is thus adjusted to the agent's motivational state before acting and reflects internal planning. Model-based RL also generally enables faster adaptation than model-free RL to external changes in action–outcome contingencies and/or outcome values [5].

Empirical studies confirm that human behaviors cannot be fully explained by model-free RL, but instead have a model-based component [9,10,11]. Neuroimaging studies show that both the inferior parietal cortex and lateral PFC are involved in learning predictive models [12], with the former possibly encoding these models [13] and the latter, in association with the hippocampus, retrieving these models for emulating model-based RL [9].
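The distinction between the selective model (stimulus–action values) and the predictive model (action–outcome mapping) can be made concrete with a minimal sketch. The following toy Python example is purely illustrative (the one-situation environment, names and threshold choices are our own, not from the article): model-free updates require physically experienced outcomes, whereas model-based emulation replays the predictive model internally to re-evaluate actions under a new motivational state.

```python
ACTIONS = ["A", "B"]
OUTCOME = {"A": "water", "B": "food"}   # predictive model: action -> outcome
Q = {a: 0.0 for a in ACTIONS}           # selective model: action values
ALPHA = 0.5                             # learning rate

def reward(outcome, motivation):
    """Rewarding value of an outcome under the current motivational state."""
    wants_water = (motivation == "thirsty")
    return 1.0 if wants_water == (outcome == "water") else 0.0

def model_free_update(action, motivation):
    """Basic (model-free) RL: adjust values from an experienced outcome."""
    r = reward(OUTCOME[action], motivation)
    Q[action] += ALPHA * (r - Q[action])

def model_based_emulation(motivation, n_sweeps=20):
    """Model-based RL: emulate action outcomes with the predictive model and
    re-experience their rewarding values internally, without acting."""
    for _ in range(n_sweeps):
        for a in ACTIONS:
            model_free_update(a, motivation)

# Learn while thirsty: A (water) comes to dominate B (food).
for _ in range(10):
    for a in ACTIONS:
        model_free_update(a, "thirsty")
assert Q["A"] > Q["B"]

# The motivational state changes: internal emulation re-plans before acting,
# so B (food) now dominates without any further physical experience.
model_based_emulation("hungry")
assert Q["B"] > Q["A"]
```

The point of the sketch is the revaluation: nothing in the external world changed, yet the agent's policy flips as soon as the predictive model is replayed under the new motivational state.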

Furthermore, empirical evidence argues that the orbitofrontal cortex (ventromedial PFC in humans), in association with the striatum, encodes action outcomes from predictive models and their actual rewarding values [14–18,19]. Together, these studies also suggest that the ventromedial PFC may directly learn and encode simple predictive models mapping stimulus–action pairs onto expected valued outcomes [20], while the inferior parietal cortex and lateral PFC may be involved in implementing more complex predictive models as multi-step state–action–state maps (Figure 1).

Some authors have proposed that in the brain, model-free and model-based RL form two concurrent instrumental controllers. In this view, their relative contribution to action selection is a function of the relative uncertainty and/or reliability of reward and outcome expectations derived from the selective and predictive models, respectively [21,22]. Others have proposed that model-free and model-based RL form two cooperative systems, with model-free RL driving online behavior and model-based RL working off-line in the background to continuously adjust model-free RL [5,10,23]. Recent behavioral results support the second view [10]. As shown below, this view is also more consistent with the present theoretical framework.

Figure 1. Reinforcement learning in the human frontal lobes. Schematic diagram showing the main subcortical and cortical structures involved in reinforcement learning (RL). Green: brain regions involved in model-free RL. BG: basal ganglia. SMA: supplementary motor area. Numbers indicate Brodmann's areas; BA 6: premotor cortex. Red: brain regions involved in model-based RL. vmPFC: ventromedial PFC. dmPFC: dorsomedial PFC. OFC: orbitofrontal cortex. ACC: anterior cingulate cortex. Arrows indicate critical interregional connectivity presumably underpinning RL.

From adaptive planning to Bayesian inference
A second critical limitation of the RL systems described above is that adapting to and learning new external contingencies gradually erases previously learned ones. This again leads to maladaptive behavior in environments exhibiting periodically recurring external contingencies (i.e. recurrent situations): RL systems have no memory and need to entirely relearn previously encountered situations. In uncertain and open-ended environments where new situations may always arise (i.e. the environment corresponds to an infinite, multidimensional space), overcoming this limitation requires solving a nonparametric probabilistic inference problem [24]: constantly arbitrating between continuing to adjust ongoing behavior through (model-free and/or model-based) RL, switching to previously learned behaviors, and even creating/learning new behaviors. Previously learned behaviors along with the ongoing behavior thus form a collection of discrete entities stored in long-term memory and referred to as task sets [25]. Task sets are abstract instantiations of the situations the agent infers having encountered so far and comprise the selective and predictive models learned when the task set guided behavior [26]. Task sets further comprise an additional internal model (the contextual model of the likelihood of any external cues) learned when the task set guided behavior in the past [27,28], and likely encoded in lateral PFC regions [29,30]. The aforementioned arbitration problem has optimal statistical solutions based on Dirichlet process mixtures [31,32], which in practical cases are computationally intractable and, consequently, biologically implausible.

Recent studies, however, show that a biologically plausible, online algorithm approximating Dirichlet process mixtures can account for human behavior in both recurrent and open-ended environments [24,33,34].
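As a rough illustration of how such a task-set repertoire might be represented, the sketch below bundles the three internal models the text ascribes to a task set. The field names, dictionary layouts and reliability default are shorthand assumptions of ours, not an implementation of the published model:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSet:
    """One stored behavioral strategy, with the three internal models."""
    selective: dict = field(default_factory=dict)    # stimulus -> action values (RL)
    predictive: dict = field(default_factory=dict)   # (state, action) -> P(outcome)
    contextual: dict = field(default_factory=dict)   # external cue -> likelihood
    reliability: float = 0.5                         # P(fits the current situation)

def is_reliable(task_set: TaskSet) -> bool:
    """'More likely consistent than inconsistent' criterion (see next section)."""
    return task_set.reliability > 0.5

repertoire: list = []   # long-term memory of previously learned task sets
actor = TaskSet()       # the task set currently driving behavior
```

The repertoire grows without a fixed bound, which is what makes the arbitration problem nonparametric: the number of candidate situations is never known in advance.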



This algorithm has two key features. First, it infers online the absolute reliability of the task set driving ongoing behavior (named the actor), i.e. the posterior probability that the actor remains consistent with current external contingencies or, equivalently, that the external situation has not changed (considering a potentially infinite range of external situations). While the actor remains reliable (more likely consistent than inconsistent: reliability > 0.5), it adjusts through RL and drives behavior. Actor reliability is inferred according to the likelihood of actual action outcomes and external cues derived from the actor's predictive and contextual models, respectively. Second, when the actor passes from the reliable to the unreliable status (reliability < 0.5), the situation has presumably changed and the overall algorithm passes from the exploitation to the exploration mode. A new actor is then formed as a weighted mixture of task sets stored in long-term memory, thereby creating a new task set in memory. More specifically, the new task set is built through Bayesian averaging of the internal (selective, predictive and contextual) models across stored task sets, given current external cues and action outcomes according to the task sets' contextual and predictive models. When external situations reoccur, accordingly, new task sets are created and learned, so that the number of task sets with quasi-identical models stored in long-term memory scales with the occurrence frequencies of situations. As a result, new actors are formed according to the occurrence frequencies and current external evidence associated with previously encountered situations. When external situations are new, such new actors simply learn the new external contingencies in their internal models, which register in the newly created task set. Note that the new actor may initially be inferred as being unreliable but still drives behavior. Through RL, however, the new actor will subsequently reach reliability. The overall algorithm then returns to the exploitation mode: when this actor is eventually deemed unreliable, a new actor is again created as described above.
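The two features just described, online reliability inference and mixture-based actor creation, can be caricatured in a few lines. This is our own drastic simplification (a two-hypothesis Bayesian update plus a frequency-weighted average over flat policy dictionaries), not the published nonparametric algorithm:

```python
def update_reliability(prior, lik_consistent, lik_inconsistent):
    """Posterior probability that the actor still fits the situation, given the
    likelihoods of the latest outcome/cue under the actor's models versus under
    'some other situation' (a crude stand-in for the infinite alternative)."""
    num = prior * lik_consistent
    return num / (num + (1.0 - prior) * lik_inconsistent)

def step(actor, repertoire, lik_consistent, lik_inconsistent=0.5):
    """One arbitration step: exploit while reliable, otherwise create a new
    actor as a weighted mixture of stored task sets."""
    actor["reliability"] = update_reliability(
        actor["reliability"], lik_consistent, lik_inconsistent)
    if actor["reliability"] > 0.5:
        return actor, "exploit"           # keep adjusting the actor through RL
    # Exploration mode: Bayesian-averaged mixture, weighted by how frequently
    # each stored task set was encountered and by current external evidence.
    weights = [ts["frequency"] * ts["evidence"] for ts in repertoire]
    total = sum(weights) or 1.0
    actions = {a for ts in repertoire for a in ts["policy"]}
    new_actor = {
        "reliability": 0.5,
        "policy": {a: sum(w / total * ts["policy"].get(a, 0.0)
                          for w, ts in zip(weights, repertoire))
                   for a in actions},
    }
    # The mixture is itself stored as a new task set in long-term memory.
    repertoire.append({"policy": dict(new_actor["policy"]),
                       "frequency": 1.0, "evidence": 1.0})
    return new_actor, "explore"
```

For instance, an actor starting at reliability 0.9 that keeps observing evidence five times likelier under "situation changed" crosses the 0.5 threshold within two steps and triggers actor creation.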

Neuroimaging results show that activity in different parts of the PFC co-varies with various parameters of the proposed algorithm and related algorithmic events [34] (Figure 2): (1) the PFC appears to specifically monitor the absolute reliability of actors in the ventromedial region according to action outcomes. This finding is consistent with the involvement of the ventromedial PFC in encoding predictive models (see above) and subjects' confidence in their behavioral choices [35,36]; (2) algorithmic transitions from the exploitation to the exploration mode (when reliable actors become unreliable) were associated with selective transient responses in the dorsomedial PFC (dorsal anterior cingulate cortex: dACC). Consistent with this result, other fMRI studies in human subjects have provided evidence for the involvement of dACC in controlling the switch from exploitation to exploration [37]. Furthermore, extracellular recordings in rodents have shown that neuronal ensembles in dACC exhibit abrupt phase transitions when animals switch from exploitation to exploration [38,39,40,41].

Figure 2. Prefrontal functional architecture regulating adaptive behavior beyond RL. Schematic anatomic-functional diagram of the prefrontal cortex (see Figure 1 legend for labels). Yellow: PFC regions monitoring the reliability of the task set driving ongoing behavior and learning external contingencies through RL (i.e. the actor), for detecting when it becomes unreliable and switching into exploration. Cyan: PFC regions involved in creating new actor task sets from long-term memory for driving exploration behavior. Blue: PFC regions monitoring the reliability of alternative task sets not contributing to ongoing behavior, for detecting when one becomes reliable and exploiting it as actor, thereby preventing or switching out of exploration periods (in which case the newly created task set driving exploration is simply disbanded). Arrows indicate critical information flows from anterior regions involved in inferential processes to posterior regions involved in switching processes.



Electrophysiological recordings further suggest that the dACC enforces switching away from the current actor at the task-set level, while the pre-SMA is involved in inhibiting its subordinate elements, such as the stimulus–action associations composing selective models [42–44].

From Bayesian inference to adaptive creativity
Key to the efficiency of the algorithm described above is that it gradually builds a repertoire of task sets in long-term memory. That repertoire is used to create new actors, rather than adapting ongoing actors through RL, when the current situation is inferred as having changed. Actor creation uses previously learned task sets optimally, because new actors are created according to both the recurrence frequencies of situations and the current external evidence supporting associated task sets [24,33]. One limitation of this creation process, however, is that it builds selective models from previously learned selective models with no consideration for whether the rewarding values of action outcomes may have changed since those selective models were learned. As explained above (first section), this may lead to the creation of maladaptive new actors.

A direct solution to this problem is to assume that the process of actor creation triggers model-based RL before the new actor actually starts driving behavior [24,33]. Because new actors are assumed to be stored as new task sets in long-term memory, this model-based RL is better conceived as directly adjusting off-line the new actor's selective model before the latter drives behavior. This hypothesis has not been empirically tested yet, but some empirical evidence supports it. First, the hypothesis predicts that when the situation changes, model-based RL predominates over model-free RL, while model-free RL gradually begins to dominate when the situation perpetuates. Consistently, previous studies provide evidence that behavioral performances predominantly reflect model-based RL after changes in external contingencies and subsequently reflect model-free RL when the behavior becomes increasingly habitual [22]. Second, task-set creation was associated with fMRI activations in lateral premotor and PFC regions along with activations in the striatum [34,45] (Figure 2). Consistent with the hypothesis, fMRI studies have shown the striatum to be implicated in both model-free and model-based RL [46]. Third, the hypothesis implies that the new actor is mentally emulated before the behavioral switch occurs, as medial PFC activations suggest in a recent human fMRI study [47].

From adaptive creativity to hypothesis testing
The algorithmic approach we describe here suggests that the core PFC function is to enable adaptive behavior to abruptly switch from RL-based adjustments of ongoing behavior to memory-based constructs of new behaviors. Such a switching/creation process is intrinsically non-parametric (i.e. discrete), given the presumed open-ended (i.e. infinitely multidimensional) nature of the environment. With no additional adaptive mechanisms, the process remains irreversible and lacks flexibility. To overcome this limitation and more efficiently approximate optimal adaptive processes in uncertain environments [24], the algorithm described in [33] has a third key feature: it keeps monitoring the absolute reliability of task sets previously used as reliable actors along with the ongoing actor task set (thereby forming an inferential buffer). Whenever one monitored task set becomes reliable (at most one task set can be reliable at a time), it becomes the actor. Whenever none are deemed reliable, a new actor is created from long-term memory as described above. Critically, this newly created actor is rejected and disbanded whenever it remains unreliable while one monitored task set becomes reliable. Conversely, it is confirmed and consolidated in long-term memory whenever it becomes reliable before the other monitored task sets. This algorithm thus implements a notion of hypothesis testing bearing upon newly created task sets.

Behavioral studies show that this algorithm predicts human behavioral performance in protocols featuring uncertain, changing and open-ended environments [33,34]. Interestingly, the best prediction was obtained when the inferential buffer had a monitoring capacity of three to four task sets and only the actor in the buffer contributed to driving behavior. These findings match the notion and capacity of human working memory [48–50]. Moreover, neuroimaging results suggest that the PFC implements this algorithm [34]: firstly, activations in the frontopolar PFC encode the absolute reliability of non-actor task sets monitored in the inferential buffer; secondly, the lateral PFC transiently responds to algorithmic rejection events, i.e. when the algorithm rejects a newly created actor and retrieves one monitored task set to serve as actor; and finally, the ventral striatum transiently responds to algorithmic confirmation events, i.e. when the algorithm confirms and consolidates the creation of new actors. These findings are consistent with the involvement of the frontopolar PFC in holding on to and monitoring the opportunity to switch to alternative courses of action [15,16,30,51,52], of the lateral PFC in task switching [25,29], and of the ventral striatum in reinforcement learning [53].
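A cartoon of this third feature, hypothesis testing over a small inferential buffer, might look as follows. The four-item capacity mirrors the behavioral estimate above, while the dictionary layout and the "probe" flag marking an unconfirmed new actor are our own assumptions:

```python
BUFFER_CAPACITY = 4   # matches the three-to-four task-set estimate in the text

def arbitrate(buffer):
    """buffer: list of dicts with a 'reliability' field; a 'probe' flag marks a
    newly created task set not yet consolidated. Returns (actor, event)."""
    reliable = [ts for ts in buffer if ts["reliability"] > 0.5]
    assert len(reliable) <= 1          # at most one task set can be reliable
    if reliable:
        winner = reliable[0]
        if winner.get("probe"):
            winner["probe"] = False    # confirm: consolidate in long-term memory
            return winner, "confirm"
        # A monitored task set became reliable: it takes over as actor, and any
        # still-unreliable probe actor is rejected and disbanded.
        buffer[:] = [ts for ts in buffer if not ts.get("probe")]
        return winner, "switch"
    # No task set is reliable: create a new probe actor (built from long-term
    # memory in the full model) and keep the buffer within capacity.
    probe = {"reliability": 0.5, "probe": True}
    buffer.insert(0, probe)
    del buffer[BUFFER_CAPACITY:]
    return probe, "create"
```

The probe's fate thus implements the accept/reject logic of a hypothesis test: it is consolidated if it reaches reliability first, and disbanded if a monitored alternative beats it to the threshold.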



Concluding remarks
We outlined here a theoretical framework providing a unified view of PFC function as overcoming critical limitations of the model-free RL processes in the basal ganglia. Beyond its role in model-based RL processes, the ventromedial PFC appears to further implement Bayesian inference processes monitoring the absolute reliability of ongoing behavior based on action-outcome expectations. In contrast, the dorsomedial PFC detects when ongoing behavior becomes unreliable, in order to explore and create new behaviors built from long-term memory rather than to keep adjusting ongoing behavior through RL. Additionally, the frontopolar PFC appears to infer the absolute reliability of alternative behavioral strategies. The lateral PFC detects when such alternative strategies become reliable, retrieving and exploiting them to drive behavior rather than creating new behavioral strategies. This functional architecture appears to reflect the implementation of an online algorithm approximating statistically optimal but computationally intractable processes driving adaptive behavior in uncertain, changing and open-ended environments. This theoretical framework may help future research to better understand the brain and neural mechanisms regulating the learning, planning, reasoning and creativity processes subserving exploitation and exploration strategies in the service of adaptive behavior.

Conflict of interest statement
Nothing declared.

Acknowledgements
Supported by a European Research Council Grant (ERC-2009-AdG#250106).

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Stephenson-Jones M, Samuelsson E, Ericsson J, Robertson B, Grillner S: Evolutionary conservation of the basal ganglia as a common vertebrate mechanism for action selection. Curr Biol 2011, 21:1081-1091.
2. Schultz W, Dayan P, Montague PR: A neural substrate of prediction and reward. Science 1997, 275:1593-1599.
3. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ: Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 2004, 304:452-454.
4. Doya K: Reinforcement learning: computational theory and biological mechanisms. HFSP J 2007, 1:30-40.
5. • Sutton RS, Barto AG: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
A major reference in the field. A profound but easily accessible, inspirational introduction to reinforcement learning from a machine-learning perspective.
6. Alexander GE, DeLong MR, Strick PL: Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci 1986, 9:357-381.
7. Dickinson A: Actions and habits: the development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 1985, 308:67-78.
8. Balleine BW, Dickinson A: Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 1998, 37:407-419.
9. Simon DA, Daw ND: Neural correlates of forward planning in a spatial decision task in humans. J Neurosci 2011, 31:5526-5539.
10. • Gershman SJ, Markman AB, Otto AR: Retrospective revaluation in sequential decision making: a tale of two systems. J Exp Psychol Gen 2014, 143:182-194.
Behavioral study providing evidence that model-free RL controls behavior, while model-based RL trains the model-free system in the background, as originally suggested in [5].
11. Otto AR, Gershman SJ, Markman AB, Daw ND: The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 2013, 24:751-761.
12. Glascher J, Daw N, Dayan P, O'Doherty JP: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 2010, 66:585-595.
13. Liljeholm M, Wang S, Zhang J, O'Doherty JP: Neural correlates of the divergence of instrumental probability distributions. J Neurosci 2013, 33:12519-12527.
14. Hampton AN, Bossaerts P, O'Doherty JP: The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J Neurosci 2006, 26:8360-8367.
15. Boorman ED, Behrens TE, Woolrich MW, Rushworth MF: How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron 2009, 62:733-743.
16. Boorman ED, Behrens TE, Rushworth MF: Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLoS Biol 2011, 9:e1001093.
17. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ: Model-based influences on humans' choices and striatal prediction errors. Neuron 2011, 69:1204-1215.
18. Zhu L, Mathewson KE, Hsu M: Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning. Proc Natl Acad Sci U S A 2012, 109:1419-1424.
19. • Jones JL, Esber GR, McDannald MA, Gruber AJ, Hernandez A, Mirenzi A, Schoenbaum G: Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 2012, 338:953-956.
The study demonstrates that in rats, the OFC is required for inferring rewarding values based on internal predictive models.
20. • Wilson RC, Takahashi YK, Schoenbaum G, Niv Y: Orbitofrontal cortex as a cognitive map of task space. Neuron 2014, 81:267-279.
The authors review empirical evidence supporting that the OFC/vmPFC represents predictive models of action outcomes associated with ongoing behavior.
21. Daw ND, Niv Y, Dayan P: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 2005, 8:1704-1711.
22. Lee SW, Shimojo S, O'Doherty JP: Neural computations underlying arbitration between model-based and model-free learning. Neuron 2014, 81:687-699.
23. Pezzulo G, Rigoli F, Chersi F: The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front Psychol 2013, 4:92.
24. • Koechlin E: An evolutionary computational theory of prefrontal executive function in decision-making. Philos Trans R Soc Lond B Biol Sci 2014:369.
This study describes the evolution of PFC function from rodents to humans as the gradual addition of new inferential capabilities for dealing with a computationally intractable, basic decision problem: exploring and learning new behavioral strategies versus exploiting and adjusting previously learned ones through RL.
25. Sakai K: Task set and prefrontal cortex. Annu Rev Neurosci 2008, 31:219-245.
26. Domenech P, Koechlin E: Executive control and decision-making in the prefrontal cortex. Curr Opin Behav Sci 2015, 1:101-106.
27. Collins AG, Frank MJ: Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychol Rev 2013, 120:190-229.
28. Collins AG, Cavanagh JF, Frank MJ: Human EEG uncovers latent generalizable rule structure during learning. J Neurosci 2014, 34:4677-4685.



29. Koechlin E, Ody C, Kouneiher F: The architecture of cognitive control in the human prefrontal cortex. Science 2003, 302:1181-1185.
30. Koechlin E, Summerfield C: An information theoretical approach to prefrontal executive function. Trends Cogn Sci 2007, 11:229-235.
31. Ferguson TS: A Bayesian analysis of some non-parametric problems. Ann Stat 1973, 1:209-230.
32. Teh YW, Jordan MI, Beal MJ, Blei DM: Hierarchical Dirichlet processes. J Am Stat Assoc 2006, 101:1566-1581.
33. • Collins AG, Koechlin E: Reasoning, learning, and creativity: frontal lobe function and human decision-making. PLoS Biol 2012, 10:e1001293.
Detailed and comprehensive computational account of human adaptive behavior in uncertain, variable and open-ended environments in terms of statistical reasoning, and its variations across individuals.
34. • Donoso M, Collins AG, Koechlin E: Human cognition. Foundations of human reasoning in the prefrontal cortex. Science 2014, 344:1481-1486.
This model-based fMRI study, central to the present account, characterizes the algorithmic architecture of the PFC combining RL, Bayesian inference, and hypothesis testing in regulating adaptive behavior. See also [33].
35. De Martino B, Fleming SM, Garrett N, Dolan RJ: Confidence in value-based choice. Nat Neurosci 2013, 16:105-110.
36. Lebreton M, Abitbol R, Daunizeau J, Pessiglione M: Automatic integration of confidence in the brain valuation signal. Nat Neurosci 2015, 18:1159-1167.
37. Kolling N, Behrens TE, Mars RB, Rushworth MF: Neural mechanisms of foraging. Science 2012, 336:95-98.
38. Durstewitz D, Vittoz NM, Floresco SB, Seamans JK: Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron 2010, 66:438-448.
39. • Hayden BY, Pearson JM, Platt ML: Neuronal basis of sequential foraging decisions in a patchy environment. Nat Neurosci 2011, 14:933-939.
This electrophysiological monkey study reports that in foraging, ACC neurons encoded the relative value of staying to exploit the current depleting patch versus switching away to explore a new patch. This decision variable increased up to a threshold triggering exploration. The study reveals neuronal mechanisms triggering behavioral switches.
40. • Karlsson MP, Tervo DG, Karpova AY: Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 2012, 338:135-139.
This study shows abrupt transitions in neural ensembles in rat medial PFC associated with switching away from exploitation. Switches triggered exploration phases consisting of active resampling of external contingencies.
41. • Tervo DG, Proskurin M, Manakov M, Kabra M, Vollmer A, Branson K, Karpova AY: Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell 2014, 159:21-32.
Using circuit perturbations in transgenic rats, the authors demonstrate that switching between exploitation versus exploration behavioral modes is controlled by locus coeruleus input into ACC.
42. Isoda M, Hikosaka O: Switching from automatic to controlled action by monkey medial frontal cortex. Nat Neurosci 2007, 10:240-248.
43. Matsuzaka Y, Akiyama T, Tanji J, Mushiake H: Neuronal activity in the primate dorsomedial prefrontal cortex contributes to strategic selection of response tactics. Proc Natl Acad Sci U S A 2012, 109:4633-4638.
44. Bonini F, Burle B, Liegeois-Chauvel C, Regis J, Chauvel P, Vidal F: Action monitoring and medial frontal cortex: leading role of supplementary motor area. Science 2014, 343:888-891.
45. Badre D, Kayser AS, D'Esposito M: Frontal cortex and the discovery of abstract action rules. Neuron 2010, 66:315-326.
46. Wunderlich K, Dayan P, Dolan RJ: Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 2012, 15:786-791.
47. Schuck NW, Gaschler R, Wenke D, Heinzle J, Frensch PA, Haynes JD, Reverberi C: Medial prefrontal cortex predicts internally driven strategy shifts. Neuron 2015.
48. Cowan N: Working-memory capacity limits in a theoretical context. In Human Learning and Memory: Advances in Theory and Applications. Edited by Izawa C, Ohta N. Erlbaum; 2005:155-175.
49. Oberauer K: Declarative and procedural working memory: common principles, common capacity limits? Psychol Belg 2010, 50:277-308.
50. Risse S, Oberauer K: Selection of objects and tasks in working memory. Q J Exp Psychol 2010, 63:784-804.
51. Koechlin E, Basso G, Pietrini P, Panzer S, Grafman J: The role of the anterior prefrontal cortex in human cognition. Nature 1999, 399:148-151.
52. Koechlin E, Hyafil A: Anterior prefrontal function and the limits of human decision-making. Science 2007, 318:594-598.
53. Liljeholm M, O'Doherty JP: Contributions of the striatum to learning, motivation, and performance: an associative account. Trends Cogn Sci 2012, 16:467-475.
