
Report

An Analysis of Decision under Risk in Rats


Highlights
• A novel task enables application of core behavioral economic approaches in rodents
• Like humans, rats exhibit nonlinear utility and probability weighting
• Rats also exhibit trial history effects, consistent with ongoing learning
• A reinforcement learning model incorporating subjective value accounts for the data

Authors
Christine M. Constantinople, Alex T. Piet, Carlos D. Brody

Correspondence
[email protected]

In Brief
Constantinople et al. apply prospect theory, the predominant economic theory of decision-making under risk, to rats. Rats exhibit signatures of both prospect theory and reinforcement learning. The authors present a model that integrates these frameworks, accounting for rats' nonlinear econometric functions and also trial-by-trial learning.

Constantinople et al., 2019, Current Biology 29, 2066–2074
June 17, 2019 © 2019 Elsevier Ltd.
https://doi.org/10.1016/j.cub.2019.05.013
Current Biology

Report

An Analysis of Decision under Risk in Rats


Christine M. Constantinople,1,5,6,* Alex T. Piet,1,4 and Carlos D. Brody1,2,3
1Princeton Neuroscience Institute, Princeton University, Washington Road, Princeton, NJ 08544, USA
2Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
3Howard Hughes Medical Institute, Princeton University, Washington Road, Princeton, NJ 08544, USA
4Present address: Allen Institute for Brain Science, Westlake Avenue N, Seattle, WA 98109, USA
5Present address: Center for Neural Science, New York University, Washington Place, New York, NY 10003, USA
6Lead Contact

*Correspondence: [email protected]
https://doi.org/10.1016/j.cub.2019.05.013

SUMMARY

In 1979, Daniel Kahneman and Amos Tversky published a ground-breaking paper titled "Prospect Theory: An Analysis of Decision under Risk," which presented a behavioral economic theory that accounted for the ways in which humans deviate from economists' normative workhorse model, Expected Utility Theory [1, 2]. For example, people exhibit probability distortion (they overweight low probabilities), loss aversion (losses loom larger than gains), and reference dependence (outcomes are evaluated as gains or losses relative to an internal reference point). We found that rats exhibited many of these same biases, using a task in which rats chose between guaranteed and probabilistic rewards. However, prospect theory assumes stable preferences in the absence of learning, an assumption at odds with alternative frameworks such as animal learning theory and reinforcement learning [3–7]. Rats also exhibited trial history effects, consistent with ongoing learning. A reinforcement learning model in which state-action values were updated by the subjective value of outcomes according to prospect theory reproduced rats' nonlinear utility and probability weighting functions and also captured trial-by-trial learning dynamics.

RESULTS

Two key components of prospect theory are utility (rewards are evaluated by the subjective satisfaction or "utility" they provide) and probability distortion (people often overweight low and underweight high probabilities; Figure 1D). In this theory, subjective value is determined by the shapes of subjects' utility and probability weighting functions.

Learning theories provide an alternative account of subjective value. In animal learning theory, Thorndike's "Law of Effect" described the effect of reinforcers on action selection [8], and Pavlov's subsequent experiments demonstrated how animals learn to associate stimuli with rewards [9]. The Rescorla-Wagner model of classical conditioning formalized how such learning might occur [3–5]. Although these models described how animals might learn associations between stimuli, they were naturally extended to account for learning values from experience [7, 10]. Models of trial-and-error learning from animal learning theory form the basis for reinforcement learning algorithms, including temporal difference learning, which captures temporal relationships between predictors and outcomes [7, 11]. Reinforcement learning provides a powerful framework for value-based decision-making in psychology and neuroscience, in which value estimates are learned from experience and updated trial-to-trial based on prediction errors [7, 12, 13].

Reinforcement learning has had profound impact in part because many of its components have been related to neural substrates [12, 14–17]. However, standard reinforcement learning algorithms dictate that agents learn the expected value (volume × probability) of actions or outcomes with experience [6], meaning that they will exhibit linear utility and probability weighting functions. This is incompatible with prospect theory. We found that rats exhibited signatures of both prospect theory and reinforcement learning, and we present an initial attempt to integrate these frameworks. First, we focus on prospect theory.

Most economic studies examine decisions between clearly described lotteries (i.e., "decisions from description"). Studies of risky choice in rodents, however, typically examine decisions between prospects that are learned over time (i.e., "decisions from experience"), which are difficult to reconcile with prospect theory [18–20]. We designed a task in which reward probability and amount are communicated by sensory evidence, eliciting decisions from description rather than experience. This enabled behavioral economic approaches, such as estimating utility functions.

Rats initiated a trial by nose-poking in the center port of a three-port wall. Light flashes were presented from left and right side ports, and the number of flashes conveyed the probability of water reward at each port. Simultaneously, auditory clicks were presented from left and right speakers, and click rate conveyed the volume of water reward baited at each port (Figures 1A and 1B). One port offered a guaranteed or safe reward, and the other offered a risky reward with an explicitly cued probability. The safe and risky ports (left or right) varied randomly. One of four water volumes could be the guaranteed or risky reward (6, 12, 24, 48 μL); risky reward probabilities ranged from 0 to 1, in increments of 0.1 (Figures 1A and 1B).

High-throughput training generated 36 trained rats and many tens of thousands of choices per rat, enabling detailed behavioral quantification.

[Figure 1 panels. (A) Rat behavioral task and timing of task events: nose fixation in the center port (~2.6–3.35 s), left/right clicks conveying reward volume, left/right flashes conveying reward probability, variable delays, then a side-port choice. (B) One of three stimulus-reward mappings relating flash number to reward probability and click rate to water volume (6, 12, 24, or 48 μL). (C) Performance of example rat J266 (56,750 trials): % chose safe as a function of risky-side probability and volume, for each safe-side volume. (D) Prospect theory behavioral model schematic: Value = u(x)·w(p); ΔValue = Value_R − Value_L; P(Choose_R) = Logistic(ΔValue + bias). (E) Model prediction for J266 on held-out data (5-fold cross-validation).]

Figure 1. Rats Choose between Guaranteed and Probabilistic Rewards


(A) Behavioral task and timing of task events: flashes cue reward probability (p) and click rates convey water volume (x) on each side. Safe and risky sides are not
fixed.
(B) Relationship between cues and reward probability and volume in one task version. Alternative versions produced similar results (Figure S2). There were four
possible volumes (6, 12, 24, or 48 μL), and the risky side offered reward probabilities between 0 and 1 in increments of 0.1.
(C) One rat’s performance for each of the safe side volumes. Axes are probability and volume of risky options.
(D) A behavioral model inferred the utility and probability weighting functions that best explained rats’ choices. See Box 1 for details.
(E) Model prediction for held-out data from one rat, averaged over 5 test sets. See also Figures S1 and S2.

Rats demonstrated they learned the meaning of the cues by frequently "opting-out" of trials offering smaller rewards, leaving the center poke despite incurring a time-out penalty and white-noise sound (Figures S1A–S1C). This indicated that they associated the click rates with water volumes, instead of relying on a purely perceptual strategy. It is possible that opting-out, which persisted despite longer time-out penalties for low-volume trials (STAR Methods), reflected reward-rate maximizing strategies [21–23]. Rats favored prospects with higher expected value (Figures 1C and S1).

We used a standard choice model [24, 25] to estimate each rat's utility and probability weighting functions according to prospect theory (Box 1; Figures 1D and S1). The model predicted rats' choices on held-out data (Figure 1E). It outperformed alternative models, including one imposing linear probability weighting (according to Expected Utility Theory [26]), one that fit linear weights for probabilities and volumes, and several models implementing purely perceptual strategies with sensory noise (Figures S2A–S2F).

Concave utility (the utility function exponent α < 1) produces diminishing marginal sensitivity, in which subjects are less sensitive to differences in larger rewards. Rats' median α was 0.54, indicating concave utility, like humans [2] (Figures 2A and S1F). To test for diminishing marginal sensitivity, we compared performance on trials offering guaranteed outcomes of 0 or 24 μL, and 24 or 48 μL (Figures 2B and 2C). Concave utility implies that 24 and 48 μL are less discriminable than 0 and 24 μL (Figure 2C). Indeed, the concavity of the utility function was correlated with reduced discriminability on trials offering 24 and 48 μL (Figures 2D and 2E; p = 1.03e-7, Pearson's correlation). This was true when the guaranteed outcome of 0 included trials offering 24 μL with p = 0 (Figures 2D and 2E), or all volumes with p = 0 (Figures S2G and S2H). Our choice set did not permit analysis of trials offering non-zero rewards, as these trials (24 versus 0, 48 versus 24) were the only ones with equal reward differences. This suggests that rats, like humans, exhibit diminishing marginal sensitivity.

Rats' probability weighting functions revealed overweighting of probabilities (Figure 2F). A logistic regression model that parameterized each probability to predict choice yielded regression weights mirroring the probability weighting functions (Figures 2G and 2H). Control experiments indicated that nonlinear utility and probability weighting were not due to perceptual errors in estimating flashes and clicks [27] (Figures S2I–S2K).



Box 1. Prospect Theory Behavioral Model

We modeled the probability that the rat chose the right side by a logistic function whose argument was the difference between the subjective value of each option (V_R − V_L) plus a trial history-dependent term. Subjective utility was parameterized as:

u(x) = (x − r)^α if x > r;  −κ(r − x)^α if x < r,   (1)

where α is a free parameter, and x is reward volume. r is the reference point, which determines whether rewards are perceived as gains or losses. We first consider the case where r = 0, so

u(x) = x^α.   (2)

The subjective probability of each option is computed by:

w(p) = e^{−β(−ln(p))^δ},   (3)

where β and δ are free parameters and p is the objective probability offered. Combining utility and probability yields the subjective value for each option:

V_R = u(x_R)·w(p_R)   (4)
V_L = u(x_L)·w(p_L).   (5)

These were normalized by the max over trials and transformed into choice probabilities via a logistic function:

P(Choose_R) = ι + (1 − 2ι) / (1 + e^{−λ(V_R − V_L) + bias}),   (6)

where ι captures stimulus-independent variability (lapse rate) and λ determines the sensitivity of choices to the difference in subjective value (V_R − V_L). The bias term was composed of three possible parameters, depending on trial history:

bias = ±h1 if t−1 was a safe L/R choice; ±h2 if t−1 was a risky L/R reward; ±h3 if t−1 was a risky L/R miss.   (7)
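For concreteness, a minimal MATLAB sketch of the Box 1 computation (with reference point r = 0, as in Equation 2) is given below. The function and parameter names are illustrative rather than the authors' exact code, normalization of values by the max over trials is omitted for brevity, and the bias argument stands in for the trial-history term of Equation 7.

function pChooseR = box1_choice_prob(theta, xR, pR, xL, pL, bias)
% Illustrative sketch of the Box 1 model with r = 0 (Equation 2).
% theta = [alpha, beta, delta, iota, lambda]; bias is the trial-history term.
alpha = theta(1); beta = theta(2); delta = theta(3);
iota = theta(4); lambda = theta(5);
u = @(x) x.^alpha;                        % utility, Equation 2
w = @(p) exp(-beta .* (-log(p)).^delta);  % two-parameter Prelec weighting, Equation 3
VR = u(xR) .* w(pR);                      % Equation 4
VL = u(xL) .* w(pL);                      % Equation 5
% logistic choice rule with lapse rate iota and sensitivity lambda, Equation 6
pChooseR = iota + (1 - 2*iota) ./ (1 + exp(-lambda .* (VR - VL) + bias));
end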

To evaluate rats' risk attitudes, we measured the certainty equivalents (CEs) for all gambles of 48 μL [2, 28, 29]. The certainty equivalent is the guaranteed reward the rat deems equal to the gamble (Figures 2I and 2J). If it is less than the gamble's expected value, that indicates risk aversion: the subject effectively "undervalues" the gamble and will accept a smaller reward to avoid risk (Figure 2K). Conversely, if the certainty equivalent is greater than the gamble's expected value, the subject is risk seeking, and risk neutral if they are equal. Measured certainty equivalents closely matched those predicted from the model, using an analytic expression incorporating utility and probability weighting functions (CE = w(p)^(1/α)·x; STAR Methods; Pearson's correlation 0.96, p = 1.58e-11; Figure 2K). This non-parametric assay further validated the model fits and revealed heterogeneous risk preferences across rats (Figure 2L; Figures S3A–S3C).

Although rats exhibited nonlinear utility and probability weighting, consistent with prospect theory, they also exhibited trial-by-trial learning, consistent with reinforcement learning. Fitting the model to trials following rewarded and unrewarded choices revealed systematic shifts in utility and probability weighting functions: utility functions became less concave and probability weighting functions became more elevated to reflect increased likelihood of risky choice following rewards (Figure 3A). This was consistent across rats, as observed in certainty equivalents from the data (p = 6.49e-5, paired t test) and model (Figure 3B; p = 2.46e-7).

Another feature of human behavior is "reference dependence": people evaluate rewards as gains or losses relative to an internal reference point. It is unclear what determines the reference point [30]; proposals include status quo wealth [1], reward expectation [31, 32], heuristics based on the prospects [33, 34], or recent experience [35, 36].

Rats demonstrated reference dependence by treating smaller rewards as losses. They exhibited win-stay and lose-switch biases: following unrewarded trials, rats were more likely to switch ports (Figure 3C). Surprisingly, most rats exhibited "switch" biases after receiving 6 or 12 μL, consistent with treating these outcomes as losses. The "win or lose" threshold (i.e., reference point) was experience dependent: a separate cohort of rats (n = 3) trained with doubled rewards (12–96 μL) exhibited lose-switch biases after receiving 12 or 24 μL (Figure 3D).




Figure 2. Non-parametric Analyses Confirm Nonlinear Utility and Probability Weighting and Reveal Diverse Risk Attitudes
(A) Model fits of subjective utility functions for each rat, normalized by the maximum volume (48 μL).
(B) Schematic linear utility function: the perceptual distance (or discriminability, d′) between 0 and 24 μL is the same as 24 and 48 μL.
(C) Schematic concave utility function: 24 and 48 μL are less discriminable than 0 and 24 μL.
(D) One rat's performance on trials with guaranteed outcomes of 0 versus 24 μL (green) or 24 versus 48 μL (purple). Performance ratio on these trials ("d′ ratio") less than 1 indicates diminishing sensitivity. Error bars are binomial confidence intervals.
(E) The concavity of the utility function (α) is significantly correlated with reduced discriminability of larger rewards (p = 1.03e-7, Pearson's correlation). Pink circle is rat from (D).
(F) Model fits of probability weighting functions.
(G) Weights from logistic regression parameterizing each probability match probability weighting function for one rat. Error bars are SEM for each regression
coefficient.
(H) Mean squared error between regression weights and parametric fits for each rat (mean MSE = 0.006, in units of probability).
(I and J) To obtain certainty equivalents, we measured psychometric functions for each probability of receiving 48 μL and estimated the certain volume at which performance = 50%.
(K) Measured (blue) and model-predicted (red) certainty equivalents from one rat indicates systematic undervaluing of the gamble, or risk aversion. Error bars for
model prediction are 95% confidence intervals of parameters from 5-fold cross validation. Data are mean ± SEM for left-out test sets.
(L) Distribution of certainty equivalent areas computed using analytic expression from model fits. Measured certainty equivalents were similar (Figure S3C).
See also Figures S2 and S3.

The win or lose threshold was often reward-history dependent (Figure 3E). Therefore, we parameterized a reference point, r, as taking one of two values depending on whether the previous trial was rewarded (see STAR Methods). Rewards less than r were negative (losses). The relative amplitude of losses versus gains was controlled by the parameter κ (Equation 1; Figure 1D; Box 1). Subjective value was reparameterized to include the zero outcome of the gamble, which is a loss when r > 0:

V_R = u(x_R)·w(p_R) + u(0)·w(1 − p_R)   (8)
V_L = u(x_L)·w(p_L) + u(0)·w(1 − p_L)   (9)




Figure 3. Rats Exhibit Evidence of Trial-by-Trial Learning


(A) Probability weighting function (left) and utility function (right) for one rat from model fit to trials following reward (turquoise) or no reward (black).
(B) Certainty equivalent areas predicted from model fits for all rats following rewarded and unrewarded trials (p = 2.46e-7, paired t test).
(C) ΔProbability of repeating left or right choices (relative to mean probability of repeating), following each reward. Points above the dashed line indicate an increased probability of repeating ("stay"); those below indicate a decreased probability ("switch"). Black curve is average ±SEM across rats.
(D) A separate cohort of 3 rats was trained with doubled water volumes. They exhibited lose-switch biases following 12 and 24 μL.
(E) Win-stay and lose-switch biases for one rat separated by reward history two trials back.
(F) Schematic illustrating that with concave utility, rewards should be more (less) discriminable when the reference point is high (low).
(G) Psychometric performance from one rat when the inferred reference point was low (black) or high (blue). Red curve is ideal performance.
(H) Value function with the median parameters across rats indicates loss aversion (median α = 0.6, κ = 1.7).
See also Figure S3.

Model comparison (Akaike information criterion, AIC) favored the reference point model for all rats (Figures S3D and S3E). We also parameterized the reference point as reflecting several trials back with an exponential decay, where the time constant was a free parameter (see STAR Methods). For most rats (20/36, 77%), this did not significantly improve model performance compared to the reference point reflecting one trial back, although it was a better fit for a minority of rats with longer integration time constants over trials (Figures S3F–S3H). For the sake of simplicity and because it generally provided a better fit, we focused on the "one-trial back" reference point model. Interestingly, the reference point from the model was not significantly correlated with average reward rate for each rat [37]; this was true regardless of whether opt-out trials were included in estimates of average reward per trial (Pearson's correlation, p > 0.05).

With concave utility, rats should exhibit sharper psychometric performance when the reference point is high (and rewards are more discriminable; Figure 3F). Indeed, performance was closer to ideal when the reference point was high (Figure 3G; mean squared error [MSE] between psychometric and ideal performance was 0.143 low ref versus 0.122 high ref, p = 3.4e-5, paired t test across rats).

A loss parameter κ > 1 indicates "loss aversion" or a greater sensitivity to losses than gains (Equation 1). We observed a median κ of 1.66 (Figure 3H). There was variability across rats: 16/36 rats (44%) were not loss averse but were more sensitive to gains (κ < 1). Still, the median κ across rats suggests similarity to humans (Figure 3H).

Prospect theory does not account for how agents learn subjective values from experience, and we explicitly incorporated trial history parameters to account for trial-by-trial learning (Equations 6 and 7, Figure 4A). To examine learning dynamics, we fit the model to the first and second half of all trials, once rats achieved criterion performance (Figure S4A). There was no significant change in the parameters for the utility or probability weighting functions (Figure S4B). Rats showed a significant increase in the softmax parameter with training, indicating increased sensitivity to value differences, and a decrease in one of the trial history parameters, h1, indicating reduced win-stay biases (Figure S4).

Reinforcement learning describes an adaptive process in which animals learn the value of states and actions. This framework, however, implies linear utility and probability weighting. We simulated choices of a Q-learning agent, which learned the state-action values of each unique trial type. Fitting the prospect theory model to these simulated choices recovered linear utility and probability weighting functions (regardless of learning rate; Figure 4B). This is expected: since trial sequences were randomized (i.e., each trial was independent), basic reinforcement learning will learn the expected value of each option [6] and will resolve to linear utility and probability weighting functions. We therefore implemented a reinforcement learning model that could accommodate nonlinear subjective utility and probability weighting but also learning over trials (Figure 4C [10, 38]).




Figure 4. Integrating Prospect Theory and Reinforcement Learning Captures Nonlinear Subjective Functions and Learning
(A) Prospect theory model predictions for each rat, without the trial history parameters (h1–h3, see Box 1), do not account for win-stay and lose-switch trial history effects. Inclusion of these parameters accounts for these effects.
(B) Prospect theory model fit to simulated choices from a basic reinforcement learning agent yields linear utility and probability weighting functions over a range of
generative learning rates (0.2, 0.4, 0.6, 0.8, 1.0, overlaid).
(C) Schematic of model incorporating prospect theory and reinforcement learning.
(D) The hybrid model described in (C) accounts for win-stay and lose-switch effects.
(E) The model recovers nonlinear utility and probability weighting functions.
(F) Model comparison when the error term used in the model was the subjective value (as shown in C) or the expected value (probability × reward). Red arrow is mean ΔAIC (p = 1.38e-8, paired t test of AIC).
(G) Binned values of rats' lose-switch biases (measured from the data) plotted against the best-fit learning rate, α_learn. Pearson's correlation coefficient is −0.37 across rats (p = 0.026).
See also Figure S4.

The model assumed that the rats learned the value of left and right choices for each unique combination of probability (p) and reward (μL) according to the following equation:

V_{p,μL}(t + 1) = V_{p,μL}(t) + α_learn·(w(p)·u(x) − V_{p,μL}(t)),   (10)

where α_learn is the learning rate parameter, and w(p) and u(x) are parameterized as in Equations 2 and 3. The learned values of the right and left prospects on each trial were transformed into choice probabilities via a logistic function (see STAR Methods). We also implemented a global bias for left and right choices depending on reward history (Figures 4C and 4D). In this model, utility and probability weighting functions were exclusively used for learning or updating values, whereas choice depended on the learned values on each trial. Although the parameters of the utility and probability weighting functions were free parameters in this model, we recovered parameter values identical to the prospect theory model (Figure 4E; Figure S4C). Importantly, a reinforcement learning model in which the expected value (EV) was the error signal driving learning underperformed compared to the model incorporating subjective value according to prospect theory (Figure 4F; p = 1.38e-8, paired t test of AIC). Finally, each rat's learning rate (α_learn) was negatively correlated with the magnitude of their lose-switch biases, suggesting an inverse relationship between learning dynamics governing gradual improvements in task performance, and trial-by-trial learning, which is deleterious to performance when trials are independent [27, 39, 40]. Rats with slower dynamics (lower learning rate) showed more prominent trial history effects, whereas rats with rapid learning showed reduced trial history biases (Figure 4G).

DISCUSSION

There is a strong foundation for using animal models to study the cost-benefit calculations underlying economic theory [41]. In foraging studies, animals either exploit a current option for reward or explore a new one and often maximize their rate of reward [42–45]. Rodents exhibit complex aspects of economic decision-making, including regret [46, 47] and sensitivity to sunk costs [48, 49]. Here, we applied prospect theory to rats.



Like humans, rats exhibit nonlinear concave utility for gains, probability distortion, reference dependence, and, frequently, loss aversion. Nearly all rats exhibited concave utility, which produced diminishing marginal sensitivity. In contrast, most studies in monkeys have reported convex utility [24, 50–53] (but see [25]). In Expected Utility Theory, concave utility indicates risk aversion [26]. However, in prospect theory, concave utility can coincide with risk-seeking behavior due to the elevation of the probability weighting function [28, 29].

Our rats differ from primates in that they do not appear to underweight moderate and high probabilities [1, 24, 28]. The "inverted-S" shape of the probability weighting function may reflect diminishing sensitivity relative to two reference points that, in the probability domain, correspond to 0 and 1 [2, 28]. The rats, either due to the task or species differences, may not treat certainty as a reference point.

Reward history modified rats' risk preferences, producing shifts in utility and probability weighting functions. The extent to which risk preferences are stable traits is an area of active research [54, 55]. Recent work suggests a general or stable component of risk attitudes, but variability across domains (e.g., finance, recreation [55]). In foraging tasks, risk preferences reflect food availability and/or energy budget in a variety of species [44, 56]. Here, we document dynamic risk preferences; these trial-by-trial dynamics are not likely driven by physiological factors (e.g., energy budget) but may reflect dynamic internal or cognitive states mediated by reinforcement learning.

We found evidence for reference dependence, in which rats' treatment of outcomes as gains or losses reflected their reward history. Studies in several species, including capuchin monkeys [57], have also suggested reference dependence. Starlings and locusts prefer options that previously were encountered under greater hunger, presumably because those rewards were perceived to have greater reference-dependent value [58–60]. Rats modulate their approach speed for rewards depending on previously experienced reward amounts [61, 62]. Regret, which reflects a post-decision valuation of a choice relative to an unchosen alternative, may be a reference-dependent computation; for regret, the reference point would be the counterfactual prospect [63].

While variable, the median loss parameter (κ) across rats indicated loss aversion, which has been documented in capuchin monkeys [64]. We note that we did not examine losses by taking reinforcers away from the animal. However, several decision theories [32, 65] posit that rewards less than the reference point are losses. Loss aversion in humans is remarkably variable [66] and possibly domain specific [67, 68]. The nature of loss aversion is intriguing: is it a constant, a psychological trait similar to risk preferences [55], or an emergent property of constructing preferences [69]?

Prospect theory, animal learning theory, and reinforcement learning are complementary frameworks for studying decision-making (but see [70]). Reinforcement learning and animal learning theory are principally concerned with how subjects learn values over experience and use those learned values to make decisions. Prospect theory, in contrast, does not address learning but describes nonlinear distortions that account for the decision. We propose a simple approach for integrating these frameworks, in which animals learn the values of actions associated with task states, but the reward prediction error driving learning is in units of subjective value according to prospect theory [10, 38]. This hypothesis is consistent with studies of dopamine neurons, which are thought to instantiate reward prediction errors in the brain in a temporal-difference learning algorithm [6, 7, 13, 14]. Conditioned stimuli predicting rewards with different probabilities or magnitudes have been shown to elicit phasic dopamine responses reflecting the value of the expected reward [71–73]. In delay discounting tasks, the phasic dopamine response reflects discounted value of delayed rewards in monkeys and rats [74, 75]. Finally, recent work has shown that dopamine reward prediction errors reflect the shape of monkeys' measured utility functions [76]. The hypothesis of a reward prediction error in units of subjective value (perhaps according to prospect theory in the case of explicitly described lotteries) is also conceptually related to studies of homeostatic reinforcement learning, in which internal state influences subjective valuation [77]. This hypothesis bridges animal learning theory, reinforcement learning, and economic concepts of subjective value. A key topic of future research should address how subjective estimates of value arise: are they innate, learned early in life, or constantly evolving over the lifespan?

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  – Subjects
• METHOD DETAILS
  – Behavioral training
• QUANTIFICATION AND STATISTICAL ANALYSIS
  – Behavioral model
  – Alternative models
  – Psychometric curves
  – Logistic regression to compare regressors to probability weighting functions
  – Certainty equivalents
  – Behavioral model with reference point
  – Behavioral model integrating Prospect Theory and Reinforcement Learning
• DATA AND SOFTWARE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.cub.2019.05.013.

ACKNOWLEDGMENTS

The authors thank Paul Glimcher, Kenway Louie, Mike Long, Cristina Savin, David Schneider, Kevin Miller, Ben Scott, Mikio Aoi, Matthew Lovett-Barron, Cristina Domnisoru, Alejandro Ramirez, and members of the Brody lab for helpful discussions and comments on the manuscript. We thank J. Teran, K. Osorio, L. Teachen, and A. Sirko for animal training. This work was funded in part by a K99/R00 award from NIMH (MH111926 to C.M.C.).



AUTHOR CONTRIBUTIONS

All authors provided feedback on analyses and the manuscript. C.M.C. designed and performed all experiments, analyzed the data, and wrote the initial draft of the paper. A.T.P. and C.D.B. provided guidance for modeling and analysis.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: November 7, 2018
Revised: March 6, 2019
Accepted: May 1, 2019
Published: May 30, 2019

REFERENCES

1. Kahneman, D., and Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263.
2. Tversky, A., and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertain. 5, 297–323.
3. Bush, R.R., and Mosteller, F. (2006). A mathematical model for simple learning. In Selected Papers of Frederick Mosteller, S.E. Fienberg, and D.C. Hoaglin, eds. (Springer), pp. 221–234.
4. Bush, R.R., and Mosteller, F. (2006). A model for stimulus generalization and discrimination. In Selected Papers of Frederick Mosteller, S.E. Fienberg, and D.C. Hoaglin, eds. (Springer), pp. 235–250.
5. Rescorla, R.A., and Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory, A.H. Black, and W.F. Prokasy, eds. (Appleton-Century-Crofts), pp. 64–99.
6. Glimcher, P.W. (2011). Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl. Acad. Sci. USA 108 (Suppl 3), 15647–15654.
7. Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction (The MIT Press).
8. Thorndike, E.L. (1911). Animal Intelligence: Experimental Studies (The Macmillan Company).
9. Pavlov, P.I. (2010). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Ann. Neurosci. 17, 136–141.
10. Glimcher, P.W. (2010). Foundations of Neuroeconomic Analysis (Oxford University Press).
11. Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
12. Lee, D., Seo, H., and Jung, M.W. (2012). Neural basis of reinforcement learning and decision making. Annu. Rev. Neurosci. 35, 287–308.
13. Niv, Y. (2009). Reinforcement learning in the brain. J. Math. Psychol. 53, 139–154.
14. Bornstein, A.M., and Daw, N.D. (2011). Multiplicity of control in the basal ganglia: Computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374–380.
15. van der Meer, M.A.A., and Redish, A.D. (2011). Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387–392.
16. Averbeck, B.B., and Costa, V.D. (2017). Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512.
17. Ito, M., and Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr. Opin. Neurobiol. 21, 368–373.
18. Hertwig, R., and Erev, I. (2009). The description-experience gap in risky choice. Trends Cogn. Sci. 13, 517–523.
19. Barron, G., and Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. J. Behav. Decis. Making 16, 215–233.
20. Erev, I., and Roth, A.E. (2014). Maximization, learning, and economic behavior. Proc. Natl. Acad. Sci. USA 111 (Suppl 3), 10818–10825.
21. Herrnstein, R.J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 4, 267–272.
22. Heyman, G.M., and Duncan Luce, R. (1979). Operant matching is not a logical consequence of maximizing reinforcement rate. Anim. Learn. Behav. 7, 133–140.
23. Gallistel, C.R., Mark, T.A., King, A.P., and Latham, P.E. (2001). The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. J. Exp. Psychol. Anim. Behav. Process. 27, 354–372.
24. Stauffer, W.R., Lak, A., Bossaerts, P., and Schultz, W. (2015). Economic choices reveal probability distortion in macaque monkeys. J. Neurosci. 35, 3146–3154.
25. Yamada, H., Tymula, A., Louie, K., and Glimcher, P.W. (2013). Thirst-dependent risk preferences in monkeys identify a primitive form of wealth. Proc. Natl. Acad. Sci. USA 110, 15788–15793.
26. von Neumann, J., and Morgenstern, O. (2007). Theory of Games and Economic Behavior (Princeton University Press).
27. Scott, B.B., Constantinople, C.M., Erlich, J.C., Tank, D.W., and Brody, C.D. (2015). Sources of noise during accumulation of evidence in unrestrained and voluntarily head-restrained rats. eLife 4, e11308.
28. Gonzalez, R., and Wu, G. (1999). On the shape of the probability weighting function. Cognit. Psychol. 38, 129–166.
29. Abdellaoui, M., Bleichrodt, H., and L'Haridon, O. (2008). A tractable method to measure utility and loss aversion under prospect theory. J. Risk Uncertainty 36, 245.
30. Barberis, N. (2013). Thirty years of prospect theory in economics: A review and assessment. J. Econ. Perspect. 27, 173–196.
31. Köszegi, B., and Rabin, M. (2007). Reference-dependent risk attitudes. Am. Econ. Rev. 97, 1047–1073.
32. Köszegi, B., and Rabin, M. (2006). A model of reference-dependent preferences. Q. J. Econ. 121, 1133–1165.
33. Bleichrodt, H., Pinto, J.L., and Wakker, P.P. (2001). Making descriptive use of prospect theory to improve the prescriptive use of expected utility. Manage. Sci. 47, 1498–1514.
34. van Osch, S.M.C., van den Hout, W.B., and Stiggelbout, A.M. (2006). Exploring the reference point in prospect theory: gambles for length of life. Med. Decis. Making 26, 338–346.
35. Khaw, M.W., Glimcher, P.W., and Louie, K. (2017). Normalized value coding explains dynamic adaptation in the human valuation process. Proc. Natl. Acad. Sci. USA 114, 12696–12701.
36. Hunter, L.E., and Gershman, S.J. (2018). Reference-dependent preferences arise from structure learning. bioRxiv. https://doi.org/10.1101/252692.
37. Constantino, S.M., and Daw, N.D. (2015). Learning the opportunity cost of time in a patch-foraging task. Cogn. Affect. Behav. Neurosci. 15, 837–853.
38. Niv, Y., Edlund, J.A., Dayan, P., and O'Doherty, J.P. (2012). Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J. Neurosci. 32, 551–562.
39. Akrami, A., Kopec, C.D., Diamond, M.E., and Brody, C.D. (2018). Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554, 368–372.
40. Busse, L., Ayaz, A., Dhruv, N.T., Katzner, S., Saleem, A.B., Schölvinck, M.L., Zaharia, A.D., and Carandini, M. (2011). The detection of visual contrast in the behaving mouse. J. Neurosci. 31, 11351–11361.
41. Kagel, J.H., Battalio, R.C., and Green, L. (1995). Economic Choice Theory: An Experimental Analysis of Animal Behavior (Cambridge University Press).
42. Charnov, E.L. (1976). Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, 129–136.
43. Kacelnik, A. (1984). Central place foraging in starlings (Sturnus vulgaris). I. Patch residence time. J. Anim. Ecol. 53, 283–299.
44. Stephens, D.W., and Krebs, J.R. (1986). Foraging Theory (Princeton University Press).
45. Ollason, J.G. (1980). Learning to forage—optimally? Theor. Popul. Biol. 18, 44–56.
46. Steiner, A.P., and Redish, A.D. (2014). Behavioral and neurophysiological correlates of regret in rat decision-making on a neuroeconomic task. Nat. Neurosci. 17, 995–1002.
47. Sweis, B.M., Thomas, M.J., and Redish, A.D. (2018). Mice learn to avoid regret. PLoS Biol. 16, e2005853.
48. Sweis, B.M., Abram, S.V., Schmidt, B.J., Seeland, K.D., MacDonald, A.W., 3rd, Thomas, M.J., and Redish, A.D. (2018). Sensitivity to "sunk costs" in mice, rats, and humans. Science 361, 178–181.
49. Wikenheiser, A.M., and David Redish, A. (2012). Sunk costs account for rats' decisions on an intertemporal foraging task. BMC Neurosci. 13, 63.
50. McCoy, A.N., and Platt, M.L. (2005). Risk-sensitive neurons in macaque posterior cingulate cortex. Nat. Neurosci. 8, 1220–1227.
51. Hayden, B.Y., and Platt, M.L. (2007). Temporal discounting predicts risk sensitivity in rhesus macaques. Curr. Biol. 17, 49–53.
52. So, N.Y., and Stuphorn, V. (2010). Supplementary eye field encodes option and action value for saccades with variable reward. J. Neurophysiol. 104, 2634–2653.
53. Chen, X., and Stuphorn, V. (2018). Inactivation of medial frontal cortex changes risk preference. Curr. Biol. 28, 3709.
54. Schildberg-Hörisch, H. (2018). Are risk preferences stable? J. Econ. Perspect. 32, 135–154.
55. Frey, R., Pedroni, A., Mata, R., Rieskamp, J., and Hertwig, R. (2017). Risk preference shares the psychometric structure of major psychological traits. Sci. Adv. 3, e1701381.
56. Kacelnik, A., and Bateson, M. (1996). Risky theories—The effects of variance on foraging decisions. Am. Zool. 36, 402–434.
57. Lakshminarayanan, V.R., Keith Chen, M., and Santos, L.R. (2011). The evolution of decision-making under risk: Framing effects in monkey risk preferences. J. Exp. Soc. Psychol. 47, 689–693.
58. Pompilio, L., and Kacelnik, A. (2005). State-dependent learning and suboptimal choice: when starlings prefer long over short delays to food. Anim. Behav. 70, 571–578.
59. Pompilio, L., Kacelnik, A., and Behmer, S.T. (2006). State-dependent learned valuation drives choice in an invertebrate. Science 311, 1613–1615.
60. Marsh, B. (2004). Energetic state during learning affects foraging choices in starlings. Behav. Ecol. 15, 396–399.
61. Crespi, L.P. (1942). Quantitative variation of incentive and performance in the white rat. Am. J. Psychol. 55, 467.
62. Zeaman, D. (1949). Response latency as a function of the amount of reinforcement. J. Exp. Psychol. 39, 466–483.
63. Krähmer, D., and Stone, R. (2011). Anticipated regret as an explanation of uncertainty aversion. Econom. Theory 52, 709–728.
64. Chen, M.K., Keith Chen, M., Lakshminarayanan, V., and Santos, L.R. (2006). How basic are behavioral biases? Evidence from capuchin monkey trading behavior. J. Polit. Econ. 114, 517–537.
65. Gul, F. (1991). A theory of disappointment aversion. Econometrica 59, 667–686.
66. Sayman, S., and Öncüler, A. (2005). Effects of study design characteristics on the WTA–WTP disparity: A meta analytical framework. J. Econ. Psychol. 26, 289–312.
67. Dhar, R., and Wertenbroch, K. (1999). Consumer choice between hedonic and utilitarian goods. J. Marketing Res. XXXVII, 60–71.
68. Heath, T.B., Ryu, G., Chatterjee, S., McCarthy, M.S., Mothersbaugh, D.L., Milberg, S., and Gaeth, G.J. (2000). Asymmetric competition in choice and the leveraging of competitive disadvantages. J. Consum. Res. 27, 291–308.
69. Johnson, E.J., Gächter, S., and Herrmann, A. (2006). Exploring the Nature of Loss Aversion. IZA Discussion Papers 2015. Institute for the Study of Labor (IZA).
70. Plonsky, O., and Erev, I. (2017). Learning in settings with partial feedback and the wavy recency effect of rare events. Cognit. Psychol. 93, 18–43.
71. Fiorillo, C.D., Tobler, P.N., and Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902.
72. Morris, G., Arkadir, D., Nevet, A., Vaadia, E., and Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143.
73. Tobler, P.N., Fiorillo, C.D., and Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645.
74. Kobayashi, S., and Schultz, W. (2008). Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846.
75. Day, J.J., Jones, J.L., Wightman, R.M., and Carelli, R.M. (2010). Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs. Biol. Psychiatry 68, 306–309.
76. Stauffer, W.R., Lak, A., and Schultz, W. (2014). Dopamine reward prediction error responses reflect marginal utility. Curr. Biol. 24, 2491–2500.
77. Keramati, M., and Gutkin, B. (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife 3. Published online December 2, 2014. https://doi.org/10.7554/eLife.04811.
78. Brunton, B.W., Botvinick, M.M., and Brody, C.D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science 340, 95–98.
79. Hanks, T.D., Kopec, C.D., Brunton, B.W., Duan, C.A., Erlich, J.C., and Brody, C.D. (2015). Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520, 220–223.



STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Experimental Models: Organisms/Strains
Rat: Long Evans | Taconic | RRID: RGD_1566430
Rat: Sprague Dawley | Taconic | RRID: RGD_1566440
Rat: Long Evans | Hilltop | http://www.hilltoplabs.com/
Rat: Long Evans | Harlan | RRID: RGD_5508398
Rat: Pvalb-iCre | University of Missouri RRRC | RRID: RGD_10412329
Software and Algorithms
MATLAB | MathWorks | RRID: SCR_001622
Behavioral control software | Bcontrol | http://brodywiki.princeton.edu/bcontrol/index.php/Main_Page

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Christine Constanti-
nople ([email protected]). Transgenic (Pvalb-iCre)2Ottc rats (n = 5) were obtained by an MTA from University of Missouri
RRRC.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Subjects
A total of 39 male rats between the ages of 6 and 24 months were used for this study, including 35 Long-Evans and 4 Sprague-Dawley rats (Rattus norvegicus). The Long-Evans cohort also included LE-Tg (Pvalb-iCre)2Ottc rats (n = 5) made at NIDA/NIMH and obtained
from the University of Missouri RRRC (transgenic line 0773). These are BAC transgenic rats expressing Cre recombinase in parval-
bumin expressing neurons. Animal use procedures were approved by the Princeton University Institutional Animal Care and Use
Committee (IACUC #1853) and carried out in accordance with National Institutes of Health standards.
Rats were typically housed in pairs or singly; rats that trained during the day were housed in a reverse light cycle room. Some rats
trained overnight, and were not housed with a reverse light cycle. Access to water was scheduled to within-box training, 2-4 hours per
day, usually 7 days a week, and between 0 and 1 hour ad lib following training.

METHOD DETAILS

Behavioral training
Rats were trained in a high-throughput facility using a computerized training protocol. Rats were trained in operant training boxes with
three nose ports. When an LED from the center port was illuminated, the animal could initiate a trial by poking his nose in that port;
upon trial initiation the center LED turned off. While in the center port, rats were continuously presented with a train of randomly timed
clicks from a left speaker and, simultaneously, a different train of clicks from a right speaker. The click trains were generated by Pois-
son processes with different underlying rates [78, 79]; the rates conveyed the water volume baited at each side port. After a variable
pre-flash interval ranging from 0 to 350ms, rats were also presented with light flashes from the left and right side ports; the number of
flashes conveyed reward probability at each port. Each flash was 20ms in duration; flashes were presented in fixed bins, spaced
every 250ms, to avoid perceptual fusion of consecutive flashes [27]. After a variable post-flash delay period from 0 to 500ms, the
end of the trial was cued by a go sound and the center LED turning back on. The animal was then free to choose the left or right side port, and potentially collect reward.
The trials were self-paced: on trials when rats did not receive reward, they were able to initiate another trial immediately. However, if
rats terminated center fixation prematurely, they were penalized with a white noise sound and a time out penalty. Since rats dispro-
portionately terminated trials offering low volumes, we scaled the time out penalty based on the minimum reward offered. The
time out penalties were adjusted independently for each rat to minimize terminated trials (as an example, several rats were penalized
with 6 s time-outs for terminating trials offering a minimum of 6 μL, 4.5 s for terminating trials offering a minimum of 12 μL, 3 s for terminating trials offering a minimum of 24 μL, and 1.5 s for terminating trials offering a minimum of 48 μL).



In this task, the rats were required to reveal their preference between safe and risky rewards. To determine when rats were
sufficiently trained to understand the meaning of the cues in the task, we evaluated the ‘‘efficiency’’ of their choices as follows.
For each training session, we computed the average expected value per trial of an agent that chose randomly, and a perfect expected
value maximizer, or an agent that always chose the side with the greater expected value. We compared the expected value per trial
from the rat’s choices relative to these lower and upper bounds. Specifically, the efficiency was calculated as follows:
efficiency = 0.5 + 0.5 × (ratEV/trial − randEV/trial) / (maxEV/trial − randEV/trial)   (11)

The threshold for analysis was the median performance of all sessions minus 1.5 times the interquartile range of performance across
the second half of all sessions. Once performance surpassed this threshold, it was typically stable across months. Occasional days
with poor performance were usually due to hardware malfunctions in the rig. Days in which performance was below threshold were
excluded from analysis.
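As a concrete illustration, the efficiency of Equation 11 for one session could be computed along these lines (a sketch with illustrative variable names, assuming per-trial vectors of offered volumes, probabilities, and choices are available):

% Sketch of the session efficiency in Equation 11 (illustrative variable names).
% xL, xR: offered volumes; pL, pR: offered probabilities; choseR: 1 if the rat
% chose the right port on each trial (all column vectors for one session).
evL = xL .* pL;  evR = xR .* pR;                      % objective expected values
ratEV  = mean(choseR .* evR + (1 - choseR) .* evL);   % rat's mean EV per trial
randEV = mean(0.5 * (evL + evR));                     % agent choosing randomly
maxEV  = mean(max(evL, evR));                         % perfect EV maximizer
efficiency = 0.5 + 0.5 * (ratEV - randEV) / (maxEV - randEV);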

QUANTIFICATION AND STATISTICAL ANALYSIS

Behavioral model
We fit a behavioral model separately for each rat (see Box 1 for description of the model). We used MATLAB’s constrained minimi-
zation function fmincon to minimize the sum of the negative log likelihoods with respect to the model parameters. 20 random seeds
were used in the maximum likelihood search for each rat; parameter values with the maximum likelihood of these seeds were deemed
the best fit parameters. When evaluating model performance (e.g., Figure 1E), we performed 5-fold cross-validation and evaluated
the predictive power of the model on the held-out test sets.
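A skeletal version of this maximum likelihood search might look as follows (a sketch only: the bounds and starting ranges are illustrative, the helper box1_choice_prob from the sketch after Box 1 is assumed, and the trial-history bias parameters and cross-validation are omitted):

% Sketch of maximum likelihood fitting for one rat (cross-validation omitted).
% choseR is a 0/1 vector of choices; xR, pR, xL, pL are per-trial offers.
pFun  = @(theta) box1_choice_prob(theta, xR, pR, xL, pL, 0);
negLL = @(theta) -sum(choseR .* log(pFun(theta)) + (1 - choseR) .* log(1 - pFun(theta)));
lb = [0 0 0 0 0];  ub = [2 5 5 0.5 20];     % illustrative parameter bounds
bestNLL = inf;
for seed = 1:20                             % 20 random initializations
    theta0 = lb + rand(1, numel(lb)) .* (ub - lb);
    [thetaHat, fval] = fmincon(negLL, theta0, [], [], [], [], lb, ub);
    if fval < bestNLL, bestNLL = fval; bestTheta = thetaHat; end
end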
We initially evaluated three different parametric forms of the probability weighting function, the one- and two-parameter Prelec
models and the linear in log-odds model (see below) [24, 28]. We compared the different parametric forms using Akaike Information
Criterion (AIC), AIC = 2k + 2nLL, where k is the number of parameters, and nLL is the negative log likelihood of the model. AIC
favored the two-parameter Prelec model for nearly all rats, although some rats were equally well-fit by the linear in log-odds model
(data not shown). Therefore, we implemented the two-parameter Prelec model.
One-parameter Prelec: w(p) = e^{−(−ln(p))^δ},   (12)

where p is the true probability, and δ is a free parameter. δ controls the curvature of the weighting function; its crossover point is fixed at 1/e.

Two-parameter Prelec: w(p) = e^{−β(−ln(p))^δ},   (13)

where p is the true probability, and β and δ are free parameters. δ primarily controls the curvature and β primarily controls the elevation of the weighting function.

Linear in log-odds: w(p) = δp^γ / (δp^γ + (1 − p)^γ),   (14)

where p is the true probability and γ and δ are free parameters. γ primarily controls the curvature of the weighting function and δ controls the elevation.
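The three parameterizations in Equations 12–14 can be visualized directly; the parameter values below are arbitrary illustrations, not fitted values:

% Sketch comparing the three probability weighting parameterizations.
p = 0.01:0.01:1;
delta1 = 0.6;  w1 = exp(-(-log(p)).^delta1);                     % Equation 12
beta = 0.8; delta2 = 0.6;  w2 = exp(-beta .* (-log(p)).^delta2); % Equation 13
gamma = 0.6; d = 0.8;
w3 = (d .* p.^gamma) ./ (d .* p.^gamma + (1 - p).^gamma);        % Equation 14
plot(p, w1, p, w2, p, w3, p, p, 'k--');
xlabel('objective probability p'); ylabel('w(p)');
legend('one-parameter Prelec', 'two-parameter Prelec', 'linear in log-odds', 'identity');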

Alternative models
We compared the prospect theory model to a number of alternative models. The Expected Utility Theory model (EUT) has the same
form as the prospect theory model, except that the subjective value on each side is the product of objective probability and subjective
utility (see Figure S2A):
V_R = u(x_R)·p_R   (15)
V_L = u(x_L)·p_L.   (16)

The linear weighting model fit different weights to flashes and clicks, before combining them (see Figure S1G):

u(x) = c·x   (17)
w(p) = d·p,   (18)

where c and d are constants, x is the click rate and p is the number of flashes presented to the animal on each side. The value on each side is the product of linearly weighted flashes and clicks:

V = u(x)·w(p).   (19)



We next included sensory noise as part of a perceptual strategy. Previously, we have used a signal-detection theory (SDT) model
to estimate rats’ perceptual variability (noise) in estimating numbers of flashes and clicks; we found they exhibit a property called
scalar variability, meaning that the standard deviation in their estimate grows linearly with the mean [27]. We implemented four
different signal-detection theory models that instantiated scalar noise, according to this work. The models differ in the decision
rules they apply. The models assume that on each trial, the rats’ estimate of the number of flashes (clicks) on each side is a
random variable drawn from a normal distribution, the mean of which corresponds to the actual number of flashes (clicks) pre-
sented to the animal. According to scalar variability, the standard deviation is linearly related to the number of flashes (clicks).
There are two free parameters that define this linear relationship; we fit separate linear scaling relationships to the estimation
of clicks and flashes:
σ_F = m_F·x + b_F   (20)
σ_C = m_C·x + b_C   (21)

where m_F, m_C, b_F, b_C are free parameters. x is the number of flashes (clicks) presented to the rat on each side. For the first two SDT models, we compute choice probabilities based on the flash difference (ΔF) and click difference (ΔC) separately, where these choice probabilities are calculated as follows, according to [27]:

P(went right | ΔF) = ∫_0^∞ N(R_F − L_F; √(σ_{R_F}² + σ_{L_F}²)) d(R_F − L_F)   (22)

P(went right | ΔC) = ∫_0^∞ N(R_C − L_C; √(σ_{R_C}² + σ_{L_C}²)) d(R_C − L_C)   (23)

R_F (R_C) and L_F (L_C) are the number of right and left flashes (clicks) presented to the rat on each trial, and the σ terms are the noise terms defined in Equations 20 and 21. One model (SDT1) assumes that the rat's choice is given by the average of these probabilities (see Figure S2C). Another model (SDT2) assumes that the rat's choice is given by the most informative cue on each trial (the choice probability most different from 0.5; see Figure S2D).
Alternatively, it’s possible that the rats combine the noisy estimates of flashes and clicks on each side. Therefore, we evaluated two
additional models parameterized as follows:
P(went right | right ev) = ∫_0^∞ N(R_F + R_C; √(σ_{R_F}² + σ_{R_C}²)) d(R_F + R_C)   (24)

P(went right | left ev) = 1 − ∫_0^∞ N(L_F + L_C; √(σ_{L_F}² + σ_{L_C}²)) d(L_F + L_C)   (25)

One model (SDT3) assumes that the rat’s choice is given by the average of these probabilities (Figure S2E), and the other (SDT4) as-
sumes that the rat’s choice is given by the most informative side on each trial (the choice probability most different from 0.5;
Figure S2F).
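Because the integral of a Gaussian over the positive half-line reduces to a cumulative normal, the decision rules in Equations 22 and 23 can be sketched compactly. The variable names below are illustrative, and the per-trial counts (RF, LF, RC, LC) and fitted noise parameters (mF, bF, mC, bC) are assumed to be available:

% Sketch of the flash- and click-based choice probabilities (Equations 22-23)
% and the SDT1 rule, which averages them. Names are illustrative.
sigF = @(n) mF .* n + bF;                   % scalar variability for flashes, Eq. 20
sigC = @(n) mC .* n + bC;                   % scalar variability for clicks, Eq. 21
pRightFlash = normcdf((RF - LF) ./ sqrt(sigF(RF).^2 + sigF(LF).^2));  % Eq. 22
pRightClick = normcdf((RC - LC) ./ sqrt(sigC(RC).^2 + sigC(LC).^2));  % Eq. 23
pRightSDT1 = 0.5 * (pRightFlash + pRightClick);   % SDT1: average of the two cues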

Psychometric curves
We measured rats’ psychometric performance when choosing between the safe and risky options. For these analyses, we excluded
trials where both the left and right side ports offered certain rewards. We binned the data into 11 bins of the difference in the subjective
value (inferred from the behavioral model) of the safe minus the risky option. Psychometric plots show the probability that the subjects
chose the safe option as a function of this difference (see Figure S1D). We fit a 4-parameter sigmoid of the form:
P(choose_S) = y_0 + (1 − 2a) / (1 + e^{−b(V_S − V_R − x_0)}),   (26)
where y0 , a, b, and x0 were free parameters. Parameters were fit using a gradient-descent algorithm to minimize the mean square
error between the data and the sigmoid, using the sqp algorithm in MATLAB’s constrained optimization function fmincon.
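A sketch of this fit is shown below; for brevity it uses fminsearch and squared error, whereas the analysis described above used the sqp algorithm in fmincon (dv and pSafe are illustrative names for the binned value differences and choice fractions):

% Sketch of fitting the 4-parameter sigmoid in Equation 26 by least squares.
% dv: binned values of V_S - V_R; pSafe: fraction of safe choices in each bin.
sigmoid = @(q, dv) q(1) + (1 - 2*q(2)) ./ (1 + exp(-q(3) .* (dv - q(4))));  % q = [y0 a b x0]
mse = @(q) mean((pSafe - sigmoid(q, dv)).^2);
q0 = [0 0.05 1 0];                         % illustrative starting values
qHat = fminsearch(mse, q0);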

Logistic regression to compare regressors to probability weighting functions


We fit a logistic regression model with a separate regressor for each probability the rat may have been offered (0 to 1 in 0.1 incre-
ments), plus a constant term. To compare the regressors to the parametric fits, we normalized the regressors for each probability
by subtracting the minimum and dividing by the maximum regressor value, so they ranged from 0 to 1 (Figure 2G). We computed
the mean square error between these normalized regressor values and the probability weighting functions (Figure 2H). The model
was fit using MATLAB’s function glmfit.
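A sketch of this regression is below (illustrative; it assumes a trials-by-11 indicator matrix X coding which risky probability was offered on each trial and a 0/1 vector choseRisky; glmfit adds the constant term by default):

% Sketch of the logistic regression with one regressor per offered probability.
b = glmfit(X, choseRisky, 'binomial', 'link', 'logit');
w = b(2:end);                                % one weight per probability (0:0.1:1)
wNorm = (w - min(w)) ./ (max(w) - min(w));   % rescale to range from 0 to 1, as in Figure 2G
% wNorm can then be compared to w(p) via mean squared error, as in Figure 2H.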



Certainty equivalents
Non-parametric estimate
We estimated rat’s certainty equivalents by evaluating their psychometric performance (%Chose risky) for each gamble of 48 mL, and
estimating the value of the psychometric curve at which performance was at 50% (Figure 2I). To do this, we fit a line to the two points
of the psychometric curve above and below chance level using MATLAB’s regress.m function, and interpolated the value of that line
that would correspond to 50%.
Analytic expression for CE from the model fits
We compared our estimates of rats' certainty equivalents from their behavioral data to an analytic expression from the subjective probability and utility functions we obtained from the model. We define the certainty equivalent, x̃, as the guaranteed reward equal to a gamble, x with probability p. In the case of linear probability weighting, we express this as follows:

p·x^α = x̃^α
ln(p) + α·ln(x) = α·ln(x̃)
ln(p) = α·ln(x̃/x)
(1/α)·ln(p) = ln(x̃/x)
p^{1/α} = x̃/x   (27)

For nonlinear probability weighting, substituting w(p) for p yields an analytic expression for the certainty equivalent from the exponent of the utility function (α) and the probability weighting function (also see [29]).
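Putting the two estimates side by side, a sketch might read as follows (illustrative names: pRisky is the gamble probability, alpha/beta/delta are fitted model parameters, and certainVolumes/pctChoseRisky hold the psychometric data for one gamble; the interpolation step is a simplification of the line fit described above):

% Sketch comparing model-predicted and measured certainty equivalents
% for gambles of x = 48 uL (illustrative variable names).
x = 48;
w = @(p) exp(-beta .* (-log(p)).^delta);     % fitted probability weighting
ceModel = w(pRisky).^(1 ./ alpha) .* x;      % analytic CE implied by Equation 27
% Non-parametric estimate for one gamble: certain volume at 50% risky choices.
% (Simplified: assumes a monotonic psychometric curve; the text above fits a
% line to the two points bracketing 50% instead.)
ceData = interp1(pctChoseRisky, certainVolumes, 50, 'linear');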

Behavioral model with reference point


The behavioral model with the reference point (see Figure 3) was similar to the behavioral model described above, except for elaborations of the subjective utility function u(x) and subjective value (V_R, V_L). We modified the subjective utility function to include a dynamic reference point, r, below which value was treated negatively (as a loss). The relative amplitude of losses versus gains was controlled by the scale parameter κ:

u(x) = (x − r)^α if x > r;  −κ(r − x)^α if x < r,   (28)

where, as before, α is the exponent of the utility function, and x is the offered reward. We also reparameterized subjective value. The risky prospect offers two possible outcomes: x with probability p, and 0 with probability 1 − p. In the absence of a reference point, the zero reward outcome (0, 1 − p) does not influence choice (0^α = 0). However, if r > 0, the zero reward outcome can be perceived as a loss. Therefore, in the reference point model, subjective value was reparameterized to incorporate this possible outcome of the gamble:

V_R = u(x_R)·w(p_R) + u(0)·w(1 − p_R)   (29)
V_L = u(x_L)·w(p_L) + u(0)·w(1 − p_L)   (30)

We parameterized the reference point, r, to take on two discrete values depending on whether the previous trial was rewarded or not. There were two additional free parameters, y and m, that could account for asymmetric effects of rewarded and unrewarded trials:

r(t) = m if t − 1 was rewarded;  y if t − 1 was not rewarded.   (31)

We constrained r > 0.
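A compact sketch of the reference-dependent utility and value terms (Equations 28–30) could be written as follows; the anonymous-function form and variable names are illustrative:

% Sketch of reference-dependent utility (Equation 28) and risky-side value
% (Equation 29), given a current reference point r. Names are illustrative.
u = @(x, r, alpha, kappa) max(x - r, 0).^alpha - kappa .* max(r - x, 0).^alpha;
w = @(p, beta, delta) exp(-beta .* (-log(p)).^delta);
VR = u(xR, r, alpha, kappa) .* w(pR, beta, delta) ...
   + u(0, r, alpha, kappa) .* w(1 - pR, beta, delta);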

Behavioral model integrating Prospect Theory and Reinforcement Learning


This behavioral model was similar to the prospect theory model, except that w(p) and u(x) were used to update the subject's value of each unique trial type based on experience. There were separate state-action value matrices for left and right choices. The entry of each matrix corresponded to a unique trial type, for each unique probability p and reward volume (μL):

V_{p,μL}(t + 1) = V_{p,μL}(t) + α_learn·(w(p)·u(x) − V_{p,μL}(t)),   (32)

where α_learn is an additional free parameter fit by the model. w(p) and u(x) are parameterized as they were in the prospect theory model, according to Equations 2 and 3 in the main text.



We also included a global bias for the entire left or right value matrix that reflected reward history as follows:

bias = β_win·u(x) if t was rewarded;  β_loss·u(x) if t was not rewarded,   (33)

where u(x) corresponds to the subjective utility of the chosen reward volume. Choice probabilities were computed according to
Equation 6 in the main text.
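A sketch of the trial-by-trial update in Equation 32 is given below. It is illustrative only: the initialization, the choice of which matrix entry to update (here, the chosen side's entry for the offered probability and volume), and the variable names are assumptions, and the choice rule and the bias of Equation 33 are omitted:

% Sketch of the value update in Equation 32, iterated over one session.
% pOffered, xOffered: probability and volume offered on the chosen side;
% choseR: 1 for right choices; alpha, beta, delta, alphaLearn: fitted parameters.
probs = 0:0.1:1;  vols = [6 12 24 48];
V.left = zeros(numel(probs), numel(vols));
V.right = zeros(numel(probs), numel(vols));
u = @(x) x.^alpha;
w = @(p) exp(-beta .* (-log(p)).^delta);
for t = 1:nTrials
    ip = round(10 * pOffered(t)) + 1;        % index into probs = 0:0.1:1
    iv = find(vols == xOffered(t));          % index into vols
    if choseR(t), side = 'right'; else, side = 'left'; end
    % prediction error in units of subjective value (w(p)*u(x)), not expected value
    rpe = w(pOffered(t)) * u(xOffered(t)) - V.(side)(ip, iv);
    V.(side)(ip, iv) = V.(side)(ip, iv) + alphaLearn * rpe;
end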

DATA AND SOFTWARE AVAILABILITY

Behavioral data are available upon request by contacting the Lead Contact, Christine Constantinople ([email protected]).

