Report
*Correspondence: [email protected]
https://doi.org/10.1016/j.cub.2019.05.013
SUMMARY

In 1979, Daniel Kahneman and Amos Tversky published a ground-breaking paper titled "Prospect Theory: An Analysis of Decision under Risk," which presented a behavioral economic theory that accounted for the ways in which humans deviate from economists' normative workhorse model, Expected Utility Theory [1, 2]. For example, people exhibit probability distortion (they overweight low probabilities), loss aversion (losses loom larger than gains), and reference dependence (outcomes are evaluated as gains or losses relative to an internal reference point). We found that rats exhibited many of these same biases, using a task in which rats chose between guaranteed and probabilistic rewards. However, prospect theory assumes stable preferences in the absence of learning, an assumption at odds with alternative frameworks such as animal learning theory and reinforcement learning [3-7]. Rats also exhibited trial history effects, consistent with ongoing learning. A reinforcement learning model in which state-action values were updated by the subjective value of outcomes according to prospect theory reproduced rats' nonlinear utility and probability weighting functions and also captured trial-by-trial learning dynamics.

RESULTS

Two key components of prospect theory are utility (rewards are evaluated by the subjective satisfaction or "utility" they provide) and probability distortion (people often overweight low and underweight high probabilities; Figure 1D). In this theory, subjective value is determined by the shapes of subjects' utility and probability weighting functions.

Learning theories provide an alternative account of subjective value. In animal learning theory, Thorndike's "Law of Effect" described the effect of reinforcers on action selection [8], and Pavlov's subsequent experiments demonstrated how animals learn to associate stimuli with rewards [9]. The Rescorla-Wagner model of classical conditioning formalized how such learning might occur [3-5]. Although these models described how animals might learn associations between stimuli, they were naturally extended to account for learning values from experience [7, 10]. Models of trial-and-error learning from animal learning theory form the basis for reinforcement learning algorithms, including temporal difference learning, which captures temporal relationships between predictors and outcomes [7, 11]. Reinforcement learning provides a powerful framework for value-based decision-making in psychology and neuroscience, in which value estimates are learned from experience and updated trial-to-trial based on prediction errors [7, 12, 13].

Reinforcement learning has had a profound impact in part because many of its components have been related to neural substrates [12, 14-17]. However, standard reinforcement learning algorithms dictate that agents learn the expected value (volume × probability) of actions or outcomes with experience [6], meaning that they will exhibit linear utility and probability weighting functions. This is incompatible with prospect theory. We found that rats exhibited signatures of both prospect theory and reinforcement learning, and we present an initial attempt to integrate these frameworks. First, we focus on prospect theory.

Most economic studies examine decisions between clearly described lotteries (i.e., "decisions from description"). Studies of risky choice in rodents, however, typically examine decisions between prospects that are learned over time (i.e., "decisions from experience"), which are difficult to reconcile with prospect theory [18-20]. We designed a task in which reward probability and amount are communicated by sensory evidence, eliciting decisions from description rather than experience. This enabled behavioral economic approaches, such as estimating utility functions.

Rats initiated a trial by nose-poking in the center port of a three-port wall. Light flashes were presented from left and right side ports, and the number of flashes conveyed the probability of water reward at each port. Simultaneously, auditory clicks were presented from left and right speakers, and click rate conveyed the volume of water reward baited at each port (Figures 1A and 1B). One port offered a guaranteed or safe reward, and the other offered a risky reward with an explicitly cued probability. The safe and risky ports (left or right) varied randomly. One of four water volumes could be the guaranteed or risky reward (6, 12, 24, 48 μL); risky reward probabilities ranged from 0 to 1, in increments of 0.1 (Figures 1A and 1B).

High-throughput training generated 36 trained rats and many tens of thousands of choices per rat, enabling detailed behavioral quantification.
[Figure 1. (A) Rat behavioral task: nose in center poke (center LED, ~2.6-3.35 s fixation); left/right clicks convey reward volume; left/right flashes convey reward probability; variable delays; nose in side poke. (B) One of three stimulus-reward mappings: number of flashes (0-10) maps to reward probability (0-1.0); click rate maps to reward volume (6, 12, 24, or 48 μL). (C) Psychometric data for rat J266 (56,750 trials): % chose safe versus risky-side probability, for safe-side volumes of 6, 12, 24, and 48 μL (p = 1). (D) Prospect theory behavioral model: Value = u(x)·w(p); u(x) has gain and loss limbs around a reference point r; ΔValue = Value_R - Value_L; P(Choose_R) = Logistic(ΔValue + bias). (E) Model predictions for J266 (5-fold cross-validation), same format as (C).]
Rats demonstrated they learned the meaning of the cues by frequently "opting out" of trials offering smaller rewards, leaving the center poke despite incurring a time-out penalty and white-noise sound (Figures S1A-S1C). This indicated that they associated the click rates with water volumes, instead of relying on a purely perceptual strategy. It is possible that opting out, which persisted despite longer time-out penalties for low-volume trials (STAR Methods), reflected reward-rate-maximizing strategies [21-23]. Rats favored prospects with higher expected value (Figures 1C and S1).

We used a standard choice model [24, 25] to estimate each rat's utility and probability weighting functions according to prospect theory (Box 1; Figures 1D and S1). The model predicted rats' choices on held-out data (Figure 1E). It outperformed alternative models, including one imposing linear probability weighting (according to Expected Utility Theory [26]), one that fit linear weights for probabilities and volumes, and several models implementing purely perceptual strategies with sensory noise (Figures S2A-S2F).

Concave utility (the utility function exponent α < 1) produces diminishing marginal sensitivity, in which subjects are less sensitive to differences in larger rewards. Rats' median α was 0.54, indicating concave utility, like humans [2] (Figures 2A and S1F). To test for diminishing marginal sensitivity, we compared performance on trials offering guaranteed outcomes of 0 or 24 μL, and 24 or 48 μL (Figures 2B and 2C). Concave utility implies that 24 and 48 μL are less discriminable than 0 and 24 μL (Figure 2C). Indeed, the concavity of the utility function was correlated with reduced discriminability on trials offering 24 and 48 μL (Figures 2D and 2E; p = 1.03e-7, Pearson's correlation). This was true whether the guaranteed outcome of 0 included trials offering 24 μL with p = 0 (Figures 2D and 2E) or all volumes with p = 0 (Figures S2G and S2H). Our choice set did not permit this analysis with trials offering only non-zero rewards, as these trials (24 versus 0, 48 versus 24) were the only ones with equal reward differences. This suggests that rats, like humans, exhibit diminishing marginal sensitivity.

Rats' probability weighting functions revealed overweighting of probabilities (Figure 2F). A logistic regression model that parameterized each probability to predict choice yielded regression weights mirroring the probability weighting functions (Figures 2G and 2H).
Box 1

We modeled the probability that the rat chose the right side by a logistic function whose argument was the difference between the subjective values of each option $(V_R - V_L)$ plus a trial history-dependent term. Subjective utility was parameterized as:

$$u(x) = \begin{cases} (x-r)^{\alpha} & \text{if } x > r \\ -\kappa(r-x)^{\alpha} & \text{if } x < r, \end{cases} \quad (1)$$

where $\alpha$ is a free parameter and $x$ is reward volume. $r$ is the reference point, which determines whether rewards are perceived as gains or losses. We first consider the case where $r = 0$, so

$$u(x) = x^{\alpha}. \quad (2)$$

Probabilities were distorted by a two-parameter Prelec weighting function (see STAR Methods, Equation 13):

$$w(p) = e^{-\beta(-\ln(p))^{\delta}}, \quad (3)$$

where $\beta$ and $\delta$ are free parameters and $p$ is the objective probability offered. Combining utility and probability yields the subjective value for each option:

$$V_R = u(x_R)\,w(p_R) \quad (4)$$

$$V_L = u(x_L)\,w(p_L) \quad (5)$$

These were normalized by the max over trials and transformed into choice probabilities via a logistic function:

$$P(\mathrm{Choose}_R) = \iota + \frac{1 - 2\iota}{1 + e^{-\lambda(V_R - V_L) + \mathrm{bias}}}, \quad (6)$$

where $\iota$ captures stimulus-independent variability (lapse rate) and $\lambda$ determines the sensitivity of choices to the difference in subjective value $(V_R - V_L)$. The bias term was composed of three possible parameters, depending on trial history:

$$\mathrm{bias} = \begin{cases} \pm\eta_1 & \text{if } t-1 \text{ was safe L/R choice} \\ \pm\eta_2 & \text{if } t-1 \text{ was risky L/R reward} \\ \pm\eta_3 & \text{if } t-1 \text{ was risky L/R miss.} \end{cases} \quad (7)$$
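To make the box concrete, here is a minimal Python sketch of Equations 1-7 for the r = 0 case. The median values α = 0.54 and κ = 1.66 come from the paper's fits; β, δ, λ, and the lapse rate are illustrative placeholders, not fitted values.

```python
import numpy as np

def utility(x, alpha=0.54, kappa=1.66, r=0.0):
    """Subjective utility, Equation 1 (reduces to x**alpha when r = 0)."""
    d = np.asarray(x, dtype=float) - r
    scale = np.where(d >= 0, 1.0, kappa)          # losses scaled by kappa
    return np.sign(d) * scale * np.abs(d) ** alpha

def prelec(p, beta=1.0, delta=0.65):
    """Two-parameter Prelec probability weighting, Equation 3."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return np.exp(-beta * (-np.log(p)) ** delta)

def p_choose_right(xR, pR, xL, pL, lam=5.0, lapse=0.05, bias=0.0):
    """Choice probability, Equation 6, with lapse rate (iota) and a
    trial history-dependent bias term (Equation 7)."""
    VR = utility(xR) * prelec(pR)                 # Equation 4
    VL = utility(xL) * prelec(pL)                 # Equation 5
    return lapse + (1 - 2 * lapse) / (1 + np.exp(-lam * (VR - VL) + bias))

# Risky 48 uL at p = 0.5 (right) vs. safe 12 uL (left); volumes
# normalized by the 48 uL maximum.
print(p_choose_right(1.0, 0.5, 0.25, 1.0))
```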
Control experiments indicated that nonlinear utility and probability weighting were not due to perceptual errors in estimating flashes and clicks [27] (Figures S2I-S2K).

To evaluate rats' risk attitudes, we measured the certainty equivalents (CEs) for all gambles of 48 μL [2, 28, 29]. The certainty equivalent is the guaranteed reward the rat deems equal to the gamble (Figures 2I and 2J). If it is less than the gamble's expected value, that indicates risk aversion: the subject effectively "undervalues" the gamble and will accept a smaller reward to avoid risk (Figure 2K). Conversely, if the certainty equivalent is greater than the gamble's expected value, the subject is risk seeking, and risk neutral if they are equal. Measured certainty equivalents closely matched those predicted from the model, using an analytic expression incorporating the utility and probability weighting functions (CE = w(p)^{1/α}; STAR Methods; Pearson's correlation 0.96, p = 1.58e-11; Figure 2K). This non-parametric assay further validated the model fits and revealed heterogeneous risk preferences across rats (Figure 2L; Figures S3A-S3C).
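This analytic expression follows from equating the utility of a certain volume with the subjective value of the gamble; a short derivation under the model's r = 0 parameterization, with volumes normalized by the maximum (48 μL):

$$u(CE) = w(p)\,u(x) \;\Rightarrow\; CE^{\alpha} = w(p)\,x^{\alpha} \;\Rightarrow\; CE = x\,w(p)^{1/\alpha},$$

which reduces to $CE = w(p)^{1/\alpha}$ for the normalized maximum volume ($x = 1$). For a risk-neutral agent ($\alpha = 1$, $w(p) = p$), the certainty equivalent equals the expected value $px$; measured CEs below that line indicate risk aversion.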
Although rats exhibited nonlinear utility and probability weighting, consistent with prospect theory, they also exhibited trial-by-trial learning, consistent with reinforcement learning. Fitting the model to trials following rewarded and unrewarded choices revealed systematic shifts in utility and probability weighting functions: utility functions became less concave and probability weighting functions became more elevated, reflecting an increased likelihood of risky choice following rewards (Figure 3A). This was consistent across rats, as observed in certainty equivalents from the data (p = 6.49e-5, paired t test) and model (Figure 3B; p = 2.46e-7).

Another feature of human behavior is "reference dependence": people evaluate rewards as gains or losses relative to an internal reference point. It is unclear what determines the reference point [30]; proposals include status quo wealth [1], reward expectation [31, 32], heuristics based on the prospects [33, 34], or recent experience [35, 36].

Rats demonstrated reference dependence by treating smaller rewards as losses. They exhibited win-stay and lose-switch biases: following unrewarded trials, rats were more likely to switch ports (Figure 3C). Surprisingly, most rats exhibited "switch" biases after receiving 6 or 12 μL, consistent with treating these outcomes as losses. The "win or lose" threshold (i.e., reference point) was experience dependent: a separate cohort of rats (n = 3) trained with doubled rewards (12-96 μL) exhibited lose-switch biases after receiving 12 or 24 μL (Figure 3D).
Figure 2. Non-parametric Analyses Confirm Nonlinear Utility and Probability Weighting and Reveal Diverse Risk Attitudes
(A) Model fits of subjective utility functions for each rat, normalized by the maximum volume (48 μL).
(B) Schematic linear utility function: the perceptual distance (or discriminability, d') between 0 and 24 μL is the same as between 24 and 48 μL.
(C) Schematic concave utility function: 24 and 48 μL are less discriminable than 0 and 24 μL.
(D) One rat's performance on trials with guaranteed outcomes of 0 versus 24 μL (green) or 24 versus 48 μL (purple). A performance ratio on these trials ("d' ratio") less than 1 indicates diminishing sensitivity. Error bars are binomial confidence intervals.
(E) The concavity of the utility function (α) is significantly correlated with reduced discriminability of larger rewards (p = 1.03e-7, Pearson's correlation). Pink circle is the rat from (D).
(F) Model fits of probability weighting functions.
(G) Weights from a logistic regression parameterizing each probability match the probability weighting function for one rat. Error bars are SEM for each regression coefficient.
(H) Mean squared error between regression weights and parametric fits for each rat (mean MSE = 0.006, in units of probability).
(I and J) To obtain certainty equivalents, we measured psychometric functions for each probability of receiving 48 μL and estimated the certain volume at which performance = 50%.
(K) Measured (blue) and model-predicted (red) certainty equivalents from one rat indicate systematic undervaluing of the gamble, or risk aversion. Error bars for model prediction are 95% confidence intervals of parameters from 5-fold cross-validation. Data are mean ± SEM for left-out test sets.
(L) Distribution of certainty equivalent areas computed using the analytic expression from model fits. Measured certainty equivalents were similar (Figure S3C).
See also Figures S2 and S3.
The win or lose threshold was often reward-history dependent (Figure 3E). Therefore, we parameterized a reference point, r, as taking one of two values depending on whether the previous trial was rewarded (see STAR Methods). Rewards less than r were negative (losses). The relative amplitude of losses versus gains was controlled by the parameter κ (Equation 1; Figure 1D; Box 1). Subjective value was reparameterized to include the zero outcome of the gamble, which is a loss when r > 0:

$$V_R = u(x_R)\,w(p_R) + u(0)\,w(1 - p_R) \quad (8)$$

$$V_L = u(x_L)\,w(p_L) + u(0)\,w(1 - p_L) \quad (9)$$
Model comparison (Akaike information criterion, AIC) favored the reference point model for all rats (Figures S3D and S3E). We also parameterized the reference point as reflecting several trials back with an exponential decay, where the time constant was a free parameter (see STAR Methods). For most rats (20/36, 77%), this did not significantly improve model performance compared to the reference point reflecting one trial back, although it was a better fit for a minority of rats with longer integration time constants over trials (Figures S3F-S3H). For the sake of simplicity, and because it generally provided a better fit, we focused on the "one-trial-back" reference point model. Interestingly, the reference point from the model was not significantly correlated with the average reward rate for each rat [37]; this was true regardless of whether opt-out trials were included in estimates of average reward per trial (Pearson's correlation, p > 0.05).

With concave utility, rats should exhibit sharper psychometric performance when the reference point is high (and rewards are more discriminable; Figure 3F). Indeed, performance was closer to ideal when the reference point was high (Figure 3G; mean squared error [MSE] between psychometric and ideal performance was 0.143 for the low reference point versus 0.122 for the high reference point, p = 3.4e-5, paired t test across rats).

A loss parameter κ > 1 indicates "loss aversion," or a greater sensitivity to losses than gains (Equation 1). We observed a median κ of 1.66 (Figure 3H). There was variability across rats: 16/36 rats (44%) were not loss averse but were more sensitive to gains (κ < 1). Still, the median κ across rats suggests similarity to humans (Figure 3H).

Prospect theory does not account for how agents learn subjective values from experience, and we explicitly incorporated trial history parameters to account for trial-by-trial learning (Equations 6 and 7; Figure 4A). To examine learning dynamics, we fit the model to the first and second halves of all trials, once rats achieved criterion performance (Figure S4A). There was no significant change in the parameters for the utility or probability weighting functions (Figure S4B). Rats showed a significant increase in the softmax parameter with training, indicating increased sensitivity to value differences, and a decrease in one of the trial history parameters, η1, indicating reduced win-stay biases (Figure S4).

Reinforcement learning describes an adaptive process in which animals learn the value of states and actions. This framework, however, implies linear utility and probability weighting. We simulated choices of a Q-learning agent, which learned the state-action values of each unique trial type. Fitting the prospect theory model to these simulated choices recovered linear utility and probability weighting functions (regardless of learning rate; Figure 4B). This is expected: since trial sequences were randomized (i.e., each trial was independent), basic reinforcement learning will learn the expected value of each option [6] and will resolve to linear utility and probability weighting functions.
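The claim that a basic delta-rule learner converges to expected value when trials are independent can be checked in a few lines; a minimal Python sketch (illustrative, not the paper's simulation code):

```python
import numpy as np

rng = np.random.default_rng(0)
p, x = 0.3, 48.0          # one trial type: 48 uL baited with p = 0.3
alpha_learn = 0.2         # learning rate
V = 0.0
for _ in range(10_000):
    reward = x if rng.random() < p else 0.0
    V += alpha_learn * (reward - V)   # delta rule on received outcomes
print(V, p * x)           # V fluctuates around the expected value, 14.4
```

Because the learned value tracks p × x for every (probability, volume) pair, a prospect theory model fit to such an agent's choices recovers u(x) proportional to x and w(p) approximately equal to p.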
Figure 4. Integrating Prospect Theory and Reinforcement Learning Captures Nonlinear Subjective Functions and Learning
(A) Prospect theory model predictions for each rat without the trial history parameters (η1-η3, see Box 1) do not account for win-stay and lose-switch trial history effects. Inclusion of these parameters accounts for these effects.
(B) The prospect theory model fit to simulated choices from a basic reinforcement learning agent yields linear utility and probability weighting functions over a range of generative learning rates (0.2, 0.4, 0.6, 0.8, 1.0, overlaid).
(C) Schematic of the model incorporating prospect theory and reinforcement learning.
(D) The hybrid model described in (C) accounts for win-stay and lose-switch effects.
(E) The model recovers nonlinear utility and probability weighting functions.
(F) Model comparison when the error term used in the model was the subjective value (as shown in C) or the expected value (probability × reward). Red arrow is mean ΔAIC (p = 1.38e-8, paired t test of AIC).
(G) Binned values of rats' lose-switch biases (measured from the data) plotted against the best-fit learning rate, α_learn. Pearson's correlation coefficient is -0.37 across rats (p = 0.026).
See also Figure S4.
We therefore implemented a reinforcement learning model that could accommodate nonlinear subjective utility and probability weighting but also learning over trials (Figure 4C; [10, 38]). The model assumed that the rats learned the value of left and right choices for each unique combination of probability (p) and reward (μL) according to the following equation:

$$V_{p,\mu L}(t+1) = V_{p,\mu L}(t) + \alpha_{learn}\left(w(p)\,u(x) - V_{p,\mu L}(t)\right) \quad (10)$$

where $\alpha_{learn}$ is the learning rate parameter, and $w(p)$ and $u(x)$ are parameterized as in Equations 2 and 3. The learned values of the right and left prospects on each trial were transformed into choice probabilities via a logistic function (see STAR Methods).
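A minimal Python sketch of the update in Equation 10, with u(x) = x^α (Equation 2) and a two-parameter Prelec w(p) (Equation 3); parameter values are illustrative, and the full model's history-dependent bias and normalization (STAR Methods) are omitted:

```python
import numpy as np

def subjective_value(p, x, alpha=0.54, beta=1.0, delta=0.65):
    """w(p) * u(x): the update target in Equation 10."""
    w = np.exp(-beta * (-np.log(max(p, 1e-12))) ** delta)
    return w * x ** alpha

def update_value(V, p, x, alpha_learn=0.3):
    """Equation 10: move the stored value of this (probability, volume)
    option toward its subjective value rather than its expected value."""
    return V + alpha_learn * (subjective_value(p, x) - V)

# Values are stored per unique (p, volume) pair; choice is a logistic
# function of the learned left/right values, so the utility and
# weighting functions act only through learning.
V = 0.0
for _ in range(20):
    V = update_value(V, p=0.5, x=24 / 48)
print(V, subjective_value(0.5, 24 / 48))   # V converges to w(p)u(x)
```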
We also implemented a global bias for left and right choices depending on reward history (Figures 4C and 4D). In this model, utility and probability weighting functions were exclusively used for learning or updating values, whereas choice depended on the learned values on each trial. Although the parameters of the utility and probability weighting functions were free parameters in this model, we recovered parameter values identical to the prospect theory model (Figure 4E; Figure S4C). Importantly, a reinforcement learning model in which the expected value (EV) was the error signal driving learning underperformed compared to the model incorporating subjective value according to prospect theory (Figure 4F; p = 1.38e-8, paired t test of AIC). Finally, rats' learning rate ($\alpha_{learn}$) was negatively correlated with the magnitude of their lose-switch biases, suggesting an inverse relationship between the learning dynamics governing gradual improvements in task performance and trial-by-trial learning, which is deleterious to performance when trials are independent [27, 39, 40]. Rats with slower dynamics (lower learning rates) showed more prominent trial history effects, whereas rats with rapid learning showed reduced trial history biases (Figure 4G).

DISCUSSION

There is a strong foundation for using animal models to study the cost-benefit calculations underlying economic theory [41]. In foraging studies, animals either exploit a current option for reward or explore a new one, and often maximize their rate of reward [42-45]. Rodents exhibit complex aspects of economic decision-making, including regret [46, 47] and sensitivity to sunk costs [48, 49]. Here, we applied prospect theory to rats.
59. Pompilio, L., Kacelnik, A., and Behmer, S.T. (2006). State-dependent learned valuation drives choice in an invertebrate. Science 311, 1613-1615.
60. Marsh, B. (2004). Energetic state during learning affects foraging choices in starlings. Behav. Ecol. 15, 396-399.
61. Crespi, L.P. (1942). Quantitative variation of incentive and performance in the white rat. Am. J. Psychol. 55, 467.
62. Zeaman, D. (1949). Response latency as a function of the amount of reinforcement. J. Exp. Psychol. 39, 466-483.
77. Keramati, M., and Gutkin, B. (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife 3. Published online December 2, 2014. https://doi.org/10.7554/eLife.04811.
78. Brunton, B.W., Botvinick, M.M., and Brody, C.D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science 340, 95-98.
79. Hanks, T.D., Kopec, C.D., Brunton, B.W., Duan, C.A., Erlich, J.C., and Brody, C.D. (2015). Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520, 220-223.
Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Christine Constantinople ([email protected]). Transgenic LE-Tg (Pvalb-iCre)2Ottc rats (n = 5) were obtained under an MTA from the University of Missouri RRRC.
Subjects
A total of 39 male rats between the ages of 6 and 24 months were used for this study, including 35 Long-Evans and 4 Sprague-Dawley rats (Rattus norvegicus). The Long-Evans cohort also included LE-Tg (Pvalb-iCre)2Ottc rats (n = 5) made at NIDA/NIMH and obtained from the University of Missouri RRRC (transgenic line 0773). These are BAC transgenic rats expressing Cre recombinase in parvalbumin-expressing neurons. Animal use procedures were approved by the Princeton University Institutional Animal Care and Use
Committee (IACUC #1853) and carried out in accordance with National Institutes of Health standards.
Rats were typically housed in pairs or singly; rats that trained during the day were housed in a reverse light cycle room. Some rats
trained overnight, and were not housed with a reverse light cycle. Access to water was scheduled to within-box training, 2-4 hours per
day, usually 7 days a week, and between 0 and 1 hour ad lib following training.
METHOD DETAILS
Behavioral training
Rats were trained in a high-throughput facility using a computerized training protocol. Rats were trained in operant training boxes with
three nose ports. When an LED from the center port was illuminated, the animal could initiate a trial by poking his nose in that port;
upon trial initiation the center LED turned off. While in the center port, rats were continuously presented with a train of randomly timed
clicks from a left speaker and, simultaneously, a different train of clicks from a right speaker. The click trains were generated by Poisson processes with different underlying rates [78, 79]; the rates conveyed the water volume baited at each side port. After a variable
pre-flash interval ranging from 0 to 350 ms, rats were also presented with light flashes from the left and right side ports; the number of flashes conveyed reward probability at each port. Each flash was 20 ms in duration; flashes were presented in fixed bins, spaced every 250 ms, to avoid perceptual fusion of consecutive flashes [27]. After a variable post-flash delay period from 0 to 500 ms, the end of the trial was cued by a go sound and the center LED turning back on. The animal was then free to choose the left or right side port, and potentially collect reward.
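For illustration, the stimulus statistics described above can be sketched in Python. The generator below is a hypothetical reconstruction using the rates and timing from this section (10 flash bins is an assumption based on the 0-10 flash range in Figure 1B), not the published task code:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_clicks(rate_hz, duration_s):
    """Sample click times from a Poisson process with the given rate."""
    n_clicks = rng.poisson(rate_hz * duration_s)
    return np.sort(rng.uniform(0.0, duration_s, n_clicks))

def flash_onsets(n_flashes, n_bins=10, bin_s=0.250):
    """Place 20 ms flashes in fixed 250 ms bins (at most one per bin),
    avoiding perceptual fusion of consecutive flashes."""
    bins = rng.choice(n_bins, size=n_flashes, replace=False)
    return np.sort(bins) * bin_s

left_clicks = poisson_clicks(rate_hz=24.0, duration_s=1.0)
flashes = flash_onsets(n_flashes=5)
```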
The trials were self-paced: on trials when rats did not receive reward, they were able to initiate another trial immediately. However, if rats terminated center fixation prematurely, they were penalized with a white noise sound and a time-out penalty. Since rats disproportionately terminated trials offering low volumes, we scaled the time-out penalty based on the minimum reward offered. The time-out penalties were adjusted independently for each rat to minimize terminated trials (as an example, several rats were penalized with 6 s time-outs for terminating trials offering a minimum of 6 μL, 4.5 s for terminating trials offering a minimum of 12 μL, 3 s for terminating trials offering a minimum of 24 μL, and 1.5 s for terminating trials offering a minimum of 48 μL).
The threshold for analysis was the median performance of all sessions minus 1.5 times the interquartile range of performance across
the second half of all sessions. Once performance surpassed this threshold, it was typically stable across months. Occasional days
with poor performance were usually due to hardware malfunctions in the rig. Days in which performance was below threshold were
excluded from analysis.
Behavioral model
We fit a behavioral model separately for each rat (see Box 1 for a description of the model). We used MATLAB's constrained minimization function fmincon to minimize the sum of the negative log likelihoods with respect to the model parameters. 20 random seeds
were used in the maximum likelihood search for each rat; parameter values with the maximum likelihood of these seeds were deemed
the best fit parameters. When evaluating model performance (e.g., Figure 1E), we performed 5-fold cross-validation and evaluated
the predictive power of the model on the held-out test sets.
We initially evaluated three different parametric forms of the probability weighting function, the one- and two-parameter Prelec
models and the linear in log-odds model (see below) [24, 28]. We compared the different parametric forms using Akaike Information
Criterion (AIC), AIC = 2k + 2nLL, where k is the number of parameters, and nLL is the negative log likelihood of the model. AIC
favored the two-parameter Prelec model for nearly all rats, although some rats were equally well-fit by the linear in log-odds model
(data not shown). Therefore, we implemented the two-parameter Prelec model.
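The comparison metric is straightforward to compute; a minimal sketch:

```python
def aic(n_params: int, neg_log_likelihood: float) -> float:
    """Akaike Information Criterion: AIC = 2k + 2*nLL (lower is better)."""
    return 2 * n_params + 2 * neg_log_likelihood

# e.g., one- vs. two-parameter Prelec for one rat: a positive
# aic(1, nll_one) - aic(2, nll_two) favors the two-parameter model.
```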
One-parameter Prelec:

$$w(p) = e^{-(-\ln(p))^{\delta}}, \quad (12)$$

where $p$ is the true probability and $\delta$ is a free parameter. $\delta$ controls the curvature of the weighting function; its crossover point is fixed at $1/e$.

Two-parameter Prelec:

$$w(p) = e^{-\beta(-\ln(p))^{\delta}}, \quad (13)$$

where $p$ is the true probability, and $\beta$ and $\delta$ are free parameters. $\delta$ primarily controls the curvature and $\beta$ primarily controls the elevation of the weighting function.

Linear in log-odds:

$$w(p) = \frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1-p)^{\gamma}}, \quad (14)$$

where $p$ is the true probability and $\gamma$ and $\delta$ are free parameters. $\gamma$ primarily controls the curvature of the weighting function and $\delta$ controls the elevation.
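The three candidate weighting functions (Equations 12-14) in a short Python sketch:

```python
import numpy as np

def prelec_one(p, delta):
    """One-parameter Prelec, Equation 12; crossover fixed at 1/e."""
    return np.exp(-(-np.log(p)) ** delta)

def prelec_two(p, beta, delta):
    """Two-parameter Prelec, Equation 13; beta shifts the elevation."""
    return np.exp(-beta * (-np.log(p)) ** delta)

def linear_log_odds(p, gamma, delta):
    """Linear in log-odds, Equation 14."""
    return delta * p ** gamma / (delta * p ** gamma + (1 - p) ** gamma)

p = np.linspace(0.01, 0.99, 99)
assert np.isclose(prelec_one(1 / np.e, delta=0.5), 1 / np.e)  # fixed crossover
```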
Alternative models
We compared the prospect theory model to a number of alternative models. The Expected Utility Theory model (EUT) has the same
form as the prospect theory model, except that the subjective value on each side is the product of objective probability and subjective
utility (see Figure S2A):
$$V_R = u(x_R)\,p_R \quad (15)$$
$$\sigma_F = m_F x + b_F \quad (20)$$

$$\sigma_C = m_C x + b_C \quad (21)$$

where $m_F$, $m_C$, $b_F$, and $b_C$ are free parameters, and $x$ is the number of flashes (clicks) presented to the rat on each side. For the first two SDT models, we compute choice probabilities based on the flash difference $(\Delta F)$ and click difference $(\Delta C)$ separately, where these choice probabilities are calculated as follows, according to [27]:
$$P(\text{went right}\,|\,\Delta F) = \int_0^{\infty} \mathcal{N}\!\left(R_F - L_F,\ \sqrt{\sigma_{R_F}^2 + \sigma_{L_F}^2}\right) d(R_F - L_F) \quad (22)$$

$$P(\text{went right}\,|\,\Delta C) = \int_0^{\infty} \mathcal{N}\!\left(R_C - L_C,\ \sqrt{\sigma_{R_C}^2 + \sigma_{L_C}^2}\right) d(R_C - L_C) \quad (23)$$
$R_F$ ($R_C$) and $L_F$ ($L_C$) are the number of right and left flashes (clicks) presented to the rat on each trial, and the $\sigma$ terms are the noise terms defined in Equations 20 and 21. One model (SDT1) assumes that the rat's choice is given by the average of these probabilities (see Figure S2C). Another model (SDT2) assumes that the rat's choice is given by the most informative cue on each trial (the choice probability most different from 0.5; see Figure S2D).
Alternatively, it’s possible that the rats combine the noisy estimates of flashes and clicks on each side. Therefore, we evaluated two
additional models parameterized as follows:
$$P(\text{went right}\,|\,\text{right ev}) = \int_0^{\infty} \mathcal{N}\!\left(R_F + R_C,\ \sqrt{\sigma_{R_F}^2 + \sigma_{R_C}^2}\right) d(R_F + R_C) \quad (24)$$

$$P(\text{went right}\,|\,\text{left ev}) = \int_0^{\infty} \mathcal{N}\!\left(L_F + L_C,\ \sqrt{\sigma_{L_F}^2 + \sigma_{L_C}^2}\right) d(L_F + L_C) \quad (25)$$
One model (SDT3) assumes that the rat’s choice is given by the average of these probabilities (Figure S2E), and the other (SDT4) as-
sumes that the rat’s choice is given by the most informative side on each trial (the choice probability most different from 0.5;
Figure S2F).
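Each integral in Equations 22-25 is the mass of a Gaussian above zero, i.e., $\Phi(\mu/\sigma)$, so the SDT models reduce to normal CDF evaluations; a Python sketch of SDT1 under the noise model of Equations 20 and 21 (function names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def p_right(mu, sigma):
    """Integral over (0, inf) of N(mu, sigma): the normal survival function."""
    return norm.sf(0.0, loc=mu, scale=sigma)

def sdt1(RF, LF, RC, LC, mF, bF, mC, bC):
    """SDT1: average of flash-based and click-based choice probabilities
    (Equations 22 and 23), with noise scaling linearly with counts."""
    sigma_F = np.hypot(mF * RF + bF, mF * LF + bF)  # sqrt(var_RF + var_LF)
    sigma_C = np.hypot(mC * RC + bC, mC * LC + bC)
    return 0.5 * (p_right(RF - LF, sigma_F) + p_right(RC - LC, sigma_C))
```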
Psychometric curves
We measured rats’ psychometric performance when choosing between the safe and risky options. For these analyses, we excluded
trials where both the left and right side ports offered certain rewards. We binned the data into 11 bins of the difference in the subjective
value (inferred from the behavioral model) of the safe minus the risky option. Psychometric plots show the probability that the subjects
chose the safe option as a function of this difference (see Figure S1D). We fit a 4-parameter sigmoid of the form:
$$P(\mathrm{choose}_S) = y_0 + \frac{1 - 2a}{1 + e^{-b(V_S - V_R - x_0)}}, \quad (26)$$
where y0 , a, b, and x0 were free parameters. Parameters were fit using a gradient-descent algorithm to minimize the mean square
error between the data and the sigmoid, using the sqp algorithm in MATLAB’s constrained optimization function fmincon.
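A minimal Python analogue of this fitting procedure (scipy's SLSQP standing in for MATLAB's fmincon with the sqp algorithm; starting values and bounds are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid4(dv, y0, a, b, x0):
    """Four-parameter sigmoid, Equation 26; dv is V_S - V_R per bin."""
    return y0 + (1 - 2 * a) / (1 + np.exp(-b * (dv - x0)))

def fit_psychometric(dv, p_chose_safe):
    """Least-squares fit of (y0, a, b, x0) to binned choice data."""
    loss = lambda theta: np.mean((p_chose_safe - sigmoid4(dv, *theta)) ** 2)
    res = minimize(loss, x0=[0.0, 0.05, 1.0, 0.0], method="SLSQP",
                   bounds=[(0, 0.5), (0, 0.5), (0, None), (None, None)])
    return res.x
```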
We constrained r > 0. Here, u(x) corresponds to the subjective utility of the chosen reward volume. Choice probabilities were computed according to Equation 6 in the main text.
Behavioral data are available upon request by contacting the Lead Contact, Christine Constantinople ([email protected]).