Fleming 2017

This document describes a new hierarchical Bayesian method for estimating metacognitive efficiency from confidence ratings. It introduces the meta-d' model of metacognition and explains how a Bayesian approach can enhance statistical power and avoid issues with traditional point estimation. Simulations show the hierarchical method performs better than alternatives with limited data, such as from patient studies. Software is provided to implement the hierarchical meta-d' estimation.


Neuroscience of Consciousness, 2017, 1–14

doi: 10.1093/nc/nix007
Research article

HMeta-d: hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings

Downloaded from https://academic.oup.com/nc/article/2017/1/nix007/3748261 by guest on 06 January 2024


Stephen M. Fleming*
Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen Square, WC1N 3BG London,
UK
*Correspondence address: E-mail: [email protected]

Abstract
Metacognition refers to the ability to reflect on and monitor one’s cognitive processes, such as perception, memory and decision-making. Metacognition is often assessed by whether an observer’s confidence ratings are predictive of objective success, but simple correlations between performance and confidence are susceptible to undesirable influences such as response biases. Recently, an alternative approach to measuring metacognition has been developed that characterizes metacognitive sensitivity (meta-d’) by assuming a generative model of confidence within the framework of signal detection theory. However, current estimation routines require an abundance of confidence rating data to recover robust parameters, and only provide point estimates of meta-d’. In contrast, hierarchical Bayesian estimation methods provide opportunities to enhance statistical power, incorporate uncertainty in group-level parameter estimates and avoid edge-correction confounds. Here I introduce such a method for estimating metacognitive efficiency (meta-d’/d’) from confidence ratings and demonstrate its application for assessing group differences. A tutorial is provided on both the meta-d’ model and the preparation of behavioural data for model fitting. Through numerical simulations I show that a hierarchical approach outperforms alternative fitting methods in situations where limited data are available, such as when quantifying metacognition in patient populations. In addition, the model may be flexibly expanded to estimate parameters encoding other influences on metacognitive efficiency. MATLAB software and documentation for implementing hierarchical meta-d’ estimation (HMeta-d) can be downloaded at https://github.com/smfleming/HMeta-d.

Key words: metacognition; confidence; signal detection theory; Bayes

Received: 1 September 2016; Revised: 14 March 2017; Accepted: 20 March 2017

© The Author 2017. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Metacognition is defined as ‘knowledge of one’s own cognitive processes’ (Flavell 1979). For example, we can reflect on whether a particular percept is accurate or inaccurate, and this ability to ‘know that we know’ is a central aspect of conscious experience (Schooler 2002). Consider blindsight, a neurological condition that sometimes arises following selective lesions to primary visual cortex (Weiskrantz et al. 1974). A blindsight patient may perform a task (e.g. discriminating the location of a stimulus) at a reasonably high level in the otherwise blind field, and yet lack insight as to whether they have performed accurately on any given trial (Persaud et al. 2007). It is plausible that a joint lack of metacognition and conscious visual experience are both consequences of disruptions to higher-order representations (Lau and Rosenthal 2011; Ko and Lau 2012). While there are clearly other drivers of confidence in one’s task performance aside from sensory certainty (such as response requirements; Pouget et al. 2016; Denison 2017), understanding the mechanisms supporting metacognition may shed light on the putative underpinnings of conscious experience. Understanding the relationship between metacognition and perceptual and cognitive processes also has broader application in work on judgment and decision-making (Lichtenstein et al. 1982), developmental psychology (Weil et al. 2013; Goupil et al. 2016), social psychology (Heatherton 2011)

and clinical disorders (David et al. 2012; Moeller and Goldstein 2014).

Metacognitive ‘sensitivity’ can be assessed by the extent to which an observer’s confidence ratings are predictive of their actual success. Consider a simple decision task such as whether a briefly flashed visual stimulus is categorized as being tilted to the left or right, followed by a confidence rating in being correct. The task of assessing response accuracy using confidence ratings is often called the ‘type 2 task’ (Clarke et al. 1959; Galvin et al. 2003) to differentiate it from the ‘type 1 task’ of discriminating between states of the world (e.g. left or right tilts). If higher confidence ratings are given after correct judgments and lower confidence ratings after incorrect judgments, we can ascribe high metacognitive sensitivity to the subject. Thus a simple and intuitive way of assessing metacognitive sensitivity is to correlate confidence with accuracy (Nelson 1984).

However, confidence–accuracy correlations (e.g. gamma and phi correlations) are affected by the confounding factors of type 1 performance (d’) and type 2 response bias (overall level of confidence; Masson and Rotello 2009; Fleming and Lau 2014). Consider two subjects A and B performing the same task but with different baseline levels of performance. A and B may have the same underlying metacognitive ability, but their confidence–accuracy correlations may differ due to differing performance levels. In this situation, we may erroneously conclude that A and B have different metacognition, despite their underlying metacognitive ability being equal. More generally, an important lesson from the signal detection theory (SDT) approach to modelling type 1 and type 2 tasks is that type 1 sensitivity (d’) and type 1 criterion (c) influence measures of type 2 sensitivity (Galvin et al. 2003).

Recently, an alternative approach to measuring metacognitive sensitivity has been developed by Maniscalco and Lau (2012). This approach posits a generative model of confidence reports within the framework of SDT (Fig. 1A). Fitting the model to data returns a parameter, meta-d’, that reflects an individual’s metacognitive sensitivity. Specifically, meta-d’ is the value of type 1 performance (d’) that would have been predicted to give rise to the observed confidence rating data assuming an ideal observer with type 1 d’ = meta-d’. Meta-d’ can then be compared with actual d’ and a relative measure of metacognitive sensitivity can then be calculated as a ratio (meta-d’/d’) or subtraction (meta-d’ − d’). Meta-d’/d’ is a measure of ‘metacognitive efficiency’: given a particular level of task performance, how efficient is the individual’s metacognition? If meta-d’ = d’, then the observer is metacognitively ‘ideal’, using all the information available for the type 1 task when reporting type 2 confidence. However, we might find that meta-d’ < d’, due to some degree of noise or imprecision introduced when rating one’s confidence. Conversely we may find that meta-d’ > d’ if subjects are able to draw on additional information such as hunches (Rausch and Zehetleitner 2016; Scott et al. 2014), further processing of stimulus information (Rabbitt and Vyas 1981; Charles et al. 2013) or knowledge of other influences on task performance when making their metacognitive judgments (Fleming and Daw 2017).

The properties of the meta-d’ model have been thoroughly explored in previous articles (Maniscalco and Lau 2012; Barrett et al. 2013; Fleming and Lau 2014; Maniscalco and Lau 2014). The goal of the present article is 2-fold. First, I introduce a new method for estimating meta-d’/d’ from confidence ratings using hierarchical Bayes, and provide a tutorial on its usage. Second, I demonstrate the benefits of applying this method to derive group-level estimates of metacognitive efficiency in situations where data are limited.

Previously meta-d’ has been fitted using gradient ascent on the likelihood [maximum likelihood estimation (MLE)], minimization of sum-of-squared error (SSE) or using analytic approximation (Maniscalco and Lau 2012; Barrett et al. 2013). However, several factors make a Bayesian approach attractive for typical metacognition studies:

1. Point estimates of meta-d’ are inevitably noisy. Several parameters must be estimated in the signal detection model, including multiple type 2 criteria [specifically, (k − 1) × 2, where k = number of confidence ratings available]. One common issue in cognitive neuroscience is that trial numbers per condition are also low (e.g. in patient studies, or tasks conducted in conjunction with neuroimaging), and frequentist estimates of hit and false-alarm rates fail to account for uncertainty about these rates that is a consequence of finite data. A Bayesian analysis incorporates such uncertainty into parameter estimates.
2. A hierarchical Bayesian approach is the correct way to combine information about within- and between-subject uncertainty. In a typical study, the metacognitive sensitivities of two groups (e.g. patients and controls) are compared. Single-subject maximum likelihood fits are carried out, and the fitted meta-d’ parameters are entered into an independent samples t-test. Any information about the uncertainty in each subject’s parameter fits is discarded in this procedure. In contrast, using hierarchical Bayes, information about uncertainty is retained, such that group-level parameters are less influenced by single-subject fits that have a high degree of uncertainty. In turn, hierarchical model fits are able to capitalize on the statistical strength offered by the degree to which subjects are similar with respect to one or more model parameters, mutually constraining the subject-level model fits.
3. In fitting SDT models to data, padding (edge correction) is often applied to avoid zero counts of confidence ratings in particular cells [e.g. high confidence error trials; Hautus (1995); Macmillan and Creelman (2005)]. This padding may bias subject-specific parameter estimates particularly when the overall trial number is low. A Bayesian approach avoids the need for edge correction as the generative multinomial model naturally handles zero cell counts, and a hierarchical specification pools data over subjects (Lee 2008).
4. A hierarchical model makes testing group-level hypotheses natural and straightforward. For example, say we are interested in testing whether a particular patient group has lower metacognitive sensitivity compared with controls. Hierarchical Bayes allows us to directly estimate the posterior distribution of a parameter that characterizes the differences between groups, and provides a principled framework for hypothesis testing. Finally, a Bayesian framework for cognitive modelling enjoys other advantages that have been outlined in detail elsewhere (Kruschke 2014; Lee and Wagenmakers 2014). Briefly, they include the ability to gain evidence in favour of the null hypothesis as well as against it; the ability to combine prior information (e.g. a prior on the distribution of metacognitive sensitivity in a healthy population) with new data; and the flexible extension of the model to estimate subject- and trial-level influences on metacognition.

Figure 1. The meta-d’ model. (A) The right-hand panel shows schematic confidence-rating distributions conditional on correct and incorrect decisions. A subject with good metacognitive sensitivity will provide higher confidence ratings when they are correct, and lower ratings when incorrect, and these distributions will only weakly overlap (solid lines). Conversely a subject with poorer metacognitive sensitivity will show greater overlap between these distributions (dotted lines). These theoretical correct/error distributions are obtained by ‘folding’ a type 1 SDT model around the criterion [see Galvin et al. (2003), for further details], and normalizing such that the area under each curve sums to 1. The overlap between distributions can be calculated through type 2 ROC analysis (middle panel). The theoretical type 2 ROC is completely determined by an equal-variance Gaussian SDT model; we can therefore invert the model to determine the type 1 d’ that best fits the observed confidence rating data, which is labelled meta-d’. Meta-d’ can be directly compared with the type 1 d’ calculated from the subject’s decisions; if meta-d’ is equal to d’, then the subject approximates the ideal SDT prediction of metacognitive sensitivity. (B) Simulated data from a SDT model with d’ = 2. The y-axis plots the conditional probability of a particular rating given the first-order response is correct (green) or incorrect (red). In the right-hand panel, Gaussian noise has been added to the internal state underpinning the confidence rating (but not the decision) leading to a blurring of the correct/incorrect distributions. Open circles show fits of the meta-d’ model to each simulated dataset.

The basics of Bayesian estimation of cognitive models are intuitive. First, prior information is specified in the form of probability distributions over model parameters, and observed data are used to update beliefs to construct a posterior distribution

or belief in a particular parameter. The ‘hierarchical’ component of hierarchical Bayes simply indicates that multiple instances of a particular parameter (e.g. across different subjects) are estimated in the same model. The development of efficient sampling routines for arbitrary models such as Markov chain Monte Carlo (MCMC), their inclusion in freely available software packages such as JAGS (http://mcmc-jags.sourceforge.net; last accessed 31st August 2016) and STAN (http://mc-stan.org; last accessed 31st August 2016) and advances in computing power mean that Bayesian estimation of arbitrary models is now straightforward to implement in practice (Kruschke 2014).

In this article, I briefly introduce the meta-d’ model and its hierarchical Bayesian variant [further details of the model can be found in the Appendix and in Maniscalco and Lau (2014)]. I then provide a step-by-step MATLAB tutorial for fitting meta-d’ to single-subject and group data. Finally, I conduct parameter recovery simulations to compare hierarchical Bayesian and standard estimation routines. These results show that, particularly when data are limited, the new HMeta-d method outperforms traditional fitting procedures and provides appropriate control over false positives. Model code and examples are freely available online at https://github.com/smfleming/HMeta-d (last accessed 4th January 2017).

Methods

Outline of the meta-d’ model

The meta-d’ model is summarized in graphical form in Fig. 1A. The raw data for the model fit is the observed distribution of confidence ratings conditional on whether a decision is correct or incorrect. Intuitively, if a subject has greater metacognitive sensitivity, they are able to monitor their decision performance by providing higher confidence ratings when they are correct, and lower ratings when incorrect, and these distributions will only weakly overlap (solid lines). Conversely, a subject with poorer metacognitive sensitivity will show greater overlap between these distributions (dotted lines). The overlap between distributions can be calculated through type 2 receiver operating characteristic (ROC) analysis. The conditional probability

P(confidence = y | accuracy) is calculated for each confidence level; cumulating these conditional probabilities and plotting them against each other produces the type 2 ROC function. A type 2 ROC that bows sharply upwards indicates a high degree of sensitivity to correct/incorrect decisions; a type 2 ROC closer to the major diagonal indicates weaker metacognitive sensitivity.

The area under the type 2 ROC (AUROC2) is itself a useful non-parametric measure of metacognitive sensitivity, indicating how well an observer’s ratings discriminate between correct and incorrect decisions. However, as outlined in the introduction, AUROC2 is affected by type 1 performance. In other words, a change in task performance (d’ or criterion) is expected, a priori, to lead to changes in AUROC2 despite endogenous metacognitive efficiency remaining unchanged. By explicitly modelling the connection between performance and metacognition we can appropriately handle this confound. The core idea behind the meta-d’ approach is that a single theoretical type 2 ROC is completely determined by an equal-variance Gaussian SDT model with parameters d’, criterion c and confidence criteria $c_2$ (the arrow going from left to right in Fig. 1A). The converse is therefore also true: an observed type 2 ROC implies a particular type 1 d’ (the arrow going from right to left in Fig. 1A), conditional on fixing the type 1 criterion c, which in the meta-d’ model is typically set to the observed value. We can then invert the model to determine the type 1 d’ that best fits the observed confidence rating data. As this pseudo-d’ is fit only to confidence rating data, and not the subject’s decisions, we label it meta-d’. Meta-d’ can be directly compared with the type 1 d’ calculated from the subject’s decisions; if meta-d’ is equal to d’, then the subject approximates the ideal SDT prediction of metacognitive sensitivity. The relative values of d’ and meta-d’ thus quantify the relative sensitivity of decisions and confidence ratings respectively. A ratio of these quantities (meta-d’/d’) provides a summary measure of ‘metacognitive efficiency’.

Figure 1B provides a concrete example. The data in both panels are simulated from a SDT model with d’ = 2 and symmetric flanking confidence criteria positioned such that stronger internal signals lead to higher confidence ratings on a 1–4 scale. The y-axis plots the conditional probability of a particular rating given the first-order response is correct (green) or incorrect (red). In both panels, the simulations return higher confidence ratings more often on correct trials and lower confidence more often on incorrect trials. However, in the right-hand panel, Gaussian noise has been added to the internal state underpinning the confidence rating (but not the decision). This leads to a blurring of the correct/incorrect distributions, such that higher confidence ratings are used even when the decision is incorrect. The open circles show fits of the meta-d’ model to each simulated dataset. While both fits return type 1 d’ values of 2.0, the meta-d’ value in the right-hand panel is much lower than on the left, leading to a meta-d’/d’ ratio of 64% of optimal. Notably meta-d’ in the left panel is similar to d’, as expected if confidence ratings are generated from an ideal observer model without any additional noise. This example illustrates how meta-d’ can appropriately recover changes in the fidelity of confidence ratings independently of changes in performance.

Single-subject optimization of meta-d’

I first briefly review the standard meta-d’ model and the maximum likelihood method for obtaining single-subject parameter estimates. The model contains free parameters for meta-d’ and the positions of the (k − 1) × 2 confidence criteria, where k = number of confidence ratings available. These criteria are response-conditional, with k − 1 criteria following an S1 response and k − 1 criteria following an S2 response ($c_{2,\text{“S1”}}$ and $c_{2,\text{“S2”}}$). The raw data comprise counts of confidence ratings conditional on both the stimulus category (S1 or S2) and the response (S1 or S2). Type 1 criterion c and sensitivity d’ are estimated from the data using standard formulae (Macmillan and Creelman 2005). (In HMeta-d there is also a user option for jointly estimating both d’ and meta-d’ in a hierarchical framework.)

The fitting of meta-d’ rests on calculating the likelihood of the confidence rating data given a particular type 2 ROC generated by systematic variation of type 1 SDT parameters d’ and c, and type 2 criteria $c_2$. By convention, the prefix ‘meta-’ is added to each type 1 SDT parameter to indicate that the parameter is being used to fit type 2 ROC curves. Thus, the type 1 SDT parameters d’, c and $c_2$, when used to characterize type 2 ROC curves, are named meta-d’, meta-c and meta-$c_2$. Describing the observed type 2 ROC in terms of these type 1 SDT parameters underpins the meta-d’ model.

The Appendix contains equations for deriving type 2 probabilities from the type 1 SDT model for both S1 and S2 responses. Given a particular setting of the parameters meta-d’, meta-c and meta-$c_2$ these equations specify a multinomial probability distribution P(conf = y | stim = i, resp = j) over observed confidence counts. The likelihood of the type 2 confidence data for a particular setting of parameters $\theta$ can be characterized using the multinomial model as:

$$L(\theta \mid \text{data}) \propto \prod_{y,i,j} P_{\theta}(\text{conf} = y \mid \text{stim} = i, \text{resp} = j)^{\,n_{\text{data}}(\text{conf} = y \mid \text{stim} = i,\ \text{resp} = j)}$$

Best-fitting parameters are then obtained by finding parameter settings that maximize the likelihood of the data:

$$\theta^{*} = \arg\max_{\theta} L(\theta \mid \text{data}), \quad \text{subject to: } \text{meta-}c' = c',\ c(\text{meta-}c_{\text{ascending}})$$

where $c(\text{meta-}c_{\text{ascending}})$ is a Boolean function which returns a value of ‘true’ only if the type 1 and type 2 criteria stand in appropriate ordinal relationships, i.e. each element in $c_{\text{ascending}}$ is at least as large as the previous element, and c’ is a measure of type 1 response bias.

Hierarchical Bayesian estimation of meta-d’

In hierarchical Bayesian estimation of meta-d’ (HMeta-d), the model is similar except group-level prior densities are specified over each of the subject-level parameters referred to in the previous section. A further difference between HMeta-d and single-subject estimation is that the group-level parameter of interest is the ratio of meta-d’/d’ rather than meta-d’ itself. The rationale for this modelling choice is that while each subject or group may differ in type 1 d’, our parameter of interest is metacognitive efficiency at the group level, not meta-d’ (which itself will be influenced by subject- or group-level variability in d’). Thus d’ is treated as a subject-level nuisance parameter. (Alternative estimation schemes are possible; for instance, calculating the ratio of hierarchical nodes independently encoding meta-d’ and d’. I chose the ‘nuisance parameter’ scheme as it stays closest to the standard MLE approach.) An advantage of this scheme is that group-level inference is carried out directly on metacognitive efficiency rather than a transformed parameter. I specified model parameters such that the prior on log(meta-d’/d’)
encompassed 167 MLE parameter estimates aggregated from previous behavioural studies of metacognition of perceptual decision-making in our laboratory (Fleming et al. 2010; Fleming et al. 2012; Weil et al. 2013; Palmer et al. 2014; Fig. 2B). This prior was chosen to roughly capture the shape of the empirical distribution, while allowing additional variance to relax its influence on posterior estimates. The priors on both log(meta-d’/d’) and type 2 criteria weakly constrain parameter values to sensible ranges, and can be easily changed by the user in the model specification files. A log-normal prior is appropriate for a ratio parameter, ensuring that increases and decreases relative to the expected value of 1 are given equal weight (Keene 1995; Howell 2009).

Figure 2. The hierarchical meta-d’ model. (A) Probabilistic graphical model for estimating metacognitive efficiency using hierarchical Bayes (HMeta-d). The nodes represent all the relevant variables for parameter estimation, and the graph structure is used to indicate dependencies between the variables as indicated by directed arrows. As is convention, unobserved variables are represented without shading and observed variables (in this case, confidence rating counts) are represented with shading. Point estimates for type 1 d’ and criterion are represented as black dots, and the box encloses participant-level parameters subscripted with s. The main text contains a description of each node and its prior distribution. Figure created using the Daft package in Python (http://daft-pgm.org; last accessed 31st August 2016). (B) Prior over the group-level estimate of log(meta-d’/d’) ($\mu_M$). The solid line shows a kernel density estimate of samples from the prior; the histogram represents empirical meta-d’/d’ estimates obtained from 167 subjects (see main text for details).

Dependencies between nodes in the HMeta-d model are illustrated as a probabilistic graphical model in Fig. 2A. The box encloses participant-level parameters subscripted with s. Each node is specified as follows [where M denotes log(meta-d’/d’)]:

$$\mu_{c_2} \sim N(0, 10)$$

$$\sigma_{c_2} \sim HN(10)$$

$$\mu_M \sim N(0, 1)$$

$$\sigma_M = |\xi_M| \times \sigma_\delta$$

$$\xi_M \sim \text{Beta}(1, 1)$$

$$\sigma_\delta \sim HN(1)$$

$$c_{s2,\text{“S1”}}[1 : k-1] \sim N(\mu_{c_2}, \sigma_{c_2})$$

$$c_{s2,\text{“S2”}}[1 : k-1] \sim N(\mu_{c_2}, \sigma_{c_2})$$

$$\delta_s \sim N(0, \sigma_\delta)$$

$$\log(M_s) = \mu_M + \xi_M \times \delta_s$$

N represents a normal distribution parameterized by mean and standard deviation; HN represents a positive-only, half-normal parameterized by standard deviation. $\mu$ and $\sigma$ represent the group-level prior means and standard deviations of subject-level parameters. Thus $\mu_{c_2}$ and $\sigma_{c_2}$ refer to the mean and SD of the type 2 criteria, and $\mu_M$ and $\sigma_M$ to the mean and SD of log(meta-d’/d’). During model development, it was observed that the hierarchical variance parameter $\sigma_M$ occasionally became ‘trapped’ near zero during sampling. This problem is fairly common in hierarchical models, and one solution is parameter expansion, whereby the original model is augmented by redundant multiplicative parameters that introduce an additional random component in the sampling process (Gelman and Hill 2007; Lee and Wagenmakers 2014). Here I employ the scheme suggested by Matzke et al. (2014), such that the mean and variance of $\log(M_s)$ are scaled by a redundant multiplicative parameter $\xi_M$. The posterior on $\sigma_M$ can then be recovered by adjusting for the influence of this additional random component.
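To make the node specification above concrete, the following is a minimal Python sketch that draws one set of values from the priors on log(meta-d’/d’), including the parameter-expansion step. It is an illustration only and is not part of the MATLAB/JAGS toolbox: the function name `sample_hmeta_d_prior` and the returned dictionary keys are hypothetical, and the half-normal HN(1) is assumed to be the absolute value of a N(0, 1) draw. Note that `random.gauss` takes a standard deviation, matching the parameterization stated in the text.

```python
import random
from math import exp


def sample_hmeta_d_prior(n_subjects, seed=None):
    """Draw one sample from the priors on log(meta-d'/d') (hypothetical helper)."""
    rng = random.Random(seed)
    mu_M = rng.gauss(0.0, 1.0)              # mu_M ~ N(0, 1)
    xi_M = rng.betavariate(1.0, 1.0)        # xi_M ~ Beta(1, 1)
    sigma_delta = abs(rng.gauss(0.0, 1.0))  # sigma_delta ~ HN(1), assumed |N(0, 1)|
    # delta_s ~ N(0, sigma_delta), one value per subject
    delta = [rng.gauss(0.0, sigma_delta) for _ in range(n_subjects)]
    # log(M_s) = mu_M + xi_M * delta_s
    log_M = [mu_M + xi_M * d for d in delta]
    return {
        "mu_M": mu_M,
        "xi_M": xi_M,
        "sigma_delta": sigma_delta,
        # SD of log(M_s) across subjects implied by the expansion
        "sigma_M": abs(xi_M) * sigma_delta,
        # subject-level meta-d'/d' on the ratio scale
        "M_ratio": [exp(v) for v in log_M],
    }


draw = sample_hmeta_d_prior(n_subjects=20, seed=1)
print(round(draw["sigma_M"], 3), [round(m, 2) for m in draw["M_ratio"][:3]])
```

Because log(M_s) is a shifted and scaled copy of δ_s, its standard deviation across subjects is |ξ_M| times σ_δ, which is why σ_M is computed from the two redundant parameters and not sampled directly.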

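The data level shared by the single-subject and hierarchical fits is the multinomial likelihood of the confidence counts given meta-d’ and the criteria. The sketch below, in Python and independent of the toolbox, shows one way to compute it under stated assumptions: an equal-variance Gaussian SDT model with S1 and S2 means at −meta-d’/2 and +meta-d’/2, and criteria supplied directly as an ascending list. The Appendix equations are not reproduced in this section, so the bin probabilities here are my reconstruction from that standard model. The function names and the criteria values are hypothetical, and the constraint meta-c’ = c’ is not enforced.

```python
from math import inf, log
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def type2_probs(meta_d, criteria):
    """P(conf = y | stim, resp) per rating bin, ordered as in nR_S1 / nR_S2.

    criteria: ascending list of 2k - 1 positions on the decision axis
    (k - 1 type 2 criteria for "S1" responses, the type 1 criterion,
    then k - 1 type 2 criteria for "S2" responses).
    """
    if list(criteria) != sorted(criteria):
        raise ValueError("criteria must be in ascending order")
    k = (len(criteria) + 1) // 2
    edges = [-inf, *criteria, inf]
    probs = {}
    for stim, mean in (("S1", -meta_d / 2), ("S2", meta_d / 2)):
        # Probability mass of each rating bin for this stimulus
        area = [_PHI(hi - mean) - _PHI(lo - mean)
                for lo, hi in zip(edges, edges[1:])]
        # Normalize within each response so ratings sum to 1 per (stim, resp)
        p_s1, p_s2 = sum(area[:k]), sum(area[k:])
        probs[stim] = [a / p_s1 for a in area[:k]] + [a / p_s2 for a in area[k:]]
    return probs


def log_likelihood(meta_d, criteria, nR_S1, nR_S2):
    """Multinomial log-likelihood of the confidence counts (up to a constant)."""
    probs = type2_probs(meta_d, criteria)
    total = 0.0
    for counts, stim in ((nR_S1, "S1"), (nR_S2, "S2")):
        for n, p in zip(counts, probs[stim]):
            if n > 0:  # empty cells contribute nothing, so no padding is needed
                total += n * log(p)
    return total


nR_S1 = [100, 50, 20, 10, 5, 1]
nR_S2 = [3, 7, 8, 12, 27, 89]
criteria = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical criterion positions
for meta_d in (0.5, 1.5, 2.5):
    print(meta_d, round(log_likelihood(meta_d, criteria, nR_S1, nR_S2), 2))
```

A maximum likelihood fit would search over meta-d’ and the criteria to maximize this quantity, whereas the hierarchical model places the priors described above on the same parameters and samples from the posterior.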
The HMeta-d toolbox uses MCMC sampling as implemented in JAGS (Plummer 2003) to estimate the joint posterior distribution of all model parameters, given the model specification and the data. This estimation takes the form of samples from the posterior, with the entire sequence of samples known as a chain. It is important to check that these samples approximate the ‘stationary distribution’ of the posterior; i.e. that they are not affected by the starting point of the chain(s), and the sampling behaviour is roughly constant over time without slow drifts or autocorrelation. The default settings of the toolbox discard early samples to avoid sensitivity to initial values and run multiple chains, allowing the user to diagnose convergence problems as described below.

Preparing confidence rating data

Fitting of group-level data in the HMeta-d toolbox requires similar data preparation to that required when obtaining single-subject fits using MLE or SSE in Maniscalco and Lau’s MATLAB code (available at http://www.columbia.edu/~bsm2105/type2sdt/; last accessed 31st August 2016). I therefore start with a short tutorial on preparing data for estimating single-subject meta-d’, before explaining how to input data from a group of subjects into the hierarchical model.

Data from each subject need to be coerced into two vectors, nR_S1 and nR_S2, which contain confidence-rating counts for when the ‘stimulus’ was S1 and S2, respectively. Each vector has length k × 2, where k is the number of ratings available. Confidence counts are entered such that the first entry refers to counts of maximum confidence in an S1 response, and the last entry to maximum confidence in an S2 response. For example, if three levels of confidence rating were available and nR_S1 = [100 50 20 10 5 1], this corresponds to the following rating counts following S1 presentation:

responded S1, rating = 3: 100 times
responded S1, rating = 2: 50 times
responded S1, rating = 1: 20 times
responded S2, rating = 1: 10 times
responded S2, rating = 2: 5 times
responded S2, rating = 3: 1 time

This pattern of responses corresponds to responding ‘high confidence, S1’ most often following S1 presentations, and least often with ‘high confidence, S2’. A mirror image of this vector would be expected for nR_S2. For example, nR_S2 = [3 7 8 12 27 89] corresponds to the following rating counts following S2 presentation:

responded S1, rating = 3: 3 times
responded S1, rating = 2: 7 times
responded S1, rating = 1: 8 times
responded S2, rating = 1: 12 times

confidence counts following S1 presentation listed above for subject 1, one would enter in MATLAB:

nR_S1{1} = [100 50 20 10 5 1]

and so on for each subject in the dataset. These cell arrays then contain confidence counts for all subjects, and are passed in one step to the main HMeta-d function:

fit = fit_meta_d_mcmc_group(nR_S1, nR_S2)

An optional third argument to this function is mcmc_params which is a structure containing fields for choosing different model variants, and for specifying the details of the MCMC routine. If omitted, reasonable default settings are chosen.

The call to fit_meta_d_mcmc_group returns a ‘fit’ structure with several subfields. The key parameter of interest is fit.mu_logMratio, which is the mean of the posterior distribution of the group-level log(meta-d’/d’). fit.mcmc contains the samples of each parameter, which can be plotted with the helper function plotSamples. For instance, to plot the MCMC samples of $\mu_M$, one would enter:

plotSamples(exp(fit.mcmc.samples.mu_logMratio))

Note the ‘exp’ to allow plotting of meta-d’/d’ rather than log(meta-d’/d’). The exampleFit_ scripts in the toolbox provide other examples, such as how to set up response-conditional models and to visualize subject-level fits.

An important step in model fitting is checking that the MCMC chains have converged to a stationary distribution. While there is no way to guarantee convergence for a given number of MCMC samples, some heuristics can help identify problems. By using plotSamples, we can visualize the traces to check that there are no drifts or jumps and that each chain occupies a similar position in parameter space. Another useful statistic is Gelman and Rubin’s scale-reduction statistic $\hat{R}$, which is stored in the field fit.mcmc.Rhat for each parameter (Gelman and Rubin 1992). This provides a formal test of convergence that compares within-chain and between-chain variance of different runs of the same model, and will be close to 1 if the samples of the different chains are similar. Large values of $\hat{R}$ indicate convergence problems and values < 1.1 suggest convergence.

As well as obtaining an estimate for group-level meta-d’/d’, we are often interested in our certainty in this parameter value. This can be estimated by computing the symmetric 95% credible interval (CI), which is the interval bounded by the 2.5% and 97.5% percentiles of MCMC samples. An alternative formulation is the 95% highest-density interval (HDI), which is the shortest possible interval containing 95% of the MCMC samples, and is not necessarily symmetric (Kruschke 2014). The helper functions calc_CI and calc_HDI take as input a vector of samples
and return the 95% CI/HDI:
responded S2, rating ¼ 2: 27 times
responded S2, rating ¼ 3: 89 times calc_CI(exp(fit.mcmc.samples.mu_logMratio(:))
Together these vectors specify the confidence  stimu-
The colon in the brackets selects all samples in the array regard-
lus  response matrix that is the basis of the meta-d’ fit, and can
less of their chain of origin. As HMeta-d uses Bayesian estima-
be passed directly into Maniscalco and Lau’s fit_meta_d_MLE
tion it is straightforward to use the group-level posterior
function to estimate meta-d’ on a subject-by-subject basis.
density for hypothesis testing. For instance, if the question is
whether one group of subjects has greater metacognitive effi-
ciency than a second group, we can ask whether the CI/HDI of
Fitting a hierarchical model the difference overlaps with zero (see ‘Empirical examples’
Estimating a group-level model using HMeta-d requires very lit- Section for an example of this). However, note that it is incorrect
tle extra work. In HMeta-d, the nR_S1 and nR_S2 variables are to use the subject-level parameters estimated as part of the hi-
cell arrays of vectors, with each entry in the cell containing con- erarchical model in a frequentist test (e.g. a t-test); this violates
fidence counts for a single subject. For example, to specify the the independence assumption.
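To make the two interval definitions concrete, the following is a minimal sketch, assuming the posterior samples have been exported from the fit as a flat list of numbers. It is written in Python for illustration only and is not part of the MATLAB toolbox; the names symmetric_ci, hdi and excludes_zero are hypothetical, and the toolbox’s own calc_CI and calc_HDI may differ in detail (for example in how percentiles are interpolated).

```python
import math


def symmetric_ci(samples, mass=0.95):
    """Interval bounded by the lower and upper tail percentiles of the samples."""
    xs = sorted(samples)

    def percentile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(xs) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    tail = (1.0 - mass) / 2.0
    return percentile(tail), percentile(1.0 - tail)


def hdi(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of samples the interval must contain (small epsilon guards
    # against floating-point round-up in mass * n).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def excludes_zero(interval):
    """True if the whole interval lies on one side of zero."""
    lo, hi = interval
    return lo > 0 or hi < 0
```

For a group comparison of the kind described above, one would form the sample-wise difference between the two groups’ log(meta-d’/d’) samples and pass it to symmetric_ci or hdi; excludes_zero then reports whether the resulting interval overlaps with zero.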


Figure 3. HMeta-d output. (A) Example output from HMeta-d fit to simulated data with ground truth meta-d’/d’ fixed at 0.8 for 20 subjects. The left panel shows the first 1000 samples from each of three MCMC chains for parameter μ_meta-d’/d’; the right panel shows all samples aggregated in a histogram. (B) Parameter recovery exercise using HMeta-d to fit data simulated from 7 groups of 20 subjects with different levels of meta-d’/d’ = [0.5 0.75 1.0 1.25 1.5 1.75 2]. Error bars denote 95% CI.
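The convergence heuristic described in the tutorial, Gelman and Rubin’s R̂, compares within-chain and between-chain variance. A minimal sketch of the classic form of this statistic is given below, assuming each chain is available as a list of samples of equal length. The code is Python for illustration only; the function name gelman_rubin_rhat is hypothetical, and the value stored by the toolbox in fit.mcmc.Rhat may be computed with a different variant of the formula.

```python
import math
import statistics


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Chains that occupy the same region of parameter space give values close to 1, whereas chains centred on different values inflate the between-chain term and push the statistic well above the 1.1 threshold mentioned in the text.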

In addition to enabling inference on individual parameter distributions, there may be circumstances in which we wish to compare models of different complexity (see ‘Discussion’ Section). To enable this, JAGS returns the deviance information criterion (DIC) for each model, which is a summary measure of goodness of fit that penalizes model complexity (Spiegelhalter et al. 2002; lower is better). While DIC is known to be somewhat biased towards models with greater complexity, it is a common metric for assessing model fit in hierarchical models. In HMeta-d the DIC for each model can be obtained in fit.mcmc.dic.

Simulations

To assess properties of the model fit and compare alternative fitting procedures, simulated confidence rating data were generated for pre-specified levels of metacognitive efficiency. Type 2 probabilities P(conf = y | stim, resp) were computed from the equations in the Appendix for particular settings of meta-d’, c and c2. These probabilities were then used to generate multinomial response counts using the function mnrnd in MATLAB, where the sample size of each type 1 response class (hits, false alarms, correct rejections and misses) was obtained from a standard type 1 SDT model with criterion c and d’. This allowed for independent control over d’ (i.e. the number of hits and false alarms) and meta-d’ (the response-conditional distribution of confidence ratings). After determining the value of d’ for each simulation, the relevant value of meta-d’ could then be chosen to ensure a particular target meta-d’/d’ level. This procedure is implemented in the MATLAB function metad_sim included as part of the toolbox.

Results

Example fit

Figure 3A shows the output of a typical call to HMeta-d and the resultant posterior samples of the population-level estimate of metacognitive efficiency, μ_meta-d’/d’, plotted with plotSamples. The data were generated as 20 simulated subjects, each with 400 trials and 4 possible confidence levels (confidence criteria c2 = ±[0.5 1 1.5]; type 1 criterion c = 0). For each subject, type 1 d’ was sampled from a normal distribution N(2, 0.2), and meta-d’/d’ was fixed at 0.8. The chains show excellent mixing with a modest number of samples (10 000 per chain; R̂ = 1.000) and the posterior is centred around the ground truth simulated value.

Parameter recovery

To further validate the model, a parameter recovery exercise was carried out in which data were simulated from 7 groups of 20 subjects with different levels of meta-d’/d’ = [0.5 0.75 1.0 1.25 1.5 1.75 2]. All other settings were as described in the previous section. Figure 3B plots the fitted group-level μ_meta-d’/d’ and its associated 95% CI for each of the simulated datasets against the empirical ground truth, demonstrating robust parameter recovery.

Empirical examples

To illustrate the practical application of HMeta-d I fit data from a recent experiment that examined metacognitive sensitivity in perceptual and mnemonic tasks in patients with post-surgical lesions and controls (Fleming et al. 2014). This study found (using single-subject estimates of meta-d’/d’) that metacognitive efficiency in patients with lesions to anterior prefrontal cortex (aPFC) was selectively compromised on a visual perceptual task but unaffected on a memory task, suggesting that the neural architecture supporting metacognition may comprise domain-specific components differentially affected by neurological insult.

For didactic purposes here I restrict comparison of metacognition in the aPFC patients (N = 7) and healthy controls (HC; N = 19) on the perceptual task. The task required a two-choice discrimination as to which of the two briefly presented patches contained a greater number of small white dots, followed by a continuous confidence rating on a sliding scale from 1 (low confidence) to 6 (high confidence). For analysis these confidence ratings were binned into four quantiles. For each subject confidence rating data (levels 1–4) were sorted according to the position of the target stimulus (L/R) and the subject’s response (L/R), thereby specifying the two nR_S1 and nR_S2 arrays required for estimating meta-d’.

For each group I constructed cell arrays of confidence counts and estimated μ_meta-d’/d’ with the default settings in HMeta-d. The resultant posterior distributions are plotted in the left panel
of Fig. 4A, and the posterior distribution of the difference is shown in the right panel. Several features are evident from these outputs. First, there is a reduced metacognitive efficiency in the aPFC group compared with controls, as revealed by the 95% CI of the difference being greater than zero (right-hand panel). Second, the posterior distribution of metacognitive efficiency in the healthy controls overlaps with the optimal estimate of 1. Finally, for the aPFC group, which comprises fewer subjects, there is a higher degree of uncertainty about the true metacognitive efficiency: the width of the posterior distribution is greater. This is due to the parameter estimate being constrained by fewer data points and is a natural consequence of the Bayesian approach.

Figure 4. Empirical applications of HMeta-d. (A) HMeta-d fits to data from the perceptual metacognition task reported in Fleming et al. (2014). Each histogram represents posterior densities of μ_meta-d’/d’ for two groups of subjects: HC = healthy controls; aPFC = anterior prefrontal cortex lesion patients. The right panel shows the difference (in log units) between the group posteriors. The white bar indicates the 95% CI which excludes zero. (B) Example of extending the HMeta-d model to estimate the correlation coefficient ρ between metacognitive efficiencies in two domains. The dotted line shows the ground-truth correlation between pairs of meta-d’/d’ values for 100 simulated subjects.

Comparison of fitting procedures

To compare the quality of the fit of the hierarchical Bayesian method against MLE and SSE point-estimate approaches, I ran a series of simulation experiments to investigate parameter recovery of known meta-d’/d’ ratios for different d’ and type 2 criteria placements across a range of trial counts.

In each experiment, I simulated confidence rating data for groups of N = 20 subjects while manipulating the number of trials (20, 50, 100, 200, 400). In the first set of experiments, type 1 d’ was selected from the set (0.5, 1, 2), and two type 2 criteria were specified such that ±c2’ = 1. The generated data thus consisted of a 2 (stimulus) × 2 (responses) × 2 (high/low confidence) matrix of response counts. In the second set of experiments type 1 d’ was kept constant at 1, and the type 2 criteria were selected from the set ±c2’ = (0.5, 1, 2). Generative meta-d’/d’ was fixed at 1, and type 1 criterion was fixed at 0.

Each simulated subject’s data were fit using the MLE and SSE routines available from http://www.columbia.edu/~bsm2105/type2sdt/, correcting for zero response counts by adding 0.25 to all cells [a generalization of the log-linear correction typically applied when estimating type 1 d’, as recommended by Hautus (1995)]. For each group of 20 subjects the mean meta-d’/d’ ratio and the output of a one-sample t-test against the null value of 1 were stored. The same data (without padding) were entered into the hierarchical Bayesian estimation routine as described above and the posterior mean stored. A false positive was recorded if a one-sample t-test against the null value (meta-d’/d’ = 1) was significant (P < 0.05) for the MLE/SSE approaches, or if the symmetric 95% credible interval excluded 1 for the hierarchical Bayesian approach. This procedure was repeated 100 times for each setting of trial counts and parameters.

Figure 5A and B shows the results of Experiments 1 and 2, respectively, for medium levels of metacognitive efficiency
(meta-d’/d’ = 1). For intermediate values of d’ and criteria (middle panels), all methods perform similarly, and recover the true meta-d’/d’ ratio. However when d’ is low, or criteria are extreme, the MLE and SSE methods tend to misestimate metacognitive efficiency when the number of trials per subject is < 100, leading to high false positive rates. These misestimations are similar to the effect of zero cell-count corrections on recovery of type 1 d’ (Hautus 1995). In contrast, HMeta-d provides accurate parameter recovery in the majority of cases.

Figure 5. Simulation experiments: medium metacognitive efficiency (meta-d’/d’ = 1). (A and B) Estimated meta-d’/d’ ratio for different fitting procedures while varying (A) d’ values or (B) type 2 criteria placements. Each data point reflects the average of 100 simulations each with N = 20 subjects. Error bars reflect standard errors of the mean. The ground truth value of meta-d’/d’ is shown by the dotted line.

Figure 6. Simulation experiments: low metacognitive efficiency (meta-d’/d’ = 0.5). For legend see Fig. 5.

Why does HMeta-d outperform classical estimation procedures in this case? There are two possible explanations. First, HMeta-d may be more efficient at retrieving true parameter values, even when trial counts are low, by avoiding padding and capitalizing on the hierarchical structure of the model to mutually constrain subject-level fits. Alternatively, HMeta-d may rely more on the prior when data are scarce, thus shrinking group estimates to the prior mean. The second explanation predicts that HMeta-d would become less accurate when true metacognitive efficiency deviates from the prior mean (meta-d’/d’ ~ 1).

To adjudicate between these explanations I repeated the simulations at low (meta-d’/d’ = 0.5) and high (meta-d’/d’ = 1.5) metacognitive efficiency (Figs 6 and 7). These results show that HMeta-d is able to retrieve the true meta-d’/d’ even when metacognitive efficiency is appreciably less than or greater than 1
(see also Fig. 3B), consistent with the prior exerting limited influence on the results. One notable exception is found when type 1 d’ is high, and trial counts are very low (20 per subject); in this case (upper right-hand panels), all fitting methods tend to overestimate metacognitive efficiency.

Figure 7. Simulation experiments: high metacognitive efficiency (meta-d’/d’ = 1.5). For legend see Fig. 5.

Figure 8 provides a summary of false positive rates recorded across all experiments for the three methods. Point-estimate approaches (SSE and MLE) return unacceptably high false positive rates when trial counts are less than 200 per subject, due to consistent over- or underestimation of metacognitive efficiency. In contrast, HMeta-d provides good control of the false positive rate in all cases except when trial counts are very low (<50 per subject).

Figure 8. Observed false positive rates for each fitting procedure. Average false positive rates for hypothesis tests against ground truth meta-d’/d’ values from the simulations in Figs 5–7. Individual data points reflect single experiments (the false positive rate for a particular combination of metacognitive efficiency level, parameters and trial count). Error bars reflect standard errors of the mean. For trial counts < 200, MLE or SSE methods result in unacceptably high false positive rates due to consistent over- or underestimation of metacognitive efficiency.

Flexible extensions of the basic model

An advantage of working with Bayesian graphical models is that they are easily extendable to estimate other influences on metacognitive efficiency in the context of the same model (Lee and Wagenmakers 2014). For instance, one question of interest is whether metacognitive ability in one domain, such as perception, is predictive of metacognitive ability in another domain, such as memory. Evidence pertaining to this question is mixed: some studies have found evidence for a modest correlation in metacognitive efficiency across domains (McCurdy et al. 2013; Ais et al. 2016), whereas others have reported a lack of correlation (Kelemen et al. 2000; Baird et al. 2013). One critical issue in testing this hypothesis is that uncertainty in the model’s estimate of meta-d’ should be incorporated into an assessment of any correlation between the two domains. This is naturally accommodated by embedding an estimate of the correlation coefficient in a hierarchical estimation of metacognitive efficiency.

To expand the model, each subject’s metacognitive efficiencies in the two domains (M1, M2) are specified as draws from a bivariate Gaussian (note that parameter expansion is omitted here for clarity):

$$[\log(M1_s)\ \ \log(M2_s)] \sim \mathcal{N}\left( \begin{bmatrix} \mu_{M1} \\ \mu_{M2} \end{bmatrix}, \begin{bmatrix} \sigma^2_{M1} & \rho\,\sigma_{M1}\sigma_{M2} \\ \rho\,\sigma_{M1}\sigma_{M2} & \sigma^2_{M2} \end{bmatrix} \right)$$

Priors were specified as follows:

$$\mu_{M1}, \mu_{M2} \sim \mathcal{N}(0, 1)$$

$$\sigma_{M1}, \sigma_{M2} \sim \mathrm{InvSqrtGamma}(0.001, 0.001)$$

$$\rho \sim \mathrm{Uniform}(-1, 1)$$

To demonstrate the application of this expanded model I simulated 100 subjects’ confidence data from the type 2 SDT model in two ‘tasks’. Each task’s generative meta-d’/d’ was
drawn from a bivariate Gaussian with mean = μ_M1 = μ_M2 = 0.8 and standard deviations σ_M1 = σ_M2 = 0.5. Type 1 d’ was generated separately for each task from a N(2, 0.2) distribution. The generative correlation coefficient ρ was set to 0.6. Data from both domains are then passed into the model simultaneously, and a group-level posterior distribution on the correlation coefficient ρ is returned. Figure 4B shows this posterior together with the 95% CI, which encompasses the generative correlation coefficient.

Discussion

The quantification of metacognition from confidence ratings is a question with application in several subfields of psychology and neuroscience, including consciousness, decision-making, memory, education, aging and psychiatric disorders. There are now several tools in the psychologist’s armoury for estimating how closely subjective reports track task performance (Fleming and Lau 2014). An important advance is the recognition that simple correlation coefficients are affected by fluctuations in performance and confidence bias, and the meta-d’ model was developed to allow correction of metacognitive sensitivity for these potential confounds (Maniscalco and Lau 2012).

The hierarchical Bayesian approach to estimating metacognitive efficiency introduced here enjoys several advantages. It naturally incorporates variable uncertainty about finite hit and false-alarm rates; it is the correct way to incorporate information about within- and between-subject uncertainty; it avoids the need for edge correction or data modification, and provides a flexible framework for hypothesis testing and model expansion. The toolbox provides a simple MATLAB implementation that harnesses the MCMC sampler JAGS to return posterior distributions over group-level model parameters. The tutorial outlined how data preparation is identical to that required for the existing maximum-likelihood routines, allowing the user to easily apply both approaches once data are in the correct format. In simulation experiments, the hierarchical approach recovered more accurate parameter estimates than commonly used alternatives (MLE and SSE), and this benefit was greatest when there are limited numbers of trials per subject (Fig. 5). It is notable that the point-estimate approaches severely underestimate average meta-d’/d’ ratios for low d’ and trial numbers < 100 per subject, leading to a high false positive rate. Given that low (type 1) d’ is commonplace in psychophysical studies of conscious awareness and metacognition, such biases may lead to erroneous conclusions that metacognitive efficiency is below the ideal observer prediction. In contrast, over-estimations were observed when d’ was high.

Practical recommendations for quantifying metacognition

If group-level estimates of meta-d’/d’ are of primary interest, HMeta-d allows direct, unbiased inference at this upper level of the hierarchy while appropriately handling participant-level uncertainty. The HMeta-d toolbox also allows Bayesian estimation of single-subject meta-d’, but if single-subject estimates are of primary interest, the MLE approach may be simpler and computationally less expensive. However, advantages of using a Bayesian approach are obtained even in this case: uncertainty in parameter estimates can be easily quantified (as the posterior credible interval), with such uncertainty appropriately reducing as trial count increases, and edge correction confounds are avoided.

More generally, whether one should use metacognitive sensitivity (e.g. meta-d’ or AUROC2) or metacognitive efficiency (meta-d’/d’) as a measure of metacognition depends on the goal of an analysis. For example, if we are interested in establishing the presence or absence of metacognition in a particular condition, such as when performance is particularly low (Scott et al. 2014) or in particular subject groups such as human infants (Goupil et al. 2016), computing metacognitive sensitivity alone may be sufficient. However, when comparing experimental conditions or groups which may differ systematically in performance, estimating metacognitive efficiency appropriately controls for confounds introduced by type 1 performance and response biases. Note however there are also limitations in the applicability of the meta-d’ model. First and foremost, the task should be amenable to analysis in a two-choice SDT framework, as fitting meta-d’ requires specification of a 2 (stimulus) × 2 (response) × N (confidence rating) matrix. If a task does not conform to these specifications (such as one with N alternative responses) then employing an alternative non-parametric measure of metacognitive sensitivity such as the area under the type 2 ROC (AUROC2) may be preferable (Fleming and Lau 2014). In addition, like all analysis approaches, meta-d’ assumes a particular generative model of the confidence data that is at best incomplete, and untenable in certain circumstances. For instance, equal variance is specified for S1 and S2 distributions and stable confidence criteria are assumed which may be at odds with the findings of serial adjustments in criteria (Treisman 1984; Rahnev et al. 2015; Norton et al. 2017). (Maniscalco and Lau’s fit_meta_d_MLE code allows setting the ratio of S1 and S2 variances as a free parameter; it would be possible to incorporate a similar parameter in future versions of HMeta-d. However as described by Maniscalco and Lau (2014) there is ambiguity between changes in response-specific metacognitive efficiency and the variance ratio, and therefore we recommend users employ the equal-variance model unless they have access to independent estimates of the variance inequality).

More broadly, meta-d’ is primarily a tool for estimating metacognitive sensitivity, and additional considerations are needed when developing a complete model of confidence (Pouget et al. 2016; Fleming and Daw 2017). Recent modelling work has sought to explicitly characterize type 1 and type 2 processes (Jang et al. 2012; Maniscalco and Lau 2016; Fleming and Daw 2017), permitting flexible modelling of relationships between performance and metacognition. For instance, in Fleming and Daw’s ‘second-order’ model, an underlying generative model of action is specified, and confidence is formulated as an inference on the model’s probability of being correct, conditioned on both internal states and self-action. These frameworks allow for multiple drivers of metacognitive sensitivity, in contrast to the meta-d’ model which describes sensitivity only relative to type 1 performance. It is thus useful to view meta-d’ as complementary to these modelling efforts. Just as d’ provides a bias-free measure of perceptual sensitivity that may be explained by a number of contributing factors, meta-d’ provides a bias-free metric for metacognitive sensitivity without commitment to a particular processing architecture.

Future directions

The HMeta-d model code can be flexibly extended to allow estimation of other influences on metacognitive sensitivity. Here one simple example is explored, the specification of a population-level correlation coefficient relating metacognitive
efficiencies across domains. More broadly, it may be possible to specify flexible general linear models linking trial- or subject-level variables to meta-d’ (Kruschke 2014). Currently this requires bespoke model specification, but in future work we hope to provide a flexible user interface for the specification of arbitrary models (cf. Wiecki et al. 2013). Estimation of single-trial influences on metacognitive efficiency, such as attentional state or brain activity, is a particularly intriguing proposition. Currently, estimation of meta-d’ requires many trials, restricting studies of the neural basis of metacognitive efficiency to between-condition or between-subject analyses. Extending the HMeta-d framework to estimate trial-level effects on meta-d’ may therefore accelerate our understanding of the neural basis of metacognitive efficiency.

Also naturally accommodated in a hierarchical framework is the comparison of different model structures for metacognition within and across tasks. A currently open question is whether metacognition relies on common or distinct processes across different domains, such as perception or memory (Baird et al. 2013; McCurdy et al. 2013; Fleming et al. 2014; Ais et al. 2016). One approach to addressing this question is to specify variants of the HMeta-d model in which different parameters are shared across domains, such as meta-d’ and/or the confidence criteria. Through model comparison, one could then obtain the model that best accounted for the relationship between metacognitive performance across different domains, and shed light on the common and distinct components.

Conclusions

This article introduces a hierarchical Bayesian approach to estimating metacognitive efficiency. This approach has several methodological advantages in comparison to current methods that focus on single-subject point estimates, and may prove particularly beneficial for studies of metacognition in patient populations and cognitive neuroscience experiments where often only limited data are available. More broadly, this framework can be flexibly extended to specify and compare different models of meta-d’ within a common scheme, thereby advancing our understanding of the neural and computational basis of self-evaluation.

Acknowledgements

I thank Dora Matzke for advice on parameter expansion, and David Huber and one other anonymous reviewer for helpful suggestions.

Funding

This work is funded by a Sir Henry Wellcome Fellowship from the Wellcome Trust (096185) awarded to S.M.F. The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (091593/Z/10/Z).

Conflict of interest statement. None declared.

References

Ais J, Zylberberg A, Barttfeld P, et al. Individual consistency in the accuracy and distribution of confidence judgments. Cognition 2016;146:377–86. http://doi.org/10.1016/j.cognition.2015.10.006
Baird B, Smallwood J, Gorgolewski KJ, et al. Medial and lateral networks in anterior prefrontal cortex support metacognitive ability for memory and perception. J Neurosci 2013;33:16657–65. http://doi.org/10.1523/JNEUROSCI.0786-13.2013
Barrett AB, Dienes Z, Seth AK. Measures of metacognition on signal-detection theoretic models. Psychol Methods 2013;18:535–52. http://doi.org/10.1037/a0033268
Charles L, Van Opstal F, Marti S, et al. Distinct brain mechanisms for conscious versus subliminal error detection. NeuroImage 2013;73:80–94. http://doi.org/10.1016/j.neuroimage.2013.01.054
Clarke F, Birdsall T, Tanner W. Two types of ROC curves and definition of parameters. J Acoust Soc Am 1959;31:629–30.
David AS, Bedford N, Wiffen B, et al. Failures of metacognition and lack of insight in neuropsychiatric disorders. Philos Trans R Soc B Biol Sci 2012;367:1379–90. http://doi.org/10.1098/rstb.2012.0002
Denison RN. Precision, not confidence, describes the uncertainty of perceptual experience. (Response to John Morrison’s "Perceptual confidence".) Anal Philos 2017;58:58–70.
Flavell JH. Metacognition and cognitive monitoring: a new area of cognitive–developmental inquiry. Am Psychologist 1979;34:906–11. http://doi.org/10.1037/0003-066X.34.10.906
Fleming SM, Daw ND. Self-evaluation of decision-making: a general Bayesian framework for metacognitive computation. Psychol Rev 2017;124:91.
Fleming SM, Lau HC. How to measure metacognition. Front Hum Neurosci 2014;8:443. http://doi.org/10.3389/fnhum.2014.00443
Fleming SM, Huijgen J, Dolan RJ. Prefrontal contributions to metacognition in perceptual decision making. J Neurosci 2012;32:6117–25. http://doi.org/10.1523/JNEUROSCI.6489-11.2012
Fleming SM, Ryu J, Golfinos JG, et al. Domain-specific impairment in metacognitive accuracy following anterior prefrontal lesions. Brain 2014;137:2811–22. http://doi.org/10.1093/brain/awu221
Fleming SM, Weil RS, Nagy Z, et al. Relating introspective accuracy to individual differences in brain structure. Science 2010;329:1541–3. http://doi.org/10.1126/science.1191883
Galvin SJ, Podd JV, Drga V, et al. Type 2 tasks in the theory of signal detectability: discrimination between correct and incorrect decisions. Psychon Bull Rev 2003;10:843–76.
Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press, 2007.
Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci 1992;7:457–72. http://doi.org/10.1214/ss/1177011136
Goupil L, Romand-Monnier M, Kouider S. Infants ask for help when they know they don’t know. Proc Natl Acad Sci 2016;113:3492–6. http://doi.org/10.1073/pnas.1515129113
Hautus MJ. Corrections for extreme proportions and their biasing effects on estimated values of d’. Behav Res Methods Instr Comput 1995;27:46–51.
Heatherton TF. Neuroscience of self and self-regulation. Ann Rev Psychol 2011;62:363–90. http://doi.org/10.1146/annurev.psych.121208.131616
Howell DC. Statistical Methods for Psychology. Boston, MA: Wadsworth Pub Co, 2009.
Jang Y, Wallsten TS, Huber DE. A stochastic detection and retrieval model for the study of metacognition. Psychol Rev 2012;119:186.
Keene ON. The log transformation is special. Stat Med 1995;14:811–9.
Kelemen WL, Frost PJ, Weaver CA. Individual differences in metacognition: evidence against a general metacognitive ability. Memory Cogn 2000;28:92–107.

Ko Y, Lau H. A detection theoretic explanation of blindsight suggests a link between conscious perception and metacognition. Philos Trans R Soc B Biol Sci 2012;367:1401–11.
Kruschke JK. Doing Bayesian Data Analysis. Academic Press, 2014.
Lau HC, Rosenthal D. Empirical support for higher-order theories of conscious awareness. Trends Cogn Sci 2011;15:365–73. http://doi.org/10.1016/j.tics.2011.05.009
Lee MD. BayesSDT: software for Bayesian inference with signal detection theory. Behav Res Methods 2008;40:450–6.
Lee MD, Wagenmakers E-J. Bayesian Cognitive Modeling: A Practical Course. Cambridge: Cambridge University Press, 2014.
Lichtenstein S, Fischhoff B, Phillips LD. Calibration of probabilities: the state of the art to 1980. In: Kahneman D, Slovic P, Tversky A (eds), Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press, 1982.
Macmillan N, Creelman C. Detection Theory: A User’s Guide. New York: Lawrence Erlbaum, 2005.
Maniscalco B, Lau H. Signal detection theory analysis of Type 1 and Type 2 data: Meta-d’, response-specific Meta-d’, and the
Palmer EC, David AS, Fleming SM. Effects of age on metacognitive efficiency. Conscious Cogn 2014;28:151–60. http://doi.org/10.1016/j.concog.2014.06.007
Persaud N, McLeod P, Cowey A. Post-decision wagering objectively measures awareness. Nat Neurosci 2007;10:257–61.
Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 2003.
Pouget A, Drugowitsch J, Kepecs A. Confidence and certainty: distinct probabilistic quantities for different goals. Nat Neurosci 2016;19:366–74. http://doi.org/10.1038/nn.4240
Rabbitt P, Vyas S. Processing a display even after you make a response to it. how perceptual errors can be corrected. Quart J Exp Psychol Sect A 1981;33:223–39. http://doi.org/10.1080/14640748108400790
Rahnev D, Koizumi A, McCurdy LY, et al. Confidence leak in perceptual decision making. Psychol Sci 2015;26:1664–80. http://doi.org/10.1177/0956797615595037
Rausch M, Zehetleitner M. Visibility is not equivalent to confi-
unequal variance SDT model. In: Fleming SM, Frith CD (eds), dence in a low contrast orientation discrimination task. Front
The Cognitive Neuroscience of Metacognition. Berlin Heidelberg: Psychol 2016;22:591.
Springer, 2014. Schooler JW. Re-representing consciousness: dissociations be-
Maniscalco B, Lau HC. A signal detection theoretic approach for tween experience and meta-consciousness. Trends Cogn Sci
estimating metacognitive sensitivity from confidence ratings. 2002;6:339–44.
Conscious Cogn 2012;21:422–30. http://doi.org/10.1016/j.concog. Scott RB, Dienes Z, Barrett AB, et al. Blind insight: metacogni-
2011.09.021 tive discrimination despite chance task performance.
Maniscalco B, Lau H. The signal processing architecture underly- Psychol Sci 2014;25:2199–208. http://doi.org/10.1177/
ing subjective reports of sensory awareness. Neurosci 0956797614553944
Conscious 2016;1:niw002 http://doi.org/10.1093/nc/niw002 Spiegelhalter DJ, Best NG, Carlin BP, et al. Bayesian measures of
Masson MEJ, Rotello CM. Sources of bias in the Goodman- model complexity and fit. J R Stat Soc Ser B 2002;64:583–639.
Kruskal gamma coefficient measure of association: implica- http://doi.org/10.1111/1467-9868.00353
tions for studies of metacognitive processes. J Exp Psychol Learn Treisman M. A theory of criterion setting: an alternative to the
Memory Cogn 2009;35:509–27. http://doi.org/10.1037/a0014876 attention band and response ratio hypotheses in magnitude
Matzke D, Lee MD, Wagenmakers E-J. Signal detection theory: estimation and cross-modality matching. J Exp Psychol Gen
parameter expansion. In: Lee MD, Wagenmakers E-J (eds), 1984;113:443–63.
Bayesian Cognitive Modeling: A Practical Course. Cambridge Weil LG, Fleming SM, Dumontheil I, et al. The development of
University Press, 2014, 187–95. metacognitive ability in adolescence. Conscious Cogn
McCurdy LY, Maniscalco B, Metcalfe J, et al. Anatomical coupling 2013;22:264–71. http://doi.org/10.1016/j.concog.2013.01.004
between distinct metacognitive systems for memory and vi- Weiskrantz L, Warrington EK, Sanders MD, et al. Visual capacity
sual perception. J Neurosci 2013;33:1897–906. http://doi.org/10. in the hemianopic field following a restricted occipital abla-
1523/JNEUROSCI.1890-12.2013 tion. Brain 1974;97:709–28.
Moeller SJ, Goldstein RZ. Impaired self-awareness in human ad- Wiecki TV, Sofer I, Frank MJ. HDDM: Hierarchical Bayesian estima-
diction: deficient attribution of personal relevance. Trends Cogn tion of the drift-diffusion model in Python. Front Neuroinformat
Sci 2014;18:635–41. http://doi.org/10.1016/j.tics.2014.09.003 2013;7:14. http://doi.org/10.3389/fninf.2013.00014
Nelson T. A comparison of current measures of the accuracy of
feeling-of-knowing predictions. Psychol Bull 1984;95:109–33.
Norton EH, Fleming SM, Daw ND, et al. Suboptimal criterion
learning in static and dynamic environments. PLoS Comput Biol
2017;13:e1005304.
14 | Fleming

Appendix
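Type 2 SDT model equations

For a discrete confidence scale ranging from 1 to k, k – 1 type 2 criteria are required to rate confidence for each response type. We define type 2 confidence criteria for S1 and S2 responses as:

\[
c_{2,\text{``S1''}} = \left( c,\; c_{2,\text{``S1''}}^{\text{conf}=2},\; c_{2,\text{``S1''}}^{\text{conf}=3},\; \ldots,\; c_{2,\text{``S1''}}^{\text{conf}=k},\; -\infty \right)
\]

\[
c_{2,\text{``S2''}} = \left( c,\; c_{2,\text{``S2''}}^{\text{conf}=2},\; c_{2,\text{``S2''}}^{\text{conf}=3},\; \ldots,\; c_{2,\text{``S2''}}^{\text{conf}=k},\; \infty \right)
\]

And

\[
c_{\text{ascending}} = \left( c_{2,\text{``S1''}}^{\text{conf}=k},\; c_{2,\text{``S1''}}^{\text{conf}=k-1},\; \ldots,\; c_{2,\text{``S1''}}^{\text{conf}=1},\; c,\; c_{2,\text{``S2''}}^{\text{conf}=1},\; c_{2,\text{``S2''}}^{\text{conf}=2},\; \ldots,\; c_{2,\text{``S2''}}^{\text{conf}=k} \right)
\]

Then the probabilities of each confidence rating conditional on a given stimulus and response are as follows:

\[
\text{Prob}(\text{conf} = y \mid \text{stim} = S1, \text{resp} = \text{``S1''}) = \frac{\Phi\left(c_{2,\text{``S1''}}(y), -\frac{d'}{2}\right) - \Phi\left(c_{2,\text{``S1''}}(y+1), -\frac{d'}{2}\right)}{\Phi\left(c, -\frac{d'}{2}\right)}
\]

\[
\text{Prob}(\text{conf} = y \mid \text{stim} = S2, \text{resp} = \text{``S1''}) = \frac{\Phi\left(c_{2,\text{``S1''}}(y), \frac{d'}{2}\right) - \Phi\left(c_{2,\text{``S1''}}(y+1), \frac{d'}{2}\right)}{\Phi\left(c, \frac{d'}{2}\right)}
\]

\[
\text{Prob}(\text{conf} = y \mid \text{stim} = S1, \text{resp} = \text{``S2''}) = \frac{\Phi\left(c_{2,\text{``S2''}}(y+1), -\frac{d'}{2}\right) - \Phi\left(c_{2,\text{``S2''}}(y), -\frac{d'}{2}\right)}{1 - \Phi\left(c, -\frac{d'}{2}\right)}
\]

\[
\text{Prob}(\text{conf} = y \mid \text{stim} = S2, \text{resp} = \text{``S2''}) = \frac{\Phi\left(c_{2,\text{``S2''}}(y+1), \frac{d'}{2}\right) - \Phi\left(c_{2,\text{``S2''}}(y), \frac{d'}{2}\right)}{1 - \Phi\left(c, \frac{d'}{2}\right)},
\]

where \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution.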
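As an illustration, the four conditional probabilities above could be computed along the following lines. This is a minimal sketch, not the HMeta-d toolbox code. It assumes that the two-argument form Φ(x, μ) denotes the cumulative distribution function of a unit-variance normal with mean μ, evaluated at x. The function names `phi` and `type2_probs`, the argument layout, and the example parameter values are all hypothetical.

```python
from math import inf
from statistics import NormalDist


def phi(x, mu):
    """CDF at x of a unit-variance normal with mean mu (assumed reading of Phi(x, mu))."""
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return NormalDist(mu, 1.0).cdf(x)


def type2_probs(d_prime, c, crit_s1, crit_s2):
    """Return P(conf = y | stim, resp) for y = 1..k as lists keyed by (stim, resp).

    crit_s1: type 2 criteria for "S1" responses, conf = 2..k (decreasing, all below c)
    crit_s2: type 2 criteria for "S2" responses, conf = 2..k (increasing, all above c)
    """
    # Criterion vectors as defined in the appendix, padded with -inf / +inf.
    c2_s1 = [c, *crit_s1, -inf]
    c2_s2 = [c, *crit_s2, inf]
    k = len(crit_s1) + 1

    probs = {}
    for stim, mu in (("S1", -d_prime / 2), ("S2", d_prime / 2)):
        p_resp_s1 = phi(c, mu)        # denominator for "S1" responses
        p_resp_s2 = 1.0 - phi(c, mu)  # denominator for "S2" responses
        # List index y here is 0-based, so element y corresponds to conf = y + 1.
        probs[(stim, "S1")] = [
            (phi(c2_s1[y], mu) - phi(c2_s1[y + 1], mu)) / p_resp_s1
            for y in range(k)
        ]
        probs[(stim, "S2")] = [
            (phi(c2_s2[y + 1], mu) - phi(c2_s2[y], mu)) / p_resp_s2
            for y in range(k)
        ]
    return probs


# Example with a 3-point confidence scale (illustrative values only).
example = type2_probs(d_prime=1.5, c=0.0, crit_s1=[-0.5, -1.0], crit_s2=[0.5, 1.0])
for key, values in example.items():
    print(key, [round(v, 3) for v in values])
```

Because each numerator telescopes across y, the k probabilities for any stimulus and response pair sum to one, which is a useful sanity check on the criterion ordering.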
