A tutorial for estimating mixture models for visual working memory tasks in brms
Figures: 14
Tables: 2
Author Note
Gidon Frischkorn and Vencislav Popov contributed equally to the manuscript and should be regarded as shared first authors.
Abstract
Mixture models for visual working memory tasks using continuous report recall are highly popular measurement models in visual working memory research. Yet, efficient and easy-to-implement estimation procedures that flexibly enable group or condition comparisons are scarce. Specifically, most software packages implementing mixture models have used maximum likelihood estimation on single-subject data. Such estimation procedures require large trial numbers per participant to obtain robust and reliable estimates. This problem can be solved with hierarchical Bayesian estimation procedures that provide robust and reliable estimates with lower trial numbers. In this tutorial, we illustrate how mixture models for visual working memory tasks can be specified and fit in the R package brms. The benefit of this implementation over existing ones is that it combines the mixture models with an efficient linear model syntax that enables us to adapt the mixture model to practically any experimental design. Specifically, this implementation allows model parameters to vary over experimental conditions and participants. Moreover, the hierarchical structure and the specification of informed priors can improve subject-level parameter estimation and solve frequently encountered estimation problems. We illustrate these benefits in different examples and provide R code for easy adaptation to other use cases. We also introduce a new R package called bmm, which simplifies the process of estimating these models with brms.
Keywords: Tutorial, Mixture Model, Visual Working Memory, brms, Bayesian Modeling
A tutorial for estimating mixture models for visual working memory tasks in brms
In research on visual working memory, participants are often asked to remember and reproduce continuous features of visual objects, such as their color or orientation (Prinzmetal, Amiri, Allen & Edwards, 1998; Wilken & Ma, 2004). These continuous reproduction tasks produce rich data that are often analyzed with measurement mixture models. Such models allow researchers to dissociate relevant aspects of behavioral performance, such as the precision of the memory representation versus the probability of recalling the correct feature (Zhang & Luck, 2008; Bays et al., 2009; Oberauer et al., 2017; Brady et al., 2022; Oberauer, 2022). Although mixture models have been widely applied by many researchers in the field1, a flexible, well-documented, and easily accessible way of efficiently estimating these models is lacking. This tutorial provides an implementation of three highly popular measurement models for visual working memory tasks using the R package brms (Bürkner et al., 2018).
In the continuous reproduction task (sometimes also called the delayed estimation task), participants encode a set of visual objects into visual working memory and are then asked to reproduce a specific feature of one cued object on a continuous scale at test (see Figure 1 for an illustration). Most often, the features used in these tasks are colors sampled from a color wheel (Wilken & Ma, 2004) or continuous orientations of a bar or a triangle (Bays et al., 2011). The set of objects is typically distributed over the screen. Thus, participants must associate the to-be-remembered features (e.g., color or
1 For example, a Google Scholar query for "'mixture model' AND 'visual working memory'" returns 930 results; the article by Zhang and Luck (2008), which introduced mixture modeling in visual working memory, has been cited 1677 times; and the MATLAB package MemToolbox, which implements various mixture models for visual working memory, has been cited 252 times (Suchow et al., 2013).
Figure 1. Illustration of a typical continuous reproduction task using colored squares as visual objects. Participants should remember which color was presented at which location, and after a short retention interval, they are asked to reproduce the color of the cued item by selecting it on the color wheel. The dependent variable is the response error, that is, the deviation of the selected response from the originally presented color (illustrated by the arc).
orientation) with the spatial locations at which they are presented. The precision of the representation of an object's feature in visual working memory is measured via the angular deviation of the response from the true feature value. Averaging these deviations across trials yields the average recall error, that is, the average deviation of the response from the true feature value. In many studies, this average recall error has been the main dependent variable for evaluating the effect of experimental manipulations. Yet, the average recall error confounds different properties of memory representations and does not sufficiently represent the theoretical processes assumed by current models of visual working memory. Therefore, different measurement models have been proposed that formalize distinct aspects of visual working memory models and how they translate into observed behavior.
Such measurement models are useful because they decompose the average recall error into several theoretically meaningful parameters. The three measurement models we will be addressing in this tutorial paper are a) the two-parameter mixture model (Zhang & Luck, 2008), b) the three-parameter mixture model (Bays et al., 2009), and c) the interference measurement model (Oberauer et al., 2017). The first two models can be understood as special cases of the interference measurement model (Oberauer et al., 2017). At the core of these models is the assumption that responses in continuous reproduction tasks can stem from different distributions, depending on the continuous activation of different memory representations or the cognitive state a person is in at retrieval. The two-parameter mixture model distinguishes between a) remembering the cued object with a certain precision of its feature in visual working memory (see the solid blue distribution in Figure 2), versus b) having no representation in visual working memory and thus guessing a random response (see the dashed red distribution in Figure 2). Responses based on a noisy memory representation of the correct feature come from a circular normal distribution (i.e., von Mises) centered on the correct feature value, while guessing responses come from a uniform distribution along the entire circle. The three-parameter mixture model adds a third state, namely confusing the cued object with another object shown during encoding and thus reporting the feature of that other object (see the long-dashed green distribution in Figure 2). Responses from this state are sometimes called non-target responses or swap errors. Finally, the interference measurement model decomposes these mixture proportions into continuous sources of activation (background noise, general activation, and context activation) and additionally accounts for the spatial proximity between items, predicting that confusion between spatially close items
Figure 2. In the described measurement models, the probability of reporting a certain feature value depends on the cognitive state a person is in at recall. If the person recalls the cued object, they will report values following the solid blue distribution. If the person recalls another object, they will report values following the long-dashed green distribution centered on the other object's value. If the person cannot recall anything, they will guess a random value following the dashed red distribution. Depending on the relative proportions of these distributions, the observed responses will follow the black-dotted mixture of all distributions.
is more likely than confusion between distant items. For all three models, the resulting observed recall distribution is a weighted mixture of the different distributions included in the model (see the black-dotted distribution in Figure 2).
When applied to data, the first two mixture models estimate several parameters: κ, the precision of the von Mises distribution of memory representations (which is inversely related to σ, the standard deviation of the circular normal distribution); pmem, the probability that a response comes from memory of the correct feature; pnon-target, the probability that a response comes from memory for an incorrect feature associated with another object in memory; and pguessing, the probability that a response is a random guess. In the two-parameter mixture model, pnon-target = 0 and pmem + pguessing = 1, while in the three-parameter mixture model pmem + pnon-target + pguessing = 1. Formally, according to the mixture models, the probability of responding with a feature x is:
$$P(x) = p_{\mathrm{mem}}\, vM(x; \mu_t, \kappa) + p_{\mathrm{non\text{-}target}} \frac{\sum_{i=1}^{n-1} vM(x; \mu_i, \kappa)}{n-1} + p_{\mathrm{guess}}\, vM(x; 0, 0) \qquad (1)$$

where vM is the von Mises distribution, $\mu_t$ is the location of the target feature, the $\mu_i$ are the locations of the non-target features, and $vM(x; 0, 0)$ is the von Mises distribution with 0 precision, which is equivalent to a uniform circular distribution. Finally, n specifies the number of features to be held in memory (i.e., the set size). The interference measurement model further decomposes the probabilities of selecting a target, selecting a non-target, or guessing randomly into continuous sources of activation: context activation (c), item activation (a), and background noise (n). For more details on this decomposition, see Oberauer et al. (2017).
Maximum likelihood versus hierarchical Bayesian parameter estimation
Until recently, most researchers have used custom-built code to implement the existing mixture models. Although there is software that implements some of these measurement models (e.g., Grange et al., 2021), this software mostly uses a two-step procedure. First, the parameters of the model are estimated separately for each subject in each condition using maximum likelihood methods. Then, in a second step, the parameter estimates are analyzed with traditional inference methods such as t-tests, ANOVA, or linear regression to determine which parameters differ between conditions or groups. This two-step procedure is problematic in several ways and can lead to either the over- or underestimation of standard errors in statistical tests (Boehm et al., 2018; Skrondal & Laake, 2001). Furthermore, to obtain robust parameter estimates, maximum likelihood estimation requires at least 200 trials per subject per condition (Grange & Moore, 2022).
To solve these issues, hierarchical Bayesian implementations of these models have also been proposed (Hardman, 2016/2017; Oberauer et al., 2017; Suchow et al., 2013). Hierarchical Bayesian parameter estimation provides several benefits over frequentist estimation (again, see Table 1 for an overview). Critically, by estimating the data from all subjects and all conditions simultaneously, robust parameter estimates can be obtained with less data per subject and condition (see Appendix A for recovery simulations, which show a case where non-hierarchical maximum likelihood estimation fails to recover the correct parameters, while the hierarchical Bayesian estimation recovers them well).
Table 1 (excerpt). Comparison of two-step maximum likelihood estimation and hierarchical Bayesian estimation.

Speed of model estimation. Maximum likelihood: very quick, only seconds per subject. Hierarchical Bayesian: slow, from a few minutes for simple models to several hours or even days for more complex models and larger data sets.

Required data. Maximum likelihood: more than 200 retrievals per participant in each condition (Grange & Moore, 2022). Hierarchical Bayesian: more than 50 retrievals per participant in each condition (see Appendix).

Model specification. Maximum likelihood: researchers must write their own likelihood function and adapt it for each experiment. Hierarchical Bayesian: the linear model syntax can specify models for almost any use case of the mixture model.

Model comparison. Maximum likelihood: done separately for each subject, which creates a problem with consistency over the whole sample (for an example, see Popov et al., 2021). Hierarchical Bayesian: as the model is estimated for the whole sample simultaneously, model comparisons can be made over the whole sample (for an example, see Oberauer et al., 2017).
Why do we provide another implementation of these models if such implementations already exist? The existing implementations specify the mixture model in JAGS (Oberauer et al., 2017), in MATLAB (Suchow et al., 2013), or in a no longer maintained R package (Hardman, 2016/2017). The JAGS implementation by Oberauer et al. (2017) uses code designed for the specific experimental designs and factors analyzed in that paper. Consequently, for other experiments with different factors and factor levels, the JAGS code would need to be adjusted for the specific conditions and groups in them. Thus, researchers would need to adapt JAGS code to apply these models to their specific experiments, a skill only few researchers have. The MATLAB implementation by Suchow et al. (2013) is more flexible and well documented. However, MATLAB is an expensive proprietary language that not everyone has access to. Nowadays, R is the language of choice when teaching statistical analysis in psychology departments, and thus many more researchers are familiar with it than with MATLAB. Finally, the CatContModel R package (Hardman, 2016/2017) is not available on CRAN and is no longer actively maintained, and thus does not provide the stability and flexibility of an actively maintained and tested package such as brms.
Moreover, none of the existing implementations allows estimating all three models for which we present implementations here; in particular, they do not provide an implementation of the interference measurement model. And although they provide a hierarchical Bayesian implementation of the mixture models over all subjects, none of the implementations allows estimating the three-parameter mixture model or the interference model simultaneously over different set sizes. Specifically, when parameters of the model vary across conditions, these previous implementations need to fit a separate model to each condition, followed by a two-step inference procedure, thus significantly reducing the benefits of hierarchical estimation.
To overcome these difficulties and to make the discussed measurement models more accessible, we illustrate how to implement these models in the R package brms, a general-purpose package for estimating Bayesian multilevel regression models (Bürkner, 2017, 2018a, 2018b). The major benefit of this implementation is that brms provides a powerful linear model syntax that allows us to flexibly specify which model parameters should vary depending on experimental conditions. This yields an implementation of the measurement models for visual working memory tasks that can be adapted to practically any experimental design. Additionally, brms uses the probabilistic programming language STAN (Carpenter et al., 2017) to estimate parameters. Arguably, STAN provides the most cutting-edge estimation algorithms for Bayesian modeling and yields robust parameter estimates even when
[Table 2 (excerpt, only partially recovered): one row indicates whether continuous predictors are allowed (No vs. Yes). Note: * any model in bmm can implement variable precision by including a random effect over trials in the linear model syntax that captures trial-by-trial variability in parameters.]
parameters are correlated and the posterior sample size is small. Finally, as a general-purpose package, brms has a large and active community, which can assist in solving problems. In sum, implementing measurement models for visual working memory tasks in brms will enable more researchers to use these models in their work and provides state-of-the-art hierarchical Bayesian parameter estimation.
brms vs bmm
All mixture models presented in this tutorial can be implemented in brms without additional tools. Implementing the two-parameter mixture model (Zhang & Luck, 2008) this way is relatively easy. However, implementing the three-parameter mixture model (Bays et al., 2009) and the interference measurement model (Oberauer & Lin, 2017) is more complicated, because they require transformations of parameters and additional functionality to allow the estimation of both the three-parameter mixture model and the interference measurement model over varying set sizes. To make these implementations more accessible and to reduce the chance for errors, we wrote an R package called bmm (Bayesian Measurement Modeling)4, which provides wrapper functions around brms that take care of these procedures and allow the user to specify the desired model more easily.
2 R scripts for all following examples are available on github: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM/tree/main/scripts/
3 For example, you can compare the model formulas and priors for Example 3 written for brms vs bmm in the files scripts/brms_examples/Example3_Bays2009.R and scripts/bmm_examples/Example3_Bays2009.R. The bmm package allows the user to specify only the important elements of the formula, and then generates the necessary formula and priors for brms itself. More generally, the relationship between bmm and brms is the same as that between brms and stan: brms handles model specification, data recoding, and other labor-intensive procedures, and then generates the relevant stan code. For measurement models, bmm takes care of similar tasks that cannot be achieved in brms, before submitting the model to brms for estimation.
4 The bmm package and its installation instructions are available on github: https://github.com/venpopov/bmm
In the examples that follow, we first demonstrate how the two-parameter mixture model can be implemented directly in brms. Although it is easier to use bmm even for these cases, we wanted to show the full process in brms so that readers can become familiar with how the models are implemented. Then, in Examples 3-5, we only show how to use bmm to estimate the three-parameter mixture model and the interference measurement model, rather than how to implement them directly in brms.
In most common brms use cases, researchers specify how the dependent variable is distributed (e.g., a Gaussian distribution for continuous data or a binomial distribution for accuracy data), and then specify a linear model that predicts the parameters of this data distribution from the independent variables. Additionally, the model syntax implements generalized hierarchical modeling, meaning you can specify random effects over subjects that account for individual differences in the overall fixed effect. Initially, brms was not specifically designed to implement mixture models for visual working memory, but one of its recent updates makes that possible: in addition to linear models for single data distributions, brms now also allows users to specify mixtures of data distributions. This is a critical feature that we will build on for estimating the above-described measurement models for visual working memory tasks. Specifically, there are five steps we need to take for estimating these models in brms:
1. Specify the mixture family of distributions.
2. Specify the model formula to set up the model and predict parameters of interest.
3. Set priors to follow the assumptions of the different measurement models and to identify the model.
4. Estimate the model parameters.
5. Evaluate the model fit and interpret the parameter estimates.
In our first example, we go through these steps in detail for the two-parameter mixture model (Zhang & Luck, 2008). To additionally showcase some of the powerful features of this implementation in brms, we will give some examples for different experimental designs. This section is long because we explain all concepts in detail, but the actual code is relatively short and simple. In fact, setting up most of the examples and running the brms models took us from as little as 30 minutes to less than a few hours, showcasing the flexibility and adaptability of this approach.
The data
For our first example, we used the data from Experiment 2 reported in Zhang & Luck (2008). This experiment presented a varying number (1, 2, 3, or 6) of spatially distributed colored objects on screen, and participants were asked to report the color of one randomly chosen object on the color wheel after a short retention interval. For modeling with brms, the data needs to be in long format, where each row represents a single observation, and each column specifies what conditions generated this observation (see Figure 3 for an illustration of the first few rows of the dataset used).
Figure 3. Structure of the Zhang & Luck (2008) dataset. subID = participant number; trial = trial number; setsize =
number of presented colors; RespErr = difference between response and target color location in radians; Pos_Lure1
to Pos_Lure5 = location of non-target colors relative to the target color location (in radians).
For this first example, we will estimate the two-parameter mixture model, allowing the precision of memory and the probability that an item is in memory to vary as a function of set size. To specify the mixture family required for the two-parameter mixture model, we use the mixture function provided by brms.5 For the two-parameter mixture model, we need a mixture of two distributions, one for guessing and one for sampling from the memory representation. To ensure that both distributions cover the same range of responses, it is easiest to specify a mixture of two von Mises distributions. This is possible because a von Mises distribution with a precision of zero is equal to a uniform distribution over the circular space. Thus, the mixture family can be defined as a mixture of two von Mises distributions, as sketched below.
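A minimal sketch of such a definition, assuming the brms von_mises family and the mixture() constructor (the object name mix_vonMises is ours):

library(brms)

# a mixture of two von Mises distributions; the first component will serve as the
# memory distribution and the second as the (uniform) guessing distribution
mix_vonMises <- mixture(von_mises, von_mises, order = "none")

The resulting family object is later passed to the brm function via its family argument.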
We can next specify the model formula to predict the parameters of the model. This serves several purposes: it specifies the variable names of the dependent and independent variables in our data set, and it specifies which parameters of the measurement model will be predicted by which variables in our data. For mixture families in brms there are two classes of parameters: a) distributional parameters of each distribution contained in the mixture (such as the mean, mu, and the
5 In principle, the mixture function in brms can be used to specify any finite mixture of the supported data distributions. This is a powerful feature that generalizes to any case in which the data does not follow a single distribution, and we hope that this tutorial enables researchers outside visual working memory research to consider mixture models in cases where they might provide additional theoretical insight. One important limitation of the implementation of mixture families to date is that the number of distributions cannot be determined or estimated by the data and thus needs to be equal for all conditions. This requires additional variables and more complex model code for a general-purpose implementation of the more complex measurement models for visual working memory tasks. This will be explained when these models are introduced.
precision, kappa, of the von Mises distributions), and b) mixing proportions (theta) for each distribution in the mixture.
It is important to note that brms does not directly estimate the probabilities that each response comes from each distribution (e.g., pmem and pguessing). Instead, brms estimates mixing proportions, which are weights applied to each of the mixture distributions and are transformed into probabilities (e.g., pmem and pguessing) using a softmax normalization. Thus, for a mixture of K distributions, the probability for the data to stem from a specific mixture distribution i is:
$$p_i = \frac{e^{\theta_i}}{\sum_{k=1}^{K} e^{\theta_k}} \qquad (2)$$
Therefore, the mixing weights can range from minus to plus infinity, with negative values resulting in a low probability and positive values resulting in a high probability of the data stemming from the respective mixture distribution. Because the distribution probabilities sum to 1, there would be infinitely many solutions for obtaining specific probabilities (e.g., for any value j, if $\theta_1 = \theta_2 = j$, then pmem = 0.5). Thus, by default, one of the mixing proportions is fixed to 0 in brms (usually the mixing proportion of the last mixture distribution, which we will use as the guessing distribution), and all other proportions are estimated freely. For example, if the mixing weight for responses coming from memory is estimated as 2, we can obtain the response probabilities as follows:
$$p_{\mathrm{mem}} = \frac{e^{\theta_1}}{e^{\theta_1} + e^{\theta_2}} = \frac{e^2}{e^2 + e^0} = \frac{7.389}{7.389 + 1} = 0.88$$

$$p_{\mathrm{guess}} = \frac{e^{\theta_2}}{e^{\theta_1} + e^{\theta_2}} = \frac{e^0}{e^2 + e^0} = \frac{1}{7.389 + 1} = 0.12$$
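Because theta2 is fixed to 0, this calculation reduces to the inverse logit, which can be verified directly in R (a quick illustration, not part of the original scripts):

plogis(2)      # p_mem, approximately 0.88
1 - plogis(2)  # p_guess, approximately 0.12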
For both the distributional parameters and the mixing proportions, the parameters are indexed with an integer that specifies the distribution they are associated with. So, in our case we have the distributional parameters of two von Mises distributions, that is, mu1 and mu2 for the means, and kappa1 and kappa2 for the precisions. Additionally, we have two mixing weights, theta1 and theta2. Distribution 1 is the von Mises distribution centered on the target response, while Distribution 2 is the uniform circular distribution. Technically, there are thus six parameters in this model. Practically, four of these parameters will be fixed to a constant or determined by some variable in our data (see the next section "Setting priors to identify the model" for more details): mu1 and mu2 are fixed to 0, because the means of the error and the uniform distributions are 0; kappa2 will be fixed to 0 because the second von Mises distribution should be uniform; and theta2 will be fixed to 0 for the reasons explained in the previous paragraph. Thus, only two parameters remain to be estimated: kappa1 (memory precision) and theta1 (the mixing proportion for responses coming from memory).
For illustration purposes, we specified the model formula for all parameters except theta2 of the two-parameter measurement model using the brmsformula (or short bf) function, as sketched below:
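The following is a sketch consistent with the description in the next paragraph, assuming the variable names RespErr, setsize, and subID from the dataset shown in Figure 3 (the object name is ours):

ZL_mixFormula <- bf(RespErr ~ 1,
                    kappa1 ~ 0 + setsize + (0 + setsize || subID),
                    kappa2 ~ 1,
                    theta1 ~ 0 + setsize + (0 + setsize || subID))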
For all linear model formulas in brms, the left side of an equation refers to the to-be-predicted variable or parameter, and the right side specifies the variables used to predict it. In the first line of the brmsformula, the left side specifies the dependent variable (RespErr) used to fit the model. In this formula, the precision of the first von Mises distribution (kappa1) is set to vary over setsize, and additionally this setsize effect can vary over subjects. Likewise, the mixing proportion of the first von Mises distribution (theta1) varies over the setsize variable, and this effect can also vary over subjects. Specifically, using the 0 + setsize coding we directly estimate the parameter values for each set size. Instead, we could have used a coding with an intercept, specifying 1 + setsize. Then, brms would have estimated an intercept and effects following the contrast coding of the setsize variable (e.g., treatment or effects contrasts as specified by contr.treatment or contr.sum in R). The random slopes for the setsize effects are set by specifying which of the fixed effects can vary over which grouping variable using the || syntax. The double vertical bar additionally specifies that correlations between the different random effects should not be estimated. This setting speeds up model estimation and makes the interpretation of the fixed effects more straightforward. Kappa2 is not estimated, but it should be included in the formula so that we can specify a constant prior on it later.
Additionally, there are three points that need to be considered: 1) the von Mises distribution is defined on radians, so the dependent variable needs to be converted to radians if it was originally coded in degrees; 2) in the above specified model, the response variable refers to the response error, that is, the deviation of the response in one trial from the target feature of the to-be-reproduced item; and 3) all but one of the mixing proportions of any mixture family can be predicted. This is necessary to identify the softmax transformation described above.

Setting priors to identify the model
To sufficiently identify the mixture distributions and implement the assumptions of the respective measurement model, we must constrain some model parameters via priors. Ultimately, in the two-parameter mixture model we only estimate a) the precision of the target distribution, that is kappa1, and b) the probability of an item being stored in memory, which is defined by the mixing proportion theta1. All other model parameters thus need to be constrained via priors or fixed to constants.
First, we want the guessing distribution to be uniform over the whole circular space. This is achieved by fixing the precision of the second von Mises distribution to practically zero. For estimation purposes, brms uses a logarithmic link function for the precision parameters of the von Mises distributions. So, the precision kappa on the native scale is transformed onto the parameter space using a logarithmic function. As log(0) is not defined, we can only fix kappa for the second von Mises to a value very close to zero. Practically, any native kappa value below 10^-3 achieves a virtually uniform distribution6. Since the priors are set on the transformed parameters, this equates to any value smaller than log(10^-3) ≈ -6.9. For convenience we set it to -100, which is far smaller. Naturally, the mean or location of a uniform distribution on a circular space is not properly defined, thus we also must fix this value (mu2), conventionally to 0. Likewise, we need to fix the location of our target distribution (mu1), the first von Mises in our mixture, to the location of the cued target. Because for all examples we have specified the model formula using the response error as the dependent variable, the location of the memory distribution needs to be fixed to zero. Finally, the softmax transformation needs to be identified internally by fixing one mixing proportion as a reference. This is already implemented internally in brms for all mixture models. With this default, the freely estimated mixing proportion for any mixture of two distributions can be transformed into mixture probabilities using the inverse logit function. For more than two distributions, we need to compute the probabilities using the softmax normalization in Equation 2. However, this only becomes relevant for the models with more than two mixture distributions introduced later.
These constraints are best implemented into the model using the prior function of brms. Specifically, we set constant priors for both parameters of the second von Mises distribution and another
6 This is the value used for generating the guessing distribution in Figure 2.
constant prior for the mean or location of our first von Mises distribution. A sketch of the prior specification looks like this:
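A minimal sketch using brms constant priors, consistent with the parameter names introduced above (the object name is ours):

mix_priors <- prior(constant(-100), class = Intercept, dpar = kappa2) +  # makes the guessing distribution uniform
  prior(constant(0), class = Intercept, dpar = mu1) +                    # memory distribution centered on the target (error = 0)
  prior(constant(0), class = Intercept, dpar = mu2)                      # location of the guessing distribution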
The first argument of the prior command always denotes the prior distribution we want to use. The constant prior fixes the parameter to the exact specified value. The class argument specifies which class of parameters the prior should be applied to, and the dpar argument specifies which distributional parameter of the mixture the prior applies to.
Using this specification, we can now estimate the parameters of the model. This is done using the brm (Bayesian regression model) function of brms. We need to pass the defined mixture family and the specified priors, together with our mixture formula and the data for which the model should be estimated, as sketched below.
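A sketch of such a call, reusing the illustrative object names from the previous sketches (the data frame name dat_ZL is likewise illustrative):

fit_ZL <- brm(formula = ZL_mixFormula,  # the mixture formula specified above
              data = dat_ZL,            # data in long format (see Figure 3)
              family = mix_vonMises,    # mixture of two von Mises distributions
              prior = mix_priors)       # constant priors identifying the model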
Additional arguments can be passed to the brm function: for example, how many warmup and total iterations should be performed, or how many MCMC chains should be sampled. For an explanation of these additional settings, please see the brms documentation.
The function will first compile the STAN code of the model and then, after finishing compilation, it will run the specified number of MCMC chains. The default is four chains, each with 1000 warmup samples and 1000 samples after warmup. For faster estimation,
we recommend setting up parallel sampling prior to running the model. This can be done by setting the following option:

options(mc.cores = parallel::detectCores())
This option allows the simultaneous estimation of as many MCMC chains as your processor has capacity for. Typically, your system will be able to estimate at least 4 chains in parallel; more recent PCs and laptops allow up to 32 parallel chains. However, 4 chains are typically sufficient to ensure that different starting values converge to the same model results. To avoid re-estimating the model each time you close an R session, consider saving the fit object. For an example of how to set up your R code so that the model is only estimated when there is no saved file for the model object, see the github repository that provides R code and results for all examples.
Once parameter estimation is completed, we need to evaluate the model fit and results from the parameter estimation. For this, we can choose from the wide range of functions
Figure 4. Example of a posterior predictive plot obtained via the pp_check function provided by brms. The black line illustrates the distribution of the data. The blue lines illustrate ten independent predicted distributions from the model. The better the model-predicted distributions overlap with the data, the better the model captures the data. The posterior predictive plot shown here thus illustrates a good fit of the model to the data.
provided by brms and other R packages designed for analyzing posterior predictives from Bayesian models (e.g., bayesplot, tidybayes). Our recommendation is to at least inspect graphical model fit plots that overlay posterior predictions from the model on the observed data. In brms this can be done using the pp_check function (see Figure 4 for an illustration). If the model fits the data reasonably well, you can proceed to examine and interpret the estimated parameters.
The summary function prints the estimated parameters of the model (see Figure 5 for a screenshot of the results for this example). The Group-Level Effects section of the model summary (top section in Figure 5) provides information on the random effects, in our example the variation of effects over subjects. Unless you are interested in individual differences, this section is not of primary interest. The main takeaway for our first example is that there is credible variation across all set sizes for both the
Figure 5. Screenshot of the summary output for the estimated parameters from the two-parameter mixture model of
the Zhang & Luck (2008, Exp. 2) data estimated via brms.
precision of memory representations (kappa1) and the probability of recalling an item from memory (theta1). The Population-Level Effects section (bottom section in Figure 5) summarizes the differences between parameters varying over the variables specified in our model formula, in this case set size. Keep in mind that the reported values are estimates on the parameter space (i.e., log-transformed kappa, and mixing weights instead of probabilities), so you need to transform them first. For more information about the output of brms models, please consult the brms documentation.
There are several ways of extracting information from a fitted brms object. The functions fixef and ranef, for example, provide summaries of the estimated fixed and random effects7. These can be used to transform both fixed and random effects from the parameter space to the native scale. In this case, we can use an exponential transformation for the kappa estimates (i.e., exp(kappa1)) and an inverse logit (Equation 2) to transform the theta estimates into probabilities. Additionally, we can convert kappa into the standard deviation of the von Mises distribution using the approximation $sd = \sqrt{1/\kappa}$, which is adequate for large κ values, or with the more accurate k2sd() function from the bmm package. Keep in mind that this standard deviation is scaled in radians, because our response variable was provided in radians. Nevertheless, standard deviation estimates in radians can also be transformed into degrees using $sd_{deg} = \frac{sd_{rad}}{\pi} \cdot 180$. All these transformed estimates can then, for example, be used to plot fixed effects over conditions or groups.
7 Please note that random effects are centered, meaning parameter estimates reflect deviations from the respective mean, not the absolute estimate for each subject. Therefore, you need to add the fixed effect for the respective condition to the random effects and then transform them to the absolute scale. Otherwise, you will obtain incorrect estimates for the different subjects.
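To illustrate these transformations, here is a sketch (fit_ZL is the illustrative fit object from above; the exact coefficient names depend on the factor coding of setsize):

est <- fixef(fit_ZL)                                                  # fixed effects on the transformed parameter space
kappa_hat <- exp(est[grep("^kappa1_", rownames(est)), "Estimate"])    # precision on the native scale
pmem_hat  <- plogis(est[grep("^theta1_", rownames(est)), "Estimate"]) # probability of a memory response
sd_deg <- sqrt(1 / kappa_hat) / pi * 180                              # approximate SD of the von Mises in degrees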
In Figure 6 we show the brms parameter estimates for the Zhang and Luck (2008) data. Estimating the parameters of the two-parameter mixture model in brms yielded a good model fit (see Figure 4) and arrived at practically equivalent results to those reported by Zhang & Luck (2008). This demonstrates that our implementation works as intended and that its parameter estimates converge with other estimation procedures. We will not go into detail on all the possibilities brms offers for evaluating model results. The R code in the online supplement illustrates some common steps in evaluating and plotting model results, and additional online material in several blogs and in the brms documentation provides ample introduction to the post-processing of brms models.
Figure 6. Replication of results from Zhang & Luck (2008) using the introduced hierarchical modeling
framework for the two-parameter mixture model in brms. Panel A (on the left) shows the results for the
probability of having an item in memory. Panel B (on the right) shows the results for the precision of memory
representations. The posterior mean (point) and 95% credibility interval (line range) of the parameter
estimates are shown in black. The average of the subject wise estimates reported by Zhang & Luck (2008) are
shown by the black diamonds. The grey distributions illustrate the whole posterior distribution of estimated
parameters from the brms model implementation.
How to do all steps with bmm
As we noted in the section "brms vs bmm", the entire procedure described above can be done in fewer steps with the bmm package we wrote. It uses brms for model estimation under the hood, but it automatically generates the mixture family and the prior constraints, and it allows us to specify the model formula directly for the parameters of interest. A sketch of the corresponding code follows.
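This sketch follows the fit_model interface described below, assuming the same variable names as in Example 1 (data frame and object names are illustrative):

library(brms)
library(bmm)

ff <- bf(RespErr ~ 1,
         kappa ~ 0 + setsize + (0 + setsize || subID),
         thetat ~ 0 + setsize + (0 + setsize || subID))

fit_ZL_bmm <- fit_model(formula = ff,
                        data = dat_ZL,
                        model_type = '2p')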
You should keep several things in mind. First, in the bmm specification, the parameters you provide to the formula are not kappa1, kappa2 and theta1, but rather kappa and thetat (the mixing proportion for target responses). These conventions make it simple to extend the formula to the three-parameter model by including a third parameter, thetant, which is the mixing proportion for non-target responses. Second, the function fit_model can be used to estimate any of the three models we discussed by specifying the argument model_type ('2p' for the two-parameter model, '3p' for the three-parameter model, or 'IMMabc', 'IMMbsc' and 'IMMfull' for several versions of the interference measurement model). You can pass any arguments to this function that you can pass to brms as well. For more information, please consult the documentation of the bmm package. Once fit, you can evaluate the model in the same way as described in the sections above.
Example 2: Estimating varying parameter values for a factorial design with within- and between-subject factors
In our second example, we used data reported in Loaiza & Souza (2018). In this experiment, a group of younger (N = 25) and older adults (N = 24) were instructed to memorize five colored disks distributed around an imaginary circle in the center of the screen. The retention interval was manipulated to be either short or long, and a variable number of cues (0, 1, 2) could indicate the to-be-tested item prior to recall (for details, please refer to the original publication). The cues presented during the retention interval are thought to bring the to-be-tested item back into the focus of attention, thereby improving its accessibility and potentially the precision of its memory representation (Souza & Oberauer, 2016). We chose this example to illustrate how to adapt the implementation of the two-parameter mixture model to a more complex design
with multiple within-subject and between-subject factors.
Except for the specification of the model formula, all steps are the same as in Example 1. We specify the same mixture family of two von Mises distributions, and we use the same priors to constrain our model parameters. We only need to adapt our model formula to incorporate the respective independent variables as predictors of the mixture model parameters. For this dataset, we have two within-subject factors (retention interval and cue condition) and one between-subject factor (age group). Again, we set up the model formula to directly estimate the parameters for each combination of these factors by suppressing the estimation of an intercept using the 0 + coding8:
8 Normally, specifying only the interaction of different factors with a colon estimates the interaction without the main effects. However, suppressing the intercept using the 0 coding combined with the interaction coded with the colon directly estimates the parameter means for all combinations of the involved factors.
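A sketch of such a formula, consistent with the description in the next paragraph (the variable names RI, CueCond, AgeGroup, and subID are assumptions about the dataset):

ff_LS <- bf(dev_rad ~ 1,
            kappa1 ~ 0 + RI:CueCond:AgeGroup +
              (0 + RI:CueCond || gr(subID, by = AgeGroup)),
            kappa2 ~ 1,
            theta1 ~ 0 + RI:CueCond:AgeGroup +
              (0 + RI:CueCond || gr(subID, by = AgeGroup)))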
The response error in this data set is captured in the "dev_rad" variable, which codes the deviation of the response from the target in radians. Therefore, we use this variable in the first line to specify the main dependent variable. Next, we predict both the precision of memory responses (kappa1) and the probability of recalling an item from memory (theta1) by all three independent variables. This is done by specifying the full three-way interaction of all three independent variables in the fixed-effects part of the model formula. Combined with suppressing the intercept, this directly estimates both the precision of memory representations and the probability of recalling an item from memory for all combinations of the factor levels of all independent variables. In the random-effects part, we specified subject-specific estimates for the two within-subject factors. To avoid assuming that variability is equal between younger and older adults, we grouped the id variable by age group to allow for different degrees of variability across age groups (for details on this specific setting, please see the brms documentation).
Using the adapted model formula, we estimated the parameters of the two-parameter mixture model with the same priors as in the first example. The posteriors of the parameter estimates across the experimental conditions for both age groups are shown in Figure 7. Again, we added the parameter estimates from the original publication to verify that the brms implementation converges with the original results. As in the original publication, there were clear benefits for the probability of recalling an item from memory with at least one retro-cue compared to no cue, for both younger and older adults. Additionally, older adults generally had a lower probability of recalling an item from memory than younger adults (see panel A of Figure 7), as well as a lower precision of memory representations (see panel B of Figure 7).
Figure 7. Reproduction of the results by Loaiza & Souza (2018) using the brms implementation to estimate
parameters from the two-parameter mixture model. The probability of recalling an item from memory (Pmem) is
shown in panel A (left side), and the imprecision of memory representation (SD of the von Mises) is shown in panel
B (right side). The distributions shaded in black to light gray illustrate the whole posterior distribution of the
respective estimates for the different number of cues. The dot indicates the posterior mean, and the line the 95%
highest density interval of posterior estimates. The diamond indicates the average estimate from the original
publication.
Figure 8. Example of data structure for the three-parameter model. One trial per set size is shown.
Example 3: Estimating the three-parameter mixture model
In this example, we show how to estimate Bays et al.'s (2009) three-parameter mixture model using bmm. We apply the model to the data reported by Bays et al. (2009), which came from a simple set size experiment akin to the one reported by Zhang and Luck (2008). Participants memorized a variable number of colored items and reported the color of one probed item on a color wheel after a brief delay.
First, we need to make sure that the data is in the correct format. The model expects that the outcome variable is the response error relative to the target, and that the positions of the non-targets are also coded relative to the target. For example, if the target was a color with value 0.7, and the non-targets were 0.5, 0.9, and 1.1, they need to be re-coded by subtracting the target value, i.e., as -0.2, 0.2 and 0.4. Each non-target value should be stored in a separate column. If different set sizes are presented, there should be as many non-target columns as the maximum set size minus 1, and for smaller set sizes, values in the extraneous columns should be coded as NA (see Figure 8 for an illustration).
Then, just like in the section "How to do all steps with bmm", we specify the model formula:
ff <- bf(RespErr ~ 1,
         kappa ~ 0 + setsize + (0 + setsize || subID),
         thetat ~ 0 + setsize + (0 + setsize || subID),
         thetant ~ 0 + setsize + (0 + setsize || subID))
where kappa is the precision of the von Mises distributions, thetat is the mixing proportion for target responses, thetant is the mixing proportion for non-target responses, setsize is a factor variable, and subID is the subject number. In this model, following Bays et al. (2009), we allow all parameters of the model to vary as a function of set size, and we also estimate random effects of set size over subjects for each parameter. Then, we can estimate this model using the fit_model function from the bmm package:
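A minimal sketch of such a call, following the same pattern as the interference measurement model example later in this tutorial (the data frame name and the non-target column names are assumptions):

fit_Bays2009 <- fit_model(formula = ff,
                          data = df_Bays2009,
                          model_type = '3p',
                          non_targets = paste0('Pos_Lure', 1:5),
                          setsize = 'setsize')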
In addition to the formula and the data, we need to specify the argument model_type ('3p' rather than '2p'; in fact, you can fit both separately and compare them) and the names of the columns containing the relative non-target values (i.e., non_targets). If the experiment contains a variable set size manipulation, you should provide the name of the column that contains the set size variable to the argument setsize; if the experiment always shows the same set size, this argument is not needed.
Figure 9 shows the estimates of the brms model and the estimates originally reported by Bays et al. (2009). Despite some differences, all original parameter estimates lie within the 95% highest density interval of the hierarchical model estimates, showing that the two estimation techniques converge. The largest discrepancy occurs in the estimate of memory imprecision for large set sizes: the original estimates are higher than those estimated by brms. We will not examine this issue in detail because it is beyond the scope of this tutorial; however, one possible reason is that the original estimates used the two-step procedure we described earlier, where parameters are obtained separately for each individual. Given the small number of
Figure 9. Reproduction of the results by Bays et al. (2009) using the brms implementation to estimate parameters
from the three-parameter mixture model. A: imprecision of memory representation (SD of the von Mises), B:
Probability of non-target responses, C: Probability of random responses (guessing). The distributions shaded in gray
illustrate the whole posterior distribution of the respective estimates. The dot indicates the posterior median, and the
line the 95% highest density interval of posterior estimates. The diamond indicates the average maximum likelihood
estimate from the original publication.
subjects (8), even one subject with unusually high imprecision would be enough to skew the estimate upwards. Hierarchical estimation uses data from all subjects to inform the parameters of each individual subject, which often results in so-called shrinkage and is usually associated with better parameter recovery (for example, see Appendix A).
The formula syntax and the bmm function are very flexible: almost any experimental design can be specified in the model formula, including categorical and continuous predictors, multifactor designs, within- and between-subject designs, etc. Furthermore, you can pass any additional sampling or options arguments to fit_model that you would pass to brms, which provides maximum flexibility. For example, even though fit_model generates reasonable priors, you can supplement or replace those by passing other priors to the prior argument. For more information, please consult the documentation of the bmm package.
Example 4: Using informative priors to improve parameter estimation
For this example, we will demonstrate how you can put prior constraints on parameters.
To illustrate when this can be helpful, we will use an unpublished dataset from our lab, in which people performed a continuous color report task with set sizes varying from 1 to 8. We fit the two-parameter model as described in Example 1. Figures 10A and 10B show the estimates for memory imprecision and the probability of guessing as a function of set size. While the general pattern is as expected, that is, worse performance with increasing set size, something unusual stands out for set sizes 7 and 8. The estimated guessing probability is higher for set size 7 than for set size 8, yet so is the estimated memory precision. Based on the existing literature, we know this pattern is highly unlikely. The problem is that when guessing probability is relatively high (pg > .40), the mixture model can fail to recover parameters and can mistake high imprecision for guessing, leading to a trade-off in parameter estimates (Grange & Moore, 2022).
Following theoretical considerations and the existing literature (Oberauer & Lin, 2017; van den Berg et al., 2012), we expect memory imprecision and guessing probability to increase monotonically as a function of set size. Fortunately, the Bayesian framework allows us to set prior constraints that force the estimates to increase monotonically. This will likely provide enough information to the model so that it does not trade off imprecision against guessing. We chose this example because it occurred in our real work and represents a relatively complicated case, which showcases the flexibility of the Bayesian estimation framework.
Figure 10. Parameter estimates of the unconstrained (A and B) and the monotonically constrained (C and D) two-
parameter model in Example 4. A: imprecision of memory representation (SD of the von Mises), B: Probability of
random responses (guessing). The distributions shaded in gray illustrate the whole posterior distribution of the
respective estimates. The dot indicates the posterior median, and the line the 95% highest density interval of
posterior estimates.
To enforce a monotonic increase of the parameters over set size, we need to do two things: first, change the contrasts in the model, and second, specify priors over the parameters that reflect our theoretical assumptions. By default, all regression models in R, including brms, use dummy coding for factors. This means that the model estimates an intercept, which corresponds to the parameter value of the first factor level (in this case, set size 1), and regression coefficients for every other level of the factor, which correspond to the differences between each factor level and the intercept. This default coding can be extracted via the contrasts function and looks like this:
  2 3 4 5 6 7 8
1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 0 1 0 0 0 0 0
4 0 0 1 0 0 0 0
5 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0
7 0 0 0 0 0 1 0
8 0 0 0 0 0 0 1
The row labels reflect the 8 levels of our factor (set size), and the columns reflect the desired contrasts, in this case the default dummy coding. We want to set up a different type of contrast in which each regression coefficient reflects the difference between the current factor level and the previous factor level. This would allow us to tell the model that this difference should always be positive or negative. Because the regression coefficients then represent the differences between every pair of neighboring factor levels, this would enforce a monotonic increase over set size. This is not a tutorial on basic linear modeling, so without going into too much detail, the contrast matrix we need looks like this:
  2 3 4 5 6 7 8
1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 1 1 0 0 0 0 0
4 1 1 1 0 0 0 0
5 1 1 1 1 0 0 0
6 1 1 1 1 1 0 0
7 1 1 1 1 1 1 0
8 1 1 1 1 1 1 1
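One way to construct and assign such a contrast matrix in R (a sketch, assuming the data frame is called dat and its set-size factor is called setsize):

n_levels <- 8
# each column codes the difference between a set size and the previous one
mono_contrasts <- matrix(0, nrow = n_levels, ncol = n_levels - 1)
mono_contrasts[lower.tri(mono_contrasts)] <- 1  # ones below the diagonal reproduce the matrix above
contrasts(dat$setsize) <- mono_contrasts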
With this contrast matrix in place, the model will estimate the difference between each pair of consecutive factor levels, instead of differences relative to
the intercept. We can set priors over these estimates to force them all to be non-negative by specifying a lower bound of zero:
ff <- bf(anglediff ~ 1,
         kappa ~ setsize,
         thetat ~ setsize)

pr <- prior_('normal(0.0, 0.8)', class = 'b', nlpar = 'kappa', lb = 0) +
  prior_('logistic(0, 1)', class = 'b', nlpar = 'thetat', lb = 0)
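The subsequent estimation call could then look as follows (a sketch; the data frame name is illustrative since the dataset is unpublished, and the priors are passed via the prior argument mentioned above):

fit_mono <- fit_model(formula = ff,
                      data = dat_colorwheel,
                      model_type = '2p',
                      prior = pr)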
The new estimates are shown in Figures 10C and 10D. In comparison with panels A and B, we can see that both the probability of guessing and the imprecision now increase monotonically over set size, as we would expect based on prior theoretical and empirical work. Specifically, the model no longer estimates a dip in imprecision at set size 7 together with more guesses relative to set size 8, and it produces a less inflated imprecision estimate for set size 8. A surprising finding is that imposing the monotonicity constraints drastically reduced the uncertainty in the parameter estimates, particularly for the probability of guessing (compare the highest density intervals in panels B and D of Figure 10).
This example illustrates a case in which providing a somewhat informative prior to the model helps reduce the uncertainty around model estimates. Generally, constraints imposed via informative priors should reflect theoretically reasonable assumptions. From our perspective, the assumption that memory precision and the probability of recall from memory decline with larger set sizes aligns with previous findings and theoretical assumptions (Oberauer & Lin, 2023; van den Berg et al., 2012; Zhang & Luck, 2008). Critically, the assumption that these parameters change monotonically still allows for the possibility that the parameters no longer change between set sizes once set size exceeds a certain threshold (Pratte, 2020). Ultimately, the goal of this example was to illustrate the possibilities researchers have for using informative priors to improve model estimation and to test theoretical assumptions (Haaf & Rouder, 2018).
Example 5: Comparing parameter estimates from the 3-parameter mixture model with the interference measurement model
For our last example, we will demonstrate how to estimate the parameters of the interference measurement model and illustrate what it provides in terms of theoretical interpretation beyond the 3-parameter mixture model. For this, we have re-analyzed data from Experiment 1 reported in Oberauer et al. (2017). This experiment collected data from 20 young adults who performed a continuous color reproduction task: a variable number of color patches (from one up to eight) appeared on the screen, and participants then had to report the color of one of the patches on a color wheel. For each set size, 100 trials were collected per participant.
First, we estimated the parameters of the three-parameter mixture model from these data. For this, we used the fit_model function implemented in the bmm package and specified that we want to fit the 3-parameter mixture model and vary all three parameters over set size:
ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         thetat ~ 0 + SetSize + (0 + SetSize || ID),
         thetant ~ 0 + SetSize + (0 + SetSize || ID))
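A sketch of the corresponding call, mirroring the IMMabc call shown later in this example with model_type set to '3p' (the fit object name is ours):

fit_3p_OL2017 <- fit_model(formula = ff,
                           data = df_OberauerLin2017_E1,
                           model_type = '3p',
                           non_targets = paste0('Item', 2:8, '_Col_rad'),
                           setsize = 'SetSize')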
As in Example 3, we first specified the dependent variable in the formula. In this case, devRad is the deviation of the response from the target in radians. Then we estimated the precision of memory responses (kappa) and the proportions of both target and non-target responses (thetat and thetant) for all set sizes. We also included random effects over subjects for all estimated parameters at all set sizes.
The results for the parameters of the three-parameter mixture model are displayed in Figure 11. Consistent with the results from the previous examples, we see that the probability of recalling the target from memory (Pmem) decreases from small to large set sizes (see Figure 11A). Conversely, the probability of committing a swap error (Pswap), that is, recalling one of the non-target items, increases from small to large set sizes (see Figure 11B). Finally, the precision of memory responses (kappa) decreases with set size (see Figure 11D). These results converge with those reported in the original publication.
Neither the two-parameter nor the three-parameter mixture model provides a theoretical foundation for the processes underlying the mixture of different response distributions. The interference measurement model (IMM; Oberauer et al., 2017) attempts to fill this gap by assuming that the probability of recalling an item from one of the different mixture distributions is determined by different sources of activation. Specifically, the IMM distinguishes background noise (n) from general activation (a) for all items presented in the current trial, and
16 context activation (c) for memory items that are associated with the context cued at retrieval. In
the full IMM, it is additionally assumed that context activation generalizes to non-targets following a generalization gradient (s) that reflects the precision of cue-target associations on the context dimension (for a more detailed description, please see Oberauer et al., 2017). However, because estimating the s parameter requires additional information about the spatial distance between the target and the non-targets, as well as more data to achieve sufficient precision, we will focus on the reduced IMM that includes only the general activation, context activation, and background noise parameters (IMMabc).
Both the two-parameter and the three-parameter mixture models are special cases of the full IMM (Oberauer et al., 2017). The IMMabc is mathematically equivalent to the three-parameter mixture model and provides a re-parameterization of the recall probabilities into different sources of activation. Additionally discarding the general activation a from the IMM would result in a model equivalent to the two-parameter mixture model. The main benefit of the IMM over the two- and three-parameter mixture models is that its parameters are grounded in an explanatory model of visual working memory (Oberauer & Lin, 2017) that provides a clear specification of how these activation sources contribute to observed behavior.
Fitting the IMM to data using the fit_model function implemented in the bmm package is similar to fitting the three-parameter or the two-parameter mixture model. First, we need to specify the model formula determining how the model parameters vary across experimental conditions:
ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         c ~ 0 + SetSize + (0 + SetSize || ID),
         a ~ 0 + SetSize + (0 + SetSize || ID))
First, we again specified the dependent variable, devRad (i.e., the deviation of the response from the target in radians). Then we specified that the precision of memory responses (kappa), as well as the context activation c and the general activation a, should be estimated for all set sizes. We again included random effects for all estimated parameters for all set sizes.
Then, we can submit this formula to the fit_model function, now choosing the model type "IMMabc" to estimate the IMMabc. As for the three-parameter mixture model, we additionally have to specify the variables that contain the position of non-targets relative to the target in the data, as well as the variable coding set size:

fit_model(formula     = ff,
          data        = df_OberauerLin2017_E1,
          model_type  = 'IMMabc',
          non_targets = paste0('Item', 2:8, '_Col_rad'),
          setsize     = "SetSize")
Figure 12. Parameter estimates for the IMMabc for Experiment 1 reported by Oberauer et al. (2017). We report the posterior estimates (means, 95% highest density intervals, and the full posteriors) of the context activation (Panel A) and the general activation (Panel B) prior to their normalization through the softmax function. Therefore, it is possible to obtain negative activation values; for interpretation, these must be evaluated relative to the parameter fixed for scaling. In this case, we fixed the background noise (n) to zero (illustrated by the dotted red line in Panel B). Panel C shows the estimates for the precision of memory representations. All parameters were allowed to vary between set sizes.
The resulting parameter estimates of the IMMabc are displayed in Figure 12. Consistent with previous results, the context activation (see Figure 12A), that is, the strength of the association between the color and the spatial location, decreases with larger set sizes (see estimates of the full IMM in Oberauer et al., 2017). The general activation, however, remains constant across set size, indicating that colors within each trial are held active at a similar activation level independent of set size. At first sight, the negative estimates for the general activation are surprising; however, these estimates reflect the general activation a on the logarithmic scale prior to the normalization by the softmax function. Thus, it is best to interpret the estimated activations relative to the activation component fixed for scaling reasons. In this case, the background noise was fixed for scaling. The negative estimates for the general activation therefore indicate that these activations were lower than the background noise (dotted red line in
Figure 12B), whereas the context activation was higher than the background noise. Finally, as in the results from the three-parameter mixture model, the precision of memory representations decreased with increasing set size (see Figure 12C).
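To illustrate this interpretation with purely hypothetical numbers (not the estimates shown in Figure 12), activations on the logarithmic scale can be compared against the background noise fixed at zero:

# Hypothetical activation values on the logarithmic scale; the background
# noise is fixed at 0 for scaling (i.e., 1 on the native scale).
c_act <- 1.5     # context activation
a_act <- -0.4    # general activation (negative estimate)
b_act <- 0       # background noise (fixed)

exp(a_act)       # 0.67 < 1: on the native scale, general activation is below the background noise
exp(c_act)       # 4.48 > 1: context activation exceeds the background noise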
When comparing the results from the IMMabc to those of the three-parameter mixture model, there are several things to note: 1) the reduction in context activation qualitatively resembles the reduction in the Pmem parameter from the three-parameter mixture model; 2) likewise, the estimates for the precision of memory representations are practically identical for the IMMabc and the three-parameter mixture model; however, 3) the pattern of general activation indicates no change (if anything, a reduction from small to large set sizes), whereas Pswap increases with larger set sizes. This indicates that the increase in swap errors is mainly due to the larger number of items a target can be confused with, rather than to each item having a higher activation. In addition, the reduction in context activation also reduced the difference in activation between the target and the non-target items.
All in all, this example has illustrated that estimating the IMMabc is straightforward using the implementation in the bmm package. Currently, this implementation by default fixes the background noise to zero (on the logarithmic scale, reflecting a background noise of 1 on the native scale) and freely estimates all other IMM parameters. The bmm package also implements the two other versions of the IMM proposed by Oberauer et al. (2017)9. The IMMbsc assumes that swap errors occur only as a function of generalization on the context dimension and thus does not contain the general activation component but estimates the generalization gradient (s) instead. The IMMfull combines confusions as a function of similarity
9 Please see the “scripts/bmm_examples” folder on the GitHub repository for examples implementing both the IMMbsc and the IMMfull for Experiment 1 reported by Oberauer et al. (2017).
on the context dimension, and confusions independent of similarity, by estimating both the generalization gradient (s) and the general activation (a). To estimate both of these models, users must additionally provide the names of the variables coding the spatial distance between the target and all non-targets, which is necessary for estimating the generalization gradient (s). An adapted model formula for the IMMfull would look as follows:
ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         c ~ 0 + SetSize + (0 + SetSize || ID),
         a ~ 0 + SetSize + (0 + SetSize || ID),
         s ~ 0 + SetSize + (0 + SetSize || ID))
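As a sketch, fitting the full IMM with this formula could then look as follows; the argument name for the spatial-distance variables (spaPos) and the position variable names (Item2_Pos_rad, ..., Item8_Pos_rad) are assumptions, not verbatim from the tutorial code.

# Sketch of fitting the full IMM; the spatial-distance argument name (spaPos)
# and the position variable names are assumptions.
fit_immfull <- fit_model(formula     = ff,
                         data        = df_OberauerLin2017_E1,
                         model_type  = 'IMMfull',
                         non_targets = paste0('Item', 2:8, '_Col_rad'),
                         spaPos      = paste0('Item', 2:8, '_Pos_rad'),
                         setsize     = "SetSize")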
1 General discussion
Measurement models of visual working memory have become a popular and useful tool for decomposing behavior in continuous reproduction tasks into meaningful model parameters (Oberauer et al., 2017). This tutorial described how researchers can use an established package for Bayesian hierarchical estimation, brms, to estimate the two-parameter mixture model (Zhang & Luck, 2008), the three-parameter mixture model (Bays et al., 2009), and the interference measurement model (Oberauer et al., 2017). We also introduced a new package, bmm, which makes it even easier to specify these models in brms. Additionally, we provide a GitHub repository with well-documented code for each of the five examples, which can be a useful learning tool. The five examples we presented demonstrate the flexibility of these implementations. Any model that can be specified with the brms formula syntax can be estimated – including single- and multi-factorial designs, within- and between-subject designs, continuous and categorical predictors, and various random effect structures. In contrast to the typical two-step maximum likelihood procedure, researchers can predict different model parameters as a function of different conditions, rather than allowing all parameters to vary freely across all conditions.
When estimating mixture models with brms and bmm, users should keep several points in mind:
• All responses and item values should be coded in radians, not degrees.
• The response variable should contain the response error, i.e., the response relative to the target.
• For the three-parameter mixture model and the IMM variants, you need to provide the values of the non-target items relative to the target item. For the IMMbsc and the IMMfull, you additionally need to provide the spatial distance of each non-target to the target.
• For efficient estimation, brms transforms the kappa and probability parameters (as described when setting up the model formula). The output of the estimated model (via the summary() function) presents the estimates on these transformed scales; they can be converted back to the native scales by exponentiating kappa values and putting thetat and thetant through the softmax transformation (see the sketch after this list).
• Before interpreting the estimated parameters, it is important to examine the overall fit of the model (e.g., by visual inspection of the predicted vs. the observed data).
• For statistical inference, you could either use the highest density intervals of the posterior distributions of the parameters or formal model comparisons (e.g., Bayes factors).
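As a minimal sketch of the back-transformation (assuming the three-parameter parameterization with the guessing weight fixed at 0 on the softmax scale, and purely hypothetical estimates), the conversion to the native scale works as follows:

# Hypothetical estimates taken from a summary() output (not real results)
est_log_kappa <- 2.4    # kappa on the log scale
est_thetat    <- 1.0    # target mixture weight (softmax scale)
est_thetant   <- -0.5   # non-target mixture weight (softmax scale)

kappa  <- exp(est_log_kappa)                            # precision in native units
denom  <- exp(est_thetat) + exp(est_thetant) + exp(0)   # softmax normalization (guessing fixed at 0)
pmem   <- exp(est_thetat)  / denom                      # probability of recalling the target
pswap  <- exp(est_thetant) / denom                      # probability of non-target (swap) responses
pguess <- 1 - pmem - pswap                              # probability of guessing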
16 Theoretical considerations
17 Computational models of memory and cognition are often split into two classes –
18 measurement models and process or explanatory models. Both types of models decompose
19 behavior into separate meaningful parameters, but in contrast to process models, measurement
20 models typically do not provide mechanistic explanations for differences in experimental effects.
21 Instead, measurement models allow their parameters to vary across experimental conditions to
22 account for these differences. Nonetheless, measurement models provide considerable benefits
23 (Farrell & Lewandowsky, 2018; Frischkorn et al., 2022; Frischkorn & Schubert, 2018; Oberauer
et al., 2017). They decompose the observed behavior into meaningful parameters and thereby allow researchers to evaluate experimental effects on the level of these parameters instead of the raw behavioral responses, providing a more fine-grained perspective for inference.
The IMM goes one step further: it is derived from an explanatory model of visual working memory (Oberauer & Lin, 2017, 2023) and thus inherits some of the mechanistic processes implemented in this model. Therefore, its parameters can be interpreted in terms of theoretical processes or activation sources for which the contribution to observed behavior is clearly specified. The fact that both the two- and the three-parameter mixture models are mathematically equivalent to special cases of the interference measurement model additionally highlights that these models themselves do not provide evidence in favor of either slot (Adam et al., 2017; Ngiam et al., 2022; Pratte, 2020; Zhang & Luck, 2008) or resource accounts (S. Ma et al., 2022; W. J. Ma et al., 2014; van den Berg et al., 2014) of working memory. Therefore, fitting a two-parameter mixture model does not by itself license the assumption that visual working memory is limited by slots. To do so, additional assumptions would have to be added to these models (e.g., memory precision following a specific function across set sizes) to account for theoretical ideas put forth by slot or resource accounts of visual working memory. Additionally, the fact that the IMM assumes continuous activations underlying the retrieval of memory representations adds a third account to the explanation of capacity limits in visual working memory: the binding hypothesis, which assumes that capacity is limited by a person's ability to establish and maintain bindings between items and their contexts.
Although the different measurement models introduced in this paper are linked to different theoretical perspectives, the goal of this paper is not to discuss and compare these theoretical models of working memory. Therefore, we deliberately did not weigh in on the discussion of which of the different measurement models is superior and better supported by empirical data, and we refrained from an in-depth explanation and discussion of the theoretical accounts associated with them.
The goal of this tutorial was to introduce the implementations of three measurement models for continuous reproduction tasks in brms and to present the bmm package, which aims to ease the use and application of these measurement models. We hope that the implementations we introduced here will enable more researchers to compare these different models and contribute to the evaluation of the benefits and problems of each.
13 Generally, we think that analyzing data on the level of cognitive processes will provide more
14 refined insights into the effects of different experimental manipulations and advance our field
15 towards a more comprehensive explanation of working memory and its limited capacity.
Limitations
This tutorial is deliberately limited to mixture models that provide measurement models for continuous reproduction tasks. Obviously, research on visual working memory and working memory more generally uses a broad range of different tasks, procedures, and materials, such as change detection tasks or tasks with discrete stimulus material. Although it is reasonable to assume that the distinction between recall from memory, random guessing, and variable precision or strength of memory representations should be relevant across a broad range of tasks (Oberauer & Lewandowsky, 2019; Oberauer & Lin, 2023), modeling behavioral
responses in change detection tasks or tasks using discrete stimulus material requires entirely different implementations of these concepts (see, for example, Oberauer & Lewandowsky, 2019).10 Similarly, other recently proposed measurement models for continuous reproduction tasks, such as the target confusability competition model (TCC; Schurgin et al., 2020) and the signal discrimination model (SDM; Oberauer, 2021), do not use a single continuous distribution, such as the von Mises distribution, to model the recall error and instead introduce several additional steps and different distributional assumptions.11 Therefore, we felt that, to keep this tutorial accessible and limited in length, it is best to focus only on a set of closely related mixture models that share the von Mises response distribution.
Nonetheless, the setup of the bmm package provides the foundation for the implementation of a broad range of cognitive measurement models, while still retaining the accessible and easy-to-use features that we highlighted in this tutorial. On a more abstract level, the implementations of the measurement models we presented here illustrate that cognitive measurement models can be framed as extensions of generalized linear mixed models (see Figure 13 for an illustration). Previous research has already highlighted the benefits of more closely aligning the modeling of behavioral data with distributions that more adequately resemble core features of the observed data (Haines et al., 2020). For example, instead of
10 First, the underlying response distribution for these tasks is different: in change detection tasks, responses follow a binomial distribution, whereas with discrete stimulus material, responses follow a multinomial distribution over different responses or response categories. Second, these different response distributions necessitate a different translation of the assumed cognitive states a person might be in into the observed responses (Lin & Oberauer, 2022). Due to these complications, we restricted ourselves to mixture models for continuous reproduction tasks that all share the von Mises distribution as the response distribution and assume behavioral responses to represent a mixture of different von Mises distributions.
11 Specifically, they assume a continuous activation function (either a Laplace or a von Mises distribution) for the responses around the circle and model recall as a competitive selection among all 360 possible responses on the color wheel. Formally, this resembles a multinomial response distribution for which the probability of selecting the different response options is derived from the continuous activation function. The dependencies between the probabilities of recalling neighboring response options are introduced by the continuous activation function rather than by treating the response scale itself as continuous.
Figure 13. Illustration of the relationship between generalized linear models and cognitive measurement models. Generalized linear models provide a distributional description of the observed data. For example, accuracy data can stem from a binomial process with a certain number of trials and probability of success, or reaction time data can stem from a lognormal distribution with a certain mean and standard deviation. Cognitive measurement models provide an additional decomposition of the distributional parameters, that is, probabilities of success, means, or standard deviations, into cognitive processes assumed to underlie the observed behavior. This often integrates several distributional parameters or additional information about the experimental setup or stimuli.
modeling aggregated reaction times with standard Gaussian models, using generalized linear mixed models that assume the data to follow a lognormal or inverse Gaussian distribution dramatically improves inference and strengthens the conclusions that can be drawn from the analyses (Boehm et al., 2018; Rouder & Haaf, 2019). The core insight of the implementations presented here is that cognitive measurement models can be expressed as distributional models for which the distributional parameters of the generalized linear mixed model are a function of the cognitive measurement model parameters (again, see Figure 13 for an illustration). These functions that translate the cognitive measurement model parameters into distributional parameters are what we essentially implemented for the three measurement models presented in this tutorial.
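As an illustrative sketch of such a translation function (not the bmm implementation itself), the two-parameter mixture model as parameterized in brms in the Appendix maps the cognitive parameters pmem and kappa onto the distributional parameters of a two-component von Mises mixture:

# Sketch: translate two-parameter mixture model parameters into the
# distributional parameters used by brms (see the Appendix for the model setup).
two_param_to_dpars <- function(pmem, kappa) {
  list(theta1 = qlogis(pmem),  # mixture weight of the memory component (logit scale)
       kappa1 = log(kappa),    # log precision of the memory component
       kappa2 = -100)          # ~uniform guessing component (fixed, log scale)
}

two_param_to_dpars(pmem = 0.8, kappa = 10)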
To spare researchers from having to code these translation functions from cognitive model parameters into distributional parameters themselves each time they want to use a specific model, we developed the bmm package. Additionally, the bmm package will also perform some
Figure 14. Flow chart for the functionality of the fit_model function implemented in the bmm package. The user
only needs to specify the inputs to the fit_model function, the bmm package then takes care of the rest.
additional checks on the data provided for the model, specify reasonable priors for the different model parameters, and ensure that model estimation runs as efficiently as possible. This way, data can be analyzed on the level of latent cognitive processes instead of observed behavior. For this, you only have to specify the linear model formula predicting which of the cognitive model parameters vary as a function of the experimental manipulations and select the corresponding model to be estimated (see Figure 14).
7 As of now, the bmm package only includes the three cognitive measurement models
8 presented in this paper. Prospectively, we plan to extend the bmm package with implementations
9 of measurement models for a broad range of tasks, such as signal-detection models (DeCarlo,
10 1998; Vuorre, 2017), models for verbal working memory tasks (Oberauer & Lewandowsky,
11 2019), more recent models for continuous reproduction tasks (Oberauer, 2021; Schurgin et al.,
12 2020), as well as models for reaction time data (Annis et al., 2017; Peña & Vandekerckhove,
2024), while still retaining the easy and accessible usability we presented here. However, given the additional coding work that would be required for these implementations, the time it would take to explore them, and the in-depth introduction needed to explain how to use these models, this was beyond the
16 scope of this tutorial. Instead, this tutorial constitutes the first step in the development of a more
general and broadly applicable package aiming to ease the use and application of cognitive measurement models.
Conclusion
We have demonstrated how to implement three different kinds of mixture models for visual working memory tasks in the R package brms. Additionally, we provide useful wrapper functions to estimate these models in a newly developed R package for Bayesian measurement models, bmm. To ease the comprehension of our examples, we share R code in a GitHub repository both for the implementation in brms without relying on bmm functions and for the implementation using the bmm functions. We hope that these implementations will enable more researchers to fit mixture models to visual working memory tasks with continuous reproduction recall. We are curious to see how the benefits of the current implementation, for example the possibility to predict parameters of the implemented models by continuous predictors, will aid researchers in gaining insight into questions that could not be addressed with other estimation approaches.
Appendix: Parameter recovery simulation12
21 First, we will generate synthetic continuous reproduction data. Each of N=20 participants
22 contributes Nobs=50 observations in two conditions A and B. We selected 50 observations,
23 because from our experience, with so few observations the MLE non-hierarchical method
24 often fails to recover the correct individual parameters, so it will serve as a good showcase.
25 We assume that participants vary in their memory precision (i.e. the 𝜅 parameter of the
26 von Mises distribution) and in the likelihood that their response comes from memory (𝜌;
27 here we specify it as 𝜃, which is the logit transformed 𝜌, because that is how brms
28 estimates it). Therefore, the observations for each participant in conditions A and B are
29 generated as follows, where j indexes participants, 𝛥𝜌 and 𝛥𝜅 are the differences in the
30 parameters between conditions B and A, 𝑣𝑀 is the von Mises distribution and 𝒰 is the
31 uniform distribution:
y_ij ∼ ρ_j · vM(0, κ_j) + (1 − ρ_j) · 𝒰(−π, π)
12 This appendix is adapted from an R Markdown notebook, which is available at: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM/blob/main/scripts/parameter_recovery_multi_level_simulation.Rmd
κ_j ∼ 𝒩(10, 2.5)
Δκ_j ∼ 𝒩(20, 5)
ρ_j = e^(θ_j) / (1 + e^(θ_j))
θ_j ∼ 𝒩(1, 0.5)
Δθ_j ∼ 𝒩(−1, 0.25)
7 In simple terms, participants on average have a precision of 𝜅 = 10 in Condition A, but they
8 vary around that estimate (SD=2.5). In condition B, participants’ precision is increased on
9 average by 𝛥𝜅 = 20 relative to Condition A, but this also varies (SD=5). Similarly for the
10 probability in memory (𝜌) parameter. The code below accomplishes this (parts are
11 commented out, because we load these values at the beginning of the script):
#### generate synthetic data

#### first, set population parameters
N = 20
Nobs = 50
kappa1a_mu = 10
kappa1a_sd = 2.5
kappa1delta_mu = 20
kappa1delta_sd = 5
theta1a_mu = 1
theta1a_sd = 0.5
theta1delta_mu = -1
theta1delta_sd = 0.25

#### generate participant parameters (for theta, logit units)
# kappa1a_i = rnorm(N, kappa1a_mu, kappa1a_sd)
# kappa1delta_i = rnorm(N, kappa1delta_mu, kappa1delta_sd)
# theta1a_i = rnorm(N, theta1a_mu, theta1a_sd)
# theta1delta_i = rnorm(N, theta1delta_mu, theta1delta_sd)

#### put parameters together
# true_pars <- data.frame(id = rep(1:N, 2), condition = rep(c('A','B'), each=N),
#                         kappa = c(kappa1a_i, kappa1a_i+kappa1delta_i),
#                         pmem = gtools::inv.logit(c(theta1a_i, theta1a_i+theta1delta_i)))
#### simulate data for each trial
# dat <- data.frame()
# for (i in 1:N) {
#   A_n = floor(Nobs*gtools::inv.logit(theta1a_i[i]))
#   B_n = floor(Nobs*gtools::inv.logit(theta1a_i[i]+theta1delta_i[i]))
#   datA <- data.frame(y=c(rvon_mises(A_n, 0, kappa1a_i[i]), runif(Nobs-A_n,-pi,pi)), condition = "A", id = i)
#   datB <- data.frame(y=c(rvon_mises(B_n, 0, kappa1a_i[i]+kappa1delta_i[i]), runif(Nobs-B_n,-pi,pi)), condition = "B", id = i)
#   DAT <- bind_rows(datA,datB)
#   dat <- bind_rows(dat,DAT)
# }
## (output truncated; last 15 rows of true_pars shown; columns: id, condition, kappa, pmem)
## 26  6 B 30.961803 0.5047793
## 27  7 B 35.266937 0.4891554
## 28  8 B 19.996771 0.3925293
## 29  9 B 37.457545 0.2399274
## 30 10 B 30.225545 0.4020229
## 31 11 B 33.576452 0.3754082
## 32 12 B 24.988404 0.4132181
## 33 13 B 41.206685 0.4409797
## 34 14 B 36.394617 0.5327623
## 35 15 B 27.157000 0.6347218
## 36 16 B 21.471546 0.3948013
## 37 17 B 30.043178 0.3331486
## 38 18 B 28.469415 0.8281906
## 39 19 B 25.249893 0.6915995
## 40 20 B 35.677790 0.5827495
true_pars %>%
  gather(par, value, kappa, pmem) %>%
  ggplot(aes(value)) +
  geom_histogram(bins=10) +
  facet_grid(condition ~ par, scales="free") +
  theme_bw()
2 And here is the distribution of errors, characterized by higher precision in condition B, but
3 with more guessing:
dat %>%
  ggplot(aes(y)) +
  geom_density(aes(color=condition)) +
  theme_bw() +
  xlab('Angle error')
2 And as you can see, there is quite some variability by participant:
ggplot(dat, aes(y, color=condition)) +
  geom_density() +
  facet_wrap(~id) +
  theme_bw()
Fitting the model with non-hierarchical MLE
As is standard in the literature, we will first fit the mixture model using MLE separately to each participant and condition. We accomplish this using a couple of custom functions, but the same can be achieved with some existing R packages.
# 2-p likelihood function
LL <- function(dat) {
  LL_resp <- function(par) {
    y = dat$y
    kappa = exp(par[1])                          # precision on the native scale
    pmem = gtools::inv.logit(par[2])             # probability of memory responses
    lik_vm <- brms::dvon_mises(y, mean(y), kappa)  # memory component
    lik_un <- brms::dvon_mises(y, mean(y), 0)      # kappa = 0 -> uniform guessing
    lik <- pmem*lik_vm + (1-pmem)*lik_un
    LL <- -sum(log(lik))                         # negative log-likelihood
  }
}
# function for fit and return parameter estimates
fit_mixture <- function(dat) {
  require(stats4)
  LL_resp <- LL(dat)
  fit <- optim(c(logkappa=2, theta=1), LL_resp)
  coef = as.data.frame(t(fit$par))[1,]
  coef$convergence <- fit$convergence
  coef$kappa = exp(coef$logkappa)
  coef$pmem = gtools::inv.logit(coef$theta)
  return(coef)
}

# estimate parameters separately for each participant and condition
mle_est <- dat %>%
  group_by(id, condition) %>%
  do({fit_mixture(.)}) %>%
  arrange(condition, id)
## Loading required package: stats4

First, we check that all fits have converged:
mean(mle_est$convergence == 0)
## [1] 1
And now, we can check how well the individual participant parameters have been recovered. We plot the estimated parameters (y-axis) vs. the true generating parameters (x-axis):
r_kappa <- round(cor.test(true_pars$kappa, mle_est$kappa)$est, 2)
r_pmem <- round(cor.test(true_pars$pmem, mle_est$pmem)$est, 2)

p1 <- left_join(true_pars, mle_est, by=c('id','condition')) %>%
  ggplot(aes(kappa.x, kappa.y, color=condition)) +
  geom_point() +
  ggtitle('Kappa') +
  xlab('True parameter') +
  ylab('MLE estimate') +
  theme_bw() +
  annotate('text', x=15, y=40, label=paste0('r(40) = ', r_kappa)) +
  geom_abline(intercept=0, slope=1) +
  theme(legend.position="")

p2 <- left_join(true_pars, mle_est, by=c('id','condition')) %>%
  ggplot(aes(pmem.x, pmem.y, color=condition)) +
  geom_point() +
  ggtitle('Pmem') +
  xlab('True parameter') +
  ylab('MLE estimate') +
  theme_bw() +
  annotate('text', x=0.4, y=0.8, label=paste0('r(40) = ', r_pmem)) +
  geom_abline(intercept=0, slope=1)

p1+p2
As we can see, the non-hierarchical MLE estimates of Pmem are pretty good; however, the model fails to accurately estimate the individual kappa parameters. With only 50 observations per participant and condition, that's normal in our experience. Notably, from these estimates, a researcher might erroneously conclude that our manipulation affects only pmem, but not kappa (which, interestingly, happens often in published research on VWM). Let's see if the hierarchical model can do better.
Fitting the hierarchical model with brms
The code is commented out because it takes several hours to fit. The results are preloaded at the beginning of the script, so they can be accessed by the relevant object names.
# # create mixture of von Mises distributions
# mix_vonMises <- mixture(von_mises, von_mises, order = "none")
#
# # set up mixture model. allow kappa and theta to vary by condition
# bf_mixture <- bf(y ~ 1,
#                  kappa1 ~ condition + (condition||id),
#                  kappa2 ~ 1,
#                  theta1 ~ condition + (condition||id))
#
# # check default priors
# get_prior(bf_mixture, dat, mix_vonMises)
#
# # constrain priors. Set mean of von Mises distributions to be 0; set kappa of the
# # second von Mises distribution to be very low, approximating a uniform distribution
# mix_priors <- prior(constant(0), class = Intercept, dpar = "mu1") +
#   prior(constant(0), class = Intercept, dpar = "mu2") +
#   prior(constant(-100), class = Intercept, dpar = "kappa2")
#
# brms_fit <- brm(bf_mixture, dat, mix_vonMises, mix_priors)
Let's examine the model. Convergence of all parameters looks good (Rhat < 1.01 and high Tail_ESS; no warnings).
brms_fit
##  Family: mixture(von_mises, von_mises)
##   Links: mu1 = tan_half; kappa1 = log; mu2 = tan_half; kappa2 = log; theta1 = identity; theta2 = identity
## Formula: y ~ 1
##          kappa1 ~ condition + (condition || id)
##          kappa2 ~ 1
##          theta1 ~ condition + (condition || id)
##    Data: dat4 (Number of observations: 2000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 20)
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(kappa1_Intercept)      0.22      0.10     0.03     0.42 1.00      975      728
## sd(kappa1_conditionB)     0.17      0.12     0.01     0.46 1.00     1590     2029
## sd(theta1_Intercept)      0.53      0.14     0.28     0.83 1.00      956     1853
## sd(theta1_conditionB)     0.26      0.17     0.01     0.64 1.00      722     1087
##
## Population-Level Effects:
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## mu1_Intercept         0.00      0.00     0.00     0.00 1.00     4000     4000
## kappa1_Intercept      2.44      0.09     2.27     2.63 1.00     3279     2446
## mu2_Intercept         0.00      0.00     0.00     0.00 1.00     4000     4000
## kappa2_Intercept   -100.00      0.00  -100.00  -100.00 1.00     4000     4000
## theta1_Intercept      0.92      0.15     0.63     1.22 1.00     1882     2779
## kappa1_conditionB     0.93      0.13     0.68     1.18 1.00     4278     3091
## theta1_conditionB    -1.10      0.14    -1.38    -0.82 1.00     4474     3181
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Now, the values above don't tell us much, because in brms, kappa is log transformed and theta is the logit/softmax transformation of pmem. Therefore, we first need to extract the random effects, combine them with the fixed effects to obtain the parameter estimates for each participant, and then transform the parameters onto the relevant scale:
ranefs <- ranef(brms_fit)$id
logkappa <- c(fixef(brms_fit)['kappa1_Intercept','Estimate'] + ranefs[,'Estimate','kappa1_Intercept'],
              fixef(brms_fit)['kappa1_Intercept','Estimate'] + ranefs[,'Estimate','kappa1_Intercept'] +
                fixef(brms_fit)['kappa1_conditionB','Estimate'] + ranefs[,'Estimate','kappa1_conditionB'])

theta <- c(fixef(brms_fit)['theta1_Intercept','Estimate'] + ranefs[,'Estimate','theta1_Intercept'],
           fixef(brms_fit)['theta1_Intercept','Estimate'] + ranefs[,'Estimate','theta1_Intercept'] +
             fixef(brms_fit)['theta1_conditionB','Estimate'] + ranefs[,'Estimate','theta1_conditionB'])

brms_est <- data.frame(id = rep(1:N, 2),
                       condition = rep(c('A','B'), each = N),
                       kappa = exp(logkappa),
                       pmem = gtools::inv.logit(theta))
As with the MLE model before, we can now check how well the individual participant parameters have been recovered. We plot the estimated parameters (y-axis) vs. the true generating parameters (x-axis):
r_kappa <- round(cor.test(true_pars$kappa, brms_est$kappa)$est, 2)
r_pmem <- round(cor.test(true_pars$pmem, brms_est$pmem)$est, 2)

p3 <- left_join(true_pars, brms_est, by=c('id','condition')) %>%
  ggplot(aes(kappa.x, kappa.y, color=condition)) +
  geom_point() +
  ggtitle('Kappa') +
  xlab('True parameter') +
  ylab('BRMS estimate') +
  theme_bw() +
  annotate('text', x=15, y=40, label=paste0('r(40) = ', r_kappa)) +
  geom_abline(intercept=0, slope=1) +
  theme(legend.position="")

p4 <- left_join(true_pars, brms_est, by=c('id','condition')) %>%
  ggplot(aes(pmem.x, pmem.y, color=condition)) +
  geom_point() +
  ggtitle('Pmem') +
  xlab('True parameter') +
  ylab('BRMS estimate') +
  theme_bw() +
  annotate('text', x=0.4, y=0.8, label=paste0('r(40) = ', r_pmem)) +
  geom_abline(intercept=0, slope=1)

p3+p4
And voilà - we can see that the brms model does a drastically better job at recovering the kappa parameter, relative to the non-hierarchical MLE version. This is because in the hierarchical model, the data from all participants partially informs the parameter estimates for each individual participant, which is particularly helpful when we only have a few observations per participant. When considering both conditions combined, the correlation between the true kappa and the estimated kappa increased from 0.41 for the MLE estimates to 0.93 for the brms estimates. Within each condition the correlations are lower (0.57 for condition A and 0.51 for condition B), but these are still several times higher than for the MLE estimates (0.26 for condition A and 0.17 for condition B). Thus, while only 50 observations per condition are still not enough for very reliable estimation of within-condition individual differences with brms, the estimates are drastically improved relative to the MLE implementation, which fails to recover any information related to the kappa parameter.
All code and data used in this tutorial are openly available on GitHub: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM. The bmm package can be installed from GitHub as well: https://github.com/venpopov/bmm. The output of the fitted models is also provided there.
1 References
2 Adam, K. C. S., Vogel, E. K., & Awh, E. (2017). Clear evidence for item limits in visual
4 Annis, J., Miller, B. J., & Palmeri, T. J. (2017). Bayesian inference with Stan: A tutorial
6 https://doi.org/10/gf5smk
7 Bays, P. M., Catalao, R. F. G., & Husain, M. (2009). The precision of visual working
9 https://doi.org/10.1167/9.10.7
10 Boehm, U., Marsman, M., Matzke, D., & Wagenmakers, E.-J. (2018). On the importance
13 Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan.
15 Bürkner, P.-C. (2018a). Advanced Bayesian multilevel modeling with the R package
17 Bürkner, P.-C. (2018b). brms: Bayesian Regression Models using “Stan” (2.5.0)
19 Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M.,
20 Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A Probabilistic Programming
3 Frischkorn, G. T., & Schubert, A.-L. (2018). Cognitive Models in Intelligence Research:
4 Advantages and Recommendations for Their Application. Journal of Intelligence, 6(3), 34.
5 https://doi.org/10/gd3vqn
8 https://doi.org/10.1016/j.intell.2022.101681
9 Grange, J. A., & Moore, S. B. (2022). mixtur: An R package for designing, analysing,
10 and modelling continuous report visual short-term memory studies. Behavior Research Methods,
12 Grange, J. A., Moore, S. B., & Berry, E. D. J. (2021). mixtur: Modelling Continuous
14 project.org/package=mixtur
15 Haaf, J. M., & Rouder, J. N. (2018). Some do and some don’t? Accounting for variability
17 Haines, N., Kvam, P. D., Irving, L. H., Smith, C., Beauchaine, T. P., Pitt, M. A., Ahn,
18 W.-Y., & Turner, B. (2020). Learning from the Reliability Paradox: How Theoretically Informed
19 Generative Models Can Advance the Social, Behavioral, and Brain Sciences.
20 https://doi.org/10/gg8662
22 models for delayed estimation tasks (Version 0.8.0) [Computer software] [R].
1 Lin, H.-Y., & Oberauer, K. (2022). An interference model for visual working memory:
3 https://doi.org/10.1016/j.cogpsych.2022.101463
4 Loaiza, V. M., & Souza, A. S. (2018). Is refreshing in working memory impaired in older
5 age? Evidence from the retro-cue paradigm. Annals of the New York Academy of Sciences,
7 Ma, S., Popov, V., & Zhang, Q. (2022). A Neural Index Reflecting the Amount of
10 Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory.
12 Ngiam, W. X. Q., Foster, J. J., Adam, K. C. S., & Awh, E. (2022). Distinguishing guesses
13 from fuzzy memories: Further evidence for item limits in visual working memory. Attention,
17 Oberauer, K., & Lewandowsky, S. (2019). Simple measurement models for complex
19 https://doi.org/10.1037/rev0000159
20 Oberauer, K., & Lin, H.-Y. (2017). An interference model of visual working memory.
22 Oberauer, K., & Lin, H.-Y. (2023). An interference model for visual and verbal working
1 Oberauer, K., Stoneking, C., Wabersich, D., & Lin, H.-Y. (2017). Hierarchical Bayesian
2 measurement models for continuous reproduction of visual features from working memory.
6 Popov, V., So, M., & Reder, L. M. (2021). Memory resources recover gradually over
7 time: The effects of word frequency, presentation rate, and list composition on binding errors and
10 Pratte, M. S. (2020). Set size effects on working memory precision are not due to an
12 https://doi.org/10/ghr327
15 Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2020). Psychophysical scaling reveals a
16 unified theory of visual memory strength. Nature Human Behaviour, 4(11), Article 11.
17 https://doi.org/10.1038/s41562-020-00938-0
18 Skrondal, A., & Laake, P. (2001). Regression among factor scores. Psychometrika, 66(4),
19 563–575. https://doi.org/10/fckhxw
20 Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working
21 memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78(7), 1839–
22 1860. https://doi.org/10/f83nvs
23 Suchow, J. W., Brady, T. F., Fougnie, D., & Alvarez, G. A. (2013). Modeling visual
2 https://doi.org/10.1167/13.10.9
3 van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working
5 van den Berg, R., Shin, H., Chou, W.-C., George, R., & Ma, W. J. (2012). Variability in
6 encoding precision accounts for visual short-term memory limitations. Proceedings of the
https://mvuorre.github.io/posts/2017-10-09-bayesian-estimation-of-signal-detection-theory-models/#ref-estes_problem_1956