

A tutorial for estimating mixture models for visual working memory tasks in brms: Introducing the Bayesian Measurement Modeling (bmm) package for R

Gidon T. Frischkorn & Vencislav Popov
Department of Psychology, University of Zurich

Word count: 11929
Figures: 14
Tables: 2


Author Note

Gidon Frischkorn and Vencislav Popov contributed equally to the manuscript and should be considered co-first authors. Correspondence concerning this manuscript should be addressed to [email protected] and [email protected].



Abstract

Mixture models for visual working memory tasks using continuous report recall are highly popular measurement models in visual working memory research. Yet, efficient and easy-to-implement estimation procedures that flexibly enable group or condition comparisons are scarce. Specifically, most software packages implementing mixture models have used maximum likelihood estimation for single-subject data. Such estimation procedures require large trial numbers per participant to obtain robust and reliable estimates. This problem can be solved with hierarchical Bayesian estimation procedures that provide robust and reliable estimates with lower trial numbers. In this tutorial, we illustrate how mixture models for visual working memory tasks can be specified and fit in the R package brms. The benefit of this implementation over existing hierarchical Bayesian implementations is that brms integrates hierarchical Bayesian estimation of the mixture models with an efficient linear model syntax that enables us to adapt the mixture model to practically any experimental design. Specifically, this implementation allows varying model parameters over arbitrary groups or experimental conditions. Additionally, the hierarchical structure and the specification of informed priors can improve subject-level parameter estimation and solve frequently encountered estimation problems. We illustrate these benefits in different examples and provide R code for easy adaptation to other use cases. We also introduce a new R package called bmm, which simplifies the process of estimating these models with brms.

Keywords: Tutorial, Mixture Model, Visual Working Memory, brms, Bayesian Modeling

A tutorial for estimating mixture models for visual working memory tasks in brms: Introducing the Bayesian Measurement Modeling (bmm) package for R

In research on visual working memory, participants are often asked to remember and reproduce continuous features of visual objects such as their color or orientation (Prinzmetal, Amiri, Allen & Edwards, 1998; Wilken & Ma, 2004). These continuous reproduction tasks produce rich data that are often analyzed with measurement mixture models. Using such models allows researchers to dissociate relevant aspects of behavioral performance, such as the precision of the memory representation vs. the probability of recalling the correct feature (Zhang & Luck, 2008; Bays et al., 2009; Oberauer et al., 2017; Brady et al., 2022; Oberauer, 2022). Although mixture models have been widely applied by many researchers in the field [1], a flexible, well-documented, and easily accessible way of efficiently estimating these models is lacking. This tutorial provides an implementation of three highly popular measurement models for visual working memory tasks using the R package brms (Bürkner, 2017, 2018a, 2018b).

In the continuous reproduction task (sometimes also called the delayed estimation task), participants encode a set of visual objects into visual working memory and are then asked to reproduce a specific feature of one cued object on a continuous scale at test (see Figure 1 for an illustration). Most often the features used in these tasks are colors sampled from a color wheel (Wilken & Ma, 2004) or continuous orientations of a bar or a triangle (Bays et al., 2011). The set of to-be-remembered objects typically consists of one to eight objects spatially distributed over the screen. Thus, participants must associate the to-be-remembered features (e.g., color or orientation) with the spatial locations they are presented at.

[1] For example, a Google Scholar query for "'mixture model' AND 'visual working memory'" returns 930 results; the article by Zhang and Luck (2008), which introduced mixture modeling in visual working memory, has been cited 1677 times; the MATLAB package MemToolbox, which implements various mixture models for visual working memory, has been cited 252 times (Suchow et al., 2013).

Figure 1. Illustration of a typical continuous reproduction task using colored squares as visual objects. Participants should remember which color was presented at which location and, after a short retention interval, they are asked to reproduce the color of the cued item by selecting it on the color wheel. The dependent variable is the response error, that is, the deviation of the selected response from the originally presented color (illustrated by the arc).

The precision of the representation of an object's feature in visual working memory is measured as the angular deviation from the true feature presented at encoding.

Benefits of using cognitive measurement models over behavioral performance measures

In these continuous reproduction tasks, the simplest measure of performance is the average deviation of the response from the true feature value. In many studies, this average recall error has been the main dependent variable for evaluating the effect of experimental manipulations. Yet, the average recall error confounds different properties of memory representations and does not sufficiently represent the theoretical processes assumed by current models of visual working memory. Therefore, different measurement models have been proposed to formalize distinct aspects of visual working memory models and how they translate into observed behavior (for an overview of currently proposed measurement models, see Oberauer et al., 2017; and Oberauer, 2022).



Measurement models provide a more refined representation of memory processes because they decompose the average recall error into several theoretically meaningful parameters. The three measurement models we will be addressing in this tutorial paper are a) the two-parameter mixture model (Zhang & Luck, 2008), b) the three-parameter mixture model (Bays et al., 2009), and c) the interference measurement model (Oberauer et al., 2017). The first two models are mathematically equivalent to constrained versions of the interference measurement model (Oberauer et al., 2017). At the core of these models is the assumption that responses in continuous reproduction tasks can stem from different distributions depending on the continuous activation of different memory representations or the cognitive state a person is in at recall (see Figure 2).

The two-parameter mixture model distinguishes two states: a) having a representation of the cued object with a certain precision of its feature in visual working memory (see the solid blue distribution in Figure 2), versus b) having no representation in visual working memory and thus guessing a random response (see the dashed red distribution in Figure 2). Responses based on a noisy memory representation of the correct feature come from a circular normal distribution (i.e., von Mises) centered on the correct feature value, while guessing responses come from a uniform distribution along the entire circle. The three-parameter mixture model adds a third state, namely confusing the cued object with another object shown during encoding and thus reporting the feature of the other object (see the long dashed green distribution in Figure 2). Responses from this state are sometimes called non-target responses or swap errors. Finally, the interference measurement model reformulates these states in terms of different continuous sources of activation (background noise, general activation, and context activation) and additionally accounts for the spatial proximity between items, predicting that confusion between spatially close items is more likely than between distant items.

Figure 2. In the described measurement models, the probability of reporting a certain feature value depends on the cognitive state a person is in at recall. If the person recalls the cued object, they will report values following the solid blue distribution. If the person recalls another object, they will report values following the long dashed green distribution centered on the other object's value. If the person cannot recall anything, they will guess a random value following the dashed red distribution. Depending on the relative proportions of these distributions, the observed responses will follow the black-dotted mixture of all distributions.

For all three models, the resulting observed recall distribution is a weighted mixture of the different distributions included in the model (see the dotted black line in Figure 2).

When applied to the data, the first two mixture models estimate several parameters: κ, the precision of the von Mises distribution of memory representations (which is the inverse of σ, the standard deviation of the circular normal distribution); $p_{mem}$, the probability that a response comes from memory of the correct feature; $p_{non\text{-}target}$, the probability that a response comes from memory for an incorrect feature associated with another object in memory; and $p_{guessing}$, the probability that a response is a random guess. In the two-parameter mixture model, $p_{non\text{-}target} = 0$ and $p_{mem} + p_{guessing} = 1$, while in the three-parameter mixture model, $p_{mem} + p_{non\text{-}target} + p_{guessing} = 1$.

Formally, according to the mixture models, the probability of responding with a feature $x$ is:

$$P(x) = p_{mem} \, vM(x; \mu_t, \kappa) + p_{non\text{-}target} \, \frac{\sum_{i=1}^{n-1} vM(x; \mu_i, \kappa)}{n - 1} + p_{guess} \, vM(x; 0, 0) \quad (1)$$

where $vM$ is the von Mises distribution, $\mu_t$ is the location of the target feature, the $\mu_i$ represent the locations of the non-target features, and $vM(x; 0, 0)$ is the von Mises distribution with zero precision, which is equivalent to a uniform circular distribution. Finally, $n$ specifies the number of features to be held in memory (i.e., the set size). The interference measurement model further decomposes the probabilities of selecting a target, a non-target, or random guessing into continuous sources of activation: context activation (c), item activation (a), and background noise (n). For more details on this decomposition, see Oberauer et al. (2017).
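To make Equation 1 concrete, the following R sketch (our own illustration, not code from the original materials; the function name dmix2p is hypothetical) computes the two-parameter special case of the mixture density, that is, with $p_{non\text{-}target} = 0$, using the von Mises density exported by brms:

library(brms)

# Sketch of the two-parameter mixture density: a von Mises component for
# memory responses plus a uniform component with density 1 / (2 * pi) for
# random guesses (equivalent to a von Mises with zero precision).
dmix2p <- function(x, p_mem, kappa, mu = 0) {
  p_mem * dvon_mises(x, mu = mu, kappa = kappa) +
    (1 - p_mem) * (1 / (2 * pi))
}

# Example: plot the predicted error distribution for p_mem = .8, kappa = 8
curve(dmix2p(x, p_mem = 0.8, kappa = 8), from = -pi, to = pi,
      xlab = "Response error (radians)", ylab = "Density")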

Benefits of hierarchical Bayesian parameter estimation over non-hierarchical frequentist parameter estimation

Until recently, most researchers have used custom-built code to implement the existing mixture models. Although there is software that implements some of these measurement models (e.g., Grange et al., 2021), this software mostly uses a two-step procedure. First, the parameters of the model are estimated separately for each subject in each condition using maximum likelihood methods. Then, in a second step, the parameter estimates are analyzed with traditional inference methods such as t-tests, ANOVA, or linear regression to determine which parameters vary significantly as a function of condition. These methods constrain parameter estimation in several ways and can lead to either over- or underestimation of standard errors in statistical tests (Boehm et al., 2018; Skrondal & Laake, 2001). Furthermore, to obtain robust parameter estimates, maximum likelihood estimation requires at least 200 trials per subject per condition (Grange & Moore, 2022).



To solve these issues, hierarchical Bayesian implementations of these models have also been proposed (Hardman, 2016/2017; Oberauer et al., 2017; Suchow et al., 2013). Hierarchical Bayesian parameter estimation provides several benefits over frequentist estimation. Critically, by estimating the data from all subjects and all conditions simultaneously, robust parameter estimates can be obtained with less data per subject and condition (see Appendix A for parameter recovery simulations, which show a case where non-hierarchical maximum likelihood estimation fails to recover the correct parameters, while the hierarchical Bayesian implementation fares much better). For a comprehensive overview comparing frequentist vs. Bayesian estimation of these measurement models, see Table 1.

Table 1. Comparison of non-hierarchical frequentist versus hierarchical Bayesian estimation of measurement models for visual working memory tasks.

| | Non-hierarchical frequentist estimation (e.g., using mixtur) | Hierarchical Bayesian estimation using brms |
| --- | --- | --- |
| Speed of model estimation | Very quick: only seconds per subject. | Slow: from a few minutes for simple models to several hours or even days for more complex models and larger data sets. |
| Required data | > 200 retrievals per participant in each condition (Grange & Moore, 2022). | > 50 retrievals per participant in each condition (see Appendix). |
| Inference | Stepwise approach: 1) estimate parameters for each subject in each condition, then 2) submit parameters of interest to a statistical test. Problem: this approach ignores the uncertainty in parameters in the second step (Boehm et al., 2018). | One-step inference: parameters for each subject in each condition are estimated in one model. Uncertainty in subject parameters is accounted for in effect estimates of condition differences (Boehm et al., 2018). |
| Implementation | A simple implementation exists for estimating the model for single-subject data in one condition; for more complex models, researchers must write their own likelihood function and adapt it for each experiment. | Can be implemented in a well-documented and flexible R package: brms. Researchers can get support from a broad community using brms, and the linear model syntax can specify models for almost any use case of the mixture model. |
| Evaluating model fit | Functions implemented in the R package mixtur allow evaluating model fit. For experiments using custom likelihood functions, researchers must write their own functions and scripts to evaluate model fit. | Existing tools and functions provided by brms offer straightforward methods for checking the convergence of parameter estimates, assessing model fit, and evaluating results. |
| Varying parameters over conditions | Standard implementations estimate model parameters for each subject in each condition. Thus, all parameters must vary over the same set of experimental conditions. | The linear model syntax implemented in brms specifies which parameter should vary over which condition, and different parameters can be set to vary over different conditions. |
| Model comparison | Separately for each subject, leading to problems with consistency over the whole sample (for an example, see Popov et al., 2021). | As the model is estimated for the whole sample simultaneously, model comparisons can be made over the whole sample (for an example, see Oberauer et al., 2017). |
| Possible predictors | Only discrete predictors or group comparisons. | Discrete and continuous predictors, as well as group comparisons. |
| Constraints on parameters | If supported by the optimization algorithm, lower and upper bounds on parameters can be specified. | The specification of informed priors allows imposing flexible and meaningful constraints on parameters. |

Why is there a need for an alternative hierarchical Bayesian estimation of mixture models if implementations already exist? The proposed implementations have specified the mixture model in JAGS (Oberauer et al., 2017), in MATLAB (Suchow et al., 2013), or in a no longer maintained R package (Hardman, 2016/2017). The JAGS implementation by Oberauer et al. (2017) uses code designed for the specific experimental designs and factors analyzed in that paper. Consequently, for other experiments with different factors and factor levels, the JAGS code would need to be adjusted to the specific conditions and groups of each new experiment.

This is a skill that only researchers with considerable experience in Bayesian modeling have. The MATLAB implementation by Suchow et al. (2013) is more flexible and well documented. However, MATLAB is an expensive proprietary language that not everyone has access to. Nowadays, R is the language of choice when teaching statistical analysis in psychology departments; thus, many more researchers are familiar with it compared to MATLAB. Finally, the CatContModel R package is not available on CRAN and is no longer actively maintained, and thus does not provide the stability and flexibility of an actively maintained and tested package such as brms.

Furthermore, neither the MATLAB implementation nor the CatContModel package allows estimating all three models we present implementations for here. In particular, they do not provide an implementation of the interference measurement model. And although they provide a hierarchical Bayesian implementation of the mixture models over all subjects, none of the implementations allows estimating the three-parameter mixture model or the interference model simultaneously over different set sizes. Specifically, when parameters of the model vary across conditions, these previous implementations need to fit a separate model to each condition, followed by a two-step inference procedure, thus significantly reducing the benefits of hierarchical estimation (see Table 2 for a full comparison).

To overcome these difficulties and to make the discussed measurement models more accessible, we illustrate how to implement these models in the R package brms, a general-purpose package for estimating Bayesian multilevel regression models (Bürkner, 2017, 2018a, 2018b). The major benefit of this implementation is that brms provides a powerful linear model syntax that allows us to flexibly specify which model parameters should vary depending on discrete or continuous predictors. Therefore, brms allows a general-purpose implementation of the measurement models for visual working memory tasks that can be adapted to practically any experimental design. Additionally, brms uses the probabilistic programming language Stan (Carpenter et al., 2017) to estimate parameters. Arguably, Stan provides the most cutting-edge estimation algorithms for Bayesian modeling and yields robust parameter estimates even when parameters are correlated and with small posterior sample sizes.
Table 2. Comparison of brms/bmm and MemToolbox.

| | MemToolbox | brms/bmm |
| --- | --- | --- |
| Estimation | Bayesian or maximum likelihood | Bayesian |
| Fitting multiple conditions | Separately to each condition | Jointly (linear model syntax) |
| Inference over multiple conditions | Stepwise approach: 1) estimate parameters for all subjects in each condition, then 2) submit parameters of interest to a statistical test. Problem: ignores uncertainty in parameters in the second step (Boehm et al., 2018). Hierarchical estimation applies only to different subjects, not to different conditions. | One-step inference: parameters for each subject in each condition are estimated in one model. Uncertainty in subject parameters is accounted for in effect estimates of condition differences (Boehm et al., 2018). |
| Allows continuous predictors | No | Yes |
| Can fix some parameters across conditions | No | Yes: any model parameter can be predicted by any combination of conditions |
| Behavioral tasks | Continuous report, change detection | Continuous report, custom (see General Discussion) |
| Included models "out-of-the-box" | 2-parameter, 3-parameter, variable precision, slots+averaging, slots+resources | 2-parameter, 3-parameter, interference measurement model, variable precision* |

Note: * Any model in bmm can implement variable precision by including a random effect over trials in the linear model syntax that captures trial-by-trial variability in parameters.
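As a sketch of the note above (our own illustration, not code from the paper; the column names trial, setsize, and subID are assumptions), trial-by-trial variability in precision could be expressed by adding a random intercept over a trial identifier to the predictor of kappa:

# Hedged sketch: the (1 | trial) term lets kappa vary from trial to trial,
# mimicking a variable-precision model.
vp_formula <- brms::bf(RespErr ~ 1,
                       kappa ~ 0 + setsize + (0 + setsize || subID) + (1 | trial),
                       thetat ~ 0 + setsize + (0 + setsize || subID))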

Finally, as a general-purpose package, brms has a large and active community that can assist in solving problems. In sum, implementing measurement models for visual working memory tasks in brms will enable more researchers to use these models in their work and provides state-of-the-art hierarchical Bayesian estimation for them.

brms vs bmm

All mixture models presented in this tutorial can be implemented in brms without requiring additional packages or functions [2]. Implementing the simple two-parameter mixture model (Zhang & Luck, 2008) this way is relatively easy. However, implementing the three-parameter mixture model (Bays et al., 2009) and the interference measurement model (Oberauer & Lin, 2017) is more complicated, because they require transforming parameters and combining multiple parameters into one [3]. Additionally, we implemented some additional functionality to allow the estimation of both the three-parameter mixture model and the interference measurement model over varying set sizes. To make these implementations more accessible and reduce the chance of errors, we wrote an R package called bmm (Bayesian Measurement Modeling) [4], which provides wrapper functions around brms that take care of these procedures and allow the user to specify the desired model more easily.

[2] R scripts for all following examples are available on GitHub: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM/tree/main/scripts/
[3] For example, you can compare the model formulas and priors for Example 3 written for brms vs. bmm in the files scripts/brms_examples/Example3_Bays2009.R and scripts/bmm_examples/Example3_Bays2009.R. The bmm package allows the user to specify only the important elements of the formula, and then generates the necessary formula and priors for brms itself. More generally, the relationship between bmm and brms is the same as that between brms and Stan: brms handles model specification, data recoding, and other labor-intensive procedures, and then generates the relevant Stan code. For measurement models, bmm takes care of similar tasks that cannot be achieved in brms, before submitting the model to brms for estimation.
[4] The bmm package and its installation instructions are available on GitHub: https://github.com/venpopov/bmm

In the examples that follow, we first demonstrate how the two-parameter mixture model can be implemented directly in brms. Although it is easier to use bmm even for these cases, we wanted to show the full process in brms so that readers can become familiar with how the models are implemented. Then, in Examples 3-5 we only show how to use bmm to estimate the three-parameter mixture model and the interference measurement model, rather than how to implement them directly in brms.

Example 1: Implementing the two-parameter mixture model in brms

In most common brms use cases, researchers specify how the dependent variable is distributed (e.g., a Gaussian distribution for continuous data or a binomial distribution for accuracy data), and then specify a linear model that predicts parameters of this data distribution (usually the mean or location parameters) dependent on categorical or continuous predictors. Additionally, the model syntax implements generalized hierarchical modeling, meaning you can specify random effects over subjects that account for individual differences in the overall fixed effect. Initially, brms was not specifically designed to implement mixture models for visual working memory, but one of its recent updates makes that possible: in addition to linear models for single data distributions, brms now also allows users to specify mixtures of data distributions. This is a critical feature that we will build on for estimating the above-described measurement models for visual working memory tasks. Specifically, there are five steps we need to take to estimate these mixture models:

1. Specify the mixture family to use for the measurement model.
2. Specify the model formula to set up the model and predict parameters of interest.
3. Set priors to follow the assumptions of the different measurement models and properly identify the different mixture components.
4. Estimate the model.
5. Evaluate model fit and results.

To illustrate this general procedure, we will start by implementing the two-parameter mixture model (Zhang & Luck, 2008). To additionally showcase some of the powerful features of this implementation in brms, we will give examples for different experimental designs. This section is long because we explain all concepts in detail, but the actual code is relatively short and simple. In fact, setting up most of the examples and running the brms models took us from as little as 30 minutes to a few hours, showcasing the flexibility and adaptability of the brms implementation of the mixture models.

The data

For our first example, we used the data from Experiment 2 reported in Zhang & Luck (2008). This experiment presented a varying number (1, 2, 3, or 6) of spatially distributed colored objects on screen, and participants were asked to report the color of one randomly chosen object on the color wheel after a short retention interval. For modeling with brms, the data needs to be in long format, where each row represents a single observation, and each column specifies what conditions generated this observation (see Figure 3 for an illustration of the first few rows of the dataset).

Figure 3. Structure of the Zhang & Luck (2008) dataset. subID = participant number; trial = trial number; setsize =
number of presented colors; RespErr = difference between response and target color location in radians; Pos_Lure1
to Pos_Lure5 = location of non-target colors relative to the target color location (in radians).

Specifying the mixture family

For this first example, we will estimate the two-parameter mixture model, allowing the precision of memory and the probability that an item is in memory to vary as a function of set size. To specify the mixture family required for the two-parameter mixture model, we use the mixture function provided by brms [5]. For the two-parameter mixture model, we need a mixture of two distributions, one for guessing and one for sampling from the memory representation. To ensure that both distributions cover the same range of responses, it is easiest to specify a mixture of two von Mises distributions. This is possible because a von Mises distribution with a precision of zero is equal to a uniform distribution over the circular space. Thus, the mixture family for the two-parameter mixture model can be specified as:

ZL_mixFamily <- mixture(von_mises, von_mises)

Explanation of model parameters and specifying the model formula

We can next specify the model formula to predict parameters of the model. This serves three purposes: 1) it specifies the respective measurement model we want to estimate, 2) it specifies the variable names of the dependent and independent variables in our data set, and 3) it specifies which parameters of the measurement model will be predicted by which variables in our data. For mixture families in brms there are two classes of parameters: a) distributional parameters of each distribution contained in the mixture (such as the mean, mu, and the precision, kappa, of the von Mises distributions), and b) mixing proportions (theta) for each distribution of the mixture.

[5] In principle, the mixture function in brms can be used to specify any finite mixture of the supported data distributions. This is a powerful feature that generalizes to any case in which the data do not follow a single distribution, and we hope that this tutorial enables researchers outside visual working memory research to consider mixture models in cases where they might provide additional theoretical insight. One important limitation of the implementation of mixture families to date is that the number of distributions cannot be determined or estimated by the data and thus needs to be equal for all conditions. This requires additional variables and more complex model code for a general-purpose implementation of the more complex measurement models for visual working memory tasks. This will be explained when these models are introduced.

It is important to note that brms does not directly estimate the probabilities that each response comes from each distribution (e.g., $p_{mem}$ and $p_{guessing}$). Instead, brms estimates mixing proportions: weights applied to each of the mixture distributions that are transformed into probabilities (e.g., $p_{mem}$ and $p_{guessing}$) using a softmax normalization. Thus, for a mixture of $K$ distributions, the probability that the data stem from a specific mixture distribution $i$ is:

$$p_i = \frac{e^{\theta_i}}{\sum_{k=1}^{K} e^{\theta_k}} \quad (2)$$

Therefore, the mixing weights can range from minus to plus infinity, with negative values resulting in a low probability and positive values resulting in a high probability of the data stemming from the respective mixture distribution. Because the distribution probabilities sum to 1, there would be infinitely many solutions for obtaining specific probabilities (e.g., for any value $j$, if $\theta_1 = \theta_2 = j$, then $p_{mem} = 0.5$). Thus, by default one of the mixing proportions is fixed to 0 in brms (usually the mixing proportion of the last mixture distribution, which we will use as the guessing distribution), and all other proportions are estimated freely. For example, if the mixing weight for responses coming from memory is estimated as 2, we can obtain the response probabilities as follows:

$$p_{mem} = \frac{e^{\theta_1}}{e^{\theta_1} + e^{\theta_2}} = \frac{e^{2}}{e^{2} + e^{0}} = \frac{7.389}{7.389 + 1} = 0.88$$

$$p_{guess} = \frac{e^{\theta_2}}{e^{\theta_1} + e^{\theta_2}} = \frac{e^{0}}{e^{2} + e^{0}} = \frac{1}{7.389 + 1} = 0.12$$
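These transformations are easy to reproduce in R (a small sketch of ours, not part of the original scripts):

# Convert mixing weights (thetas) into probabilities via the softmax.
softmax <- function(theta) exp(theta) / sum(exp(theta))
softmax(c(2, 0))   # returns 0.88 and 0.12, as computed above

# With theta2 fixed to 0, the softmax reduces to the inverse logit:
plogis(2)          # also 0.88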

For both the distributional parameters and the mixing proportions, the parameters are indexed with an integer that specifies the distribution they are associated with. So, in our case, we have the distributional parameters of two von Mises distributions, that is, mu1 and mu2 for the means, and kappa1 and kappa2 for the precisions. Additionally, we have two mixing weights, theta1 and theta2. Distribution 1 is the von Mises distribution centered on the target response, while Distribution 2 is the uniform circular distribution. Technically, there are thus six parameters in this model. Practically, four of these parameters will be fixed to a constant or determined by some variable in our data (see the next section "Setting priors to identify the model" for more details): mu1 and mu2 are fixed to 0, because the means of the error and uniform distributions are 0; kappa2 will be fixed to 0, because the second von Mises distribution should be uniform; and theta2 will be fixed to 0 for the reasons explained in the previous paragraph. Thus, only two parameters remain to be estimated: kappa1 (memory precision) and theta1 (the mixing proportion for responses coming from memory).

For illustration purposes, we specified the model formula for all parameters except theta2 of the two-parameter measurement model using the brmsformula (or short: bf) function below:

ZL_mixFormula <- bf(RespErr ~ 1,
                    kappa1 ~ 0 + setsize + (0 + setsize || subID),
                    kappa2 ~ 1,
                    theta1 ~ 0 + setsize + (0 + setsize || subID))

For all linear model formulas in brms, the left side of an equation refers to the to-be-predicted variable or parameter, and the right side specifies the variables used to predict it. The first line of the brmsformula specifies the dependent variable (RespErr) used to fit the model. In this formula, the precision of the first von Mises distribution (kappa1) is set to vary over setsize, and this setsize effect can additionally vary over subjects. Likewise, the mixing proportion of the first von Mises distribution (theta1) varies over the setsize variable, and this effect can also vary over subjects. Specifically, using the 0 + setsize coding we directly estimate the parameter values for each set size. Instead, we could have used an effect coding by specifying 1 + setsize.

Then, brms would have estimated an intercept and effects following the contrast coding of the setsize variable (e.g., treatment or sum contrasts as specified by contr.treatment or contr.sum in R). The random slopes for the setsize effects are set by specifying which of the fixed effects can vary over which grouping variable using the || syntax. The double vertical bar additionally specifies that correlations between the different random effects should not be estimated. This setting speeds up model estimation and makes the interpretation of the fixed effects more straightforward. kappa2 is not estimated, but it should be included in the formula so that we can specify a constant prior on it later.

Additionally, three points need to be considered: 1) the von Mises distribution implemented in brms is scaled in radians, so the response variable needs to be converted to radians if it was originally coded in degrees; 2) in the above-specified model, the response variable refers to the response error, that is, the deviation of the response in one trial from the target response of the to-be-reproduced item; and 3) all but one of the mixing proportions of any mixture family can be predicted. This is necessary to identify the softmax transformation. Therefore, theta2 is not included in the model formula.
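Regarding point 1, the conversion could look like the following sketch (our own illustration; the column name RespErr_deg is hypothetical):

# Convert a response error coded in degrees to radians and wrap it back
# onto the circle, i.e., into the interval (-pi, pi].
myData$RespErr <- myData$RespErr_deg * pi / 180
myData$RespErr <- atan2(sin(myData$RespErr), cos(myData$RespErr))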

Setting priors to identify the model

To sufficiently identify the mixture distributions and implement the assumptions of the respective measurement model, we must constrain some model parameters via priors. Ultimately, in the two-parameter mixture model we only estimate a) the precision of the target distribution, kappa1, and b) the probability of an item being stored in memory, defined by the mixing proportion theta1. All other model parameters thus need to be constrained via priors or through the data.



First, we want the guessing distribution to be uniform over the whole circular space. This is achieved by fixing the precision of the second von Mises distribution to practically zero. For estimation purposes, brms uses a logarithmic link function for the precision parameters of the von Mises distributions, so the precision kappa on the native scale is transformed onto the parameter space using a logarithmic function. As log(0) is not defined, we can only fix kappa for the second von Mises to a value very close to zero. Practically, any native kappa value below $10^{-3}$ achieves a virtually uniform distribution [6]. Since the priors are set on the transformed parameters, this equates to any value smaller than $\log(10^{-3})$. For convenience we set it to -100, which is a lot smaller. Naturally, the mean or location of a uniform distribution on a circular space is not properly defined, so we must also fix this value (mu2), conventionally to 0. Likewise, we need to fix the location of our target distribution (mu1), the first von Mises in our mixture, to the location of the cued target. Because we have specified the model formula for all examples using the response error as the dependent variable, the location of the memory distribution needs to be fixed to zero. Finally, the softmax transformation needs to be identified internally by fixing one mixing proportion as a reference, which brms already does by default for all mixture models. With this default, the freely estimated mixing proportion of any mixture of two distributions can be transformed into mixture probabilities using the inverse logit function. For more than two distributions, we need to compute the probabilities using the softmax normalization (Equation 2), which is implemented in several R packages or can be computed manually.

[6] This is the value used for generating the guessing distribution in Figure 2.

These constraints are best implemented into the model using the prior function of brms. Specifically, we set constant priors for both parameters of the second von Mises distribution and another constant prior for the mean (location) of our first von Mises. The prior specification looks like this:

ZL_mixPriors <- prior(constant(-100), class = Intercept, dpar = "kappa2") +
  prior(constant(0), class = Intercept, dpar = "mu2") +
  prior(constant(0), class = Intercept, dpar = "mu1")

The first argument of the prior command always denotes the prior distribution we want to use. The constant prior fixes the parameter to the exact specified value. The class argument specifies which class of parameters the prior should be applied to, and the dpar argument specifies the distributional parameter the prior should be applied to.

Estimating the model

Using this specification, we can now estimate the parameters of the model. This is done using the Bayesian regression model (brm) function of brms. We need to pass the defined mixture family and the specified priors together with our mixture formula and the data for which the parameters should be estimated:

fit_ZL_mixModel <- brm(
  formula = ZL_mixFormula,
  family = ZL_mixFamily,
  prior = ZL_mixPriors,
  data = myData)

Additionally, further arguments can be passed to the brm function: for example, how many warmup and total iterations should be performed, or how many MCMC chains should be sampled. For an explanation of these additional settings, please see the brms user guide.

The function will first compile the code of the Stan model and then, after compilation has finished, it will run the specified number of MCMC chains. The default is four chains, each with 1000 warmup samples and 1000 samples after warmup. For faster estimation, we recommend enabling parallel sampling by running the following command before starting model estimation:

options(mc.cores = parallel::detectCores())

This option allows the simultaneous estimation of as many MCMC chains as your processor has cores available. Typically, your system will be able to estimate at least 4 chains in parallel, and more recent PCs and laptops can run up to 32 parallel chains. However, 4 chains are typically sufficient to ensure that different starting values converge to the same model results. To avoid re-estimating the model every time you close an R session, consider saving the fit object. For an example of how to set up your R code so that the model is only estimated when no saved file for the model object exists, see the GitHub repository that provides R code and results for all examples.
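One convenient way to do this (our suggestion, not discussed further in this tutorial) is the file argument of brm, which saves the fitted model to disk and reloads it on subsequent calls instead of re-estimating it:

# If "fit_ZL_mixModel.rds" already exists, brm() loads it; otherwise the
# model is estimated and then saved under that name.
fit_ZL_mixModel <- brm(
  formula = ZL_mixFormula,
  family = ZL_mixFamily,
  prior = ZL_mixPriors,
  data = myData,
  file = "fit_ZL_mixModel")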

Evaluating model fit & results

Once parameter estimation is completed, we need to evaluate the model fit and the results of the parameter estimation. For this, we can choose from the wide range of functions provided by brms and other R packages designed for analyzing posterior predictives from Bayesian models (e.g., bayesplot, tidybayes, etc.).

Figure 4. Example of a posterior predictive plot obtained via the pp_check function provided by brms. The black line illustrates the distribution of the data. The blue lines illustrate ten independent predicted distributions from the model. The better the model-predicted distributions overlay the data, the better the model captures the data. The posterior predictive plot shown here thus illustrates a good fit of the model to the data.

Our recommendation is to at least have a look at graphical model fit plots that overlay posterior predictions from the model on the observed data. In brms this can be done using the pp_check function (see Figure 4 for an illustration). If the model fits the data reasonably well, you can proceed to examine and interpret the estimated parameters.
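For instance, a call such as the following sketch produces the kind of plot shown in Figure 4 (the ndraws value is our assumption, matching the ten predicted distributions in the figure):

# Overlay the observed response-error distribution with ten
# posterior predictive distributions drawn from the fitted model.
pp_check(fit_ZL_mixModel, ndraws = 10)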

The general-purpose summary function provides an overview of the estimated parameters of the model (see Figure 5 for a screenshot of the results for this example). The Group-Level Effects section of the model summary (top section in Figure 5) provides information on the random effects, in our example the variation of effects over subjects. Unless you are interested in individual differences, this section is not of primary interest. The main takeaway for our first example is that there is credible variation across all set sizes for both the precision of memory representations (kappa1) and the probability of recalling an item from memory (theta1).
Figure 5. Screenshot of the summary output for the estimated parameters from the two-parameter mixture model of
the Zhang & Luck (2008, Exp. 2) data estimated via brms.

The Population-Level Effects section (bottom section in Figure 5) summarizes the differences between parameters varying over the variables we specified in our model formula, in this case set size. Keep in mind that the reported values are estimates on the parameter space (i.e., log-transformed kappa, and mixing proportions instead of probabilities), so you need to transform them first. For more information about the output of summary(), please consult the brms manual.

There are several ways of extracting information from a fitted brms object. The functions fixef and ranef, for example, provide summaries of the estimated fixed and random effects [7]. These can be used to transform both fixed and random effects from the parameter space to the native scale. In this case, we can use an exponential transformation for the kappa estimates (i.e., $\exp(kappa1)$) and an inverse logit (Equation 2) to transform the theta estimates into probabilities. Additionally, we can convert kappa into the standard deviation of the von Mises distribution using the approximation $sd = \sqrt{1/\kappa}$, which is adequate for large κ values, or with the more accurate k2sd() function from the bmm package. Keep in mind that this standard deviation is scaled in radians, because our response variable was provided in radians. Nevertheless, the standard deviation estimates in radians can also be transformed into standard deviations in degrees using $sd_{deg} = \frac{sd_{rad}}{\pi} \cdot 180$. All these transformed estimates can then be used, for example, to plot fixed effects over conditions or to evaluate the consistency of effects over subjects.

[7] Please note that random effects are centered, meaning parameter estimates reflect deviations from the respective mean, not the absolute estimate for each subject. Therefore, you need to add the fixed effect for the respective condition to the random effects and then transform them to the absolute scale. Otherwise, you will obtain incorrect estimates for the different subjects.
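A compact sketch of these transformations (our own illustration; the coefficient labels are hypothetical, so check rownames(fixef(...)) for the exact names in your fit):

# Extract fixed effects on the parameter space (log kappa, logit theta).
fe <- fixef(fit_ZL_mixModel)

kappa_ss1 <- exp(fe["kappa1_setsize1", "Estimate"])    # precision, native scale
pmem_ss1  <- plogis(fe["theta1_setsize1", "Estimate"]) # inverse logit -> p_mem

# Approximate circular SD in radians and degrees (large-kappa approximation).
sd_rad <- sqrt(1 / kappa_ss1)
sd_deg <- sd_rad / pi * 180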

In Figure 6 we show the brms parameter estimates for the Zhang and Luck (2008) data. Estimating the parameters of the two-parameter mixture model in brms yielded a good model fit (see Figure 4) and arrived at practically equivalent results to those reported by Zhang & Luck (2008). This demonstrates that our implementation works as intended and converges in its parameter estimates with other estimation procedures. We will not go into detail on all the possibilities offered by brms to evaluate model results. The R code in the online supplement illustrates some common steps in evaluating and plotting model results. Additional online material in several blogs and in the brms documentation provides ample introduction to the post-processing of results from brms models.

Figure 6. Replication of results from Zhang & Luck (2008) using the introduced hierarchical modeling framework for the two-parameter mixture model in brms. Panel A (left) shows the results for the probability of having an item in memory. Panel B (right) shows the results for the precision of memory representations. The posterior mean (point) and 95% credible interval (line range) of the parameter estimates are shown in black. The averages of the subject-wise estimates reported by Zhang & Luck (2008) are shown as black diamonds. The grey distributions illustrate the whole posterior distribution of the estimated parameters from the brms model implementation.

How to do all steps with bmm

As we noted in the section brms vs bmm, the entire procedure described above can be done in fewer steps with the bmm package we wrote. It uses brms for model estimation under the hood, but it automatically generates the mixture family and the prior constraints, and it allows us to specify the model formula directly for the parameters of interest. The following code is sufficient to specify and estimate the two-parameter model:

ZL_mixFormula_bmm <- bf(RespErr ~ 1,
                        kappa ~ 0 + setsize + (0 + setsize || subID),
                        thetat ~ 0 + setsize + (0 + setsize || subID))

fit_ZL2008_bmm <- bmm::fit_model(formula = ZL_mixFormula_bmm,
                                 data = data_ZL2008,
                                 model_type = '2p',
                                 warmup = 1000, iter = 2000, parallel = TRUE)

You should keep several things in mind. First, in the bmm specification, the parameters you provide to the formula are not kappa1, kappa2, and theta1, but rather kappa and thetat (the mixing proportion for the target responses). These conventions make it simple to extend this to the three-parameter model by including a third parameter, thetant, which is the mixing proportion for the non-target responses. Second, the function fit_model can be used to estimate any of the three models we discussed, by specifying the argument model_type ('2p' for the two-parameter model, '3p' for the three-parameter model, or 'IMMabc', 'IMMbsc' and 'IMMfull' for several versions of the interference measurement model). You can pass any arguments to this function that you can pass to brms as well. For more information, please consult the documentation of the bmm package. Once fit, you can evaluate the model in the same way as described in the section Evaluating model fit & results.



Example 2: Estimating varying parameter values for a factorial design with within- and between-subject factors

In our second example, we used data reported in Loaiza & Souza (2018). In this experiment, a group of younger (N = 25) and older adults (N = 24) were instructed to memorize five colored disks distributed around an imaginary circle in the center of the screen. The retention interval was manipulated to be either short or long, and a variable number of cues (0, 1, or 2) could indicate the to-be-tested item prior to recall (for details, please refer to the original publication). The cues presented during the retention interval are thought to bring the to-be-tested item back into the focus of attention and thereby improve its accessibility and potentially the precision of the memory representation (Souza & Oberauer, 2016). We chose this example to illustrate how to adapt the implementation of the two-parameter mixture model to a more complex design combining between- and within-subject factors.

Except for the specification of the model formula, all steps are the same as in Example 1. We specify the same mixture family of two von Mises distributions, and we use the same priors to constrain our model parameters. We only need to adapt our model formula to incorporate the respective independent variables predicting the mixture model parameters. For this dataset we have two within-subject factors (retention interval and cue condition) and one between-subject factor (age group). Again, we set up the model formula to directly estimate the parameters for each combination of these factors by suppressing the estimation of an intercept using the 0 + coding [8]:

[8] Normally, specifying only the interaction of different factors with a colon estimates the interaction without the main effects. However, suppressing the intercept using the 0 coding combined with the interaction coded with the colon directly estimates the parameter means for all combinations of the involved factors.

ZL_mixFormula <- bf(dev_rad ~ 1,
                    kappa1 ~ 0 + ageGroup:RI:cueCond +
                      (0 + RI:cueCond || gr(id, by = ageGroup)),
                    theta1 ~ 0 + ageGroup:RI:cueCond +
                      (0 + RI:cueCond || gr(id, by = ageGroup)),
                    kappa2 ~ 1)

The response error in this data set is captured in the dev_rad variable, which codes the deviation of the response from the target in radians. Therefore, we use this variable in the first line to specify the main dependent variable. Next, we predict both the precision of memory responses (kappa1) and the probability of recalling an item from memory (theta1) by all three independent variables. This is done by specifying the full three-way interaction of all three independent variables in the fixed-effect part of the model formula. Combined with suppressing the intercept, this directly estimates both the precision of memory representations and the probability of recalling an item from memory for all combinations of the factor levels of all independent variables. Additionally, we estimate random effects reflecting individual differences in these estimates for the two within-subject factors. To avoid assuming that variability is equal between younger and older adults, we grouped the id variable by age group to allow for different degrees of variability across age groups (for details on these specific settings, please see the documentation of the brmsformula function).
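Once such a model is fit, condition differences can be assessed directly on the posterior draws. The following sketch is our own illustration with hypothetical fit-object and coefficient names (check variables() on your fit for the exact labels):

# Posterior probability of a retro-cue benefit on the memory mixing weight
# for younger adults at the short retention interval. as_draws_df comes
# from the posterior package and is re-exported by brms.
draws <- as_draws_df(fit_LS_mixModel)
diff  <- draws$`b_theta1_ageGroupyoung:RIshort:cueCond1` -
         draws$`b_theta1_ageGroupyoung:RIshort:cueCond0`
mean(diff > 0)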



Using the adapted model formula, we estimated the parameters of the two-parameter mixture model with the same priors as in the first example. The posteriors of the parameter estimates across the experimental conditions for both age groups are shown in Figure 7. Again, we added the parameter estimates from the original publication to verify that the brms implementation converges with the original results. As already found in the original publication, there were clear benefits for the probability of recalling an item from memory with at least one retro-cue compared to no cue for both younger and older adults. Additionally, older adults generally had a lower probability of recalling an item from memory than younger adults (see panel A of Figure 7), as well as a lower precision of memory representations (see panel B of Figure 7).

Figure 7. Reproduction of the results by Loaiza & Souza (2018) using the brms implementation to estimate
parameters from the two-parameter mixture model. The probability of recalling an item from memory (Pmem) is
shown in panel A (left side), and the imprecision of memory representation (SD of the von Mises) is shown in panel
B (right side). The distributions shaded in black to light gray illustrate the whole posterior distribution of the
respective estimates for the different number of cues. The dot indicates the posterior mean, and the line the 95%
highest density interval of posterior estimates. The diamond indicates the average estimate from the original
publication.

Figure 8. Example of data structure for the three-parameter model. One trial per set size is shown.

Example 3: Estimating the three-parameter mixture model

In this example, we show how to estimate Bays et al.'s (2009) three-parameter mixture model using bmm. We apply the model to the data reported by Bays et al. (2009), which came from a simple set size experiment akin to the one reported by Zhang and Luck (2008). Participants performed a continuous reproduction task in which 1, 2, 4, or 6 colors were presented simultaneously on the screen.

First, we need to make sure that the data is in the correct format. The model expects that the outcome variable is the response error relative to the target, and that the positions of the non-targets are also coded relative to the target. For example, if the target was a color with value 0.7, and the non-targets were 0.5, 0.9, and 1.1, they need to be re-coded by subtracting the target value, i.e., as -0.2, 0.2, and 0.4. Each non-target value should be stored in a separate column. If different set sizes are presented, there should be as many non-target columns as the maximum set size minus 1, and for smaller set sizes, values in the extraneous columns should be coded as NA or zero. Figure 8 shows an example.
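The recoding itself is straightforward; as a sketch (our own illustration; Pos_Lure1 follows Figure 8, while color_nt1 and color_target are hypothetical raw columns):

# Code a non-target position relative to the target and wrap the result
# back onto the circle (-pi, pi]. Repeat for the remaining non-targets.
wrap <- function(x) atan2(sin(x), cos(x))
data_Bays2009$Pos_Lure1 <- wrap(data_Bays2009$color_nt1 -
                                data_Bays2009$color_target)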

Then, just like in the section "How to do all steps with bmm", we specify the model formula:

ff <- bf(RespErr ~ 1,
         kappa ~ 0 + setsize + (0 + setsize || subID),
         thetat ~ 0 + setsize + (0 + setsize || subID),
         thetant ~ 0 + setsize + (0 + setsize || subID))

where kappa is the precision of the von Mises distributions, thetat is the mixing proportion for the target responses, thetant is the mixing proportion for the non-target responses, setsize is a factor variable, and subID is the subject number. In this model, following Bays et al. (2009), we allow all parameters of the model to vary as a function of set size, and we also estimate a random slope for this effect for each subject.

Then, we can estimate this model using the fit_model function from the bmm package:

fit_bays2009 <- bmm::fit_model(formula = ff,
                               data = data_Bays2009,
                               model_type = '3p',
                               non_targets = paste0('Pos_Lure', 1:5),
                               setsize = "setsize",
                               warmup = 1000, iter = 2000, parallel = TRUE)
You need to tell the function whether you want to estimate the two-parameter or the three-parameter model (in fact, you can fit both separately and compare them) and the names of the columns containing the relative non-target values (the non_targets argument). If the experiment contains a variable set size manipulation, you should provide the name of the column that contains the set size variable to the argument setsize; otherwise, if the experiment always uses the same set size, you need to specify the set size number.

Figure 9 shows the estimates of the brms model and the estimates originally reported by Bays et al. (2009). Despite some differences, all original parameter estimates lie within the 95% highest density interval of the hierarchical model estimates, showing that the two estimation techniques converge. The largest discrepancy occurs in the estimate of memory imprecision for large set sizes: the original publication's estimates are higher than those estimated by brms. We will not examine this issue in detail because it is beyond the scope of this tutorial; however, one possible reason is that the original estimates used the two-step procedure we described earlier, where parameters are obtained separately for each individual. Given the small number of subjects (8), even one subject with unusually high imprecision would be enough to skew the estimate upwards.
Running head: ESTIMATING VWM MIXTURE MODELS IN BRMS 31

Figure 9. Reproduction of the results by Bays et al. (2009) using the brms implementation to estimate parameters
from the three-parameter mixture model. A: imprecision of memory representation (SD of the von Mises), B:
Probability of non-target responses, C: Probability of random responses (guessing). The distributions shaded in gray
illustrate the whole posterior distribution of the respective estimates. The dot indicates the posterior median, and the
line the 95% highest density interval of posterior estimates. The diamond indicates the average maximum likelihood
estimate from the original publication.

1 subjects (8), even one subject with unusually high imprecision would be enough to skew the

2 estimate upwards. Hierarchical estimation uses data from all subjects to inform parameters for

3 each individual subject, which often results in so called shrinkage, and it is usually associated

4 with better parameter recovery results (for example, see Appendix A).

The formula syntax and the bmm function are very flexible: almost any experimental design can be specified in the model formula, including categorical and continuous predictors, multifactor designs, within- and between-subject designs, etc. Furthermore, you can pass any additional sampling or options arguments to fit_model that you would pass to brms, which provides maximum flexibility. For example, even though fit_model generates reasonable priors, you can supplement or replace those and pass other priors to the prior argument. For more details, see the brms and bmm documentation.



Example 4: Using priors to implement informative constraints and improve parameter estimation

For this example, we will demonstrate how you can put prior constraints on parameters. To illustrate when this can be helpful, we will use an unpublished dataset from our lab, in which people performed a continuous color report task with variable set sizes from 1 to 8. We fit the two-parameter model as described in Example 1. Figure 10A and 10B show the estimates for the memory imprecision and probability of guessing as a function of set size. While the general pattern is as expected, that is, worse performance with increasing set size, something unusual stands out for set sizes 7 and 8. The estimated guessing probability is higher for set size 7 compared to set size 8, but so is the estimated memory precision. Based on the existing literature, we know this pattern is highly unlikely. The problem is that when guessing probability is relatively high (pg > .40), the mixture model can fail to recover parameters and can mistake high imprecision for guessing, leading to a trade-off in parameter estimates (Grange & Moore, 2022).

Following theoretical considerations and the existing literature (Oberauer & Lin, 2017; van den Berg et al., 2012), we expect memory imprecision and guessing probability to increase monotonically as a function of set size. Fortunately, the Bayesian framework allows us to set prior constraints on the estimates to force them to increase monotonically. This will likely provide enough information to the model so that it does not trade off imprecision with guessing. We chose this example because it occurred in our real work and represents a relatively complicated case, which showcases the flexibility of the Bayesian estimation framework.
Figure 10. Parameter estimates of the unconstrained (A and B) and the monotonically constrained (C and D) two-parameter model in Example 4. A: imprecision of memory representation (SD of the von Mises), B: probability of random responses (guessing). The distributions shaded in gray illustrate the whole posterior distribution of the respective estimates. The dot indicates the posterior median, and the line the 95% highest density interval of posterior estimates.

To enforce a monotonic increase of the parameters over set size, we need to do two things: first, change the contrasts in the model, and second, specify priors over the parameters according to our theoretical assumptions. By default, all regression models in R, including brms, use dummy coding for factors. This means that the model estimates an intercept that corresponds to the parameter value for the first factor level (in this case, set size 1), and regression coefficients for each other level of the factor, which correspond to the difference between that factor level and the intercept. This default coding can be extracted via the contrasts(data$setsize) command. In this case, this produces the following output:

  2 3 4 5 6 7 8
1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 0 1 0 0 0 0 0
4 0 0 1 0 0 0 0
5 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0
7 0 0 0 0 0 1 0
8 0 0 0 0 0 0 1

The rows reflect the 8 different values of our factor (set size), and the columns reflect the desired contrasts, in this case the default dummy coding. We want to set up a different type of contrast in which each regression coefficient reflects the difference between the current factor level and the previous factor level. This would allow us to tell the model that this difference should always be positive or negative. Because the regression coefficients then represent the differences between every pair of neighboring factor levels, this enforces a monotonic increase over set size. This is not a tutorial on basic linear modeling, so without going into too much detail, the necessary contrast looks like this:

  2 3 4 5 6 7 8
1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 1 1 0 0 0 0 0
4 1 1 1 0 0 0 0
5 1 1 1 1 0 0 0
6 1 1 1 1 1 0 0
7 1 1 1 1 1 1 0
8 1 1 1 1 1 1 1

which we can specify with the following code:

contr_ordin <- matrix(0, nrow = 8, ncol = 7)
contr_ordin[lower.tri(contr_ordin)] <- 1
colnames(contr_ordin) <- 2:8
contrasts(data_popov$setsize) <- contr_ordin
With these contrasts, any regression model fit to this dataset, including brms, will estimate the difference between each pair of consecutive factor levels, instead of differences relative to the intercept. We can set priors over these estimates to force them all to be non-negative by setting the lower bound of the estimates to 0 (lb = 0):

ff <- bf(anglediff ~ 1,
         kappa ~ setsize,
         thetat ~ setsize)
pr <- prior_('normal(0.0, 0.8)', class = 'b', nlpar = 'kappa', lb = 0) +
      prior_('logistic(0, 1)', class = 'b', nlpar = 'thetat', lb = 0)

and then we estimate the model with:

fit_popov <- fit_model(formula = ff,
                       data = data_popov,
                       model_type = "2p",
                       parallel = TRUE,
                       warmup = 500,
                       iter = 1000,
                       prior = pr)

The new estimates are shown in Figure 10C and 10D. In comparison with panels A and B, we can see that both the probability of guessing and the imprecision increase monotonically over set size, as we would expect based on prior theoretical and empirical work. Specifically, the model no longer estimates a dip in imprecision at set size 7 together with more guesses relative to set size 8. It also produces a less inflated imprecision estimate for set size 8. A surprising finding is that imposing monotonic constraints drastically reduced uncertainty in the parameter estimates, particularly for the probability of guessing (compare the highest density intervals in panel D vs. panel B of Figure 10).



This example illustrates a case in which providing a somewhat informative prior to the model helps reduce the uncertainty around model estimates. Generally, constraints imposed via informative priors should reflect theoretically reasonable assumptions. From our perspective, the assumption that memory precision and the probability of recall from memory decline with larger set sizes aligns with previous findings and theoretical assumptions (Oberauer & Lin, 2023; van den Berg et al., 2012; Zhang & Luck, 2008). Critically, the assumption that these parameters change monotonically still allows the parameters to stop changing between set sizes once set size reaches a certain threshold (Pratte, 2020). Ultimately, the goal of this example was to illustrate the possibilities researchers have for using informative priors to improve model estimation and to test theoretical assumptions (Haaf & Rouder, 2018).

Example 5: Comparing parameter estimates from the 3-parameter mixture model with the interference measurement model

For our last example, we will demonstrate how you can estimate parameters of the interference measurement model and illustrate what this model provides in terms of theoretical interpretation that goes beyond the 3-parameter mixture model. For this, we re-analyzed data from Experiment 1 reported in Oberauer et al. (2017). This experiment collected data from 20 young adults who performed a continuous color reproduction task: a variable number of color patches (from one up to eight) appeared on the screen, and then participants had to report the color of one of the patches on a color wheel. For each set size, 100 trials were collected.

First, we estimated parameters of the three-parameter mixture model from these data. For this, we used the fit_model function implemented in the bmm package and specified that we want to fit the 3-parameter mixture model and vary all three parameters over set size:

ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         thetat ~ 0 + SetSize + (0 + SetSize || ID),
         thetant ~ 0 + SetSize + (0 + SetSize || ID))

fit_3pMM <- bmm::fit_model(
  formula = ff,
  data = df_OberauerLin2017_E1,
  model_type = '3p',
  non_targets = paste0('Item', 2:8, '_Col_rad'),
  setsize = "SetSize")

As in Example 3, we first specified the dependent variable in the formula. In this case, devRad is the deviation of the response from the target in radians. Then we estimated the precision of memory responses (kappa), and both the proportion of target and non-target responses (thetat and thetant) for all set sizes. We also included random effects for all estimated parameters for all set sizes.

Figure 11. Estimates for the three-parameter mixture model for Experiment 1 reported by Oberauer et al. (2017). We freely estimated the probability of recalling an item from memory (Panel A), the probability of swap errors (i.e., recalling a non-target; Panel B), and the precision of memory representations (Panel D). For completeness, we also show the probability of guessing responses (Panel C). All plots show the posterior mean (dot) and the 95% highest density interval (line range), as well as the full posterior of the respective estimate (gray distribution) for all set sizes.

The results for the parameters of the three-parameter mixture model are displayed in Figure 11. Consistent with the results from previous examples, we see that the probability of recalling the target (Pmem) from memory decreases from small to large set sizes (see Figure 11A). Conversely, the probability of committing a swap error (Pswap), that is, recalling one of the non-target items, increases from small to large set sizes (see Figure 11B). Finally, the precision of memory responses (kappa) decreases with set size (see Figure 11D). These results replicate the results reported by Oberauer et al. (2017; Figure 20).

Neither the two-parameter nor the three-parameter mixture model provides any theoretical account of the processes underlying the mixture of different response distributions. The interference measurement model (IMM; Oberauer et al., 2017) attempts to fill this gap by assuming that the probability of recalling an item from one of the different mixture distributions is determined by several continuous sources of activation. Specifically, the IMM separates background noise (n) from general activation (a) for all items presented in the current trial, and context activation (c) for memory items that are associated with the context cued at retrieval. The full IMM additionally assumes that context activation generalizes to non-targets following a generalization gradient (s) that reflects the precision of cue-target associations on the context dimension (for a more detailed description, please see Oberauer et al., 2017). However, as estimating the s parameter requires additional information about the spatial distance between target and non-targets, and more data to be estimated with sufficient precision, we will focus on the reduced IMM that includes only the background noise, general activation, and context activation (IMMabc).

Both the two-parameter and the three-parameter mixture models are special cases of the full IMM (Oberauer et al., 2017). The IMMabc is mathematically equivalent to the three-parameter mixture model and provides a re-parameterization of the recall probabilities into different sources of activation. Additionally discarding the general activation a from the IMM results in a model equivalent to the two-parameter mixture model. The main benefit of the IMM over the two- and three-parameter mixture models is that its parameters are grounded in an explanatory model of visual working memory (Oberauer & Lin, 2017) that links them to specific activation sources in working memory.

Fitting the IMM to data using the fit_model function implemented in the bmm package is just like fitting the three-parameter or the two-parameter mixture model. First, we need to specify the formula indicating which model parameters should be estimated as a function of which experimental conditions:

ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         c ~ 0 + SetSize + (0 + SetSize || ID),
         a ~ 0 + SetSize + (0 + SetSize || ID))
First, we again specified the dependent variable, devRad (i.e., the deviation of the response from the target in radians). Then we specified that the precision of memory responses (kappa), and both the context activation c and the general activation a, should be estimated for all set sizes. We again included random effects for all estimated parameters for all set sizes.

Then, we can submit this formula to the fit_model function, now choosing the model type "IMMabc" to estimate the IMMabc. As for the three-parameter mixture model, we additionally have to specify the variables that contain the positions of the non-targets relative to the target in radians (i.e., Item2_Col_rad to Item8_Col_rad).

fit_IMMabc <- bmm::fit_model(
  formula = ff,
  data = df_OberauerLin2017_E1,
  model_type = 'IMMabc',
  non_targets = paste0('Item', 2:8, '_Col_rad'),
  setsize = "SetSize")

Figure 12. Parameter estimates for the IMMabc for Experiment 1 reported by Oberauer et al. (2017). We report the posterior estimates (means, 95% highest density intervals, and the full posteriors) of the context activation (Panel A) and the general activation (Panel B) prior to being normalized through the softmax function. It is therefore possible to obtain negative activation values; for interpretation, these must be evaluated relative to the parameter fixed for scaling. In this case, we fixed the background noise (n) to zero (illustrated by the dotted red line in Panel B). Panel C shows the estimates for the precision of memory representations. All parameters were allowed to vary between set sizes.

The resulting parameter estimates of the IMMabc are displayed in Figure 12. Consistent with previous results, the context activation (see Figure 12A), that is, the strength of the association between the color and the spatial location, decreases with larger set sizes (see estimates of the full IMM in Oberauer et al., 2017). The general activation, however, remains constant across set sizes, indicating that colors within each trial are held active at a similar activation level independent of set size. At first sight, the negative estimates for general activation are surprising; however, these estimates reflect the general activation a on the logarithmic scale prior to normalization by the softmax function. Thus, it is best to interpret the estimated activations relative to the activation component fixed for scaling. In this case, the background noise was fixed for scaling. The negative estimates for the general activation therefore indicate that these activations were lower than the background noise (dotted red line in Figure 12B), whereas the context activation was higher than the background noise. Finally, as in the three-parameter mixture model, the precision of memory representations decreases with set size (Panel C).
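To make this interpretation concrete, the following sketch shows how activation estimates on the log scale can be mapped onto response probabilities via the softmax. The numbers are made up for illustration, and this is a simplified version of the idea rather than the exact parameterization used internally by bmm:

# Illustrative activation values on the log scale (hypothetical).
c_act <- 1.5    # context activation of the cued (target) item
a_act <- -0.5   # general activation of every item in the display
n_act <- 0      # background noise, fixed to 0 for scaling
n_lures <- 3    # number of non-targets at this set size

# Softmax over the activations of the response categories: the target
# receives context plus general activation, each lure only the general
# activation, and the noise category only the background activation.
acts <- c(c_act + a_act, rep(a_act, n_lures), n_act)  # target, lures, noise
probs <- exp(acts) / sum(exp(acts))                   # softmax normalization
round(probs, 2)  # ~0.49 target, ~0.11 per lure, ~0.18 guessing

Note how a lure activation below the background noise (here -0.5 < 0) still translates into a non-zero swap probability per lure, and how adding more lures at the same activation level increases the total swap probability.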

When comparing the results from the IMMabc to the three-parameter mixture model, there are several things to note: 1) the reduction in context activation qualitatively resembles the reduction in the Pmem parameter from the three-parameter mixture model; 2) likewise, the estimates for the precision of memory representations are practically identical for the IMMabc and the three-parameter mixture model; however, 3) the general activation indicates no change (if anything, a reduction from small to large set sizes), whereas Pswap increases with larger set sizes. This indicates that the increase in swap errors is mainly due to the larger number of items a target can be confused with, rather than each item having a higher activation. In addition, the reduction in context activation also reduces the difference in activation between the target and the non-targets.

All in all, this example has illustrated that estimating the IMMabc is straightforward using the implementation in the bmm package. Currently, this implementation by default fixes the background noise to zero (on the logarithmic scale, reflecting a background noise of 1 on the native scale) and freely estimates all other IMM parameters. The bmm package also implements the two other versions of the IMM proposed by Oberauer et al. (2017).[9] The IMMbsc assumes that swap errors occur only as a function of generalization on the context dimension and thus does not contain the general activation component but estimates the generalization gradient (s) instead. The IMMfull combines confusions as a function of similarity on the context dimension with confusions independent of similarity by estimating both the generalization gradient (s) and the general activation (a). To estimate either of these models, users must additionally provide the names of the variables coding the spatial distance between the target and all non-targets; this is necessary for estimating the generalization gradient (s). An adapted model specification could look like this:

[9] Please see the "scripts/bmm_examples" folder on the GitHub repository for examples implementing both the IMMbsc and the IMMfull for Experiment 1 reported by Oberauer et al. (2017).

ff <- bf(devRad ~ 1,
         kappa ~ 0 + SetSize + (0 + SetSize || ID),
         c ~ 0 + SetSize + (0 + SetSize || ID),
         a ~ 0 + SetSize + (0 + SetSize || ID),
         s ~ 0 + SetSize + (0 + SetSize || ID))

fit_IMMfull_mixMod <- fit_model(
  formula = ff,
  data = df_OberauerLin2017_E1,
  model_type = 'IMMfull',
  non_targets = paste0('Item', 2:8, '_Col_rad'),
  spaPos = paste0('Item', 2:8, '_Pos_rad'),
  setsize = "SetSize")

General discussion

Measurement models of visual working memory have become a popular and useful tool to decompose raw behavioral performance in continuous reproduction tasks into separate meaningful model parameters (Oberauer et al., 2017). This tutorial described how researchers can use an established package for Bayesian hierarchical estimation, brms, to estimate the two-parameter mixture model (Zhang & Luck, 2008), the three-parameter mixture model (Bays et al., 2009), and the interference measurement model (Oberauer et al., 2017). We also introduced a new package, bmm, which makes it even easier to specify these models in brms. Additionally, we provide a GitHub repository with well-documented code for each of the five examples, which can be a useful learning tool. The five examples we presented demonstrate the flexibility of these implementations. Any model that can be specified with the brms formula syntax can be estimated, including single- and multi-factorial designs, within- and between-subject designs, continuous and categorical predictors, and various random effect structures. In contrast to the typical two-step maximum likelihood procedure, researchers can predict specific model parameters as a function of specific conditions, rather than allowing all parameters to vary across all conditions.

When estimating mixture models with brms and bmm, users should keep several important things in mind:

• All responses and item values should be coded in radians, not degrees (see the sketch after this list).
• The response variable should contain the response error, i.e., the response relative to the target.
• If estimating the three-parameter model or the interference measurement model, you need to provide the values of the non-target items relative to the target item. For the IMMbsc and the IMMfull, you additionally need to provide the spatial distances of the non-targets to the target.
• For efficient estimation, brms transforms the kappa and probability parameters (as we explained in Section "Explanation of model parameters and specifying the model formula"). The output of the estimated model (via the summary() function) presents these transformed parameters. Thus, you need to reverse the transformation, by exponentiating kappa values and putting thetat and thetant through the softmax transformation, to obtain the probabilities associated with the different distributions (see the Examples on GitHub for some postprocessing routines, and the sketch after this list).
• Before interpreting the estimated parameters, it is important to examine the overall fit of the model (e.g., by visual inspection of the predicted vs. the observed data).
• For statistical inference, you can either use the highest density intervals of the parameter estimates or, alternatively, obtain Bayes factors by comparing posterior probabilities with prior probabilities for specific hypotheses or by comparing models with and without the predictor of interest.
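As a compact illustration of several of these points, here is a minimal post-processing sketch. It is a sketch under assumptions: the fitted object fit_2p and the coefficient names kappa_Intercept and thetat_Intercept are hypothetical placeholders, and the exact names in your output depend on your model formula.

# Convert raw responses from degrees to radians, if necessary (placeholder column names).
# dat$RespErr <- dat$RespErr_deg * pi / 180

# Visual check of predicted vs. observed data (pp_check is a standard brms function).
brms::pp_check(fit_2p)

# Back-transform population-level estimates to the native scale.
est <- brms::fixef(fit_2p)
kappa <- exp(est["kappa_Intercept", "Estimate"])     # undo the log link

# Two-parameter model: probability of memory responses via the inverse logit.
pmem <- plogis(est["thetat_Intercept", "Estimate"])

# Three-parameter model: softmax over thetat, thetant, and 0 (guessing).
softmax <- function(x) exp(x) / sum(exp(x))
# probs <- softmax(c(thetat_est, thetant_est, 0))

# Credible intervals appear in summary(fit_2p); Bayes factors for specific
# hypotheses can be obtained with brms::hypothesis(), e.g. (placeholder name):
# brms::hypothesis(fit_2p, "kappa_setsize2 = 0")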

Theoretical considerations

Computational models of memory and cognition are often split into two classes: measurement models and process (or explanatory) models. Both types of models decompose behavior into separate meaningful parameters, but in contrast to process models, measurement models typically do not provide mechanistic explanations for differences in experimental effects. Instead, measurement models allow their parameters to vary across experimental conditions to account for these differences. Nonetheless, measurement models provide considerable benefits (Farrell & Lewandowsky, 2018; Frischkorn et al., 2022; Frischkorn & Schubert, 2018; Oberauer et al., 2017). They decompose the observed behavior into meaningful parameters and provide a theoretically grounded interpretation of these parameters. Thus, measurement models enable researchers to evaluate experimental effects at the level of these parameters instead of the raw behavioral responses, providing a more fine-grained perspective for inference.

In particular, the interference measurement model is linked to a more complex explanatory model of visual working memory (Oberauer & Lin, 2017, 2023) and thus inherits some of the mechanistic processes implemented in that model. Therefore, its parameters can be interpreted in terms of theoretical processes or activation sources whose contribution to observed behavior is clearly specified. The fact that both the two- and the three-parameter mixture models are mathematically equivalent to special cases of the interference measurement model additionally highlights that these models themselves do not provide evidence in favor of either slot (Adam et al., 2017; Ngiam et al., 2022; Pratte, 2020; Zhang & Luck, 2008) or resource accounts (S. Ma et al., 2022; W. J. Ma et al., 2014; van den Berg et al., 2014) of working memory. Therefore, when fitting a two-parameter mixture model you should not assume that visual working memory is limited by slots. To do so, additional assumptions would have to be added to these models (e.g., memory precision following a specific function across set sizes) to account for the theoretical ideas put forth by slot or resource accounts of visual working memory. Additionally, the fact that the IMM assumes continuous activations underlying the retrieval of memory representations adds a third account to the explanation of capacity limits in visual working memory: the binding hypothesis, which assumes that capacity limits a person's ability to form and maintain bindings (Oberauer, 2021).

Although the different measurement models introduced in this paper are linked to different theoretical perspectives, the goal of this paper is not to discuss and compare these theoretical models of working memory. We therefore deliberately did not weigh in on the discussion of which of the different measurement models should be considered superior and better supported by empirical data, and we refrained from an in-depth explanation and discussion of the theoretical advantages and disadvantages of these models. Instead, we pointed to relevant resources where necessary.

The goal of this tutorial was to introduce the implementations of three measurement models for continuous reproduction tasks in brms and to introduce the bmm package that aims to ease the use and application of these measurement models. We hope that the implementations we introduced here will enable more researchers to compare these different models and contribute towards the evaluation of the benefits and problems of these measurement models. Generally, we think that analyzing data at the level of cognitive processes will provide more refined insights into the effects of different experimental manipulations and advance our field towards a more comprehensive explanation of working memory and its limited capacity.

Towards a broad range of easy-to-use cognitive measurement models

This tutorial is deliberately limited to mixture models that provide measurement models for continuous reproduction tasks. Obviously, research on visual working memory, and working memory more generally, uses a broad range of different tasks, procedures, and materials, such as change detection paradigms, and digits, letters, or words as memoranda. Although it is reasonable to assume that the distinction between recall from memory, random guessing, and variable precision or strength of memory representations should be relevant across a broad range of tasks (Oberauer & Lewandowsky, 2019; Oberauer & Lin, 2023), modeling behavioral responses in change detection tasks or tasks using discrete stimulus material requires entirely different implementations of these concepts (see, for example, Oberauer & Lewandowsky, 2019).[10] Similarly, other recently proposed measurement models for continuous reproduction tasks, such as the target confusability competition model (TCC; Schurgin et al., 2020) and the signal discrimination model (SDM; Oberauer, 2021), do not use a continuous distribution, such as the von Mises distribution, to model the recall error in continuous reproduction tasks, and instead introduce several additional steps and different distributional assumptions.[11] Therefore, we felt that to keep this tutorial accessible and limited in length, it is best to focus only on a set of models that share similar distributional assumptions.

Nonetheless, the setup of the bmm package provides the foundation for the implementation of a broad range of cognitive measurement models, while still retaining the accessible and easy-to-use features that we highlighted in this tutorial. On a more abstract level, the implementations of the measurement models we presented here illustrate that cognitive measurement models can be specified as an extension of distributional models of observed data (see Figure 13 for an illustration). Previous research has already highlighted the benefits of more closely aligning the modeling of behavioral data with distributions that more adequately resemble core features of the observed data (Haines et al., 2020). For example, instead of modeling aggregated reaction times with standard Gaussian models, using generalized linear mixed models that assume the data follow a lognormal or inverse Gaussian distribution dramatically improves the inferences and strengthens the conclusions that can be drawn from the analyses (Boehm et al., 2018; Rouder & Haaf, 2019). The core insight of the implementations presented here is that many cognitive measurement models can be specified as distributional models in which the distributional parameters of the generalized linear mixed model are a function of the cognitive measurement model parameters (again, see Figure 13 for an illustration). These translation functions from cognitive measurement model parameters to distributional parameters are what we essentially implemented for the three measurement models discussed in this tutorial.

Figure 13. Illustration of the relationship of generalized linear models to cognitive measurement models. Generalized linear models provide a distributional description of the observed data. For example, accuracy data can stem from a binomial process with a certain number of trials and probability of success, or reaction time data can stem from a lognormal distribution with a certain mean and standard deviation. Cognitive measurement models provide an additional decomposition of the distributional parameters, that is, probabilities of success, means, or standard deviations, into cognitive processes assumed to underlie the observed behavior. This often integrates several distributional parameters or additional information about the experimental setup or stimuli.

[10] First, the underlying response distribution for these tasks is different: in change detection tasks, responses follow a binomial distribution, whereas with discrete stimulus material, responses follow a multinomial distribution over different responses or response categories. Second, these different response distributions necessitate a different translation of the assumed cognitive states a person might be in into the observed responses (Lin & Oberauer, 2022). Due to these complications, we restricted ourselves to mixture models for continuous reproduction tasks that all share the von Mises distribution as the response distribution and assume behavioral responses to represent a mixture of different von Mises distributions.

[11] Specifically, they assume a continuous activation function (either a Laplace or a von Mises distribution) for the responses around the circle and model recall as a competitive selection among all 360 responses possible on the color wheel. Formally, this resembles a multinomial response distribution for which the probability of selecting the different response options is derived as a function of the continuous activation function. The dependency in the probabilities of recalling neighboring response options is introduced by the continuous activation function, not by the response scale being continuous.
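As a toy illustration of the translation idea described above (not code from the bmm package), the two-parameter model can be written as a density whose distributional parameters are direct functions of the measurement-model parameters kappa and pmem:

# Von Mises + uniform mixture density for the response error y (in radians);
# dvon_mises is exported by brms, dunif is base R.
dmix2p <- function(y, kappa, pmem) {
  pmem * brms::dvon_mises(y, 0, kappa) + (1 - pmem) * dunif(y, -pi, pi)
}

# Example: plot the implied error distribution for illustrative parameter values.
curve(dmix2p(x, kappa = 8, pmem = 0.7), from = -pi, to = pi,
      xlab = "Response error (rad)", ylab = "Density")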

To spare researchers from coding these translation functions from cognitive model parameters to distributional parameters themselves each time they want to use a specific model, we developed the bmm package. Additionally, the bmm package performs some additional checks on the data provided for the model, specifies reasonable priors for the different model parameters, and ensures that model estimation runs as efficiently as possible. This way, data can be analyzed at the level of latent cognitive processes instead of observed behavior. For this, you only have to specify the linear model formula predicting which of the cognitive model parameters vary as a function of the experimental manipulation and select the corresponding model to be fit in the fit_model function (see Figure 14).

Figure 14. Flow chart of the functionality of the fit_model function implemented in the bmm package. The user only needs to specify the inputs to the fit_model function; the bmm package then takes care of the rest.

As of now, the bmm package includes only the three cognitive measurement models presented in this paper. Prospectively, we plan to extend the bmm package with implementations of measurement models for a broad range of tasks, such as signal detection models (DeCarlo, 1998; Vuorre, 2017), models for verbal working memory tasks (Oberauer & Lewandowsky, 2019), more recent models for continuous reproduction tasks (Oberauer, 2021; Schurgin et al., 2020), as well as models for reaction time data (Annis et al., 2017; Peña & Vandekerckhove, 2024), while still retaining the easy and accessible usability we presented here. But given the additional coding work that would be required for these implementations, the time it would take to explore them, and the in-depth introduction needed to explain how to use these models, this was beyond the scope of this tutorial. Instead, this tutorial constitutes the first step in the development of a more general and broadly applicable package aiming to ease the use and application of cognitive measurement models in psychology in general.

Conclusion

We have demonstrated how to implement three different kinds of mixture models for visual working memory tasks in the R package brms. Additionally, we provide useful wrapper functions to estimate these models in a newly developed R package for Bayesian measurement models, bmm. To ease the comprehension of our examples, we share R code in a GitHub repository both for the implementation in brms without relying on bmm functions and for the implementation using the bmm functions. We hope that these implementations will enable more researchers to fit mixture models to visual working memory tasks with continuous reproduction recall. We are curious to see how the benefits of the current implementation, for example the possibility to predict parameters of the implemented models with continuous predictors, will aid researchers in gaining insights into questions that could not be addressed with other implementations of these measurement models so far.

Appendix A: Parameter recovery simulations of the non-hierarchical vs. hierarchical estimation of the two-parameter mixture model[12]

In this simulation, we demonstrate that the non-hierarchical maximum likelihood estimates (MLE) of the two-parameter mixture model are not recovered well with few observations per participant (i.e., N = 50). At the same time, the hierarchical version of the model, implemented via the brms package, does a much better job at recovering the parameters for each participant.

To start, let's load the relevant packages, as well as the saved output of the brms model. This will save us time, as we don't have to refit the model. If you want, you can replicate the analyses without preloading the model fit by uncommenting all relevant lines and running the notebook. Note that the model fitting will take a few hours.
suppressPackageStartupMessages({
  library(tidyverse)
  library(here)
  library(brms)
  library(patchwork)
})
options(mc.cores = parallel::detectCores())
load(here('output/par_rec_multi_level_simulation.RData'))

Generate synthetic data

First, we will generate synthetic continuous reproduction data. Each of N = 20 participants contributes Nobs = 50 observations in two conditions, A and B. We selected 50 observations because, in our experience, with so few observations the non-hierarchical MLE method often fails to recover the correct individual parameters, so it will serve as a good showcase. We assume that participants vary in their memory precision (i.e., the $\kappa$ parameter of the von Mises distribution) and in the likelihood that their response comes from memory ($\rho$; here we specify it as $\theta$, the logit-transformed $\rho$, because that is how brms estimates it). Therefore, the observations for each participant in conditions A and B are generated as follows, where $j$ indexes participants, $\Delta\rho$ and $\Delta\kappa$ are the differences in the parameters between conditions B and A, $vM$ is the von Mises distribution, and $\mathcal{U}$ is the uniform distribution:

$$y_{A,j} \sim \rho_j \cdot vM(0, \kappa_j) + (1 - \rho_j) \cdot \mathcal{U}(-\pi, \pi)$$

[12] This appendix is adapted from an R Markdown notebook, which is available at: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM/blob/main/scripts/parameter_recovery_multi_level_simulation.Rmd
$$y_{B,j} \sim (\rho_j + \Delta\rho_j) \cdot vM(0, \kappa_j + \Delta\kappa_j) + (1 - \rho_j - \Delta\rho_j) \cdot \mathcal{U}(-\pi, \pi)$$

$$\kappa_j \sim \mathcal{N}(10, 2.5), \qquad \Delta\kappa_j \sim \mathcal{N}(20, 5)$$

$$\rho = \frac{e^{\theta}}{1 + e^{\theta}}$$

$$\theta_j \sim \mathcal{N}(1, 0.5), \qquad \Delta\theta_j \sim \mathcal{N}(-1, 0.25)$$
In simple terms, participants on average have a precision of $\kappa = 10$ in condition A, but they vary around that value (SD = 2.5). In condition B, participants' precision is increased on average by $\Delta\kappa = 20$ relative to condition A, but this increase also varies across participants (SD = 5). The same logic applies to the probability-in-memory ($\rho$) parameter. The code below accomplishes this (parts are commented out because we load these values at the beginning of the script):
#### generate synthetic data

#### first, set population parameters
N = 20
Nobs = 50
kappa1a_mu = 10
kappa1a_sd = 2.5
kappa1delta_mu = 20
kappa1delta_sd = 5
theta1a_mu = 1
theta1a_sd = 0.5
theta1delta_mu = -1
theta1delta_sd = 0.25

#### generate participant parameters (for theta, logit units)
# kappa1a_i = rnorm(N, kappa1a_mu, kappa1a_sd)
# kappa1delta_i = rnorm(N, kappa1delta_mu, kappa1delta_sd)
# theta1a_i = rnorm(N, theta1a_mu, theta1a_sd)
# theta1delta_i = rnorm(N, theta1delta_mu, theta1delta_sd)

#### put parameters together
# true_pars <- data.frame(id = rep(1:N, 2), condition = rep(c('A','B'), each=N),
#                         kappa = c(kappa1a_i, kappa1a_i+kappa1delta_i),
#                         pmem = gtools::inv.logit(c(theta1a_i, theta1a_i+theta1delta_i)))

#### simulate data for each trial
# dat <- data.frame()
# for (i in 1:N) {
#   A_n = floor(Nobs*gtools::inv.logit(theta1a_i[i]))
#   B_n = floor(Nobs*gtools::inv.logit(theta1a_i[i]+theta1delta_i[i]))
#   datA <- data.frame(y=c(rvon_mises(A_n, 0, kappa1a_i[i]), runif(Nobs-A_n,-pi,pi)), condition = "A", id = i)
#   datB <- data.frame(y=c(rvon_mises(B_n, 0, kappa1a_i[i]+kappa1delta_i[i]), runif(Nobs-B_n,-pi,pi)), condition = "B", id = i)
#   DAT <- bind_rows(datA,datB)
#   dat <- bind_rows(dat,DAT)
# }

Visualize synthetic data

Here are the parameters for each participant and condition:

true_pars
##    id condition     kappa      pmem
## 1   1         A  9.478360 0.8627272
## 2   2         A 11.715649 0.6834658
## 3   3         A 13.740791 0.7820907
## 4   4         A  4.692635 0.7039524
## 5   5         A  9.656958 0.7651808
## 6   6         A 10.968378 0.7203603
## 7   7         A 13.598416 0.6795157
## 8   8         A 10.010820 0.5860563
## 9   9         A 13.366425 0.5915341
## 10 10         A 12.204754 0.6769404
## 11 11         A 10.568784 0.6493821
## 12 12         A  8.888721 0.6566303
## 13 13         A 12.475556 0.6142899
## 14 14         A 11.251142 0.7795837
## 15 15         A  8.391865 0.7743183
## 16 16         A 10.207182 0.7079415
## 17 17         A  9.736036 0.5572051
## 18 18         A  7.835275 0.8916764
## 19 19         A  8.278295 0.8525931
## 20 20         A 13.322670 0.7690013
## 21  1         B 28.923805 0.7063868
## 22  2         B 31.202578 0.3832080
## 23  3         B 32.503713 0.5396529
## 24  4         B 20.241197 0.4716653
## 25  5         B 29.987560 0.4270660
## 26  6         B 30.961803 0.5047793
## 27  7         B 35.266937 0.4891554
## 28  8         B 19.996771 0.3925293
## 29  9         B 37.457545 0.2399274
## 30 10         B 30.225545 0.4020229
## 31 11         B 33.576452 0.3754082
## 32 12         B 24.988404 0.4132181
## 33 13         B 41.206685 0.4409797
## 34 14         B 36.394617 0.5327623
## 35 15         B 27.157000 0.6347218
## 36 16         B 21.471546 0.3948013
## 37 17         B 30.043178 0.3331486
## 38 18         B 28.469415 0.8281906
## 39 19         B 25.249893 0.6915995
## 40 20         B 35.677790 0.5827495

true_pars %>%
  gather(par, value, kappa, pmem) %>%
  ggplot(aes(value)) +
  geom_histogram(bins=10) +
  facet_grid(condition ~ par, scales="free") +
  theme_bw()
[Histograms of the true kappa and pmem parameters by condition omitted.]

And here is the distribution of errors, characterized by higher precision in condition B, but with more guessing:

dat %>%
  ggplot(aes(y)) +
  geom_density(aes(color=condition)) +
  theme_bw() +
  xlab('Angle error')
[Density plot of angle errors by condition omitted.]

And as you can see, there is quite some variability by participant:

ggplot(dat, aes(y, color=condition)) +
  geom_density() +
  facet_wrap(~id) +
  theme_bw()

[Per-participant density plots omitted.]

Fit the non-hierarchical MLE version of the mixture model

As is standard in the literature, we first fit the mixture model with MLE separately to each participant and condition. We accomplish this using a couple of custom functions, but the same can be achieved with existing R packages.

# Negative log-likelihood of the two-parameter mixture model;
# LL(dat) returns a function of the parameter vector for use with optim()
LL <- function(dat) {
  LL_resp <- function(par) {
    y = dat$y
    kappa = exp(par[1])
    pmem = gtools::inv.logit(par[2])
    lik_vm <- brms::dvon_mises(y, mean(y), kappa)
    lik_un <- brms::dvon_mises(y, mean(y), 0)
    lik <- pmem*lik_vm + (1-pmem)*lik_un
    LL <- -sum(log(lik))
  }
}

# function to fit the model and return the parameter estimates
fit_mixture <- function(dat) {
  require(stats4)
  LL_resp <- LL(dat)
  fit <- optim(c(logkappa=2, theta=1), LL_resp)
  coef = as.data.frame(t(fit$par))[1,]
  coef$convergence <- fit$convergence
  coef$kappa = exp(coef$logkappa)
  coef$pmem = gtools::inv.logit(coef$theta)
  return(coef)
}

# estimate parameters separately for each participant and condition
mle_est <- dat %>%
  group_by(id, condition) %>%
  do({fit_mixture(.)}) %>%
  arrange(condition, id)
## Loading required package: stats4

First, we check that all fits have converged:

mean(mle_est$convergence == 0)
## [1] 1

And now we can check how well the individual participant parameters have been recovered. We plot the estimated parameters (y-axis) against the true generating parameters (x-axis):

r_kappa <- round(cor.test(true_pars$kappa, mle_est$kappa)$est, 2)
r_pmem <- round(cor.test(true_pars$pmem, mle_est$pmem)$est, 2)

p1 <- left_join(true_pars, mle_est, by=c('id','condition')) %>%
  ggplot(aes(kappa.x, kappa.y, color=condition)) +
  geom_point() +
  ggtitle('Kappa') +
  xlab('True parameter') +
  ylab('MLE estimate') +
  theme_bw() +
  annotate('text', x=15, y=40, label=paste0('r(40) = ', r_kappa)) +
  geom_abline(intercept=0, slope=1) +
  theme(legend.position="")

p2 <- left_join(true_pars, mle_est, by=c('id','condition')) %>%
  ggplot(aes(pmem.x, pmem.y, color=condition)) +
  geom_point() +
  ggtitle('Pmem') +
  xlab('True parameter') +
  ylab('MLE estimate') +
  theme_bw() +
  annotate('text', x=0.4, y=0.8, label=paste0('r(40) = ', r_pmem)) +
  geom_abline(intercept=0, slope=1)

p1+p2
[Scatterplots of MLE estimates against true parameters omitted.]

As we can see, the non-hierarchical MLE estimates of Pmem are pretty good; however, the model fails to accurately estimate the individual kappa parameters. With only 50 observations per participant and condition, that is normal in our experience. Notably, from these estimates a researcher might erroneously conclude that our manipulation affects only pmem, but not kappa (which, interestingly, happens often in published research on VWM). Let's see if the hierarchical model can do better.

Fit the hierarchical version with brms

The code is commented out because it takes several hours to fit. The results are preloaded at the beginning of the script, so they can be accessed via the relevant object names.

# # create mixture of von Mises distributions
# mix_vonMises <- mixture(von_mises, von_mises, order = "none")
#
# # set up mixture model; allow kappa and theta to vary by condition
# bf_mixture <- bf(y ~ 1,
#                  kappa1 ~ condition + (condition||id),
#                  kappa2 ~ 1,
#                  theta1 ~ condition + (condition||id))
#
# # check default priors
# get_prior(bf_mixture, dat, mix_vonMises)
#
# # constrain priors: set the means of the von Mises distributions to 0; set kappa of
# # the second von Mises distribution to be very low, approximating a uniform distribution
# mix_priors <- prior(constant(0), class = Intercept, dpar = "mu1") +
#   prior(constant(0), class = Intercept, dpar = "mu2") +
#   prior(constant(-100), class = Intercept, dpar = "kappa2")
#
# brms_fit <- brm(bf_mixture, dat, mix_vonMises, mix_priors)

Let's examine the model. Convergence of all parameters looks good (Rhat < 1.01 and high Tail_ESS; no warnings).

brms_fit
##  Family: mixture(von_mises, von_mises)
##   Links: mu1 = tan_half; kappa1 = log; mu2 = tan_half; kappa2 = log; theta1 = identity; theta2 = identity
## Formula: y ~ 1
##          kappa1 ~ condition + (condition || id)
##          kappa2 ~ 1
##          theta1 ~ condition + (condition || id)
##    Data: dat4 (Number of observations: 2000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 20)
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(kappa1_Intercept)      0.22      0.10     0.03     0.42 1.00      975      728
## sd(kappa1_conditionB)     0.17      0.12     0.01     0.46 1.00     1590     2029
## sd(theta1_Intercept)      0.53      0.14     0.28     0.83 1.00      956     1853
## sd(theta1_conditionB)     0.26      0.17     0.01     0.64 1.00      722     1087
##
## Population-Level Effects:
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## mu1_Intercept         0.00      0.00     0.00     0.00 1.00     4000     4000
## kappa1_Intercept      2.44      0.09     2.27     2.63 1.00     3279     2446
## mu2_Intercept         0.00      0.00     0.00     0.00 1.00     4000     4000
## kappa2_Intercept   -100.00      0.00  -100.00  -100.00 1.00     4000     4000
## theta1_Intercept      0.92      0.15     0.63     1.22 1.00     1882     2779
## kappa1_conditionB     0.93      0.13     0.68     1.18 1.00     4278     3091
## theta1_conditionB    -1.10      0.14    -1.38    -0.82 1.00     4474     3181
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now, the values above don't tell us much, because in brms, kappa is log-transformed and theta is the logit/softmax transformation of pmem. Therefore, we first need to extract the random effects, which give the parameter estimates for each participant, and then transform the parameters to the relevant scale:

ranefs <- ranef(brms_fit)$id
logkappa <- c(fixef(brms_fit)['kappa1_Intercept','Estimate'] + ranefs[,'Estimate','kappa1_Intercept'],
              fixef(brms_fit)['kappa1_Intercept','Estimate'] + ranefs[,'Estimate','kappa1_Intercept'] +
                fixef(brms_fit)['kappa1_conditionB','Estimate'] + ranefs[,'Estimate','kappa1_conditionB'])

theta = c(fixef(brms_fit)['theta1_Intercept','Estimate'] + ranefs[,'Estimate','theta1_Intercept'],
          fixef(brms_fit)['theta1_Intercept','Estimate'] + ranefs[,'Estimate','theta1_Intercept'] +
            fixef(brms_fit)['theta1_conditionB','Estimate'] + ranefs[,'Estimate','theta1_conditionB'])

brms_est <- data.frame(id = rep(1:N, 2),
                       condition = rep(c('A','B'), each=N),
                       kappa = exp(logkappa),
                       pmem = c(gtools::inv.logit(theta)))

As with the MLE model before, we can now check how well the individual participant parameters have been recovered. We plot the estimated parameters (y-axis) against the true generating parameters (x-axis):

r_kappa <- round(cor.test(true_pars$kappa, brms_est$kappa)$est, 2)
r_pmem <- round(cor.test(true_pars$pmem, brms_est$pmem)$est, 2)

p3 <- left_join(true_pars, brms_est, by=c('id','condition')) %>%
  ggplot(aes(kappa.x, kappa.y, color=condition)) +
  geom_point() +
  ggtitle('Kappa') +
  xlab('True parameter') +
  ylab('BRMS estimate') +
  theme_bw() +
  annotate('text', x=15, y=40, label=paste0('r(40) = ', r_kappa)) +
  geom_abline(intercept=0, slope=1) +
  theme(legend.position="")

p4 <- left_join(true_pars, brms_est, by=c('id','condition')) %>%
  ggplot(aes(pmem.x, pmem.y, color=condition)) +
  geom_point() +
  ggtitle('Pmem') +
  xlab('True parameter') +
  ylab('BRMS estimate') +
  theme_bw() +
  annotate('text', x=0.4, y=0.8, label=paste0('r(40) = ', r_pmem)) +
  geom_abline(intercept=0, slope=1)

p3+p4
[Scatterplots of brms estimates against true parameters omitted.]

And voilà: we can see that the brms model does a drastically better job at recovering the kappa parameter relative to the non-hierarchical MLE version. This is because, in the hierarchical model, the data from all participants partially informs the parameter estimates for each individual participant, which is particularly helpful when we only have a few observations per participant. When considering both conditions combined, the correlation between the true kappa and the estimated kappa increased from 0.41 for the MLE estimates to 0.93 for the brms estimates. Within each condition the correlations are lower (0.57 for condition A and 0.51 for condition B), but these are still several times higher than for the MLE estimates (0.26 for condition A and 0.17 for condition B). Thus, while 50 observations per condition are still not enough for very reliable estimation of within-condition individual differences with brms, the estimates are drastically improved relative to the MLE implementation, which fails to recover any information related to the kappa parameter.

Open Practices Statement

All code and data used in this tutorial are openly available on GitHub: https://github.com/GidonFrischkorn/Tutorial-MixtureModel-VWM/tree/main/scripts/. The bmm package can be installed from GitHub as well: https://github.com/venpopov/bmm. The output for the models presented here is available on OSF: https://osf.io/vsrz4/



1 References

2 Adam, K. C. S., Vogel, E. K., & Awh, E. (2017). Clear evidence for item limits in visual

3 working memory. Cognitive Psychology, 97, 79–97. https://doi.org/10/gbz4sf

4 Annis, J., Miller, B. J., & Palmeri, T. J. (2017). Bayesian inference with Stan: A tutorial

5 on adding custom distributions. Behavior Research Methods, 49(3), 863–886.

6 https://doi.org/10/gf5smk

7 Bays, P. M., Catalao, R. F. G., & Husain, M. (2009). The precision of visual working

8 memory is set by allocation of a shared resource. Journal of Vision, 9(10), 7–7.

9 https://doi.org/10.1167/9.10.7

10 Boehm, U., Marsman, M., Matzke, D., & Wagenmakers, E.-J. (2018). On the importance

11 of avoiding shortcuts in applying cognitive models to hierarchical data. Behavior Research

12 Methods, 50(4), 1614–1631. https://doi.org/10/gd6dx2

13 Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan.

14 Journal of Statistical Software, 80(1), 1–28. https://doi.org/10/gddxwp

15 Bürkner, P.-C. (2018a). Advanced Bayesian multilevel modeling with the R package

16 brms. The R Journal. https://doi.org/10/gfxzpn

17 Bürkner, P.-C. (2018b). brms: Bayesian Regression Models using “Stan” (2.5.0)

18 [Computer software]. https://CRAN.R-project.org/package=brms

19 Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M.,

20 Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A Probabilistic Programming

21 Language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10/b2pm

22 DeCarlo, L. T. (1998). Signal detection theory and generalized linear models.

23 Psychological Methods, 3(2), 186–205. https://doi.org/10.1037/1082-989X.3.2.186


Running head: ESTIMATING VWM MIXTURE MODELS IN BRMS 66

Farrell, S., & Lewandowsky, S. (2018). Computational modeling of cognition and behavior. Cambridge University Press. https://doi.org/10.1017/CBO9781316272503

Frischkorn, G. T., & Schubert, A.-L. (2018). Cognitive models in intelligence research: Advantages and recommendations for their application. Journal of Intelligence, 6(3), 34. https://doi.org/10/gd3vqn

Frischkorn, G. T., Wilhelm, O., & Oberauer, K. (2022). Process-oriented intelligence research: A review from the cognitive perspective. Intelligence, 94, 101681. https://doi.org/10.1016/j.intell.2022.101681

Grange, J. A., & Moore, S. B. (2022). mixtur: An R package for designing, analysing, and modelling continuous report visual short-term memory studies. Behavior Research Methods, 54(5), 2071–2100. https://doi.org/10.3758/s13428-021-01688-1

Grange, J. A., Moore, S. B., & Berry, E. D. J. (2021). mixtur: Modelling continuous report visual short-term memory studies (1.2.0) [Computer software]. https://CRAN.R-project.org/package=mixtur

Haaf, J. M., & Rouder, J. N. (2018). Some do and some don't? Accounting for variability of individual difference structures. Psychonomic Bulletin & Review. https://doi.org/10/gd8gzb

Haines, N., Kvam, P. D., Irving, L. H., Smith, C., Beauchaine, T. P., Pitt, M. A., Ahn, W.-Y., & Turner, B. (2020). Learning from the reliability paradox: How theoretically informed generative models can advance the social, behavioral, and brain sciences. https://doi.org/10/gg8662

Hardman, K. (2017). CatContModel: Categorical and continuous working memory models for delayed estimation tasks (Version 0.8.0) [Computer software]. https://github.com/hardmanko/CatContModel/releases/tag/v0.8.0 (Original work published 2016)


Lin, H.-Y., & Oberauer, K. (2022). An interference model for visual working memory: Applications to the change detection task. Cognitive Psychology, 133, 101463. https://doi.org/10.1016/j.cogpsych.2022.101463

Loaiza, V. M., & Souza, A. S. (2018). Is refreshing in working memory impaired in older age? Evidence from the retro-cue paradigm. Annals of the New York Academy of Sciences, 1424(1), 175–189. https://doi.org/10.1111/nyas.13623

Ma, S., Popov, V., & Zhang, Q. (2022). A neural index reflecting the amount of cognitive resources available during memory encoding: A model-based approach. bioRxiv. https://doi.org/10.1101/2022.08.16.504058

Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17(3), 347–356. https://doi.org/10/gd8cbs

Ngiam, W. X. Q., Foster, J. J., Adam, K. C. S., & Awh, E. (2022). Distinguishing guesses from fuzzy memories: Further evidence for item limits in visual working memory. Attention, Perception, & Psychophysics. https://doi.org/10.3758/s13414-022-02631-y

Oberauer, K. (2021). Measurement models for visual working memory—A factorial model comparison. Psychological Review. https://doi.org/10.1037/rev0000328

Oberauer, K., & Lewandowsky, S. (2019). Simple measurement models for complex working-memory tasks. Psychological Review, 126(6), 880–932. https://doi.org/10.1037/rev0000159

Oberauer, K., & Lin, H.-Y. (2017). An interference model of visual working memory. Psychological Review, 124(1), 21–59. https://doi.org/10.1037/rev0000044

Oberauer, K., & Lin, H.-Y. (2023). An interference model for visual and verbal working memory. PsyArXiv. https://doi.org/10.31234/osf.io/eyknx


Oberauer, K., Stoneking, C., Wabersich, D., & Lin, H.-Y. (2017). Hierarchical Bayesian measurement models for continuous reproduction of visual features from working memory. Journal of Vision, 17(5), 11. https://doi.org/10.1167/17.5.11

Peña, A. F. C. de la, & Vandekerckhove, J. (2024). An EZ Bayesian hierarchical drift diffusion model for response time and accuracy. PsyArXiv. https://doi.org/10.31234/osf.io/yg9b5

Popov, V., So, M., & Reder, L. M. (2021). Memory resources recover gradually over time: The effects of word frequency, presentation rate, and list composition on binding errors and mnemonic precision in source memory. Journal of Experimental Psychology: Learning, Memory, and Cognition. https://doi.org/10.1037/xlm0001072

Pratte, M. S. (2020). Set size effects on working memory precision are not due to an averaging of slots. Attention, Perception, & Psychophysics, 82(6), 2937–2949. https://doi.org/10/ghr327

Rouder, J. N., & Haaf, J. M. (2019). A psychometrics of individual differences in experimental tasks. Psychonomic Bulletin & Review, 26(2), 452–467. https://doi.org/10/gfxsct

Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2020). Psychophysical scaling reveals a unified theory of visual memory strength. Nature Human Behaviour, 4(11), Article 11. https://doi.org/10.1038/s41562-020-00938-0

Skrondal, A., & Laake, P. (2001). Regression among factor scores. Psychometrika, 66(4), 563–575. https://doi.org/10/fckhxw

Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78(7), 1839–1860. https://doi.org/10/f83nvs

Suchow, J. W., Brady, T. F., Fougnie, D., & Alvarez, G. A. (2013). Modeling visual working memory with the MemToolbox. Journal of Vision, 13(10), 9–9. https://doi.org/10.1167/13.10.9

van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working memory models. Psychological Review, 121(1), 124–149. https://doi.org/10.1037/a0035234

van den Berg, R., Shin, H., Chou, W.-C., George, R., & Ma, W. J. (2012). Variability in encoding precision accounts for visual short-term memory limitations. Proceedings of the National Academy of Sciences, 109(22), 8780–8785. https://doi.org/10/f3trn3

Vuorre, M. (2017, October 9). Bayesian estimation of signal detection models. https://mvuorre.github.io/posts/2017-10-09-bayesian-estimation-of-signal-detection-theory-models/#ref-estes_problem_1956

Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235. https://doi.org/10/c693sk
