Recovering Quantitative Models of Human Information Processing with Differentiable Architecture Search

Sebastian Musslick

Proceedings of the Annual Meeting of the Cognitive Science Society, 43(43), 2021. ISSN 1069-7977.
Permalink: https://escholarship.org/uc/item/9wd571ts
[Figure 3 and Figure 4 graphics: panel (D) of each figure shows a recovered computation graph over nodes such as x_2, x_3, x_4, y, and P(detected); see the captions below.]
Figure 3: Architecture search results for exponential learning. (A, B, C) The mean test loss as a function of the number of intermediate nodes (k) and penalty on model complexity (γ) for architectures obtained through (A) regular DARTS, (B) fair DARTS, and (C) random search. Vertical bars indicate the SEM across seeds. The star designates the test loss of the best-fitting architecture obtained through regular DARTS, shown in (D). (E) The learning curves generated by the original model and the recovered architecture in (D).

Figure 4: Architecture search results for LCA. (A, B, C) The mean test loss as a function of the number of intermediate nodes (k) and penalty on model complexity (γ) for architectures obtained through (A) regular DARTS, (B) fair DARTS, and (C) random search. Vertical bars indicate the SEM across seeds. The star designates the test loss of the best-fitting architecture for regular DARTS, depicted in (D). (E) Dynamics of each decision unit simulated with the original model and the best architecture shown in (D), using the same initial condition at t = 0.
The best-fitting architecture relies on a number of other transformations to compute P_n based on its independent variables, and fails to fully recover the learning curves of the original model (Figure 3E). In the General Discussion, we examine ways of mitigating this issue.
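For context, the ground truth in this test case follows the exponential law of practice (cf. Heathcote, Brown, & Mewhort, 2000). Below is a minimal sketch of one common parameterization; the exact parameterization and the parameter values used in the study may differ, so the values here are illustrative assumptions only.

```python
import numpy as np

def exponential_learning(n, p0=0.5, p_inf=1.0, eps=0.1):
    """Exponential law of practice: performance P_n after n practice trials.

    p0 is initial performance, p_inf is asymptotic performance, and eps is
    the learning rate. All parameter values are illustrative placeholders.
    """
    return p_inf - (p_inf - p0) * np.exp(-eps * n)

# Generate a synthetic learning curve over 100 trials.
trials = np.arange(100)
curve = exponential_learning(trials)
```

Note that the amplitude term (P_inf − P_0) multiplies the exponential decay; this multiplicative structure is precisely the kind of interaction that, as discussed in the General Discussion, is difficult to express in an additive DAG search space.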
Case 3: Leaky Competing Accumulator  The best-fitting architecture, here shown for one value of k (Figure 4D), bears remarkable resemblance to the original model (cf. Equation (12)):

dx_i = [ c_1 − c_2 · x_i − c_3 · ∑_{j≠i} ReLU(x_j) ] dt    (14)

where c_1, c_2, and c_3 denote the fitted coefficients. The recovered architecture resembles the original model in that it recovers the rectified linear activation function imposed on the two units competing with x_i, as well as the corresponding inhibitory weight c_3 ≈ β. Yet, the recovered model fails to apply this function to unit x_i itself. However, the latter is not surprising, given that the LCA has been reported to not be fully recoverable, partly because its parameters trade off against each other (Miletić, Turner, Forstmann, & van Maanen, 2017). The generated dynamics are nevertheless capable of approximating the behavior of the original model (Figure 4E).
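To make the structure of Equation (14) concrete, the following sketch simulates the recovered dynamics with forward-Euler integration. The coefficient values are hypothetical placeholders standing in for c_1, c_2, and c_3; they are not the fitted values from the study.

```python
import numpy as np

def recovered_lca_step(x, rho, dt=0.01, kappa=0.2, beta=0.2):
    """One forward-Euler step of the recovered dynamics in Equation (14).

    rho, kappa (leak), and beta (lateral inhibition) are hypothetical
    placeholders for the fitted coefficients c_1, c_2, and c_3.
    """
    relu = np.maximum(x, 0.0)        # rectified linear activation
    inhibition = relu.sum() - relu   # sum of ReLU(x_j) over all j != i
    # Note: the leak term uses x directly; as described above, the
    # recovered model does not apply the rectification to unit x_i itself.
    return x + (rho - kappa * x - beta * inhibition) * dt

# Simulate three competing decision units from a shared initial condition.
rho = np.array([0.3, 0.2, 0.1])      # hypothetical external inputs
x = np.zeros(3)
trajectory = [x.copy()]
for _ in range(2000):
    x = recovered_lca_step(x, rho)
    trajectory.append(x.copy())
```

In this configuration, the unit receiving the largest input settles at the highest activation while rectified inhibition suppresses its competitors, qualitatively mirroring the dynamics in Figure 4E.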
General Discussion and Conclusion

Empirical scientists are challenged with integrating an increasingly large number of experimental phenomena into quantitative models of cognitive function. In this article, we introduced and evaluated a method for recovering quantitative models of cognition using DARTS. The proposed method treats quantitative models as DAGs, and leverages continuous relaxation of the architectural search space to identify candidate models using gradient descent. We evaluated the performance of two variants of this method, regular DARTS (Liu et al., 2018) and fair DARTS (Chu et al., 2020), based on their ability to recover three different quantitative models of human cognition from synthetic data. Our results show that these implementations of DARTS have an advantage over random search, and are capable of recovering computational motifs from quantitative models of human information processing, such as the difference operation in Weber's law or the rectified linear activation function in the LCA. While the initial results reported here seem promising, there are a number of limitations worth addressing in future work.
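To make the method summary above concrete, the sketch below implements the continuous relaxation for a single edge of the computation graph as a softmax-weighted mixture of candidate operations, in the style of regular DARTS (Liu et al., 2018). The candidate operation set and the complexity penalty mentioned afterward are illustrative assumptions, not the exact search space used in this study.

```python
import torch
import torch.nn as nn

# Illustrative candidate operations for one edge of the DAG.
CANDIDATE_OPS = [
    lambda x: torch.zeros_like(x),  # 'none': the edge is effectively pruned
    lambda x: x,                    # identity
    lambda x: -x,                   # negation (enables difference operations)
    lambda x: torch.relu(x),        # rectified linear activation
]

class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate operations (regular DARTS).

    The architecture weights alpha are optimized by gradient descent
    alongside any model parameters; after training, the edge is
    discretized to the operation with the largest weight.
    """
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(len(CANDIDATE_OPS)))

    def forward(self, x):
        # Fair DARTS (Chu et al., 2020) would use torch.sigmoid(self.alpha)
        # instead, so operations compete independently rather than exclusively.
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))
```

A penalty on model complexity (the γ varied in Figures 3 and 4) can then be incorporated into training, e.g., by adding γ times a measure of the retained operations' complexity to the loss, trading fit against parsimony.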
All limitations of DARTS pertain to its assumptions, most of which limit the scope of discoverable models. First, not all quantitative models can be represented as a DAG, such as ones that require independent variables to be combined in a multiplicative fashion (see Test Case 2). Solving this problem may require expanding the search space to include different integration functions performed on every node.⁵ Symbolic regression algorithms provide another solution to this problem, by recursively identifying modularity of the underlying computation graph, such as multiplicative separability or simple symmetry (Udrescu et al., 2020). Second, some operations may have an unfair advantage over others when trained via gradient descent, e.g., if their gradients are larger. This problem can be circumvented with non-gradient-based architecture search algorithms, such as evolutionary algorithms or reinforcement learning. Finally, the performance of DARTS is contingent on a number of training and evaluation parameters, as is the case for other NAS algorithms. Future work is needed to evaluate DARTS for a larger space of parameters, in addition to the number of intermediate nodes and the penalty on model complexity as explored in this study. However, despite all these limitations, DARTS may provide a first step toward automating the construction of complex quantitative models based on interpretable linear and non-linear expressions, including connectionist models of cognition (McClelland & Rumelhart, 1986; Rogers & McClelland, 2004; Musslick, Saxe, Hoskin, Reichman, & Cohen, 2020).

In this study, we consider a small number of test cases to evaluate the performance of DARTS. While these test cases present useful proofs of concept, we encourage the rigorous evaluation of this method based on more complex quantitative models of cognitive function. To enable such explorations, we provide open access to a documented implementation of the evaluation pipeline described in this article (www.empiricalresearch.ai). This pipeline is part of a Python toolbox for autonomous empirical research, and allows for the user-friendly integration and evaluation of other search methods and test cases. As such, the repository includes additional test cases (e.g., models of controlled processing) that we could not include in this article due to space constraints. We invite interested researchers to evaluate DARTS based on other computational models, and to utilize this method for the automated discovery of quantitative models of human information processing.

⁵ Another solution would be to linearize the data or to operate in logarithmic space. However, the former might hamper interpretability for models relying on simple non-linear functions, and the latter may be inconvenient if the ground truth cannot be easily represented in logarithmic space.
References

Chu, X., Zhou, T., Zhang, B., & Li, J. (2020). Fair DARTS: Eliminating unfair advantages in differentiable architecture search. In ECCV (pp. 465–480).
Elsken, T., Metzen, J. H., Hutter, F., et al. (2019). Neural architecture search: A survey. JMLR, 20(55), 1–21.
Fechner, G. T. (1860). Elemente der Psychophysik (Vol. 2). Breitkopf u. Härtel.
He, X., Zhao, K., & Chu, X. (2021). AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622.
Heathcote, A., Brown, S., & Mewhort, D. J. (2000). The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin & Review, 7(2), 185–207.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, L., & Talwalkar, A. (2020). Random search and reproducibility for neural architecture search. In Uncertainty in Artificial Intelligence (pp. 367–377).
Lindauer, M., & Hutter, F. (2020). Best practices for scientific research on neural architecture search. JMLR, 21(243), 1–18.
Liu, H., Simonyan, K., & Yang, Y. (2018). DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055.
McClelland, J. L., & Rumelhart, D. E. (1986). Parallel distributed processing: Explorations in the microstructure of cognition, 2, 216–271.
Mendoza, H., Klein, A., Feurer, M., Springenberg, J. T., & Hutter, F. (2016). Towards automatically-tuned neural networks. In Workshop on AutoML (pp. 58–65).
Miletić, S., Turner, B. M., Forstmann, B. U., & van Maanen, L. (2017). Parameter recovery for the leaky competing accumulator model. Journal of Mathematical Psychology, 76, 25–50.
Musslick, S., Cherkaev, A., Draut, B., Butt, A., Srikumar, V., Flatt, M., & Cohen, J. D. (2020). SweetPea: A standard language for factorial experimental design. PsyArXiv, doi:10.31234/osf.io/mdwqh.
Musslick, S., Saxe, A., Hoskin, A. N., Reichman, D., & Cohen, J. D. (2020). On the rational boundedness of cognitive control: Shared versus separated representations. PsyArXiv, https://doi.org/10.31234/osf.io/jkhdf.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., … Lerer, A. (2017). Automatic differentiation in PyTorch. NIPS 2017 Autodiff Workshop.
Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. MIT Press.
Thurstone, L. L. (1919). The learning curve equation. Psychological Monographs, 26(3), i.
Udrescu, S.-M., Tan, A., Feng, J., Neto, O., Wu, T., & Tegmark, M. (2020). AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. arXiv preprint arXiv:2006.10782.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550.
Xie, S., Kirillov, A., Girshick, R., & He, K. (2019). Exploring randomly wired neural networks for image recognition. In Proceedings of the IEEE/CVF (pp. 1284–1293).