Psychometrics is not measurement
© American Psychological Association, 2020. This paper is not the copy of record and may not exactly
replicate the authoritative document published in the APA journal. Please do not copy or cite without
author's permission. The final article is available, upon publication, at: https://doi.org/10.1037/teo0000176
Target article
Psychometrics is not measurement:
Unraveling a fundamental misconception in quantitative psychology
and the complex network of its underlying fallacies
Jana Uher *
1 School of Human Sciences, University of Greenwich
2 London School of Economics
* Correspondence:
University of Greenwich, School of Human Sciences
Old Royal Naval College, Park Row, London SE10 9LS, United Kingdom
Telephone: +44(0)20-8331 9654; E-mail: [email protected]
This research was funded by the European Commission (EC Grant Agreement number 629430).
Abstract
Psychometrics has always been confronted with fundamental criticism, highlighting serious
insufficiencies and fallacies. Many fallacies persist, however, because each critic explores only
some fallacies while still building on others. This article scrutinizes the epistemological,
metatheoretical and methodological foundations of psychometrics, revealing a complex network
of numerous conceptual fallacies underlying its framework of theory and practice. At its core lies
a key challenge for psychology: the necessity to distinguish the phenomena under study from
the means used to explore them (e.g., concepts, methods, data). This distinction is intricate
because concepts constitute psychical phenomena in themselves and many psychical
phenomena are accessible only through language-based methods. The analyses show how
insufficient consideration of this important distinction and common misconceptions about
concepts and language (e.g., signifier-referent conflation; reification of constructs) led to
conflations of disparate notions of key terms in psychological measurement (e.g., ‘variables’,
‘attributes’, ‘causality’) and numerous interrelated fallacies (e.g., construct-referent conflation,
phenomenon-quality-quantity conflation, numeral-number conflation). These fallacies are
maintained and masked by repeated conceptual back-and-forth switching between two
incompatible epistemological frameworks, 1) an operationist framework of data modelling
implemented through methodical and statistical operations and 2) a realist framework of
measurement sporadically invoked in theoretical considerations but neither theoretically
elaborated nor empirically implemented. The analyses demonstrate that psychometrics
constitutes only data modelling but not data generation or even measurement as often assumed
and that analogies to (indirect or fundamental) physical measurement are mistaken. They
provide theoretical support for the increasing criticism of psychometrics and its use in research
and applied contexts.
Keywords
Psychometrics; Replicability; Latent variable; Psychological measurement; Quantitative method
Uher, J. (2021a). Psychometrics is not measurement: Unraveling a fundamental misconception in quantitative psychology and
the complex network of its underlying fallacies [Target article]. Journal of Theoretical and Philosophical Psychology, 41, 58–84.
https://doi.org/10.1037/teo0000176
Introduction
Developing methods to quantify properties of study phenomena is a central task in many
sciences. Psychometrics, the field concerned with ‘measuring the mind’ (Borsboom, 2005), is
aimed at making this possible for psychical phenomena despite the obvious differences from
physical phenomena and the inapplicability of physical measuring instruments. Over the last
century, psychometrics has become a flourishing field with substantial commercial impact. But
from the start, numerous lines of critique were voiced, highlighting serious insufficiencies and
problems, such as the implicit but untested assumption of quantitative properties in psychical
phenomena (Michell, 2008), the focus on correlational rather than causal relations for validation
(Borsboom, 2005), the lack of representation theorems (Kyngdon, 2008a) and insufficient
conceptual understanding of concepts and language (Maraun & Gabriel, 2013). Various fallacies
were highlighted, such as erroneous equations of constructs with their referents (Slaney &
Garcia, 2015) and of latent variables with constructs (Maraun & Halpin, 2008) as well as
erroneous analogies with physical measurement (Trendler, 2019; Uher, 2020a).
Although clearly recognized, these fallacies still persist in pertinent publications—even in
the writings of scholars who critically discuss their detrimental impact on psychological
research. This is because all these fallacies are tightly interrelated and build upon each other,
forming a complex network that underlies the psychometric framework of conceptual thinking
and empirical practice. But each single critic typically focusses on just a subset of the fallacies
already known, while implicitly still building on other fallacies in their thinking. This fragmentation
entails that the entirety of fallacies and their complex interplay are not yet fully understood and
that their effects on the theories and practices established in psychometrics have not yet been
fully analyzed.
This article aims to put together the puzzle of the different lines of criticism voiced against
psychometrics as measurement, to complement them with further fallacies still not well
considered, and to highlight their interdependencies. This network of fallacies is central to the
framework of psychometric theory and practice—indeed, it is what establishes this framework's functionality in the first place.
Understanding its complexity and functioning is crucial for developing alternative approaches
and methods of measurement in psychology that help avoid these fallacies in the future.
metrology need not matter at all for psychology. The point here however is that, when numerical
data are generated and mathematically and statistically analyzed in order to describe and
explore real-world phenomena, the generation of these data must be based on some
transparent principles that are applied consistently across sciences and that ensure that the
mathematical and statistical results obtained allow justified inferences to be made about the study
phenomena. This is essential for developing knowledge about these phenomena that can be set
in relation to findings from other investigations about the same or other real-world phenomena—
no matter in which science they may have been produced—thus, for establishing a sound
knowledge base. It is also a matter of scientificity; a process as foundational to science as
measurement cannot have entirely different meanings in different fields. Transdisciplinary
analyses are therefore helpful to explore differences and commonalities in the measurement
practices of different sciences and to identify basic principles that are applicable to all.
The present analyses rely on the Transdisciplinary Philosophy-of-Science Paradigm for
Research on Individuals (TPS-Paradigm; Uher, 2015c, 2018c) in which established concepts
from various disciplines, complemented by novel ones, have been integrated into philosophical,
metatheoretical and methodological frameworks that coherently build upon each other. These
frameworks highlight connections, differences and commonalities across sciences, and thus
starting points for cross-scientific collaboration (Uher, 2020b). Together with its focus on
research on individuals, including critical considerations of scientists’ own role in research
processes, the TPS-Paradigm therefore provides useful conceptual foundations for the present
analyses. Some relevant concepts will be introduced below where needed; more information and
references are provided in the footnotes, and extensive elaborations elsewhere1.
Necessarily, such transdisciplinary and philosophy-of-science frameworks require a
terminology that is more abstract than that used for the specific theories, concepts, methods and
approaches in the given disciplines and that inevitably diverges from any mono-disciplinary
standard. This may feel unfamiliar to some readers. But for the present analyses it is
essential because key terms of psychological measurement codify conceptual fallacies and thus
contribute to these fallacies’ persistence in the field (Uher, 2021).
Critical realism: A philosophy for exploring the natural, social and experiential world
Many psychologists argue that, because measurement is about gaining knowledge about
the natural world, psychological measurement must be grounded in scientific realism
(Borsboom, 2005; Maul, 2013; Michell, 1999). The core idea of this epistemology is that both
observable and unobservable phenomena can be explored and that true knowledge can be
generated about them. This involves, metaphysically, the belief in the spatio-temporal existence
of the world independent of its perception and conception by some conscious beings (mind-
independent reality); semantically, the literal interpretation of scientific claims as describing this
mind-independent reality; and epistemologically, the idea that thus-interpreted scientific claims
yield true knowledge about that real world (Chakravartty, 2017).
1 The TPS-Paradigm has already been applied 1) to integrate and expand on previous concepts of individuals’
psyche, behavior, language and contexts (Uher, 2013, 2015a, 2015c, 2016b, 2016a); 2) to refine and newly develop
concepts and methodologies for taxonomising and comparing individual differences in various kinds of phenomena
and populations (Uher, 2015b, 2015d, 2015e, 2018b, 2018c), and 3) to critically analyze concepts, theories and
practices of data generation and measurement across the sciences (Uher, 2019, 2020a) and in quantitative
psychology (Uher, 2018a, 2021). Applications are demonstrated in multi-method studies with humans and other
species (e.g., Uher, 2015b, 2018a; Uher, Addessi, et al., 2013; Uher & Visalberghi, 2016; Uher, Werner, & Gosselt,
2013). http://researchonindividuals.org.
In psychological measurement, however, scientific realism often takes the form of naïve
realism, involving assumptions that invariant quantities would exist in the world
independently of the methods used—ideas that even metrologists reject (Mari et al., 2017). More
critical stances are needed, in particular, given that psychologists aim to explore individuals and
the specific reality of their minds. How can this reality be mind-independent (Stent, 1969)? The
point however is not, as some proponents of scientific realism seem to believe, to deny the
existence of a reality, which includes the minds of the beings that form part of it; instead, it is
about the ways in which we can explore that reality. The philosophical framework of the TPS-
Paradigm comprises the presumption that all science is done by humans and that we can gain
access to this reality only through our human perceptual and conceptual abilities (interpretations;
Peirce, 1958, CP 2.308; Wundt, 1907), which inevitably limits our possibilities to explore and
understand this reality. This should not be mistaken for ideas of radical constructivism (von
Glasersfeld, 1991), which involve the assumption that knowledge could be developed without
reference to a mind-independent ontological reality to which humans, as a species, have
adapted over millions of years (Uher, 2015a). Instead, the presupposition made in the TPS-
Paradigm highlights the fact that psychologists are always individuals themselves and thus,
cannot be independent of their objects of research, unlike most natural scientists. Indeed,
psychologists aim to explore minds—being equipped with nothing but a mind (Stent, 1969). This
entails that psychologists’ presuppositions about their study phenomena are (inevitably)
influenced by the (explicit and implicit) beliefs that they have developed about these phenomena
from their own, inherently anthropo-centric, ethno-centric and ego-centric experiences
(Fahrenberg, 2013; Uher, 2015c, 2020b).
This epistemological stance comes close to that of critical realism, which emphasizes the
reality of the objects of research and their knowability but also that our knowledge about this
reality is created on the basis of our practical engagement with and collective appraisal of that
reality (Bhaskar & Danermark, 2006). Hence, assumptions about the existence of reality must be
distinguished from claims of the existence of truth because, without sentences, there is no truth.
Sentences are elements of human languages, and human languages are human creations.
“Truth cannot be out there—cannot exist independently of the human mind—because sentences
cannot so exist or be out there. The world is out there, but descriptions of the world are not. Only
descriptions of the world can be true or false. The world on its own—unaided by the describing
activities of human beings—cannot” (Rorty, 1995, p. 5). What scientific communities establish as
truth is thus a result of their consensus, shaped through language games (Wittgenstein, 2009).
That is, knowledge generation is not an increasing understanding of reality ‘as it really is’.
Instead, knowledge is increasingly useful to meet socio-practical demands—and therefore
always theory-laden, socially embedded and historically contingent (Maraun, 1998; Maraun et
al., 2009; Pinheiro, 2020).
conceptual and terminological fallacies are used to scrutinize the concepts and practices of
‘measurement’ established in psychometrics. For this purpose, and to highlight commonalities
and differences with metrology, the article introduces basic methodological principles from the
TPS-Paradigm that were shown to underlie metrologists’ concepts of measurement and
measuring instruments and to be applicable also to psychological study phenomena. These
principles are applied to demonstrate the ways in which the dense network of fallacies
highlighted in this article underlies the pathways of reasoning in psychometrics that led to
erroneous analogies with physical measurement and to the widespread but erroneous belief that
psychometric modelling approaches could constitute measurement. The article closes by
highlighting some far-reaching consequences and general directions for future developments.
2 From Greek -λογία, -logia for body of knowledge (Lewin, 1936; Uher, 2016a). Analogously, we may get viral (but not
virological) infections and we do virological research.
3 The psyche is defined in the TPS-Paradigm as the “entirety of the phenomena of the immediate experiential reality
both conscious and non-conscious of living organisms” (Uher, 2015c, p. 431), with immediacy indicating absence of
phenomena mediating their perception (Wundt, 1896).
4 In the TPS-Paradigm, behaviors are metatheoretically defined as the “external changes or activities of living
organisms that are functionally mediated by other external phenomena in the present moment” (Uher, 2016b, p. 490).
This definition highlights essential differences in the accessibility of behaviors and psychical phenomena.
their occurrences over time, leading to concepts, beliefs and knowledge about them, which are
psychical phenomena in themselves as well but different from and necessarily more stable than
those they are about5 (Uher, 2016a; Whitehead, 1929).
This explains why abstractions and complex ideas that are theoretically constructed by
humans, called constructs, are among psychology’s most frequent study phenomena (Maraun et
al., 2009; Slaney, 2017). Their abstract theoretical nature entails that these conceptual entities
often have several construct referents. Referents can be considered on different levels of
abstraction; they may involve various concrete phenomena that are perceivable at a given
moment (e.g., behaviors, emotions) but also conceptual entities (e.g., sub-constructs) that are
each linked with their own set of more concrete referents. That is, referents can have nested
conceptual structures in which meanings and referents can be ‘inherited’ from other concepts
(Uher, 2021). Abstraction entails that the construing persons emphasize some aspects of the
referents they consider, while deemphasizing others. Therefore, any given construct cannot
reflect its referents in the same ways as these can be perceived at any moment (Vygotsky,
1962; Whitehead, 1929). Differences in the particular referents, aspects and levels of abstraction
that persons (implicitly) consider enable unparalleled proliferation and complexity—and thus
changeability and diversity in the constructs created.
This highlights another key challenge in psychology that is still not well considered. The
scientific concepts used to explore psychical phenomena constitute psychical phenomena in
themselves—and thus do not exist outside of the empirical systems under study (Uher, 2020b).
This substantially complicates the important distinction between the study phenomena and the
means used for their exploration. Further complications arise from human language, which is
indispensable for doing science and through which alone many psychological study phenomena
are accessible.
5 Transient psychical events (e.g., thoughts, emotions), called experiencings (Erleben) in the TPS-Paradigm, can be
distinguished from temporally more persistent phenomena (e.g., beliefs, mental abilities), called memorized psychical
resultants or experiences (Erfahrung; with memorization referring to any retention process). But these latter can be
accessed only in individuals’ experiencings (Uher, 2016a), and must be reconstructed in each moment anew within
the given context, whereby they are adapted and changed before becoming memorized again (Schacter & Addis,
2007).
6 The term data is used inconsistently in psychology; see the section Disparate notions of ‘variables’… below.
7 These signifiers mean owl in German, French, Italian and Russian.
8 This statement implies that digital and paper versions of the same scale, given their different physical properties,
could also differ in their measurement properties—a surprising statement for the digital age.
Figure 1. Sign systems, their three components, and common fallacies about them. (a) The three
components of sign systems; (b) signifier–sign equation; (c) signifier–meaning conflation; and (d)
signifier–referent conflation. This includes conceptual phenomena (e.g., other constructs) as well.
Signifier–sign equation also leads people to erroneously assume that a sign’s
meaning is contained in the signifier itself. This signifier–meaning conflation (Figure 1c) is
reflected, for example, in the widespread assumption that standardizing signifiers (e.g., writing
down item wordings) could also standardize their meanings across persons, times and
contexts. But this ignores pronounced individual variation in item interpretations—thus,
subjectivity—that occurs even if all psychometric criteria are met (Lundmann & Villadsen, 2016;
Rosenbaum & Valsiner, 2011; Uher, 2018a; Uher & Visalberghi, 2016).
Signifier–sign equation furthermore entails that the signifier is mistaken for its referent,
thus signifier–referent conflation (Figure 1d). But signifiers do not carry their meanings in
themselves; signifiers are largely arbitrary and can therefore denote different referents (e.g., M,
D, C, L can signify letters or numerals). The triadic concept of signs (Figure 1a) thus highlights
that the meaning of particular data is not given by their signifiers. Instead, signifiers are
interpreted as representing particular phenomena and properties (referents) only given particular
theories and expectations (meaning9). With different theories, data may be interpreted differently
(Van Fraassen, 2012); this explains why data are always theory-laden (Boon, 2015; Kuhn,
1962).
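To make the triadic structure explicit, the following minimal sketch (in Python, with illustrative examples that are not taken from the original text) models a sign as a signifier–meaning–referent triple and shows that one and the same signifier can stand for different referents under different interpretations:

```python
from dataclasses import dataclass

# Minimal sketch (illustrative names only): a sign as a triad of signifier,
# meaning and referent. The signifier carries no meaning in itself; it is
# interpreted as standing for a referent only under a particular meaning
# (theory, expectation).

@dataclass
class Sign:
    signifier: str   # the perceivable mark, e.g. a numeral or word
    meaning: str     # the interpretation linking signifier and referent
    referent: str    # the real-world or conceptual entity denoted

roman_numeral = Sign(signifier="M", meaning="Roman numeral for one thousand",
                     referent="the number 1000")
latin_letter = Sign(signifier="M", meaning="letter of the Latin alphabet",
                    referent="the grapheme 'm'")
# Identical signifiers, different meanings and referents: collapsing the three
# components produces the signifier-meaning and signifier-referent conflations.
```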
These conflations of the three distinct components of sign systems are widespread in
everyday life. They led to many conceptual fallacies in psychometrics as elaborated now.
9 To highlight the meaning component’s essential role for establishing the triadic interrelations, sign systems are
called semiotic representations in the TPS-Paradigm (Uher, 2015c, 2015a).
Construct–referent conflation
A further fallacy, promoted by the signifier–sign equation and the signifier–meaning and
signifier–referent conflations of words in general, and therefore widespread in research using
language-based methods such as psychometrics, is to conflate constructs as theoretical-
logical-linguistic thinking tools with their construct referents, thus with the real-world entities that
they are meant to denote (Danziger, 1997). Here again, the study phenomena are not
distinguished from the concepts and methods used for their exploration. This construct–referent
conflation11 (Figure 2; Slaney & Garcia, 2015) occurs, for example, when scientists interpret
constructs as reflecting ‘attributes’ or qualities that individuals ‘possess’ (e.g., in Cronbach &
Meehl, 1955), thus ascribing them an ontological status, as widely done with ‘trait’ constructs
(Uher, 2013, 2015e). The construal of constructs allowed scientists to turn abstract ideas into
entities, thereby making them conceptually accessible to empirical study. But this entification
misguided psychologists to overlook their constructed nature (Slaney & Garcia, 2015) and to
focus primarily on methodological and methodical approaches, ignoring the necessity to develop
their epistemological foundations as well (Hanfstingl, 2019).
Figure 2. Conceptual fallacies derived from misconceptions of language and concepts. The
term variable here means semiotic encodings.
10 Ideally, the term ‘variable’ should be replaced by two others to clearly distinguish its two distinct notions.
11 Slaney and Garcia termed this construct-entity conflation; but an entity can also be a construct (conceptual entity) in
itself and construct referents can be other constructs as well.
the term ‘variable’. In conjunction with variable–referent conflation, this leads to erroneous
conflations of (data) variables denoting constructs (called ‘collective variables’; Thelen & Smith,
1994) with these constructs in themselves, thus to latent variable–construct conflation (Figure 2;
Maraun & Halpin, 2008). A (data) variable labelled ‘extraversion’ is then conflated with the
abstract concept of ‘extraversion’ that it encodes and this, in turn, is conflated with its various
referents in participants’ experience and behavior (construct–referent conflation; Figure 2).
assumed to be quantitatively structured (Michell, 1999) and to causally produce variations in the
results obtained from measurement procedures (Borsboom, 2005). It is stated that there can be
within-person and between-person variation in ‘attributes’ and that individuals can have a
position on an ‘attribute’ (Borsboom & Mellenbergh, 2007). The term ‘attributes’ is also used to
denote properties of physical objects (e.g., temperature, velocity; Maul, 2013). These are just
some examples of the disparate notions ascribed to the term ‘attribute’, denoting either 1)
psychical phenomena in themselves; 2) constructs and terms about them; 3) properties that may
occur in study phenomena and objects, and that may be quantitative and may interact with
measuring instruments; as well as 4) (data) variables that semiotically encode 1) to 3).
The conflation of these disparate notions is yet another example of the failed distinction
of the phenomena and properties under study from the means used for their exploration. This
example also illustrates the codification of this failed distinction, and thus its maintenance, in a
key term of the field.
Phenomenon–quality–quantity conflation
The disparate notions of key terms are often masked by the common nominalization of
‘variables’, ‘attributes’ and constructs (e.g., ‘traits’; Slaney & Garcia, 2015), and the entification
that this entails. Entification also masks another key fallacy in psychological measurement—the
frequent failure to specify the phenomena, qualities and quantities studied. Psychologists often
ignore that phenomena (or objects) in themselves cannot be measured; only properties can be.
Any phenomenon (or object) typically features various properties of different qualities.
Behaviors, for example, have temporal and spatial properties; they may also be ascribed
constructed qualities of social desirability, amongst others. Bricks have the properties of length,
weight, density, hardness, color, and temperature, amongst others. Therefore, scientists must
specify in their study phenomena (objects) the particular quality of interest; one cannot just
measure a behavior or a brick.
This specification is also a precondition for measurement because quantities are always
of something—a quality. Qualities (from Latin qualis for "of what sort, of such a kind") are
properties differing in kind, whereas quantities (from Latin quantus for “how much, how many”)
are divisible properties of entities of the same kind—the same quality (Hartmann, 1964).
Quantities are qualitatively homogeneous; adding or dividing their magnitude does not change
their meaning; for example, adding entities of length leaves their quality of being length
unaltered, whereas this is not possible for perceived color. Divisible properties of the same
quality differ only quantitatively, never qualitatively (Michell, 2012).
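As a minimal illustration of this distinction (a sketch with arbitrary values, not a claim about any particular measurement), magnitudes of the same quality can be meaningfully added, whereas qualitative categories cannot:

```python
# Minimal sketch (arbitrary values): adding magnitudes of the same quality
# preserves that quality, whereas qualitative categories differ in kind and
# have no such operation.

length_a_m = 0.30   # lengths in metres: divisible properties of one quality
length_b_m = 0.45
total_length_m = length_a_m + length_b_m  # still a length: 0.75 m

colours = {"red", "green"}  # perceived colours: differ in kind, not in amount
# colours_sum = "red" + "green"   # no meaningful addition or concatenation exists
```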
Entities of the same quality can be compared in their divisible properties (quantities)
regarding that quality. In behaviors, divisible properties can be identified in their temporal and
spatial qualities (Uher, 2015b, 2018a; for empirical examples, Uher, Addessi, et al., 2013). But
in the manifold qualities of experiencing, what properties could be divisible? What divisible
properties could there be in the abstractions needed to study these transient processual
phenomena? And how could divisible properties be identified in constructs, thus in abstract
conceptual entities that have heterogeneous real-world referents, each featuring different and
also differently emphasized qualitative properties?
research or in the biological classification of species, as Michell illustrates. But contrasting this
taxonomic ‘order’ with homogeneous ‘order’ in terms of positional information about a series of
magnitudes of the same quality (Michell, 2012) implies that these disparate notions of ‘order’
would be somehow comparable. In terms of taxonomic order, the same person can be said to be
a human, primate, mammal, vertebrate and animal; all this concerns only more abstract levels of
consideration of the same entity. Ordinality, by contrast, refers to relations among different
entities, such as different persons’ body length, which can be ordered according to their
magnitude. The undifferentiated use of the term ‘order’ for both taxonomic classification and
ordinality obscures essential conceptual differences, thereby contributing to the common belief
that constructs could be measurable and thus thwarting Michell’s long-standing efforts to clarify
misconceptions about the measurability of psychical phenomena.
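The difference can be made concrete in a small sketch (with hypothetical persons and values): taxonomic ‘order’ concerns nested class membership of one and the same entity, whereas ordinality concerns magnitude relations among different entities:

```python
# Minimal sketch (hypothetical persons and values): taxonomic 'order' versus
# ordinality in the sense of positional information about magnitudes.

taxonomic_levels = ["human", "primate", "mammal", "vertebrate", "animal"]
# one and the same person belongs to all of these increasingly abstract classes

body_lengths_cm = {"Ann": 172, "Ben": 181, "Cem": 165}
ordinal_ranking = sorted(body_lengths_cm, key=body_lengths_cm.get)
print(ordinal_ranking)  # ['Cem', 'Ann', 'Ben']
# relations among different persons' magnitudes of one and the same quality
```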
Still, although clearly insufficient as a measurement theory, RTM stipulates the most
basic principles underlying any kind of data generation in that information about study
phenomena and their (qualitative and quantitative) properties (e.g., located in individuals) is
encoded in signs (e.g., data variables on computer). This highlights that representation theorems
also underlie the data generation methods used in psychometrics. Psychological assessment
methods, especially rating scales, are however fraught with methodological, conceptual and
terminological problems as explored now.
Disparate notions of ‘behaviors’ and ‘responses’ in psychological assessment
In psychology, given the challenges in distinguishing the study phenomena from the
means used for exploring them, misconceptions frequently emerge about what actually
constitutes the empirical relational system in a study. These misconceptions are promoted by
the widespread labelling of both psychical and behavioral phenomena as ‘responses’ or (‘overt’
and ‘covert’) behaviors, likely misguided by the immaterial, transient and processual nature of
both (Uher, 2016b). This undifferentiated terminology misleads researchers to ignore
fundamental differences in these phenomena’s accessibility that profoundly impact research
methodologies (Uher, 2019). It also entails another fallacy in psychological assessment.
Specifically, when psychologists label both individuals’ finger movements for pressing buttons or
ticking scales (Baumeister et al., 2007) and the mental processes involved in a given task as
‘behaviors’ or ‘responses’, this blurs the distinction of what actually constitutes the empirical
relational system and of which part of the data generation process requires a representation
theorem, as now demonstrated using the example of rating scales.
Failed implementation of representation theorems in rating scales
In rating methods, neither raters’ finger movements in themselves nor their ticks on the
scales (as these behaviors’ residuals) nor raters’ judgement processes constitute the empirical
relational system. These are just operational procedures involved in the mapping process in
which raters match the outcomes of their judgements about the actual empirical system of
interest (e.g., individuals’ behaviors, feelings or beliefs) to the symbolic relational system
provided (e.g., rating scales on sheet). That is, the complex task of executing a data generation
process that is intended to meet measurement criteria is delegated to raters, thus to lay people
commonly unfamiliar with the measurement theories needed for this task. Raters are provided with
neither information about these theoretical foundations nor instructions on how these could be
applied to psychical and behavioral phenomena. By contrast, in physical and behavioral
(ethological) measurement, measurement-executing persons are instructed and trained about
how to use the given operative procedures for generating results.
Ratings do not even constitute a method that—at least theoretically and with the
necessary knowledge and instruction—could enable measurement at all. Indeed, rating methods
fail to implement even the most basic representation theorem because rating items and scale
categories serve both as descriptions of the empirical relational system (e.g., behavior and
experience in individuals) and as symbolic relational system (e.g., item variables on sheet),
leaving the two relational systems and the mapping relations between them unspecified (Uher,
2018a). These specifications are left to implicit decisions made by respondents, which remain
largely unexplored. Psychometricians commonly make only general assumptions about raters’
inattention, possible faking and response bias—unrelated to any specific referents to be judged,
thus unrelated to the actual phenomena under study.
Instead, psychometricians focus on the permissible transformations of rating data as
specified in Stevens’ (1946) scale types (uniqueness theorems), likely because these stipulate at
least some concrete quantitative concepts (later refined through latent variable theory, see
below). Given Stevens’ (1946) simplified definition of measurement as the “assignment of
numerals to objects and events according to rules” (p. 677), researchers recode raters’ choices
of (lexically labelled) answer categories into numerals always in the same way for all items,
regardless of the phenomena and properties to which they refer. The uniqueness theorems, as
specified through Stevens’ scale types, are then regarded as the theoretical justification for
interpreting these numerals as numbers.
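The following minimal sketch (with hypothetical item wordings and answer labels, not taken from any published scale) illustrates this uniform recoding step: lexically labelled answer categories are mapped onto the same numerals for every item, irrespective of the phenomena and properties to which the items refer:

```python
# Minimal sketch (hypothetical items and answer labels): one recoding rule is
# applied identically to every item, regardless of what the item refers to.

RECODE = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}

answers = {
    "I talk to many different people at parties": "often",    # a behaviour
    "I feel anxious in unfamiliar situations": "sometimes",    # an experiencing
}

scores = {item: RECODE[label] for item, label in answers.items()}
print(scores)
# The resulting integers are numerals (signifiers); treating them as numbers
# presupposes a quantitative meaning that the recoding itself cannot establish.
```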
Numerals are signifiers (“a black mark on a piece of paper or certain sounds which I
utter”, Campbell, 1920, p. 267) and thus arbitrary (e.g., 1, 5; I, V). Numbers, by contrast, are
mathematical objects arising from ontological interrelations among real phenomena (Hartmann,
1964). Numerals are often used to represent numbers but also just order (e.g., 1st, 5th) or only
categorical—i.e., qualitatively different—properties that have no quantitative meaning at all (e.g.,
room ‘numbers’; Campbell, 1919/2020). Recoding rating scale categories in order to create
‘quantitative’ data thus involves numeral–number equation, another instance of signifier–
meaning conflation. In conjunction with the conflation of the different notions of ‘response’ and
‘behavior’, this leads psychologists to overlook that it is the raters who actually generate the
data, whereas researchers’ recoding of scale categories is just the transposition of one symbolic
system into another (Kyngdon, 2008a)—thus, only a step of data processing but not one of data
generation (Uher, 2018a, 2019, 2020a).
Correct responses, reaction times and their relations to psychical phenomena
Unlike ratings, many psychodiagnostic methods record responses that have a fixed
agreed meaning, such as correctness (e.g., in intelligence, educational and achievement tests,
problem-solving tasks, attention and concentration tests). Psychologists also often record
reaction times, thus physical responses (e.g., in implicit association tests, flanker tasks, go/no go
tests, lexical decision tasks). These methods simplify the assignment task that respondents must
accomplish. They allow scientists to record (i.e., semiotically encode) the occurrence of correct
responses and response times, which however are only outcomes of the psychical phenomena
involved in their emergence. Hence, in these methods, it is these responses that constitute the
empirical relational system to which the symbolic relational system is mapped, but not the
psychical phenomena in themselves, thus not the actual phenomena under study. These latter
can only be inferred from the responses recorded. But externally similar responses may emerge
from different internal phenomena and different processes among them (i.e., they are
polygenetic). Expected responses are therefore no guarantee that specific psychical processes
have taken place. Moreover, the whole psychical system cannot exist without various
subprocesses that must be present for a given process to emerge at all (e.g., perception, long-
term memory). That is, the existence of an internal phenomenon is no guarantee that it will also
become manifest in observable outcomes and in expected ways. As a consequence, one-to-one
inferences from recorded outcomes to the actual phenomena of interest cannot be made (for
details, Toomela, 2008).
These profound challenges must be considered when establishing measurement
processes targeted at psychical phenomena. Some psychologists have suggested that psychometric
approaches are analogous to those that metrologists have developed for measuring
physical properties that are not directly accessible. The remainder of this section introduces
basic methodological principles underlying metrologists’ concepts of measurement and
measuring instruments that are needed to scrutinize this assumption in the next section.
(Uher, 2020a) showed that, on a methodological level, this metrological framework builds on two
basic principles—data generation traceability and numerical traceability.
Data generation traceability: Object-dependent measurement processes
The first methodological principle requires that the ways in which results are assigned to
the quantity to be measured (measurand) in the study phenomena (objects) must be made fully
transparent and thus traceable. To justify that the generated results are attributable to the
measurands, measurement processes must be designed from knowledge about the study
objects and their properties (called object-dependence or object-relatedness in metrology; Mari
et al., 2017). This involves explanations of how the specific operative structures allow numerical
assignments to be made such that they reveal reliable and valid information about these
measurands, and only about them and not also about other influence properties (Mari et al., 2015).
This knowledge must be implemented in unbroken documented chains of comparisons that
connect the measurand with the result. At each step, the entities of the connected properties can
be compared with one another regarding their quantities so that quantitative information from
one property can be converted into quantitative information in another property, thus establishing
proportional relations between the quantities of the different properties involved (see
thermometer example below).
Numerical traceability: Subject-independent results linked to known standards
The second methodological principle of measurement underlying metrological
frameworks requires that the numerical value assigned to the measurand is also linked to known
standards, in likewise documented and transparent ways. The process design must ensure that
results are invariant with respect to the persons (subjects; e.g., operators, users) involved (called
subject-independence or inter-subjectivity in metrology). This means that results must be reliably
interpretable and always represent the same information about the measurands across time and
contexts (Mari et al., 2017). To ensure that the results have the same meaning everywhere (e.g.,
specific length of 1 meter), metrologists establish unbroken documented conversion (calibration)
chains from primary references (e.g., international prototype meter) to all working references
(e.g., meter rules) used for measurement in non-metrological research and everyday life
(JCGM200:2012, 2012). Psychologists must develop (and in part have already done so)
analogous ways to establish an intersubjective meaning for their numerical results (e.g., time-
based measurements of behavior; answer categories with universally agreed meanings of
correctness in educational and achievement tests).
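On an abstract level, this principle can be sketched as follows (in Python, with entirely hypothetical reference names and correction values): a working reference is linked back to a primary reference through a documented chain of calibrations, so that a reported value carries the same meaning wherever it is used:

```python
# Minimal sketch (hypothetical references and corrections): numerical
# traceability as an unbroken, documented chain of calibrations linking a
# working reference back to a primary reference.

calibration_chain = [
    {"reference": "primary standard (definition of the metre)", "correction_m": 0.0},
    {"reference": "national secondary standard",                "correction_m": +0.000002},
    {"reference": "laboratory working rule",                    "correction_m": -0.000010},
]

def traceable_length(reading_m: float) -> float:
    """Apply the documented corrections linking the working reference to the primary one."""
    for step in calibration_chain:
        reading_m += step["correction_m"]
    return reading_m

print(traceable_length(1.000000))
# The result can be interpreted against the primary standard everywhere,
# because every conversion step in the chain is documented.
```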
These two methodological principles are fundamental for measurement. But explicit
analogous concepts have so far been lacking in psychology, although both principles are—on their
abstract methodological level—meaningfully applicable also in psychology (see below; Uher,
2020a). Moreover, both principles are also essential for instrument development.
mercury’ through physical laws13. The latter is connected with ‘length of extension over scale’
through visual comparison, and this, in turn, is connected with (data) variables and values
through semiotic encoding14 (Uher, 2020a).
What is the measuring instrument in psychological assessments?
Metrologists conceive of psychologists’ generation of quantitative data directly by
persons (e.g., raters, observers) as ‘human-based measurement’, ‘humans as measurement
instrument’ (Pendrill, 2014), and ‘persons as data generation systems’ (Berglund et al., 2012).
Thus, for metrologists, the person is the measuring instrument. Psychologists, by contrast,
regard the items (statements, tasks) and answer categories as their ‘measuring’ instruments.
Given this, they direct all efforts for instrument development and improvement at their tests and
rating scales (e.g., psychometric properties) but not at the respondents’ abilities and knowledge
for executing the data generation process as metrologists would do given their understanding of
the persons as constituting the instruments (Uher, 2019).
This conceptual difference illuminates further fallacies in psychometrics. Specifically,
many psychologists assume that invariant quantities exist in their study phenomena (Michell,
2008) and independently of the methods used (Borsboom & Mellenbergh, 2004). These naïve
realist views entail the belief that ideal methods (e.g., rating scales) could make it possible to
empirically implement an identity function that turns pre-existing ‘real’ quantities into estimated (manifest)
scores (with definable errors or probabilities). But interactions between study property and
method always influence the results obtained (Bohr, 1937; Heisenberg, 1927; Mari et al., 2017).
In psychological investigations, these interactions are intricate because they are mediated by the
data-generating persons who interact with (e.g., perceive, interpret—the meaning) both the
study phenomena and properties (e.g., behaviors, experience, frequencies—the referents) and
the methods used (e.g., rating scales—semiotic encodings, here their signifiers). Both
metrologists' and psychologists’ notions of instruments fail to conceptualize these complex
interactions, which derive from and thus reflect the triadic interrelations among signifier
(symbolic system), referent (empirical system) and the meanings that both have for particular
persons in particular contexts and which are essential for establishing these interrelations (and
thus assignment relations between both systems). That is, establishing data generation and
numerical traceability requires knowledge of raters’ understanding and use of item scales,
which vary substantially (Uher, 2018a), as well as careful consideration of the intricate
challenges that language-based methods entail for the important distinction of the study
phenomena from the means used for exploring them.
13 Hence, instrument design requires knowledge of systematic (lawful) connections among properties, identifiable
experimentally (Mari et al., 2017). But often, this knowledge is developed only during instrument development; thus,
both processes are iterative and inform each other (Boon, 2015).
14 These two latter steps are also often automatized to further reduce the involvement of human perceptual and
phenomena each featuring different qualities but also to other conceptual phenomena. Such
heterogeneous conceptual entities preclude the possibility of establishing unbroken structural
connections (causal links) to possible quantitative properties in the phenomena serving as their
referents (construct indicators). ‘Nice weather’ cannot be causally linked, for example, to a
temperature of 23 °C, an air pressure of 1023 hPa, a relative humidity of 30% and a wind
strength of 15 km/h. One may specify algorithms that define constellations of particular
quantitative ranges for each of these qualitatively different weather indicators. But one cannot
derive from them overall results that quantify ‘weather’, as is nonetheless widely done for
psychological constructs like ‘extraversion’. This highlights that data generation traceability
cannot be established from constructs as the actual objects of research in themselves (e.g.,
‘intelligence’; ‘extraversion’, ‘weather’). This could be possible only for concrete indicators
(referents) that are used as operationalizations (e.g., reaction times, task performances; physical
properties used as indicators of ‘weather’)—an important point to consider when interpreting
findings in construct research (for details, Uher, 2020a).
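A minimal sketch (with arbitrary thresholds, for illustration only) shows what such an algorithm can and cannot deliver: a constellation of ranges over qualitatively different indicators yields a classification, not a quantity of ‘weather’:

```python
# Minimal sketch (arbitrary thresholds): an algorithm can define a constellation
# of ranges over qualitatively different indicators, but this yields only a
# classification; it does not produce a quantity of 'weather' itself.

def nice_weather(temp_c: float, pressure_hpa: float,
                 humidity_pct: float, wind_kmh: float) -> bool:
    return (18 <= temp_c <= 28 and pressure_hpa >= 1015
            and humidity_pct <= 50 and wind_kmh <= 20)

print(nice_weather(23, 1023, 30, 15))  # True, a classification only;
# no number states 'how much weather' there is: the indicators remain
# qualitatively different properties.
```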
Fiat ‘measurement’ was also likened to metrologists’ associative measurement
(Torgerson, 1958). But the systematic structural relations of quantitative properties that this
requires exist neither between psychical phenomena and the physical phenomena to
which they are bound (e.g., brain physiology; Fahrenberg, 2013) nor between the entities of
constructs and those of their (conceptual or concrete) construct referents. Unlike associative
measurement, correspondence between the ordering of the assumed quantitative properties of
constructs (e.g., ‘intelligence’) and the quantifications generated for their indicators (e.g.,
‘intelligence test’ scores) cannot be shown (Trendler, 2013).
Psychometrics is not fundamental measurement
In conjoint measurement theory (Luce & Tukey, 1964), the additivity requirement of
fundamental measurement, evidenced through empirical concatenation (Campbell, 1920), is
transformed into a statistical requirement (Bond & Fox, 2003). This mathematical method allows
scales to be constructed for research objects featuring multiple properties that are assumed to interact
with one another and to jointly affect a property of interest (e.g., product properties influencing
overall product evaluation). Scales are constructed such that the values of the target property
are a function of the assumed values of the single component properties, thus allowing for the
determination of optimal composition rules for scales. In additive conjoint measurement, a
specific composition rule is assumed in order to develop scales with additive properties and units
of equal distances. This enables the experimental testing of hypothesized additive structures
with clear falsification criteria (Bond & Fox, 2003). This theory was praised for allowing researchers
to represent “meaningful and invariant amounts of anything measured in an additive, divisible, and
portable numeric form” (italics added; Fisher, 2009, p. 1279). For this reason, it was considered
a revolution in social-science measurement, fueling hopes that it might finally enable the
measurement of psychical ‘attributes’ (Michell, 1997, 1999).
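To indicate what such a falsifiable structural test looks like, the following sketch (in Python, with hypothetical values) checks Luce and Tukey’s double-cancellation condition on a small table of observed combined effects; it is an illustration of the kind of test, not an implementation of any published procedure:

```python
from itertools import permutations

# Minimal sketch (hypothetical values): checking the double-cancellation
# condition on a table P[a][x] of observed combined effects, where rows and
# columns are levels of two factors assumed to contribute additively.

def double_cancellation_holds(P):
    rows, cols = range(len(P)), range(len(P[0]))
    for a1, a2, a3 in permutations(rows, 3):
        for x1, x2, x3 in permutations(cols, 3):
            if P[a2][x1] >= P[a1][x2] and P[a3][x2] >= P[a2][x3]:
                if not P[a3][x1] >= P[a1][x3]:
                    return False  # hypothesized additive structure falsified
    return True

# Additive by construction (row effects 0, 3, 6; column effects 1, 2, 4):
P = [[1, 2, 4],
     [4, 5, 7],
     [7, 8, 10]]
print(double_cancellation_holds(P))  # True
```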
This theory also underlies Rasch models (Rasch, 1980), which allow nominal
or ordinal manifest scores to be transformed into interval-scaled latent scores (Bond & Fox, 2003; Heene, 2013).
The assumption that Rasch modelling—and latent variable modelling in general—could
constitute measurement builds on the belief that the manifest data and the probabilities analyzed
in them would constitute the empirical relational system under study (Borsboom & Scholten,
2008). But, as demonstrated above, this fundamental misconception derives from variable–
referent and latent variable–construct conflations, numeral–number equation and the various
misconceptions of semiotic systems in general. Specifically, probabilities are mathematical
entities, not empirical entities (Kyngdon, 2008a). Psychometric models simply relate one
numerical (symbolic) relational system to another instead of relating an empirical relational
system to a numerical16 one as often (at least implicitly) assumed. It is for this reason that
numerical values of the latent scale are not assigned by rules but instead estimated through
these models (see Michell, 1997). Empirical relational systems, by contrast, involve “sets whose
members are natural objects, events or relations” (Kyngdon, 2008a, p. 91). The term empirical,
meaning experience-based (from Greek empeiria for experience), denotes the fact that the
entities accessible to empirical study “will consist of only a restricted set of … objects or events,
given the limitations of human cognitive and sensory-motor capacities” (Kyngdon, 2008a, p. 90-
91; Uher, 2019).
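A minimal sketch of the dichotomous Rasch model (with hypothetical parameter values) makes the point concrete: the model specifies probabilities as a purely mathematical function of latent parameters that are estimated from data already generated, rather than assigned by mapping an empirical relational system:

```python
import math

# Minimal sketch (hypothetical parameter values): the dichotomous Rasch model
# specifies the probability of a 'correct' response as a mathematical function
# of two latent parameters; it operates on data already generated.

def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1 | person parameter theta, item parameter b)."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

print(rasch_probability(theta=0.5, b=-0.2))  # approx. 0.67
# The latent values theta and b are estimated so that these model probabilities
# fit the observed response patterns; no empirical relational system of
# psychical phenomena is mapped onto the numerical one in this step.
```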
Measurement—modelling confusion
It follows that psychometrics involves only data modelling but not data generation as
required for measurement. This is also reflected in the fact that psychometric scales are the
outcome of statistical data analysis and modelling, whereas measurement scales involve
specific quantitative entities that are explicitly defined and agreed by convention before any
measurement process is executed (BIPM, 2006). Statistics is not needed to develop
measurement scales; physical measurement was successful long before statistics was
developed (Abran et al., 2012). Indeed, Rasch models were developed from a mere
mathematical framework (Rasch, 1980). “It is not necessarily wrong to develop mathematical
models independently from empirical observations. But it is also not at all self-evident that
empirical insights will result from such models” (Heene, 2013, p. 3). Measurement requires
implementation of unbroken and traceable measurand–result connections (in RTM terms,
systematic mapping relations between the empirical and the numerical relational structure),
which involves testing the generated data for possible quantitative structures. But “in latent
variable modelling in general, and Rasch modelling in particular, this is never done. Hence,
Rasch enthusiasts are leaving something out and, according to Kyngdon [2008a], this something
is not just important, but essential” (Borsboom & Scholten, 2008, p. 113).
Psychometricians produce continuous scales solely from statistical operations carried out
on symbolic relational systems—yet without explaining, or at least aiming to explore, how the
empirical relational systems that they assume to ‘scale’—that is, psychical phenomena—can
possibly vary that way. Neither Rasch models nor any other psychometric theories nor additive
conjoint analysis provide a measurement framework in themselves (Trendler, 2009, 2013, 2019)
and therefore enable neither fundamental nor any other kind of measurement. Psychometrics
builds on a fundamental confusion of measurement with statistical modelling, thus of data
generation with the analysis of data already generated (Figure 3a and b).
16 Note that representation in a semiotic system (e.g., data) involves assignments not of numbers, as sometimes
assumed, but of numerals, the meaning of which must first be established empirically and through conventions.
This network of fallacies and psychometricians’ focus on Stevens’ scale types misled
many to believe that assumptions of quantitative properties in psychical phenomena could be
empirically tested through statistical models that allow latent scales with such properties to be
created. Following this assumption, psychometricians develop statistical theories and models
that allow nominal or ordinal scaled scores on manifest (data) variables to be transformed into overall
scores on continuous latent (data) variables, which are assumed to be interval or even ratio
scaled. Such properties, however, are properties of the latent (data) variables, which are mere
statistical concepts. Latent (data) variables are neither the psychological constructs (abstract
conceptual entities) they encode nor the psychical phenomena that these constructs may
describe. “Latent variable models are not detectors of unobservable latent structures,
properties/attributes, causal sources, or anything else” (Maraun & Halpin, 2008, p. 115). They
are only mathematical-statistical models used to analyze data once these are generated. Thus,
they are not models in the classical sense in that some real-world phenomena are modelled—
i.e., represented in abstract ways to highlight structural patterns—because unbroken
measurand–symbol connections are neither established nor even intended to be established
(Kyngdon, 2008a; Maraun & Halpin, 2008). The quantitative properties identified in psychometric
models can thus have only statistical and methodical origins, such as error structure (Michell,
2008), the probability concepts invoked (Kyngdon, 2008a), and the simplistic encoding format of
rating scales aligned to statistical requirements rather than to divisible properties of the study
phenomena (Uher, 2013, 2015d, 2018a).
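A simple illustration of how such quantitative properties can originate in the encoding format alone: the same ordinal rating responses, coded with two different but equally admissible monotone numeral assignments, yield different group means and can even reverse their ordering. The example data and codings below are hypothetical and serve only to make this point concrete.

```python
# Hypothetical example: the same ordinal rating responses under two different,
# equally admissible monotone numeral assignments. Means and group differences --
# even their ordering -- depend on the arbitrary coding, because the category labels
# carry no empirically established quantitative (unit-based) structure.
import numpy as np

# Ordinal responses on a five-category rating scale ("strongly disagree" ... "strongly agree")
group_a = np.array([3, 3, 3, 3, 3])
group_b = np.array([1, 5, 5, 5, 5])

codings = {
    "conventional 1..5  ": {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
    "alternative monotone": {1: -20, 2: -1, 3: 0, 4: 1, 5: 2},
}

for name, code in codings.items():
    a = np.array([code[r] for r in group_a], dtype=float)
    b = np.array([code[r] for r in group_b], dtype=float)
    print(f"{name}: mean A = {a.mean():.2f}, mean B = {b.mean():.2f}, "
          f"difference B - A = {b.mean() - a.mean():.2f}")

# Under the first coding, group B appears 'higher' than group A; under the second,
# the ordering of the group means reverses, although both codings preserve category order.
```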
actual data generation, likely because the conceptual switching between the disparate notions of 'variables' misleads scientists into conflating items describing the study phenomena with these phenomena in themselves (variable–referent conflation).
This confusion of measurement with data modelling is also codified in the inconsistent use of the term 'causality', which further contributes to the network of fallacies in psychometrics.
Consequences
The network of fallacies on which the framework of psychometric theory and practice is based misleads psychometricians into believing that the quantitative properties produced in the latent (data) variables through various methodical and statistical operations reflect properties of the psychical phenomena as the actual phenomena of interest in themselves, thus providing evidence for their hypothesized quantitative structure. This erroneous conclusion is masked by a repeated (and commonly unnoticed) conceptual back-and-forth switching between two incompatible epistemological frameworks, of which only the operationist framework is implemented, both theoretically and empirically. But without a (critical) realist framework of measurement, psychometrics cannot 'measure the mind' as the field aspires to do. Psychometric modelling, as its name indicates, involves only data modelling (thus data analysis) but not data generation as required for measurement. This highlights that the use of the term 'measurement'
in psychometrics is erroneous and unrelated to the concept of measurement established in metrology, the physical sciences and other sciences. This constitutes a serious cross-scientific jingle fallacy that, given the high public trust that our societies place in scientific measurement (Porter, 1995), has far-reaching consequences for science and applied fields.
Data generation methods capturing reaction times and 'responses' with specified meanings (e.g., correctness in 'intelligence' tests), by contrast, do enable the establishment of measurement processes. But this is possible only for these observable outcomes, not for the underlying psychical phenomena and processes as the actual phenomena of interest in themselves, and one-to-one inferences to the latter are not possible. This is still not well considered in common interpretations of pertinent data (e.g., when 'intelligence' tests are interpreted as measuring 'intelligence' rather than intellectual performances). More effort is needed to conceptually differentiate the phenomena involved (e.g., psychical versus behavioral phenomena). This requires methods that implement both data generation traceability and numerical traceability, while carefully considering the peculiarities of psychical phenomena and thus the inherent limitations. These two methodological principles, which underlie the structured frameworks of measurement established in metrology, are essential to ensuring the robustness, replicability and usefulness of measurement data in all sciences. They are needed to enable public scrutiny and transparency, and to maintain a high degree of interpretability regarding study phenomena in the real world and their properties.
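As a rough illustration of what documenting these two principles could look like for reaction-time data, the following sketch stores each recorded value together with references to the generation procedure and to the timing device's resolution and calibration; the record structure and field names are hypothetical assumptions, not a metrological standard.

```python
# Hypothetical sketch of traceable reaction-time records: every value is stored together
# with the operations and references that produced it. Field names are illustrative only.
from dataclasses import dataclass


@dataclass
class ReactionTimeDatum:
    participant_id: str
    trial: int
    stimulus: str               # what was presented (data generation traceability)
    response: str               # the recorded key press
    correct: bool               # 'response' interpreted against a specified rule
    reaction_time_ms: float     # numerical value in SI-derived milliseconds
    timer_resolution_ms: float  # resolution of the timing device (numerical traceability)
    timer_calibration_ref: str  # reference to the device's calibration record
    protocol_ref: str           # reference to the documented generation procedure


datum = ReactionTimeDatum(
    participant_id="P017", trial=42,
    stimulus="word 'DOCTOR' after prime 'NURSE'", response="key_yes", correct=True,
    reaction_time_ms=512.4, timer_resolution_ms=1.0,
    timer_calibration_ref="lab-clock-cal-2020-06", protocol_ref="lexical-decision-protocol-v2",
)
print(datum)
# Such records make the measurement of the observable outcome (the timed key press)
# traceable; they do not license one-to-one inferences to the underlying psychical processes.
```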
The key challenge for psychological methods of data generation, however, remains the
important distinction of the study phenomena in themselves from the means (e.g., concepts,
models, methods, terms and data) used for their exploration. This article showed that mastering
this distinction is far more intricate than it may seem at first sight. It requires the development of
differentiated conceptual frameworks and an unambiguous terminology17 in order to break up and overcome the network of fallacies and conflations that are codified in the measurement jargon currently established in psychology.
17Developing such frameworks and terminologies with a particular focus on research on individuals and involving various disciplines is a core aim of the TPS-Paradigm (Uher, 2015c, 2016a, 2019).
Acknowledgements
This research was funded by a Marie Curie Fellowship of the European Commission’s FP7
Programme awarded to the author (EC Grant Agreement number 629430). The author thanks
Karen Slaney and Jim Cresswell for their editorial work as well as Manfred Schmitt, Andrew
Maul and two anonymous reviewers for helpful comments on previous drafts.
References
Abran, A., Desharnais, J.-M., & Cuadrado-Gallego, J. J. (2012). Measurement and quantification are not
the same: ISO 15939 and ISO 9126. Journal of Software: Evolution and Process, 24(5), 585–601.
https://doi.org/10.1002/smr.496
Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5),
1098–1109. https://doi.org/10.1086/687941
Barrett, P. (2008). The consequence of sustaining a pathology: Scientific stagnation—A commentary on the target article "Is psychometrics a pathological science?" by Joel Michell. Measurement: Interdisciplinary Research and Perspectives, 6(1–2), 78–83. https://doi.org/10.1080/15366360802035521
Barrett, P. (2018). The EFPA test-review model: When good intentions meet a methodological thought
disorder. Behavioral Sciences, 8(1), 5. https://doi.org/10.3390/bs8010005
Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2(4), 396–403. https://doi.org/10.1111/j.1745-6916.2007.00051.x
Berglund, B., Rossi, G. B., Townsend, J. T., & Pendrill, L. (2012). Measurement with persons: Theory,
some related epistemological issues. Studies in History and Philosophy of Science, 65–66, 46–56.
https://doi.org/10.1016/j.shpsa.2017.08.001
Mari, L., Carbone, P., & Petri, D. (2015). Fundamentals of hard and soft measurement. In A. Ferrero, D.
Petri, P. Carbone, & M. Catelani (Eds.), Modern measurements: Fundamentals and applications (pp.
203–262). Hoboken, NJ: John Wiley & Sons. https://doi.org/10.1002/9781119021315.ch7
Maul, A. (2013). On the ontology of psychological attributes. Theory & Psychology, 23(6), 752–769.
https://doi.org/10.1177/0959354313506273
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of
Psychology, 88(3), 355–383. https://doi.org/10.1111/j.2044-8295.1997.tb02641.x
Michell, J. (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge:
Cambridge University Press. https://doi.org/10.1017/CBO9780511490040
Michell, J. (2008). Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives, 6(1–2), 7–24. https://doi.org/10.1080/15366360802035489
Michell, J. (2012). Alfred Binet and the concept of heterogeneous orders. Frontiers in Psychology, 3, 261.
https://doi.org/10.3389/fpsyg.2012.00261
Narens, L. (2002). A meaningful justification for the representational theory of measurement. Journal of
Mathematical Psychology, 46, 746–768.
Ogden, C. K., & Richards, I. A. (1923). The meaning of meaning: A study of the influence of language
upon thought and of the science of symbolism. Harcourt, Brace & World.
Peirce, C. S. (1958). Collected papers of Charles Sanders Peirce, Vols. 1-6, C. Hartshorne & P. Weiss
(eds.), vols. 7-8, A. W. Burks (ed.). Cambridge, MA: Harvard University Press.
Pendrill, L. (2014). Man as a measurement instrument. NCSLI Measure, 9(4), 24–35.
https://doi.org/10.1080/19315775.2014.11721702
Pinheiro, M. A. (2020). A Wittgensteinian comment on "Psychology: A giant with feet of clay": A question from research on creativity. Integrative Psychological and Behavioral Science. https://doi.org/10.1007/s12124-020-09544-1
Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton
University Press.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago, IL: University of Chicago Press.
Rorty, R. (1995). Contingency, irony and solidarity. Cambridge, UK: Cambridge University Press.
Rosenbaum, P. J., & Valsiner, J. (2011). The un-making of a method: From rating scales to the study of
psychological processes. Theory & Psychology, 21(1), 47–65.
https://doi.org/10.1177/0959354309352913
Salvatore, S. (2019). Beyond the meaning given. The meaning as explanandum. Integrative Psychological
and Behavioral Science, 53(4), 632–643. https://doi.org/10.1007/s12124-019-9472-z
Schacter, D. L., & Addis, D. R. (2007). Constructive memory: The ghosts of past and future. Nature,
445(7123), 27–27. https://doi.org/10.1038/445027a
Slaney, K. L. (2017). Validating psychological constructs: Historical, philosophical, and practical
dimensions. London, UK: Palgrave Macmillan. https://doi.org/10.1057/978-1-137-38523-9
Slaney, K. L., & Garcia, D. A. (2015). Constructing psychological objects: The rhetoric of constructs.
Journal of Theoretical and Philosophical Psychology, 35(4), 244–259.
https://doi.org/10.1037/teo0000025
Stent, G. S. (1969). The coming of the Golden Age: A view of the end of progress. Garden City: The
Natural History Press. (The American Museum of Natural History).
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Thelen, E., & Smith, L. B. (1994). A dynamic systems approach to the development of cognition and
action. Cambridge, MA: MIT Press.
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.
Toomela, A. (2008). Variables in psychology: A critique of quantitative psychology. Integrative
Psychological & Behavioral Science, 42(3), 245–265. https://doi.org/10.1007/s12124-008-9059-6
Torgerson, W. S. (1958). Theory and methods of scaling. New York, NY: Wiley.
Trendler, G. (2009). Measurement theory, psychology and the revolution that cannot happen. Theory &
Psychology, 19(5), 579–599. https://doi.org/10.1177/0959354309341926
***