Psychometrics 20-21 1

An Introduction to Psychometrics

Sourced from:
Psychometric Theory & Assessment
Professor Jack Demick, Harvard Extension School

Psychometrics is a subfield of psychology that concerns the quantitative
measurement of psychological characteristics (e.g., attitudes, knowledge, abilities,
personality traits). The subfield is primarily concerned with the quantitative
measurement of individual differences (variations along dimensions thought to reside
within individuals and not within situations, for example, age, sex, and racial
differences).

The field also has two major research foci, namely: (a) the development and
refinement of theoretical approaches to measurement most generally (psychometric
theory); and (b) the construction of instruments (tasks, tests) and procedures for the
measurement of specific characteristics with the two most widely researched
characteristics being intelligence and personality (partly because both are
multidimensional in nature). From its inception, psychometrics has been controversial
for various reasons, including the charge that standardized tests (tests scored in a
standard or consistent manner, making it possible to compare scores of individuals
and/or of groups of individuals) are biased toward some groups and not others.
Indeed, at one point in its history, the
field included proponents of eugenics (who argued that innate human qualities could
be improved through, e.g., limiting childbirth via sterilization among the poor, the
disabled, and others).

Relevant to the first major research focus of the field, psychometric theory refers
to the large body of theory used in the development of psychological tests and in the
analysis of data collected from these tests. It is important to note that the word test has
multiple dictionary meanings but that the term psychological test has a very specific
meaning (i.e., a systematic procedure for obtaining and evaluating samples of
behavior relevant to cognitive, affective, or interpersonal functioning in light of two
standards, namely, uniformity in test administration and comparison of results to
normative or standardization samples). Tests that sample people’s knowledge, skills,
or cognitive functions are often designated as ability tests. Tests that sample
individuals’ attitudes, interests, opinions, emotional makeup, and characteristic
reactions to people, situations, and other stimuli fall under personality tests with self-
report tasks (e.g., inventories, surveys, questionnaires) and observations of behavior
(e.g., checklists, schedules, projective techniques) included if they adhere to the two
standards.

On the most general level, psychometric theory has been divided into classical
test theory (CTT) and more recently item response theory (IRT). CTT (Gulliksen,
1950) begins with the assumption that every person’s observed or obtained score on a
test is a function of a true score (error-free score) plus an error score (measurement
error from random noise within the individual and/or test situation), the latter of which
is assumed to be of equal magnitude for all test takers. Thus, CTT (often referred to as
true score theory) typically compares the overall test scores (sum of the item scores)
of a group of test takers to those of a normative group randomly selected from the
population toward improving the test’s psychometric properties. Psychometric
properties refer to a test’s reliability (consistency of measurement of overall test
scores) and validity (the ability of a test overall to measure what it is supposed to
measure). Over the years, researchers have identified several different kinds of
reliability (e.g., inter-rater, test-retest, parallel forms, split-half) and of validity (e.g.,
content, face, predictive, concurrent, construct). The establishment of both these
psychometric properties predominantly employs Spearman’s (1904) statistical
technique of correlation (two variables are said to be correlated when variations in the
value of one variable are synchronized with variations in the value of the other).
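
The true-score model and the role of correlation in estimating reliability can be illustrated with a small simulation. The sketch below is not from the source: the score distribution, the standard deviations, and the function names are assumptions chosen for illustration. It also relies on the standard CTT result (not stated above) that reliability equals the ratio of true-score variance to observed-score variance.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_test_retest(n_people=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate two administrations of a test under observed = true + error."""
    rng = random.Random(seed)
    # Hypothetical true scores; the mean of 50 is an arbitrary choice.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Each administration adds fresh random error of equal magnitude.
    first = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return first, second


first, second = simulate_test_retest()
observed_r = pearson_r(first, second)
# Assumed CTT identity: reliability = var(true) / (var(true) + var(error)).
theoretical_r = 10.0**2 / (10.0**2 + 5.0**2)
print(f"test-retest r = {observed_r:.3f}, var(T)/var(X) = {theoretical_r:.3f}")
```

With a large simulated sample, the test-retest correlation should land close to the variance ratio, which is one way to see why correlation serves as the workhorse statistic for reliability in CTT.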

In contrast to CTT, whose interest centers mainly on overall test scores, an
alternative approach to assessing a test’s reliability is to focus on examinees’
performance on individual items, which may be either qualitative or quantitative in
nature. A qualitative item analysis examines items on the basis of their content
coverage (e.g., number of items falling into different content categories, often found in
test manuals) and content form and relevance (e.g., number of items written according
to effective item-writing guidelines). Quantitative item analysis primarily concerns
statistical assessment of the difficulty and the discriminative value of the items. Item
difficulty is typically defined as the percentage of persons who answer the item
correctly (on ability tests) or in the keyed direction (on personality tests), expressed in
standard scores (with the most suitable items spread over a moderate difficulty range
around the 50% level). Item discrimination refers to the relation between performance
on an item and standing on the trait under consideration (e.g., by comparing those who
pass vs. fail an item on an external criterion or on the total test score through the use
of a biserial correlation with each item). Toward providing comparable scales across
samples (e.g., tested at different times of the year or in different years), some variant
of Thurstone’s absolute scaling (generating common anchor points for different
samples) was characteristically employed.
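
A quantitative item analysis of this kind can be sketched in a few lines. The response matrix below is invented for illustration, and the function names are hypothetical. Note also that the sketch computes a point-biserial correlation (an ordinary Pearson correlation between a 0/1 item and the total score) rather than the biserial correlation mentioned above, since the point-biserial needs no distributional assumption and is simpler to show.

```python
import statistics

# Hypothetical 0/1 responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]


def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly."""
    return statistics.fmean([row[item] for row in responses])


def item_discrimination(responses, item):
    """Point-biserial correlation between an item and the total test score."""
    item_scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    mean_i, mean_t = statistics.fmean(item_scores), statistics.fmean(totals)
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item_scores, totals))
    var_i = sum((i - mean_i) ** 2 for i in item_scores)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    if var_i == 0 or var_t == 0:
        return float("nan")  # undefined when everyone scores the same
    return cov / (var_i * var_t) ** 0.5


for item in range(len(responses[0])):
    p = item_difficulty(responses, item)
    r = item_discrimination(responses, item)
    print(f"item {item}: difficulty = {p:.2f}, discrimination = {r:.2f}")
```

Items with difficulty near 0.50 and a clearly positive discrimination value are the ones this kind of analysis would typically retain.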

However, with the advent of computers, precise mathematical procedures began
to be developed for sample-free measurement scales for use in the construction of
psychological tests. In this context arose a group of procedures initially grouped under
latent trait theory, which models the relations between individuals’ latent
(unobservable) traits (e.g., intelligence) and responses to test items (e.g., whether they
succeed on an item of specified difficulty). These are simply statistical constructs
derived mathematically from empirically observed relations among test responses.
Early proponents of this approach (e.g., Lord, 1980) did not want others to confuse
these mathematical constructs with physical or psychological ones (as possibly
implicated in the term latent trait) so they named the approach item response theory
instead.

IRT is a collection of measurement models that are mathematical equations
describing the association between test takers’ levels on a latent variable and the
probability of a particular response to an item, using nonlinear functions. IRT item
parameters are estimated directly using logistic models instead of proportions (item
difficulty), item-to-scale correlations (item discrimination), or simple independent
probabilities (guessing parameter corresponding to a correct response occurring by
chance). Thus, there are a number of IRT models, which vary in number of parameters
and in whether they handle dichotomous-only or polytomous items more generally.
The item characteristic curve or ICC (in some contexts referred to as a category-
response curve) is the basic unit in IRT and can be understood as the probability of
endorsing an item for individuals with a given level of the attribute. Depending on the
IRT model employed, these curves indicate which items are more difficult, are better
discriminators of the attribute, and/or are likely to have been guesses. In contrast to
the correlation coefficient employed as the predominant technique of CTT, IRT has
proposed more complex statistical methods for working with large matrices of
correlations and co-variances including factor analysis (reducing data to its basic
underlying dimensions), multidimensional scaling (finding a simple representation for
high-dimensional data), data clustering (finding objects that are like each other),
structural equation modeling (analyzing causal relations in non-experimental data),
and path analysis (evaluating the contribution of any path or combination of paths to
an overall model). With such multivariate methods, proponents of IRT attempt to
simplify large amounts of data, which allow statistically sophisticated models to be
fitted to an individual’s data (individual item responses) and tested to determine if
they are adequate fits. Further, whereas CTT relies on the use of representative
samples, IRT employs test data from large samples known to differ on the construct
being examined but they need not be representative of defined populations.
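
To make the item characteristic curve concrete, the following sketch evaluates a three-parameter logistic (3PL) model, one common member of the IRT family. The symbols are conventional but do not appear in the text above and should be read as assumptions: theta is the latent trait level, a is item discrimination, b is item difficulty, and c is the guessing parameter. The three example items are hypothetical.

```python
import math


def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct (or keyed) response under a 3PL logistic model.

    With c = 0 this reduces to a two-parameter model, and with a = 1 as well
    it reduces to a one-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: (discrimination a, difficulty b, guessing c).
items = {
    "easy": (1.0, -1.0, 0.0),
    "hard": (1.0, 1.5, 0.0),
    "multiple_choice": (1.5, 0.0, 0.25),
}

# Tabulate each curve at a few trait levels.
for name, (a, b, c) in items.items():
    curve = [round(icc_3pl(theta, a, b, c), 2) for theta in (-2, -1, 0, 1, 2)]
    print(name, curve)
```

Reading across the rows, a more difficult item shifts the curve to the right, a more discriminating item makes it steeper, and a nonzero guessing parameter raises its lower asymptote.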

One of the most important applications of IRT is found in computer adaptive
testing (CAT) in which a test taker does not need to answer every item on a test for
adequate assessment. By presenting the examinee with a few items that cover the
range of difficulty of the test (e.g., 10 items comprising a routing test), it is possible to
identify an individual’s approximate level of ability and then ask only questions that
will further refine his or her position within that ability level. That is, following the
routing test, subsequent items are different based on how well test takers score on the
routing test. New items are calibrated to curves on large-scale data from one or more
IRT models to represent both item characteristics (item difficulty, discrimination) and
test taker characteristics (probability of guessing). In contrast to CTT, which views an
individual’s test performance as a function solely of the test, IRT sees it as a joint
function of the person and the environment (more in line with everyday life and
modern conceptions of psychology). Finally, whereas CTT assumes that more
complete (longer) tests strengthen a test’s reliability, IRT and its subsequent use of
CAT assume that shorter tests can be more reliable than longer ones. Although IRT
produces more sophisticated information for test development than CTT, there has
been hesitancy to switch to the former since it is mathematically more complex,
unfamiliar to many psychologists, and without user-friendly computer programs to run
the procedures. Please don’t despair: We will demonstrate and make easy both of
these approaches to test construction.
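
The logic of adaptive testing can be sketched as a loop that re-estimates ability after every response and then picks the most informative remaining item. This is a simplified variant, not the routing-test design described above and not any particular testing program's algorithm. It assumes a two-parameter logistic model, a standard normal prior on ability, a grid-based posterior-mean estimate, and a hypothetical item bank and examinee.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


GRID = [g / 10.0 for g in range(-40, 41)]  # candidate abilities, -4.0 to 4.0


def estimate_theta(history):
    """Posterior-mean ability estimate under an assumed N(0, 1) prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for (a, b), correct in history:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(bank, answer, n_items=5):
    """Administer n_items adaptively; answer(item) returns True or False."""
    remaining = list(bank)
    history = []
    theta = 0.0  # start at the prior mean
    for _ in range(min(n_items, len(remaining))):
        # Choose the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        history.append((item, answer(item)))
        theta = estimate_theta(history)
    return theta, [item for item, _ in history]


# Hypothetical bank of (a, b) pairs with difficulties from -3.0 to 3.0.
bank = [(1.0, b / 2.0) for b in range(-6, 7)]
# Hypothetical examinee who passes every item easier than their true ability.
true_theta = 1.2
theta_hat, administered = run_cat(bank, lambda item: true_theta > item[1])
print(f"estimated ability: {theta_hat:.2f}")
print("difficulties administered:", [b for _, b in administered])
```

The examinee sees only five of the thirteen items, and the difficulties administered drift toward the region where the responses are most informative about that person's ability.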

With respect to the field’s second major research focus, similar reasoning that
was applied to the development of the first psychometric tests of intelligence
(Stanford-Binet, Wechsler scales) has been employed to develop other psychometric
tests and instruments within all subfields of psychology. Most notably, these include
those related to intelligence (e.g., those inherent in aptitude testing, achievement
testing, educational testing, and neuropsychological testing) and personality testing.
More recently, the use of psychological testing (primarily biographical data
instruments, cognitive ability/aptitude tests, and personality tests) has become
increasingly prominent in industrial-organizational psychology toward assessing
aspects of workplace functioning (both pre- and post-employment) that complements
the development of earlier vocational testing. The practical application of
psychometrics has been most consistently evident when a clinical psychologist is
asked to conduct a psychological assessment of which psychological testing is a part.
A psychological assessment takes place when a psychologist is asked to answer a
specific question about a patient’s functioning (e.g., differential diagnosis,
determination of functional vs. organic factors underlying symptoms, identification of
functional issues, recommendations for therapy and/or medication) based on his or her
observations of behavior, review of records, interviews (with the person and/or
significant others), and administration of standardized rating scales (e.g., Autism
Quotient, Beck Depression Inventory) and standardized psychological tests.

There is reason to believe that training in psychometrics is both important and
professionally relevant. First, the advent of the field of psychometrics was intricately
connected to the birth of the field of psychology itself. James McKeen Cattell (whose
dissertation was entitled Psychometric Investigation) studied at the University of
Leipzig with Wilhelm Wundt, who in 1879 established the first psychological
laboratory and came to be regarded as the father of psychology. In 1889, Cattell became the first
professor of psychology in the United States, teaching at the University of
Pennsylvania and helping to establish psychology as a legitimate science by initiating
the first mental testing efforts in the United States. Second, psychometrics has
constituted a significant part of psychology since its inception and continues to do so
to this very day. Representing psychology’s preferred experimental method,
psychometrics continues to generate much interest in newer subfields (e.g., industrial-
organizational psychology) and can be expected to do so in the future. Third, the
ability to conduct psychological testing is unique to psychologists. Members of no
other professional discipline can engage in the practice of psychological testing.

Fourth, the field of psychometrics offers numerous professional opportunities for
differing roles and responsibilities. For example, a psychometrist is one who
administers tests and a psychometrician is one who constructs tests. There is no
designated educational level for the former (although most typically hold bachelor’s or
master’s degrees) while most of the latter are psychologists (one who is trained in a
wide variety of courses on researching, teaching, writing, or practicing clinically,
leading to a doctorate from a university or school of professional psychology). A
practicing clinical psychologist requires post-Ph.D. licensing for the purpose of
protecting the public, which is mandatory for legal practice (leading him or her to be
designated as a licensed psychology provider or LPP). In contrast, a psychometrician
may or may not obtain credentialing as an LPP and may also obtain voluntary
certification as a certified specialist in psychometry (CSP), earned by passing the
minimum competency examination of the foundational knowledge in psychometry.
