Steps in Creating A Standardized Scale
By Rachel C. Reyes-Laureano, PhD
Generally, successful tests are developed due to some combination of the three
following conditions:
1) Theoretical advances (e.g. NEO PI-R by Costa & McCrae, 1995)
2) Empirical advances (e.g. MMPI by Butcher, Dahlstrom, Graham,
Tellegen, & Kaemmer, 1989)
3) A practical or market need (e.g. SAT by Coyle & Pillow, 2008)
An Integrative Approach to the Scale Development Process
Phase A: Instrument Purpose and Construct Measured
Examples of problematic item wording:
● "Don't you think that smoking should be banned in public buildings?" (Leading question: it favors a yes answer)
● "How often do you refer to a psychologist?" (Implicit assumption: it assumes the respondent has referred to a psychologist)
● "How often did you break down and burst into tears?" (Non-neutrality: "break down" gives a negative undertone to crying)
● "Do you ever suffer from back pains?" (Ambiguous and unclear: does not specify the problem or the time frame)
● "Are you satisfied with your job, or were there some problems?" (Double-barreled question: asks two different things at the same time)
● "Did you notice any motor conversion symptoms over the last 4 weeks?" (Complicated: uses professional jargon)
● "It is true that one of the things I seem to have a problem with is making a point when discussing with other people" (Lack of brevity/economy: "I often have difficulty in making a point" conveys the same meaning in fewer words)
Content adapted from Barker et al., 2016, pp. 111-112; DeVellis, 2017, p. 101.
Phase C: Item Generation (Item Pool)
● Suggestions for item wording:
○ 1) Avoid items in the past tense
○ 2) Construct items that include a single thought only
○ 3) Avoid double negatives
○ 4) Prefer items with simple sentence structure
○ 5) Avoid words denoting absoluteness, such as only, just, always, none
○ 6) Avoid items likely to be endorsed by everyone
○ 7) Avoid items with multiple interpretations
○ 8) Use simple and clear language
○ 9) Keep items under 20 words
● Aim for the reading level of an 11- to 13-year-old (about a 4th-grade reading level)
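Several of these wording rules can be screened automatically before expert review. The sketch below is illustrative only: the heuristics, thresholds, word lists, and the `check_item_wording` name are assumptions rather than any standard tool, and flagged items still need human judgment.

```python
import re

# Words denoting absoluteness (rule 5); the list is an illustrative assumption.
ABSOLUTES = {"only", "just", "always", "none", "never", "all"}
# Common negations, used to flag possible double negatives (rule 3).
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "cannot"}

def check_item_wording(item: str) -> list[str]:
    """Flag common wording problems in a draft scale item (heuristic only)."""
    problems = []
    words = re.findall(r"[a-z']+", item.lower())
    if len(words) > 20:                                    # rule 9: brevity
        problems.append("over 20 words")
    if ABSOLUTES & set(words):                             # rule 5: absolutes
        problems.append("contains an absolute term")
    if " and " in item.lower() or " or " in item.lower():  # rule 2: single thought
        problems.append("possible double-barreled item")
    if len([w for w in words if w in NEGATIONS]) >= 2:     # rule 3: double negative
        problems.append("possible double negative")
    return problems

print(check_item_wording("I feel dizziness and trembling of the hands"))
# -> ['possible double-barreled item']
```

Note that the "and/or" check will also flag harmless conjunctions, so it over-triggers by design; it narrows the pool an expert panel must read rather than replacing the panel.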
Phase D: Item Evaluation
● The item generation phase is completed when an expert panel reviews the item
pool
● The items generated are reviewed for quality and relevance by the expert panel
and/or by pilot testing
● In addition, four further methods can be used to provide feedback on the
relevance, clarity, and unambiguousness of items: 1) field pretests, 2)
cognitive interviews, 3) randomized experiments, and 4) focus groups
Phase D: Item Evaluation
● Proposed criteria for retaining and discarding items before and after expert
review:
○ Highest interpretability
○ Lowest ambiguity
○ Reject double-barreled items (checking two things in one item), e.g., "I feel
dizziness and trembling of the hands"
○ Reject items using jargon
○ Do not mix positively and negatively stated items
○ Avoid lengthy items
Phase D: Item Evaluation
● Pilot testing the items (Pretesting)
● This next stage includes administration to an appropriate sample
● After administration, item analysis will be conducted.
● An item analysis allows detection of items that are 1) ambiguous, 2) incorrectly keyed
or scored, 3) too easy or too hard, 4) not discriminative enough
● This phase generally consists of the following statistical techniques: 1) examine
the intercorrelations between all item pairs; 2) remove items with low correlations
with the total score; 3) examine the difference in item means between the top and
bottom 25% of scorers: items with larger differences are potentially better
discriminators of the target construct; and 4) take into account the characteristics
of each item and practical considerations, retaining items with high item-total
correlations and high discrimination.
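The item-total and discrimination computations above can be sketched in a few lines of pure Python. The toy responses, the 25% split, and the `pearson` helper are illustrative assumptions; a real item analysis would use a proper sample and a dedicated statistics package.

```python
from statistics import mean

# Toy data: rows = respondents, columns = four 5-point Likert items.
# Item 4 is deliberately written so that it runs against the other three.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 1, 2, 4],
    [1, 2, 1, 5],
    [3, 3, 3, 1],
    [5, 5, 4, 2],
    [2, 2, 1, 3],
    [1, 1, 2, 4],
]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / den

n_items = len(responses[0])
for i in range(n_items):
    item = [row[i] for row in responses]
    rest = [sum(row) - row[i] for row in responses]  # total with the item removed
    r_it = pearson(item, rest)                       # corrected item-total correlation
    # Discrimination: item-mean difference between top and bottom 25% of scorers
    order = sorted(range(len(rest)), key=lambda k: rest[k])
    q = max(1, len(rest) // 4)
    low = mean(item[k] for k in order[:q])
    high = mean(item[k] for k in order[-q:])
    print(f"item {i + 1}: item-total r = {r_it:+.2f}, discrimination = {high - low:+.2f}")
```

On this toy data, items 1-3 show positive item-total correlations and positive discrimination, while the reversed item 4 shows negative values; by the criteria above it would be revised or discarded.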
Phase D: Item Evaluation
● Detect Response Bias
● An additional consideration is whether an item causes response sets which either
bias responses or generate response artifacts.
● The most common response biases are: 1) yea-saying (acquiescence bias:
respondents agree with the statements); 2) nay-saying (respondents reject the
statements); 3) consistency and availability artifacts; 4) halo effects; and
5) social desirability bias. Likert scales may also present a central tendency
bias: respondents avoid selecting the extreme scale categories.
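A simple acquiescence check can be built from oppositely keyed item pairs: a respondent who agrees with a statement and also with its opposite is likely yea-saying rather than answering on content. This is a minimal sketch; the 5-point scale, the "4 or above means agree" cutoff, and the function name are assumptions.

```python
def flags_acquiescence(pos_answer: int, neg_answer: int, agree_cutoff: int = 4) -> bool:
    """True if the respondent agrees with BOTH a positively keyed item and its
    negatively keyed opposite on a 5-point Likert scale - a pattern that
    suggests yea-saying rather than content-based responding."""
    return pos_answer >= agree_cutoff and neg_answer >= agree_cutoff

# e.g. "I enjoy meeting new people" (pos) vs "I prefer to avoid new people" (neg)
print(flags_acquiescence(5, 4))  # agrees with both -> True
print(flags_acquiescence(5, 1))  # consistent responding -> False
```

In practice one would aggregate this flag over several reversed pairs per respondent before treating anyone's data as biased.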
Phase E: Testing the Psychometric Properties of the Scale
● In the final phase of the test development process, a validation study is always
carried out in a large and representative development sample to estimate the
psychometric properties of the scale.
● To test the dimensionality of the scale, exploratory factor analysis and/or
confirmatory factor analysis is conducted.
● Usually, scales are administered, analyzed, revised, and readministered a number
of times before their psychometric properties are acceptable.
Phase E: Testing the Psychometric Properties of the Scale
● Dimensionality
● A scale’s dimensionality, or factor structure, refers to the number and nature of the
variables reflected in its items.
● A scale measuring a single construct is called unidimensional.
● This means there is a single latent variable (factor) that underlies the scale items.
● A scale measuring two or more constructs (latent variables) is
multidimensional.
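Dimensionality shows up in the inter-item correlation matrix: items driven by the same latent variable correlate strongly with one another. The sketch below uses toy data in which items 1-2 were written for one trait and items 3-4 for another, so the matrix forms two blocks. The data and the pure-Python `pearson` helper are illustrative assumptions; in practice EFA/CFA, not visual inspection, would establish the factor structure.

```python
from statistics import mean

# Toy data: rows = respondents, columns = items. Items 1-2 track one latent
# trait, items 3-4 another (a two-dimensional scale by construction).
responses = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [2, 1, 5, 4],
    [1, 2, 4, 5],
    [5, 5, 1, 1],
    [1, 1, 5, 5],
]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

items = list(zip(*responses))  # transpose: one tuple of scores per item
for i in range(len(items)):
    row = [pearson(items[i], items[j]) for j in range(len(items))]
    print("  ".join(f"{r:+.2f}" for r in row))
```

The printed matrix shows high positive correlations within each pair of same-trait items and negative correlations across the two blocks, which is the pattern a two-factor solution would recover.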
Any questions?