
Steps in Creating a Standardized Scale
By Rachel C. Reyes-Laureano, PhD
Reference:
Kyriazos, T., & Stalikas, A. (2018). Applied Psychometrics: The Steps of Scale Development and Standardization Process.
Definition of Terms

A questionnaire, test, or scale is defined as a set of items designed to measure one or more underlying constructs, also called latent variables (Fabrigar & Ebel-Lam, 2007). In other words, it is a set of objective and standardized self-report questions whose responses are summed to yield a score.
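For illustration, here is a minimal Python sketch of how summed self-report responses yield a scale score (the item names and 1-5 coding are assumptions, not part of the source):

# Five hypothetical responses on a 1-5 Likert coding
# (1 = Strongly Disagree ... 5 = Strongly Agree), summed into a scale score.
responses = {"item1": 4, "item2": 5, "item3": 3, "item4": 4, "item5": 2}
scale_score = sum(responses.values())
print(scale_score)  # 18, within the possible range of 5-25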
Definition of Terms

Scale development or construction is the act of assembling and/or writing the most appropriate items that constitute test questions (Chadha, 2009) for a target population. The target population is defined as the group for whom the test is developed (Dorans, 2018).
Definition of Terms

Test development and standardization (norming) are two related processes where test development comes first and standardization follows. During test development, after item assembly and analysis, the items which are the strongest indicators of the latent construct measured are selected and the final pool emerges, whereas in standardization, standard norms are specified (Chadha, 2009).
Importance of Effective Scale Development

Effective scale construction has important implications for research inferences, affecting first the quality and size of the effects obtained and second the statistical significance of those effects (Furr, 2011), or, in other words, the accuracy and sensitivity of the instruments (Price, 2017).
Reasons for Development of Scales

Generally, successful tests are developed due to some combination of the following three conditions:
1) Theoretical advances (e.g., the NEO PI-R; Costa & McCrae, 1995)
2) Empirical advances (e.g., the MMPI; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989)
3) A practical or market need (e.g., the SAT; Coyle & Pillow, 2008)
An Integrative Approach of the Scale Development Process
Phase A: Instrument Purpose and Construct Measured
● A systematic development approach is required to develop effective scales.
● A prerequisite is to be aware of all existing scales that could suit the purpose of the measurement instrument you wish to develop, judging their usefulness fairly, without any tendency to exaggerate their deficiencies, before you embark on your scale construction adventure.
● Feasibility also needs to be considered: time, cost, scoring, the method of administration, intrusiveness, the consequences of false-positive and false-negative decisions, etc. (Streiner et al., 2015).
● After that, the scale development process can start with a definition of the purpose of the instrument within a specific domain.
Phase A: Instrument Purpose and Construct Measured
● Defining the construct to be measured is a crucial step requiring clarity and
specificity (DeVellis, 2017; Price, 2017).
● Outlining or defining a construct is possible by connecting ideas to a theory (e.g.
emotional intelligence)
● The philosophical foundation of a scale is a connection between the construct to be
measured and a related body of material called a domain.
● Ex. Self-efficacy exists in many models like the…
● Social Cognitive Theory (Bandura, 1997)
● Theory of Planned Behavior (Ajzen, 1991)
● Transtheoretical Model (Prochaska, Norcross, Fowler, Follick, & Abrams, 1992)
● Health Action Process Approach (Schwarzer, 2001)
Phase A: Instrument Purpose and Construct Measured
● Then the construct can be operationalized.
● Deciding on the construct is usually based on a review of literature, along with
consultation with subject matter experts.
● Then a concise, clear, and precise definition of the construct is generated.
● Using this definition, the item content is specified with precision and clarity
(Price, 2017; DeVellis, 2017).
● From this point, a systematic literature review is conducted again, existing tests are identified, and the nature of the target construct is studied.
● After this, the test developer can refine the construct definition further (Irwing & Hughes, 2018).
Phase A: Instrument Purpose and Construct Measured
● The construct operationalization specifies the following: 1) a model of internal
structure; 2) a model of external relationships with other constructs; 3) potential
relevant indicators; and 4) construct-related processes (Dimitrov, 2012).
● The next step is to link domain content with domain-related criteria.
● Next is planning in order to specify a wide range of options pertaining to item
specification.
● Finally, for Phase A, methods are used to identify the attributes that
accurately represent the targeted construct.
Phase B: Response Scale Specification
● One of the first decisions in designing a scale is whether to include open questions (allowing answers in the respondents’ own words) or closed questions (forcing responses from a set of choices).
● Next, methods for identifying the attributes that accurately represent the
targeted construct are chosen:
○ Subject-matter experts decide on the attributes to be measured
○ Interviews of key informants through an iterative process
○ Review of related literature
○ Content analysis to track dimensions or topic areas
○ Direct observation
○ (Price, 2017, p. 191; Wolfe & Smith, 2007; Dimitrov, 2012)
Phase B: Response Scale Specification
● The developer should decide on the response format early, in parallel with item generation, so that the two are compatible.
● Response scales come in different formats with several specifications
● Response scale format - denotes the way the items are worded and
responses are obtained and evaluated (Furr, 2011).
● Common scale formats are:
○ Guttman Scaling
○ Thurstone Scaling
○ Likert Scaling
Phase B: Response Scale Specification
● Response scale formatting considerations
● The first consideration is the number of response categories and their labels, whether to offer a midpoint or a “no opinion” option, and other details like the time frame.
● Number of Response Options
● The minimum required is two, as in binary scales (e.g., Agree/Disagree, True/False).
● A larger number of response options has both benefits and costs.
● Scales most often have 5 points, semantic differentials have 7 points, and Thurstone scales have 11 points.
● A potential benefit of a relatively large number of options is that it allows finer gradations, much like the increasing resolution of a microscope.
● Reliability is lower for scales with only two or three points, but the gains in reliability level off beyond 7 points.
Phase B: Response Scale Specification
● The potential cost of having many response options is that the added variability may reflect an increase in random error rather than in the systematic component tied to the target construct.
● People cannot easily discriminate beyond 7 points.
Phase B: Response Scale Specification
● Labels of response options (anchoring)
● The descriptors most often tap agreement (Strongly Agree to Strongly Disagree), but it is possible to construct a Likert scale to measure almost any attribute: agreement, acceptance (Most Agreeable to Least Agreeable), similarity (Most Like Me to Least Like Me), or probability (Most Likely to Least Likely).
● Research shows that fully labeled response options are more effective than labeling only the endpoints.
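As an illustration of a fully labeled format, a 5-point agreement scale might be represented as follows (a sketch; the exact labels are one common convention, not prescribed by the source):

# A fully labeled 5-point agreement scale: every option, not just the
# endpoints, carries a verbal anchor.
agreement_scale = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neither Agree nor Disagree",
    4: "Agree",
    5: "Strongly Agree",
}
for code, label in agreement_scale.items():
    print(f"{code} = {label}")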
Phase B: Response Scale Specification
● Mid-points
● A neutral midpoint can also be added to dichotomous/bipolar rating scales by specifying an odd number of points, allowing equivocation (“neither agree nor disagree”) or uncertainty (“not sure”, “neutral”, “undecided”).
● Adding a midpoint has been found to increase the reliability and validity of ratings.
● However, a “don’t know” response option has been shown to be inefficient.
Phase C: Item Generation (Item Pool)
● Along with specifying the response format, a parallel step in developing a
questionnaire is assembling and/or devising items for the initial pool.
● The content specification of an instrument requires that the developer 1) operationalizes the construct by specifying an exhaustive list of potential indicators (items) of the target construct, and 2) selects from this list a representative sample of indicators.
Phase C: Item Generation (Item Pool)
● Number of Items to Include
● The initial item pool is larger than the final item set; as a rule, it is 3 to 4 times larger.
● Writing more good items than required permits selection of the best items.
● Content redundancy is an asset during pool construction because it boosts internal consistency reliability, which, in turn, supports validity.
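Because the slide links redundancy to internal consistency, a hedged sketch of computing Cronbach's alpha with numpy may help (the data matrix is hypothetical; rows are respondents, columns are items):

import numpy as np

# Hypothetical pilot data: rows = respondents, columns = items (1-5 codes).
data = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])

k = data.shape[1]                          # number of items
item_vars = data.var(axis=0, ddof=1)       # variance of each item
total_var = data.sum(axis=1).var(ddof=1)   # variance of the summed score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))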
Phase C: Item Generation (Item Pool)
● Sources of potential items
● The first source of information is to examine what others have done
● Reasons for adopting items from previous instruments: 1) it saves work, 2) existing items have usually proven to be psychometrically sound, and 3) there are not unlimited ways to ask about a specific problem.
● Other sources of ideas for writing potential items:
● 1) the target population (focus group)
● 2) theory
● 3) existing research
● 4) expert opinion or key informant interviews
● 5) clinical observation
● A scale developer may use some or all of these sources
Phase C: Item Generation (Item Pool)
● Item Wording
● Item wording is important because the way a question is phrased can
determine the response
● During item-writing, issues such as language clarity, content relevancy, and the
use of balanced scales (items worded positively and negatively) are usually
considered
● However, research suggests that balancing positively and negatively worded items is usually ineffective.
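If a developer does opt for a balanced scale, negatively worded items must be reverse-scored before summing, or they are effectively keyed incorrectly. A minimal sketch (the 1-5 range is an assumption):

# Reverse-score a negatively worded item: reversed = (min + max) - raw.
def reverse_score(raw, scale_min=1, scale_max=5):
    return (scale_min + scale_max) - raw

print(reverse_score(5))  # 1
print(reverse_score(2))  # 4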
Table 6. Some examples of unsuccessful item wording.

● “Don’t you think that smoking should be banned in public buildings?” (Leading question: it favors a yes answer)
● “How often do you refer to a psychologist?” (Implicit assumption: it assumes the respondent referred to a psychologist)
● “How often did you break down and burst into tears?” (Non-neutrality: “break down” gives a negative undertone to crying)
● “Do you ever suffer from back pains?” (Ambiguous and unclear: does not specify the problem and the time frame)
● “Are you satisfied with your job or were there some problems?” (Double-barreled question: asks two different things at the same time)
● “Did you notice any motor conversion symptoms over the last 4 weeks?” (Complicated: uses professional jargon)
● “It is true that one of the things I seem to have a problem with is making a point when discussing with other people” (Lack of brevity/economy: “I often have difficulty in making a point” conveys the same meaning in fewer words)

Content adapted from Barker et al., 2016, pp. 111-112; DeVellis, 2017, p. 101.
Phase C: Item Generation (Item Pool)
● Suggestions for item wording:
● 1) Avoid items in the past tense
● 2) Construct items that include a single thought only
● 3) Avoid double-negatives
● 4) Prefer items with simple sentence structure
● 5) Avoid words denoting absoluteness such as only, just, always, none
● 6) Avoid items likely to be endorsed by everyone
● 7) Avoid items with multiple interpretations
● 8) Use simple and clear language
● 9) Keep items under 20 words
● 10) Aim for the reading level of an 11-13 year old
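A few of these rules are mechanical enough to screen automatically before expert review. A rough sketch (the rule set and draft item are illustrative only, not a substitute for human judgment):

# Flag draft items that break two of the mechanical wording rules:
# more than 20 words, or absolute terms such as only/just/always/none.
ABSOLUTES = {"only", "just", "always", "none"}

def flag_item(item):
    words = item.lower().strip(".?!").split()
    flags = []
    if len(words) > 20:
        flags.append("over 20 words")
    if ABSOLUTES & set(words):
        flags.append("contains an absolute term")
    return flags

print(flag_item("I always feel nervous when I speak."))
# ['contains an absolute term']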
Phase D: Item Evaluation
● The item generation phase is completed when an expert panel reviews the item
pool
● The items generated are reviewed for quality and relevance by the expert panel
and/or by pilot testing
● Alternatively, four additional methods can be used to provide feedback on the
relevance, clarity, and unambiguousness of items: 1) field pretests, 2)
cognitive interviews, 3) randomized experiments and 4) focus groups
Phase D: Item Evaluation
● Proposed criteria for retaining and discarding items before and after expert review:
○ Highest interpretability
○ Lowest ambiguity
○ Reject double-barreled items (checking two things in one item) like “I feel dizziness and trembling of the hands”
○ Reject items using jargon
○ Do not mix positively and negatively stated items
○ Avoid lengthy items
Phase D: Item Evaluation
● Pilot testing the items (Pretesting)
● This next stage includes administration to an appropriate sample.
● After administration, item analysis is conducted.
● An item analysis allows detection of items that are 1) ambiguous, 2) incorrectly keyed or scored, 3) too easy or too hard, or 4) not discriminative enough.
● This phase generally consists of the following statistical techniques: 1) examine the intercorrelations between all item pairs; 2) remove items with low correlations with the total score; 3) compare item means between high and low scorers (e.g., the top and bottom 25%), since items with larger differences are potentially better discriminators of the target construct; and 4) take into account the characteristics of each item and practical considerations, retaining items with high item-total correlations and high discrimination.
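Two of these checks, corrected item-total correlations and a high-versus-low group comparison, can be sketched with numpy (all data are hypothetical, and the 25% cutoff follows the slide):

import numpy as np

# Hypothetical pilot data: rows = respondents, columns = items.
data = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 2, 1],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
])
total = data.sum(axis=1)

# Corrected item-total correlation: exclude the item from its own total.
for j in range(data.shape[1]):
    rest = total - data[:, j]
    r = np.corrcoef(data[:, j], rest)[0, 1]
    print(f"item {j}: corrected item-total r = {r:.2f}")

# Discrimination: item means in the top vs. bottom 25% of total scorers.
cut = len(total) // 4
order = np.argsort(total)
low, high = order[:cut], order[-cut:]
print(data[high].mean(axis=0) - data[low].mean(axis=0))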
Phase D: Item Evaluation
● Detect Response Bias
● An additional consideration is whether an item causes response sets which either bias responses or generate response artifacts.
● The most common response biases are 1) yea-saying (acquiescence bias: respondents agree with the statements) and 2) nay-saying (respondents reject the statements), along with consistency and availability artifacts, halo effects, and social desirability bias. Likert scales may also present a central tendency bias: respondents avoid selecting extreme scale categories.
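One simple screen for such response sets is flagging straight-lining, where a respondent gives the identical answer to every item (a hedged sketch; zero within-person variance is only a crude indicator):

import numpy as np

# Hypothetical responses: rows = respondents, columns = items.
data = np.array([
    [3, 3, 3, 3, 3],
    [4, 2, 5, 3, 4],
    [5, 5, 5, 5, 5],
])
# Respondents whose answers never vary across items.
straight_liners = np.where(data.std(axis=1) == 0)[0]
print(straight_liners)  # [0 2]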
Phase E: Testing the Psychometric Properties of the Scale
● In the final phase of the test development process, a validation study is always
carried out in a large and representative development sample to estimate the
psychometric properties of the scale.
● To test the dimensionality of the scale, factor analysis is conducted, using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).
● Usually, scales are administered, analyzed, revised, and readministered a number
of times before their psychometric properties are acceptable.
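As one illustration of the exploratory step, the open-source factor_analyzer package can fit an EFA and report loadings (the random data, the two-factor choice, and the varimax rotation here are placeholder assumptions, not recommendations from the source):

# pip install factor_analyzer
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))  # placeholder for real item responses

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(data)
print(fa.loadings_)              # each item's loading on each factor
print(fa.get_factor_variance())  # variance explained per factor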
Phase E: Testing the Psychometric Properties of the Scale
● Dimensionality
● A scale’s dimensionality, or factor structure, refers to the number and nature of the
variables reflected in its items.
● A scale measuring a single construct is called unidimensional.
● This means there is a single latent variable (factor) that underlies the scale items.
● A scale measuring two or more constructs (latent variables) is
multidimensional.
Any questions?
