Interpreting Change On The WAIS-III/WMS-III in Clinical Samples
Abstract
Clinicians should note that there is considerable variability in the reliabilities of the index and
subtest scores derived from the third editions of the Wechsler Adult Intelligence Scale (WAIS-III) and
the Wechsler Memory Scale (WMS-III). The purpose of this article is to review these reliabilities and to
illustrate how they can be used to interpret change in patients' performances from test to retest. The
WAIS-III IQ and Index scores are consistently the most reliable scores, in terms of both internal
consistency and test-retest reliability. The most internally consistent WAIS-III subtests are
Vocabulary, Information, Digit Span, Matrix Reasoning, and Arithmetic. Information and Vocabulary
have the highest test-retest reliability. On the WMS-III, the Auditory Immediate Index, Immediate
Memory Index, Auditory Delayed Index, and General Memory Index are the most reliable, in terms of
both internal consistency and test-retest reliability. The Logical Memory I and Verbal Paired
Associates I subtests are the most reliable. Data from three clinical groups (i.e., Alzheimer's disease,
chronic alcohol abuse, and schizophrenia) were extracted from the Technical Manual [Psychological
Corporation (1997). WAIS-III/WMS-III Technical Manual. San Antonio: Harcourt Brace] for the
purpose of calculating reliable change estimates. A table of confidence intervals for test-retest
measurement error is provided to help the clinician determine if patients have reliably improved or
deteriorated on follow-up testing. © 2001 National Academy of Neuropsychology. Published by
Elsevier Science Ltd.
* Department of Psychiatry, University of British Columbia, 2255 Wesbrook Mall, Vancouver, B.C., Canada
V6T 2A1. Tel.: +1-604-822-7588; fax: +1-604-822-7756.
At this point in time, based on available data, rehabilitation psychologists and neuropsy-
chologists should have the most confidence in the index scores derived from the third editions
of the Wechsler Adult Intelligence Scale (WAIS-III) and the Wechsler Memory Scale (WMS-
III). With some notable exceptions, many of the subtests, especially those from the WMS-III,
have reliabilities that limit their clinical usefulness as independent measures of specific
cognitive abilities. This is a common problem; many tests used in clinical neuropsychology
suffer from low reliability.
There are numerous reliability tables in the WAIS/WMS Technical Manual (Psychological
Corporation, 1997). These tables were studied, and some of the most relevant information for
graduate students and busy clinicians was distilled. First and foremost, the WAIS-III IQ and
Index scores are consistently the most reliable scores, in terms of both internal consistency
and test-retest reliability. The most internally consistent WAIS-III subtests are Vocabulary,
Information, Digit Span, Matrix Reasoning, and Arithmetic. Similarities and Block Design
also have quite high internal consistency. Information has the highest test-retest reliability,
followed by Vocabulary.
On the WMS-III, the Auditory Immediate, Immediate Memory, and General Memory
Indexes are the most internally consistent index scores. The Verbal Paired Associates I and
the Logical Memory I are the most internally consistent of the primary subtest scores. The
uncorrected test-retest reliabilities of the primary subtest scores, with the exception of Verbal
Paired Associates I in older adults, all range from 0.58 to 0.79. The following index scores
have uncorrected test-retest reliabilities greater than 0.80: Auditory Immediate, Immediate
Memory, Auditory Delayed Memory, and General Memory. The test-retest reliability of the
Working Memory Index in older adults is 0.80.
The Index and subtest scores were sorted into two groups, "most reliable" and "least
reliable." To be classified as "most reliable," the score had to have adequate internal
consistency (0.85–0.99) and adequate test-retest reliability (0.75–0.99). To be classified as
"least reliable," the score had to have low internal consistency (<0.80) and low test-retest
reliability (<0.70), or a test-retest reliability coefficient below 0.60. The classification results
are presented in Table 1.
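The sorting rule described above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs, not code from the article: the function name, the "intermediate" label for scores meeting neither definition, and the example coefficients are all assumptions introduced here.

```python
def classify_reliability(internal_consistency, test_retest):
    """Sort a score by the cutoffs stated in the text.

    The 'intermediate' label is an assumption for scores that meet
    neither the 'most reliable' nor the 'least reliable' definition.
    """
    # Most reliable: internal consistency >= 0.85 and test-retest >= 0.75
    if internal_consistency >= 0.85 and test_retest >= 0.75:
        return "most reliable"
    # Least reliable: both coefficients low, or test-retest below 0.60
    if (internal_consistency < 0.80 and test_retest < 0.70) or test_retest < 0.60:
        return "least reliable"
    return "intermediate"


# Hypothetical coefficients, chosen only to exercise each branch
print(classify_reliability(0.90, 0.85))
print(classify_reliability(0.75, 0.65))
print(classify_reliability(0.82, 0.72))
```

Note that the second condition lets a score with good internal consistency still fall into the "least reliable" group when its test-retest coefficient is below 0.60.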
G.L. Iverson / Archives of Clinical Neuropsychology 16 (2001) 183–191 185
The information presented in the preceding text and in Table 1 has four clear implications
for day-to-day clinical practice. First, the WAIS-III Index scores and four of the WMS-III Indexes
(Auditory Immediate, Immediate Memory, Auditory Delayed, and General Memory) are the
most reliable, so the clinician should have the greatest confidence in the precision of these
scores. Second, if the psychologist wanted to choose certain subtests on the WAIS-III to
describe individually in a report, he or she may decide to select those with high internal
consistency, such as Vocabulary, Similarities, Information, Digit Span, Block Design, and
Data provided in the Technical Manual (Psychological Corporation, 1997) can be used to
interpret change in several clinical samples on the WAIS-III/WMS-III. Some clinicians use
clinical judgment to interpret change, whereas others use psychometric data, such as standard
errors of measurement (SEMs), combined with clinical judgment. In general, incorporating
SEM information for the purpose of estimating change is preferred over clinical judgment
alone. However, a limitation of SEMs is that they are used to provide a confidence band
around a score at a single point in time. That is, they are most useful for interpreting single
test scores. The standard error of the difference (i.e., Sdiff) is more appropriate for creating a
confidence band relevant to two scores. The Sdiff formula includes the SEM from time 1 and
time 2. Therefore, it is the standard error of difference that provides the clinician with an
estimate of possible measurement error relating to test-retest scores. The purpose of this
section is to provide the clinician with tables to assist with determinations of improvement or
decline on the WAIS-III and WMS-III in three clinical samples.
3. Reliable change
A reliable change methodology (e.g., Jacobson & Truax, 1991; Chelune et al., 1993) can
be used for assessing whether a retest change in a given variable is reliable and meaningful.
Table 1
WAIS-III/WMS-III reliabilities

Adequate internal consistency (0.85–0.99)
  WAIS-III ages 16–29: VC, AR, DS, IN, BD, MR, VCI, POI, WMI, PSI
  WAIS-III ages 30–54: VC, SI, AR, DS, IN, BD, MR, VCI, POI, WMI, PSI
  WAIS-III ages 55–74: VC, SI, AR, DS, IN, CO, PC, BD, MR, VCI, POI, WMI, PSI
  WAIS-III ages 75–89: VC, SI, AR, DS, IN, LN, MR, VCI, POI, WMI, PSI
  WMS-III ages 16–54: LM I, VPA I, AII, IMI, ADI, GMI, WMI

Least reliable test scores
  WAIS-III ages 16–29: LN, PA, SS, OA
  WAIS-III ages 30–54: None
  WAIS-III ages 55–74: PA
  WAIS-III ages 75–89: DS, OA
  WMS-III ages 16–54: Faces I, LN, Faces II, ARDTS, ARDI, LMTH I, WL I,
    VR I, SPSF, SPSB, LMTH II, WL II, WLRecog, VR
This method provides an estimate of the probability that a given difference score would not be
obtained by chance; that is, the score would not be due to measurement error. Essentially, a
confidence interval can be formed around a score that reflects the reliability of the test.
The primary measure of interest is the Sdiff. The Sdiff is derived from the SEM, which in
turn is derived from the test-retest reliability of the instrument (rxx) and the standard
deviation (SD) of the population of interest. The confidence band is formed by multiplying
the Sdiff by a value from the z-distribution. Multiplying by a value of 1.64, for example,
results in a change score in either direction that would be unlikely to occur by chance
(p < 0.05 in each tail). The formulas for calculating reliable change are presented in Table 2.
Data from three clinical groups (i.e., Alzheimer's disease, chronic alcohol abuse, and
schizophrenia) were extracted from the WAIS/WMS Technical Manual (Psychological
Corporation, 1997) for the purpose of calculating reliable change estimates. The SEMs and
Sdiffs for all the IQ and Index scores are presented in Table 3. Rounding to two decimal
places occurred at each step in calculating these values. Since none of the clinical groups
were tested twice, the test-retest correlations from the normal subjects in the standardization
sample were used to calculate the SEMs. Similarly, no retest SEM could be calculated, so the
formula for the estimated Sdiff was used (see Table 2).
Table 2
Formulas for calculating reliable change

  $SEM = SD\sqrt{1 - r}$
  $S_{\mathrm{diff}} = \sqrt{SEM_1^2 + SEM_2^2}$
  Estimated $S_{\mathrm{diff}}$ (a) $= \sqrt{2 \cdot SEM_1^2}$

SD = standard deviation of the comparison sample
r = test-retest reliability of the comparison sample
$SEM_1$, $SEM_2$ = SEM at time 1 and time 2
(a) This formula often has been used in the literature on reliable change. Technically, it is incorrect; it represents
an "estimated" standard error of difference because the SEM for time 1 is counted twice instead of using the SEM for
time 2.
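As a worked illustration of the Table 2 formulas, the following Python sketch computes the SEM and the estimated Sdiff, rounding to two decimal places at each step as described in the text. The sample SDs (11.0 and 15.6) are the Auditory Immediate Index values quoted in this article; the reliability coefficients (0.85 and 0.84) are assumptions chosen for illustration and are not reported in this text. The function names, and the choice to round the square-root term before multiplying, are likewise assumptions.

```python
import math


def sem(sd, r):
    """Standard error of measurement: SD * sqrt(1 - r).

    Rounds after each step (an assumption about how the
    'rounding at each step' in the text was applied).
    """
    root = round(math.sqrt(1.0 - r), 2)
    return round(sd * root, 2)


def estimated_sdiff(sem_time1):
    """Estimated standard error of the difference: sqrt(2 * SEM^2).

    Uses the time 1 SEM twice because no retest SEM is available.
    """
    return round(math.sqrt(2.0 * sem_time1 ** 2), 2)


# SDs come from the text; the r values are illustrative assumptions
for label, sd, r in [("Alzheimer's", 11.0, 0.85), ("Schizophrenia", 15.6, 0.84)]:
    s = sem(sd, r)
    print(label, s, estimated_sdiff(s))
```

With these assumed inputs the sketch yields an SEM of 4.29 and an Sdiff of 6.07 for the first case, and 6.24 and 8.82 for the second.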
It should be obvious from the formulas in Table 2 that the size of the Sdiff is related to the
SD and the test-retest coefficient. Therefore, larger SDs and smaller correlations will result in
larger SEMs and Sdiffs. The Auditory Immediate Index, for example, has an Sdiff of 6.07 for
Table 3
SEM and standard error of difference scores on the WAIS-III/WMS-III

                         Alzheimer's      Chronic alcohol abuse    Schizophrenia
                         SEM     Sdiff    SEM     Sdiff            SEM     Sdiff
WMS-III
Auditory Immediate       4.29    6.07     6.16    8.71             6.24    8.82
Visual Immediate         5.34    7.55     7.40    10.47            7.29    10.31
Immediate Memory         4.56    6.45     6.60    9.33             6.28    8.88
Auditory Delayed         3.84    5.43     6.51    9.21             6.68    9.45
Visual Delayed           3.97    5.61     6.89    9.74             7.55    10.68
Auditory Recognition     4.64    6.56     8.19    11.58            9.39    13.28
General Memory           3.12    4.41     5.11    7.23             5.69    8.05
Working Memory           7.61    10.76    5.12    7.24             7.65    10.82
The number of patients per group was as follows: Alzheimer's = 35, chronic alcohol abuse = 28, and
schizophrenia = 42.
The following uncorrected stability coefficients were used to calculate the SEMs: Alzheimer's group, WAIS-III
age group 55–74 and WMS-III age group 55–89; chronic alcohol abuse, WAIS-III age group 30–54 and
WMS-III age group 16–54; and schizophrenia, WAIS-III age group 30–54 and WMS-III age group 16–54.
Table 4
Confidence intervals for measurement error in three clinical groups

                           Alzheimer's      Chronic alcohol abuse    Schizophrenia
                           0.80    0.90     0.80    0.90             0.80    0.90
WMS-III
Auditory Immediate         7.77    9.95     11.15   14.28            11.29   14.46
Visual Immediate           9.66    12.38    13.40   17.17            13.20   16.91
Immediate Memory           8.26    10.58    11.94   15.30            11.37   14.56
Auditory Delayed           6.95    8.91     11.79   15.10            12.10   14.50
Visual Delayed             7.18    9.20     12.47   15.97            13.67   17.52
Auditory Recog. Delayed    8.40    10.76    14.82   18.99            17.00   21.78
General Memory             5.64    7.23     9.25    11.86            10.30   13.20
Working Memory             13.77   17.65    9.27    11.87            13.85   17.25
Alzheimer's patients and 8.82 for persons with schizophrenia (Table 3). The 90% confidence
band for these two groups is 10 points and 15 points, respectively. This is because the
sample SD for the patients with Alzheimer's was only 11.0, compared to 15.6 for persons
with schizophrenia. Less variability in the sample resulted in lower reliable change scores.
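The step from an Sdiff to a confidence band can be sketched as follows. The multiplier 1.64 is given in the text; 1.28 is the conventional two-tailed z-value for a 0.80 interval and is assumed here to be the one used for the 0.80 bands. The function and constant names are illustrative, and `statistics.NormalDist` (Python 3.8+) is used only to show where the two z-values come from.

```python
from statistics import NormalDist

# z-values for two-tailed 0.80 and 0.90 intervals, rounded to two decimals
Z_80 = round(NormalDist().inv_cdf(0.90), 2)  # 10% in each tail
Z_90 = round(NormalDist().inv_cdf(0.95), 2)  # 5% in each tail


def confidence_band(sdiff, z):
    """Width of the measurement-error band on either side of zero change."""
    return round(sdiff * z, 2)


# Auditory Immediate Index Sdiffs quoted in the text
for label, sdiff in [("Alzheimer's", 6.07), ("Schizophrenia", 8.82)]:
    print(label, confidence_band(sdiff, Z_80), confidence_band(sdiff, Z_90))
```

For the Auditory Immediate Index this gives 0.90 bands of 9.95 and 14.46, which correspond to the approximately 10- and 15-point bands discussed above.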
The following three examples illustrate the clinical use of Table 4. First, a 67-year-old man
with suspected mild Alzheimer's disease is tested twice with the WMS-III at a 14-month
interval. His Auditory Delayed Index score dropped from 81 to 73. By looking at Table 4,
the clinician can conclude that this eight-point decline is not due to measurement error
(0.80 confidence, two-tailed). Second, a 42-year-old woman with chronic alcoholism
completes the WMS-III after 1 month and 12 months of abstinence. Her General Memory
Index improves from 80 to 88. The clinician cannot be confident that her change in
performance is "real"; that is, the change may be due to measurement error. Third, a 37-year-old man
with schizophrenia completes the WAIS-III shortly after admission to a psychiatric facility.
His Working Memory and Processing Speed Index scores were 81 and 77, respectively.
His psychiatric condition is stabilized and 12 months post discharge (15-month test-retest
interval), he obtains a Working Memory Index score of 91 and a Processing Speed Index
score of 86. The clinician can be confident that these 10- and 9-point changes are not
due to measurement error (0.80 confidence interval, two-tailed).¹ The purpose of Table 4
is to provide preliminary psychometric data that can be used to assist the psychologist's
clinical decision regarding whether a specific patient has improved, declined, or remained
stable on follow-up testing. These data are meant to supplement, rather than replace,
clinical judgment.

¹ It is entirely possible that future research will alter the interpretation of change in this example. Specifically,
this patient may be showing some influence of practice and regression to the mean on his retest scores. As this
information becomes available, the values in Table 4 can be adjusted accordingly.
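The first two clinical examples can be expressed as a simple comparison of the observed change against the relevant Table 4 value. The sketch below is an illustration only: the function name and the returned labels are assumptions, and the bands (6.95 and 9.25) are the 0.80 values for the Auditory Delayed Index in the Alzheimer's group and the General Memory Index in the chronic alcohol abuse group.

```python
def interpret_change(score_time1, score_time2, band):
    """Compare a retest difference with a measurement-error band.

    A change is treated as reliable only when its absolute size
    exceeds the band taken from the confidence-interval table.
    """
    diff = score_time2 - score_time1
    if abs(diff) <= band:
        return "no reliable change"
    return "reliable improvement" if diff > 0 else "reliable decline"


# Example 1: Auditory Delayed Index, Alzheimer's group, 0.80 band of 6.95
print(interpret_change(81, 73, 6.95))

# Example 2: General Memory Index, chronic alcohol abuse group, 0.80 band of 9.25
print(interpret_change(80, 88, 9.25))
```

As in the text, such a comparison only addresses measurement error; practice effects and regression to the mean are not modeled, so the result is a supplement to clinical judgment.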
5. Discussion
References
Chelune, G. J., Naugle, R. I., Luders, H., Sedlak, J., & Awad, I. A. (1993). Individual change after epilepsy
surgery: practice effects and base-rate information. Neuropsychology 7, 41–52.
Iverson, G. L. (1998). Interpretation of Mini-Mental State Examination scores in community-dwelling
elderly and geriatric neuropsychiatry patients. Int J Geriatr Psychiatry 13, 661–666.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: a statistical approach to defining meaningful
change in psychotherapy research. J Consult Clin Psychol 59, 12–19.
Psychological Corporation (1997). WAIS-III/WMS-III Technical Manual. San Antonio: Harcourt Brace.
Speer, D. C. (1992). Clinically significant change: Jacobson and Truax (1991) revisited. J Consult Clin
Psychol 60, 402–408.