Measuring Reading Comprehension
The five articles in this special issue blend experimental and correlational approaches in ways that exemplify advances in the contemporary assessment of reading comprehension. They illustrate how inferences about reading comprehension are determined in part by the material presented for comprehending and by the format used to assess comprehension of that material. In the future, the approaches to measuring reading comprehension represented in these articles could be further integrated through perspectives that cut across particular measurement traditions and begin to utilize multimethod modeling of the latent variables that underlie different tests of reading comprehension.
How is reading comprehension best measured? The five articles in this special is-
sue each address a set of problems that affect how reading comprehension is mea-
sured, utilizing an interesting blend of experimental and correlational methodolo-
gies. The clear consensus across these articles is that the measurement issues are
complicated, reflecting the complex, multidimensional nature of reading compre-
hension. Historically, education and psychology researchers have identified multi-
ple approaches to measurement, with different decades characterized by an emphasis on different assessment traditions. As Pearson and Hamm (2005) summarized, early research recognized that reading comprehension involves multiple components whose expression depends on the format in which the material is presented and the manner in which the person is asked to demonstrate understanding of what was read. Despite this historical emphasis, many modern approaches to the assessment of reading comprehension are unidimensional, with little variation in the material the person reads and relatively narrow response formats that do not vary within the test. Thus, some tests rely almost exclusively on a single type of material paired with a single response format.
TEXT CHARACTERISTICS
All five of these articles demonstrate that the material the participant is asked to read is a major determinant of the inferences made about the quality of comprehension. Clearly, different inferences will be made depending not only on the
difficulty of the text but also on the semantic, syntactic, and related characteristics
of the text. This is most clearly demonstrated in Deane et al.’s (2006/this issue) ar-
ticle but is also inherent in the eye movement study by Rayner et al. (2006/this is-
sue) and the explicit effort to minimize the role of certain text characteristics in
Francis et al. (2006/this issue). Deane et al.’s findings are particularly interesting,
showing that variations in text characteristics are related to standard ways of as-
sessing the readability of the passages. This research is consistent with other stud-
ies of text characteristics, most notably by Hiebert (2002) and Foorman, Francis,
Davidson, Harm, and Griffin (2004). Foorman et al. developed a computer pro-
gram for identifying lexical, semantic, and syntactic features of text in six different
commercial basal reading programs. They found considerable variability not only
in the composition of the text but also in the frequency of specific words taught in
the program. Such studies illustrate the importance of understanding text variabil-
ity as a determinant of the inferences made about reading comprehension. Cer-
tainly, having participants read text that is relatively restricted in composition lim-
its the inferences that can be made. Examinations of text characteristics should be
linked with methods derived from discourse analysis, which forms the basis for the
approach of Deane et al. to analyzing text characteristics. Millis et al. (2006/this issue) nicely demonstrate the value of latent semantic analysis (LSA) not only for understanding the nature of the passage but also for linking the reader’s comprehension of the passage back to the text itself.
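To make the LSA idea concrete, the sketch below projects a short passage and a reader’s verbal protocol into a reduced semantic space and scores their similarity. This is only an illustration, not the pipeline used by Millis et al.; the texts, the two-dimensional space, and the use of scikit-learn’s truncated SVD are assumptions chosen for brevity.

```python
# Illustrative LSA-style similarity scoring: project a passage and a
# reader's think-aloud into a reduced semantic space and compare them.
# The texts here are invented placeholders, not study materials.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

passage_sentences = [
    "Glaciers form where winter snowfall exceeds summer melting.",
    "Compressed snow slowly recrystallizes into dense glacial ice.",
    "The ice flows downhill under its own weight.",
]
think_aloud = ["Snow piles up, gets squeezed into ice, and the ice moves."]

# Term-document matrix over the passage sentences plus the protocol.
vectorizer = TfidfVectorizer()
tdm = vectorizer.fit_transform(passage_sentences + think_aloud)

# Truncated SVD yields the reduced "semantic" space used in LSA. A real
# corpus would retain a few hundred dimensions; two suffice for a toy case.
lsa = TruncatedSVD(n_components=2, random_state=0)
vectors = lsa.fit_transform(tdm)

# Cosine similarity between the protocol and each passage sentence
# indexes how closely the reader's talk tracks the text.
sims = cosine_similarity(vectors[-1:], vectors[:-1])[0]
for sentence, sim in zip(passage_sentences, sims):
    print(f"{sim: .2f}  {sentence}")
```

With realistic corpora, similarity to each sentence or proposition of the text can then be treated as a score of how fully the reader’s protocol covers the passage.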
RESPONSE FORMATS
All of the articles in this series demonstrate the importance of considering the for-
mat by which reading comprehension is assessed. This is most explicitly addressed
by Cutting and Scarborough (2006/this issue), who show differential discriminant validity, with the relative contributions of a set of predictor variables varying across reading comprehension
outcomes. Similarly, the latent variable analyses of Francis et al. (2006/this issue)
also show differential forms of discriminant validity. The assessments of reading
comprehension in these two studies included standardized tests but expanded to
experimental and discourse-level procedures. Rayner et al. (2006/this issue) utilized a method that is independent of the participant’s overt response to assess reading comprehension, showing that regressive eye movements are good indicators of difficulties in comprehension and that these difficulties are related to the complexity of the text. The use of LSA for analyzing
think-alouds in Millis et al.’s (2006/this issue) article seems particularly promis-
ing. It would be interesting to know more about how think-alouds are correlated
with performance on standardized tests of reading comprehension like those used
by Cutting and Scarborough. The use of passages from the Nelson–Denny Reading Test (Brown, Fishco, & Hanna, 1993) is interesting, although the response format was somewhat modified. Across studies, it is noteworthy that different measures of reading comprehension correlated positively even when they varied in the format by which comprehension was measured. Clearly, one or more latent variables represent the different components of reading comprehension; individual measures index them imperfectly, yet the indices remain correlated. Method bias, measurement error, and
the operation of general factors must be addressed in understanding these correla-
tions, but these components can be modeled.
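A small simulation can make this point tangible. In the sketch below, which assumes a single latent comprehension factor and three fallible tests with arbitrary loadings, the observed correlation between two tests is positive but well below unity, and the classical correction for attenuation recovers the latent relation. The test names and loadings are hypothetical.

```python
# Several imperfect tests that share one latent comprehension factor will
# correlate positively, with the observed correlation bounded by their
# reliabilities. Loadings and sample size are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
latent = rng.standard_normal(n)  # the shared comprehension factor

# Hypothetical loadings for three fallible measures of the same construct.
loadings = {"multiple_choice": 0.8, "cloze": 0.7, "retell": 0.6}
tests = {
    name: lam * latent + np.sqrt(1 - lam**2) * rng.standard_normal(n)
    for name, lam in loadings.items()  # unit-variance observed scores
}

# The observed correlation approximates the product of the loadings
# (0.8 * 0.7 = 0.56), far short of the latent correlation of 1.0.
r_obs = np.corrcoef(tests["multiple_choice"], tests["cloze"])[0, 1]

# Classical correction for attenuation: r_latent = r_obs / sqrt(rel1 * rel2),
# where each test's reliability equals its squared loading.
reliability = {name: lam**2 for name, lam in loadings.items()}
r_true = r_obs / np.sqrt(reliability["multiple_choice"] * reliability["cloze"])

print(f"observed r = {r_obs:.2f}, disattenuated r = {r_true:.2f}")
```

Real data are messier, with method variance and general factors in play, but the logic of modeling measurement error rather than ignoring it is the same.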
INDIVIDUAL DIFFERENCES
Virtually all of the articles mentioned the need for diagnostic tests of reading com-
prehension. A diagnostic test of reading comprehension would provide informa-
tion on the strengths and weaknesses of individual readers in comprehending written material. Such assessments may suggest interventions at the
classroom or individual level that could be applied to enhance reading comprehen-
sion. Much is known about teaching reading comprehension, so knowing more
about how to link specific forms of instruction with the needs of individual readers
would be helpful.
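As a minimal sketch of what such a diagnostic summary might look like, assuming item-level responses tagged by the comprehension component each item is meant to tap, one could report a per-component profile and flag candidate instructional targets. The component labels and cutoff below are hypothetical illustrations, not an established taxonomy.

```python
# Sketch of a diagnostic profile: item-level responses tagged by a
# hypothetical comprehension component, summarized per component.
from collections import defaultdict

# (component, correct) pairs for one student's test record (invented data)
responses = [
    ("literal_recall", True), ("literal_recall", True),
    ("vocabulary", True), ("vocabulary", False),
    ("inference", False), ("inference", False), ("inference", True),
]

totals = defaultdict(lambda: [0, 0])  # component -> [correct, attempted]
for component, correct in responses:
    totals[component][0] += int(correct)
    totals[component][1] += 1

WEAKNESS_CUTOFF = 0.60  # arbitrary threshold for flagging a weakness
for component, (right, n_items) in totals.items():
    prop = right / n_items
    flag = "  <-- candidate instructional target" if prop < WEAKNESS_CUTOFF else ""
    print(f"{component:15s} {prop:.2f}{flag}")
```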
Interestingly, the samples in each of these studies tended to be relatively re-
stricted in terms of the range of individual differences, with the possible exceptions
of the samples examined by Cutting and Scarborough (2006/this issue) and Fran-
cis et al. (2006/this issue). The expansion of these methods into more diverse popu-
lations would be interesting, but larger samples would be needed. In the end, it is
likely that what one would conclude about diagnostic tests of reading comprehen-
sion would depend on how reading comprehension is assessed. The notion that in-
dividual differences can be understood with a single procedure is ques-
tionable. Adequate assessments of reading comprehension will have to rely on
multiple procedures that manipulate both the material that is read and the response
format. It is likely that the assessment will need to be expanded beyond just what
might happen in a psychoeducational assessment or a group-based high-stakes as-
sessment and take into account comprehension as it actually occurs in the class-
room. Far too much may have been made of the idea that reading comprehension is
an interaction between the reader and the text that occurs in a situated context (RAND Reading Study Group, 2002), but
these are nonetheless relevant considerations when it comes to understanding indi-
vidual differences. It may be useful to incorporate observational techniques that in-
volve systematic querying of teachers about the quality of a particular student’s
reading comprehension.
This need highlights the importance of integrating research across studies of the sort represented in this special issue and of approaching reading comprehension from a multimethod perspective. In such a framework, the full range of underlying latent variables can be assessed, and analytic methods like those used by Francis et al. (2006/this issue) can be applied to understand relations among the constructs at the level of the latent variables rather than simply at the level of the observed variables.
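A brief simulation illustrates why observed correlations alone can mislead. Assuming two correlated latent skills, each measured under two response formats with a shared method factor per format, correlations between different skills measured by the same method are inflated above what the traits alone imply; a model representing both trait and method factors can separate the two. All names and loadings below are arbitrary illustrations, not estimates from the studies discussed.

```python
# Multitrait-multimethod logic: observed scores mix trait variance with
# method (response-format) variance. Simulated, not real, data.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two latent traits with a true correlation of .5, plus one independent
# method factor per response format.
decoding, comprehension = rng.multivariate_normal(
    [0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
mc_method, cloze_method = rng.standard_normal((2, n))

def measure(trait, method, lt=0.7, lm=0.4):
    """Observed score = trait part + method part + unique error."""
    resid = np.sqrt(1 - lt**2 - lm**2)
    return lt * trait + lm * method + resid * rng.standard_normal(n)

dec_mc = measure(decoding, mc_method)
comp_mc = measure(comprehension, mc_method)
dec_cl = measure(decoding, cloze_method)
comp_cl = measure(comprehension, cloze_method)

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

# Same trait, different methods: about .7 * .7 = .49.
print("decoding MC vs decoding cloze:      ", round(r(dec_mc, dec_cl), 2))
# Different traits, same method: about .7*.7*.5 + .4*.4 = .41, inflated
# above the .245 that the trait correlation alone would produce.
print("decoding MC vs comprehension MC:    ", round(r(dec_mc, comp_mc), 2))
# Different traits, different methods: about .7*.7*.5 = .245.
print("decoding MC vs comprehension cloze: ", round(r(dec_mc, comp_cl), 2))
```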
CONCLUSIONS
If the goal is to develop diagnostic tests, which was mentioned by the authors of each
of the five articles, then approaches to the assessment of reading comprehension
need to incorporate multiple indicators to enhance the precision with which the under-
lying latent variables are measured. Otherwise, the results may have what the con-
struct validity world has termed a mono-operation bias (Cook & Campbell, 1979).
Although some argue that issues in assessing reading comprehension are so complex
that no psychometric approach can ever be adequate, the real issue is attempting to
model the complexity through experimental and correlational techniques. We need
multimethod research that moves beyond the unidimensional approaches that char-
acterize most contemporary approaches to measurement and looks at relations at the
latent variable level. Cook and Campbell identified construct underrepresentation,
in which a single variable does not adequately index the underlying constructs, as a
major factor limiting inferences about complex human behaviors. They noted that
the fundamental problem in construct validity research was “that the operations
which are meant to represent a particular cause or effect construct can be construed in
terms of more than one construct” (p. 59) or are underidentified because of mono-op-
eration bias. Research that isolates specific factors in reading comprehension is im-
portant. What is also needed is research that integrates across methods and specifi-
cally attempts to identify different constructs that are specific to reading
comprehension and their relation to other constructs that make up reading and other
cognitive skills. Such approaches will move beyond mono-operation bias and lead to
assessments that capture the richness of reading comprehension. Tests based on such
analyses not only will permit better inferences about reading comprehension but
also will be diagnostic, because variability within the test will be apparent for some
readers. This variability may then be tied to differentiated instruction, which is the ultimate
purpose of assessing reading comprehension.
ACKNOWLEDGMENTS
This research was supported in part by a grant from the National Institute of Child
Health and Human Development, HD052117, Texas Center for Learning
Disabilities. I gratefully acknowledge the contributions of Rita Taylor to manuscript
preparation.
REFERENCES
Brown, J. I., Fishco, V. V., & Hanna, G. S. (1993). Nelson–Denny Reading Test. Chicago: Riverside.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cutting, L. E., & Scarborough, H. S. (2006). Prediction of reading comprehension: Relative contribu-
tions of word recognition, language proficiency, and other cognitive skills can depend on how com-
prehension is measured. Scientific Studies of Reading, 10, 277–299.
Deane, P., Sheehan, K. M., Sabatini, J., Futagi, Y., & Kostin, I. (2006). Differences in text structure and
its implications for assessment of struggling readers. Scientific Studies of Reading, 10, 257–275.
Foorman, B. R., Francis, D. J., Davidson, K. C., Harm, M. W., & Griffin, J. (2004). Variability in text
features in six grade 1 basal reading programs. Scientific Studies of Reading, 8, 167–197.
Francis, D. J., Snow, C. E., August, D., Carlson, C. D., Miller, J., & Iglesias, A. (2006). Measures of
reading comprehension: A latent variable analysis of the diagnostic assessment of reading compre-
hension. Scientific Studies of Reading, 10, 301–322.
Hiebert, E. H. (2002). Standards, assessments, and text difficulty. In A. E. Farstrup & S. J. Samuels
(Eds.), What research has to say about reading instruction (pp. 337–391). Newark, DE: International
Reading Association.
Millis, K., Magliano, J., & Todaro, S. (2006). Measuring discourse-level processes with verbal proto-
cols and latent semantic analysis. Scientific Studies of Reading, 10, 225–240.
Pearson, P. D., & Hamm, D. N. (2005). The assessment of reading comprehension: A review of prac-
tices—Past, present, and future. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehen-
sion and assessment (pp. 13–69). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
RAND Reading Study Group. (2002). Reading for understanding. Washington, DC: RAND Education.
Rayner, K., Chace, K. H., Slattery, T. J., & Ashby, J. (2006). Eye movements as reflections of compre-
hension processes in reading. Scientific Studies of Reading, 10, 241–255.
Wechsler, D. (1992). Wechsler Individual Achievement Test. San Antonio, TX: Psychological Corpo-
ration.
Wiederholt, J. L., & Bryant, B. R. (1992). Gray Oral Reading Test—3. Austin, TX: PRO-ED.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Achievement.
Itasca, IL: Riverside.