Measuring Reading Comprehension
The five articles in this special issue blend experimental and correlational approaches in ways that exemplify advances in the contemporary assessment of reading comprehension. They illustrate how inferences about reading comprehension are determined in part by the material presented for comprehending and by the format used to assess comprehension of that material. In the future, the approaches to measuring reading comprehension represented in these articles could be further integrated through perspectives that cut across particular measurement traditions and begin to utilize multimethod modeling of the latent variables that underlie different tests of reading comprehension.
How is reading comprehension best measured? The five articles in this special is-
sue each address a set of problems that affect how reading comprehension is mea-
sured, utilizing an interesting blend of experimental and correlational methodolo-
gies. The clear consensus across these articles is that the measurement issues are
complicated, reflecting the complex, multidimensional nature of reading compre-
hension. Historically, education and psychology researchers have identified multi-
ple approaches to measurement, with different decades characterized by an emphasis on different assessment traditions. As Pearson and Hamm (2005) summarized, early research recognized that reading comprehension involves multiple components whose expression depends on the format in which the material is presented and the manner in which the person is asked to demonstrate understanding of what was read. Despite this historical emphasis, many modern approaches to the assessment of reading comprehension are unidimensional, with little variation in the material the person reads and relatively narrow response formats that do not vary within the test. Thus, some tests rely almost exclusively on a single type of material paired with a single response format.
TEXT CHARACTERISTICS
All five of these articles demonstrate that the material the participant is asked to read is a major determinant of the inferences made about the quality of comprehension. Clearly, different inferences will be made depending not only on the
difficulty of the text but also on the semantic, syntactic, and related characteristics
of the text. This is most clearly demonstrated in Deane et al.’s (2006/this issue) ar-
ticle but is also inherent in the eye movement study by Rayner et al. (2006/this is-
sue) and the explicit effort to minimize the role of certain text characteristics in
Francis et al. (2006/this issue). Deane et al.’s findings are particularly interesting,
showing that variations in text characteristics are related to standard ways of as-
sessing the readability of the passages. This research is consistent with other stud-
ies of text characteristics, most notably by Hiebert (2002) and Foorman, Francis,
Davidson, Harm, and Griffin (2004). Foorman et al. developed a computer pro-
gram for identifying lexical, semantic, and syntactic features of text in six different
commercial basal reading programs. They found considerable variability not only
in the composition of the text but also in the frequency of specific words taught in
the program. Such studies illustrate the importance of understanding text variabil-
ity as a determinant of the inferences made about reading comprehension. Cer-
tainly, having participants read text that is relatively restricted in composition lim-
its the inferences that can be made. Examinations of text characteristics should be
linked with methods derived from discourse analysis, which forms the basis for the
approach of Deane et al. to analyzing text characteristics. Millis et al. (2006/this issue) nicely demonstrate the value of latent semantic analysis (LSA) not only for understanding the nature of the passage but also for linking the reader’s comprehension of the passage back to the text itself.
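To make the LSA idea concrete, the sketch below projects a short passage and a reader’s verbal protocol into a reduced semantic space and scores their similarity. This is only an illustration, not the pipeline used by Millis et al.; the texts, the two-dimensional space, and the use of scikit-learn’s truncated SVD are assumptions chosen for brevity.

```python
# Illustrative LSA-style similarity scoring: project a passage and a
# reader's think-aloud into a reduced semantic space and compare them.
# The texts here are invented placeholders, not study materials.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

passage_sentences = [
    "Glaciers form where winter snowfall exceeds summer melting.",
    "Compressed snow slowly recrystallizes into dense glacial ice.",
    "The ice flows downhill under its own weight.",
]
think_aloud = ["Snow piles up, gets squeezed into ice, and the ice moves."]

# Term-document matrix over the passage sentences plus the protocol.
vectorizer = TfidfVectorizer()
tdm = vectorizer.fit_transform(passage_sentences + think_aloud)

# Truncated SVD yields the reduced "semantic" space used in LSA. A real
# corpus would retain a few hundred dimensions; two suffice for a toy case.
lsa = TruncatedSVD(n_components=2, random_state=0)
vectors = lsa.fit_transform(tdm)

# Cosine similarity between the protocol and each passage sentence
# indexes how closely the reader's talk tracks the text.
sims = cosine_similarity(vectors[-1:], vectors[:-1])[0]
for sentence, sim in zip(passage_sentences, sims):
    print(f"{sim: .2f}  {sentence}")
```

With realistic corpora, similarity to each sentence or proposition of the text can then be treated as a score of how fully the reader’s protocol covers the passage.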
RESPONSE FORMATS
All of the articles in this series demonstrate the importance of considering the for-
mat by which reading comprehension is assessed. This is most explicitly addressed
by Cutting and Scarborough (2006/this issue), who show differential discriminant validity, with the relative contributions of a set of predictor variables varying across reading comprehension
outcomes. Similarly, the latent variable analyses of Francis et al. (2006/this issue)
also show differential forms of discriminant validity. The assessments of reading
comprehension in these two studies included standardized tests but expanded to
experimental and discourse-level procedures. Rayner et al. (2006/this issue) utilized a method that is independent of the participant’s overt response to assess reading comprehension, showing that regressive eye movements are good indicators of difficulties in comprehension and that these difficulties are related to the complexity of the text. The use of LSA for analyzing
think-alouds in Millis et al.’s (2006/this issue) article seems particularly promis-
ing. It would be interesting to know more about how think-alouds are correlated
with performance on standardized tests of reading comprehension like those used
by Cutting and Scarborough. The use of passages from the Nelson–Denny Reading Test (Brown, Fishco, & Hanna, 1993) is interesting, although the response format was somewhat modified. Across studies, it is noteworthy that different measures of reading comprehension correlated positively even when they varied in the format by which comprehension was measured. Clearly, one or more latent variables represent the different components of reading comprehension; individual measures index them imperfectly, yet the indices remain correlated. Method bias, measurement error, and
the operation of general factors must be addressed in understanding these correla-
tions, but these components can be modeled.
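A small simulation can make this point tangible. In the sketch below, which assumes a single latent comprehension factor and three fallible tests with arbitrary loadings, the observed correlation between two tests is positive but well below unity, and the classical correction for attenuation recovers the latent relation. The test names and loadings are hypothetical.

```python
# Several imperfect tests that share one latent comprehension factor will
# correlate positively, with the observed correlation bounded by their
# reliabilities. Loadings and sample size are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
latent = rng.standard_normal(n)  # the shared comprehension factor

# Hypothetical loadings for three fallible measures of the same construct.
loadings = {"multiple_choice": 0.8, "cloze": 0.7, "retell": 0.6}
tests = {
    name: lam * latent + np.sqrt(1 - lam**2) * rng.standard_normal(n)
    for name, lam in loadings.items()  # unit-variance observed scores
}

# The observed correlation approximates the product of the loadings
# (0.8 * 0.7 = 0.56), far short of the latent correlation of 1.0.
r_obs = np.corrcoef(tests["multiple_choice"], tests["cloze"])[0, 1]

# Classical correction for attenuation: r_latent = r_obs / sqrt(rel1 * rel2),
# where each test's reliability equals its squared loading.
reliability = {name: lam**2 for name, lam in loadings.items()}
r_true = r_obs / np.sqrt(reliability["multiple_choice"] * reliability["cloze"])

print(f"observed r = {r_obs:.2f}, disattenuated r = {r_true:.2f}")
```

Real data are messier, with method variance and general factors in play, but the logic of modeling measurement error rather than ignoring it is the same.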
INDIVIDUAL DIFFERENCES
Virtually all of the articles mentioned the need for diagnostic tests of reading com-
prehension. A diagnostic test of reading comprehension would provide informa-
tion on the strengths and weaknesses of individual readers in comprehending written material. Such assessments may suggest interventions at the
classroom or individual level that could be applied to enhance reading comprehen-
sion. Much is known about teaching reading comprehension, so knowing more
about how to link specific forms of instruction with the needs of individual readers
would be helpful.
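As a minimal sketch of what such a diagnostic summary might look like, assuming item-level responses tagged by the comprehension component each item is meant to tap, one could report a per-component profile and flag candidate instructional targets. The component labels and cutoff below are hypothetical illustrations, not an established taxonomy.

```python
# Sketch of a diagnostic profile: item-level responses tagged by a
# hypothetical comprehension component, summarized per component.
from collections import defaultdict

# (component, correct) pairs for one student's test record (invented data)
responses = [
    ("literal_recall", True), ("literal_recall", True),
    ("vocabulary", True), ("vocabulary", False),
    ("inference", False), ("inference", False), ("inference", True),
]

totals = defaultdict(lambda: [0, 0])  # component -> [correct, attempted]
for component, correct in responses:
    totals[component][0] += int(correct)
    totals[component][1] += 1

WEAKNESS_CUTOFF = 0.60  # arbitrary threshold for flagging a weakness
for component, (right, n_items) in totals.items():
    prop = right / n_items
    flag = "  <-- candidate instructional target" if prop < WEAKNESS_CUTOFF else ""
    print(f"{component:15s} {prop:.2f}{flag}")
```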
Interestingly, the samples in each of these studies tended to be relatively re-
stricted in terms of the range of individual differences, with the possible exceptions
of the samples examined by Cutting and Scarborough (2006/this issue) and Fran-
cis et al. (2006/this issue). The expansion of these methods into more diverse popu-
lations would be interesting, but larger samples would be needed. In the end, it is
likely that what one would conclude about diagnostic tests of reading comprehen-
sion would depend on how reading comprehension is assessed. The notion that in-
dividual differences can be understood with a single procedure is ques-
tionable. Adequate assessments of reading comprehension will have to rely on
multiple procedures that manipulate both the material that is read and the response
format. It is likely that the assessment will need to be expanded beyond just what
might happen in a psychoeducational assessment or a group-based high-stakes as-
sessment and take into account comprehension as it actually occurs in the class-
room. Far too much may have been made of the idea that reading comprehension is
an interaction between the reader and the text that occurs in a situated context (RAND Reading Study Group, 2002), but
these are nonetheless relevant considerations when it comes to understanding indi-
vidual differences. It may be useful to incorporate observational techniques that in-
volve systematic querying of teachers about the quality of a particular student’s
reading comprehension.
This need highlights the importance of integrating research across studies of the sort represented in this special issue and of approaching reading comprehension from a multimethod perspective. In such a framework, the full range of underlying latent variables can be assessed, and analytic methods like those used by Francis et al. (2006/this issue) can be applied to understand relations among the constructs at the level of the latent variables rather than simply at the level of the observed variables.
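A brief simulation illustrates why observed correlations alone can mislead. Assuming two correlated latent skills, each measured under two response formats with a shared method factor per format, correlations between different skills measured by the same method are inflated above what the traits alone imply; a model representing both trait and method factors can separate the two. All names and loadings below are arbitrary illustrations, not estimates from the studies discussed.

```python
# Multitrait-multimethod logic: observed scores mix trait variance with
# method (response-format) variance. Simulated, not real, data.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two latent traits with a true correlation of .5, plus one independent
# method factor per response format.
decoding, comprehension = rng.multivariate_normal(
    [0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
mc_method, cloze_method = rng.standard_normal((2, n))

def measure(trait, method, lt=0.7, lm=0.4):
    """Observed score = trait part + method part + unique error."""
    resid = np.sqrt(1 - lt**2 - lm**2)
    return lt * trait + lm * method + resid * rng.standard_normal(n)

dec_mc = measure(decoding, mc_method)
comp_mc = measure(comprehension, mc_method)
dec_cl = measure(decoding, cloze_method)
comp_cl = measure(comprehension, cloze_method)

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

# Same trait, different methods: about .7 * .7 = .49.
print("decoding MC vs decoding cloze:      ", round(r(dec_mc, dec_cl), 2))
# Different traits, same method: about .7*.7*.5 + .4*.4 = .41, inflated
# above the .245 that the trait correlation alone would produce.
print("decoding MC vs comprehension MC:    ", round(r(dec_mc, comp_mc), 2))
# Different traits, different methods: about .7*.7*.5 = .245.
print("decoding MC vs comprehension cloze: ", round(r(dec_mc, comp_cl), 2))
```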
CONCLUSIONS
If the goal is to develop diagnostic tests, which was mentioned by the authors of each
of the five articles, then approaches to the assessment of reading comprehension
need to incorporate multiple indicators to enhance the precision with which the under-
lying latent variables are measured. Otherwise, the results may have what the con-
struct validity world has termed a mono-operation bias (Cook & Campbell, 1979).
Although some argue that issues in assessing reading comprehension are so complex
that no psychometric approach can ever be adequate, the real issue is attempting to
model the complexity through experimental and correlational techniques. We need
multimethod research that moves beyond the unidimensional approaches that char-
acterize most contemporary approaches to measurement and looks at relations at the
latent variable level. Cook and Campbell identified construct underrepresentation,
in which a single variable does not adequately index the underlying constructs, as a
major factor limiting inferences about complex human behaviors. They noted that
the fundamental problem in construct validity research was “that the operations
which are meant to represent a particular cause or effect construct can be construed in
terms of more than one construct” (p. 59) or are underidentified because of mono-op-
eration bias. Research that isolates specific factors in reading comprehension is im-
portant. What is also needed is research that integrates across methods and specifi-
cally attempts to identify different constructs that are specific to reading
comprehension and their relation to other constructs that make up reading and other
cognitive skills. Such approaches will move beyond mono-operation bias and lead to
assessments that capture the richness of reading comprehension. Tests based on such
analyses not only will permit better inferences about reading comprehension but
also will be diagnostic, because variability within the test will be apparent for some
readers. This variability may then be tied to differentiated instruction, which is the ultimate
purpose of assessing reading comprehension.
ACKNOWLEDGMENTS
This research was supported in part by a grant from the National Institute of Child
Health and Human Development, HD052117, Texas Center for Learning
Disabilities. I gratefully acknowledge the contributions of Rita Taylor to manuscript
preparation.
REFERENCES
Brown, J. I., Fishco, V. V., & Hanna, G. S. (1993). Nelson–Denny Reading Test. Chicago: Riverside.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cutting, L. E., & Scarborough, H. S. (2006). Prediction of reading comprehension: Relative contribu-
tions of word recognition, language proficiency, and other cognitive skills can depend on how com-
prehension is measured. Scientific Studies of Reading, 10, 277–299.
Deane, P., Sheehan, K. M., Sabatini, J., Futagi, Y., & Kostin, I. (2006). Differences in text structure and
its implications for assessment of struggling readers. Scientific Studies of Reading, 10, 257–275.
Foorman, B. R., Francis, D. J., Davidson, K. C., Harm, M. W., & Griffin, J. (2004). Variability in text
features in six grade 1 basal reading programs. Scientific Studies of Reading, 8, 167–197.
Francis, D. J., Snow, C. E., August, D., Carlson, C. D., Miller, J., & Iglesias, A. (2006). Measures of
reading comprehension: A latent variable analysis of the diagnostic assessment of reading compre-
hension. Scientific Studies of Reading, 10, 301–322.
Hiebert, E. H. (2002). Standards, assessments, and text difficulty. In A. E. Farstrup & S. J. Samuels
(Eds.), What research has to say about reading instruction (pp. 337–391). Newark, DE: International
Reading Association.
Millis, K., Magliano, J., & Todaro, S. (2006). Measuring discourse-level processes with verbal proto-
cols and latent semantic analysis. Scientific Studies of Reading, 10, 225–240.
Pearson, P. D., & Hamm, D. N. (2005). The assessment of reading comprehension: A review of prac-
tices—Past, present, and future. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehen-
sion and assessment (pp. 13–69). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
RAND Reading Study Group. (2002). Reading for understanding. Washington, DC: RAND Education.
Rayner, K., Chace, K. H., Slattery, T. J., & Ashby, J. (2006). Eye movements as reflections of compre-
hension processes in reading. Scientific Studies of Reading, 10, 241–255.
Wechsler, D. (1992). Wechsler Individual Achievement Test. San Antonio, TX: Psychological Corpo-
ration.
Wiederholt, J. L., & Bryant, B. R. (1992). Gray Oral Reading Test—3. Austin, TX: PRO-ED.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Achievement.
Itasca, IL: Riverside.