On the Impact of Curriculum-Embedded Formative Assessment on Learning: A Collaboration between Curriculum and Assessment Developers
To cite this article: Richard J. Shavelson, Donald B. Young, Carlos C. Ayala, Paul R. Brandon, Erin Marie Furtak, Maria Araceli Ruiz-Primo, Miki K. Tomita, & Yue Yin (2008). On the Impact of Curriculum-Embedded Formative Assessment on Learning: A Collaboration between Curriculum and Assessment Developers. Applied Measurement in Education, 21(4), 295–314. DOI: 10.1080/08957340802347647
RESEARCH ARTICLES
Richard J. Shavelson
School of Education
Stanford University
Donald B. Young
College of Education
University of Hawaii
Carlos C. Ayala1
School of Education
Sonoma State University
Paul R. Brandon
College of Education
University of Hawaii
Erin Marie Furtak
Max Planck Institute for Human Development
1 Authorship is alphabetical following the main authors.
Correspondence should be addressed to Richard J. Shavelson, School of Education, Stanford
University, 485 Lasuen Mall, Stanford, CA 94305-3096. E-mail: [email protected]
Miki K. Tomita
School of Education
Stanford University
Yue Yin
College of Education
University of Illinois at Chicago
Assessment of and for learning has occupied center stage in education reform,
especially with the advent of the No Child Left Behind Federal legislation. This
study examined the formative function of assessment—assessment for learn-
ing—recognizing that such assessment needs to be aligned, at least in part, with
the summative function of assessment—indexing achievement against stan-
dards and progress. Black and Wiliam (1998) suggested that formative assess-
ment might very well improve student learning. Based on these ideas and our
own experience with reform science education, we hypothesized that for a small
investment of resources we might have a major impact on achievement by
embedding formative assessments in a nationally used curriculum. To this end
we created a collaboration, described here, between curriculum and assessment
developers, created embedded, formative assessments, and studied the impact
of science teachers teaching with these materials on middle-school students’
motivation, achievement, and conceptual change in a small randomized trial.
We also studied the collaboration itself with the intent of informing others who
might wish to enter into such collaboration about the potential strengths and
challenges experienced. The articles that follow in this special issue report in
detail what we did and found out; this article provides a rationale and overview
for the study.
INTRODUCTION
In the waves of reform that have swept over education in the past 45 years,
assessment has become a major policy lever for improving education through
comparisons among schools against standards (assessment’s summative
function). It has also become an instrument for improving classroom teach-
ing and learning (assessment’s formative function). Indeed, assessment,
especially assessment for improving learning, has increasingly been viewed
as an integral part of, no longer separate from, teaching. When the formative
and summative functions of assessment are aligned so that the signals about
what counts as achievement are consistent to educators, students, parents, and the
public, assessment is expected to improve student learning (e.g., Wilson &
Bertenthal, 2005).2 The research reported in this special issue focuses on the
formative function of assessment. Nevertheless, we recognize that what goes
into formative assessment for learning needs to be aligned with what policy
makers and the public hold teachers and schools accountable for in summa-
tive assessment of student achievement. When aligned, we expect not only enhanced student learning but also opportunities for students, by performing on the embedded assessments, to understand what is required of them on standardized achievement tests in high-stakes accountability environments.
The “big idea” behind this work was that if we could embed assessments in a
nationally used curriculum to help guide teaching and learning, and if these
assessments had the salutary effect on learning and achievement that research
suggested (Black & Wiliam, 1998), then for a relatively small investment
(embedding assessments) we might experience a substantial impact on learning
and achievement for large numbers of students. Moreover, assuming alignment
(at least to some degree) between curriculum assessments and standardized
assessments, students might transfer their embedded and end-of-unit test perfor-
mance skills to standardized testing situations.
Practical Influence
We did not, however, immediately arrive at this big idea. Our work grew out of
two lines of influence, one practical and one scholarly. On the practical side, we
had worked extensively with teachers and curriculum developers. We realized
that with the recognition of assessment as integral to education reform, the stage
was set for “a romance (collaboration) between curriculum reform and assess-
ment reform” (Shavelson, 1995, p. 58).
In working with science teachers, especially with teachers in an exemplary
inquiry-science elementary school (Shavelson 1995), a central question
emerged: “What do we intend students to learn in this unit?” Moreover, we
found that teachers needed end-of-unit assessments aligned with inquiry science
goals to give them a sense of what students should learn. And they also needed
mini-assessments, coordinated with the end-of-unit assessment, that they could use along the way during the unit.
2 Indeed, in some states this alignment is happening, though perhaps less optimally, out of desperation rather than by plan, as summative test-like items are provided to teachers for their use in class in response to the pressures of the NCLB federal legislation.
Scholarly Influence
The second line of influence was scholarly. This line of work addressed two
questions beyond the question, “Where do we want to go?” They are: “Where are
we? How do we get where we want to go?” The answer seemed to come in a
review of literature published by Black and Wiliam (1998) on the effects of what
they called “formative assessment” on student learning. They reported that for-
mative assessment—assessment that provided immediate feedback to students on
how to improve their learning—produced a large positive effect on students’
learning. They also noted that this kind of feedback rarely occurred in class-
rooms, that studies of teachers’ formative assessment practices were needed, and that, if classroom formative assessments were effective, the potential for improving students’ learning was substantial.
THE STUDY
If our practical experience and Black and Wiliam were on target, embedding for-
mative assessments in a science curriculum should lead to improved teaching and
student learning. To test out this hypothesis, we began a collaboration between
curriculum developers in the Curriculum Research & Development Group
(CRDG) at the University of Hawaii and assessment developers in the Stanford
Education Assessment Laboratory at Stanford University (SEAL). Our goals
were twofold: (1) to test out our hypothesis that embedding formative assess-
ments within a science curriculum—CRDG’s Foundational Approaches in Sci-
ence Teaching (Pottenger & Young, 1992)—would improve teaching and
learning; and (2) to study and evaluate the assessment development process that
emerged out of this collaborative relationship.
Before describing the program of research more fully, we note several unique
features of the work. First, through a series of iterative studies we refined the
embedded assessments. This curriculum-and-assessment development work
culminated in a final study that tested the effects of embedded assessments on
teaching and students’ learning in a small randomized field trial. Second, we went
beyond the usual definitions of science achievement as largely acquisition of
declarative and procedural knowledge and evaluated the claim that formative
assessment promotes conceptual change. We conjectured that formative assess-
ment would do so directly and possibly indirectly through enhancing student
motivation and/or achievement. Third, we studied the collaboration itself—a
“study within a study”—trying to understand its ups and downs with the intent of
informing future such attempts. We followed Cronbach and Associates’ (1980,
p. 214) idea that methods of evaluation (in our case the collaborative methods
used to develop and study formative assessment) would improve faster if we
(evaluators or researchers) could provide a retrospective perspective on the
study choices. And finally, as this special issue documents, we examined for-
mative assessment in greater depth than has been reported in previous empiri-
cal work on the topic. Suffice it to say here that the comprehensive analysis of
curriculum, the backward mapping of formative assessments onto the scientific
investigations, the link between a conception of science achievement and the
embedded assessments, and the integration of formative assessment ideas,
motivation, achievement, and conceptual change, taken together, are not found,
to our knowledge, elsewhere.
FORMATIVE ASSESSMENT
[Figure 1. The formative assessment continuum, ranging from informal and unplanned to formal and planned assessment.]
In formal, planned formative assessment, the teacher prepares assessment questions in advance of the
lesson. These questions may be general (“Why do things sink and float?”) or
more specific (“What is the relationship between mass and volume in floating
objects?”). At the right moment during class, the teacher poses these questions,
and through a discussion the teacher can learn what students know, what
evidence they have to back up their knowledge, and what different ideas need to
be discussed. This contrasts with typical classroom recitation where teachers use
simple questions to “keep the show going.”
The studies reported in this issue address a number of questions, some of which
were intentional—for example, What is the impact of formative assessment on
students’ conceptions of sinking and floating?—and some that arose in the pro-
cess of carrying out the study—for example, If assessments are to be embedded
in curricular material and used during teaching, where should they be placed?
To be a bit more explicit, the following sampling of questions arose during the
course of the study and are addressed in the articles that follow: (1) What critical
issues need to be considered in deciding where to embed formal formative
assessments? (2) What characteristics should assessment tasks have to be effec-
tive and practical as formative assessments (see Furtak & Ruiz-Primo, 2007)?
(3) How might teaching tools—such as the learning progression students move
through in reaching a scientifically justifiable explanation for sinking and float-
ing—be designed to assist teachers in interpreting students’ conceptions as they
progress through the unit embedded with formative assessments? (4) How impor-
tant is it for embedded assessment to meet high psychometric standards and what
tradeoffs are involved? (5) Where do teachers have difficulty in carrying out for-
mative assessment practices (eliciting students’ conceptions? getting students to
use evidence to justify their conceptual claims?) and how can training address
these difficulties?
Embedded assessments are intended to focus teaching and learning on the goals
of the curriculum and to provide feedback to students on how to close the gap between what they know and what they need to know (e.g., Black & Wiliam, 2004a, 2004b). What, then, do we want students to know? The answer to this question is important and needs to be consistent across lessons; otherwise the assessments will be haphazard and potentially misleading.
With this logic, we built embedded assessments following SEAL’s conceptual
framework for (science) achievement. The framework proved heuristic in gener-
ating assessments—the kinds of test-like tasks overlapped but went beyond what
is typically found in state science assessments—and presaged the 2009 NAEP
science assessment (National Assessment Governing Board, 2006). The frame-
work also presented some unexpected pitfalls.
We conceived of science achievement as involving cognition, emotion, and
motivation (e.g., Shavelson et al., 2002) but, for this study, focused directly on
cognition. Nevertheless, we also examined the impact of formative assessment on
motivation and emotion. Our working definition of science achievement
(Li, Ruiz-Primo, & Shavelson, 2006; Shavelson & Ruiz-Primo, 1999; see also
National Assessment Governing Board, 2006) involved four types of knowing and
reasoning in a subject matter (Figure 2). One type of such knowledge is “knowing
that”—declarative (factual, conceptual) knowledge. For example, knowing that
force equals mass times acceleration and being able to reason with this knowl-
edge. Achievement also involves “knowing how” to do something—procedural
(step-by-step or condition-action) knowledge and reasoning with this knowledge.
For example, procedural knowledge involves knowing how to get the mass of an
object or how to carry out and reason through a comparative investigation by
manipulating the variable of interest and controlling others. Achievement also
importantly involves “knowing why”—schematic (“mental model”) knowledge.
Such knowledge builds on and connects declarative and procedural knowledge; it
is used to reason about, predict, and explain things in nature. For example,
schematic knowledge is involved in explaining why some things sink in water and
others float. Finally, achievement involves “knowing when and where to apply” declarative, procedural, and schematic knowledge, that is, strategic knowledge and reasoning (Figure 2).
[Figure 2. Conceptual framework for science achievement: declarative, procedural, schematic, and strategic knowledge and reasoning.]
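To make the “knowing why” (schematic knowledge) example concrete, the reasoning students are expected to build toward can be written as a simple density comparison. The relation below is standard physics offered here only as an illustration; it is not a formula taken from the FAST materials or the embedded assessments:

% Illustrative only; requires amsmath for \text.
% Density is mass per unit volume; whether an object sinks or floats in a
% liquid depends on how its density compares with the density of the liquid.
\[
\rho_{\text{object}} = \frac{m_{\text{object}}}{V_{\text{object}}},
\qquad
\rho_{\text{object}} < \rho_{\text{liquid}} \Rightarrow \text{the object floats},
\qquad
\rho_{\text{object}} > \rho_{\text{liquid}} \Rightarrow \text{the object sinks}.
\]

When the two densities are equal, the object neither rises nor sinks, the “subsurface float” case that appears as a response option in the appendix items.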
3 Concept maps are networks in which the nodes are key concept terms (e.g., mass, volume, density); students are asked to connect pairs of terms with arrows showing the direction of the relationship and to label each arrow to say how the two terms go together.
SEAL and CRDG jointly developed and evaluated the assessments. An Assess-
ment Development Team (for details, see Ayala et al., this issue)—composed of teachers, a scientist, science educators, and assessment developers—was
responsible for providing a blueprint for the embedded assessments, following
our framework (Figure 2). The SEAL staff translated the blueprint into embed-
ded assessments and CRDG staff reviewed them and suggested revisions. The
initial blueprint and the draft assessments underwent repeated pilot testing. In
the major pilot test, three teachers were briefly trained both in what embedded
assessments are and how to use them. At the end of a six-month tryout, the
assessment blueprint and assessments were completely revised, as was teacher
training in the use of formative assessments (see Ayala et al. and Brandon
et al., this issue).
We evaluated the impact of the final version of the embedded assessments, called
“Reflective Lessons,” on teaching and student outcomes in a small randomized
trial (Figure 3; see Yin et al., this issue; Furtak et al., this issue).
Participants
Twelve experienced FAST teachers, identified by CRDG as expert in teaching
FAST and drawn from across the United States, participated in the experiment;
they varied considerably in backgrounds (Table 1). All but one teacher in the
study taught more than one section of the FAST program. Although we examined
the results of the experiment on all teachers’ classes, in this special issue we
concentrate on “focus” classes—classes in which we videotaped each and every
lesson in order to study the implementation of the formative assessment
“treatment.” The results reported here are the same as those we obtained with all classes (see Yin, 2005).
[Table 1. Background of the experimental- and control-group teachers: total years teaching, years teaching science, years teaching FAST, number of times teaching FAST I (in some schools FAST I is taught more than once a year), and teaching load (some teachers also taught non-FAST classes).]
Experimental Design
Teacher training
Both groups received three days of common training that included (a) an orientation to the study, (b) an exchange of how they approached the FAST physical science investigations and the specially developed exercises they used to teach about buoyancy, (c) how to use reporting tools (e.g., logs), and (d) how to set up and use video cameras in their classrooms.
The experimental group teachers were additionally trained in the use of for-
mative assessments (see Ayala et al., this issue, for the substance of the training).
For each assessment suite (called “Reflective Lessons”) embedded at a critical
joint, the following cycle of training was provided. Teachers first participated as
students as project staff administered a Reflective Lesson. They then discussed
the Reflective Lesson among themselves and staff, noting the procedural skills
needed as well as the role of eliciting students’ conceptions and using those con-
ceptions to build an empirically justifiable knowledge claim. Next, they worked
in small groups and taught one another with the Reflective Lessons and received
feedback from peers and staff. Then they taught a small group of students study-
ing buoyancy in CRDG’s summer school program, receiving feedback from
peers and staff, as well as from students! Finally, they reflected on how to
improve their administration of and teaching with Reflective Lessons.
Testing
The achievement/conceptual change assessment included questions focusing
on declarative, procedural, and schematic knowledge (drawing more or less on
strategic knowledge). Example achievement/conceptual change items are
included in the Appendix. Motivation and constructed response items are pre-
sented in individual papers as appropriate.
The remainder of this special issue contains four articles. They report on the
formative assessment development and teacher training processes; the evaluation of formative assessment’s impact on student motivation and learning, and possible interpretations of those findings; the fidelity with which the formative-assessment treatment was implemented, and its possible interpretations; and the lessons learned.
The second article (Ayala et al.), “From Formative Assessment to Reflective
Lessons to Preparing Teachers to Use Reflective Lessons,” describes the process
of embedded assessment development, tying the embedded and end-of-unit
assessments to the FAST program and a learning trajectory based on FAST and
research on cognitive development. We found that developing embedded assessments took a great deal of care, that teachers’ preconceptions about assessment influ-
enced how they used embedded assessments, and that even in-depth training did
not necessarily overcome teachers’ preconceptions about assessment and inquiry
science teaching.
The third article (Yin et al.), “On the Measurement and Impact of Formative
Assessment on Students’ Learning and Motivation,” evaluates the instruments
used in the study and reports the results of the randomized experiment. The find-
ings, highlighting the importance of teachers in the formative-assessment equa-
tion, showed large variability in teachers’ practices, regardless of treatment
condition, which in turn impacted student outcomes.
The fourth article (Furtak et al.), “On the Fidelity of Implementing Formative Embedded Assessments and Its Relation to Student Learning,” links the findings of large teacher effects on student learning with teachers’
classroom assessment practices. Of particular importance is the contrast in
inquiry teaching practices, regardless of treatment condition, and the teaching
strategies that distinguished more and less effective formative assessment
practices.
The fifth article (Brandon et al.), “On Lessons Learned and Future Collaborations,” draws together the CRDG-SEAL collaborative experiences with the intent of informing and improving such efforts, and concludes with recommendations for future collaborations.
CONCLUDING COMMENTS
ACKNOWLEDGMENTS
This research was supported, in part, by a grant from the National Science Founda-
tion (NSF/Award ESI-0095520) and, in part, by the National Center for Research on
Evaluation, Standards, and Testing (CRESST/Award 0070 G CC908-A-10).
REFERENCES
Black, P. J., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–73.
Black, P., & Wiliam, D. (2004a). Classroom assessment is not (necessarily) formative assessment
(and vice-versa). In M. Wilson (Ed.), Towards a coherence between classroom assessment and
accountability (pp. 183–188). Chicago: University of Chicago Press.
Black, P., & Wiliam, D. (2004b). The formative purpose: Assessment must first promote learning. In
M. Wilson (Ed.), Towards coherence between classroom assessment and accountability (pp. 20–50).
Chicago: University of Chicago Press.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics
problems by experts and novices. Cognitive Science, 5, 121–152.
Cronbach, L. J. & Associates (1980). Toward reform of program evaluation. San Francisco:
Jossey-Bass.
Furtak, E. M., & Ruiz-Primo, M. A. (2007). Studying the effectiveness of four types of formative assess-
ment prompts in providing information about students' understanding in writing and discussions.
Paper presented at the American Educational Research Association Annual Meeting, Chicago, IL.
Li, M., Ruiz-Primo, M. A., & Shavelson, R. J. (2006). Towards a science achievement framework:
The case of TIMSS 1999. In S. Howie & T. Plomp (Eds.), Contexts of learning mathematics and
science: Lessons learned from TIMSS (pp. 291–311). London: Routledge.
National Assessment Governing Board (2006). Science assessment and item specifications for the
2009 National Assessment of Educational Progress (pre-publication ed.). Washington, DC: Author.
Pottenger, F. M., III, & Young, D. B. (1992). The local environment: FAST 1, Foundational approaches in science teaching. Honolulu: University of Hawaii Curriculum Research & Development Group.
Ruiz-Primo, M. A., & Shavelson, R. J. (1996). Problems and issues in the use of concept maps in
science assessment. Journal of Research in Science Teaching, 33(6), 569–600.
Shavelson, R. J. (2006). On the integration of formative assessment in teaching and learning: Implica-
tions for new pathways in teacher education. In F. Oser, F. Achtenhagen, & U. Renold (Eds.),
Competence-oriented teacher training: Old research demands and new pathways (pp. 63–78).
Utrecht, The Netherlands: Sense Publishers.
Shavelson, R. J. (1995). On the romance of science curriculum and assessment reform in the United
States. In D. K. Sharpes & A.-L. Leino (Eds.), The dynamic concept of curriculum: Invited papers
to honour the memory of Paul Hellgren (pp. 57–76). (Research Bulletin 90). Finland: University of
Helsinki, Department of Education.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1991). Performance assessment in science. Applied
Measurement in Education, 4(4), 347–362 (Special Issue, R. Stiggins & B. Plake, Guest Editors).
Shavelson, R. J., Roeser, R. W., Kupermintz, H., Lau, S., Ayala, C., Haydel, A., Schultz, S., Quihuis, G., &
Gallagher, L. (2002). Richard E. Snow’s remaking of the concept of aptitude and multidimensional
test validity: Introduction to the special issue. Educational Assessment, 8(2), 77–100.
Shavelson, R. J., & Ruiz-Primo, M. A. (1999). On the psychometrics of assessing science understand-
ing. In J. J. Mintzes, J. H. Wandersee, & J. D. Novak (Eds.), Assessing science understanding: A
human constructivist view (pp. 303–341). New York: Academic Press.
Shavelson, R. J., Yin, Y., Furtak, E. M., Ruiz-Primo, M. A., Ayala, C. C., Young, D. B., Tomita, M. K.,
Brandon, P. R., & Pottenger, F. (2008). On the role and impact of formative assessment on science
inquiry teaching and learning. In J. E. Coffey, R. Douglas, & C. Stearns (Eds.), Assessing science
learning: Perspectives from research and practice (pp. 21–36). Washington, DC: NSTA Press.
Wilson, M. (Ed.) (2004). Towards a coherence between classroom assessment and accountability.
Chicago: University of Chicago Press.
Wilson, M. R., & Bertenthal, M. W. (Eds.) (2005). Systems for state science assessment. Washington,
DC: National Academies Press.
Yin, Y. (2005). The influence of formative assessment on student motivation, achievement, and
conceptual change. Unpublished doctoral dissertation. Stanford, CA: Stanford University.
APPENDIX
Example Multiple-Choice Achievement Test Items
[Two example items are shown in the original article. The first presents liquids A, B, and C with densities DliquidA, DliquidB, and DliquidC and asks students to choose among options W, X, Y, and Z. The second depicts two containers, A and B, each with an inside and an outside; the object sinks in A, and the item asks whether in B it will (A) sink, (B) float, (C) subsurface float, or (D) whether the student is not sure.]
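To illustrate the kind of schematic reasoning such items are meant to elicit, here is a worked example of our own with hypothetical numbers; it is not an item from the FAST assessments:

% Hypothetical object: mass 60 g, volume 80 cm^3 (illustrative values only).
\[
\rho_{\text{object}} = \frac{m}{V} = \frac{60\ \text{g}}{80\ \text{cm}^{3}} = 0.75\ \text{g/cm}^{3}.
\]
% The object floats in any liquid denser than 0.75 g/cm^3 (e.g., water at about
% 1.0 g/cm^3), sinks in any liquid less dense than that, and subsurface floats
% (neutral buoyancy) when the two densities are equal.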