WASHBACK IN LANGUAGE TESTING
Research Contexts and Methods
Edited by
Liying Cheng
Queen’s University
Yoshinori Watanabe
Akita National University
With
Andy Curtis
Queen’s University
Washback in language testing : research contexts and methods / edited by Liying Cheng,
Yoshinori J. Watanabe, with Andy Curtis.
p. cm.
Includes bibliographical references and indexes.
ISBN 0-8058-3986-0 (cloth : alk. paper) — ISBN 0-8058-3987-9 (pbk. : alk. paper)
1. English language—Study and teaching—Foreign speakers. 2. Language and
languages—Ability testing. 3. English language—Ability testing. 4. Test-taking skills.
I. Cheng, Liying, 1959– II. Watanabe, Yoshinori J., 1956– III. Curtis, Andy.
PE1128.A2W264 2003
428′.0076—dc22 2003061785
CIP
Contents

Foreword ix
Preface xiii
References 211
Foreword

J. Charles Alderson
Lancaster University
Washback, and the impact of tests more generally, has become a major area
of study within educational research, and within language testing in particular,
as this volume testifies, so I am particularly pleased to welcome this book,
and to see the range of educational settings represented in it. Exactly ten
years ago, Dianne Wall and I published an article in the journal Applied Lin-
guistics which asked the admittedly somewhat rhetorical question: “Does
Washback Exist?” In that article, we noted the widespread belief that tests
have impact on teachers, classrooms, and students, we commented that
such impact is usually perceived to be negative, and we lamented the ab-
sence of serious empirical research into a phenomenon that was so widely
believed to exist. Hence, in part, our title: How do we know it exists if there
is no research into washback? Ten years on, and a slow accumulation of
empirical research later, I believe there is no longer any doubt that wash-
back does indeed exist. But we now know that the phenomenon is a hugely
complex matter, and very far from being a simple case of tests having nega-
tive impact on teaching. The question today is not “does washback exist?”
but rather: What does washback look like? What brings washback
about? Why does washback exist?
We now know, for instance, that tests will have more impact on the con-
tent of teaching and the materials that are used than they will on the
teacher’s methodology. We know that different teachers will teach to a par-
ticular test in very different ways. We know that some teachers will teach to
very different tests in very similar ways. We know that high-stakes tests—
REFERENCES
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. Lan-
guage Testing, 13, 280–297.
Fullan, M. G., with Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.). Lon-
don: Cassell.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241–256.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. J. Baumgardner (Eds.), ESP
in the classroom: Practice and evaluation (pp. 98–107). London: Modern English.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education
and from innovation theory. Language Testing, 13, 334–354.
Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Ency-
clopedia of language and education: Vol. 7. Language testing and assessment (pp. 291–302).
Dordrecht: Kluwer Academic.
Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case study using in-
sights from testing and innovation theory. Unpublished doctoral dissertation, Lancaster Uni-
versity, UK.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language
Testing, 10, 41–69.
Preface
We live in a testing world. Our education system is awash with high-stakes
testing of various kinds, be it standardized, multiple-choice testing or portfolio as-
sessment. Washback, a term commonly used in applied linguistics, refers to
the influence of language testing on teaching and learning. The extensive
use of examination scores for various educational and social purposes in
society nowadays has made the washback effect a distinct educational phe-
nomenon. This is true both in general education and in teaching English as
a second/foreign language (ESL/EFL), from Kindergarten to Grade 12 class-
rooms to the tertiary level. Washback is a phenomenon that is of inherent
interest to teachers, researchers, program coordinators/directors, policy-
makers, and others in their day-to-day educational activities.
Despite the importance of this issue, however, it is only recently that researchers
have become aware of the need to investigate this phenomenon empirically.
There are only a limited number of chapters in books and papers in journals,
with the notable exception of a special issue on washback in the journal
Language Testing (edited by J. C. Alderson and D. Wall, 1996). Once the
washback effect has been examined in the light of
empirical studies, it can no longer be taken for granted that where there is a
test, there is a direct effect. The small body of research to date suggests
that washback is a highly complex phenomenon, and it has already been es-
tablished that simply changing test contents or methods will not necessar-
ily bring about direct and desirable changes in education as intended
through a testing change. Rather, various factors within a particular educa-
The purpose of the present volume, then, is twofold: first, to update teachers,
researchers, policymakers/administrators, and others on what is involved in
this complex issue of testing and its effects, and on how such a phenomenon
can benefit teaching and learning; and second, to provide researchers with
models of research studies on which future studies can be based. In order
to address these two main purposes, the volume consists of two parts. Part
I provides readers with an overall view of the complexity of washback, and
the various contextual factors entangled within testing, teaching, and learn-
ing. Part II provides a collection of empirical washback studies carried out
in many different parts of the world, which lead the readers further into the
heart of the issue within each educational context.
Chapter 1 discusses washback research conducted in general education,
and in language education in particular. The first part of the chapter re-
views the origin and the definition of this phenomenon. The second exam-
ines the complexity of the positive and negative influence of washback, and
the third explores its functions and mechanisms. The last part of the chap-
ter looks at the concept of bringing about changes in teaching and learning
through changes in testing.
Chapter 2 provides guidance to researchers by illustrating the process
that the author followed to investigate the effects of the Japanese univer-
sity entrance examinations. Readers are also introduced to the method-
ological aspects of the second part of this volume.
ACKNOWLEDGMENTS
The volume could not have been completed without the contributions of a
group of dedicated researchers who are passionate about washback re-
search. We thank you all for going through the whole process with us in
bringing this book to the language testing and assessment community. We
are grateful to so many individuals.
Finally, our greatest thanks go to our families, for their patience, encourage-
ment, and support while we were working on this book.
—Liying Cheng
—Yoshinori Watanabe
—Andy Curtis
About the Authors
THE EDITORS
change and innovation in language education, and he has worked with lan-
guage teachers and learners in Europe, Asia, North, South, and Central
America.
THE CONTRIBUTORS
Stephen Andrews heads the language and literature division in Hong Kong
University’s Faculty of Education. He has extensive involvement in assessment
and was previously Head of the TEFL Unit at the University of Cambridge
Local Examinations Syndicate. He has been involved in washback research
for more than 10 years.
Tammi Chun is the Director of Project Evaluation for Gaining Early Awareness
and Readiness for Undergraduate Programs (GEAR UP) at the University of
Hawai’i at Manoa. Her research includes the study of the implementation of
standards-based reform, including assessment, accountability, and instructional
guidance policies, in the United States.
Irit Ferman is an instructor and English Center Director at the English
Department, Levinsky College of Education, Tel-Aviv, Israel. She graduated
with distinction from the Language Education Program, School of Education,
Tel-Aviv University, in 1998. Her washback-related research has focused on the
impact of tests on EFL teaching–learning–assessment practices and the
perceptions of those involved.
Nick Saville is Director of Research and Validation for Cambridge ESOL
Examinations, where he coordinates the research and validation program. He
has worked on several impact studies, including the IELTS impact project
reported in this volume, and a study of the impact of the Progetto Lingue
2000 in Italy.
PART I

CONCEPTS AND METHODOLOGY OF WASHBACK
CHAPTER 1

Washback or Backwash: A Review of the Impact of Testing
on Teaching and Learning
Liying Cheng
Andy Curtis
Queen’s University
1 In this chapter, the terms “test” and “examination” are used interchangeably to refer to the
use of assessment by means of a test or an examination.
tion. Research in language testing has centered on whether and how we as-
sess the specific characteristics of a given group of test takers and whether
and how we can incorporate such information into the ways in which we
design language tests. One of the most important theoretical developments
in language testing in the past 30 years has been the realization that a lan-
guage test score represents a complex of multiple influences. Language test
scores cannot be interpreted simplistically as an indicator of the particular
language ability we think we are measuring. The scores are also affected by
the characteristics and contents of the test tasks, the characteristics of the
test takers, the strategies test takers employ in attempting to complete the
test tasks, as well as the inferences we draw from the test results. These fac-
tors undoubtedly interact with each other.
Nearly 20 years ago, Alderson (1986) identified washback as a distinct—
and at that time emerging—area within language testing, to which we
needed to turn our attention. Alderson (1986) discussed the “potentially
powerful influence” of tests (p. 104) and argued for innovations in the lan-
guage curriculum through innovations in language testing (also see Wall,
1996, 1997, 2000). At around the same time, Davies (1985) was asking
whether tests should necessarily follow the curriculum, and suggested that
perhaps tests ought to lead and influence the curriculum. Morrow (1986) ex-
tended the use of washback to include the notion of washback validity,
which describes the relationship between testing, and teaching and learn-
ing (p. 6). Morrow also claimed that “. . . in essence, an examination of
washback validity would take testing researchers into the classroom in or-
der to observe the effects of their tests in action” (p. 6). This has important
implications for test validity.
Looking back, we can see that examinations have often been used as a
means of control, and have been with us for a long time: a thousand years
or more, if we include their use in Imperial China to select the highest offi-
cials of the land (Arnove, Altbach, & Kelly, 1992; Hu, 1984; Lai, 1970). Those
examinations were probably the first civil service examinations ever devel-
oped. To avoid corruption, all essays in the Imperial Examination were
marked anonymously, and the Emperor personally supervised the final
stage of the examination. Although the goal of the examination was to se-
lect civil servants, its washback effect was to establish and control an edu-
cational program, as prospective mandarins set out to prepare themselves
for the examination that would decide not only their personal fate but also
influence the future of the Empire (Spolsky, 1995a, 1995b).
The use of examinations to select for education and employment has
also existed for a long time. Examinations were seen by some societies as
ways to encourage the development of talent, to upgrade the performance
of schools and colleges, and to counter, to some degree, nepotism, favorit-
ism, and even outright corruption in the allocation of scarce opportunities
(Bray & Steward, 1998; Eckstein & Noah, 1992). If the initial spread of exami-
nations can be traced back to such motives, the very same reasons appear
to be as powerful today as ever they were. Linn (2000) classified the use of
tests and assessments as key elements in relation to five waves of educa-
tional reform over the past 50 years: their tracking and selecting role in the
1950s; their program accountability role in the 1960s; minimum competency
testing in the 1970s; school and district accountability in the 1980s; and the
standards-based accountability systems in the 1990s (p. 4). Furthermore, it
is clear that tests and assessments continue to play a critical role in
education into the new millennium.
In spite of this long and well-established place in educational history, the
use of tests has constantly been subject to criticism. Nevertheless, tests
continue to occupy a leading place in the educational policies and practices
of a great many countries (see Baker, 1991; Calder, 1997; Cannell, 1987;
Cheng, 1997, 1998a; Heyneman, 1987; Heyneman & Ransom, 1990; James,
2000; Kellaghan & Greaney, 1992; Li, 1990; Macintosh, 1986; Runte, 1998;
Shohamy, 1993a; Shohamy, Donitsa-Schmidt, & Ferman, 1996; Widen et al.,
1997; Yang, 1999; and chapters in Part II of this volume). These researchers,
and others, have, over many years, documented the impact of testing on
school and classroom practices, and on the personal and professional lives
and experiences of principals, teachers, students, and other educational
stakeholders.
Aware of the power of tests, policymakers in many parts of the world
continue to use them to manipulate their local educational systems, to con-
trol curricula and to impose (or promote) new textbooks and new teaching
methods. Testing and assessment are “the darling of the policy-makers”
(Madaus, 1985a, 1985b), despite the fact that they have been the focus of
controversy for as long as they have existed. One reason for their longevity
in the face of such criticism is that tests are viewed as the primary tools
through which changes in the educational system can be introduced with-
out having to change other educational components such as teacher training
or curricula. Shohamy (1992) originally noted that “this phenomenon
[washback] is the result of the strong authority of external testing and the
major impact it has on the lives of test takers” (p. 513). Later Shohamy et al.
(1996; see also Stiggins & Faires-Conklin, 1992) expanded on this position
thus:
the power and authority of tests enable policy-makers to use them as effec-
tive tools for controlling educational systems and prescribing the behavior of
those who are affected by their results—administrators, teachers and stu-
dents. School-wide exams are used by principals and administrators to en-
force learning, while in classrooms, tests and quizzes are used by teachers to
impose discipline and to motivate learning. (p. 299)
One example of these beliefs about the legislative power and authority of
tests was seen in 1994 in Canada, where a consortium of provincial minis-
ters of education instituted a system of national achievement testing in the
areas of reading, language arts, and science (Council of Ministers of Educa-
tion, Canada, 1994). Most of the provinces now require students to pass
centrally set school-leaving examinations as a condition of school gradua-
tion (Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Lock, 2001;
Runte, 1998; Widen, O’Shea, & Pye, 1997).
Petrie (1987) concluded that “it would not be too much of an exaggera-
tion to say that evaluation and testing have become the engine for imple-
menting educational policy” (p. 175). The extent to which this is true de-
pends on the different contexts, as shown by those explored in this volume,
but a number of recurring themes do emerge. Examinations of various
kinds have been used for a very long time for many different purposes in
many different places. There is a set of relationships, planned and un-
planned, positive and negative, between teaching and testing. These two
facts mean that, although washback has only been identified relatively re-
cently, it is likely that washback effects have been occurring for an equally
long time. These teaching–testing relationships are also likely to become
closer and more complex in the future. It is therefore essential
that the education community work together to understand and evaluate
the effects of the use of testing on all of the interconnected aspects of teach-
ing and learning within different education systems.
a test’s validity” (p. 116). The washback effect should, therefore, refer to the
effects of the test itself on aspects of teaching and learning.
The fact that there are so many other forces operating within any educational
context, which also contribute to or ensure the washback effect on
teaching and learning, has been demonstrated in several washback studies
(e.g., Anderson et al., 1990; Cheng, 1998b, 1999; Herman, 1992; Madaus, 1988;
Smith, 1991a, 1991b; Wall, 2000; Watanabe, 1996a; Widen et al., 1997). The key
issue here is how those forces within a particular educational context can
be teased out to understand the effects of testing in that environment, and
how confident we can be in formulating hypotheses and drawing conclu-
sions about the nature and the scope of the effects within broader educa-
tional contexts.
Negative Washback
Tests in general, and perhaps language tests in particular, are often criti-
cized for their negative influence on teaching—so-called “negative wash-
back”—which has long been identified as a potential problem. For example,
nearly 50 years ago, Vernon (1956) claimed that teachers tended to ignore
subjects and activities that did not contribute directly to passing the exam,
and that examinations “distort the curriculum” (p. 166). Wiseman (1961) be-
lieved that paid coaching classes, which were intended to prepare students
for exams, were not a good use of time, because students were
practicing exam techniques rather than language learning activities (p.
159), and Davies (1968) believed that testing devices had become teaching
devices; that teaching and learning was effectively being directed to past
examination papers, making the educational experience narrow and unin-
teresting (p. 125).
More recently, Alderson and Wall (1993) referred to negative washback
as the undesirable effect on teaching and learning of a particular test
deemed to be “poor” (p. 5). Alderson and Wall’s “poor” here means “some-
thing that the teacher or learner does not wish to teach or learn.” The tests
may well fail to reflect the learning principles or the course objectives to
which they are supposedly related. In reality, teachers and learners may
end up teaching and learning toward the test, regardless of whether or not
they support the test or fully understand its rationale or aims.
In general education, Fish (1988) found that teachers reacted negatively
to pressure created by public displays of classroom scores, and also found
that relatively inexperienced teachers felt greater anxiety and accountabil-
ity pressure than experienced teachers, showing the influence of factors
such as age and experience. Noble and Smith (1994a) also found that high-
stakes testing could affect teachers directly and negatively (p. 3), and that
“teaching test-taking skills and drilling on multiple-choice worksheets is
Positive Washback
As in most areas of language testing, for each argument in favor of or opposed
to a particular position, there is a counterargument. There are, then, re-
searchers who strongly believe that it is feasible and desirable to bring
about beneficial changes in teaching by changing examinations, represent-
ing the “positive washback” scenario, which is closely related to “measure-
ment-driven instruction” in general education. In this case, teachers and
learners have a positive attitude toward the examination or test, and work
willingly and collaboratively toward its objectives.
For example, Heyneman (1987) claimed that many proponents of aca-
demic achievement testing view “coachability” not as a drawback, but
rather as a virtue (p. 262), and Pearson (1988) argued for a mutually benefi-
cial arrangement, in which “good tests will be more or less directly usable
as teaching-learning activities. Similarly, good teaching-learning tasks will
be more or less directly usable for testing purposes, even though practical
or financial constraints limit the possibilities” (p. 107). Considering the com-
plexity of teaching and learning and the many constraints beyond the
financial, such claims may sound somewhat idealistic, and even open to ac-
cusations of being rather simplistic. However, Davies (1985) maintained
that “creative and innovative testing . . . can, quite successfully, attract to it-
Traditionally, tests have come at the end of the teaching and learning proc-
ess for evaluative purposes. However, with the widespread expansion and
proliferation of high-stakes public examination systems, the direction
seems to have been largely reversed. Testing can come first in the teaching
and learning process. Particularly when tests are used as levers for change,
new materials need to be designed to match the purposes of a new test, and
school administrative and management staff, teachers, and students are
TABLE 1.1. The Trichotomy Backwash Model [table not reproduced]
turn may affect what the participants do in carrying out their work (process),
including practicing the kind of items that are to be found in the test, which
will affect the learning outcomes, the product of the work. (p. 2)
Fullan with Stiegelbauer (1991) and Fullan (1993), also in the context of inno-
vation and change, discussed changes in schools, and identified two main
recurring themes:
· All the participants who are affected by an innovation have to find their
own “meaning” for the change.
2 Performance assessment based on the constructivist model of learning is defined by Gipps
(1994) as “a systematic attempt to measure a learner’s ability to use previously acquired knowl-
edge in solving novel problems or completing specific tasks. In performance assessment, real
life or simulated assessment exercises are used to elicit original responses, which are directly
observed and rated by a qualified judge” (p. 99).
allel in terms of beliefs about how students learn and how their learning
can be best supported.
It is possible that performance-based assessment can be designed to be
so closely linked to the goals of instruction as to be almost indistinguish-
able from them. If this were achieved, then rather than being a negative
consequence, as is the case now with many existing high-stakes standard-
ized tests, “teaching to these proposed performance assessments, accepted
by scholars as inevitable and by teachers as necessary, becomes a virtue,
according to this line of thinking” (Noble & Smith, 1994b, p. 7; see also
Aschbacher, 1990; Aschbacher, Baker, & Herman, 1988; Baker, Aschbacher,
Niemi, & Sato, 1992; Wiggins, 1989a, 1989b, 1993). This rationale relates to
the debates about negative versus positive washback, discussed earlier, and
may have been one of the results of the public discontent with the quality
of schooling that led to the development of measurement-driven instruction
(Popham, Cruse, Rankin, Standifer, & Williams, 1985, p. 629). However, such
a reform strategy has been challenged: Andrews (1994, 1995), for example,
described it as a “blunt instrument” for bringing about changes in teaching
and learning, since the actual teaching and learning situation is, as discussed
earlier, far more complex than proponents of alternative assessment
appear to suggest (see also Alderson & Wall, 1993; Cheng, 1998a, 1999; Wall,
1996, 1999).
Each different educational context (including school environment, mes-
sages from administration, expectations of other teachers, students, etc.)
plays a key role in facilitating or detracting from the possibility of change,
which supports Andrews’ (1994, 1995) belief that such reform strategies
may be simplistic. More support for this position comes from Noble and
Smith (1994a), whose study of the impact of the Arizona Student Assess-
ment Program revealed “both the ambiguities of the policy-making process
and the dysfunctional side effects that evolved from the policy’s disparities,
though the legislative passage of the testing mandate obviously demon-
strated Arizona’s commitment to top-down reform and its belief that assess-
ment can leverage educational change” (pp. 1–2). The chapters in Part II of
this volume describe and explore what impact testing has had in and on
those educational contexts, what factors facilitate or detract from the possi-
bility of change derived from assessment, and the lessons we can learn
from these studies.
The relationship between testing and teaching and learning does appear
to be far more complicated and to involve much more than just the design
of a “good” assessment. There is more underlying interplay and intertwin-
ing of influences within each specific educational context where the assess-
ment takes place. However, as Madaus (1988) has shown, a high-stakes test
can lever the development of new curricular materials, which can be a posi-
tive aspect. An important point, though, is that even if new materials are
Few educators would dispute the claim that these sorts of high-stakes tests
markedly influence the nature of instructional programs. Whether they are
concerned about their own self-esteem or their students’ well being, teachers
clearly want students to perform well on such tests. Accordingly, teachers
tend to focus a significant portion of their instructional activities on the
knowledge and skills assessed by such tests. (p. 680)
It is worth pointing out here that performing well on a test does not
necessarily indicate good learning or high standards; it tells only part
of the story about the actual teaching and learning. When a new test,
whether a traditional type or an alternative type of assessment, is introduced
into an educational context as a mandate and as an accountability
measure, it is likely to produce unintended consequences (Cheng & Couture,
2000), which brings us back to Messick’s (1994) notion of consequential validity.
Teachers do not resist changes. They resist being changed (A. Kohn, per-
sonal communication, April 17, 2002). As English (1992) aptly stated, the end
point of educational change—classroom change—is in the teachers’ hands.
When the classroom door is closed and nobody else is around, the class-
room teacher can then select and teach almost any curriculum he or she
decides is appropriate, irrespective of reforms, innovations, and public ex-
aminations.
The studies discussed in this chapter highlight the importance of the
educational community’s understanding of the function of testing in relation
to the many facets of teaching and learning mentioned earlier,
and the importance of evaluating the impact of assessment-driven reform
on our teachers, students, and other participants within the educational
context. This chapter serves as the starting point, and the linking point to
other chapters in this volume, so we can examine the nature of this wash-
back phenomenon from many different perspectives (see chaps. 2 and 3)
and within many different educational contexts around the world (chaps. in
Part II).
References
Adams, R. S., & Chen, D. (1981). The process of educational innovation: An international perspective.
London: Kogan Page.
AEL. (2000). Notes from the field: KERA in the classroom. Notes from the field: Education reform in
rural Kentucky, 7(1), 1–18.
Alderson, J. C. (1986). Innovations in language testing. In M. Portal (Ed.), Innovations in language
testing: Proceedings of the IUS/NFER conference (pp. 93–105). Windsor: NFER-Nelson.
Alderson, J. C. (1990). The relationship between grammar and reading in an English for academic
purposes test battery. In D. Douglas & C. Chappelle (Eds.), A new decade of language testing
research: Selected papers from the Annual Language Testing Research Colloquium (pp. 203–219).
Alexandria, VA: Teachers of English to Speakers of Other Languages.
Alderson, J. C. (1992). Guidelines for the evaluation of language education. In J. C. Alderson & A.
Beretta (Eds.), Evaluating second language education (pp. 274–304). Cambridge, England: Cam-
bridge University Press.
Alderson, J. C., & Banerjee, J. (1996). How might impact study instruments be validated? Cambridge,
England: University of Cambridge Local Examinations Syndicate.
Alderson, J. C., & Banerjee, J. (2001). Impact and washback research in language testing. In C. El-
der, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, K. McLoughlin, & T. McNamara (Eds.),
Experimenting with uncertainty: Essays in honor of Alan Davies (pp. 150–161). Cambridge, Eng-
land: Cambridge University Press.
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. Lan-
guage Testing, 13, 280–297.
Alderson, J. C., & Scott, M. (1992). Insiders and outsiders and participatory evaluation. In J. C.
Alderson & A. Beretta (Eds.), Evaluating second language curriculum (pp. 25–60). Cambridge,
England: Cambridge University Press.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14, 115–129.
Alderson, J. C., & Wall, D. (Eds.). (1996). [Special issue]. Language Testing, 13(3).
Allwright, D., & Bailey, K. M. (1991). Focus on the language classroom: An introduction to classroom
research for language teachers. Cambridge, England: Cambridge University Press.
Amano, I. (1990). Education and examination in modern Japan (W. K. Cummings & F. Cummings,
Trans.). Tokyo: University of Tokyo Press. (Original work published 1983)
Anderson, J. O., Muir, W., Bateson, D. J., Blackmore, D., & Rogers, W. T. (1990). The impact of pro-
vincial examinations on education in British Columbia: General report. Victoria: British Colum-
bia Ministry of Education.
Andrews, S. (1994). The washback effect of examinations: Its impact upon curriculum innovation
in English language teaching. Curriculum Forum, 4(1), 44–58.
Andrews, S. (1995). Washback or washout? The relationship between examination reform and
curriculum innovation. In D. Nunan, V. Berry, & R. Berry (Eds.), Bringing about change in lan-
guage education (pp. 67–81). Hong Kong: University of Hong Kong.
Andrews, S., & Fullilove, J. (1993). Backwash and the use of English oral: Speculations on the im-
pact of a new examination upon sixth form English language testing in Hong Kong. New Hori-
zons, 34, 46–52.
Andrews, S., & Fullilove, J. (1994). Assessing spoken English in public examinations—Why and
how? In J. Boyle & P. Falvey (Eds.), English language testing in Hong Kong (pp. 57–85). Hong
Kong: Chinese University Press.
Andrews, S., & Fullilove, J. (1997, December). The elusiveness of washback: Investigating the impact
of a new oral exam on students’ spoken language performance. Paper presented at the Interna-
tional Language in Education Conference, University of Hong Kong, Hong Kong.
Andrews, S., Fullilove, J., & Wong, Y. (2002). Targeting washback: A case-study. System, 30,
207–223.
Ariyoshi, H., & Senba, K. (1983). Daigaku nyushi junbi kyoiku ni kansuru kenkyu [A study on
preparatory teaching for entrance examination]. Fukuoka Kyoiku Daigaku Kiyo, 33, 1–21.
Arnove, R. F., Altbach, P. G., & Kelly, G. P. (Eds.). (1992). Emergent issues in education: Comparative
perspectives. Albany, NY: State University of New York Press.
Aschbacher, P. R., Baker, E. L., & Herman, J. L. (Eds.). (1988). Improving large-scale assessment
(Resource Paper No. 9). Los Angeles: University of California, National Center for Research
on Evaluation, Standards, and Student Testing.
Aschbacher, P. R. (1990). Monitoring the impact of testing and evaluation innovations projects: State
activities and interest concerning performance-based assessment. Los Angeles: University of
California, National Center for Research on Evaluation, Standards, and Student Testing.
Association of Language Testers in Europe. (1995). Development and descriptive checklists for
tasks and examinations. Cambridge, England: Author.
Association of Language Testers in Europe. (1998). ALTE handbook of language examinations and
examination systems. Cambridge, England: University of Cambridge Local Examinations Syn-
dicate.
Bachman, L., Davidson, F., Ryan, K., & Choi, I. C. (Eds.). (1993). An investigation into the compara-
bility of two tests of English as a foreign language. Cambridge, England: Cambridge University
Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford, England: Oxford Uni-
versity Press.
Bachman, L. F., Purpura, J. E., & Cushing, S. T. (1993). Development of a questionnaire item bank to
explore test-taker characteristics. Cambridge, England: University of Cambridge Local Exami-
nations Syndicate.
Bailey, K. M. (1996). Working for washback: A review of the washback concept in language test-
ing. Language Testing, 13, 257–279.
Bailey, K. M. (1999). Washback in language testing. Princeton, NJ: Educational Testing Service.
Baker, E. L. (1989). Can we fairly measure the quality of education? (Tech. Rep. No. 290). Los An-
geles: University of California, Center for the Study of Evaluation.
Baker, E. L. (1991, September). Issues in policy, assessment, and equity. Paper presented at the na-
tional research symposium on limited English proficient students’ issues: Focus on evalua-
tion and measurement, Washington, DC.
Baker, E., Aschbacher, P., Niemi, D., & Sato, E. (1992). Performance assessment models: Assessing
content area explanations. Los Angeles: University of California, National Center for Research
on Evaluation, Standards, and Student Testing.
Banerjee, J. V. (1996). The design of the classroom observation instruments. Cambridge, England:
University of Cambridge Local Examinations Syndicate.
Ben-Rafael, E. (1994). Language, identity, and social division: The case of Israel. Oxford, England:
Clarendon Press.
Bergeson, T., Wise, B. J., Fitton, R., Gill, D. H., & Arnold, N. (2000). Guidelines for participation and
testing accommodations for special populations on the Washington assessment of student learn-
ing (WASL). Olympia, WA: Office of Superintendent of Public Instruction.
Berry, V., Falvey, P., Nunan, D., Burnett, M., & Hunt, J. (1995). Assessment and change in the
classroom. In D. Nunan, R. Berry & V. Berry (Eds.), Bringing about change in language educa-
tion (pp. 31–54). Hong Kong: University of Hong Kong, Department of Curriculum Studies.
Berwick, R., & Ross, S. (1989). Motivation after matriculation: Are Japanese learners of English
still alive after exam hell? Japan Association for Language Teaching Journal, 11, 193–210.
Biggs, J. B. (1992). The psychology of assessment and the Hong Kong scene. Bulletin of the Hong
Kong Psychological Society, 29, 1–21.
Biggs, J. B. (1995). Assumptions underlying new approaches to educational assessment. Curricu-
lum Forum, 4(2), 1–22.
Biggs, J. B. (Ed.). (1996). Testing: To educate or to select? Education in Hong Kong at the cross-roads.
Hong Kong: Hong Kong Educational Publishing.
Biggs, J. B. (1998). Assumptions underlying new approaches to assessment. In P. G. Stimpson &
P. Morris (Eds.), Curriculum and assessment for Hong Kong (pp. 351–384). Hong Kong: Open
University of Hong Kong Press.
Biggs, J. B. (1999). Teaching for quality learning at university. Buckingham, England: Open Univer-
sity Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy and Practice, 5(1), 7–75.
Blenkin, G. M., Edwards, G., & Kelly, A. V. (1992). Change and the curriculum. London: P. Chapman.
Blewchamp, P. (1994). Washback in TOEFL classrooms: An exploratory investigation into the influ-
ence of the TOEFL test on teaching content and methodology. Unpublished master’s thesis, Lan-
caster University, England.
Bonkowski, F. (1996). Instrument for the assessment of teaching materials. Unpublished manu-
script, Lancaster University, England.
Borko, H., & Elliott, R. (1999). “Hands-on” pedagogy vs. “hands-off” accountability: Tensions be-
tween competing commitments for exemplary teachers of mathematics in Kentucky. Phi
Delta Kappan, 80, 394–400.
Bracey, G. W. (1987). Measurement-driven instruction: Catchy phrase, dangerous practice. Phi
Delta Kappan, 68, 683–686.
Bracey, G. W. (1989). The $150 million redundancy. Phi Delta Kappan, 70, 698–702.
Bray, M., & Steward, L. (Eds.). (1998). Examination systems in small states: Comparative perspec-
tives on policies, models and operations. London: Commonwealth Secretariat.
Briggs, C. L. (1986). Learning how to ask: A sociolinguistic appraisal of the role of the interview in so-
cial science research. Cambridge, England: Cambridge University Press.
Brindley, G. (1989). Assessing achievement in the learner-centered curriculum. Sydney, Australia:
National Center for English Language Teaching and Research.
Brindley, G. (1994). Competency-based assessment in second language programs: Some issues
and questions. Prospect, 9(2), 41–55.
Broadfoot, P. (1998, April). Categories, standards and instrumentalism: Theorizing the changing dis-
course of assessment policy in English primary education. Paper presented at the annual confer-
ence of the American Educational Research Association, San Diego, CA.
Docking, R. (1993, May). Competency-based approaches to education and training: Progress and
promise. Paper presented at the annual meeting of the National Centre for English Language
Teaching and Research, Sydney, Australia.
Dore, R. P. (1976). The diploma disease. London: Allen and Unwin.
Dore, R. P. (1997). Reflections on the diploma disease twenty years later. Assessment in Educa-
tion, 4, 189–206.
Ebel, R. L. (1966). The social consequences of educational testing. In A. Anastasi (Ed.), Testing
problems in perspective (pp. 18–29). Washington, DC: American Council on Education.
Eckstein, M. A., & Noah, H. J. (Eds.). (1992). Examinations: Comparative and international studies.
Oxford, England: Pergamon Press.
Education Week. (1997, January). Quality counts ’97: A report card on the conditions of education in
50 states. Bethesda, MD: Editorial Projects in Education. Retrieved April 3, 2000, from http://
www.edweek.org/sreports/qc97/
Education Week. (1999, January 11). Quality counts ’99: Rewarding results, punishing failure.
Bethesda, MD: Editorial Projects in Education.
Education Week. (2000, January 13). Quality counts 2000: Who should teach? Bethesda, MD: Edito-
rial Project in Education.
Eisenhart, M. A., & Howe, K. R. (1992). Validity in educational research. In M. D. Lecompte, W. L.
Millroy, & J. Preissle (Eds.), Handbook of qualitative research in education (pp. 643–680). San
Diego, CA: Academic Press.
Elliott, N., & Ensign, G. (1999). The Washington assessment of student learning: An update on writing.
Olympia, WA: Office of the Superintendent of Public Instruction. Retrieved January 2, 2001,
from: http://www.k12.wa.us/assessment/assessproginfo/subdocuments/writupdate.asp
Elton, L., & Laurillard, D. (1979). Trends in student learning. Studies in Higher Education, 4, 87–102.
English, F. W. (1992). Deciding what to teach and test: Developing, aligning, and auditing the curricu-
lum. Newbury Park, CA: Corwin Press.
Erickson, F. (1986). Qualitative methods in research on teaching. In M. Wittrock (Ed.), Handbook
of research on teaching (3rd ed., pp. 119–161). New York: Macmillan.
Falvey, P. (1995). The education of teachers of English in Hong Kong: A case for special treat-
ment. In F. Lopez-Real (Ed.), Proceedings of ITEC ’95 (pp. 107–113). Hong Kong: University of
Hong Kong, Department of Curriculum Studies.
Fish, J. (1988). Responses to mandated standardized testing. Unpublished doctoral dissertation,
University of California, Los Angeles.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Re-
searcher, 18(9), 27–32.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39, 193–202.
Fullan, M. G. (1993). Change forces: Probing the depth of educational reform. London: Falmer Press.
Fullan, M. G. (1998). Linking change and assessment. In P. Rea-Dickins & K. P. Germaine (Eds.),
Managing evaluation and innovation in language teaching: Building bridges (pp. 253–262). Lon-
don: Longman.
Fullan, M., & Park, P. (1981). Curriculum implementation: A resource booklet. Toronto, Ontario,
Canada: Ontario Ministry of Education.
Fullan, M. G., with Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.). Lon-
don: Cassell.
Gardner, H. (1992). Assessment in context: The alternative to standardized testing. In B. R.
Gifford & M. C. O’Connor (Eds.), Changing assessments: Alternative views of aptitude, achieve-
ment and instruction (pp. 77–119). London: Kluwer Academic.
Geertz, C. (1973). The interpretation of culture. New York: Basic Books.
Genesee, F. (1994). Assessment alternatives. TESOL Matters, 4(5), 2.
Gifford, B. R., & O’Connor, M. C. (Eds.). (1992). Changing assessments: Alternative views of aptitude,
achievement and instruction. London: Kluwer Academic.
Gipps, C. V. (1994). Beyond testing: Toward a theory of educational assessment. London: Falmer
Press.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and
psychometrics. American Psychologist, 36, 923–936.
Glaser, R. (1990). Towards new models of assessment. International Journal for Educational Re-
search, 14, 475–483.
Glaser, R., & Bassok, M. (1989). Learning theory and the study of instruction. Annual Review of
Psychology, 40, 631–666.
Glaser, R., & Silver, E. (1994). Assessment, testing, and instruction: Retrospect and prospect (Tech.
Rep. 379). Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Cen-
ter.
Goetz, J. P., & LeCompte, M. D. (1984). Ethnography and qualitative design in educational research.
Orlando, FL: Academic Press.
Goldstein, H. (1989). Psychometric test theory and educational assessment. In H. Simons & J.
Elliot (Eds.), Rethinking appraisal and assessment (pp. 140–148). Milton Keynes, England:
Open University Press.
Grove, E. (1997, October). Accountability in competency-based language programs: Issues of curricu-
lum and assessment. Paper presented at the meeting of the Applied Linguistics Association of
Australia, Toowoomba, Australia.
Gui, S., Li, X., & Li, W. (1988). A reflection on experimenting with the National Matriculation Eng-
lish Test. In National Education Examination Authorities (Ed.), Theory and practice of stan-
dardized test (pp. 70–85). Guangzhou, China: Guangdong Higher Education Press.
Hagan, P. (1994). Competency-based curriculum: The NSW AMES experience. Prospect, 9(2),
30–40.
Hagan, P., Hood, S., Jackson, E., Jones, M., Joyce, H., & Manidis, M. (1993). Certificate in spoken
and written English (2nd ed.). Sydney, Australia: New South Wales Adult Migrant English Ser-
vice.
Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores
and the origins of test score pollution. Educational Researcher, 20(5), 2–7.
Hamp-Lyons, L. (1997). Washback, impact and validity: Ethical concerns. Language Testing, 14,
295–303.
Han, J. (1997). The educational statistics yearbook of China. Beijing, China: People’s Education
Press.
Hargreaves, A. (1994). Changing teachers, changing times: Teachers’ work and culture in the post-
modern age. London: Cassell.
Hayek, F. A. (1952). The counter-revolution of science: Studies on the abuse of reason. Indianapolis,
IN: Liberty Press.
Henrichsen, L. E. (1989). Diffusion of innovations in English language teaching: The ELEC effort in Ja-
pan, 1956–1968. New York: Greenwood Press.
Herman, J. L. (1989). Priorities of educational testing and evaluation: The testimony of the CRESST
National Faculty (Tech. Rep. 304). Los Angeles: University of California, Center for the Study
of Evaluation.
Herman, J. L. (1992). Accountability and alternative assessment: Research and development issues
(Tech. Rep. 384). Los Angeles: University of California, Center for the Study of Evaluation.
Herman, J. L., & Golan, S. (1993). The effects of standardized testing on teaching and schools. Ed-
ucational Measurement: Issues and Practice, 12(4), 20–25, 41–42.
Herman, J. L., & Golan, S. (n.d.). Effects of standardized testing on teachers and learning. Another
look (CSE Tech. Rep. 334). Los Angeles: University of California National Center for Research
on Evaluation, Standards, and Student Testing.
Herrington, R. (1996). Test-taking strategies and second language proficiency: Is there a relationship?
Unpublished master’s thesis, Lancaster University, England.
Kemmis, S., & McTaggart, R. (1988). The action research planner (3rd ed.). Melbourne, Victoria,
Australia: Deakin University Press.
Kennedy, C. (1988). Evaluation of the management of change in ELT projects. Applied Linguistics,
9, 329–342.
Khaniyah, T. R. (1990). Examinations as instruments for educational change: Investigating the
washback effect of the Nepalese English exams. Unpublished doctoral dissertation, University
of Edinburgh, Scotland.
King, R. (1997). Can public examinations have a positive washback effect on classroom teaching?
In P. Grundy (Ed.), IATEFL 31st International Annual Conference Brighton, April 1997 (pp.
33–38). London: International Association of Teachers of English as a Foreign Language.
Koretz, D., Barron, S., Mitchell, K., & Stecher, B. (1996). The perceived effects of the Kentucky in-
structional results information system (KIRIS) (Document No. MR-792-PCT/FF). Santa Monica,
CA: RAND.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment pro-
gram: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5–16.
Krashen, S. D. (1993). The power of reading. Englewood, CO: Libraries Unlimited, Inc.
Krashen, S. D. (1998). Comprehensible output? System, 26, 175–182.
Kuckartz, U. (1998). WinMax. Scientific text analysis for the social sciences: User’s guide. Thousand
Oaks, CA: Sage.
Kunnan, A. (2000). IELTS impact study project. Cambridge, England: University of Cambridge Lo-
cal Examinations Syndicate.
Lai, C. T. (1970). A scholar in imperial China. Hong Kong: Kelly & Walsh.
Lam, H. P. (1993). Washback—Can it be quantified? A study on the impact of English Examinations in
Hong Kong. Unpublished master’s thesis, University of Leeds, Leeds, England.
Latham, H. (1877). On the action of examinations considered as a means of selection. Cambridge,
England: Deighton, Bell and Company.
Lazaraton, A. (1995). Qualitative research in applied linguistics: A progress report. TESOL Quar-
terly, 29, 455–472.
LeCompte, M. D., & Preissle, J. (1993). Ethnography and qualitative design in educational research
(2nd ed.). New York: Academic Press.
LeCompte, M. D., Millroy, W. L., & Preissle, J. (1992). The handbook of qualitative research in educa-
tion. San Diego, CA: Academic Press.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language
Testing, 17, 43–64.
Li, X. (1984). In defense of the communicative approach. ELT Journal, 38(1), 2–13.
Li, X. (1988). Teaching for use, learning by use and testing through use. In H. Xiao (Ed.), Standard-
ized English test and ELT in the middle schools (pp. 80–90). Guangzhou: Guangdong Education
Press.
Li, X. (1990). How powerful can a language test be? The MET in China. Journal of Multilingual and
Multicultural Development, 11, 393–404.
Li, X., Gui, S., & Li, W. (1990). The design of the NMET and ELT in middle schools. English Lan-
guage Teaching and Research in Primary Schools and Middle Schools, 1, 1–27.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
Linn, R. L. (1983). Testing and instruction: Links and distinctions. Journal of Educational Measure-
ment, 20, 179–189.
Linn, R. L. (1992). Educational assessment: Expanded expectations and challenges (Tech. Rep. 351).
Boulder: University of Colorado at Boulder, Center for the Study of Evaluation.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Linn, R. L., & Herman, J. L. (1997, February). Standards-led assessment: Technical and policy issues
in measuring school and student progress (CSE technical report 426). Los Angeles: University of
California National Center for Research on Evaluation, Standards, and Student Testing.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expec-
tations and validation criteria. Educational Researcher, 20(8), 15–21.
Liu, Y. (Ed.). (1992). Book of major educational events in China. Hangzhou, China: Zhejiang Educa-
tion Press.
Lock, C. L. (2001). The influence of a large-scale assessment program on classroom practices. Unpub-
lished doctoral dissertation, Queen’s University, Kingston, Ontario, Canada.
London, N. (1997). A national strategy for system-wide curriculum improvement in Trinidad and
Tobago. In D. W. Chapman, L. O. Mahlck, & A. Smulders (Eds.), From planning to action: Gov-
ernment initiatives for improving school level practice (pp. 133–146). Paris: International Insti-
tute for Educational Planning.
Low, G. D. (1988). The semantics of questionnaire rating scales. Evaluation and Research in Edu-
cation, 22, 69–79.
Macintosh, H. G. (1986). The prospects for public examinations in England and Wales. In D. L.
Nuttall (Ed.), Assessing educational achievement (pp. 19–34). London: Falmer Press.
Madaus, G. F. (1985a). Public policy and the testing profession: You’ve never had it so good? Edu-
cational Measurement: Issues and Practice, 4(4), 5–11.
Madaus, G. F. (1985b). Test scores as administrative mechanisms in educational policy. Phi Delta
Kappan, 66, 611–617.
Madaus, G. F. (1988). The influence of testing on the curriculum. In L. N. Tanner (Ed.), Critical is-
sues in curriculum: Eighty-seventh yearbook of the National Society for the Study of Education (pp.
83–121). Chicago: University of Chicago Press.
Madaus, G. F., & Kellaghan, T. (1992). Curriculum evaluation and assessment. In P. W. Jackson
(Ed.), Handbook of research on curriculum (pp. 119–154). New York: Macmillan.
Maehr, M. L., & Fyans, L. J., Jr. (1989). School culture, motivation, and achievement. In M. L.
Maehr & C. Ames (Eds.), Advances in motivation and achievement: Vol. 6. Motivation enhancing
environments (pp. 215–247). Greenwich, CT: JAI Press.
Markee, N. (1993). The diffusion of innovation in language teaching. Annual Review of Applied Lin-
guistics, 13, 229–243.
Markee, N. (1997). Managing curricular innovation. Cambridge, England: Cambridge University
Press.
Marton, F., Hounsell, D. J., & Entwistle, N. J. (Eds.). (1984). The experience of learning. Edinburgh,
Scotland: Scottish Academic Press.
McCallum, B., Gipps, C., McAlister, S., & Brown, M. (1995). National curriculum assessment:
Emerging models of teacher assessment in the classroom. In H. Torrance (Ed.), Evaluating
authentic assessment: Problems and possibilities in new approaches to assessment (pp. 88–104).
Buckingham, England: Open University Press.
McEwen, N. (1995a). Educational accountability in Alberta. Canadian Journal of Education, 20,
27–44.
McEwen, N. (1995b). Introducing accountability in education in Canada. Canadian Journal of Edu-
cation, 20, 1–17.
McIver, M. C., & Wolf, S. A. (1999). The power of the conference is the power of suggestion. Lan-
guage Arts, 77, 54–61.
McNamara, T. (1996). Measuring second language performance. London: Longman.
Merriam, S. B. (1988). Case study research in education: A qualitative approach. San Francisco:
Jossey Bass.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation.
American Psychologist, 30, 955–966.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New
York: Macmillan.
Messick, S. (1992, April). The interplay between evidence and consequences in the validation of per-
formance assessments. Paper presented at the annual meeting of the National Council on
Measurement in Education, San Francisco.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance
assessments. Educational Researcher, 23(2), 13–23.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241–256.
Milanovic, M., & Saville, N. (1996). Considering the impact of Cambridge EFL examinations. Cam-
bridge, England: University of Cambridge Local Examinations Syndicate.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd
ed.). Thousand Oaks, CA: Sage.
Ministry of Education, Science and Culture. (1992). Atarashii jidai ni taio suru kyoiku no shosedo no
kaikaku—dai 14 ki chuo kyoiku shingikai toshin [Reforming educational systems for a new
era—a report from 14th conference on education]. Tokyo: Author.
Ministry of Education. (1998). Director General Bulletin. Jerusalem: English Inspectorate.
Ministry of Education. (1999). Internal document: No. 3. Beijing, China: Author.
Morris, N. (1961). An historian’s view of examinations. In S. Wiseman (Ed.), Examinations and
English education. Manchester, England: Manchester University Press.
Morris, P. (1985). Teachers’ perceptions of the barriers to the implementation of a pedagogic in-
novation: A South East Asian case study. International Review of Education, 31, 3–18.
Morris, P. (1990). Teachers’ perceptions of the barriers to the implementation of a pedagogic in-
novation. In P. Morris (Ed.), Curriculum development in Hong Kong (pp. 45–60). Hong Kong:
Hong Kong University Press.
Morris, P. (1995). The Hong Kong school curriculum: Development, issues and policies. Hong Kong:
Hong Kong University Press.
Morris, P., Adamson, R., Au, M. L., Chan, K. K., Chan, W. Y., Yuk, K. P., et al. (1996). Target oriented
curriculum evaluation project: Interim report. Hong Kong: University of Hong Kong, Faculty of
Education.
Morrow, K. (1986). The evaluation of tests of communicative performance. In M. Portal (Ed.), In-
novations in language testing: Proceedings of the IUS/NFER conference (pp. 1–13). London:
NFER/Nelson.
Mosier, C. I. (1947). A critical examination of the concepts of face validity. Educational and Psy-
chological Measurement, 7, 191–205.
Munby, J. (1978). Communicative syllabus design. Cambridge, England: Cambridge University
Press.
Nagano, S. (1984). Kyoiku hyoka ron [Evaluation of education]. Tokyo: Daiichi hoki.
National Teaching Syllabus. (1990). China: Ministry of Education, PRC.
National Teaching Syllabus. (1996). China: Ministry of Education, PRC.
New South Wales Adult Migrant English Service. (1995). Certificates I, II, III and IV in spoken and
written English. Sydney, Australia: Author.
New South Wales Adult Migrant English Service. (1997). Certificates I, II, III and IV in spoken and
written English. Sydney, Australia: Author.
Noble, A. J., & Smith, M. L. (1994a). Measurement-driven reform: Research on policy, practice, reper-
cussion (Tech. Rep. 381). Tempe, AZ: Arizona State University, Center for the Study of Evalua-
tion.
Noble, A. J., & Smith, M. L. (1994b). Old and new beliefs about measurement-driven reform: ‘The
more things change, the more they stay the same’ (Tech. Rep. 373). Tempe, AZ: Arizona State
University, Center for the Study of Evaluation.
Nolen, S. B., Haladyna, T. M., & Haas, N. S. (1992). Uses and abuses of achievement test scores.
Educational Measurement: Issues and Practice, 11(2), 9–15.
Nunan, D. (1989). Understanding language classrooms: A guide for teacher-initiated action. New
York: Prentice-Hall.
Office of the Superintendent of Public Instruction. (n.d.). The Washington assessment of student learning:
An update on writing. Retrieved February 15, 2000, from http://www.k12.wa.us/assessment/
assessproginfo/subdocuments/writupdate.asp
Ogawa, Y. (1981). Hanaseru dake ga eigo ja nai [Beyond English conversation: Making school Eng-
lish work for you]. Tokyo: Simul Press.
Oppenheim, A. N. (1992). Questionnaire design, interviewing and attitude measurement. London:
Pinter.
Oxenham, J. (Ed.). (1984). Education versus qualifications? London: Allen and Unwin.
Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on stan-
dardized achievement testing. Educational Researcher, 20(5), 12–19.
Patton, M. Q. (1987). How to use qualitative methods in evaluation. London: Sage.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. J. Baumgardner (Eds.), ESP
in the classroom: Practice and evaluation (pp. 98–107). London: Modern English.
Petrie, H. G. (1987). Introduction to evaluation and testing. Educational Policy, 1, 175–180.
Phillipson, R. (1992). Linguistic imperialism. Oxford, England: Oxford University Press.
Popham, W. J. (1983). Measurement as an instructional catalyst. In R. B. Ekstrom (Ed.), New direc-
tions for testing and measurement: Measurement, technology, and individuality in education (pp.
19–30). San Francisco: Jossey-Bass.
Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68,
679–682.
Popham, W. J. (1993). Measurement-driven instruction as a ‘quick-fix’ reform strategy. Measure-
ment and Evaluation in Counseling and Development, 26, 31–34.
Popham, W. J., Cruse, K. L., Rankin, S. C., Standifer, P. D., & Williams, P. L. (1985). Measurement-
driven instruction: It’s on the road. Phi Delta Kappan, 66, 628–634.
Por, L. H., & Ki, C. S. (1995). Methodology washback: An insider’s view. In D. Nunan, R. Berry, & V.
Berry (Eds.), Bringing about change in language education (pp. 217–235). Hong Kong: Univer-
sity of Hong Kong, Department of Curriculum Studies.
Purpura, J. (1999). Learner strategy use and performance on language tests. Cambridge, England:
Cambridge University Press.
Quinn, T. J. (1993). The competency movement, applied linguistics and language testing: Some
reflections and suggestions for a possible research agenda. Melbourne Papers in Language
Testing, 2(2), 55–87.
Quirk, R. (1995, December). The threat and promise of English. Paper presented at the Language
Planning and Policy Conference, Ramat Gan, Israel.
Read, J. (1999, July). The policy context of English testing for immigrants. Paper presented at the
Language Testing Research Colloquium, Tsukuba, Japan.
Reischauer, E. O., Kobayashi, H., & Naya, Y. (1989). Nihon no kokusaika [The internationalization
of Japan]. Tokyo: Bungei Shunju Sha.
Resnick, L. B. (1989). Toward the thinking curriculum: An overview. In L. B. Resnick & L. E.
Klopfer (Eds.), Toward the thinking curriculum: Current cognitive research (pp. 1–18). Reston,
VA: Association for Supervision and Curriculum Development.
Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educa-
tional reform. In B. R. Gifford & M. C. O’Connor (Eds.), Changing assessments: Alternative views
of aptitude, achievement and instruction (pp. 37–75). London: Kluwer Academic.
Robinson, P. (1993). Teachers facing change. Adelaide, Australia: National Center for Vocational
Education Research.
Rogers, E. M. (1983). The diffusion of innovations (3rd ed.). New York: Macmillan.
Rohlen, T. P. (1983). Japan’s high schools. Berkeley: University of California Press.
Rumsey, D. (1993, November). A practical model for assessment of workplace competence within a
competency-based system of training. Paper presented at the Testing Times Conference, Syd-
ney, NSW, Australia.
Runte, R. (1998). The impact of centralized examinations on teacher professionalism. Canadian
Journal of Education, 23, 166–181.
Saito, T., Arita, S., & Nasu, I. (1984). Tashi-sentaku tesuto ga igaku kyoiku ni oyobosu eikyo [The in-
fluence of multiple-choice test on medical education] (Nihon igaku kyoiku shinko zaidan
kenkyu jose ni yoru kenkyu hokoku sho [Technical report of the Japan Medical Education Re-
search Fund]). Okayama: Kawasaki Medical School.
Sanders, W. L., & Horn, S. P. (1995). Educational assessment reassessed: The usefulness of stan-
dardized and alternative measures of student achievement as indicators for the assessment
of educational outcomes. Education Policy Analysis Archives, 3(6), 1–15.
Saville, N. (1998). Predicting impact on language learning and the classroom. UCLES internal report.
Cambridge, England: University of Cambridge Local Examinations Syndicate.
Saville, N. (2000). Investigating the impact of international language examinations (Research Notes
No. 2). Available from University of Cambridge Local Examinations Syndicate Web site,
http://www.cambridge-efl.org/rs_notes
Scaramucci, M. (1999, July). A study of washback in Brazil. Paper presented at the Language
Testing Research Colloquium, Tsukuba, Japan.
Schiefelbein, E. (1993). The use of national assessments to improve primary education in Chile.
In D. W. Chapman & L. O. Mahlck (Eds.), From data to action: Information systems in educational
planning (pp. 117–146). Paris, France: UNESCO.
Seliger, H. W., & Shohamy, E. G. (1989). Second language research methods. Oxford, England:
Oxford University Press.
Shavelson, R. J., & Stern, P. (1981). Research on teachers’ pedagogical thoughts, judgments, deci-
sions, and behavior. Review of Educational Research, 51, 455–498.
Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Ed-
ucational Measurement: Issues and Practice, 9(3), 15–22.
Shepard, L. A. (1991a). Interview on assessment issues with Lorrie Shepard. Educational Re-
searcher, 20(2), 21–27.
Shepard, L. A. (1991b). Psychometricians’ beliefs about learning. Educational Researcher, 20(6),
2–16.
Shepard, L. A. (1992). What policy makers who mandate tests should know about the new psy-
chology of intellectual ability and learning. In B. R. Gifford & M. C. O’Connor (Eds.), Changing
assessments: Alternative views of aptitude, achievement and instruction (pp. 301–327). London:
Kluwer Academic.
Shepard, L. A. (1993). The place of testing reform in educational reform: A reply to Cizek. Educa-
tional Researcher, 22(4), 10–14.
Shepard, L. A., & Dougherty, K. C. (1991, April). Effects of high-stakes testing on instruction. Paper
presented at the annual meeting of the American Educational Research Association and the
National Council on Measurement in Education, Chicago.
Shiozawa, T. (1983). Daigaku nyushi—genjo to kadai [University entrance examinations—the pres-
ent situation and problems]. Eigo Kyoiku Sokan 30-shunen Kinen Zokango [English Teacher’s
Magazine, 30th Anniversary Special Issue], 39–41.
Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing model for assess-
ing foreign language learning. Modern Language Journal, 76, 513–521.
Shohamy, E. (1993a). The power of tests: The impact of language testing on teaching and learning
(NFLC Occasional Papers). Washington, DC: National Foreign Language Center.
Shohamy, E. (1993b). The exercise of power and control in the rhetorics of testing. Ottawa,
Canada: Carleton University, Center for Applied Language Studies, 10, 48–62.
Shohamy, E. (1997). Testing methods, testing consequences: Are they ethical? Are they fair? Lan-
guage Testing, 14, 340–349.
Shohamy, E. (1999). Language testing: Impact. In B. Spolsky (Ed.), Concise encyclopedia of educa-
tional linguistics (pp. 711–714). Oxford, England: Pergamon.
Shohamy, E. (2000). Using language tests for upgrading knowledge. Hong Kong Journal of Applied
Linguistics, 5(1), 1–18.
Shohamy, E., & Donitsa-Schmidt, S. (1995, April). The perceptions and stereotypes of Hebrew vs.
English among three different ethnic groups in Israel. Paper presented at the meeting of the
American Association of Applied Linguistics, Long Beach, CA.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect
over time. Language Testing, 13, 298–317.
Silverman, D. (1993). Interpreting qualitative data: Methods for analyzing talk, text and interaction.
London: Sage.
Simon, B. (1974). The two nations and educational structure 1780–1870. London: Lawrence & Wis-
hart.
Smith, M. L. (1991a). Meanings of test preparation. American Educational Research Journal, 28,
521–542.
Smith, M. L. (1991b). Put to the test: The effects of external testing on teachers. Educational Re-
searcher, 20(5), 8–11.
Snyder, C. W., Jr., Prince, B., Johanson, G., Odaet, C., Jaji, L., & Beatty, M. (1997). Exam fervor and
fever: Case studies of the influence of primary leaving examinations on Uganda classrooms,
teachers, and pupils: Vol. 1. Washington, DC: Academy for Educational Development, Ad-
vancing Basic Education and Literacy Project.
Somerset, A. (1983). Examination reform: The Kenya experience. Washington, DC: World Bank.
Spada, N., & Froehlich, M. (1995). COLT: Communicative orientation of language teaching observa-
tion scheme, coding conventions and applications. Sydney, NSW: Macquarie University, Na-
tional Center for English Language Teaching and Research.
Spolsky, B. (1995a). The examination-classroom backwash cycle: Some historical cases. In D.
Nunan, R. Berry, & V. Berry (Eds.), Bringing about change in language education (pp. 55–66).
Hong Kong: University of Hong Kong, Department of Curriculum Studies.
Spolsky, B. (1995b). Measured words. Oxford, England: Oxford University Press.
Stecher, B., & Barron, S. (1999). Quadrennial milepost accountability testing in Kentucky (Tech. Rep.
505). Los Angeles: University of California, National Center for Research on Evaluation, Stan-
dards, and Student Testing.
Stecher, B., Barron, S., Chun, T., Krop, C., & Ross, K. (2000). The effects of Washington education re-
form on schools and classrooms (Tech. Rep. 525). Los Angeles: University of California, Na-
tional Center for Research on Evaluation, Standards, and Student Testing.
Stecher, B., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assess-
ment on classroom practices: Results of the 1996–97 RAND survey of Kentucky teachers of mathe-
matics and writing (Tech. Rep. 482). Los Angeles: University of California, National Center for
Research on Evaluation, Standards, and Student Testing.
Stecher, B., & Chun, T. (2002). School and classroom practices during two years of education reform
in Washington state (CSE Tech. Rep. No. 550). Los Angeles: University of California, National
Center for Research on Evaluation, Standards, and Student Testing.
Steiner, J. (1995a). Changes in the English Bagrut exam. Jerusalem: Ministry of Education, English
Inspectorate.
Steiner, J. (1995b). Reading for pleasure. Jerusalem: Ministry of Education, English Inspectorate.
Stevenson, D. K., & Riewe, U. (1981). Teachers’ attitudes towards language tests and testing. In T.
Culhane, C. Klein-Braley, & D. K. Stevenson (Eds.), Practice and problems in language testing.
Occasional Papers, 26 (pp. 146–155). Essex, UK: University of Essex.
Stiggins, R., & Faires-Conklin, N. (1992). In teachers’ hands. Albany, NY: State University of New
York Press.
Stoller, F. (1994). The diffusion of innovations in intensive ESL programs. Applied Linguistics, 15,
300–327.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon &
M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185–201). Reading, MA: Ad-
dison-Wesley.
Swain, M. (1985). Large-scale communicative language testing: A case study. In Y. P. Lee, A. C. Y. Y.
Fok, R. Lord, & G. Low (Eds.), New directions in language testing (pp. 35–46). Oxford, England:
Pergamon.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B.
Seidlhofer (Eds.), Principle and practice in applied linguistics: Studies in honour of H. G.
Widdowson (pp. 125–144). Oxford, England: Oxford University Press.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A
step towards second language learning. Applied Linguistics, 16, 371–391.
Takano, F. (1992). Daigaku nyushi no kaizen ni mukete [Towards a reform of university entrance
examination]. Shizen, 7, 13–26.
Tang, C., & Biggs, J. B. (1996). How Hong Kong students cope with assessment. In D. A. Watkins &
J. B. Biggs (Eds.), The Chinese learner: Cultural, psychological and contextual influences (pp.
159–182). Hong Kong: Center for Comparative Research in Education.
Troman, G. (1989). Testing tension: The politics of educational assessment. British Educational
Research Journal, 15, 279–295.
University of Cambridge Local Examinations Syndicate (UCLES). (1999). The IELTS handbook.
Cambridge, England: Author.
University of Cambridge Local Examinations Syndicate (UCLES). (2000). IELTS handbook. Cam-
bridge, England: Author.
Valette, R. M. (1967). Modern language testing. New York: Harcourt Brace.
van Lier, L. (1988). The classroom and the language learner. New York: Longman.
VanPatten, B., & Sanz, C. (1995). From input to output: Processing instruction and communica-
tive tasks. In F. R. Eckman, D. Highland, P. W. Lee, J. Mileham, & R. R. Weber (Eds.), Second lan-
guage acquisition: Theory and pedagogy (pp. 169–186). Mahwah, NJ: Lawrence Erlbaum Associ-
ates.
Vernon, P. E. (1956). The measurement of abilities (2nd ed.). London: University of London Press.
Vogel, E. F. (1979). Japan as number one: Lessons for America. Tokyo: Charles E. Tuttle.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education
and from innovation theory. Language Testing, 13, 334–354.
Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Ency-
clopedia of language and education: Vol. 7. Language testing and assessment (pp. 291–302).
Dordrecht: Kluwer Academic.
Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case study using in-
sights from testing and innovation theory. Unpublished doctoral dissertation, Lancaster Uni-
versity, UK.
Wall, D. (2000). The impact of high-stakes testing on teaching and learning: Can this be predicted
or controlled? System, 28, 499–509.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language
Testing, 10, 41–69.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lankan impact study. In A.
Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194–221). Philadelphia: Multi-
lingual Matters.
Wall, D., Kalnberzina, V., Mazuoliene, Z., & Truus, K. (1996). The Baltic States Year 12 examina-
tion project. Language Testing Update, 19, 15–27.
Washington State Commission on Student Learning. (1997). Essential academic learning require-
ments: Technical manual. Olympia, WA: Author.
Watanabe, Y. (1996a). Investigating washback in Japanese EFL classrooms: Problems of method-
ology. In G. Wigglesworth & C. Elder (Eds.), The language testing circle: From inception to
washback (pp. 208–239). Melbourne, Victoria, Australia: Applied Linguistics Association of
Australia.
Watanabe, Y. (1996b). Does grammar translation come from the entrance examination? Prelimi-
nary findings from classroom-based research. Language Testing, 13, 318–333.
Watanabe, Y. (1997a). Nyushi kara eigo o hazusu to jugyo wa kawaru ka [Will elimination of Eng-
lish from the entrance examination change classroom instruction?] Eigo kyoiku [English
teachers magazine], September, special issue, 30–35. Tokyo: Taishukan shoten.
Watanabe, Y. (1997b). Washback effects of the Japanese university entrance examination: Class-
room-based research. Unpublished doctoral dissertation, Lancaster University, UK.
Watanabe, Y. (2000). Washback effects of the English section of the Japanese university entrance
examinations on instruction in pre-college level EFL. Language Testing Update, 27, 42–47.
Watanabe, Y. (2001). Does the university entrance examination motivate learners? A case study
of learner interviews. In Akita Association of English Studies (Eds.), Trans-equator exchanges:
A collection of academic papers in honor of Professor David Ingram (pp. 100–110). Akita, Japan:
Author.
Watson-Gegeo, K. A. (1988). Ethnography in ESL: Defining the essentials. TESOL Quarterly, 22,
575–592.
Watson-Gegeo, K. A. (1997). Classroom ethnography. In N. H. Hornberger & D. Corson (Eds.), En-
cyclopedia of language and education: Vol. 8. Research methods in language and education (pp.
135–144). London: Kluwer Academic.
Weir, C. J. (2002). Continuity and innovation: The revision of CPE 1913–2002. Cambridge, England:
Cambridge University Press/UCLES.
Whetton, C. (1999, May). Attempting to find the true cost of assessment systems. Paper presented at
the annual meeting of the International Association for Educational Assessment, Bled, Slo-
venia.
White, R. V. (1988). The ELT curriculum: Design, innovation and management. Oxford, England: Blackwell.
White, R. V. (1991). Managing curriculum development and innovation. In R. V. White, M. Martin,
M. Stimson, & R. Hodge (Eds.), Management in English language teaching (pp. 166–195). Cam-
bridge, England: Cambridge University Press.
Wideen, M. F., O’Shea, T., & Pye, I. (1997). High-stakes testing and the teaching of science. Cana-
dian Journal of Education, 22, 428–444.
Wiggins, G. (1989a). A true test: Toward more authentic and equitable assessment. Phi Delta
Kappan, 70, 703–713.
Wiggins, G. (1989b). Teaching to the (authentic) test. Educational Leadership, 46(7), 41–47.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75, 200–214.
Wilkins, D. (1976). Notional syllabuses. Oxford, England: Oxford University Press.
Williams, M., & Burden, R. (1997). Psychology for language teachers: A social constructivist ap-
proach. Cambridge, England: Cambridge University Press.
Winetroube, S. (1997). The design of the teachers’ attitude questionnaires. Cambridge, England: Uni-
versity of Cambridge Local Examinations Syndicate.
Wiseman, S. (Ed.). (1961). Examinations and English education. Manchester, England: Manchester
University Press.
Wolf, S. A., Borko, H., Elliot, R., & McIver, M. (2000). “That dog won’t hunt!”: Exemplary school
change efforts within the Kentucky reform. American Educational Research Journal, 37, 349–
393.
Woods, A., Fletcher, P., & Hughes, A. (1986). Statistics in language studies. Cambridge, England:
Cambridge University Press.
Woods, D. (1996). Teacher cognition in language teaching: Beliefs, decision-making and classroom
practice. Cambridge, England: Cambridge University Press.
Yang, H. (1999, August 5). The validation study of the National College English Test. Paper presented
at the annual meeting of the Association Internationale de Linguistique Appliquée (AILA), To-
kyo.
Yue, W. W. (1997). An investigation of textbook materials designed to prepare students for the IELTS
test: A study of washback. Unpublished master’s thesis, Lancaster University, England.