
liying cheng - washback in lang testing

The document discusses the concept of washback in language testing, which refers to the influence of tests on teaching and learning. It highlights the complexity of washback effects, emphasizing the need for empirical research to understand the various factors that contribute to positive or negative outcomes in educational contexts. The book is edited by Liying Cheng and Yoshinori Watanabe, and includes contributions from various authors exploring washback studies from around the world.


WASHBACK IN

LANGUAGE TESTING
Research Contexts and Methods

Edited by

Liying Cheng
Queen’s University

Yoshinori Watanabe
Akita National University

With

Andy Curtis
Queen’s University

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
2004 Mahwah, New Jersey London
This edition published in the Taylor & Francis e-Library, 2008.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s
collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.”

Copyright © 2004 by Lawrence Erlbaum Associates, Inc.


All rights reserved. No part of this book may be reproduced in
any form, by photostat, microform, retrieval system, or any other
means, without the prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data

Washback in language testing : research contexts and methods / edited by Liying Cheng,
Yoshinori J. Watanabe, with Andy Curtis.
p. cm.
Includes bibliographical references and indexes.
ISBN 0-8058-3986-0 (cloth : alk. paper) — ISBN 0-8058-3987-9 (pbk. : alk. paper)
1. English language—Study and teaching—Foreign speakers. 2. Language and
languages—Ability testing. 3. English language—Ability testing. 4. Test-taking skills.
I. Cheng, Liying, 1959– II. Watanabe, Yoshinori J., 1956– III. Curtis, Andy.

PE1128.A2W264 2003
428′.0076—dc22 2003061785
CIP

ISBN 1-4106-0973-1 Master e-book ISBN


To Jack and Andy for their love, support, and understanding
with this washback book, and to those who have
conducted and will conduct washback research
—Liying

To my parents, Akiko, and all the friends and teachers
who made this project possible
—Yoshinori (Josh)

To Manboadh Sookdeo and his family, for the tragic
and testing times of 2002
—Andy
Contents

Foreword

Preface

About the Authors

PART I: CONCEPTS AND METHODOLOGY OF WASHBACK

1 Washback or Backwash: A Review of the Impact of Testing on Teaching and Learning
Liying Cheng and Andy Curtis

2 Methodology in Washback Studies
Yoshinori Watanabe

3 Washback and Curriculum Innovation
Stephen Andrews

PART II: WASHBACK STUDIES FROM DIFFERENT PARTS OF THE WORLD

4 The Effects of Assessment-Driven Reform on the Teaching of Writing in Washington State
Brian Stecher, Tammi Chun, and Sheila Barron

5 The IELTS Impact Study: Investigating Washback on Teaching Materials
Nick Saville and Roger Hawkey

6 IELTS Test Preparation in New Zealand: Preparing Students for the IELTS Academic Module
Belinda Hayes and John Read

7 Washback in Classroom-Based Assessment: A Study of the Washback Effect in the Australian Adult Migrant English Program
Catherine Burrows

8 Teacher Factors Mediating Washback
Yoshinori Watanabe

9 The Washback Effect of a Public Examination Change on Teachers’ Perceptions Toward Their Classroom Teaching
Liying Cheng

10 Has a High-Stakes Test Produced the Intended Changes?
Luxia Qi

11 The Washback of an EFL National Oral Matriculation Test to Teaching and Learning
Irit Ferman

References

Author Index

Subject Index


Foreword

J. Charles Alderson
Lancaster University

Washback and the impact of tests more generally have become a major area
of study within educational research, and language testing in particular, as
this volume testifies, and so I am particularly pleased to welcome this book,
and to see the range of educational settings represented in it. Exactly ten
years ago, Dianne Wall and I published an article in the journal Applied Lin-
guistics which asked the admittedly somewhat rhetorical question: “Does
Washback Exist?” In that article, we noted the widespread belief that tests
have impact on teachers, classrooms, and students, we commented that
such impact is usually perceived to be negative, and we lamented the ab-
sence of serious empirical research into a phenomenon that was so widely
believed to exist. Hence, in part, our title: How do we know it exists if there
is no research into washback? Ten years on, and a slow accumulation of
empirical research later, I believe there is no longer any doubt that wash-
back does indeed exist. But we now know that the phenomenon is a hugely
complex matter, and very far from being a simple case of tests having nega-
tive impact on teaching. The question today is not “does washback exist?”
but much rather what does washback look like? What brings washback
about? Why does washback exist?
We now know, for instance, that tests will have more impact on the con-
tent of teaching and the materials that are used than they will on the
teacher’s methodology. We know that different teachers will teach to a par-
ticular test in very different ways. We know that some teachers will teach to
very different tests in very similar ways. We know that high-stakes tests—
tests that have important consequences for individuals and institutions—will
have more impact than low-stakes tests, although it is not always clear how
to identify and define the nature of those stakes, since what is a trivial conse-
quence for one person may be an important matter for another.
Although the possibility of positive washback has also often been
mooted, there are, interestingly, few examples of this having been demon-
strated by careful research. Indeed, the study that Dianne Wall and I con-
ducted in Sri Lanka (Wall & Alderson, 1993; Wall, 1996, 1999) was initially ex-
pected to show that introducing new tests into the curriculum would
reinforce innovations in teaching materials and curricula and produce posi-
tive washback. We were therefore surprised to discover that the impact of
the introduction of new tests was much more limited than expected, and we
were forced to re-examine our beliefs about washback. I cite this as an ex-
ample of how important it is to research one’s beliefs, rather than simply to
accept what appear to be truisms. But I also cite it because it was during
this research that we came to realize more deeply the complexity of the
matter, and the importance of understanding the nature of washback ef-
fects. It was in that research, for example, that we first became aware of the
importance of distinguishing between impact on teaching content and impact
on teaching methodology.
In subsequent research (Alderson & Hamp-Lyons, 1996) into the impact
of the TOEFL test on teaching (and incidentally and curiously, this is the
only published research to date into the washback of a test that is very
widespread and almost unanimously believed to have negative impact on
teachers and learners as well as materials) I became aware of the teacher
factor in washback, when I discovered how differently two teachers taught
toward the same test. And it was during that same research that I began to
realize that the crucial issue is not to ask whether washback exists, but to
understand why it has what effects it does have. I will never forget one of
the teachers I observed replying to the question: “Is it possible to teach
TOEFL communicatively?” by saying: “I never thought of that.” Which I in-
terpreted as meaning that he had not given much thought as to what might
be the most appropriate way to teach toward such an important test. And
when I interviewed a group of teachers about what they thought about
teaching toward TOEFL, I was surprised to learn that two things they liked
most about teaching TOEFL (there were, of course, many things they did
not like) were that they did not have to plan lessons, and they did not have
to mark homework. Two of the most important things teachers do are to prepare
their lessons and give students feedback, and yet when teaching toward
TOEFL some teachers at least do not feel that this is necessary. In
short, it is at least as much the teacher who brings about washback, be it
positive or negative, as it is the test.
In current views of the nature of test validity, the “Messickian view” of
construct validity, it is commonplace to assert the need for test validation
to include a consideration of the consequences of test use. Morrow goes so
far as to call this “washback validity.” I have serious problems with this
view of a test’s influence, not only because it is now clear that washback is
brought about by people in classrooms, not by test developers, but also be-
cause it is clearly the case that there is only so much that test developers
can do to influence how people might prepare students for their test. I ac-
cept that it is highly desirable for test developers to consider the likely im-
pact—negative as well as positive—of the test they are designing on teaching
and learning, and seeking to engineer positive washback by test design (as
Messick, 1996, put it) is certainly a sensible thing to do. But there are limits
to what a test developer can achieve, and much more attention needs to be
paid to the reasons why teachers teach the way they do. We need to under-
stand their beliefs about teaching and learning, the degree of their profes-
sionalism, the adequacy of their training and of their understanding of the
nature of and rationale for the test.
Equally, as is attested by several authors in this book, educational au-
thorities and politicians can be seen as responsible for the nature of wash-
back, because tests are frequently used to engineer innovation, to steer and
guide the curriculum. Tests are often intended as “levers for change” (Pear-
son, 1988), in a very naïve fashion. Curricular innovation is, in fact, a very
complex matter, as Fullan (1991) has clearly shown, and washback studies
need to take careful account, not only of the context into which the innova-
tion is being introduced, but of all the myriad forces that can both enhance
and hinder the implementation of the intended change. Wall (1996, 1999)
shows clearly how innovation theory, and a study of innovation practice,
can increase our understanding of how and why washback comes about.
If I may permit myself the luxury of a footnote, in reference to the use of
two terms to refer to the same phenomenon, namely backwash and wash-
back, I should explain that one of the reasons why the Alderson and Wall ar-
ticle was entitled “Does Washback Exist?” was because it seemed to us that
the word washback was commonly used in discussions, in presentations at
conferences and in teacher training. When I was studying at the University
of Edinburgh, Scotland, for example, Alan Davies, the doyen of British lan-
guage testing, frequently used the term washback and I do not recall him
ever using backwash. Whereas in what literature there was at the time, the
word “backwash” seemed much more prevalent. Hence another reason for
our question: “Does Washback Exist?” But to clarify the distinction between
the terms backwash and washback: there is none. The only difference is that
if somebody has studied at the University of Reading, UK, where Arthur
Hughes used to teach, they are likely to use the term backwash. If they have
studied language testing anywhere else, but especially in Edinburgh or Lancaster
in the UK, they will almost certainly use the term washback.
I would like to congratulate the editors on their achievement in commis-
sioning and bringing together such a significant collection of chapters on
washback. I am confident that this volume will not only serve to further our
understanding of the phenomenon, but I also hope it will settle once and for
all that washback, not backwash, does indeed exist, but that its existence
raises more questions than it answers, and that therefore we need to study
the phenomenon closely, carefully, systematically, and critically in order
better to understand it. For that reason, I am very pleased to welcome this
publication and I am honored to have been invited to write this Foreword.

REFERENCES

Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. Language Testing, 13, 280–297.
Fullan, M. G., with Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.). London: Cassell.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241–256.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. J. Baumgardner (Eds.), ESP in the classroom: Practice and evaluation (pp. 98–107). London: Modern English.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education and from innovation theory. Language Testing, 13, 334–354.
Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Encyclopedia of language and education: Vol. 7. Language testing and assessment (pp. 291–302). Dordrecht: Kluwer Academic.
Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case study using insights from testing and innovation theory. Unpublished doctoral dissertation, Lancaster University, UK.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10, 41–69.
Preface

We live in a testing world. Our education system is awash with various high-
stakes testing, be it standardized, multiple-choice testing or portfolio as-
sessment. Washback, a term commonly used in applied linguistics, refers to
the influence of language testing on teaching and learning. The extensive
use of examination scores for various educational and social purposes in
society nowadays has made the washback effect a distinct educational phe-
nomenon. This is true both in general education and in teaching English as
a second/foreign language (ESL/EFL), from Kindergarten to Grade 12 class-
rooms to the tertiary level. Washback is a phenomenon that is of inherent
interest to teachers, researchers, program coordinators/directors, policy-
makers, and others in their day-to-day educational activities.
Despite the importance of this issue, however, it is only recently that re-
searchers have become aware of the importance of investigating this phe-
nomenon empirically. There are only a limited number of chapters in books
and papers in journals, with the notable exception of a special issue
on washback in the journal Language Testing (edited by J. C. Alderson and
D. Wall, 1996). Once the washback effect has been examined in the light of
empirical studies, it can no longer be taken for granted that where there is a
test, there is a direct effect. The small body of research to date suggests
that washback is a highly complex phenomenon, and it has already been es-
tablished that simply changing test contents or methods will not necessar-
ily bring about direct and desirable changes in education as intended
through a testing change. Rather, various factors within a particular educational
context seem to be involved in engineering desirable washback. The
question then is what factors are involved and under which conditions ben-
eficial washback is most likely to be generated. Thus, researchers have
started to pay attention to the specific educational contexts and testing cul-
tures within which different types of tests are being used for different pur-
poses, so that implications and recommendations can be made available to
education and testing organizations in many parts of the world.
In the field of language testing, researchers’ major interest has been to
address issues and problems inherent in a test in order to increase its reli-
ability and validity. However, washback goes well beyond the test itself. Re-
searchers now need to take account of a plethora of variables, including
school curriculum, behaviors of teachers and learners inside and outside
the classroom, their perceptions of the test, how test scores are used, and
so forth. This volume is at the intersection of language testing and teaching
practices and aims to provide theoretical, methodological, and practical
guidance for current and future washback studies.

STRUCTURE OF THE BOOK

The purpose of the present volume, then, is twofold: first, to update teachers,
researchers, policymakers/administrators, and others on what is involved in
this complex issue of testing and its effects, and how such a phenomenon
benefits teaching and learning, and second, to provide researchers with
models of research studies on which future studies can be based. In order
to address these two main purposes, the volume consists of two parts. Part
I provides readers with an overall view of the complexity of washback, and
the various contextual factors entangled within testing, teaching, and learn-
ing. Part II provides a collection of empirical washback studies carried out
in many different parts of the world, which lead the readers further into the
heart of the issue within each educational context.
Chapter 1 discusses washback research conducted in general education,
and in language education in particular. The first part of the chapter re-
views the origin and the definition of this phenomenon. The second exam-
ines the complexity of the positive and negative influence of washback, and
the third explores its functions and mechanisms. The last part of the chap-
ter looks at the concept of bringing about changes in teaching and learning
through changes in testing.
Chapter 2 provides guidance to researchers by illustrating the process
that the author followed to investigate the effects of the Japanese univer-
sity entrance examinations. Readers are also introduced to the method-
ological aspects of the second part of this volume.
Chapter 3 examines the relationship between washback and curricular
innovation. It discusses theories of research on washback from both general
education and language education, and relates that discussion to what
we now know about innovation, especially educational innovation.
Chapter 4 reports on a survey research study conducted in Washington
State to examine the effects of the state’s standards-based reform on school
and classroom practices. The chapter reports on a variety of changes in
classroom practices that occurred following the reform, including changes
in curriculum and in instructional strategies. However, the core of writing
instruction continued to be writing conventions and the writing process, as
it had been before the new tests were introduced. This study concludes
that both the standards and the tests appeared to influence practice, but it
is difficult to determine their relative impact.
Chapter 5 describes the development of data collection instruments for
an impact study of the International English Language Testing System
(IELTS). Among a broad range of test impact areas the study covers, this
chapter concentrates on the impact study instrument for the evaluation of
textbooks and other materials, tracing its design, development, and validation
through iterative processes of trialing and focus group analyses. Key issues
of data collection instrumentation classifications, format, and scale are
exemplified and discussed, and the finalized instrument for the analysis of
textbook materials is presented.
Chapter 6 reports on research in New Zealand on the washback effects of
preparation courses for the IELTS. The study involves intensive classroom
observation of two IELTS courses over a 4-week period. The results show
clear differences between the two courses. One was strongly focused on fa-
miliarizing students with the test and practicing test tasks, while the other
covered a wider range of academic study tasks. The research highlights
both the potential and the limitations of this kind of study in the investiga-
tion of washback.
Chapter 7 reports on a study that examines the washback effect in the
context of classroom-based achievement assessment in Australia. Using
conceptualizations derived from a survey, interviews, and classroom
observations based on structured observation instruments, the author
proposes a new model for washback, which places the
teacher, and the teacher’s beliefs, assumptions, and knowledge (Woods,
1996), at the center of the washback effect.
Chapter 8 reports on part of a large project investigating the effect of the
Japanese university entrance examinations on secondary level classroom
instruction. The results of observation studies accompanied by teacher
interviews indicate that teacher factors, such as personal beliefs and educa-
tional background, are important in the process of producing examination
effects. To induce beneficial washback in light of the results, an argument is
made for the importance of incorporating a type of re-attribution training in
teacher development courses, and of taking account of a type of face validity
during the test development process.
Chapter 9 investigates washback by identifying the ways in which an ex-
amination reform influenced teachers and their classroom teaching within
the context of teaching English as a second language (ESL) in Hong Kong
secondary schools. It reports comparative survey findings from teachers’
perspectives in relation to their reactions and attitudes, and day-to-day
classroom teaching activities, toward an examination change. The findings
illustrate certain washback effects on teachers’ perceptions toward the new
examination, although teachers’ daily teaching did not seem to be much in-
fluenced by the examination at the time of the research.
Chapter 10 investigates the intended washback of the National Matricula-
tion English Test in China (NMET) with a view to deepening our understand-
ing of the washback phenomenon through new empirical evidence. Analyses
of interview data reveal that there is considerable discrepancy between the
test constructors’ intentions and school practice. The study concludes that
the NMET has achieved very limited intended washback and is an inefficient
tool for bringing about pedagogical changes in schools in China.
Chapter 11 examines the washback effects of the Israeli national EFL oral
matriculation test immediately following its administration. The study attempts
to find whether this high-stakes test affects the educational processes,
the participants and the products of teaching and learning in Israeli
high schools, and if so, how. The study examines various factors that have
been found to be involved in the process of generating washback.
This volume is intended for a wide variety of audiences, in particular,
language teachers and testing researchers who are interested in the appli-
cation of findings to actual teaching and learning situations, researchers
who wish to keep abreast of new issues in this area, researchers and gradu-
ate students in broader language education and educational measurement
and evaluation areas who wish to conduct washback research in their own
contexts, policy and decision makers in educational and testing organiza-
tions, comparative education audiences, and language teachers, who would
like to know what washback looks like and who would like to carry out
washback research in their own context.

ACKNOWLEDGMENTS

The volume could not have been completed without the contributions of a
group of dedicated researchers who are passionate about washback re-
search. We thank you all for going through the whole process with us in
bringing this book to the language testing and assessment community. We
are grateful to so many individuals including:
· Professor J. C. Alderson for his Foreword to this book, and as a pioneer
in the field of washback research
· Hong Wang at Queen’s University, Rie Koizumi and Yo In’nami at University
of Tsukuba for proofing and printing the drafts
· Naomi Silverman, Senior Editor, and Lori Hawver, Assistant Editor, at
Lawrence Erlbaum Associates for supporting us in completing this
book project
· Antony Kunnan, California State University, Los Angeles; James D.
Brown, University of Hawai’i, and one anonymous reviewer for detailed
and constructive feedback

Finally, our greatest thanks go to our families, for their patience, encourage-
ment, and support while we were working on this book.

—Liying Cheng
—Yoshinori Watanabe
—Andy Curtis
About the Authors

THE EDITORS

Liying Cheng is an assistant professor and a member of the Assessment
and Evaluation Group (AEG) at the Faculty of Education, Queen’s University
in Canada. Before she joined Queen’s University, she was a Killam postdoc-
toral fellow in the Center for Research in Applied Measurement and Evalua-
tion, University of Alberta. Her doctoral dissertation—The Washback Effect of
Public Examination Change on Classroom Teaching—from the University of
Hong Kong won the seventh TOEFL Award for Outstanding Doctoral Disser-
tation Research on Second/Foreign Language Testing.

Yoshinori Watanabe is an associate professor of the Faculty of Education
and Human Studies at Akita University, Japan. He is also a research fellow
of the Japanese Ministry of Education, Science and Culture, investigating
the issue of new curricular innovation implemented in 2002. His long-
standing research interest lies in language learning strategies, classroom
observation, examination washback, and research methodology of ESL.

Andy Curtis is the Director of the School of English at Queen’s University in
Canada, where international students from around the world are tested,
taught, and assessed. Before he joined Queen’s University, he was an asso-
ciate professor in the Department of Language Teacher Education at the
School for International Training in Vermont. He has published research on
change and innovation in language education, and he has worked with language
teachers and learners in Europe, Asia, and North, South, and Central
America.

THE CONTRIBUTORS

Stephen Andrews heads the language and literature division in Hong Kong
University’s Faculty of Education. He has extensive involvement in assessment,
as he was previously Head of the TEFL Unit at the University of Cambridge
Local Examinations Syndicate. He has been involved in washback research
for more than 10 years.

Sheila Barron is an assistant professor of educational measurement and
statistics at the University of Iowa. While doing a postdoctoral fellowship at
RAND, she began a series of research studies with Dan Koretz and Brian
Stecher investigating the consequences of high stakes testing on school
and classroom practices.

Catherine Burrows is the Manager of the TAFE Strategic Services Unit in
the New South Wales Department of Education and Training. Her doctoral
research, which forms the basis of her chapter in this book, was under-
taken when she was the Coordinator of Staff and Curriculum Development
in NSW Adult Migrant English Service.

Tammi Chun is the Director of Project Evaluation for Gaining Early Aware-
ness and Readiness for Undergraduate Programs (GEAR UP) at the Univer-
sity of Hawai’i at Manoa. Chun’s research includes study of the implementa-
tion of standards-based reform, including assessment, accountability, and
instructional guidance policies, in America.

Irit Ferman is an instructor and English Center Director at the English Department,
Levinsky College of Education, Tel-Aviv, Israel. She graduated from the
Language Education Program, School of Education, Tel-Aviv University,
1998, with distinction. Her washback-related research has focused on the
impact of tests on EFL teaching–learning–assessment practices and the per-
ceptions of those involved.

Roger Hawkey is a consultant on testing and evaluation, currently working
on several test validation research projects for the University of Cambridge
ESOL Examinations. These include the IELTS impact study described in this
volume, and a study of the impact of the Progetto Lingue 2000 language
teaching reform program in Italy.

Belinda Hayes is a senior lecturer at the Auckland University of Technology,
New Zealand, where she teaches international students, creates
courses, and trains teachers.

John Read teaches courses in applied linguistics, TESOL, and academic
writing at Victoria University of Wellington, New Zealand. His research interests
are in testing English for academic purposes and second language
vocabulary assessment. He is the author of Assessing Vocabulary (Cam-
bridge University Press, 2000) and coeditor of the journal Language Testing.

Nick Saville is Director of Research and Validation for Cambridge ESOL Examinations,
where he coordinates the research and validation program. He
has worked on several impact studies, including the IELTS impact project
reported in this volume, and a study of the impact of the Progetto Lingue
2000 in Italy.

Brian Stecher is a senior social scientist in the education program at RAND.
His research emphasis is applied educational measurement, including the
implementation, quality, and impact of state assessment and accountability
systems; and the cost, quality, and feasibility of performance-based assess-
ments.

Luxia Qi is an associate professor of English at the Guangdong University
of Foreign Studies in China. Her teaching and research areas include language
testing, reading in a foreign language, and second language acquisition.
Her doctoral studies at City University of Hong Kong focused on the issue
of washback in language testing.
PART I: CONCEPTS AND METHODOLOGY OF WASHBACK

CHAPTER 1
Washback or Backwash: A Review of the Impact of Testing
on Teaching and Learning

Liying Cheng
Andy Curtis
Queen’s University

Washback or backwash, a term now commonly used in applied linguistics,
refers to the influence of testing on teaching and learning (Alderson & Wall,
1993), and has become an increasingly prevalent and prominent phenomenon
in education: “what is assessed becomes what is valued, which becomes
what is taught” (McEwen, 1995a, p. 42). There seem to be at least
two major types or areas of washback or backwash studies—those relating
to traditional, multiple-choice, large-scale tests, which are perceived to
have had mainly negative influences on the quality of teaching and learning
(Madaus & Kellaghan, 1992; Nolen, Haladyna, & Haas, 1992; Shepard, 1990),
and those studies where a specific test or examination¹ has been modified
and improved upon (e.g., performance-based assessment), in order to exert
a positive influence on teaching and learning (Linn & Herman, 1997; Sanders
& Horn, 1995). The second type of studies has shown, however, positive,
negative, or no influence on teaching and learning. Furthermore, many of
those studies have turned to focus on understanding the mechanism of
how washback or backwash is used to change teaching and learning
(Cheng, 1998a; Wall, 1999).

¹ In this chapter, the terms “test” and “examination” are used interchangeably to refer to the
use of assessment by means of a test or an examination.


WASHBACK: THE DEFINITION AND ORIGIN

Although washback is a term commonly used in applied linguistics today, it
is rarely found in dictionaries. However, the word backwash can be found in
certain dictionaries and is defined as “the unwelcome repercussions of
some social action” by the New Webster’s Comprehensive Dictionary, and
“unpleasant after-effects of an event or situation” by the Collins Cobuild Dic-
tionary. The negative connotations of these two definitions are interesting,
as they inadvertently touch on some of the negative responses and reac-
tions to the relationships between teaching and testing, which we explore
in more detail shortly.
Washback (Alderson & Wall, 1993) or backwash (Biggs, 1995, 1996) here
refers to the influence of testing on teaching and learning. The concept is
rooted in the notion that tests or examinations can and should drive teach-
ing, and hence learning, and is also referred to as measurement-driven in-
struction (Popham, 1987). In order to achieve this goal, a “match” or an over-
lap between the content and format of the test or the examination and the
content and format of the curriculum (or “curriculum surrogate” such as
the textbook) is encouraged. This is referred to as curriculum alignment by
Shepard (1990, 1991b, 1992, 1993). Although the idea of alignment—matching
the test and the curriculum—has been decried by some as “unethical,” and
threatening the validity of the test (Haladyna, Nolen, & Haas, 1991, p. 4;
Widen, O’Shea, & Pye, 1997), such alignment is evident in a number of coun-
tries, for example, Hong Kong (see Cheng, 1998a; Stecher, Barron, Chun,
Krop, & Ross, 2000). This alignment, in which a new or revised examination
is introduced into the education system with the aim of improving teaching
and learning, is referred to as systemic validity by Frederiksen and Collins
(1989), consequential validity by Messick (1989, 1992, 1994, 1996), and test im-
pact by Bachman and Palmer (1996) and Baker (1991).
Wall (1997) distinguished between test impact and test washback in
terms of the scope of the effects. According to Wall, impact refers to “. . . any
of the effects that a test may have on individuals, policies or practices,
within the classroom, the school, the educational system or society as a
whole” (see Stecher, Chun, & Barron, chap. 4, this volume), whereas wash-
back (or backwash) is defined as “the effects of tests on teaching and learn-
ing” (Wall, 1997, p. 291).
Although different terms are preferred by different researchers, they all
refer to different facets of the same phenomenon—the influence of testing
on teaching and learning. The authors of this chapter have chosen to use
the term washback, as it is the most commonly used in the field of applied
linguistics.
The study of washback has resulted in recent developments in language
testing, and measurement-driven reform of instruction in general educa-
tion. Research in language testing has centered on whether and how we as-
sess the specific characteristics of a given group of test takers and whether
and how we can incorporate such information into the ways in which we
design language tests. One of the most important theoretical developments
in language testing in the past 30 years has been the realization that a lan-
guage test score represents a complex of multiple influences. Language test
scores cannot be interpreted simplistically as an indicator of the particular
language ability we think we are measuring. The scores are also affected by
the characteristics and contents of the test tasks, the characteristics of the
test takers, the strategies test takers employ in attempting to complete the
test tasks, as well as the inferences we draw from the test results. These fac-
tors undoubtedly interact with each other.
Nearly 20 years ago, Alderson (1986) identified washback as a distinct—
and at that time emerging—area within language testing, to which we
needed to turn our attention. Alderson (1986) discussed the “potentially
powerful influence of tests” (p. 104) and argued for innovations in the language
curriculum through innovations in language testing (also see Wall,
1996, 1997, 2000). At around the same time, Davies (1985) was asking
whether tests should necessarily follow the curriculum, and suggested that
perhaps tests ought to lead and influence the curriculum. Morrow (1986) ex-
tended the use of washback to include the notion of washback validity,
which describes the relationship between testing, and teaching and learn-
ing (p. 6). Morrow also claimed that “. . . in essence, an examination of
washback validity would take testing researchers into the classroom in or-
der to observe the effects of their tests in action” (p. 6). This has important
implications for test validity.
Looking back, we can see that examinations have often been used as a
means of control, and have been with us for a long time: a thousand years
or more, if we include their use in Imperial China to select the highest officials
of the land (Arnove, Altbach, & Kelly, 1992; Hu, 1984; Lai, 1970). Those
examinations were probably the first civil service examinations ever devel-
oped. To avoid corruption, all essays in the Imperial Examination were
marked anonymously, and the Emperor personally supervised the final
stage of the examination. Although the goal of the examination was to se-
lect civil servants, its washback effect was to establish and control an edu-
cational program, as prospective mandarins set out to prepare themselves
for the examination that would decide not only their personal fate but also
influence the future of the Empire (Spolsky, 1995a, 1995b).
The use of examinations to select for education and employment has
also existed for a long time. Examinations were seen by some societies as
ways to encourage the development of talent, to upgrade the performance
of schools and colleges, and to counter, to some degree, nepotism, favoritism,
and even outright corruption in the allocation of scarce opportunities
(Bray & Steward, 1998; Eckstein & Noah, 1992). If the initial spread of exami-
nations can be traced back to such motives, the very same reasons appear
to be as powerful today as ever they were. Linn (2000) classified the use of
tests and assessments as key elements in relation to five waves of educa-
tional reform over the past 50 years: their tracking and selecting role in the
1950s; their program accountability role in the 1960s; minimum competency
testing in the 1970s; school and district accountability in the 1980s; and the
standards-based accountability systems in the 1990s (p. 4). Furthermore, it
is clear that tests and assessments are continuing to play a crucial and criti-
cal role in education into the new millennium.
In spite of this long and well-established place in educational history, the
use of tests has constantly been subject to criticism. Nevertheless, tests
continue to occupy a leading place in the educational policies and practices
of a great many countries (see Baker, 1991; Calder, 1997; Cannell, 1987;
Cheng, 1997, 1998a; Heyneman, 1987; Heyneman & Ransom, 1990; James,
2000; Kellaghan & Greaney, 1992; Li, 1990; Macintosh, 1986; Runte, 1998;
Shohamy, 1993a; Shohamy, Donitsa-Schmidt, & Ferman, 1996; Widen et al.,
1997; Yang, 1999; and chapters in Part II of this volume). These researchers,
and others, have, over many years, documented the impact of testing on
school and classroom practices, and on the personal and professional lives
and experiences of principals, teachers, students, and other educational
stakeholders.
Aware of the power of tests, policymakers in many parts of the world
continue to use them to manipulate their local educational systems, to con-
trol curricula and to impose (or promote) new textbooks and new teaching
methods. Testing and assessment are “the darling of the policy-makers”
(Madaus, 1985a, 1985b) despite the fact that they have been the focus of
controversy for as long as they have existed. One reason for their longevity
in the face of such criticism is that tests are viewed as the primary tools
through which changes in the educational system can be introduced with-
out having to change other educational components such as teacher training
or curricula. Shohamy (1992) originally noted that “this phenomenon
[washback] is the result of the strong authority of external testing and the
major impact it has on the lives of test takers” (p. 513). Later Shohamy et al.
(1996; see also Stiggins & Faires-Conklin, 1992) expanded on this position
thus:

the power and authority of tests enable policy-makers to use them as effec-
tive tools for controlling educational systems and prescribing the behavior of
those who are affected by their results—administrators, teachers and stu-
dents. School-wide exams are used by principals and administrators to en-
force learning, while in classrooms, tests and quizzes are used by teachers to
impose discipline and to motivate learning. (p. 299)

One example of these beliefs about the legislative power and authority of
tests was seen in 1994 in Canada, where a consortium of provincial minis-
ters of education instituted a system of national achievement testing in the
areas of reading, language arts, and science (Council of Ministers of Educa-
tion, Canada, 1994). Most of the provinces now require students to pass
centrally set school-leaving examinations as a condition of school gradua-
tion (Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Lock, 2001;
Runte, 1998; Widen, O’Shea, & Pye, 1997).
Petrie (1987) concluded that “it would not be too much of an exaggera-
tion to say that evaluation and testing have become the engine for imple-
menting educational policy” (p. 175). The extent to which this is true de-
pends on the different contexts, as shown by those explored in this volume,
but a number of recurring themes do emerge. Examinations of various
kinds have been used for a very long time for many different purposes in
many different places. There is a set of relationships, planned and un-
planned, positive and negative, between teaching and testing. These two
facts mean that, although washback has only been identified relatively re-
cently, it is likely that washback effects have been occurring for an equally
long time. It is also likely that these teaching–testing relationships will
become closer and more complex in the future. It is therefore essential
that the education community work together to understand and evaluate
the effects of the use of testing on all of the interconnected aspects of teach-
ing and learning within different education systems.

WASHBACK: POSITIVE, NEGATIVE, NEITHER OR BOTH?

Movement in a particular direction is an inherent part of the use of the
washback metaphor to describe teaching–testing relationships. For example,
Pearson (1988) stated that “public examinations influence the attitudes,
behaviors, and motivation of teachers, learners and parents, and, because
examinations often come at the end of a course, this influence is seen work-
ing in a backward direction—hence the term ‘washback’ ” (p. 98). However,
like Davies (1985), Pearson believed that the direction in which washback ac-
tually works must be forward (i.e., testing leading teaching and learning).
The potentially bidirectional nature of washback has been recognized
by, for example, Messick (1996), who defined washback as the “extent to
which a test influences language teachers and learners to do things they
would not necessarily otherwise do that promote or inhibit [emphasis
added] language learning” (p. 241, as cited in Alderson & Wall, 1993, p. 117).
Wall and Alderson also noted that “tests can be powerful determiners, both
positively and negatively, [emphasis added] of what happens in classrooms”
(Alderson & Wall, 1993, p. 117; Wall & Alderson, 1993, p. 41).

Messick (1996) went on to comment that some proponents have even
maintained that a test’s validity should be appraised by the degree to
which it manifests positive or negative washback, which is similar to Fred-
eriksen and Collins’ (1989) notion of systemic validity.
Underpinning the notion of direction is the issue of what it is that is be-
ing directed. Biggs (1995) used the term backwash (p. 12) to refer to the fact
that testing drives not only the curriculum, but also the teaching methods
and students’ approaches to learning (Crooks, 1988; Frederiksen, 1984; Fred-
eriksen & Collins, 1989). However, Spolsky (1994) believed that “backwash
is better applied only to accidental side-effects of examinations, and not to
those effects intended when the first purpose of the examination is control
of the curriculum” (p. 55). In an empirical study of an intended public exam-
ination change on classroom teaching in Hong Kong, Cheng (1997, 1998a)
combined movement and motive, defining washback as “an intended direc-
tion and function of curriculum change, by means of a change of public ex-
aminations, on aspects of teaching and learning” (Cheng, 1997, p. 36). As
Cheng’s study showed, when a public examination is used as a vehicle for
an intended curriculum change, unintended and accidental side effects can
also occur, that is, both negative and positive influence, as such change in-
volves elaborate and extensive webs of interwoven causes and effects.
Whether the effect of testing is deemed to be positive or negative should
also depend on who it is that actually conducts the investigation within a
particular education context, as well as where (the school or university
contexts), when (the time and duration of using such assessment practices),
why (the rationale), and how (the different approaches used by different
participants within the context).
If the potentially bidirectional nature of washback is accepted, and
movement in a positive direction is accepted as the aim, the question then
becomes methodological, that is, how to bring about this positive move-
ment. After considering several definitions of washback, Bailey (1996) con-
cluded that more empirical research needed to be carried out in order to
document its exact nature and mechanisms, while also identifying “con-
cerns about what constitutes both positive and negative washback, as well
as about how to promote the former and inhibit the latter” (p. 259).
According to Messick (1996), “for optimal positive washback there
should be little, if any, difference between activities involved in learning the
language and activities involved in preparing for the test” (pp. 241–242).
However, the lack of simple, one-to-one relationships in such complex sys-
tems was highlighted by Messick (1996): “A poor test may be associated
with positive effects and a good test with negative effects because of other
things that are done or not done in the education system” (p. 242). In terms
of complexity and validity, Alderson and Wall (1993) argued that washback
is “likely to be a complex phenomenon which cannot be related directly to
a test’s validity” (p. 116). The washback effect should, therefore, refer to the
effects of the test itself on aspects of teaching and learning.
The fact that there are so many other forces operating within any educa-
tion context, which also contribute to or ensure the washback effect on
teaching and learning, has been demonstrated in several washback studies
(e.g., Anderson et al., 1990; Cheng, 1998b, 1999; Herman, 1992; Madaus, 1988;
Smith, 1991a, 1991b; Wall, 2000; Watanabe, 1996a; Widen et al., 1997). The key
issue here is how those forces within a particular educational context can
be teased out to understand the effects of testing in that environment, and
how confident we can be in formulating hypotheses and drawing conclu-
sions about the nature and the scope of the effects within broader educa-
tional contexts.

Negative Washback

Tests in general, and perhaps language tests in particular, are often criti-
cized for their negative influence on teaching—so-called “negative wash-
back”—which has long been identified as a potential problem. For example,
nearly 50 years ago, Vernon (1956) claimed that teachers tended to ignore
subjects and activities that did not contribute directly to passing the exam,
and that examinations “distort the curriculum” (p. 166). Wiseman (1961) be-
lieved that paid coaching classes, which were intended for preparing stu-
dents for exams, were not a good use of the time, because students were
practicing exam techniques rather than language learning activities (p.
159), and Davies (1968) believed that testing devices had become teaching
devices; that teaching and learning were effectively being directed to past
examination papers, making the educational experience narrow and unin-
teresting (p. 125).
More recently, Alderson and Wall (1993) referred to negative washback
as the undesirable effect on teaching and learning of a particular test
deemed to be “poor” (p. 5). Alderson and Wall’s poor here means “some-
thing that the teacher or learner does not wish to teach or learn.” The tests
may well fail to reflect the learning principles or the course objectives to
which they are supposedly related. In reality, teachers and learners may
end up teaching and learning toward the test, regardless of whether or not
they support the test or fully understand its rationale or aims.
In general education, Fish (1988) found that teachers reacted negatively
to pressure created by public displays of classroom scores, and also found
that relatively inexperienced teachers felt greater anxiety and accountabil-
ity pressure than experienced teachers, showing the influence of factors
such as age and experience. Noble and Smith (1994a) also found that high-
stakes testing could affect teachers directly and negatively (p. 3), and that
“teaching test-taking skills and drilling on multiple-choice worksheets is
likely to boost the scores but unlikely to promote general understanding”
(1994b, p. 6). From an extensive qualitative study of the role of external test-
ing in elementary schools in the United States, Smith (1991b) listed a num-
ber of damaging effects, as the “testing programs substantially reduce the
time available for instruction, narrow curricular offerings and modes of in-
struction, and potentially reduce the capacities of teachers to teach content
and to use methods and materials that are incompatible with standardized
testing formats” (p. 8).
This narrowing was not the only detrimental effect found in a Canadian
study, in which Anderson et al. (1990) carried out a survey study investigat-
ing the impact of re-introducing final examinations at Grade 12 in British Co-
lumbia. The teachers in the study reported a narrowing to the topics the ex-
amination was most likely to include, and that students adopted more of a
memorization approach, with reduced emphasis on critical thinking. In a
more recent Canadian study (Widen et al., 1997), Grade 12 science teachers
reported their belief that they had lost much of their discretion in curricu-
lum decision making, and, therefore, much of their autonomy. When teach-
ers believe they are being circumscribed and controlled by the examina-
tions, and students’ focus is on what will be tested, teaching and learning
are in danger of becoming limited and confined to those aspects of the sub-
ject and field of study that are testable (see also Calder, 1990, 1997).

Positive Washback

As in most areas of language testing, for each argument in favor of or opposed
to a particular position, there is a counterargument. There are, then,
researchers who strongly believe that it is feasible and desirable to bring
about beneficial changes in teaching by changing examinations, represent-
ing the “positive washback” scenario, which is closely related to “measure-
ment-driven instruction” in general education. In this case, teachers and
learners have a positive attitude toward the examination or test, and work
willingly and collaboratively toward its objectives.
For example, Heyneman (1987) claimed that many proponents of aca-
demic achievement testing view “coachability” not as a drawback, but
rather as a virtue (p. 262), and Pearson (1988) argued for a mutually benefi-
cial arrangement, in which “good tests will be more or less directly usable
as teaching-learning activities. Similarly, good teaching-learning tasks will
be more or less directly usable for testing purposes, even though practical
or financial constraints limit the possibilities” (p. 107). Considering the
complexity of teaching and learning and the many constraints other than
financial ones, such claims may sound somewhat idealistic, and may even be
open to accusations of being rather simplistic. However, Davies (1985) maintained
that “creative and innovative testing . . . can, quite successfully, attract to itself
a syllabus change or a new syllabus which effectively makes it into an
achievement test” (p. 8). In this case, the test no longer needs to be just an
obedient servant. It can also be a leader.
As the foregoing studies show, there are conflicting reactions toward
positive and negative washback on teaching and learning, and no obvious
consensus in the research community as to whether certain washback ef-
fects are positive or negative. As was discussed earlier, one reason for this
is the potentially bidirectional nature of an exam or test, the positive or
negative nature of which can be influenced by many contextual factors.
According to Pearson (1988), a test’s washback effect will be negative if it
fails to reflect the learning principles and course objectives to which the
test supposedly relates, and it will be positive if the effects are beneficial
and “encourage the whole range of desired changes” (p. 101). Alderson and
Wall (1993), on the other hand, stressed that the quality of the washback ef-
fect might be independent of the quality of a test (pp. 117–118). Any test,
good or bad, may result in beneficial or detrimental washback effects.
It is possible that research into washback may benefit from turning its at-
tention toward looking at the complex causes of such a phenomenon in
teaching and learning, rather than focusing on deciding whether or not the
effects can be classified as positive or negative. According to Alderson and
Wall (1993), one way of doing this is to first investigate as thoroughly as
possible the broad educational context in which an assessment is intro-
duced, since other forces exist within the society and the education system
that might prevent washback from appearing (p. 116). A potentially key so-
cietal factor is the political forces at work. As Heyneman (1987) put it:
“Testing is a profession, but it is highly susceptible to political interference.
To a large extent, the quality of tests relies on the ability of a test agency to
pursue professional ends autonomous” (p. 262). If the consequences of a
particular test for teaching and learning are to be evaluated, the educa-
tional context in which the test takes place needs to be fully understood.
Whether the washback effect is positive or negative will largely depend on
where and how it exists and manifests itself within a particular educational
context, such as those examined in the studies in this volume.

WASHBACK: FUNCTIONS AND MECHANISMS

Traditionally, tests have come at the end of the teaching and learning proc-
ess for evaluative purposes. However, with the widespread expansion and
proliferation of high-stakes public examination systems, the direction
seems to have been largely reversed. Testing can come first in the teaching
and learning process. Particularly when tests are used as levers for change,
new materials need to be designed to match the purposes of a new test, and
school administrative and management staff, teachers, and students are
generally required to learn to work in alternative ways, and often work
harder, to achieve high scores on the test. In addition to these changes,
many more changes in the teaching and learning context can occur as the
result of a new test, although the consequences and effects may be inde-
pendent of the original intentions of the test designers, due to the complex
interplay of forces and factors both within and beyond the school.
Such influences were linked to test validity by Shohamy (1993a), who
pointed out that “the need to include aspects of test use in construct valida-
tion originates in the fact that testing is not an isolated event; rather, it is
connected to a whole set of variables that interact in the educational proc-
ess” (p. 2). Similarly, Linn (1992) encouraged the measurement research
community “to make the case that the introduction of any new high-stakes
examination system should pay greater attention to investigations of both
the intended and unintended consequences of the system than was typical
of previous test-based reform efforts” (p. 29).
As a result of this complexity, Messick (1989) recommended a unified va-
lidity concept, which requires that when an assessment model is designed
to make inferences about a certain construct, the inferences drawn from
that model should not only derive from test score interpretation, but also
from other variables operating within the social context (Bracey, 1989;
Cooley, 1991; Cronbach, 1988; Gardner, 1992; Gifford & O’Connor, 1992; Linn,
Baker, & Dunbar, 1991; Messick, 1992). The importance of collaboration was
also highlighted by Messick (1975): “Researchers, other educators, and pol-
icy makers must work together to develop means of evaluating educational
effectiveness that accurately represent a school or district’s progress to-
ward a broad range of important educational goals” (p. 956).
In exploring the mechanism of such an assessment function, Bailey
(1996, pp. 262–264) cited Hughes’ trichotomy (1993) to illustrate the com-
plex mechanisms through which washback occurs in actual teaching and
learning environments (see Table 1.1). Hughes (1993) explained his model
as follows:

The trichotomy . . . allows us to construct a basic model of backwash. The nature
of a test may first affect the perceptions and attitudes of the participants
towards their teaching and learning tasks. These perceptions and attitudes in
turn may affect what the participants do in carrying out their work (process),
including practicing the kind of items that are to be found in the test, which
will affect the learning outcomes, the product of the work. (p. 2)

TABLE 1.1
The Trichotomy Backwash Model

(a) Participants: students, classroom teachers, administrators, materials developers and
publishers, whose perceptions and attitudes toward their work may be affected by a test
(b) Processes: any actions taken by the participants which may contribute to the process of
learning
(c) Products: what is learned (facts, skills, etc.) and the quality of the learning

Note. Adapted from Hughes, 1993, p. 2. Cited in Bailey (1996).

Whereas Hughes focused on participants, processes, and products in his
model to illustrate the washback mechanism, Alderson and Wall (1993), in
their Sri Lankan study, focused on micro aspects of teaching and learning
that might be influenced by examinations. Based on that study, they drew
up 15 hypotheses regarding washback (pp. 120–121), which referred to areas
of teaching and learning that are generally affected by washback. Alderson
and Wall concluded that further research on washback is needed, and that
such research must entail “increasing specification of the Washback Hy-
pothesis” (p. 127). They called on researchers to take account of findings in
the research literature in at least two areas: (a) motivation and perform-
ance, and (b) innovation and change in the educational settings.
One response to Alderson and Wall’s (1993) recommendation was a
large-scale quantitative and qualitative empirical study, in which Cheng
(1997, 1998a) developed the notion of “washback intensity” to refer to the
degree of the washback effect in an area or a number of areas of teaching
and learning affected by an examination. Each of the areas was studied in
order to chart and understand the function and mechanism of washback—
the participants, the processes, and the products—that might have been
brought about by the change of a major public examination within a spe-
cific educational context (Hong Kong).
Wall (1996) stressed the difficulties in finding explanations of how tests
exert influence on teaching (p. 334). Wall (1999, 2000) used the innovation
literature and incorporated findings from this literature into her research
areas to propose ways of exploring the complex aspect of washback:

· The writing of detailed baseline studies to identify important characteristics
in the target system and the environment, including an analysis of
current testing practices (Shohamy et al., 1996), current teaching prac-
tices, resources (Bailey, 1996; Stevenson & Riewe, 1981), and attitudes of
key stakeholders (Bailey, 1996; Hughes, 1993).
· The formation of management teams representing all the important interest
groups, for example, teachers, teacher trainers, university specialists,
ministry officials, parents, and learners (Cheng, 1998a).

Fullan with Stiegelbauer (1991) and Fullan (1993), also in the context of inno-
vation and change, discussed changes in schools, and identified two main
recurring themes:

· Innovation should be seen as a process rather than as an event.
· All the participants who are affected by an innovation have to find their
own “meaning” for the change.

Fullan explained that the “subjective reality” which teachers experience
would always contrast with the “objective reality” that the proponents of
change had originally imagined. According to Fullan, teachers work on their
own, with little reference to experts or consultation with colleagues. They
are forced to make on-the-spot decisions, with little time to reflect on better
solutions. They are pressured to accomplish a great deal, but are given far
too little time to achieve their goals. When, on top of this, they are expected
to carry forward an innovation that is generally not of their own making,
their lives can become very difficult indeed. This may help to explain why
intended washback does or does not occur in teaching and learning. If edu-
cational change is imposed upon those parties most directly affected by the
change, that is, learners and teachers, without consultation of those par-
ties, resistance is likely to be the natural response (Curtis, 2000). In addi-
tion, it has also been found that there tend to be discrepancies between the
intention of any innovation or curriculum change and the understanding of
teachers who are tasked with the job of implementing that change (An-
drews, 1994, 1995; Markee, 1997).
Andrews (1994, 1995) highlighted the complexity of the relationship between
washback and curriculum innovation, and summarized three possible
responses of educators to washback: fight it, ignore it, or
use it (see also Andrews, chap. 3, this volume; Heyneman, 1987, p. 260). By
“fight it,” Heyneman referred to the effort to replace examinations with
other sorts of selection processes and criteria, on the grounds that exami-
nations have encouraged rote memorization at the expense of more desir-
able educational practices. In terms of “ignoring it,” Andrews (1994) used
the metaphor of the ostrich pretending that on-coming danger does not re-
ally exist by hiding its head in the sand (pp. 51–52). According to Andrews,
those who are involved with mainstream activities, such as syllabus design,
material writing, and teacher training, view testers as a “special breed” us-
ing an obscure and arcane terminology. Tests and exams have been seen as
an occasional necessary evil, a dose of unpleasant medicine, the taste of
which should be washed away as quickly as possible.
The third response, “use it,” is now perhaps the most common of the
three, and using washback to promote particular pedagogical goals is now
a well-established approach in education (see also Andrews & Fullilove,
1993, 1994; Blenkin, Edwards, & Kelly, 1992; Brooke & Oxenham, 1984;
Pearson, 1988; Somerset, 1983; Swain, 1984). The question of who it is that
uses it relates, at least in part, to the earlier discussion of the legislative
power of tests as perceived by governments and policymakers in many
parts of the world (see also Stecher, Chun, & Barron, chap. 4, this volume).

WASHBACK: THE CURRENT TRENDS IN ASSESSMENT

One of the main functions of assessment is generally believed to be to serve as a
form of leverage for educational change, which has often led to top-down
educational reform strategies that employ “better” kinds of assessment
practices (James, 2000; Linn, 2000; Noble & Smith, 1994a). Assessment prac-
tices are currently undergoing a major paradigm shift in many parts of the
world, which can be described as a reaction to the perceived shortcomings
of the prevailing paradigm, with its emphasis on standardized testing
(Biggs, 1992, 1996; Genesee, 1994). Alternative or authentic assessment
methods have thus emerged as systematic attempts to measure learners’
abilities to use previously acquired knowledge in solving novel problems or
completing specific tasks, as part of this use of assessment to reform curric-
ulum and improve instruction at the school and classroom level (Linn, 1983,
1992; Lock, 2001; Noble & Smith, 1994a, 1994b; Popham, 1983).
According to Noble and Smith (1994b), “the most pervasive tool of top-
down policy reform is to mandate assessment that can serve as both guide-
posts and accountability” (p. 1; see also Baker, 1989; Herman, 1989, 1992;
McEwen, 1995a, 1995b; Resnick, 1989; Resnick & Resnick, 1992). Noble and
Smith (1994a) also pointed out that the goal of current measurement-driven
reforms in assessment is to build better tests that will drive schools toward
more ambitious goals and reform them toward a curriculum and pedagogy
geared more toward thinking and away from rote memory and isolated
skills.
Beliefs about testing tend to follow beliefs about teaching and learning
(Glaser & Bassok, 1989; Glaser & Silver, 1994), as seen, for example, in the
shift from behaviorism to cognitive–constructivism in teaching and learn-
ing beliefs. According to the more recent psychological and pedagogi-
cal cognitive–constructivist views of learning, effective instruction must
mesh with how students think. The direct instruction model under the in-
fluence of behaviorism (the tell-show-do approach) does not match how stu-
dents learn, nor does it take into account students’ intentions, interests,
and choices. Teaching that fits the cognitive–constructivist view of learn-
ing is likely to be holistic, integrated, project-oriented, long-term, discov-
ery-based, and social. Likewise, testing should aim to be all of these things
too. Thus cognitive–constructivists see performance assessment² as parallel
in terms of beliefs about how students learn and how their learning
can be best supported.

² Performance assessment based on the constructivist model of learning is defined by Gipps
(1994) as “a systematic attempt to measure a learner’s ability to use previously acquired knowl-
edge in solving novel problems or completing specific tasks. In performance assessment, real
life or simulated assessment exercises are used to elicit original responses, which are directly
observed and rated by a qualified judge” (p. 99).

It is possible that performance-based assessment can be designed to be
so closely linked to the goals of instruction as to be almost indistinguish-
able from them. If this were achieved, then rather than being a negative
consequence, as is the case now with many existing high-stakes standard-
ized tests, “teaching to these proposed performance assessments, accepted
by scholars as inevitable and by teachers as necessary, becomes a virtue,
according to this line of thinking” (Noble & Smith, 1994b, p. 7; see also
Aschbacher, 1990; Aschbacher, Baker, & Herman, 1988; Baker, Aschbacher,
Niemi, & Sato, 1992; Wiggins, 1989a, 1989b, 1993). This rationale relates to
the debates about negative versus positive washback, discussed earlier,
and may have been one of the results of public discontent with the quality
of schooling leading to the development of measurement-driven instruction
(Popham, Cruse, Rankin, Standifer, & Williams, 1985, p. 629). However, such
a reform strategy has been challenged; Andrews (1994, 1995), for example,
described it as a “blunt instrument” for bringing about changes in teaching
and learning, since the actual teaching and learning situation is far more
complex, as discussed earlier, than proponents of alternative assessment
appear to suggest (see also Alderson & Wall, 1993; Cheng, 1998a, 1999; Wall,
1996, 1999).
Each different educational context (including school environment, mes-
sages from administration, expectations of other teachers, students, etc.)
plays a key role in facilitating or detracting from the possibility of change,
which supports Andrews’ (1994, 1995) view that such reform strategies
may be simplistic. More support for this position comes from Noble and
Smith (1994a), whose study of the impact of the Arizona Student Assess-
ment Program revealed “both the ambiguities of the policy-making process
and the dysfunctional side effects that evolved from the policy’s disparities,
though the legislative passage of the testing mandate obviously demon-
strated Arizona’s commitment to top-down reform and its belief that assess-
ment can leverage educational change” (pp. 1–2). The chapters in Part II of
this volume describe and explore what impact testing has had in and on
those educational contexts, what factors facilitate or detract from the possi-
bility of change derived from assessment, and the lessons we can learn
from these studies.
The relationship between testing and teaching and learning does appear
to be far more complicated and to involve much more than just the design
of a “good” assessment. There is more underlying interplay and intertwin-
ing of influences within each specific educational context where the assess-
ment takes place. However, as Madaus (1988) has shown, a high-stakes test
can lever the development of new curricular materials, which can be a posi-
tive aspect. An important point, though, is that even if new materials are
produced as a result of a new examination, they might not be molded ac-
cording to the innovators’ view of what is desirable in terms of teaching,
and might instead conform to publishers’ views of what will sell, which was
shown to be the case within the Hong Kong education context (see An-
drews, 1995; Cheng, 1998a).
In spite of the reservations about examination-driven educational re-
form, measurement-driven instruction will occur when a high-stakes test
of educational achievement influences the instructional program that pre-
pares students for the test, since important contingencies are associated
with the students’ performance in such a situation, as Popham (1987) has
pointed out:

Few educators would dispute the claim that these sorts of high-stakes tests
markedly influence the nature of instructional programs. Whether they are
concerned about their own self-esteem or their students’ well being, teachers
clearly want students to perform well on such tests. Accordingly, teachers
tend to focus a significant portion of their instructional activities on the
knowledge and skills assessed by such tests. (p. 680)

It is worthwhile pointing out here that performing well on a test does not
necessarily indicate good learning or high standards, and it only tells part
of the story about the actual teaching and learning. When a new test
(whether a traditional type or an alternative type of assessment) is in-
troduced into an educational context as a mandate and as an accountability
measure, it is likely to produce unintended consequences (Cheng & Cou-
ture, 2000), which goes back to Messick’s (1994) consequential validity.
Teachers do not resist changes. They resist being changed (A. Kohn, per-
sonal communication, April 17, 2002). As English (1992) stated well, the end
point of educational change—classroom change—is in the teachers’ hands.
When the classroom door is closed and nobody else is around, the class-
room teacher can then select and teach almost any curriculum he or she
decides is appropriate, irrespective of reforms, innovations, and public ex-
aminations.
The studies discussed in this chapter highlight the importance of the ed-
ucational community understanding the function of testing in relation to
the many facets and scopes of teaching and learning as mentioned before,
and the importance of evaluating the impact of assessment-driven reform
on our teachers, students, and other participants within the educational
context. This chapter serves as the starting point, and the linking point to
other chapters in this volume, so we can examine the nature of this wash-
back phenomenon from many different perspectives (see chaps. 2 and 3)
and within many different educational contexts around the world (chaps. in
Part II).
References

Foreword

Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback.
Language Testing, 13, 280–297.

Fullan, M. G., with Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.).
London: Cassell.

Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241–256.

Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. J. Baumgardner (Eds.),
ESP in the classroom: Practice and evaluation (pp. 98–107). London: Modern English.

Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education
and from innovation theory. Language Testing, 13, 334–354.

Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.),
Encyclopedia of language and education: Vol. 7. Language testing and assessment (pp.
291–302). Dordrecht: Kluwer Academic.

Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case study
using insights from testing and innovation theory. Unpublished doctoral dissertation,
Lancaster University, UK.

Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study.
Language Testing, 10, 41–69.
References

Adams, R. S., & Chen, D. (1981). The process of educational innovation: An international perspective.
London: Kogan Page.
AEL. (2000). Notes from the field: KERA in the classroom. Notes from the field: Education reform in
rural Kentucky, 7(1), 1–18.
Alderson, J. C. (1986). Innovations in language testing. In M. Portal (Ed.), Innovations in language
testing: Proceedings of the IUS/NFER conference (pp. 93–105). Windsor: NFER-Nelson.
Alderson, J. C. (1990). The relationship between grammar and reading in an English for academic
purposes test battery. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing
research: Selected papers from the Annual Language Testing Research Colloquium (pp. 203–219).
Alexandria, VA: Teachers of English to Speakers of Other Languages.
Alderson, J. C. (1992). Guidelines for the evaluation of language education. In J. C. Alderson & A.
Beretta (Eds.), Evaluating second language education (pp. 274–304). Cambridge, England: Cam-
bridge University Press.
Alderson, J. C., & Banerjee, J. (1996). How might impact study instruments be validated? Cambridge,
England: University of Cambridge Local Examinations Syndicate.
Alderson, J. C., & Banerjee, J. (2001). Impact and washback research in language testing. In C. El-
der, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, K. McLoughlin, & T. McNamara (Eds.),
Experimenting with uncertainty: Essays in honor of Alan Davies (pp. 150–161). Cambridge, Eng-
land: Cambridge University Press.
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. Lan-
guage Testing, 13, 280–297.
Alderson, J. C., & Scott, M. (1992). Insiders and outsiders and participatory evaluation. In J. C.
Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 25–60). Cambridge,
England: Cambridge University Press.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14, 115–129.
Alderson, J. C., & Wall, D. (Eds.). (1996). [Special issue]. Language Testing, 13(3).
Allwright, D., & Bailey, K. M. (1991). Focus on the language classroom: An introduction to classroom
research for language teachers. Cambridge, England: Cambridge University Press.
Amano, I. (1990). Education and examination in modern Japan (W. K. Cummings & F. Cummings,
Trans.). Tokyo: University of Tokyo Press. (Original work published 1983)
Anderson, J. O., Muir, W., Bateson, D. J., Blackmore, D., & Rogers, W. T. (1990). The impact of pro-
vincial examinations on education in British Columbia: General report. Victoria: British Colum-
bia Ministry of Education.
Andrews, S. (1994). The washback effect of examinations: Its impact upon curriculum innovation
in English language teaching. Curriculum Forum, 4(1), 44–58.
Andrews, S. (1995). Washback or washout? The relationship between examination reform and
curriculum innovation. In D. Nunan, V. Berry, & R. Berry (Eds.), Bringing about change in lan-
guage education (pp. 67–81). Hong Kong: University of Hong Kong.
Andrews, S., & Fullilove, J. (1993). Backwash and the use of English oral: Speculations on the im-
pact of a new examination upon sixth form English language testing in Hong Kong. New Hori-
zons, 34, 46–52.
Andrews, S., & Fullilove, J. (1994). Assessing spoken English in public examinations—Why and
how? In J. Boyle & P. Falvey (Eds.), English language testing in Hong Kong (pp. 57–85). Hong
Kong: Chinese University Press.
Andrews, S., & Fullilove, J. (1997, December). The elusiveness of washback: Investigating the impact
of a new oral exam on students’ spoken language performance. Paper presented at the Interna-
tional Language in Education Conference, University of Hong Kong, Hong Kong.
Andrews, S., Fullilove, J., & Wong, Y. (2002). Targeting washback: A case-study. System, 30,
207–223.
Ariyoshi, H., & Senba, K. (1983). Daigaku nyushi junbi kyoiku ni kansuru kenkyu [A study on
preparatory teaching for entrance examination]. Fukuoka Kyoiku Daigaku Kiyo, 33, 1–21.
Arnove, R. F., Altbach, P. G., & Kelly, G. P. (Eds.). (1992). Emergent issues in education: Comparative
perspectives. Albany, NY: State University of New York Press.
Aschbacher, P. R. (1990). Monitoring the impact of testing and evaluation innovations projects: State
activities and interest concerning performance-based assessment. Los Angeles: University of
California, National Center for Research on Evaluation, Standards, and Student Testing.
Aschbacher, P. R., Baker, E. L., & Herman, J. L. (Eds.). (1988). Improving large-scale assessment
(Resource Paper No. 9). Los Angeles: University of California, National Center for Research
on Evaluation, Standards, and Student Testing.
Association of Language Testers in Europe. (1995). Development and descriptive checklists for
tasks and examinations. Cambridge, England: Author.
Association of Language Testers in Europe. (1998). ALTE handbook of language examinations and
examination systems. Cambridge, England: University of Cambridge Local Examinations Syn-
dicate.
Bachman, L., Davidson, F., Ryan, K., & Choi, I. C. (Eds.). (1993). An investigation into the compara-
bility of two tests of English as a foreign language. Cambridge, England: Cambridge University
Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford, England: Oxford Uni-
versity Press.
Bachman, L. F., Purpura, J. E., & Cushing, S. T. (1993). Development of a questionnaire item bank to
explore test-taker characteristics. Cambridge, England: University of Cambridge Local Exami-
nations Syndicate.
Bailey, K. M. (1996). Working for washback: A review of the washback concept in language test-
ing. Language Testing, 13, 257–279.
Bailey, K. M. (1999). Washback in language testing. Princeton, NJ: Educational Testing Service.
Baker, E. L. (1989). Can we fairly measure the quality of education? (Tech. Rep. No. 290). Los An-
geles: University of California, Center for the Study of Evaluation.
Baker, E. L. (1991, September). Issues in policy, assessment, and equity. Paper presented at the na-
tional research symposium on limited English proficient students’ issues: Focus on evalua-
tion and measurement, Washington, DC.
Baker, E., Aschbacher, P., Niemi, D., & Sato, E. (1992). Performance assessment models: Assessing
content area explanations. Los Angeles: University of California, National Center for Research
on Evaluation, Standards, and Student Testing.
Banerjee, J. V. (1996). The design of the classroom observation instruments. Cambridge, England:
University of Cambridge Local Examinations Syndicate.
Ben-Rafael, E. (1994). Language, identity, and social division: The case of Israel. Oxford, England:
Clarendon Press.
Bergeson, T., Wise, B. J., Fitton, R., Gill, D. H., & Arnold, N. (2000). Guidelines for participation and
testing accommodations for special populations on the Washington assessment of student learn-
ing (WASL). Olympia, WA: Office of Superintendent of Public Instruction.
Berry, V., Falvey, P., Nunan, D., Burnett, M., & Hunt, J. (1995). Assessment and change in the
classroom. In D. Nunan, R. Berry & V. Berry (Eds.), Bringing about change in language educa-
tion (pp. 31–54). Hong Kong: University of Hong Kong, Department of Curriculum Studies.
Berwick, R., & Ross, S. (1989). Motivation after matriculation: Are Japanese learners of English
still alive after exam hell? Japan Association for Language Teaching Journal, 11, 193–210.
Biggs, J. B. (1992). The psychology of assessment and the Hong Kong scene. Bulletin of the Hong
Kong Psychological Society, 29, 1–21.
Biggs, J. B. (1995). Assumptions underlying new approaches to educational assessment. Curricu-
lum Forum, 4(2), 1–22.
Biggs, J. B. (Ed.). (1996). Testing: To educate or to select? Education in Hong Kong at the cross-roads.
Hong Kong: Hong Kong Educational Publishing.
Biggs, J. B. (1998). Assumptions underlying new approaches to assessment. In P. G. Stimpson &
P. Morris (Eds.), Curriculum and assessment for Hong Kong (pp. 351–384). Hong Kong: Open
University of Hong Kong Press.
Biggs, J. B. (1999). Teaching for quality learning at university. Buckingham, England: Open Univer-
sity Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Prin-
ciples, Policy and Practice, 5(1), 7–75.
Blenkin, G. M., Edwards, G., & Kelly, A. V. (1992). Change and the curriculum. London: P. Chapman.
Blewchamp, P. (1994). Washback in TOEFL classrooms: An exploratory investigation into the influ-
ence of the TOEFL test on teaching content and methodology. Unpublished master’s thesis, Lan-
caster University, England.
Bonkowski, F. (1996). Instrument for the assessment of teaching materials. Unpublished manu-
script, Lancaster University, England.
Borko, H., & Elliott, R. (1999). “Hands-on” pedagogy vs. “hands-off” accountability: Tensions be-
tween competing commitments for exemplary teachers of mathematics in Kentucky. Phi
Delta Kappan, 80, 394–400.
Bracey, G. W. (1987). Measurement-driven instruction: Catchy phrase, dangerous practice. Phi
Delta Kappan, 68, 683–686.
Bracey, G. W. (1989). The $150 million redundancy. Phi Delta Kappan, 70, 698–702.
Bray, M., & Steward, L. (Eds.). (1998). Examination systems in small states: Comparative perspec-
tives on policies, models and operations. London: Commonwealth Secretariat.
Briggs, C. L. (1986). Learning how to ask: A sociolinguistic appraisal of the role of the interview in so-
cial science research. Cambridge, England: Cambridge University Press.
Brindley, G. (1989). Assessing achievement in the learner-centered curriculum. Sydney, Australia:
National Center for English Language Teaching and Research.
Brindley, G. (1994). Competency-based assessment in second language programs: Some issues
and questions. Prospect, 9(2), 41–55.
Broadfoot, P. (1998, April). Categories, standards and instrumentalism: Theorizing the changing dis-
course of assessment policy in English primary education. Paper presented at the annual confer-
ence of the American Educational Research Association, San Diego, CA.
Broadfoot, P. (1999, September). Empowerment or performativity? English assessment policy in the
late twentieth century. Paper presented at the British Educational Research Association An-
nual Conference, Sussex, England.
Brooke, N., & Oxenham, J. (1984). The influence of certification and selection on teaching and
learning. In J. Oxenham (Ed.), Education versus qualifications? (pp. 147–175). London: Allen
and Unwin.
Brown, J. D. (1997). Do tests washback on the language classroom? The TESOLANZ Journal, 5(5),
63–80.
Brown, J. D. (2001). Using surveys in language programs. Cambridge, England: Cambridge Univer-
sity Press.
Bude, U. (1989). The challenge of quality in primary education in Africa. Bonn, Germany: German
Foundation for International Development, Education, Science and Documentation Center.
Burrows, C. (1993). Assessment guidelines for the certificate in spoken and written English: Educa-
tional Draft (Vols. 1–5). Sydney, Australia: New South Wales Adult Migrant English Service.
Burrows, C. (1998). Searching for washback: An investigation into the impact on teachers of the im-
plementation into the adult migrant English program of the certificate in spoken and written Eng-
lish. Unpublished doctoral dissertation, Macquarie University, Sydney, Australia.
Burrows, C. (1999, July). Adopters, adapters, and resisters: Did the assessment of the certificates in
spoken and written English change teaching in AMEP. Paper presented at the Language Testing
Research Colloquium, Tsukuba, Japan.
Burrows, C. (2001). Searching for washback: The impact of assessment in the Certificate in Spo-
ken and Written English. In G. Brindley & C. Burrows (Eds.), Studies in immigrant English lan-
guage assessment: Vol. 2. Sydney, Australia: National Center for English Language Teaching
and Research.
Bush, M. (1998). A sociocultural view of washback in Japanese university entrance exams. Unpub-
lished manuscript, Ontario Institute for Studies in Education, Toronto, Ontario, Canada.
Calder, P. (1990). Impact of diploma examinations on the teaching-learning process. Edmonton, Al-
berta, Canada: Alberta Teachers’ Association.
Calder, P. (1997). Impact of Alberta achievement tests on the teaching-learning process. Edmonton,
Alberta, Canada: Alberta Teachers’ Association.
Camron, H. (1985). Guide to the oral Bagrut examination. Jerusalem: Ministry of Education and
Culture, English Inspectorate.
Cannell, J. J. (1987). Nationally-normed elementary achievement testing in America’s public
schools: How all 50 states are above the national average. Educational Measurement: Issues
and Practice, 7(4), 12–15.
Chapman, D. W., & Snyder, C. W. (2000). Can high-stakes national testing improve instruction: Re-
examining conventional wisdom. International Journal of Educational Development, 20,
457–474.
Chaudron, C. (1986). The interaction of quantitative and qualitative approaches to research: A
view of the second language classroom. TESOL Quarterly, 20, 709–717.
Chaudron, C. (1988). Second language classrooms: Research on teaching and learning. Cambridge,
England: Cambridge University Press.
Cheng, L. (1997). How does washback influence teaching? Implications for Hong Kong. Language
and Education, 11, 38–54.
Cheng, L. (1998a). The washback effect of public examination change on classroom teaching: An im-
pact study of the 1996 Hong Kong certificate of education in English on the classroom teaching of
English in Hong Kong secondary schools. Unpublished doctoral dissertation, University of
Hong Kong, Hong Kong.
Cheng, L. (1998b). Impact of a public English examination change on students’ perceptions and
attitudes toward their English learning. Studies in Educational Evaluation, 24, 279–301.
Cheng, L. (1999). Changing assessment: Washback on teacher perspectives and actions.
Teaching and Teacher Education, 15, 253–271.
Cheng, L. (2001). Washback studies: Methodological considerations. Curriculum Forum, 10(2),
17–32.
Cheng, L., & Couture, J. C. (2000). Teachers’ work in the global culture of performance. Alberta
Journal of Educational Research, 46(1), 65–74.
Chin, R., & Benne, K. D. (1976). General strategies for effecting changes in human systems. In
W. G. Bennis, K. D. Benne, R. Chin, & K. E. Corey (Eds.), The planning of change (3rd ed., pp.
22–45). New York: Holt, Rinehart and Winston.
Clark, J. L. (1987). Curriculum renewal in school foreign language learning. Oxford, England: Oxford
University Press.
Cohen, L. (1976). Educational research in classrooms and schools: A manual of materials and meth-
ods. London: Harper & Row.
Cohen, L., & Manion, L. (1989). Research methods in education (3rd ed.). London: Routledge.
Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education (5th ed.). London:
Routledge Falmer.
Cooley, W. W. (1991). State-wide student assessment. Educational Measurement: Issues and Prac-
tice, 10, 3–6.
Cooper, R. (1989). Language planning and social change. Cambridge, England: Cambridge Univer-
sity Press.
Corbett, H. D., & Wilson, B. L. (1988). Raising the stakes in statewide mandatory minimum com-
petency testing. In W. L. Boyd & C. T. Kerchner (Eds.), The politics of excellence and choice in
education: The 1987 politics of education association yearbook (pp. 27–39). New York: Falmer
Press.
Corbett, H. D., & Wilson, B. L. (1991). Testing, reform and rebellion. Norwood, NJ: Ablex.
Council of Chief State School Officers. (1998). Annual Survey of State Student Assessment Pro-
grams, Summary Report. Washington, DC: Author.
Council of Ministers of Education, Canada. (1994). School achievement indicators program. To-
ronto, Ontario, Canada: Author.
Cronbach, L. J. (1982). Prudent aspirations for social inquiry. In W. H. Kruskal (Ed.), The social sci-
ences: Their nature and uses (pp. 61–81). Chicago: University of Chicago Press.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun
(Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum Associates.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educa-
tional Research, 58, 438–481.
Csikszentmihalyi, M. (1992). Flow: The psychology of happiness. London: Rider.
Curriculum Development Council. (1982). The syllabus for English (Forms I–V). Hong Kong: Hong
Kong Government Printer.
Curtis, A. (2000). A problem-solving approach to the management of change in language educa-
tion. Korea TESOL Journal, 3(1), 1–12.
Darling-Hammond, L., & Wise, A. E. (1985). Beyond standardization: State standards and school
improvement. The Elementary School Journal, 85, 315–336.
Davies, A. (Ed.). (1968). Language testing symposium: A psycholinguistic approach. Oxford: Oxford
University Press.
Davies, A. (1985). Follow my leader: Is that what language tests do? In Y. P. Lee, C. Y. Y. Fok, R.
Lord, & G. Low (Eds.), New directions in language testing (pp. 1–12). Oxford: Pergamon Press.
Davies, A. (1997). Demands of being professional in language testing. Language Testing, 14,
328–339.
Davis, K. A. (1992). Validity and reliability in qualitative research on second language acquisition
and teaching: Another researcher comments. TESOL Quarterly, 26, 605–608.
Davis, K. A. (1995). Qualitative theory and methods in applied linguistics research. TESOL Quar-
terly, 29, 427–453.
Docking, R. (1993, May). Competency-based approaches to education and training: Progress and
promise. Paper presented at the annual meeting of the National Centre for English Language
Teaching and Research, Sydney, Australia.
Dore, R. P. (1976). The diploma disease. London: Allen and Unwin.
Dore, R. P. (1997). Reflections on the diploma disease twenty years later. Assessment in Educa-
tion, 4, 189–206.
Ebel, R. L. (1966). The social consequences of educational testing. In A. Anastasi (Ed.), Testing
problems in perspective (pp. 18–29). Washington, DC: American Council on Education.
Eckstein, M. A., & Noah, H. J. (Eds.). (1992). Examinations: Comparative and international studies.
Oxford: Pergamon Press.
Education Week. (1997, January). Quality counts ’97: A report card on the conditions of education in
50 states. Bethesda, MD: Editorial Projects in Education. Retrieved April 3, 2000, from
http://www.edweek.org/sreports/qc97/
Education Week. (1999, January 11). Quality counts ’99: Rewarding results, punishing failure.
Bethesda, MD: Editorial Projects in Education.
Education Week. (2000, January 13). Quality counts 2000: Who should teach? Bethesda, MD: Edito-
rial Projects in Education.
Eisenhart, M. A., & Howe, K. R. (1992). Validity in educational research. In M. D. Lecompte, W. L.
Millroy, & J. Preissle (Eds.), Handbook of qualitative research in education (pp. 643–680). San
Diego, CA: Academic Press.
Elliott, N., & Ensign, G. (1999). The Washington assessment of student learning: An update on writing.
Olympia, WA: Office of the Superintendent of Public Instruction. Retrieved January 2, 2001,
from http://www.k12.wa.us/assessment/assessproginfo/subdocuments/writupdate.asp
Elton, L., & Laurillard, D. (1979). Trends in student learning. Studies in Higher Education, 4, 87–102.
English, F. W. (1992). Deciding what to teach and test: Developing, aligning, and auditing the curricu-
lum. Newbury Park, CA: Corwin Press.
Erickson, F. (1986). Qualitative methods in research on teaching. In M. Wittrock (Ed.), Handbook
of research on teaching (3rd ed., pp. 119–161). New York: Macmillan.
Falvey, P. (1995). The education of teachers of English in Hong Kong: A case for special treat-
ment. In F. Lopez-Real (Ed.), Proceedings of ITEC ’95 (pp. 107–113). Hong Kong: University of
Hong Kong, Department of Curriculum Studies.
Fish, J. (1988). Responses to mandated standardized testing. Unpublished doctoral dissertation,
University of California, Los Angeles.
Frederiksen, J. R., & Collins, A. (1989). A system approach to educational testing. Educational Re-
searcher, 18(9), 27–32.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. Ameri-
can Psychologist, 39, 193–202.
Fullan, M. G. (1993). Change forces: Probing the depths of educational reform. London: Falmer Press.
Fullan, M. G. (1998). Linking change and assessment. In P. Rea-Dickins & K. P. Germaine (Eds.),
Managing evaluation and innovation in language teaching: Building bridges (pp. 253–262). Lon-
don: Longman.
Fullan, M., & Park, P. (1981). Curriculum implementation: A resource booklet. Toronto, Ontario,
Canada: Ontario Ministry of Education.
Fullan, M. G., with Stiegelbauer, S. (1991). The new meaning of educational change (2nd ed.). Lon-
don: Cassell.
Gardner, H. (1992). Assessment in context: The alternative to standardized testing. In B. R.
Gifford & M. C. O’Connor (Eds.), Changing assessments: Alternative views of aptitude, achieve-
ment and instruction (pp. 77–119). London: Kluwer Academic.
Geertz, C. (1973). The interpretation of cultures. New York: Basic Books.
Genesee, F. (1994). Assessment alternatives. TESOL Matters, 4(5), 2.
Gifford, B. R., & O’Connor, M. C. (Eds.). (1992). Changing assessments: Alternative views of aptitude,
achievement and instruction. London: Kluwer Academic.
Gipps, C. V. (1994). Beyond testing: Toward a theory of educational assessment. London: Falmer
Press.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and
psychometrics. American Psychologist, 36, 923–936.
Glaser, R. (1990). Towards new models of assessment. International Journal of Educational
Research, 14, 475–483.
Glaser, R., & Bassok, M. (1989). Learning theory and the study of instruction. Annual Review of
Psychology, 40, 631–666.
Glaser, R., & Silver, E. (1994). Assessment, testing, and instruction: Retrospect and prospect (Tech.
Rep. 379). Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Cen-
ter.
Goetz, J. P., & LeCompte, M. D. (1984). Ethnography and qualitative design in educational research.
Orlando, FL: Academic Press.
Goldstein, H. (1989). Psychometric test theory and educational assessment. In H. Simons & J.
Elliot (Eds.), Rethinking appraisal and assessment (pp. 140–148). Milton Keynes, England:
Open University Press.
Grove, E. (1997, October). Accountability in competency-based language programs: Issues of curricu-
lum and assessment. Paper presented at the meeting of the Applied Linguistics Association of
Australia, Toowoomba, Australia.
Gui, S., Li, X., & Li, W. (1988). A reflection on experimenting with the National Matriculation Eng-
lish Test. In National Education Examination Authorities (Ed.), Theory and practice of stan-
dardized test (pp. 70–85). Guangzhou, China: Guangdong Higher Education Press.
Hagan, P. (1994). Competency-based curriculum: The NSW AMES experience. Prospect, 9(2),
30–40.
Hagan, P., Hood, S., Jackson, E., Jones, M., Joyce, H., & Manidis, M. (1993). Certificate in spoken
and written English (2nd ed.). Sydney, Australia: New South Wales Adult Migrant English Ser-
vice.
Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores
and the origins of test score pollution. Educational Researcher, 20(5), 2–7.
Hamp-Lyons, L. (1997). Washback, impact and validity: Ethical concerns. Language Testing, 14,
295–303.
Han, J. (1997). The educational statistics yearbook of China. Beijing, China: People’s Education
Press.
Hargreaves, A. (1994). Changing teachers, changing times: Teachers’ work and culture in the post-
modern age. London: Cassell.
Hayek, F. A. (1952). The counter-revolution of science: Studies on the abuse of reason. Indianapolis,
IN: Liberty Press.
Henrichsen, L. E. (1989). Diffusion of innovations in English language teaching: The ELEC effort in Ja-
pan, 1956–1968. New York: Greenwood Press.
Herman, J. L. (1989). Priorities of educational testing and evaluation: The testimony of the CRESST
National Faculty (Tech. Rep. 304). Los Angeles: University of California, Center for the Study
of Evaluation.
Herman, J. L. (1992). Accountability and alternative assessment: Research and development issues
(Tech. Rep. 384). Los Angeles: University of California, Center for the Study of Evaluation.
Herman, J. L., & Golan, S. (1993). The effects of standardized testing on teaching and schools. Ed-
ucational Measurement: Issues and Practice, 12(4), 20–25, 41–42.
Herman, J. L., & Golan, S. (n.d.). Effects of standardized testing on teachers and learning. Another
look (CSE Tech. Rep. 334). Los Angeles: University of California National Center for Research
on Evaluation, Standards, and Student Testing.
Herrington, R. (1996). Test-taking strategies and second language proficiency: Is there a relationship?
Unpublished master’s thesis, Lancaster University, England.
Heyneman, S. P. (1987). Use of examinations in developing countries: Selection, research, and
education sector management. International Journal of Educational Development, 7, 251–263.
Heyneman, S. P., & Ransom, A. W. (1990). Using examinations and testing to improve educational
quality. Educational Policy, 4, 177–192.
Hirvela, A., & Law, E. (1991). A survey of local English teachers’ attitudes towards English and
E. L. T. Institute of Language in Education Journal, 8, 25–28.
Hong Kong Examinations Authority. (1993). Hong Kong certificate of education examination 1996—
Proposed English language syllabus. Hong Kong: Author.
Hong Kong Examinations Authority. (1994a). Hong Kong certificate of education examination
1996—English language. Hong Kong: Author.
Hong Kong Examinations Authority. (1994b). The work of the Hong Kong examinations authority—
1977–93. Hong Kong: Author.
Hong Kong Government. (1993). Enrolment survey 1993. Hong Kong: Education Department.
Hood, S. (1995). From curriculum to courses: Why do teachers do what they do? In A. Burns & S.
Hood (Eds.), Teachers’ voices: Exploring course design in a changing curriculum (pp. 21–34).
Sydney, NSW, Australia: National Center for English Language Teaching and Research.
Hopkins, D. (1985). A teacher’s guide to classroom research. Milton Keynes, UK: Open University
Press.
Horak, T. (1996). IELTS impact study project. Unpublished manuscript, Lancaster University, Eng-
land.
Hu, C. T. (1984). The historical background: Examinations and controls in pre-modern China.
Comparative Education, 20, 7–26.
Hu, Y. (1990). Teaching English in Chinese secondary schools. In Y. F. Dzau (Ed.), English in China
(pp. 59–67). Hong Kong: API Press.
Huberman, A. M. (1973). Understanding change in education: An introduction. Paris: Organization
for Economic Co-operation and Development.
Hughes, A. (1988). Introducing a needs-based test of English language proficiency into an Eng-
lish-medium university in Turkey. In A. Hughes (Ed.), Testing English for university study (pp.
134–153). London: Modern English Publications.
Hughes, A. (1989). Testing for language teachers. Cambridge, England: Cambridge University
Press.
Hughes, A. (1993). Backwash and TOEFL 2000. Unpublished manuscript, University of Reading,
England.
Ingram, D. E. (1984). Australian second language proficiency ratings. Canberra, Australia: Depart-
ment of Immigration and Ethnic Affairs.
Jaeger, R. M. (1988). Survey research methods in education. In R. M. Jaeger (Ed.), Complementary
methods for research in education (pp. 303–330). Washington, DC: American Educational Re-
search Association.
James, M. (2000). Measured lives: The rise of assessment as the engine of change in English
schools. Curriculum Journal, 11, 343–364.
James, M., & Gipps, C. (1998). Broadening the basis of assessment to prevent the narrowing of
learning. The Curriculum Journal, 9, 285–297.
Japan Association of College English Teachers (JACET). (1993). 21 seki ni muketeno eigo kyoiku
[English education for the 21st century]. Tokyo: Taishukan shoten.
Johnson, R. K. (Ed.). (1989). The second language curriculum. Cambridge, England: Cambridge
University Press.
Johnston, P. (1989). Constructive evaluation and the improvement of teaching and learning.
Teachers College Record, 90, 509–528.
Kellaghan, T., & Greaney, V. (1992). Using examinations to improve education: A study of fourteen
African countries. Washington, DC: World Bank.
Kellaghan, T., Madaus, G. F., & Airasian, P. (1982). The effects of standardized testing. Boston, MA:
Kluwer-Nijhoff.
Kemmis, S., & McTaggart, R. (1988). The action research planner (3rd ed.). Melbourne, Victoria,
Australia: Deakin University Press.
Kennedy, C. (1988). Evaluation of the management of change in ELT projects. Applied Linguistics,
9, 329–342.
Khaniyah, T. R. (1990). Examinations as instruments for educational change: Investigating the
washback effect of the Nepalese English exams. Unpublished doctoral dissertation, University
of Edinburgh, Scotland.
King, R. (1997). Can public examinations have a positive washback effect on classroom teaching?
In P. Grundy (Ed.), IATEFL 31st International Annual Conference Brighton, April 1997 (pp.
33–38). London: International Association of Teachers of English as a Foreign Language.
Koretz, D., Barron, S., Mitchell, K., & Stecher, B. (1996). The perceived effects of the Kentucky in-
structional results information system (KIRIS) (Document No. MR-792-PCT/FF). Santa Monica,
CA: RAND.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment pro-
gram: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5–16.
Krashen, S. D. (1993). The power of reading. Englewood, CO: Libraries Unlimited, Inc.
Krashen, S. D. (1998). Comprehensible output? System, 26, 175–182.
Kuckartz, U. (1998). WinMax. Scientific text analysis for the social sciences: User’s guide. Thousand
Oaks, CA: Sage.
Kunnan, A. (2000). IELTS impact study project. Cambridge, England: University of Cambridge Lo-
cal Examinations Syndicate.
Lai, C. T. (1970). A scholar in imperial China. Hong Kong: Kelly & Walsh.
Lam, H. P. (1993). Washback—Can it be quantified? A study on the impact of English Examinations in
Hong Kong. Unpublished master’s thesis, University of Leeds, Leeds, England.
Latham, H. (1877). On the action of examinations considered as a means of selection. Cambridge,
England: Deighton, Bell and Company.
Lazaraton, A. (1995). Qualitative research in applied linguistics: A progress report. TESOL Quar-
terly, 29, 455–472.
LeCompte, M. D., & Preissle, J. (1993). Ethnography and qualitative design in educational research
(2nd ed.). New York: Academic Press.
LeCompte, M. D., Millroy, W. L., & Preissle, J. (Eds.). (1992). The handbook of qualitative research in
education. San Diego, CA: Academic Press.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language
Testing, 17, 43–64.
Li, X. (1984). In defense of the communicative approach. ELT Journal, 38(1), 2–13.
Li, X. (1988). Teaching for use, learning by use and testing through use. In H. Xiao (Ed.), Standard-
ized English test and ELT in the middle schools (pp. 80–90). Guangzhou: Guangdong Education
Press.
Li, X. (1990). How powerful can a language test be? The MET in China. Journal of Multilingual and
Multicultural Development, 11, 393–404.
Li, X., Gui, S., & Li, W. (1990). The design of the NMET and ELT in middle schools. English Lan-
guage Teaching and Research in Primary Schools and Middle Schools, 1, 1–27.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
Linn, R. L. (1983). Testing and instruction: Links and distinctions. Journal of Educational Measure-
ment, 20, 179–189.
Linn, R. L. (1992). Educational assessment: Expanded expectations and challenges (Tech. Rep. 351).
Boulder: University of Colorado at Boulder, Center for the Study of Evaluation.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Linn, R. L., & Herman, J. L. (1997, February). Standards-led assessment: Technical and policy issues
in measuring school and student progress (CSE technical report 426). Los Angeles: University of
California National Center for Research on Evaluation, Standards, and Student Testing.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expec-
tations and validation criteria. Educational Researcher, 20(8), 15–21.
Liu, Y. (Ed.). (1992). Book of major educational events in China. Hangzhou, China: Zhejiang Educa-
tion Press.
Lock, C. L. (2001). The influence of a large-scale assessment program on classroom practices. Unpub-
lished doctoral dissertation, Queen’s University, Kingston, Ontario, Canada.
London, N. (1997). A national strategy for system-wide curriculum improvement in Trinidad and
Tobago. In D. W. Chapman, L. O. Mahlck, & A. Smulders (Eds.), From planning to action: Gov-
ernment initiatives for improving school level practice (pp. 133–146). Paris: International Insti-
tute for Educational Planning.
Low, G. D. (1988). The semantics of questionnaire rating scales. Evaluation and Research in Edu-
cation, 22, 69–79.
Macintosh, H. G. (1986). The prospects for public examinations in England and Wales. In D. L.
Nuttall (Ed.), Assessing educational achievement (pp. 19–34). London: Falmer Press.
Madaus, G. F. (1985a). Public policy and the testing profession: You’ve never had it so good? Edu-
cational Measurement: Issues and Practice, 4(4), 5–11.
Madaus, G. F. (1985b). Test scores as administrative mechanisms in educational policy. Phi Delta
Kappan, 66, 611–617.
Madaus, G. F. (1988). The influence of testing on the curriculum. In L. N. Tanner (Ed.), Critical is-
sues in curriculum: Eighty-seventh yearbook of the National Society for the Study of Education (pp.
83–121). Chicago: University of Chicago Press.
Madaus, G. F., & Kellaghan, T. (1992). Curriculum evaluation and assessment. In P. W. Jackson
(Ed.), Handbook of research on curriculum (pp. 119–154). New York: Macmillan.
Maehr, M. L., & Fyans, L. J., Jr. (1989). School culture, motivation, and achievement. In M. L.
Maehr & C. Ames (Eds.), Advances in motivation and achievement: Vol. 6. Motivation enhancing
environments (pp. 215–247). Greenwich, CT: JAI Press.
Markee, N. (1993). The diffusion of innovation in language teaching. Annual Review of Applied Lin-
guistics, 13, 229–243.
Markee, N. (1997). Managing curricular innovation. Cambridge, England: Cambridge University
Press.
Marton, F., Hounsell, D. J., & Entwistle, N. J. (Eds.). (1984). The experience of learning. Edinburgh,
Scotland: Scottish Academic Press.
McCallum, B., Gipps, C., McAlister, S., & Brown, M. (1995). National curriculum assessment:
Emerging models of teacher assessment in the classroom. In H. Torrance (Ed.), Evaluating
authentic assessment: Problems and possibilities in new approaches to assessment (pp. 88–104).
Buckingham, England: Open University Press.
McEwen, N. (1995a). Educational accountability in Alberta. Canadian Journal of Education, 20,
27–44.
McEwen, N. (1995b). Introducing accountability in education in Canada. Canadian Journal of Edu-
cation, 20, 1–17.
McIver, M. C., & Wolf, S. A. (1999). The power of the conference is the power of suggestion. Lan-
guage Arts, 77, 54–61.
McNamara, T. (1996). Measuring second language performance. London: Longman.
Merriam, S. B. (1988). Case study research in education: A qualitative approach. San Francisco:
Jossey-Bass.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation.
American Psychologist, 30, 955–966.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New
York: Macmillan.
Messick, S. (1992, April). The interplay between evidence and consequences in the validation of per-
formance assessments. Paper presented at the annual meeting of the National Council on
Measurement in Education, San Francisco.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance
assessments. Educational Researcher, 23(2), 13–23.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241–256.
Milanovic, M., & Saville, N. (1996). Considering the impact of Cambridge EFL examinations. Cam-
bridge, England: University of Cambridge Local Examinations Syndicate.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd
ed.). Thousand Oaks, CA: Sage.
Ministry of Education, Science and Culture. (1992). Atarashii jidai ni taio suru kyoiku no shosedo no
kaikaku—dai 14 ki chuo kyoiku shingikai toshin [Reforming educational systems for a new
era—a report from 14th conference on education]. Tokyo: Author.
Ministry of Education. (1998). Director General Bulletin. Jerusalem: English Inspectorate.
Ministry of Education. (1999). Internal document: No. 3. Beijing, China: Author.
Morris, N. (1961). An historian’s view of examinations. In S. Wiseman (Ed.), Examinations and
English education. Manchester, England: Manchester University Press.
Morris, P. (1985). Teachers’ perceptions of the barriers to the implementation of a pedagogic in-
novation: A South East Asian case study. International Review of Education, 31, 3–18.
Morris, P. (1990). Teachers’ perceptions of the barriers to the implementation of a pedagogic in-
novation. In P. Morris (Ed.), Curriculum development in Hong Kong (pp. 45–60). Hong Kong:
Hong Kong University Press.
Morris, P. (1995). The Hong Kong school curriculum: Development, issues and policies. Hong Kong:
Hong Kong University Press.
Morris, P., Adamson, R., Au, M. L., Chan, K. K., Chan, W. Y., Yuk, K. P., et al. (1996). Target oriented
curriculum evaluation project: Interim report. Hong Kong: University of Hong Kong, Faculty of
Education.
Morrow, K. (1986). The evaluation of tests of communicative performance. In M. Portal (Ed.), In-
novations in language testing: Proceedings of the IUS/NFER conference (pp. 1–13). London:
NFER/Nelson.
Mosier, C. I. (1947). A critical examination of the concepts of face validity. Educational and Psy-
chological Measurement, 7, 191–205.
Munby, J. (1978). Communicative syllabus design. Cambridge, England: Cambridge University
Press.
Nagano, S. (1984). Kyoiku hyoka ron [Evaluation of education]. Tokyo: Daiichi hoki.
National Teaching Syllabus. (1990). China: Ministry of Education, PRC.
National Teaching Syllabus. (1996). China: Ministry of Education, PRC.
New South Wales Adult Migrant English Service. (1995). Certificates I, II, III and IV in spoken and
written English. Sydney, Australia: Author.
New South Wales Adult Migrant English Service. (1997). Certificates I, II, III and IV in spoken and
written English. Sydney, Australia: Author.
Noble, A. J., & Smith, M. L. (1994a). Measurement-driven reform: Research on policy, practice, reper-
cussion (Tech. Rep. 381). Tempe, AZ: Arizona State University, Center for the Study of Evalua-
tion.
Noble, A. J., & Smith, M. L. (1994b). Old and new beliefs about measurement-driven reform: ‘The
more things change, the more they stay the same’ (Tech. Rep. 373). Tempe, AZ: Arizona State
University, Center for the Study of Evaluation.
Nolen, S. B., Haladyna, T. M., & Haas, N. S. (1992). Uses and abuses of achievement test scores.
Educational Measurement: Issues and Practice, 11(2), 9–15.
Nunan, D. (1989). Understanding language classrooms: A guide for teacher-initiated action. New
York: Prentice-Hall.
Office of the Superintendent of Public Instruction. The Washington assessment of student learning:
An update on writing. Retrieved February 15, 2000, from
http://www.k12.wa.us/assessment/assessproginfo/subdocuments/writupdate.asp
Ogawa, Y. (1981). Hanaseru dake ga eigo ja nai [Beyond English conversation: Making school Eng-
lish work for you]. Tokyo: Simul Press.
Oppenheim, A. N. (1992). Questionnaire design, interviewing and attitude measurement. London:
Pinter.
Oxenham, J. (Ed.). (1984). Education versus qualifications? London: Allen and Unwin.
Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on stan-
dardized achievement testing. Educational Researcher, 20(5), 12–19.
Patton, M. Q. (1987). How to use qualitative methods in evaluation. London: Sage.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. J. Baumgardner (Eds.), ESP
in the classroom: Practice and evaluation (pp. 98–107). London: Modern English.
Petrie, H. G. (1987). Introduction to evaluation and testing. Educational Policy, 1, 175–180.
Phillipson, R. (1992). Linguistic imperialism. Oxford, England: Oxford University Press.
Popham, W. J. (1983). Measurement as an instructional catalyst. In R. B. Ekstrom (Ed.), New direc-
tions for testing and measurement: Measurement, technology, and individuality in education (pp.
19–30). San Francisco: Jossey-Bass.
Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68,
679–682.
Popham, W. J. (1993). Measurement-driven instruction as a ‘quick-fix’ reform strategy. Measure-
ment and Evaluation in Counseling and Development, 26, 31–34.
Popham, W. J., Cruse, K. L., Rankin, S. C., Standifer, P. D., & Williams, P. L. (1985).
Measurement-driven instruction: It is on the road. Phi Delta Kappan, 66, 628–634.
Por, L. H., & Ki, C. S. (1995). Methodology washback: An insider’s view. In D. Nunan, R. Berry, & V.
Berry (Eds.), Bringing about change in language education (pp. 217–235). Hong Kong:
University of Hong Kong, Department of Curriculum Studies.
Purpura, J. (1999). Learner strategy use and performance on language tests. Cambridge, England:
Cambridge University Press.
Quinn, T. J. (1993). The competency movement, applied linguistics and language testing: Some
reflections and suggestions for a possible research agenda. Melbourne Papers in Language
Testing, 2(2), 55–87.
Quirk, R. (1995, December). The threat and promise of English. Paper presented at the Language
Planning and Policy Conference, Ramat Gan, Israel.
Read, J. (1999, July). The policy context of English testing for immigrants. Paper presented at the
Language Testing Research Colloquium, Tsukuba, Japan.
Reischauer, E. O., Kobayashi, H., & Naya, Y. (1989). Nihon no kokusaika [The internationalization
of Japan]. Tokyo: Bungei Shunju Sha.
Resnick, L. B. (1989). Toward the thinking curriculum: An overview. In L. B. Resnick & L. E.
Klopfer (Eds.), Toward the thinking curriculum: Current cognitive research (pp. 1–18). Reston,
VA: Association for Supervision and Curriculum Development.
Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educa-
tional reform. In B. R. Gifford & M. C. O’Connor (Eds.), Changing assessments: Alternative views
of aptitude, achievement and instruction (pp. 37–75). London: Kluwer Academic.
Robinson, P. (1993). Teachers facing change. Adelaide, Australia: National Center for Vocational
Education Research.
Rogers, E. M. (1983). The diffusion of innovations (3rd ed.). New York: Macmillan.
Rohlen, T. P. (1983). Japan’s high schools. Berkeley: University of California Press.
Rumsey, D. (1993, November). A practical model for assessment of workplace competence within a
competency-based system of training. Paper presented at the Testing Times Conference, Syd-
ney, NSW, Australia.
Runte, R. (1998). The impact of centralized examinations on teacher professionalism. Canadian
Journal of Education, 23, 166–181.
Saito, T., Arita, S., & Nasu, I. (1984). Tashi-sentaku tesuto ga igaku kyoiku ni oyobosu eikyo [The in-
fluence of multiple-choice test on medical education] (Nihon igaku kyoiku shinko zaidan
kenkyu jose ni yoru kenkyu hokoku sho [Technical report of the Japan Medical Education Re-
search Fund]). Okayama: Kawasaki Medical School.
Sanders, W. L., & Horn, S. P. (1995). Educational assessment reassessed: The usefulness of stan-
dardized and alternative measures of student achievement as indicators for the assessment
of educational outcomes. Education Policy Analysis Archives, 3(6), 1–15.
Saville, N. (1998). Predicting impact on language learning and the classroom. UCLES internal report.
Cambridge, England: University of Cambridge Local Examinations Syndicate.
Saville, N. (2000). Investigating the impact of international language examinations (Research Notes
No. 2). Available from University of Cambridge Local Examinations Syndicate Web site,
http://www.cambridge-efl.org/rs_notes.
Scaramucci, M. (1999, July). A study of washback in Brazil. Paper presented at the Language
Testing Research Colloquium, Tsukuba, Japan.
Schiefelbein, E. (1993). The use of national assessments to improve primary education in Chile.
In D. W. Chapman & L. O. Mahlck (Eds.), From data to action: Information systems in educational
planning (pp. 117–146). Paris, France: UNESCO.
Seliger, H. W., & Shohamy, E. G. (1989). Second language research methods. Oxford: Oxford Univer-
sity Press.
Shavelson, R. J., & Stern, P. (1981). Research on teachers’ pedagogical thoughts, judgments, deci-
sions, and behavior. Review of Educational Research, 51, 455–498.
Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Ed-
ucational Measurement: Issues and Practice, 9, 15–22.
Shepard, L. A. (1991a). Interview on assessment issues with Lorrie Shepard. Educational Re-
searcher, 20(2), 21–27.
Shepard, L. A. (1991b). Psychometricians’ beliefs about learning. Educational Researcher, 20(6),
2–16.
Shepard, L. A. (1992). What policy makers who mandate tests should know about the new psy-
chology of intellectual ability and learning. In B. R. Gifford & M. C. O’Connor (Eds.), Changing
assessments: Alternative views of aptitude, achievement and instruction (pp. 301–327). London:
Kluwer Academic.
Shepard, L. A. (1993). The place of testing reform in educational reform: A reply to Cizek. Educa-
tional Researcher, 22(4), 10–14.
Shepard, L. A., & Dougherty, K. C. (1991, April). Effects of high-stakes testing on instruction. Paper
presented at the annual meeting of the American Educational Research Association and the
National Council on Measurement in Education, Chicago.
Shiozawa, T. (1983). Daigaku nyushi—genjo to kadai [University entrance examinations—the pres-
ent situation and problems]. Eigo Kyoiku Sokan 30-shunen Kinen Zokango [English Teacher’s
Magazine, 30th Anniversary Special Issue], 39–41.
Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing model for assess-
ing foreign language learning. Modern Language Journal, 76, 513–521.
Shohamy, E. (1993a). The power of tests: The impact of language tests on teaching and learning
(NFLC Occasional Papers). Washington, DC: National Foreign Language Center.
Shohamy, E. (1993b). The exercise of power and control in the rhetorics of testing. Center for Applied
Language Studies, Carleton University, Ottawa, Canada, 10:48–62.
Shohamy, E. (1997). Testing methods, testing consequences: Are they ethical? Are they fair? Lan-
guage Testing, 14, 340–349.
Shohamy, E. (1999). Language testing: Impact. In B. Spolsky (Ed.), Concise Encyclopedia of Educa-
tional Linguistics (pp. 711–714). Oxford, England: Pergamon.
Shohamy, E. (2000). Using language tests for upgrading knowledge. Hong Kong Journal of Applied
Linguistics, 5(1), 1–18.
Shohamy, E., & Donitsa-Schmidt, S. (1995, April). The perceptions and stereotypes of Hebrew vs.
English among three different ethnic groups in Israel. Paper presented at the meeting of the
American Association of Applied Linguistics, Long Beach, CA.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect
over time. Language Testing, 13, 298–317.
Silverman, D. (1993). Interpreting qualitative data: Methods for analyzing talk, text and interaction.
London: Sage.
Simon, B. (1974). The two nations and educational structure 1780–1870. London: Lawrence & Wis-
hart.
Smith, M. L. (1991a). Meanings of test preparation. American Educational Research Journal, 28,
521–542.
Smith, M. L. (1991b). Put to the test: The effects of external testing on teachers. Educational Re-
searcher, 20(5), 8–11.
Snyder, C. W., Jr., Prince, B., Johanson, G., Odaet, C., Jaji, L., & Beatty, M. (1997). Exam fervor and
fever: Case studies of the influence of primary leaving examinations on Uganda classrooms,
teachers, and pupils: Vol. 1. Washington, DC: Academy for Educational Development, Ad-
vancing Basic Education and Literacy Project.
Somerset, A. (1983). Examination reform: The Kenya experience. Washington, DC: World Bank.
Spada, N., & Froehlich, M. (1995). COLT: Communicative orientation of language teaching observa-
tion scheme, coding conventions and applications. Sydney, NSW: Macquarie University, Na-
tional Center for English Language Teaching and Research.
Spolsky, B. (1995a). The examination of classroom backwash cycle: Some historical cases. In D.
Nunan, V. Berry, & R. Berry (Eds.), Bringing about change in language education (pp. 55–66).
Hong Kong: University of Hong Kong, Department of Curriculum Studies.
Spolsky, B. (1995b). Measured words. Oxford: Oxford University Press.
Stecher, B., & Barron, S. (1999). Quadrennial milepost accountability testing in Kentucky (Tech. Rep.
505). Los Angeles: University of California, National Center for Research on Evaluation, Stan-
dards, and Student Testing.
Stecher, B., Barron, S., Chun, T., Krop, C., & Ross, K. (2000). The effects of Washington education re-
form on schools and classrooms (Tech. Rep. 525). Los Angeles: University of California, Na-
tional Center for Research on Evaluation, Standards, and Student Testing.
Stecher, B., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assess-
ment on classroom practices: Results of the 1996–97 RAND survey of Kentucky teachers of mathe-
matics and writing (Tech. Rep. 482). Los Angeles: University of California, National Center for
Research on Evaluation, Standards, and Student Testing.
Stecher, B., & Chun, T. (2002). School and classroom practices during two years of education reform
in Washington state (CSE Tech. Rep. No. 550). Los Angeles: University of California, National
Center for Research on Evaluation, Standards, and Student Testing.
Steiner, J. (1995a). Changes in the English Bagrut exam. Jerusalem: Ministry of Education, English
Inspectorate.
Steiner, J. (1995b). Reading for pleasure. Jerusalem: Ministry of Education, English Inspectorate.
Stevenson, D. K., & Riewe, U. (1981). Teachers’ attitudes towards language tests and testing. In T.
Culhane, C. Klein-Braley, & D. K. Stevenson (Eds.), Practice and problems in language testing.
Occasional Papers, 26 (pp. 146–155). Essex, UK: University of Essex.
Stiggins, R., & Faires-Conklin, N. (1992). In teachers’ hands. Albany, NY: State University of New
York Press.
Stoller, F. (1994). The diffusion of innovations in intensive ESL programs. Applied Linguistics, 15,
300–327.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon &
M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185–201). Reading, MA: Ad-
dison-Wesley.
Swain, M. (1985). Large-scale communicative language testing: A case study. In Y. P. Lee, A. C. Y. Y.
Fok, R. Lord, & G. Low (Eds.), New directions in language testing (pp. 35–46). Oxford:
Pergamon.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B.
Seidlhofer (Eds.), Principle and practice in applied linguistics: Studies in honor of H. G.
Widdowson (pp. 125–144). Oxford, England: Oxford University Press.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A
step towards second language learning. Applied Linguistics, 16, 371–391.
Takano, F. (1992). Daigaku nyushi no kaizen ni mukete [Towards a reform of university entrance
examination]. Shizen, 7, 13–26.
Tang, C., & Biggs, J. B. (1996). How Hong Kong students cope with assessment. In D. A. Watkins &
J. B. Biggs (Eds.), The Chinese learner: Cultural, psychological and contextual influences (pp.
159–182). Hong Kong: Center for Comparative Research in Education.
Troman, G. (1989). Testing tension: The politics of educational assessment. British Educational
Research Journal, 15, 279–295.
University of Cambridge Local Examinations Syndicate (UCLES). (1999). The IELTS handbook.
Cambridge, England: Author.
University of Cambridge Local Examinations Syndicate (UCLES). (2000). IELTS handbook.
Cambridge, England: Author.
Valette, R. M. (1967). Modern language testing. New York: Harcourt Brace.
van Lier, L. (1988). The classroom and the language learner. New York: Longman.
VanPatten, B., & Sanz, C. (1995). From input to output: Processing instruction and communicative
task. In F. R. Eckman, D. Highland, P. W. Lee, J. Mileham, & R. R. Weber (Eds.), Second language
acquisition: Theory and pedagogy (pp. 169–186). Mahwah, NJ: Lawrence Erlbaum Associates.
Vernon, P. E. (1956). The measurement of abilities (2nd ed.). London: University of London Press.
Vogel, E. F. (1979). Japan as number one: Lessons for America. Tokyo: Charles E. Tuttle.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education
and from innovation theory. Language Testing, 13, 334–354.
Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.),
Encyclopedia of language and education: Vol. 7. Language testing and assessment (pp. 291–302).
Dordrecht: Kluwer Academic.
Wall, D. (1999). The impact of high-stakes examinations on classroom teaching: A case study using
insights from testing and innovation theory. Unpublished doctoral dissertation, Lancaster
University, UK.
Wall, D. (2000). The impact of high-stakes testing on teaching and learning: Can this be predicted
or controlled? System, 28, 499–509.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language
Testing, 10, 41–69.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lanka impact study. In A.
Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194–221). Philadelphia:
Multilingual Matters.
Wall, D., Kalnberzina, V., Mazuoliene, Z., & Truus, K. (1996). The Baltic States Year 12
examination project. Language Testing Update, 19, 15–27.
Washington State Commission on Student Learning. (1997). Essential academic learning
requirements: Technical manual. Olympia, WA: Author.
Watanabe, Y. (1996a). Investigating washback in Japanese EFL classrooms: Problems of
methodology. In G. Wigglesworth & C. Elder (Eds.), The language testing cycle: From inception to
washback (pp. 208–239). Melbourne, Victoria, Australia: Applied Linguistics Association of
Australia.
Watanabe, Y. (1996b). Does grammar translation come from the entrance examination?
Preliminary findings from classroom-based research. Language Testing, 13, 318–333.
Watanabe, Y. (1997a). Nyushi kara eigo o hazusu to jugyo wa kawaru ka [Will elimination of
English from the entrance examination change classroom instruction?] Eigo kyoiku [English
teachers magazine], September, special issue, 30–35. Tokyo: Taishukan shoten.
Watanabe, Y. (1997b). Washback effects of the Japanese university entrance examination:
Classroom-based research. Unpublished doctoral dissertation, Lancaster University, UK.
Watanabe, Y. (2000). Washback effects of the English section of the Japanese university entrance
examinations on instruction in pre-college level EFL. Language Testing Update, 27, 42–47.
Watanabe, Y. (2001). Does the university entrance examination motivate learners? A case study
of learner interviews. In Akita Association of English Studies (Eds.), Trans-equator exchanges:
A collection of academic papers in honor of Professor David Ingram (pp. 100–110). Akita, Japan:
Author.
Watson-Gegeo, K. A. (1988). Ethnography in ESL: Defining the essentials. TESOL Quarterly, 22,
575–592.
Watson-Gegeo, K. A. (1997). Classroom ethnography. In N. H. Hornberger & D. Corson (Eds.),
Encyclopedia of language and education: Vol. 8. Research methods in language and education (pp.
135–144). London: Kluwer Academic.
Weir, C. J. (2002). Continuity and innovation: The revision of CPE 1913–2013. Cambridge, England:
Cambridge University Press/UCLES.
Whetton, C. (1999, May). Attempting to find the true cost of assessment systems. Paper presented at
the annual meeting of the International Association for Educational Assessment, Bled,
Slovenia.
White, R. V. (1988). The ELT curriculum: Design, innovation and management. Oxford: Blackwell.
White, R. V. (1991). Managing curriculum development and innovation. In R. V. White, M. Martin,
M. Stimson, & R. Hodge (Eds.), Management in English language teaching (pp. 166–195).
Cambridge, England: Cambridge University Press.
Wideen, M. F., O’Shea, T., & Pye, I. (1997). High-stakes testing and the teaching of science.
Canadian Journal of Education, 22, 428–444.
Wiggins, G. (1989a). A true test: Toward more authentic and equitable assessment. Phi Delta
Kappan, 70, 703–713.
Wiggins, G. (1989b). Teaching to the (authentic) test. Educational Leadership, 46(7), 41–47.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75, 200–214.
Wilkins, D. (1976). Notional syllabuses. Oxford, England: Oxford University Press.
Williams, M., & Burden, R. (1997). Psychology for language teachers: A social constructivist
approach. Cambridge, England: Cambridge University Press.
Winetroube, S. (1997). The design of the teachers’ attitude questionnaires. Cambridge, England:
University of Cambridge Local Examinations Syndicate.
Wiseman, S. (Ed.). (1961). Examinations and English education. Manchester, England: Manchester
University Press.
Wolf, S. A., Borko, H., Elliott, R., & McIver, M. (2000). “That dog won’t hunt!”: Exemplary school
change efforts within the Kentucky reform. American Educational Research Journal, 37,
349–393.
Woods, A., Fletcher, P., & Hughes, A. (1986). Statistics in language studies. Cambridge, England:
Cambridge University Press.
Woods, D. (1996). Teacher cognition in language teaching: Beliefs, decision-making and classroom
practice. Cambridge, England: Cambridge University Press.
Yang, H. (1999, August 5). The validation study of the National College English Test. Paper presented
at the annual meeting of Association Internationale de Linguistique Appliquée (AILA),
Tokyo.
Yue, W. W. (1997). An investigation of textbook materials designed to prepare students for the IELTS
test: A study of washback. Unpublished master’s thesis, Lancaster University, England.
