Hoadjli's PhD Thesis
Section of English
Board of Examiners:
March 2015
Contents
Dedication
Acknowledgements
Abstract
General Introduction
Introduction
General Introduction
It may seem paradoxical but tests are not sanctions. They should be looked at as a
rewarding experience. In the past, language tests used to be regarded as the students’ “bête
noire”. Students did not enjoy taking tests and teachers did not enjoy marking them.
Nowadays, more focus is put on the relevance of language tests to the teaching operation.
Instead of being a separate subject that frequently takes place at the end of a course, a term or
an academic year, language tests have become an integral part of the curriculum; that is, they
are seen as a learning experience, which is part of the on-going course of study.
More importantly, language tests are now regarded as a valuable tool for providing
information that is relevant to several concerns in language teaching. They can be one way of
providing systematic feedback for both teachers and students. The teacher can see how well
or badly the students are performing and check for any discrepancies between expectations
and actual performance. Likewise, the students can know how much attainment and progress
Language tests can also be a good means of evaluating instructional materials and tasks
and their relevance to the educational goals. Ideally, the goals of language tests and/or test-
items should be clear to students, so that they need not spend time guessing what the teacher
means. If the students perceive the tests as relevant to their needs, they themselves are
probably going to engage more actively in the process of dealing with them.
Another aspect of language tests concerns the insights and inferences a teacher often
draws from the outcomes of language tests. The usefulness of such inferences is manifested
when they provide feedback to be utilized in making the teaching programme more effective
and when they provide information about the kind of materials and tasks students need. These
inferences can be the only ground on which teachers can make appropriate decisions about the
teaching operation.
To be able to achieve its aims, a language test must meet the requirements of some
fundamental test qualities, such as validity, reliability, and practicality. In other contexts, it
should be authentic and interactive, and should have an impact on all the concerned parties.
Certainly, these considerations will vary from one situation to another because what might be
understanding of how one designs and develops a language test is crucial.
that a language test exerts an influence on both teaching and learning. It can extend beyond these
levels to reach even broader areas in the educational setting. In the assessment literature,
there is now a clear consensus that the impact of a test on teachers and students is termed
washback. For many specialists, the concept of washback direction encapsulates the
principle that some effects of a language test may be beneficial, while others
may be harmful. Positive washback is often seen to encourage learning and, conversely,
negative washback usually inhibits the attainment of the educational goals held by learners.
In this sense, washback is judged positive or negative according to how far it enhances or
Enchanted by the power of the decisions they can provide, language tests are key
language testing system. In line with this assumption, research on washback has
demonstrated that a language test can also be a way by which teachers’ and students’
behaviours and perceptions of their own abilities can change. In addition, a testing instrument
may influence the content and methodology of teaching programmes, attitudes towards the
value of certain educational goals and activities; in the long term, it may even serve the
needs of society as a whole. Given this potential power, a language test, as an innovative act,
can thus become a useful source for contending participants who seek to bring about a new
In Algeria, evidence from assessment practices in the school system is scanty, and such
evidence is not of sufficient quality to support the inferences that tests are expected to yield in relation
to teaching and learning. It can be argued that the current testing system used by teachers in
the Algerian educational system may be developed under pressure to reflect more closely,
and to sustain, desired educational goals. There is clear evidence that these practices are
meant to exploit the format and content of developed tests to improve the final outcomes
quickly and efficiently. In reality, there is something wrong in the way these tests are
conceived. The assessment practices do not consider the fundamental theoretical and
procedural constructs of tests and the connection that should exist between teaching and
learning and the mechanisms by which students have to be assessed. In this spirit, it is
assumed that the employed traditional testing system leads to inappropriate or outmoded
forms of inferences which often fail to keep pace with the requirements of pedagogy.
should be used principally in support of teaching and learning, rather than in producing
limited outcomes for the sake of just passing or failing. The underlying objective in this
context should be to support the learning process. Thus, the aim of
assessment is to develop and improve students’ achievement and progress in learning the
target language, and not just to measure their performance in given skills and language
components. Of course, this can be attained only when testing is connected to the teaching and
learning process through feedback. In addition, teachers need to
ensure appropriate standards of test design, development, and content strategies,
with the intention of relating their tests more closely to valued classroom behaviours as these
are the underlying assumptions of the educational objectives set out in the official syllabuses
suggested for the teaching of English language in the Algerian schools. In short, achievement
tests should mirror the best practices of teachers, so that test practice will involve students in
activities which will develop a full range of skills in the target language.
From our ten years’ experience of teaching English at the secondary school
level, we observed that the testing practices through the available assessment instruments did
not provide specific information about students’ achievement and progress in
English. The current testing system employed by the teachers in the Algerian schools relied
only on either copies of the “Bac” exam model, or merely intuitive tests constructed without
reference to any theoretical bases or operational procedures. Wrongly, what teachers attempt
to do is to build test contents on the basis of previous existing tests, believing that such a
practice would improve students’ scores. It is apparent that there is not much congruence
between test content and the contents of the syllabuses. The syllabuses, very often, are based
on the integration of the four skills and the development of competencies; but, when it comes
to assessment, our observation has shown that tests assess one or two
skills and neglect the others. For instance, the didactic unit presented to
students throughout the learning cycle starts with a listening phase, but students are never
tested on that skill. The same remark can be made about the speaking skill.
This study was stimulated by another observation in the field of language testing in the
Algerian context. A dangerous phenomenon has cropped up: the scores have lost their
credibility in judging the actual degree of achievement and progress of students. Those who
might seem bright are not really the best, while those with low scores are often believed to be
weak, as far as the learning of English is concerned. The explanation of this
phenomenon might lie in misconceptions about language test design and development. As
a result, a negative washback effect has emerged. Instead of designing and developing tests
that assess the degree of mastery of the content of the syllabus and show to what
extent the educational goals have been reached, teachers have become mere trainers of
students in how to respond mechanically to a typology of questions and activities that are
frequently included in the “Bac” exam papers. This system has pushed these teachers to
reduce considerably the time available for instruction, restrict the range of the curriculum and
limit the teaching methods, and potentially diminish their freedom to teach content or use
As pointed out earlier in this section, what bothered us most, as a former English teacher,
about English language teaching in Algerian secondary schools is that teachers
in this context do not realise how important assessment has been in shaping the current
teaching and learning situation. We consider the situation a serious case of negative
It is within this framework that we conceived our work in order to understand the
reasons for the set of anomalies observed in the current testing system in the context under
exploration. The intention is not to criticize this system for its own sake; rather, it is
believed that this study offers an opportunity to examine the mechanisms by which
Algerian teachers evaluate and assess English language learners, and to highlight
common errors that currently occur in their assessment practices. To remedy this situation,
we found it appropriate to suggest an Alternative Testing Model (ATM) that seeks to
overcome the deficiencies diagnosed in the current testing model, and to repair the
shortcomings that are thought to be one of the sources of the decline in the learning of
English. To succeed in this new project, there is a need to conceive the
relationship that ought to exist between teaching, learning, and testing as one of partnership.
Each element in this equation has a tremendous role to play in achieving the ultimate goals set out in
this research.
2. Research Questions
Based on the problem stated above, the study explores these research
questions:
RQ1: What is the nature and scope of the washback effect on teachers’ behaviours of
aspects of teaching English language in the Algerian secondary schools due to
the use of the current testing system?
RQ2: What is the nature and scope of the washback effect on students’ perceptions
of aspects of teaching English language in the Algerian secondary schools due
to the use of the current testing system?
RQ3: What strategies does one need to implement the Alternative Testing Model
with EFL classes at the Algerian secondary school level?
RQ4: What is the nature and scope of the washback effect on teachers’ behaviours as
a result of the Alternative Testing Model implementation?
RQ5: What is the nature and scope of the washback effect on students’ perceptions
as a result of the Alternative Testing Model implementation?
The general purpose of the present study is to explore how those involved in teaching
and learning of English in the Algerian secondary schools perceive themselves to be affected
by the implementation of the Alternative Testing Model. More specifically, this study aims at:
students) reacted to the implementation of the ATM. In other words, the study
purpose can provide a new solid ground for more consistent assessment reforms
perceptions of the new model, and teachers’ behaviours within the context of
4. Hypotheses
a. A useful achievement test for EFL secondary classes in Algeria will positively influence
teaching and learning; and, conversely, an achievement test for EFL
learners that is not useful will negatively influence teaching and learning.
b. A useful achievement test will influence attitudes towards the content and
washback; and, conversely, an achievement test that does not have important
5. Research Methodology
Because the intention of this study is an exploration of the washback effect of an ATM
on teaching and learning for EFL learners at different levels, focusing on perceptions, values,
and situational factors in the complex and varying situation of the Algerian secondary
schools, the design of the study needs to take into account all variables that are concerned
with the different facets of teaching and learning where such a phenomenon may occur.
In order to draw a picture of the context where the current testing practices take place
and find out how this situation differs from the intended changes that might occur with
the ATM implementation, the participants’ perceptions, attitudes, and opinions were
sought in two phases: (1) prior to and (2) after the introduction of the new
methods have different strengths and weaknesses. Thus, using a range of methods can be
would allow the refinement and checking in context of these methods. As a result, the
observations, and focus groups as data collection methods. These methods are seen to
In order to verify the results and check their validity, SPSS (Statistical Package for
the Social Sciences), a widely used statistical software package, is employed as an
additional validating instrument in this study. From a research point of view, SPSS serves
as a complement to the methods used, to corroborate the results. From a statistical point of
view, the application of this tool allows the researcher to validate the findings in some
Besides a general introduction and a general conclusion, the thesis is divided into six
chapters. Chapter 1 provides background information and the literature review on the central
research concept of ‘washback’ in this study. It also displays its implications for innovation in
Chapter 2 identifies the research methodology adopted in the current study. The
chapter describes the research methodology, research strategies, and data collection methods.
Chapter 3 describes how the teacher-participants in this study perceive the current
testing system they use to assess and evaluate their students’ achievement and progress. In
this chapter, the research design highlights the research questions and purposes and delineates
the procedures used to develop the research instruments in the Preliminary Study. This
includes procedures for validating and increasing reliability of the data collection methods,
Chapter 4 describes the proposed ATM, i.e., it displays the components of the new
testing system. The content and format of the three sub-tests that form an achievement test
based on the new model are discussed. The chapter ends with a description of the time
Chapter 5 discusses the results obtained after the implementation of the ATM. The
reactions of the participants to this innovation are shown in this chapter. As with the
Preliminary Study, the design of the Final Study is displayed. This concerns the data
collection methods used, the kinds of respondents for each research instrument, and the
data collection and analysis procedures. The chapter ends with a discussion and summary of
the findings.
implementation time of the new testing model. In addition, the chapter devotes a section to
the implications and recommendations of the study for those parties involved in the teaching
Chapter One: Washback in Language Testing
Introduction
This chapter centres on the underpinnings that shaped and guided this
research. First, the chapter highlights the origins of washback as a recent concept that has
emerged in language education in general and language testing in
particular. Second, it provides a definition of this concept and its related constructs as
conceived by most language testing specialists. Then, it identifies the different types of
washback. Following this section, the chapter displays the functions and mechanisms of the
concept under exploration. Next, washback as a lever for innovation and change in educational
settings is discussed. Finally, the chapter ends with a potential washback model for the
present study that investigates how the different components that make up washback are
structured. It also provides an ideal opportunity to understand how new testing systems are
In recent years, research on washback has led to a greater understanding of this construct in the
testing literature. In this section, the literature related to the origins and
definitions of this concept is reviewed. In addition, the different
points of view about what this construct may encompass are
discussed.
Although the subject of the effects of examination has long been discussed in the
literature of General Education (Kellaghan et al., 1982; Vernon, 1956), and has been looked
at from different points of view (Madaus, 1988; Fredericksen, 1984), it has been common in
the testing literature that the concept of ‘washback’, as it is known now, has come to attract
the attention of testing and assessment researchers only at the beginning of the 1990s. Before that,
applied linguists used different terms to refer to the idea of examination influence. Some of
these terms included ‘test impact’ (Bachman & Palmer, 1996; Baker, 1991), ‘systemic
alignment’ (Shepard, 1990), ‘backwash’ (Biggs, 1993), and possibly other terms.
Among this set of terms, two dominated the discussion of examination
influence: ‘backwash’ and ‘washback’. Testing
specialists quickly agreed that, in order to avoid confusion in the
adoption and use of the appropriate terminology, it would be better to assume that these two
concepts can be used interchangeably and that there is therefore no need to draw a clear
distinction between ‘washback’ and ‘backwash’ in language
testing uses and practices. In this sense, Hamp-Lyons (1997) corroborated the idea and noted
that “washback is one of the set of forms that has been used in language education and
language testing to refer to a number of beliefs about the relationship between testing and
learning” (295). She goes on to add that “another set of terms is ‘backwash’, but it would
appear that the terms ‘backwash’ and ‘washback’ are used interchangeably in the field”
(ibid). To confirm this interpretation of the two concepts, Hughes (1993) points out that
the two terms are used interchangeably. He makes it more explicit when he states that
“where washback comes from, I don't know. What I know is that you can find backwash in
dictionaries, but not washback” (57). However, in another context, Cheng and Curtis (2005)
prefer the term ‘washback’ to ‘backwash’ since they think that ‘washback’ is the
term most frequently found in applied linguistics in general and in the language testing
literature in particular. In short, this brief review of the distinction between ‘washback’
and ‘backwash’ leads one to believe that these two terms do not differ in
meaning; rather, the great majority of testing researchers who have dealt with this matter
have agreed that the two terms carry the same sense, and hence each
Language testing researchers have realized that the emergence of this concept is the
result of the considerable reforms and advances that took place in the field of language
testing during the last two decades of the twentieth century. Indeed, it has been
assumed that one of the areas that was actively discussed in that period was the
influence of tests on both instruction and learning. Cheng (2005) indicates that the subject of
examination influence was rooted in the notion that, because tests are often seen to drive teaching and
instruction, there is a dire need to seek a match between the construct
of the test and what teachers present in instruction. In other words, she means that the
clearer the fit between test content and teaching is, the greater the potential improvement will
be on the test. In a different approach, Messick (1989) placed the concept of washback in a
broader scope of construct validity. He claims that this construct encompasses a set of aspects
about testing such as the impact of tests on language test takers and teachers, the
interpretation of scores by decision makers and the intended uses of these scores. In such a
view, the concept washback stands as an inherent quality of any kind of assessment,
important research concept in the field of language teaching and testing. Tsagari (2006)
proposed an artificial time framework divided into three distinct but successive phases: (i)
the 'pre-1993' phase, (ii) the '1993' phase, and, finally, (iii) the 'post-1993' phase. First, she
labelled the 'pre-1993' phase the 'myth' phase. She identified it as the period of time when
writers recognized the examination influence phenomenon but no one accounted for it. What
is noticeable in this era is that few empirical studies were carried out and published for the
language testing community (Wesdorp, 1982; Hughes, 1988), which made strong claims of
the absence of this phenomenon. Most of the available studies in this period were merely
based on self-report data, on direct results, or on test results rather than direct contact with
participants involved. Second, the '1993' phase was markedly different from the previous one
since it was characterised by the publication of a seminal paper by two
prominent language testing researchers, Alderson and Wall, who are credited with being
the first to question the nature of examination influence. More crucially, the
hypotheses. Finally, the third, 'post-1993' phase, or as Tsagari named it the 'reality phase', was
used developed models to accurately dissect and explain the various components that
To summarize, although relatively little has been written about the origins of washback,
a great deal of information has emerged about various concepts that refer to examination
influence. Based on this review, it is not an exaggeration to say that the study of the origins of
washback is crucial to shape the scope of further needed research in this area. This matter
should be treated as a direct consequence of other educational studies that targeted the
definitions vary from simple and straightforward to complex. Some take a narrow focus on
teachers and students, while others extend to educational systems and
society in general. Some definitions stress intentionality, whereas others hold that washback occurs
haphazardly (Bailey, 1999: 3). In this subsection, a discussion of these definitions and how
Many applied linguists have indicated that the concept washback is rarely found in
language dictionaries. The few available definitions can be found in dictionaries such as 'The
Apart from these two examples, a meticulous search in other common English language dictionaries
has shown that there is no explanation or indication of the term
Unlike the rare definitions found in language dictionaries, a great number of other
definitions of the concept washback are present throughout the published assessment research
and literature, with various meanings. In a paper on testing listening comprehension, Buck
(1988) describes the apparent effect of Japanese University entrance examinations on English
There is a natural tendency for both teachers and students to tailor their classroom
activities to the demands of the test, especially when the test is very important to the
future of student, and pass rates are used as a measure of teacher success. This
influence of the test on the classroom (referred to as washback by language testers) is,
of course important; this washback effect can be either beneficial or harmful (17).
In this sense, Buck's definition emphasises the importance of what teachers and
students do in classrooms.
applied linguistics. For him, this term refers to the extent to which the interaction and the use
of a test influence language teachers and learners to do things they would not otherwise do
that promote or inhibit language learning. Shohamy (1992) also focuses on washback in
terms of language learners as test-takers when she describes, “the utilisation of external
language tests to affect and drive foreign language learning in the school context” (15). She
points out that “this phenomenon is the result of the strong authority of external testing and
the major impact it has on the lives of test takers” (ibid). To corroborate this belief, Shohamy
cites the example of the introduction of an oral proficiency test based on an interview in the
United States. She says that this example involves “the power of tests to change the behaviour
Bailey (1999) refers to Shohamy who summarized four key definitions that are crucial
1. Washback effect refers to the impact tests have on teaching and learning.
2. Measurement-driven instruction refers to the notion that tests should drive
learning.
3. Curriculum alignment focuses on the connection between testing and the teaching
syllabus.
4. Systemic validity implies the integration of tests into the educational system and
the need to demonstrate that the introduction of a new test can improve learning.
Cheng (2005) converged to some extent with Shohamy's (1992) ideas. She relies on
Pearson (1988) to show the influence of external examinations on the attitudes, behaviours,
and motivations of classroom teachers, learners, and even on other broader, related areas of
this research concept. Similarly, Cheng (1997) introduced the concept of intensity, “the
degree of washback effects in an area or a number of areas of teaching and learning affected by
an examination” (43). Cohen (1994) also takes a broad view. He describes washback in terms
of “how assessment instruments affect educational practices and beliefs” (41). In the same
vein, Pierce (1992) suggests that “the washback effect, sometimes referred to as the systemic
yielding an accurate definition of the concept washback in its broader sense, there is a crucial
need to clearly display the distinction between washback and the related, often confused concept of
impact in language testing. On this matter, Tsagari (2006), in a study on washback, explores
the relationship and/or distinction between these two major concepts: washback and impact.
She argues that the common view which prevails in the field of language assessment considers
washback as one dimension of impact. The latter is often used to describe effects on the
wider educational context. Tsagari goes back to Wall (1995) to discuss in some detail the
existing relationship between washback and impact. The latter suggests that “washback is
frequently seen to refer to the effects that tests may have on teaching and learning, whereas
impact deals with the effects that tests may have on individuals, policies, and practices,
within the classroom, the school, the educational system, or even societies as a whole” (16).
Following this interpretation, Tsagari (2006) recognizes that this view intersects to a great
extent with other writers' explanations, such as McNamara (2000) and Shohamy et al. (1996)
These ideas are revisited in Bachman and Palmer (1996: 35). They note that washback,
however, is a more complex phenomenon than simply the effect of a test on teaching and
learning. Instead, they feel that the impact of a test should be evaluated with reference to
contextual variables of society's goals and values, the educational system in which the test is
used, and the potential outcomes of its use. They referred to these uses at two levels:
a micro-level, in terms of individuals who are affected by the particular test uses,
Bailey (1996) adopts “a holistic view on washback, but prefer to consider overall
impact in terms of ‘washback of learners’, and ‘washback to the program’, counsellors, etc.”
(263-264). For Cain (2005), although the terms washback and impact are in
many cases used interchangeably, test impact more accurately refers to the wider implications
and effects of a given test use. Andrews (1994: 37), writing on washback, appears to
The term washback is interpreted broadly ... washback refers to the effects of tests on
teaching and learning, the educational system, and the various stakeholders in the
educational process, whereas the word impact is used in a non-technical sense, as a
synonym of effect.
Hawkey (2006) comments on the fact that the concepts washback and impact are often
logical location,
definition scope,
intentionality,
complexity,
direction,
intensity, emphasis,
Alderson and Wall (1993) also discussed the notion of washback, and tried to identify
what washback was. The authors reviewed the concept as it had been presented by language
specialists up to that time. They concluded that the concept was too vaguely defined to be
useful and that much of what had been said and written about it had been based on
assertion rather than empirical findings. In response, they presented a number of
'washback hypotheses', which were meant to illustrate some of the effects that tests might
have on teaching and learning. They argued that test developers should specify the types of
impact that they wished to promote and the kinds of effects test evaluators should look for
when deciding whether or not the desired washback has occurred (Wall, 2005: 51). The
washback hypotheses they presented stated:
11. A test will influence attitudes to the content, method, etc. of teaching and
learning.
12. Tests that have important consequences will have washback; and conversely,
13. Tests that do not have important consequences will have no washback.
15. Tests will have washback effects for some learners and some teachers, but not
for others.
Alderson and Wall proposed these hypotheses as a result of their own extensive work in
Sri Lanka and of reviewing case studies conducted in Nepal (Khaniya, 1990), Turkey (Hughes,
After reviewing numerous definitions of the concept washback, it is evident that this
concept is open to a variety of explanations and that there are a number of variables one needs
to consider when conducting research on this subject. Crucially, what emerges from this
discussion is that washback can be defined according to two major scopes. One follows a
narrower definition, which focuses on the effects that a test has on teaching and learning;
the other follows a wider and more holistic view of washback that extends beyond the
classroom to take into account the educational system and society at large, and which, as
noted earlier in this section, would be more accurately referred to as test impact. In this
connection, Hamp-Lyons (1997) summarises the situation and the terminology well. She
finds that Alderson and Wall's limitation of the term washback to influence on teaching,
teachers, and learning seems now to be generally accepted, and the discussion of the wider
influences of tests is considered under the term impact, with the term used in wider
Palmer's definition, which refers to issues of test use and social impact as 'macro' issues of
impact, while washback takes place at the 'micro' level of participants, particularly teachers
It is not within the scope of the present study to look in detail at the wider implications
of testing. Rather, in the context of this research, and since this work is exploratory in nature,
the definition of washback adopted will be primarily concerned with the area identified by
Alderson & Wall (1993) and Bachman & Palmer (1997), i.e., ‘washback to teaching’ and
‘washback to learning’. In other words, the researcher will adopt the narrow definition of
washback, focusing on washback at the micro-level, which concerns the effects of a suggested
ATM on teachers and students. Besides, he will try to be consistent in the use of the two
terms ‘washback’ and ‘impact’. In this respect, he uses ‘washback’ to cover influences of
language tests on language learners and teachers, language learning and teaching processes
and outcomes. In the same vein, he uses ‘impact’ to cover influences of language tests on
1.2 Types of Washback
Assessment studies have indicated that washback very often implies movement in a
particular direction. This movement in a particular direction is an inherent part of the use of
Besides, washback has also been perceived as bipolar: either negative (harmful) or
positive (beneficial). Messick (1996) cites Alderson and Wall's (1993 : 17) definition of
washback as the “extent to which a test influences language teachers and learners to do things
they would not necessarily otherwise do that promote or inhibit language learning” (Messick:
241). They add that “tests can be powerful determiners, both positively and negatively, of
Following this line of argument, this sub-section discusses in some detail the two types of
washback: the considerable impact and power of testing on teaching and learning in schools,
and whether this washback exerts a positive or negative influence.
teaching and learning. Alderson and Wall (1993) point out that:
In this case, such tests will lead to the narrowing of content in the curriculum, instead
of covering a definite content from what has been learnt in class. For Vernon (1956),
“teachers tend to ignore subjects and activities that are not directly related to passing the
examination, and testing accordingly alters the curriculum in a negative way” (18). Once
again, it is logical that those tests may fail to create a correspondence between the learning
principles and/or the course objectives to which they should be related (Cheng, 2005 : 8).
More seriously, negative washback can substantially reduce the time available for
instruction, narrow curricular offerings and modes of instruction, and potentially reduce the
capacities of teachers to teach content and use methods and materials that are incompatible
with useful testing instruments (Smith, 1991 : 120). Madaus (1988) intersects with the above
One strong impression resulting from negative washback is that an increasing
number of coaching classes are set up to prepare students for examinations, but what students
learn are test-taking skills rather than language activities (Wiseman, 1961 : 21). In such
a learning context, an atmosphere of high anxiety and fear of test results becomes prevalent
among teachers and learners (Shohamy et al., 1996 : 9). For Shohamy, teachers will feel that
the success or failure of their students reflects on them, and they speak of pressure to cover
the materials for the examination. When the students know that one single measure of
performance can determine their levels, they will be less likely to take a positive attitude
toward learning.
There are other testing researchers, on the other hand, who have seen washback in a
more positive way (Andrews, Fullilove and Wong, 2002; Bailey, 1996; Davies, 1985; Hsu,
2009). Those researchers strongly believe that it is possible to bring about beneficial changes
in teaching by changing examinations, which represents positive washback (Cheng &
Watanabe, 2004 : 10). This phenomenon refers to tests and examinations that influence
teaching and learning positively (Alderson & Wall, 1993 : 15). In a broad sense, good tests
positive teaching-learning process (Pearson, 1988 : 7). Andrews et al. (2002) suggest
language testing. For instance, an oral proficiency test was introduced in the expectation that
it would promote the teaching of speaking (Hsu, 2009 : 49). Davies (1985) considers that “a
creative and innovative test can advantageously result in syllabus alteration or even in a new
syllabus” (18). In this sense, a test no longer needs to be only an obedient servant; rather, it
Nevertheless, in educational settings, matters are less clear-cut than one might think:
assessment researchers have come to realise that there exists a set of conflicting
positions towards washback in language testing. That is, most of these experts claim that
there is no clear consensus among practitioners as to whether certain washback effects are
negative or positive. One explanation for this conflicting situation is that the potentially
positive or negative nature of a test can be influenced by many contextual factors (Hsu, 2009 : 9).
Alderson and Wall (1993 : 117-118), commenting on this particular case, posit that the
washback effect of a test might be said to be beneficial or detrimental.
They add that whatever changes educators would like to bring about in teaching and learning
Therefore, for many testing specialists, research into washback may be more fruitful if
it turns its attention to the complex causes of this phenomenon in teaching
and learning, rather than focusing on deciding whether or not the effects can be classified as
positive or negative. According to Alderson and Wall (1993), the best way to achieve this is to
investigate as thoroughly as possible the broad educational context in which the
assessment takes place, since the major variables that often affect it, and that might prevent
washback from appearing, exist within the education system. Cheng and Watanabe
(2004) summarise this situation, noting that “if the consequences of a particular test for
teaching and learning are to be evaluated, the educational context in which the test
takes place needs to be fully understood” (31). This means that whether the washback effect
is positive or negative will largely depend on where and how it exists and manifests
washback.
Traditionally, tests were given at the end of the teaching and learning processes to
provide an accurate diagnosis of the effects of teaching and learning. Nevertheless, with the
advances and changes made in the field of testing and in how testing is conceived, a test can
also be developed for use at the beginning or in the middle of the teaching and learning
processes in order to influence either or both processes. This section intends to shed light on
the functions and mechanisms by which washback occurs in relation to other educational
In discussing the functions of language tests through which washback occurs in actual
teaching and learning environments, Wall (2005) referred to a set of reviews of those tests
and the influences they could have on the systems into which they are introduced. One of these
crucial reviews is the one produced by Eckstein and Noah (1993). In essence, Eckstein
and Noah provided a historical account of a number of functions and
influences of some types of tests, which showed how people throughout history have
usually considered tests an important tool for taking the desired decisions for
some targeted purposes. For instance, according to the authors, the first documented use of
written, public examination systems occurred under the Han Dynasty in China about 200 B.C. The
main function of these particular examinations was to select candidates for entry into the
government services. In other words, the examinations were used to break the monopoly over
According to Eckstein and Noah, the second function of tests was
to check patronage and corruption. A typical example of this function was
Britain where people could gain entry into higher education or the profession of strengths. An
important, direct consequence of this examination was the establishment of a great number of
public schools, which aimed at preparing students to sit for these examinations. In addition,
a third function of examinations, suggested by Eckstein and Noah, was to
encourage higher levels of competence and knowledge amongst those who were entering
government services or professions. The intention was to design and develop examinations
which reflected the demands and requirements of the target situations; students preparing for
those examinations would have to develop skills which were relevant to the work they hoped to
get in the future.
The fourth function, in this series of examples, was that of allocating scarce places in
higher education. At this level, examinations were used as a means of selecting the most able
candidates for the available places. This type of examination is the same as what is currently
referred to as placement tests in the testing literature. The fifth function in this
illustration was to measure and improve the effectiveness of teachers and schools. Eckstein
and Noah again used Britain as an example, describing how, at a certain time, the government
examinations through the allocation of considerable funds. The amount of funds that the
school received depended on how its students performed. However, the system had serious
unintended consequences and ultimately failed to achieve the expected objectives. The last
function, in this set of examples suggested by Eckstein and Noah, was limiting curriculum
differentiation. In Britain, in the nineteenth and the twentieth centuries, there was a
remarkable resistance to the idea of centralised education, and all the schools had the freedom
to decide on their own curriculum and means of assessment. With the establishment of
certificate examinations, these schools had a common target they could aim for, and all of them
turned to teaching the curriculum that could best help their students do well in the examinations
In the modern world, tests are frequently used for accountability within the system, and
in particular for certification of achievement in education. They form part of the procedure
for decisions about the allocation of scarce resources at both the systemic and the individual
level. For example, in many countries, Algeria among them, tests control the
transition between school and higher education, and they may lead to the awarding of a
degree. Tests are also seen as ways to upgrade knowledge and to improve the performance of
institutions (Hsu, 2009 : 62). Through testing, education policy can be rapidly diffused and
implemented at relatively low cost (Linn, 2000). Test results that are visible and ideally
measurable can be reported by the media in terms the public can understand and can be used
to show that change has or has not taken place. However, tests are also criticized for exerting
a certain authority and power at both the systemic and the individual level. Yet, in spite of the
criticism levelled at them, tests continue to occupy a leading place in the educational system
of many countries.
The series of functions outlined above are typical situations where these tests were
used to exert influence on the final outcomes to suit the intentions of those who are
in authority to make and impose their policies. As pointed out by some testing specialists, this
is an especially common practice in countries with centralized educational systems, where the
taught programmes are controlled by central agencies. Policymakers in these contexts and
countries have used tests to manipulate educational systems, to control curricula, and to
impose new textbooks and teaching methods. In such settings, tests have been viewed as a
primary tool through which changes in the educational system can be introduced without
having to change other educational components such as teacher training and curricula. On
The power and authority of tests and external examinations enable policy-makers to
use them as effective tools for controlling educational systems and prescribing the
behaviours of those who are affected by their results - administrators, teachers,
students and others (239).
Given the status of tests and examinations in public spheres, it seems important
to understand the functions of testing in relation to many facets and scopes of teaching, as
mentioned in the examples discussed earlier. Considering these functions
serves as a starting point and also a linking point to get a clear picture of the various
teaching and learning environments. Bailey (1996) cited Hughes's (1993) trichotomy to show
how this phenomenon works in different contexts. Bailey points out that this particular
particular to develop a basic model of washback that explains how the various components
that make up this framework interact, which helps in understanding the nature of this subject
of interest. In describing this model, Hughes states that the trichotomy is formed of three parts.
First, the participants, who are mainly people such as classroom teachers, students,
administrators, materials developers, and even publishers, whose perceptions and attitudes
toward their work may be affected by a test. Hughes' second component in this framework is
termed process. The latter covers any actions taken by the participants, which may contribute to the
product refers to what is learnt as facts, skills, and other aspects and also the quality of
learning.
Unlike Hughes, who places more stress on the three components that make up this
model, Alderson and Wall (1993), in their Sri Lankan study, focus on what they referred to as
and Wall argued that there is little evidence provided by empirical research to sustain the idea
The concept is not well defined, and we believe that it is important to be more
precise about what washback might be before we can investigate its nature and
whether it is a natural or inevitable consequence of testing (117).
Consequently, they suggest 15 hypotheses that can help researchers to illustrate areas in
teaching and learning that are usually affected by washback and can stand as a basis for
further research. This set of hypotheses posits a strong relationship
between the importance of tests and the extent of washback. Alderson and Wall concluded
that further research is needed and that such research must “entail increasing specification of
the washback hypothesis”. They called on researchers in the field of language testing
to take into account the research literature in at least two areas: motivation and performance, and
Following this seminal work on washback hypotheses, Wall (1996) followed
up the study and stressed the difficulties in finding explanations of how tests exerted
influence on teaching. She went back to innovation theory and its literature to explore the
complex topic of washback. In this respect, she proposed that the research areas that are seen
to be relevant to washback should include (a) the writing of detailed baseline studies to
identify important characteristics in the target system and the environment, including an
analysis of the current testing practices (Shohamy, Donitsa-Schmidt and Ferman, 1996),
(c) formation of management teams representing all important interest groups: teachers,
change in teaching and learning, Hsu (2009) referred to Smith (1991), who investigated an
ELT project and went on to construct a corresponding model of the variables involved, with the
aim of introducing the desired change in the teaching and learning processes. In essence,
Smith's model comprises five components of change: the target system, the management
system, the innovation itself, the resources available, and the environment in which change is
supposed to take place. Hsu adds that, on the basis of the same idea, Markee (1997)
illustrated through another study how change might occur in larger areas such as
curricula by following three stages: design, implementation, and
maintenance. In this respect, Markee suggested a framework based on the set of
questions posed by Cooper (1989), which referred to: who (participants), what
(product), where (the context), when (the time, duration), why (the rationale), and how
In two other studies, Fullan with Stiegelbauer (1991) dealt with the issue of the washback
effect, but in its broader uses. They discussed the effects of tests on schools and the changes
they bring about, and identified two main recurring themes: first, a washback effect should be
seen as a process rather than an event; second, all participants who are affected by this
phenomenon have to find their own understanding of what the washback effect is. Cheng (2004)
made this last point clearer, explaining that, according to Fullan, teachers work on their own with little
reference to experts or consultation with colleagues. Thus, they are usually forced to
make on-the-spot decisions, with little time to reflect on better solutions. The other
problem they often encounter in this context is that they are frequently unable to accomplish
what they planned to do. Consequently, their lives can become very difficult indeed. This
reality can explain why intended washback does or does not occur in teaching and learning.
In other words, this means that, if educational change is often imposed upon teachers and
2000 : 4).
corroborated the fundamental relationship between the design of given tests and their positive
or negative impact and power on teaching and learning. However, it is worth noting that the
outcomes of these studies, even if they have contributed to advancing research into the
domain of washback in language testing, remain insufficient to draw a larger and
clearer picture of this issue, since a number of questions raised about the mechanisms of
In this section, a number of common empirical research studies into washback in both
language and general education are discussed. This literature review is a summary of detailed
reviews carried out by a number of researchers. The latter highlighted the basis of the central
research concept and pointed out useful research methods adopted by many
researchers to carry out their investigations. Such an elucidation is of great utility for the
present exploration since it shapes the scope of the study and serves as a guideline for many
relevant issues requiring further research. For ease of reference, Table 1.1 provides
background information for the most used studies in terms of the educational context, exam
29
Authors | Context | Exam | Methods
Wesdorp (1982) | The Netherlands | Multiple-choice language assessment and final exams in Dutch secondary schools | Scores; analysis of tests; teacher and students' questionnaires
Watanabe (1997) | Japan | University entrance exam | Questionnaires; interviews with students and teachers
Referring back to the above illustrative table, what is evident is that most reports
from the available research studies into washback indicate that the influence of
tests has been observed on various aspects of learning and teaching, and that this
phenomenon was mediated by numerous factors. What is more significant on this
matter is that the research projects looking at washback have been carried out in
many different countries and various contexts. Crucially, all these research studies are
participants-washback on feelings and attitudes of teachers and students in the context under
exploration. Thus, in order to have a clear understanding of how these research studies were
carried out, it is worth examining several of the most interesting tenets of these research
projects.
secondary school institutions would lead to the impoverishment of the curriculum. He argued
that such a fear arose because the skills that could not be tested through
multiple-choice items would not be practised and hence would eventually disappear. He added
that if this happened, some teaching methods would fail to be adopted, and there might be
changes in the way in which students prepared themselves for tests.
Wesdorp's study, however, revealed no differences in teaching
practices or in students' preparation methods. He concluded that the
so-called washback effects are, after all, a mere “myth”. If they do exist, they must be so weak or
the introduction of a test in English for academic purposes helps to improve English
demonstrated that the basic aim was to devise a new proficiency test as the sole test
by which students could get access to undergraduate programmes. The intended test was
developed after completing a needs analysis. It comprised sections on three major skills:
listening, reading, and writing. Hughes, after introducing this test, noted that, “for the first
time, the foreign languages schools teachers were compelled by the test to consider seriously
just how to provide their students with training appropriate to the tasks which would
face them at the end of the course” (44). Hughes concluded that the washback effect occurred
as a result of the incorporation of this test, with great changes in the materials used in the
Likewise, Li (1990), in another washback study, examined a high-stakes examination
taken by Chinese students at the end of secondary school, the Matriculation
English Test (MET). He pointed out that this test was first introduced in 1984 to replace an
earlier examination, which was weak and lacked validity and reliability. Li
aimed at identifying whether or not the new test would lead to better results
than the previously adopted examination. Some time after the
introduction of this new test, Li came to the conclusion that the teachers' and students'
attitudes towards the MET were positive. In this respect, he wrote: “tests are able to subjugate
the minds of millions of people to the thraldom of forced memorisation, but, we would say it
is a greater kind of power to be able to liberate people's minds from such thraldom” (42).
From the short discussion of the three research projects into washback illustrated
above, the common view is that the very few empirical studies that were carried out prior to
the 1990s indicated that tests had little effect on learning and teaching
in the classroom. Such a result may be explained by the fact that these
studies failed to construct a definite washback model that would take
into account the array of factors which may play a part in determining why teachers react in
In brief, the washback studies that prevailed in this particular phase succeeded in
providing a clear definition of the concept of washback, some guidelines about how to achieve
positive washback, and a few references to the effects of tests on the contexts they had been
introduced into; but, in the meantime, there were few detailed accounts of specific attempts to
innovate through testing. Most of the research was based on questionnaires or on test results
In contrast to the preceding period, from 1993 onwards the fields of
language education in general and testing in particular have seen a significant increase in
empirical studies on washback effects. A great many language testing researchers recognise
that the recent research projects have led to a more detailed understanding of the phenomenon
in the domain of language education and of the factors which contribute to it.
Wall and Alderson (1993), for example, examined the effects of the new O-level
examination on English teaching and learning in secondary schools in Sri Lanka. They
emphasized that, at the time the study was published, it was the only investigation that
included classroom observation as one of its research methods. Wall (1996) summarised the
The examination had had considerable impact on the content of English lessons and on the
way teachers designed their classroom tests, but, on the other hand, this examination had
had little to no impact on the methodology they used in the classroom or on the way
they marked their students' performance (348).
Wall and Alderson found that the potential factors which impeded teachers from using
In another context, Lam (1993) examined the New Use of English (NUE) examination
in Hong Kong. Through this study, Lam attempted to find clear answers to a set of
questions, such as whether the amount of time that schools allocated to the teaching of
English was sufficient, whether schools set aside special time to prepare for one particular
section of the examination, and what the attitudes and abilities of the students, the
quality of English textbooks, the content of the teaching, and the students'
performance were like. He concluded that examination designers should take
into account how different factors in the context where the examination takes place might
interact with one another to yield the intended results and a clear picture of the expected
examination.
Andrews' (1994) Hong Kong study concerned the development of the Revised Use of
English (RUE) test to measure students’ oral performance in Hong Kong. In order
to gauge the effectiveness of this newly developed test, Andrews conducted a study
using two parallel questionnaires administered to the working party members and teachers,
with three groups of candidates. The investigation did not yield one definite
conclusion about the washback effect of the designed oral tests; rather, Andrews remarked that
the final outcomes indicated that the nature of washback varied across the three groups: only
a small improvement in performance between the first and the second group was apparent.
These results led the researcher to conclude that the washback effect of the test was delayed.
For this reason, the findings of Andrews' study suggested re-using the test in a second year to
Alderson and Hamp-Lyons (1996) found, in their study of the washback of the Test of
English as a Foreign Language (TOEFL) on preparation courses, that “this particular test was
seen to have a more direct washback effect on teaching content than on teaching
methodology”. The researchers employed three different types of data: interviews with
students in groups, interviews with teachers (both individuals and groups), and field notes
and audio-recordings during classroom observations. Like Watanabe (1996), Alderson and
Hamp-Lyons observed “two different teachers while they taught both TOEFL preparation
classes and courses. This particular design permitted Alderson and Hamp-Lyons to compare
TOEFL preparation to non-TOEFL preparation classes” (cited in Bailey 1999 : 32). The
authors concluded that the amount and type of washback which occurred depended on
The status of the test, the extent to which the test is counter to current practice, the
extent to which teachers and textbook writers think about appropriate methods for
test preparation, and the extent to which teachers and textbook writers are willing
and able to innovate (296).
A similar research design was used by Watanabe (1997), who investigated the
university entrance examination in Japan through two different types of data collection
methods: questionnaires, and interviews with teachers and students. Watanabe found that all
the textbooks used by the teachers observed consisted of past exam papers and materials. In
addition, the results showed that the presence of grammar translation questions on a particular
university entrance examination did not influence all the teachers in the same way: some
teachers were affected by these exams, and others were not. Watanabe identified “three
possible factors that might promote or inhibit washback to teachers: (1) the teachers'
educational background and/or experience; (2) differences in teachers' beliefs about effective
teaching methods; and (3) the timing of the researcher's observation” (cited in Bailey 1999 :
23). Thus, Watanabe concluded that “teacher factors may outweigh the influence of an
examination in terms of how exam preparation courses are actually taught” (ibid).
Moreover, he noted that school cultures might influence the degree of washback in that “a
school positive atmosphere which encouraged students to interact with authentic language
corroborate the idea of whether the modified Hong Kong Certificate of Education
Examination (HKCEE) taken by most secondary graduates brought about the positive
washback on teaching that was intended. In this study, Cheng used questionnaires,
interviews, and observations during the first year after the announcement and discovered that
the newly introduced examination was having considerable influence on 'what' teachers
teach, but not on 'how' they teach (Wall, 2005). In other words, the change of the
examination would change teachers' classroom activities, but it did not change teachers'
beliefs and attitudes about teaching. Cheng suggested that, “to change the how ... genuine
changes in how teachers teach and textbooks are designed must be involved. A change in the
examination syllabus itself will not alone fulfil the intended goals”
In a study whose findings converged with those of Cheng's work, Read and Hayes
(2003) attempted to measure learning outcomes through an investigation conducted on the
IELTS in New Zealand. The researchers used four data collection tools: interviews,
questionnaires, observations, and pre- and post-tests in English. What was particular in this
study was that the researchers had two small groups of 17 students. The latter took retired
versions of the IELTS exam as pre- and post-tests for two IELTS courses (intensive and general).
As in Cheng's study, Read and Hayes's study did not show any significant
improvement, either overall or between the groups of students. The researchers concluded that
by washback effect, Ferman (2004) examined the influence of the introduction of an oral test
on learners' achievement. The author found that average-ability students were
significantly different from other students: their anxiety level was the lowest and they were
not adversely affected by potential failure in the test. For that reason, Ferman concluded that,
for washback to occur, it is important to consider the individual differences
among learners.
Like Ferman (2004), Gosa (2006) sought to identify possible washback effects that took
place inside and outside classrooms as experienced by her students in Romania. Gosa used
students' diaries to analyse whether or not the students’ study environment was affected by
test washback. Adopting that particular method, Gosa recognised that the individual
differences among the learners and the environment where they operate need to be considered
to see if an exam might exert the expected effects. Attitudes, perceptions, beliefs, learning
styles, and anxiety should be taken into account when trying to promote positive washback as
they are likely to interact with the test, and hence intervene in the washback process.
Qi (2004) conducted a survey to examine the impact of the National Matriculation
English Test (NMET) in China. This investigation focused on the test's main function, which
is to select students for higher education. The obtained data revealed that the NMET has a
considerable impact on materials and learning activities, but did not produce the intended
results set out at the beginning of this survey. Qi concluded that one of the reasons for this
was that teachers failed to teach students the required skills that are supposed to be an integral
part of instructional objectives. Instead, these teachers felt more pressured to work only for
Wall and Horák (2006) examined, from the teachers' point of view, the impact of the changes to
the TOEFL on teaching and learning in preparing students to take the test in Central and
Eastern Europe. Wall and Horák used interviews over a period of five months to gauge the
degree of teachers' awareness of the changes in the TOEFL test. They observed that there
was indeed a certain awareness, but it grew very slowly. Nevertheless, the two researchers
found that there was a positive response to the introduction of a speaking test and the
integrated writing skill. They concluded that the availability and quality of the information
about the test and test preparation materials would be a major source contributing to teachers'
Therefore, from the above literature review on washback effects, a number of findings
have emerged with regard to this phenomenon and the ways in which it can be investigated.
First of all, the review of the literature showed clearly that washback is broad and
multi-faceted and can be brought about through the agency of many independent and intervening
variables besides the exam itself. As far as washback is concerned, one can now see that some of
the factors which seem to affect the form that washback can take include teacher
and student factors such as beliefs, attitudes, experience, education, training, personality, the
status of the subject to be tested, resources, classroom conditions, management of practices in
the schools, communication between test providers and test users, and even the socio-political
context in which the test is put to use. In addition, what stands out clearly is that to carry on
interviews, group discussion, questionnaires, analysis of participants' diaries and their talk in
the context under exploration. All these methods and instruments aimed to examine the
As was seen in this review of literature, the majority of research studies into washback
tried to report the effects of examinations on the teaching content (Alderson & Hamp-Lyons,
1996; Read & Hayes, 2003; Wall & Alderson, 1993). Some results have indicated that tests
altered teaching methods and materials, but others have shown that the tests had limited or no
impact on either (Alderson & Hamp-Lyons, 1996; Cheng, 1997; Wall & Alderson, 1993).
Washback may also be differential: it occurs with some teachers, but not with others
(Alderson & Hamp-Lyons, 1996; Watanabe, 1996). Besides, most of the analysed data
revealed that tests have a superficial impact on students' learning, and that individual
learners, like teachers, have experienced this influence in different ways, with the potential for
considerable impact in terms of affective factors and teachers' behaviours (Cheng, 1997;
Ferman, 2004). The differences between the degree of washback on teachers and students
have raised questions about the extent to which washback to teachers can be assumed to be
Finally, on the basis of the literature reviewed above, there is
evidence that most of the available empirical studies investigated
washback in language testing on learning and teaching, but in relation to large-scale proficiency tests. In
other words, this implies that washback from assessment in a classroom context is
still not well explored. This reality is expressed by McNamara (2000): too much language
testing research is about high-stakes proficiency tests, ignoring the classroom context, and
focusing on the use of technically sophisticated qualitative methods to improve the quality of
tests at the expense of methods more available to non-experts. Hence, what is required to
overcome this shortage is to carry out a study of washback on learning and teaching for
classroom tests, and of the influences of these tests on the teachers' and students' behaviours and
attitudes. This objective becomes the priority of any washback investigation, and this is
actually the main argument on which the present study in this thesis rests.
implication of this concept to educational innovation. The aim of this section is to highlight a
possible framework that can help test developers to judge whether their innovations (the
testing they are developing) are likely to have the impact they intend them to have. In the
literature, it was argued that to understand the nature of washback it is also important to take
account of findings in the research literature in the area of innovation in language and change
in educational innovation.
A great many applied linguists assert that there has been a well-established tradition,
which led to the realisation of a number of networks that served to yield the most elegant
compilation of ideas about the different phases in the innovation process and the factors at
work in every phase (Rogers, 2003; Fullan, 2007), and an increasing body of literature
focusing on the English teaching context (Henrichsen, 1989; Kennedy, 1990; Markee, 1993;
White, 1993; Li, 2001). Crucially, what is most remarkable about these research studies is that
they succeeded to some extent in making clear for readers the complexity of the innovation
process, and the factors which inhibit or facilitate successful diffusion and implementation.
Because it is not possible to cover all of innovation theory in a single section, the
discussion will be limited to the ideas which are relevant to the present study. The
particularity of what is going to be displayed in this section is that ideas are arranged in a
certain way in order to help readers find a link between innovation and washback in
education. The researcher looks first at the term innovation and what it implies as a specific
concept in relation to washback in language testing, then he considers what distinguishes this
term from the other types of change. This will be followed by a
discussion of the process of innovation and the sense of change for the individuals who are
most affected by it. This section concludes with the provision of several models of
Henrichsen (1989), which served as the starting point for the analysis of data in the present
study.
The first question that needs to be answered is what the term 'innovation' refers to.
Following Wall (2005: 60), Rogers (1985) defines innovation as an “idea, practice, or object
that is perceived as new by an individual or other unit of adoption”. Rogers holds that:
It does not matter whether the idea is objectively new (in terms of the amount of
time that has passed since its discovery or invention), but rather whether it is felt to
be new by those who may be adopting or using it (ibid).
In Hsu (2009), innovation can be usefully defined as a planned and deliberate effort,
Hsu makes this idea more explicit. He argues that educational innovation is the result of a
number of problems that a given educational system can present, such as failure in students'
accountability reporting. What is worth noting about these problems is that they also
extend to some aspects of educational systems that concern systematic attempts by
some authorities to change educational policies and practices with the intention to achieve
Some other researchers make a distinction between innovation and other types of
change. For Wall (2005: 60), who cites White (1993), “the difference has to do with
intentionality: while 'change' is any difference that occurs between time one and time two, an
'innovation' requires human intervention” (244). For Miles (1964) “innovation is a deliberate,
novel, specific change, which is thought to be more efficacious in accomplishing the goals of
a system” (cited by White, 1988: 211). Nevertheless, there is another view held by some
other researchers who use the terms as synonyms. Many of them argue that even if they believe
that the distinction is a valid one, they use the terms interchangeably to avoid any sort of
confusion or ambiguity. The researcher prefers to opt for the view that these two
terms bear the same sense, since the efforts required to launch innovation in this research are
so high, and the challenges go beyond the discussion of the distinction between the two concepts.
provided by Markee (1987) are said to be the most comprehensive. Markee recommends that
to understand why their attempts to innovate meet with success or failure. In other words, this
means that specialists need to be aware of the matters and findings reported by educators and
vice versa. For Markee, this approach will not only provide language educators with a
coherent set of guidelines and principles for the development of their own innovation
evaluation, but will also supply them with criteria for retrospective evaluation of the extent to
which these innovations have actually been implemented (cited in Wall, 2005: 61).
What comes out from this brief discussion on the right sense of the concept
‘innovation’ is that it is deliberate, intentional and also planned. With regard to this
definition, it is apparent that the other concept ‘washback’, the central research concept in
this study, should be conceived with a meaning that overlaps to a large extent with this
definition of innovation in order to bring about changes and improvements to the teaching
and learning processes. Obviously, this is what is intended by the introduction of a new
testing system in the context under study in this thesis. In what follows, more understanding
about the process of innovation and what it encompasses, and how it is adapted to this
The concept of innovation has been so far defined. The next step in this section is to
synthesise the process of innovation through the discussion of four major innovation views
and to bring their relevant ideas together: a basic model for innovation by Rogers (1995), a
comprehensive survey of innovation by Fullan (1991), a set of principles that are seen to
guide an innovative act by Markee (1997), and Henrichsen's (1989) innovation model. The
latter is discussed in some detail because of its relevance to language education and to
with the success of implementing innovation. One of the most cited sets of attributes is
perhaps that proposed by Rogers (2003). For Rogers, there are five attributes that compose
his model, which are relative advantage, compatibility, observability, trialability and
complexity. The first attribute, as Rogers posits, is about the answer to the question that is
first raised by the persons who are most affected by innovation. In addition, relative
advantage represents the perception that those persons have of the innovative act. It is
believed that the greater an individual perceives the relative advantage to be, the quicker it
will be adopted. Hsu (2009) refers to Rogers to make this point more explicit:
congruence between the innovation and the existing values, past experiences and perceived
needs of those who are expected to adopt this innovation. For Rogers, it is clear that if there is
a high degree of compatibility between the innovation and the standard norms and values of a
system where innovation is to occur, the act of innovation is going to happen rapidly.
Contrary to this attribute, the third one, complexity, is proposed by Rogers to display the
innovation is seen by its adopters to be difficult, it will be difficult to diffuse and adopt
(Rogers, 2003).
The next attribute in Rogers' model concerns the issue of trialability.
The latter refers to the extent to which a prospective adopter could try out an innovation
before its adoption. For Rogers, an innovation that is trialable represents less uncertainty to the
person who is considering its adoption, as it is possible to adopt the innovation a little at a
time rather than all at once and to learn by doing. The final attribute, observability, pertains to
the adopter's ability to actually see the innovation being used by others. For Rogers, if an
innovation is observable, it is easy to adopt and diffuse (Hsu, 2009: 70). What is significant
about this model is that it claims that considering these five attributes makes it easy for the
innovation to be adopted and rapidly diffused. Among these five attributes, innovation
researchers believe that relative advantage and compatibility are of great value, and very
1.5.2.2 Fullan's View
rather than an event. For Fullan (2007), the innovation process is identified through three
worth noting on these stages is that Fullan recognised that it is not possible to predict how
events in one phase will influence those in the others, or how long it will take for one phase
Referring back to these three basic stages in Fullan's model, these are defined as
follows:
1. The 'Initiation' stage: it is the process that occurs between the first appearance of an idea
for a change and the time when it is adopted. In this stage, Fullan proposes to ask a set of
questions to see whether or not the idea to be adopted is worthy. These questions include:
2. The 'Implementation' stage: it is the process of putting into practice an idea, programme,
or a set of activities and structures new to the people attempting or expecting change.
Fullan insists on the assumption that there should be a definite consideration to the factors,
which are important in this stage. This particularity includes three aspects: the
characteristics of the local context (the district, the community, the principal and the
teachers), and the characteristics of external bodies (such as government and ministries)
educational system, or whether it fails and is rejected. Like the previous stage, a number of
the degree to which an innovation has been built into the system;
the number of people who are committed to and skilled in the change;
Besides the identification of the three stages of the implementation process, Fullan
(2007) observed that quality in implementing projects needs to be considered; he thought that
the period between the decision to initiate and start-up is often too short to allow for adequate
quality assurance. Hsu (2009) made this idea more explicit; he posits that “when adoption is
more important than implementation, decisions are frequently made without the follow-up or
preparation time necessary to generate adequate materials” (74). To overcome this problem,
Fullan (2007) proposes that “it is important to attempt substantial change and to do it by
persistently working on multi-level meaning across the system over time” (92). By this,
Fullan shows that innovation is a complicated task to carry out. In order to make it easier for
innovators, it is wiser to break down complex changes into components, which can be
summarize and display the relevance of ideas from innovation theory. In the diffusion of
professionals:
To understand the factors that affect the design, implementation, and maintenance of
innovation. His framework is based on the questions that were posed by Cooper
(1989): these include questions such as who adopts what, when, why, and how (118).
In terms of who, Markee, based on Fullan (1982), associates the who with the teachers.
He sees the latter as key players in language teaching innovation. Though
teachers differ from one context to another, they tend to assume the role of
implementers who carry out the process of innovation on the ground. Markee also reported
that Kennedy's (1988) study shows that the individuals in this process of innovation may
often extend to people other than the teachers, mainly those who are seen to play
the role of deciders. These are individuals such as Ministry of Education officials, directors of schools,
and general inspectors. The other party in this process is the students, who are the clients. All of
those individuals form the community that the innovation act targets.
In the course of the implementation process, the potential adopters, drawing from the
studies by Rogers (1983) and Rogers and Shoemaker (1971), should pay attention to basic
desired objectives, which is fundamental and which is planned and deliberate” (cited in
Chinda (2009), based on Markee's interpretation of the what, sees this latter:
Here, Markee addresses two issues he felt were missing in Nicholls' definition. These
mainly are: the notion of fundamental change and the question of whether innovations need
In terms of where, citing Cooper (1989), Markee says that “where an innovation is
implemented is a socio-cultural, not a geographical issue” (55). Markee also stresses the
importance of understanding the context where innovation takes place. In general terms, the
context here refers to a social and cultural context where many factors, such as cultural,
ideological and socio-linguistic ones, are involved and currently affect it. Markee cites Kennedy
(1988), who gives the name of 'sub-systems' to these factors displayed in Figure 1.1.
Figure 1.1: The Hierarchy of inter-relating systems in which an innovation has to operate
In terms of when, Markee discusses the rate of diffusion. He points out that this rate
may vary from one type of innovation to another. He also adds that “the diffusion process
tends to begin slowly and then accelerates to finally slacken” (58). Besides, Markee thinks
that innovation takes time to implement and always takes longer to implement than expected.
In order to grasp this idea in the diffusion process of an innovation, it is appropriate to refer
to Rogers (1995), who explained the rates of diffusion in the form of an S-shaped curve (Figure
1.2). In this version, Rogers claims that most innovations follow the same pattern. First, the
rate of diffusion is slow in the beginning, but then, after the adoption of the innovation by
individuals, the rate accelerates. This is indicated by the steep climb in the curve.
Figure 1.2: The Rate of adoption of an innovation (The S-shaped diffusion curve)
Rogers (1995) makes clear what determines the rate of diffusion. In his words,
he states that:
the rate of adoption is usually determined by five types of variables: the attributes of
innovation, the type of innovation-decision, communication channels operating in
the environment, the nature of the social system, and the extent of the agents (cited
in Wall, 2005: 74).
which can facilitate or hinder innovation. About the characteristics of adopters, Markee refers
to Rogers (1993) and emphasizes the need to consider five categories. These were
discussed in some detail under the who. He adds that giving much importance to the
characteristics of adopters makes it possible that the agent would be more convinced about the
adoption of the innovative act. On this particular point, Rogers used the phrase ‘audience
segmentation’ to talk about the various communication channels or appeals that are used to
target different categories of adopters (Rogers, 1993, cited in Wall, 2005: 74). The second
factor that was discussed by Markee is about the features of successful innovation. Rogers
(1995) stresses that it is the adopters' perceptions of the features which decide whether they
adopt or reject the innovation. Drawing on this assumption, Markee points out five
crucial attributes that lead adopters to adopt or reject the innovation. These attributes were
Finally, in terms of how, Markee (1997) describes five different approaches to affecting
communication: individuals belong to one or more network and information about spreads
and colleagues interact with others in their own social grouping” (62-63).
2. The research, development, and diffusion model: “it assumes that research, long-term
planning and specialist teams working on different aspects of development can ensure
and innovation will be adopted and from it should take, and pass their decision on the
4. The problem-solving model: “it is a ‘bottom-up’ model, where it is the potential users of
innovation who decide whether there is a need for change. They identify possible, trial and
evaluate them, and repeat the process until they reach satisfactory outcomes” (67-68).
5. The linkage model: “it is corporate features from the social interaction, problem-solving
models, and which acknowledge that different approaches should be used in different
1.5.2.4 Henrichsen's View
In line with the innovation views discussed earlier in this
section, Henrichsen's (1989) model suggests a full understanding of the diffusion and
itself, but also an in-depth examination of (a) the role of the change agent (e.g., policy makers and
deciders of innovation), (b) the role of the adopters (e.g., teachers and students), (c) the various
stages of the innovation diffusion process (e.g., decision making, adoption, implementation,
diffusion), and (d) the local constraints within which reformers operate. In other words, it is crucial to
understand the context where innovation would take place, the length of time that is
required for successful innovation, and the factors that are present in the context where the
different components of the innovation process, innovators will find it very complicated to
In this respect, from his attempt in diffusion innovations in English language teaching,
1. The ‘antecedents’ component of the model focuses on the significance of the set of
On this point, Henrichsen insists that those who advocate an innovation must be
aware of the characteristics of the intended 'user system', the characteristics of the 'users',
traditional pedagogical practices, and the experience of previous reforms before they
decide on the suitable innovation to be carried out. The characteristics of the 'user system'
correspond to the structure and power relationships in schools and society. The
characteristics of the intended ‘users’ of the innovation process include the users' attitudes,
values, norms, and abilities. Traditional pedagogical practices derive from a
variety of cultural and historical influences. Finally, the experiences of the previous
reforms will provide an understanding of how to achieve the goal or how to overcome the
2. In the ‘process’ component of the model, Henrichsen describes and analyses the factors
which stand as facilitators and/or hindrances to change; he lists the factors as follows:
3. In the ‘consequences’ component, the hybrid model provides different types of the
innovation decisions and outcomes. In this section, Henrichsen describes how a decision
to adopt or reject an innovation can be changed at a later stage; he also describes the types
Besides, he labels the types of outcomes, which can be immediate or delayed, manifest or
latent, and functional or dysfunctional (Wall, 2005: 83-86; Chinda, 2009: 66-67).
implications for the present study since the major aim in this research is to remedy the many
anomalies present in the current testing system adopted by EFL teachers in the Algerian
secondary schools, and hence seek to implement an ATM; besides, another implication
mainly concerns examining the impact of the intended innovation on those who are
concerned by it in the context to be explored. In this respect, this synthesis of the available
literature on educational innovation has led us to consider that Rogers' (2003) model
has provided a general definition of innovation, its basic characteristics, and its diffusion
process. Fullan (1991) has proposed broad phases of change in innovation, as well as factors
affecting each phase. Fullan, through his model, has made clear that innovation
should be seen as a process rather than as an event, and that all the participants who are affected
by this act have to find their own understanding of the change. Markee's (1997) model has
yielded specific perspectives in the domain of innovation in education. The set of proposed
questions has proved to be crucial in making this process true. Finally, Henrichsen's (1989)
hybrid model has offered insights into the fundamental factors affecting the different stages
ostensible that different theories of innovation and change have provided the researcher in the
present exploration with useful insights on how one should proceed to implement subjects that are
new for the people concerned by this change and that serve to bring positive outcomes.
Knowing that not all that was discussed above by these theories can be taken for granted, it
is essential to note that there is a dire need to adapt the contents of these models to the subject
and objective of the present study, so that conceiving things in such a way makes these
As has been pointed out on several occasions in this review of the literature on
innovation in education, bringing about any kind of change can be extremely long, complex,
and difficult. Research into washback has consistently shown that tests, in many cases, can be
seen as effective and useful levers for innovation in education. This is why the final
implication of this discussion of innovation in education for the current study is that, first, it has
enabled the researcher to base his investigation on a well-founded theoretical background,
and, second, it has also offered the understanding of how systematic the process of
diagnose the factors affecting the different stages of this innovation in this study.
In section 1.3 in this chapter, the functions and mechanisms of washback through which
it is believed to operate were investigated. The present section explores a washback model
arrived at from the set of reviews of different empirical studies in various contexts. Its
fundamental aim is to investigate the effects of the introduction of new tests in this research
and to consider the nature of evidence required to support claims of a washback effect.
The model proposes that the nature of washback from language tests flows from
overlap, that is, the distance between the test contents and the instructional objectives set out in the
relevant taught syllabus. The greater the correspondence between the two, the more likely
positive washback becomes. Nevertheless, in this model, washback is not simply a matter of
test design; it is realised through, and limited by, participant characteristics. Participants'
perceptions of, attitudes towards, and reactions to test importance and difficulty, and their ability to
accommodate to test demands, will moderate the strength of any effect and certainly the
To provide a structure of this model, the researcher used an adapted model of washback
that is based on a framework that was suggested by Hughes (1993). It is worth noting that this
latter is very common in the literature on current research into
washback. The researcher discussed this model in some detail in section 1.4. The choice of
this washback model among many other available models in the language testing area is
justified by the fact that such a framework is very appropriate to the nature of the present
research. In what follows, the different components that form the adopted and adapted
washback model in this study are discussed. However, before proceeding to this, it is
important to distinguish between participants, process, and product, recognising that these
In considering this first component in the adopted framework in this research, one can
point out that the participants' behaviour can either support or override the intended washback
effect of the introduction of the new testing system. As noted by Shohamy et al. (1996), the
results obtained from tests can have serious consequences for individuals as well as
programmes, since many crucial decisions are made on the basis of test results. In this study,
the researcher called for one of these two important sorts of washback, that is, washback on
participants. This idea overlaps, to some extent, with Bachman and Palmer's (1996: 30-31)
micro-level of washback. On this point, Bailey (1999: 12) holds that the participants, whether
teachers or students, affected by washback may be influenced by information that a test bears
prior to its administration, or by ‘folk-information’ (such as reports from students who have
taken earlier versions of the test). Besides, these participants may also be influenced by several
sources of feedback following the administration of the test. These would include the actual
test scores provided by teachers, feedback from students, and feedback from the teachers to
To access the participants' attitudes in a washback study, the literature on this
issue presents a myriad of means of investigation. Alderson and Wall (1993) point to the
inadequacies of relying on survey data in isolation, but acknowledge that surveys can help to
context can provide access to the world view of the participants. He adds that qualitative
interviews can also assist the researcher both in the design of more quantitative instruments
The second component in the adopted model is washback on process. Hughes (1993: 2)
defined process “as any actions taken by the participants which may contribute to the process
teachers is needed (Watanabe, 2004; Shohamy, 2001; Turner, 2001; Alderson & Wall, 1993).
Hence, they suggest that both questionnaire responses and interview data will need to be
supported by another instrument. Cheng (1997), citing Bailey (1999), agrees that “observation
allows for a richer understanding of washback than surveys alone and argues for a
combination of asking through surveys and interviews - and watching through observation”
The last component in the adopted washback model in the current study is washback on
products. Hughes (1993) defines the products associated with washback as “what is
learnt (facts, skills, etc.) and the quality of learning”. What is notable about this component is
that it is sometimes difficult to untangle it from the two other components, namely
participants and process. For Bailey (1989), much of the literature about participants
and washback describes the various processes participants try to increase. Such processes
include aspects such as reviewing what one carries on in teaching. Shohamy (1993)
highlights processes as well when she claims that negative washback often brings about “under
emphasis on the means by which the learner arrives at proficiency” (186). This means to
complicated to measure products. The reasons for the lack of consideration given to tests
include the problem of comparing non-equivalent, often-distant groups and the selection of
alternative outcome measures (Green, 2007: 29). For Madaus (1988), in evaluating outcomes,
it is important to bear in mind the circularity of evaluating test impact through score gains. A
rise in scores does not necessarily imply that there is an improvement in learning. Rather, the
score may mask the reality that there is no positive washback of the test on the final outcomes
of learners. Because of this, and in order to make washback on products actual, in the current
study, the researcher would work on the evidential link between test design issues and test
score interpretation. This can result in gains under the new testing system.
Having explained the choice of a washback model, highlighted its structure and the
means of accessing each of these components in this research, we now turn to the dependent variables of the
model arrived at above. This raises the question of how washback can be recognized and gives a clear picture
of its nature. A thorough examination of these dependent variables is needed and will be
presented further through the two fundamental studies: the “Preliminary Study” and the
On this matter, Green (2007) cites Wall and Alderson (1993) who argue that it is
hypotheses suggest predictions regarding content (what), methods (how), rate, sequence,
degree, and depth of teaching and learning as potential dependent variables for investigation.
In the same vein, calling for the same explicitness of these dependent variables, Bailey (1999)
remarks that washback studies can broadly be divided into those focusing on perceptions and
those concerning actions. Hughes (1993), whose ideas are the starting point of the washback
model adopted in this research, offers a model that treats the dependent
variables as a basis for research, one which encompasses both perceptions and actions and links
these two variables to learning outcomes. Bailey (1996), who developed this model, presents
it in the form of a flow diagram that applies where the conditions outlined for washback are met. In this
sense, washback will first act on participants, affecting their attitudes towards work.
Participants' attitudes will affect processes, including both what participants do and how they
do it. Processes concern aspects such as teaching materials, teaching, and learning. In their turn,
these processes will influence the product: the content and the extent of learning.
Drawing on the reviewed literature, dependent variables in this study will include the
effects of the existing testing system in Algerian secondary schools on
participants' attitudes and beliefs, the content and methods of teaching and learning, and the
students' outcomes, in the form of their scores and self-assessed gains. Likewise, the same
way of conceiving these dependent variables will be adopted for the effects of the newly
introduced testing system in the context under exploration on participants' attitudes, beliefs,
and reactions to that innovation. Each of these facets to be considered poses its own
challenges for the researcher in this investigation. Therefore, this study aims to explore how
different components in the Algerian educational system reacted when washback was
strategically anticipated, in order to determine the possible areas of washback intensity in teaching and
learning English in Algerian secondary schools and to define the interrelationship
Conclusion
To summarize, this chapter reviewed a number of issues related to the central concept
of this research. Crucially, an attempt has been made to elucidate the origins,
literature review on the concept has also considerably helped us to display the power and
authority of tests over the teaching and learning processes, and to indicate how language tests become
effective means of influencing educational systems and prescribing the behaviour of those who are
affected by their results. Some ideas related to the impact of tests on teaching and
learning, whether positive or negative, are still not well explained, and the questions
raised remain without thorough and comprehensive answers. In addition, the array of
washback studies reviewed in this chapter has revealed that a large number of
investigations of this phenomenon, conducted from different perspectives and at multiple levels, are
available. At the same time, those studies have shown that only a few of them are empirical in
nature, and findings become fewer when research turns to explore the washback effects of