T08R01 (McNamara - 2000 - 3-11)

The document discusses the different types of language tests, including traditional paper-and-pencil tests and performance tests. It covers how language testing has evolved to be less imposing and more focused on assessing abilities rather than catching mistakes. It also explains how understanding language testing is important for both test developers and others working in language education.

Testing, testing...

What is a language test?

Testing is a universal feature of social life. Throughout history people have been put to the test to prove their capabilities or to establish their credentials; this is the stuff of Homeric epic, of Arthurian legend. In modern societies such tests have proliferated rapidly. Testing for purposes of detection or to establish identity has become an accepted part of sport (drugs testing), the law (DNA tests, paternity tests, lie detection tests), medicine (blood tests, cancer screening tests, hearing, and eye tests), and other fields. Tests to see how a person performs particularly in relation to a threshold of performance have become important social institutions and fulfil a gatekeeping function in that they control entry to many important social roles. These include the driving test and a range of tests in education and the workplace. Given the centrality of testing in social life, it is perhaps surprising that its practice is so little understood. In fact, as so often happens in the modern world, this process, which so much affects our lives, becomes the province of experts and we become dependent on them. The expertise of those involved in testing is seen as remote and obscure, and the tests they produce are typically associated in us with feelings of anxiety and powerlessness.

What is true of testing in general is true also of language testing, not a topic likely to quicken the pulse or excite much immediate interest. If it evokes any reaction, it will probably take the form of negative associations. For many, language tests may conjure up an image of an examination room, a test paper with questions, desperate scribbling against the clock. Or a chair outside the interview room and a nervous victim waiting with rehearsed phrases to be called into an inquisitional conversation with the examiners. But there is more to language testing than this.

To begin with, the very nature of testing has changed quite radically over the years to become less impositional, more humanistic, conceived not so much to catch people out on what they do not know, but as a more neutral assessment of what they do. Newer forms of language assessment may no longer involve the ordeal of a single test performance under time constraints. Learners may be required to build up a portfolio of written or recorded oral performances for assessment. They may be observed in their normal activities of communication in the language classroom on routine pedagogical tasks. They may be asked to carry out activities outside the classroom context and provide evidence of their performance. Pairs of learners may be asked to take part in role plays or in group discussions as part of oral assessment. Tests may be delivered by computer, which may tailor the form of the test to the particular abilities of individual candidates. Learners may be encouraged to assess aspects of their own abilities.

Clearly these assessment activities are very different from the solitary confinement and interrogation associated with traditional testing. The question arises, of course, as to how these different activities have developed, and what their principles of design might be. It is the purpose of this book to address these questions.

Understanding language testing

There are many reasons for developing a critical understanding of the principles and practice of language assessment. Obviously you will need to do so if you are actually responsible for language test development and claim expertise in this field. But many other people working in the field of language study more generally will want to be able to participate as necessary in the discourse of this field, for a number of reasons.

First, language tests play a powerful role in many people’s lives, acting as gateways at important transitional moments in education, in employment, and in moving from one country to another. Since language tests are devices for the institutional control of individuals, it is clearly important that they should be understood, and subjected to scrutiny. Secondly, you may be working with language tests in your professional life as a teacher or administrator, teaching to a test, administering tests, or relying on information from tests to make decisions on the placement of students on particular courses.

Finally, if you are conducting research in language study you may need to have measures of the language proficiency of your subjects. For this you need either to choose an appropriate existing language test or design your own.

Thus, an understanding of language testing is relevant both for those actually involved in creating language tests, and also more generally for those involved in using tests or the information they provide, in practical and research contexts.

Types of test

Not all language tests are of the same kind. They differ with respect to how they are designed, and what they are for: in other words, in respect to test method and test purpose.

In terms of method, we can broadly distinguish traditional paper-and-pencil language tests from performance tests. Paper-and-pencil tests take the form of the familiar examination question paper. They are typically used for the assessment either of separate components of language knowledge (grammar, vocabulary etc.) or of receptive understanding (listening and reading comprehension). Test items in such tests, particularly if they are professionally made standardized tests, will often be in fixed response format in which a number of possible responses is presented from which the candidate is required to choose. There are several types of fixed response format, of which the most important is multiple choice format, as in the following example from a vocabulary test:

Select the most appropriate completion of the sentence.

I wonder what the newspaper says about the new play. I must read the ______.
(a) criticism
(b) opinion
*(c) review
(d) critic

Items in multiple choice format present a range of anticipated likely responses to the test-taker. Only one of the presented alternatives (the key, marked here with an asterisk) is correct; the
others (the distractors) are based on typical confusions or misunderstandings seen in learners’ attempts to answer the questions freely in try-outs of the test material, or on observation of errors made in the process of learning more generally. The candidate’s task is simply to choose the best alternative among those presented. Scoring then follows automatically, and is indeed often done by machine. Such tests are thus efficient to administer and score, but since they only require picking out one item from a set of given alternatives, they are not much use in testing the productive skills of speaking and writing, except indirectly.

In performance based tests, language skills are assessed in an act of communication. Performance tests are most commonly tests of speaking and writing, in which a more or less extended sample of speech or writing is elicited from the test-taker, and judged by one or more trained raters using an agreed rating procedure. These samples are elicited in the context of simulations of real-world tasks in realistic contexts.

Test purpose

Language tests also differ according to their purpose. In fact, the same form of test may be used for differing purposes, although in other cases the purpose may affect the form. The most familiar distinction in terms of test purpose is that between achievement and proficiency tests.

Achievement tests are associated with the process of instruction. Examples would be: end of course tests, portfolio assessments, or observational procedures for recording progress on the basis of classroom work and participation. Achievement tests accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning. Achievement tests should support the teaching to which they relate. Writers have been critical of the use of multiple choice standardized tests for this purpose, saying that they have a negative effect on classrooms as teachers teach to the test, and that there is often a mismatch between the test and the curriculum, for example where the latter emphasizes performance. An achievement test may be self-enclosed in the sense that it may not bear any direct relationship to language use in the world outside the classroom (it may focus on knowledge of particular points of grammar or vocabulary, for example). This will not be the case if the syllabus is itself concerned with the outside world, as the test will then automatically reflect that reality in the process of reflecting the syllabus. More commonly though, achievement tests are more easily able to be innovative, and to reflect progressive aspects of the curriculum, and are associated with some of the most interesting new developments in language assessment in the movement known as alternative assessment. This approach stresses the need for assessment to be integrated with the goals of the curriculum and to have a constructive relationship with teaching and learning. Standardized tests are seen as too often having a negative, restricting influence on progressive teaching. Instead, for example, learners may be encouraged to share in the responsibility for assessment, and be trained to evaluate their own capacities in performance in a range of settings in a process known as self-assessment.

Whereas achievement tests relate to the past in that they measure what language the students have learned as a result of teaching, proficiency tests look to the future situation of language use without necessarily any reference to the previous process of teaching. The future ‘real life’ language use is referred to as the criterion. In recent years tests have increasingly sought to include performance features in their design, whereby characteristics of the criterion setting are represented. For example, a test of the communicative abilities of health professionals in work settings will be based on representations of such workplace tasks as communicating with patients or other health professionals. Courses of study to prepare candidates for the test may grow up in the wake of its establishment, particularly if it has an important gatekeeping function, for example admission to an overseas university, or to an occupation requiring practical second language skills.

The criterion

Testing is about making inferences; this essential point is obscured by the fact that some testing procedures, particularly in performance assessment, appear to involve direct observation. Even where the test simulates real world behaviour – reading a newspaper, role playing a conversation with a patient, listening to
a lecture – test performances are not valued in themselves, but only as indicators of how a person would perform similar, or related, tasks in the real world setting of interest. Understanding testing involves recognizing a distinction between the criterion (relevant communicative behaviour in the target situation) and the test. The distinction between test and criterion is set out for performance-based tests in Figure 1.1.

[Figure 1.1. Two linked elements. Test (observed): a performance or series of performances, simulating/representing or sampled from the criterion. Criterion (unobservable): a series of performances subsequent to the test; the target. An arrow labelled ‘inferences about’ runs from Test to Criterion. A second link notes that characterization of the essential features of the criterion influences the design of the test.]

FIGURE 1.1 Test and criterion

Test performances are used as the basis for making inferences about criterion performances. Thus, for example, listening to a lecture in a test is used to infer how a person would cope with listening to lectures in the course of study he/she is aiming to enter. It is important to stress that although this criterion behaviour, as relevant to the appropriate communicative role (as nurse, for example, or student), is the real object of interest, it cannot be accounted for as such by the test. It remains elusive since it cannot be directly observed.

There has been a resistance among some proponents of direct testing to this idea. Surely test tasks can be authentic samples of behaviour? Sometimes it is true that the materials and tasks in language tests can be relatively realistic but they can never be real. For example, an oral examination might include a conversation, or a role-play appropriate to the target destination. In a test of English for immigrant health professionals, this might be between a doctor and a patient. But even where performance test materials appear to be very realistic compared to traditional paper-and-pencil tests, it is clear that the test performance does not exist for its own sake. The test-taker is not really reading the newspaper provided in the test for the specific information within it; the test-taking doctor is not really advising the ‘patient’. As one writer famously put it, everyone is aware that in a conversation used to assess oral ability ‘this is a test, not a tea party’. The effect of test method on the realism of tests will be discussed further in Chapter 3.

There are a number of other limits to the authenticity of tests, which force us to recognize an inevitable gap between the test and the criterion. For one thing, even in those forms of direct performance assessment where the period in which behaviour is observed is quite extended (for example, a teacher’s ability to use the target language in class may be observed on a series of lessons with real students), there comes a point at which we have to stop observing and reach our decision about the candidate – that is, make an inference about the candidate’s probable behaviour in situations subsequent to the assessment period. While it may be likely that our conclusions based on the assessed lessons may be valid in relation to the subsequent unobserved teaching, differences in the conditions of performance may in fact jeopardize their validity (their generalizability). For example, factors such as the careful preparation of lessons when the teacher was under observation may not be replicated in the criterion, and the effect of this cannot be known in advance. The point is that observation of behaviour as part of the activity of assessment is naturally self-limiting, on logistical grounds if for no other reason. In fact, of course, most test situations allow only a very brief period of sampling of candidate behaviour – usually a couple of hours or so at most; oral tests may last only a few minutes. Another constraint on direct knowledge of the criterion is the testing equivalent of the Observer’s Paradox: that is, the very act of observation may change the behaviour being observed. We all know how tense being assessed can make us, and conversely how easy it sometimes is to play to the camera, or the gallery.

In judging test performances then, we are not interested in the observed instances of actual use for their own sake; if we were, and that is all we were interested in, the sample performance would not be a test. Rather, we want to know what the particular
performance reveals of the potential for subsequent performances in the criterion situation. We look so to speak underneath or through the test performance to those qualities in it which are indicative of what is held to underlie it.

If our inferences about subsequent candidate behaviour are wrong, this may have serious consequences for the candidate and others who have a stake in the decision. Investigating the defensibility of the inferences about candidates that have been made on the basis of test performance is known as test validation, and is the main focus of testing research.

The test-criterion relationship

The very practical activity of testing is inevitably underpinned by theoretical understanding of the relationship between the criterion and test performance. Tests are based on theories of the nature of language use in the target setting and the way in which this is understood will be reflected in test design. Theories of language and language in use have of course developed in very different directions over the years and tests will reflect a variety of theoretical orientations. For example, approaches which see performance in the criterion as an essentially cognitive activity will understand language use in terms of cognitive constructs such as knowledge, ability, and proficiency. On the other hand, approaches which conceive of criterion performance as a social and interactional achievement will emphasize social roles and interaction in test design. This will be explored in detail in Chapter 2.

However, it is not enough simply to accept the proposed relationship between criterion and test implicit in all test design. Testers need to check the empirical evidence for their position in the light of candidates’ actual performance on test tasks. In other words, analysis of test data is called for, to put the theory of the test-criterion relationship itself to the test. For example, current models of communicative ability state that there are distinct aspects of that ability, which should be measured in tests. As a result, raters of speaking skills are sometimes required to fill in a grid where they record separate impressions of aspects of speaking such as pronunciation, appropriateness, grammatical accuracy, and the like. Using data (test scores) produced by such procedures, we will be in a position to examine empirically the relationship between scores given under the various categories. Are the categories indeed independent? Test validation thus involves two things. In the first place, it involves understanding how, in principle, performance on the test can be used to infer performance in the criterion. In the second place, it involves using empirical data from test performances to investigate the defensibility of that understanding and hence of the interpretations (the judgements about test-takers) that follow from it. These matters will be considered in detail in Chapter 5, on test validity.

Conclusion

In this chapter we have looked at the nature of the test-criterion relationship. We have seen that a language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual’s use of those abilities in real world contexts. All such tests require us to make a distinction between the data of the learner’s behaviour, the actual language that is produced in test performance, and what these data signify, that is to say what they count as in terms of evidence of ‘proficiency’, ‘readiness for communicative roles in the real world’, and so on. Testing thus necessarily involves interpretation of the data of test performance as evidence of knowledge or ability of one kind or another. Like the soothsayers of ancient Rome, who inspected the entrails of slain animals in order to make their interpretations and subsequent predictions of future events, testers need specialized knowledge of what signs to look for, and a theory of the relationship of those signs to events in the world. While language testing resembles other kinds of testing in that it conforms to general principles and practices of measurement, as other areas of testing do, it is distinctive in that the signs and evidence it deals with have to do specifically with language. We need then to consider how views about the nature of language have had an impact on test design.
